from Hacker News

Ask HN: How do you check a url is valid?

by lookingfj on 10/20/17, 5:23 PM with 7 comments

So it is trivial to check that a URL is syntactically valid using a regex, but what if you wanted to take this one step further and make sure the domain name is registered and actually in use? How would you do this? I have a few ideas for how to achieve this with a microservice, but I feel like others may have solved this problem before and there may be better solutions out there.
  • by pwg on 10/20/17, 5:31 PM

    First, you need to be precise in what you mean by "valid".

    "Valid" can encompass at least these four possibilities:

    1) the URL follows the correct syntax for URLs;

    2) the url is valid as per #1 and further the "host" portion of the url (when it contains a name) can be resolved to an IP address;

    3) the url is valid as per #2 and further there is a server located at the host (and optional port) value encoded in the URL that responds to requests;

    4) the URL is valid as per #3 and further the path and/or query and fragment parts define a valid path on the server running at the host:port encoded in the URL.

    #1 you can do yourself, as it is just a check that the syntax is correct.

    Numbers 2-4 all require some form of 'lookup' against some other system in order to verify 'validity'.
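
For what it's worth, levels 1 and 2 from the list above might be sketched in Python roughly as follows. This is only a minimal sketch using the standard library: the function names `has_valid_syntax` and `host_resolves` are made up for illustration, the restriction to http/https is an assumption about what the poster cares about, and the `resolver` parameter exists only so the DNS step can be swapped out when no network is available.

```python
import socket
from urllib.parse import urlsplit


def has_valid_syntax(url):
    """Level 1: the URL parses, uses http/https, and names a host."""
    try:
        parts = urlsplit(url)
        parts.port  # accessing .port raises ValueError for a malformed port
    except ValueError:
        return False
    # Assumption: only web URLs are of interest here.
    return parts.scheme in ("http", "https") and bool(parts.hostname)


def host_resolves(url, resolver=socket.getaddrinfo):
    """Level 2: level 1 holds and the host resolves to at least one address."""
    if not has_valid_syntax(url):
        return False
    parts = urlsplit(url)
    try:
        return len(resolver(parts.hostname, parts.port or 80)) > 0
    except (socket.gaierror, UnicodeError):
        # gaierror: the name did not resolve; UnicodeError: unencodable host
        return False
```

Calling `host_resolves("https://example.com/")` performs a real lookup through the system resolver, so the answer depends on the network and DNS configuration of the machine running it.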

  • by lookingfj on 10/20/17, 6:11 PM

    So I think in this instance I would deem a URL valid if: 1) it is in the correct format; 2) it resolves to an IP address; 3) the domain is registered and in use. By "in use" I mean it's not one of the "this domain name is for sale" pages.

    Number 3 is the novel and challenging piece of this.
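
One naive way to approach the "for sale" check is to fetch the page and scan its text for typical parking phrases. The sketch below is only an illustration of that idea: the name `looks_parked` and the phrase list are hypothetical, nobody in this thread proposed them, and a real parked page may use different wording or render its text with JavaScript, so expect both false positives and false negatives.

```python
# Hypothetical phrase list; extend it with wording seen on real parked pages.
PARKED_PHRASES = (
    "this domain is for sale",
    "this domain name is for sale",
    "buy this domain",
    "domain is parked",
)


def looks_parked(html):
    """Return True if the page text contains a known parking phrase."""
    # Lowercase and collapse runs of whitespace so matching is lenient.
    text = " ".join(html.lower().split())
    return any(phrase in text for phrase in PARKED_PHRASES)
```

The input would be the body of an HTTP GET response for the URL being checked, decoded to a string.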

  • by icebraining on 10/20/17, 5:29 PM

    Check Whois, DNS and make an HTTP request?

    This feels like an XY Problem, though. What are you trying to achieve by checking if the URL is valid?

  • by ultrablue on 10/20/17, 5:26 PM

    curllib will tell you whether there's something there, presuming the network is available.

    In fact, a simple HEAD request will suffice for that.

    That would also prove that the domain is registered, presuming DNS is working.
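
A HEAD request along those lines could look roughly like this with Python's standard `urllib`. It is a sketch under a few assumptions: the function name `head_status` and the 5-second default timeout are arbitrary choices, and the `opener` parameter is there only so the network call can be replaced in tests.

```python
import urllib.error
import urllib.request


def head_status(url, timeout=5.0, opener=urllib.request.urlopen):
    """Return the HTTP status of a HEAD request, or None if nothing answered."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with opener(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # A server answered, just with an error status such as 404 or 405.
        return err.code
    except (urllib.error.URLError, OSError, ValueError):
        # DNS failure, refused connection, timeout, or an unparseable URL.
        return None
```

Any non-None result means some server responded at that host. Some servers reject HEAD with 405, and a 200 by itself does not rule out a parked "for sale" page.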