• (disco) in reply to mott555

    Is it weird that I'm more upset I screwed up one of the examples than that people didn't like the writing overall? :laughing:

  • (disco) in reply to Yamikuronue

    I figure that if nobody is complaining about the writing, then nobody is reading it to begin with.

  • (disco) in reply to mott555
    mott555:
    This feels like a challenge. What can I get past the editors and do to get all the front page commenters worked up? :laughing: :trollface:

    I think I still have the crown for that one.

  • (disco)

    Why don't we talk about something less controversial, like politics, or religion...

  • (disco) in reply to uiron
    uiron:
    hah was not expecting such backlash was it this rude?

    I just thought your description of this website was hilarious.

    uiron:
    I genuinely apologize for the comment then, no excuses.

    Hmmm....I'd say you must be new here, but you really are.

  • (disco) in reply to Jaloopa
    Jaloopa:
    Validating a URL should use the same philosophy as validating an email: check if it exists.

    Fires off a request, if it's not 404* you're good to go

    *probably other codes, too. I dunno, do I look like a web guy?

    This is a terrible idea for several reasons:

    1. Could easily introduce a massive slowdown in code execution (how long do you wait for a response? 5 seconds? 30? What if the server is having issues at the time, or causes a slow, expensive operation to be performed by the target?)
    2. URLs may require authentication, be restricted by IP, not respond to GET requests, only be available on non-standard ports, etc.
    3. A request may cause data to change on the target system.
    4. Has the potential to get your server blocked or blacklisted, as some admins don't appreciate being spammed with automated requests.
    5. Fucks up analytics reports.
    6. Assumes an appropriate response code will be returned (hint - this is often not the case).
    7. A 404 doesn't mean the URL is invalid. Neither does any other response code.
    8. Allows remote exploits to be carried out by your system.

    ...probably a bunch of others....

  • (disco) in reply to monkeyArms

    10. When handed an invalid URL, in 99% of cases you won't get any response code at all. A response code means there actually is an HTTP server over there, which is a pretty lucky occurrence if you're just going in a random direction.

  • (disco) in reply to Jaloopa

    410 is my favorite reply code, it's supposed to be used to say that the page has deliberately been removed. Not an error, just Gone.

  • (disco) in reply to uiron
    uiron:
    Regexes are irreplaceable in some cases, and far from requiring a cryptic spaghetti to solve the problem.

    I am sorry -- but for non-trivial tasks -- you're better off with a full set of parser combinators; they're far more expressive and readable, especially if they mimic EBNF syntax.

  • (disco) in reply to henke37

    Like useful parts of Discourse!

  • (disco) in reply to henke37
    henke37:
    ~~410~~ 418 is my favorite reply code, it's supposed to be used to say that the page has deliberately been ~~removed~~ brewed. Not an error, just ~~Gone~~ Tea.

    FTFY

  • (disco) in reply to Yamikuronue
    Yamikuronue:
    They can't all be winners.

    Sincerely, Jane Bailey

    Keep Calm and Ignore the Front-page Trolls (sorry, can't be arsed to update avatar for this).

  • (disco)
    - www.google.com (no protocol) - http://www.⌘.ws/

    Neither of these is a valid URI (I'm going to substitute "URI" for "URL", since there's no general mechanism to determine whether a URI is a URL).

    I'm pretty sure the language of URIs is regular, though. Here's a script I wrote a while ago that uses a regular expression derived from the official grammar to match URIs (and relative URI references): https://gist.github.com/SpecLad/4514342.

    And here's the expression itself: http://pastebin.com/SgMi0xRp. It's a bit longer than it needs to be, because I used /x.

  • (disco) in reply to antiquarian

    Honestly, the state of the data setup in my work's demo environment rankles me far more XD

  • (disco) in reply to henke37
    henke37:
    Not an error, just Gone.

    http://i.ytimg.com/vi/-DT7bX-B1Mg/hqdefault.jpg

  • (disco)

    Don't get me started on regex validating of emails.

    After years of people googling the best regex, there are now many sites that won't allow the new TLDs, or that restrict TLDs to {2,4} (between two and four characters).

    [email protected] is not well liked.
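
A minimal sketch of the {2,4} problem described above. Both patterns and both addresses are hypothetical illustrations, not taken from any particular site, and neither pattern is a real email validator.

```python
import re

# Hypothetical pattern of the kind the post describes:
# the TLD is limited to between two and four letters.
NARROW = re.compile(r"^[^@\s]+@[^@\s]+\.[A-Za-z]{2,4}$")

# Looser hypothetical variant: something@something.tld, TLD of two or more characters.
LOOSE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s.]{2,}$")


def accepts(pattern, address):
    """Return True if the whole address matches the given pattern."""
    return pattern.match(address) is not None


if __name__ == "__main__":
    for address in ("user@example.com", "user@example.museum"):
        print(address, accepts(NARROW, address), accepts(LOOSE, address))
```

The longer TLD is rejected only by the pattern with the hard-coded length limit, which is the failure mode the post complains about.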

  • (disco) in reply to charliemaggot
    charliemaggot:
    Don't get me started on regex validating of emails.

    Too late: http://what.thedailywtf.com/t/til-plus-email-validation-and-people-parts/7613/. I'm not sure where in the conversation email validation comes up; maybe 1/4 of the way.

  • (disco) in reply to charliemaggot

    I want to stick a comment in an email address and see how far that flies in today's world...

    Filed under: but the RFC has syntax for it!

  • (disco) in reply to aliceif
    aliceif:
    .aa doesn't exist, yes. However, xy.i.de would be valid.

    Given that there are far more ways to make it wrong than make it right, and the framework's already done the heavy lifting for you, I'm very confused as to why someone wouldn't just create a System.Uri initialized to the string to test it for validity... given that it's going to be done that way later anyway. If you get an exception, you have your answer! If you only support http, then you can test the scheme once you have the Uri.

    Someone had a screwdriver in his hand and still insisted on grabbing the hammer when he saw a screw.
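
The "let the library parse it, then test the scheme" idea above can be sketched in Python with `urllib.parse.urlsplit`. This is an analogue, not System.Uri itself: `urlsplit` is more lenient and raises `ValueError` only for a few malformed inputs, so the scheme and host checks do most of the work. The function name `is_http_url` is made up for this example.

```python
from urllib.parse import urlsplit


def is_http_url(text: str) -> bool:
    """Parse with the standard library, then accept only http/https with a host."""
    try:
        parts = urlsplit(text)
    except ValueError:
        # urlsplit rejects some malformed inputs, e.g. an unbalanced "[" in the host.
        return False
    return parts.scheme in ("http", "https") and bool(parts.netloc)


if __name__ == "__main__":
    for candidate in ("http://xy.i.de/", "www.google.com", "ftps://example.com/file"):
        print(candidate, is_http_url(candidate))
```

No request is sent anywhere; this only answers whether the string parses and uses a supported scheme.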

  • (disco) in reply to chubertdev

    I want a teapot that says ERROR 418 on it. I've tried to look for one but all I could find is a t-shirt. There must be someone selling such a thing, surely?

  • (disco) in reply to CarrieVS
    CarrieVS:
    I want a teapot that says ERROR 418 on it. I've tried to look for one but all I could find is a t-shirt. There must be someone selling such a thing, surely?

    It probably wouldn't be hard for a local shop to put that text on a regular teapot.

  • (disco) in reply to chubertdev
    chubertdev:
    It probably wouldn't be hard for a local shop to put that text on a regular teapot.

    It would feel wrong. Like it was cheating or something. Utterly irrational I know, but seeing as I don't especially want a teapot for its own sake (quite happy with tea made in the mug), just for the joke, I want a 'real' one.
  • (disco) in reply to CarrieVS
    CarrieVS:
    I want a teapot that says ERROR 418 on it. I've tried to look for one but all I could find is a t-shirt. There must be someone selling such a thing, surely?

    If so, Google doesn't seem to know about it. I want one, too.

  • (disco) in reply to Spectre

    The second becomes a valid URI after Punycode conversion, which browsers will do automatically.

    Also, I'm pretty sure everyone missed another false match case:

    ftps://example.com/file

    You probably don't have a uri handler for that installed.
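
The Punycode conversion mentioned above can be tried with Python's built-in `idna` codec. A rough sketch: that codec follows the older IDNA 2003 rules, which may differ from what a given browser does, and the helper names are invented for this example.

```python
def to_ascii_host(host: str) -> str:
    """Convert a host name to its ASCII (Punycode) form, label by label."""
    return host.encode("idna").decode("ascii")


def to_unicode_host(host: str) -> str:
    """Convert an ASCII (Punycode) host name back to its Unicode form."""
    return host.encode("ascii").decode("idna")


if __name__ == "__main__":
    original = "www.\u2318.ws"  # the host from the second example URL
    converted = to_ascii_host(original)
    print(converted)
    print(to_unicode_host(converted) == original)
```

Only the host part is converted; the rest of the URL would still need ordinary percent-encoding.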

  • (disco) in reply to aliceif

    Valid as an abbreviation for an HTTP URL maybe. But not a URL. RFC 3986 says all URLs contain the ':' character.

  • (disco) in reply to Jaloopa
    Jaloopa:
    Validating a URL should use the same philosophy as validating an email: check if it exists.

    Fires off a request, if it's not 404* you're good to go

    That is a bad idea on so many different levels, most of which @monkeyArms covered. But, you can get at least one person to support it. Paging @Rhywden.

  • (disco) in reply to Polygeekery

    Because this is not in any way abusable as a CSRF vector in the same way, say, using a URL minifier is?

    But again, you can still do basic validation before you even do that.

  • (disco)

    When you see it matches htftp://www.google.com , you know the validation is not doing its work.

    EDIT: Edit to see whether it shows all the time of edit.

  • (disco) in reply to JBert
    JBert:
    Depending on the context, it could be a valid URL (you could buttume it's http like most browsers do nowadays).

    It's never a valid URL regardless of context. You can add http:// to make it a valid URL if that suits your purposes, as most browsers do, but it's not a valid URL until you do that.

    Dragnslcr:
    Why isn't it a valid URL?

    No scheme.
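
The "no scheme" test can be sketched with the scheme rule from RFC 3986 (a letter, then letters, digits, "+", "-" or ".", then a colon). The function name `has_scheme` is hypothetical, and this checks only the scheme, nothing after it.

```python
import re

# RFC 3986: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ), followed by ":"
SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*:")


def has_scheme(text: str) -> bool:
    """Return True if the text starts with a syntactically valid scheme and a colon."""
    return SCHEME.match(text) is not None


if __name__ == "__main__":
    for candidate in ("www.google.com", "http://www.google.com", "spotify:The+Beatles"):
        print(candidate, has_scheme(candidate))
```

One caveat: a string such as www.google.com:80 passes this check, because everything before the colon is a syntactically legal scheme. A scheme check alone says nothing about whether the scheme is one you support.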

  • (disco) in reply to riking
    riking:
    You probably don't have a uri handler for that installed.

    Oh, but I do. And so does everyone with OS X. Or Gnome. Or Windows 10.
  • (disco) in reply to TwelveBaud

    .. Dang, I was pretty sure it was sftp but it seems that they both (ftps and sftp) exist.

  • (disco) in reply to cheong
    cheong:
    When you see it matches htftp://www.google.com , you know the validation is not doing its work.

    It doesn't, though. Why would you say it does? It's pretty clear from the regex itself that it doesn't (well, "clear", relative to regexes, anyway) so regex 101 wasn't required but it's a quick way to show it.

  • (disco) in reply to Jarry
    Jarry:
    [BNF][1]

    Weirdly enough, that grammar seems wrong. E.g., http://google.com:80 won't parse, because you can't add a port to a path. Anyway, if anyone is interested, that grammar is equivalent to this regular expression:
    ^[a-zA-Z]([a-zA-Z0-9$_@.&!*"'(),-]|%[0-9a-fA-F][0-9a-fA-F])+?://(|(([a-zA-Z0-9$_@.&!*"'(),-]|%[0-9a-fA-F][0-9a-fA-F])|\+)+)(/(|(([a-zA-Z0-9$_@.&!*"'(),-]|%[0-9a-fA-F][0-9a-fA-F])|\+)+))*(\?([a-zA-Z0-9$_@.&!*"'(),-]|%[0-9a-fA-F][0-9a-fA-F])+(\+([a-zA-Z0-9$_@.&!*"'(),-]|%[0-9a-fA-F][0-9a-fA-F])+)*)?$
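
The port claim above can be checked by running the posted expression in Python. This sketch transcribes it with two cosmetic changes that should not affect what it matches: the repeated character class is factored into a variable, and groups are non-capturing. The names `X`, `SEG`, `POSTED` and `matches` are made up here.

```python
import re

# One allowed character or a %XX escape, as in the posted expression.
X = r"""(?:[a-zA-Z0-9$_@.&!*"'(),-]|%[0-9a-fA-F]{2})"""
# A possibly empty run of X or "+", used for the host and for each path segment.
SEG = rf"(?:|(?:{X}|\+)+)"

POSTED = re.compile(
    rf"^[a-zA-Z]{X}+?://{SEG}(?:/{SEG})*(?:\?{X}+(?:\+{X}+)*)?$"
)


def matches(url: str) -> bool:
    """Return True if the whole string matches the transcribed expression."""
    return POSTED.match(url) is not None


if __name__ == "__main__":
    for candidate in ("http://google.com/", "http://google.com:80"):
        print(candidate, matches(candidate))
```

The colon is not in the allowed character class, so nothing after the "://" can contain one, which is why the URL with a port is rejected.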
    
  • (disco) in reply to cheong
    cheong:
    When you see it matches htftp://www.google.com , you know the validation is not doing its work.

    It's a valid URL. You probably don't have a handler for the htftp scheme though.

  • (disco) in reply to FroshKiller
    FroshKiller:
    Maybe I'm using a URL validator to make sure that the new URL someone entered for a new deployment is a valid URL. There is a difference between an invalid URL and a valid URL that just doesn't currently resolve to anything

    I think you may have missed the joke, or maybe the emoji aren't showing on your browser or something....


    chubertdev:
    Why don't we talk about something less controversial, like politics, or religion...

    Because @lucas buggered off?


    tarunik:
    I want to stick a comment in an email address and see how far that flies in today's world...

    Is the comment about @ signs?

  • (disco) in reply to CarrieVS
    CarrieVS:
    There's no + after the alphanumeric character class. I'm not the most fluent regex-mangler in the world and I think this is Java which I've not been near for a while, so correct me if I'm wrong but I believe it's precisely one alphanumeric character followed by (any character whatsoever, but they probably meant a dot).

    Of course since the . wasn't escaped ww would match, and the remaining w. would match the next part, so the special case is still redundant.

    Good Lord, you're right. I completely missed the fact that the + was missing. Or my head filled it in because it didn't seem right that it wasn't there.(1)

    (1) This reminds me of something I saw many, many years ago (some time like 1984), in upstate New York. At that time NY allowed custom license plates with up to eight letters/digits, and a guy obviously worked nights a lot - he had NITESHFT as his plate, and you had to look twice to see that SHFT wasn't SHIFT. (It's easier to see on a computer screen than it was on the plate itself.)

  • (disco) in reply to Steve_The_Cynic
    Steve_The_Cynic:
    he had NITESHFT as his plate, and you had to look twice to see that SHFT wasn't SHIFT. (It's easier to see on a computer screen than it was on the plate itself.)
    [image]

    (source)

  • (disco) in reply to PJH

    More or less, although I remember it as looking denser than that (which helped the illusion).

  • (disco) in reply to Polygeekery
    Polygeekery:
    That is a bad idea on so many different levels, most of which @monkeyArms covered. But, you can get at least one person to support it. Paging @Rhywden.

    Why are you parsing a URL if it is not going to be used? I take on board everything that has been said by monkeyArms, but why would I want to store a list of URLs some of which don't point to anything? The only reason for the list is that at some point someone is going to try to use it. Of course there are many, many problems with physically testing URLs, such as getting bizarre pages back from wireless access points and the like, but just storing untested but syntactically valid URLs has its own gotchas - for instance if an attacker were trying to load a list of URLs which were intended for later registration to facilitate malware distribution.

  • (disco) in reply to riking
    riking:
    .. Dang, I was pretty sure it was sftp but it seems that they both (ftps and sftp) exist

    two entirely different protocols. (never use ftps if you can avoid it. sftp is far more secure)

  • (disco) in reply to Hanzo
    Hanzo:
    Weirdly enough

    it's W3. what did you expect?

  • (disco) in reply to Jarry
    Jarry:
    it's W3. what did you expect?

    Excessive weird?

  • (disco) in reply to dkf

    to begin with.

  • (disco)

    http://www=www:99999:99999:99999 is a valid URL? Now that's new to me....

  • (disco) in reply to mzedeler

    Here are a few valid ones:

    • aaa://host.example.com:1813;transport=udp;protocol=radius
    • ymsgr:sendIM?ElParrotMuerto
    • unreal://192.168.1.0:27831/
    • spotify:The+Beatles
    • skype:001555900200
  • (disco) in reply to Hanzo

    Format of a valid URL:

    some stuff followed by some other stuff.

  • (disco) in reply to tar

    Make sure your stuff doesn't contain spaces, though. They are dangerous. If we allow spaces in URLs, the terrorists will have won. And it's bad for the children and the polar bears.

  • (disco) in reply to Hanzo

    Dr .%20 or: how I learned to stop worrying and love having yet another method to encode problematic characters.

  • (disco) in reply to mzedeler
    mzedeler:
    http://www=www:99999:99999:99999 is a valid URL? Now that's new to me....

    That's not valid, because though it satisfies the generic URL interpretation, it fails the HTTP-specific interpretation. Each scheme interprets the scheme-specific part in its own way.

    If you'd written “foobar://www=www:99999:99999:99999”, you'd probably have been OK. :D
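
The split between the generic interpretation and the HTTP-specific one can be sketched in Python. Here `urlsplit` does the generic split without complaint, and the scheme-specific step is approximated by asking for a host and a numeric port in range. The function name `http_authority_ok` is hypothetical, and a real HTTP check would look at more than the port.

```python
from urllib.parse import urlsplit


def http_authority_ok(url: str) -> bool:
    """Generic split first, then a rough HTTP-specific check of host and port."""
    try:
        parts = urlsplit(url)
        if parts.scheme not in ("http", "https"):
            return False  # only HTTP(S) rules are known to this sketch
        parts.port  # raises ValueError if the port is not a number in 0-65535
    except ValueError:
        return False
    return bool(parts.hostname)


if __name__ == "__main__":
    for candidate in (
        "http://example.com:8080/",
        "http://www=www:99999:99999:99999",
    ):
        print(candidate, http_authority_ok(candidate))
```

The second URL splits fine generically and fails only once the port is interpreted, which is the scheme-specific step.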

Leave a comment on “How to Validate a URL”
