Admin
Is it weird that I'm more upset I screwed up one of the examples than that people didn't like the writing overall? :laughing:
Admin
I figure that if nobody is complaining about the writing, then nobody is reading it to begin with.
Admin
I think I still have the crown for that one.
Admin
Why don't we talk about something less controversial, like politics, or religion...
Admin
I just thought your description of this website was hilarious.
Hmmm....I'd say you must be new here, but you really are.
Admin
This is a terrible idea for several reasons:
...probably a bunch of others....
Admin
10. When handed an invalid URL, in 99% of the cases you're not getting any response code at all. A response code means there actually is an HTTP server over there, which is a pretty lucky occurrence if you're just going in a random direction.
Admin
410 is my favorite reply code; it's supposed to be used to say that the page has been deliberately removed. Not an error, just Gone.
Admin
I am sorry -- but for non-trivial tasks -- you're better off with a full set of parser combinators; they're far more expressive and readable, especially if they mimic EBNF syntax.
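As a rough illustration of the idea (a minimal sketch in Python with hypothetical helper names, not any particular combinator library): each parser takes the text and a position, and returns a value plus the new position on success, or None on failure. Small parsers combine into bigger ones that read much like the grammar they implement.

```python
# Minimal parser combinators (sketch). A parser is a function
# (text, pos) -> (value, new_pos) on success, or None on failure.

def char_if(pred):
    """Parse one character satisfying pred."""
    def parse(text, pos):
        if pos < len(text) and pred(text[pos]):
            return text[pos], pos + 1
        return None
    return parse

def many(p):
    """Parse zero or more repetitions of p, joining the results."""
    def parse(text, pos):
        out = []
        result = p(text, pos)
        while result is not None:
            value, pos = result
            out.append(value)
            result = p(text, pos)
        return ''.join(out), pos
    return parse

def seq(*parsers):
    """Parse each parser in order; fail if any of them fails."""
    def parse(text, pos):
        values = []
        for p in parsers:
            result = p(text, pos)
            if result is None:
                return None
            value, pos = result
            values.append(value)
        return values, pos
    return parse

# RFC 3986: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
# (approximating ALPHA/DIGIT with Python's str methods)
alpha = char_if(str.isalpha)
scheme_char = char_if(lambda c: c.isalnum() or c in '+-.')
scheme = seq(alpha, many(scheme_char))

result = scheme('http://example.com', 0)
# result == (['h', 'ttp'], 4): parsed the scheme, stopped at the ':'
```

Note how the `scheme` definition mirrors the EBNF-ish rule from the RFC almost token for token, which is the readability argument.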
Admin
Like useful parts of Discourse!
Admin
FTFY
Admin
Keep Calm and Ignore the Front-page Trolls (sorry, can't be arsed to update avatar for this).
Admin
Neither of these is a valid URI (I'm going to substitute "URI" for "URL", since there's no general mechanism to determine whether a URI is a URL).
I'm pretty sure the language of URIs is regular, though. Here's a script I wrote a while ago that uses a regular expression derived from the official grammar to match URIs (and relative URI references): https://gist.github.com/SpecLad/4514342.
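For reference, RFC 3986 itself (Appendix B) gives a regular expression, though only for splitting a URI reference into its components; it deliberately matches almost anything, so it parses rather than validates. In Python, for instance:

```python
import re

# The component-splitting regex from RFC 3986, Appendix B.
URI_RE = re.compile(r'^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?')

m = URI_RE.match('http://example.com/path?q=1#frag')
scheme, authority = m.group(2), m.group(4)
path, query, fragment = m.group(5), m.group(7), m.group(9)
# scheme == 'http', authority == 'example.com',
# path == '/path', query == 'q=1', fragment == 'frag'
```

A full *validating* expression, like the one linked above, is necessarily much longer.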
And here's the expression itself: http://pastebin.com/SgMi0xRp. It's a bit longer than it needs to be, because I used /x.
Admin
Honestly, the state of the data setup in my work's demo environment rankles me far more XD
Admin
http://i.ytimg.com/vi/-DT7bX-B1Mg/hqdefault.jpg
Admin
Don't get me started on regex validation of emails.
After years of people googling for the best regex, there are now many sites that won't allow the new TLDs, or that restrict them to {2,4}: between two and four characters.
[email protected] is not well liked.
Admin
Too late: http://what.thedailywtf.com/t/til-plus-email-validation-and-people-parts/7613/. I'm not sure where in the conversation email validation comes up; maybe 1/4 of the way.
Admin
I want to stick a comment in an email address and see how far that flies in today's world...
Filed under: but the RFC has syntax for it!
Admin
Given that there are far more ways to get it wrong than to get it right, and the framework's already done the heavy lifting for you, I'm very confused as to why someone wouldn't just create a System.Uri initialized to the string to test it for validity... given that it's going to be done that way later anyway. If you get an exception, you have your answer! If you only support http, then you can test the scheme once you have the Uri.
Someone had a screwdriver in his hand and still insisted on grabbing the hammer when he saw a screw.
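The same "let the framework do it" approach exists outside .NET, too. A rough Python analogue using urllib.parse (the function name is mine; note that urlparse rarely raises, so you inspect the parts yourself instead of catching an exception):

```python
from urllib.parse import urlparse

def is_valid_http_url(candidate):
    """Accept only absolute http/https URLs that have a host."""
    try:
        parts = urlparse(candidate)
    except ValueError:  # e.g. an unmatched '[' in the authority
        return False
    return parts.scheme in ('http', 'https') and bool(parts.netloc)

print(is_valid_http_url('http://www.google.com'))   # True
print(is_valid_http_url('htftp://www.google.com'))  # False: wrong scheme
print(is_valid_http_url('www.google.com'))          # False: no scheme
```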
Admin
I want a teapot that says ERROR 418 on it. I've tried to look for one but all I could find is a t-shirt. There must be someone selling such a thing, surely?
Admin
It probably wouldn't be hard for a local shop to put that text on a regular teapot.
Admin
If so, Google doesn't seem to know about it. I want one, too.
Admin
The second becomes a valid URI after Punycode conversion, which browsers will do automatically.
Also, I'm pretty sure everyone missed another false match case:
You probably don't have a URI handler for that installed.
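The Punycode step can be reproduced with Python's built-in IDNA codec (the older IDNA 2003 flavor; the hostname here is my own example):

```python
# A non-ASCII hostname and its Punycode/IDNA form,
# which is what the browser actually puts on the wire.
host = 'bücher.example'
ascii_host = host.encode('idna').decode('ascii')
print(ascii_host)  # xn--bcher-kva.example
```

Each label is converted independently; all-ASCII labels like "example" pass through unchanged.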
Admin
Valid as an abbreviation for an HTTP URL maybe. But not a URL. RFC 3986 says all URLs contain the ':' character.
Admin
That is a bad idea on so many different levels, most of which @monkeyArms covered. But, you can get at least one person to support it. Paging @Rhywden.
Admin
Because this is not in any way abusable as a CSRF vector in the same way, say, using a URL minifier is?
But again, you can still do basic validation before you even do that.
Admin
When you see that it matches htftp://www.google.com, you know the validation is not doing its work.
EDIT: Editing to see whether it shows the time of the edit.
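That failure mode is easy to reproduce: any pattern that accepts an arbitrary scheme will happily take htftp. A sketch (both patterns are mine, not the one from the article):

```python
import re

loose = re.compile(r'^\w+://\S+$')       # accepts any scheme at all
strict = re.compile(r'^https?://\S+$')   # accepts only http/https

url = 'htftp://www.google.com'
print(bool(loose.match(url)))   # True: the typo'd scheme slips through
print(bool(strict.match(url)))  # False
```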
Admin
It's never a valid URL regardless of context. You can add http:// to make it a valid URL if that suits your purposes, as most browsers do, but it's not a valid URL until you do that.
No scheme.
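For instance, with Python's urllib.parse:

```python
from urllib.parse import urlparse

print(repr(urlparse('www.google.com').scheme))         # '': no scheme at all
print(repr(urlparse('http://www.google.com').scheme))  # 'http'
```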
Admin
Admin
Dang, I was pretty sure it was sftp, but it seems that they both (ftps and sftp) exist.
Admin
It doesn't, though. Why would you say it does? It's pretty clear from the regex itself that it doesn't (well, "clear", relative to regexes, anyway) so regex 101 wasn't required but it's a quick way to show it.
Admin
It's a valid URL. You probably don't have a handler for the htftp scheme, though.
Admin
I think you may have missed the joke, or maybe the emoji aren't showing on your browser or something....
Because @lucas buggered off?
Is the comment about @ signs?
Admin
(1) This reminds me of something I saw many, many years ago (some time like 1984), in upstate New York. At that time NY allowed custom license plates with up to eight letters/digits, and a guy obviously worked nights a lot - he had NITESHFT as his plate, and you had to look twice to see that SHFT wasn't SHIFT. (It's easier to see on a computer screen than it was on the plate itself.)
Admin
(source)
Admin
More or less, although I remember it as looking denser than that (which helped the illusion).
Admin
Why are you parsing a URL if it is not going to be used? I take on board everything that monkeyArms has said, but why would I want to store a list of URLs, some of which don't point to anything? The only reason for the list is that at some point someone is going to try to use it. Of course there are many, many problems with physically testing URLs, such as getting bizarre pages back from wireless access points and the like, but just storing untested, semantically valid URLs has its own gotchas: for instance, an attacker might load a list of URLs intended for later registration, to facilitate malware distribution.
Admin
Two entirely different protocols. (Never use ftps if you can avoid it; sftp is far more secure.)
Admin
it's W3. what did you expect?
Admin
Excessive weird?
Admin
to begin with.
Admin
http://www=www:99999:99999:99999 is a valid URL? Now that's new to me....
Admin
Here are a few valid ones:
Admin
Format of a valid URL:
Admin
Make sure your stuff doesn't contain spaces, though. They are dangerous. If we allow spaces in URLs, the terrorists will have won. And it's bad for the children and the polar bears.
Admin
Dr. %20 or: how I learned to stop worrying and love having yet another method to encode problematic characters.
Admin
That's not valid, because though it satisfies the generic URL interpretation, it fails the HTTP-specific interpretation. Each scheme interprets the scheme-specific part in its own way.
If you'd written “foobar://www=www:99999:99999:99999”, you'd probably have been OK. :D
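Python's urllib.parse illustrates the generic/scheme-specific split nicely: the generic parse of that string succeeds, treating the authority as an opaque blob, but applying http's port rule to it blows up.

```python
from urllib.parse import urlsplit

parts = urlsplit('http://www=www:99999:99999:99999')
print(parts.netloc)   # www=www:99999:99999:99999 (generically, just an authority string)

try:
    parts.port        # scheme-specific rule: port must be digits, 0-65535
except ValueError as e:
    print('invalid port:', e)
```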