The Daily WTF: Curious Perversions in Information Technology

2023-11-30

Frist? Sure

2023-11-30

It is as sure as XKCD 221 - https://xkcd.com/221/ - Are you sure? "Ja, ich hab' Schuhe." (Yes, I've got shoes.)

2023-11-30

As someone who has written a regexp to validate the syntax of email addresses, I object to the statement that they cannot be parsed with a regex. Perl regular expressions have allowed for recursion for more than a decade. Anything for which there is a BNF grammar, like email addresses, can be turned into a regular expression almost mechanically.
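To make the "almost mechanically" claim concrete, here is a sketch that transcribes one production, `dot-atom` from RFC 5322 section 3.2.3, into a regular expression in TypeScript. It is deliberately partial: it covers only the common `dot-atom@dot-atom` form and leaves out quoted strings, domain literals, comments and the obsolete syntax. JavaScript regexes also have no recursion, so this is not the Perl approach described above. The constant and function names are illustrative.

```typescript
// RFC 5322, section 3.2.3, transcribed rule by rule:
//   atext         = ALPHA / DIGIT / "!" / "#" / "$" / "%" / "&" / "'" / "*" /
//                   "+" / "-" / "/" / "=" / "?" / "^" / "_" / "`" / "{" /
//                   "|" / "}" / "~"
//   dot-atom-text = 1*atext *("." 1*atext)
// Each ABNF alternative becomes a member of a character class, "1*" becomes
// "+", and "*(...)" becomes "(?:...)*". The "-" is placed last so it is literal.
const ATEXT = "[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]";
const DOT_ATOM_TEXT = `${ATEXT}+(?:\\.${ATEXT}+)*`;

// Simplified addr-spec: dot-atom "@" dot-atom only.
const DOT_ATOM_ADDRESS = new RegExp(`^${DOT_ATOM_TEXT}@${DOT_ATOM_TEXT}$`);

function isDotAtomAddress(address: string): boolean {
  return DOT_ATOM_ADDRESS.test(address);
}

console.log(isDotAtomAddress("fred+wtf@example.com")); // true
console.log(isDotAtomAddress("fred..x@example.com"));  // false (empty atom between dots)
```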

2023-11-30

Sad part is JS has a perfectly good URL class that includes validation on the constructor. You could just do:

    function isValidURL(potentialUrl:string): boolean {
        try {
            new URL(potentialUrl);
            return true;
        } catch(e) {
            return false;
        }
    }
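It helps to know what "valid" means to that constructor: it follows the WHATWG URL Standard, so any absolute URL with a well-formed scheme passes, not only web addresses, while relative references fail unless a base is supplied. A quick sketch, repeating the function so the snippet runs on its own:

```typescript
// Same approach: let the WHATWG URL parser decide.
function isValidURL(potentialUrl: string): boolean {
  try {
    new URL(potentialUrl);
    return true;
  } catch {
    return false; // the constructor throws a TypeError on unparseable input
  }
}

// Absolute URLs with any syntactically valid scheme are accepted...
console.log(isValidURL("https://example.com/path?q=1")); // true
console.log(isValidURL("mailto:fred@example.com"));      // true
console.log(isValidURL("javascript:alert(1)"));          // true
// ...while relative references and plain text are rejected (no base given).
console.log(isValidURL("/relative/path"));               // false
console.log(isValidURL("not a url"));                    // false
```

So if only `http:`/`https:` links are acceptable, a check on the parsed URL's `protocol` is still needed on top of this.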

2023-11-30

In practice, one can validate email addresses using a regular expression: https://stackoverflow.com/questions/201323/how-can-i-validate-an-email-address-using-a-regular-expression

Some people might say "but my Punycode!", which is halfway valid: IDN works (and people expect it to), but must be preprocessed before it becomes an SMTP-type email address. Less bright people might say "but RFC2822 comments!", which is not valid because they're definitionally not part of the email address.

There was a fad some decades ago to assert, as this article does, that email addresses are not a regular language -- but I thought that myth had been dispelled.
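The IDN preprocessing step can be sketched by borrowing the WHATWG URL host parser, which converts internationalized hostnames to their ASCII (Punycode) form. Treat this as a sketch under assumptions: parsing a dummy `http:` URL just to reach the host parser is a trick, not an email API; only the domain is converted (a non-ASCII local part needs SMTPUTF8 support and is passed through untouched); domain literals such as `[IPv6:...]` are rejected; and `toAsciiDomainAddress` is a hypothetical name.

```typescript
// Convert the domain part of an address to its ASCII (Punycode) form.
// The local part is returned unchanged. Returns null if the input does not
// split into local@domain or the domain is not a valid host.
function toAsciiDomainAddress(address: string): string | null {
  const at = address.lastIndexOf("@");
  if (at <= 0 || at === address.length - 1) return null; // need both parts
  const local = address.slice(0, at);
  const domain = address.slice(at + 1);
  // Refuse characters the URL parser would treat as delimiters, strip, or
  // percent-decode, so a domain like "evil.com/x" is not silently truncated.
  if (/[\s\/\\?#:%]/.test(domain)) return null;
  try {
    // The host parser for special schemes such as http: applies IDNA ToASCII
    // (and lowercases ASCII letters); only the resulting hostname is kept.
    const { hostname } = new URL(`http://${domain}/`);
    return `${local}@${hostname}`;
  } catch {
    return null;
  }
}

console.log(toAsciiDomainAddress("fred@bücher.example")); // "fred@xn--bcher-kva.example"
```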

2023-11-30

Are you ${isValidUrl()}?

2023-11-30

Clearly, a brillant Paula bean solution.

Mr. TA · 2023-11-30

The problem here is the uncalled-for decisiveness. A developer with a philosophical mindset would do something like this instead:

    function isValidUrl() {
        return "maybe";
    }

Bananafish · 2023-11-30

Sure is a great way to validate URLs!

2023-11-30

I dunno about using a REGEX for email -- the simplest thing is just if the thing includes an "@". Every time I crack open an RFC and then compare it to what's actually sent, I always see "bad" data sent over.

In Gopher, lots of servers don't correctly end lines with CR LF and ignore the rule that the last line must be a lone ".".
In email, I've seen a major email vendor use SMTP-style auth on their IMAP server.
I've seen more incorrect URLs than I care to remember (I was the PM for the URL parser and networking team for .NET).
Non-ASCII domain names are far more touchy than one would like, especially when you look at enterprise name resolvers.

Fun EMAIL fact: email addresses in RFC 561 are things like "fred AT place"

2023-11-30

"Are those URLs? Sure! Is this a terrible approach? Sure! Does the fact that it's been like this for years and nobody actually complained imply that they didn't need URL validation in the first place? Sure!"

Now that's just funny. I'm still chuckling. Thumbs up!

dkf · 2023-11-30

Checking for exactly one @ that is neither the first nor the last character covers the really silly problems (you aren't going to be supporting non-SMTP naming schemes in this day and age). Beyond that, you need to see whether you can actually send an email to the mailbox and have someone pick it up and respond sensibly to it. You simply aren't going to validate all possible current user mailboxes on all email systems without doing something completely beyond regexps.
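The minimal check described here fits in a few lines. This is a sketch with a hypothetical function name; note that it also rejects the rare but RFC-legal quoted local parts that contain an "@" (such as `"a@b"@example.com`), which is the price of keeping it this simple.

```typescript
// Exactly one "@", and it is neither the first nor the last character.
// Anything beyond that is left to an actual delivery attempt.
function hasSingleInnerAt(address: string): boolean {
  const at = address.indexOf("@");
  return at > 0 && at === address.lastIndexOf("@") && at < address.length - 1;
}

console.log(hasSingleInnerAt("john@uk"));     // true
console.log(hasSingleInnerAt("a@b@example")); // false
```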

2023-11-30

https://github.com/Perl/perl5/blob/blead/t/re/reg_email.t

2023-11-30

I know I shouldn't, but I kinda like this for some reason

2023-11-30

There's even a URL.canParse(url, [base]) static method for this purpose, although it's not supported by older JS engines.
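Where support is uncertain, one option is to feature-detect `URL.canParse` and fall back to the `try`/`catch` approach. A sketch, with `canParseUrl` and the `UrlWithCanParse` type as illustrative names; the cast exists only so the code type-checks against TypeScript library definitions that predate `canParse`.

```typescript
// Describe the optional static method so older lib typings still compile.
type UrlWithCanParse = typeof URL & {
  canParse?: (url: string, base?: string) => boolean;
};

function canParseUrl(url: string, base?: string): boolean {
  const Ctor = URL as UrlWithCanParse;
  if (typeof Ctor.canParse === "function") {
    return Ctor.canParse(url, base); // newer engines: no exception involved
  }
  try {
    new URL(url, base); // older engines: same answer via the constructor
    return true;
  } catch {
    return false;
  }
}

console.log(canParseUrl("/docs"));                        // false
console.log(canParseUrl("/docs", "https://example.com")); // true
```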

welcor · 2023-11-30

I agree - I regularly use comments to distinguish between [email protected] and [email protected] to make sure I can see who sold my address. And most regexes are too stupid to accept this, so I am a big proponent of the "send them a mail on the given address. If they answer, it was valid. Otherwise, ignore it."

2023-11-30

Surely the correct implementation is:

    function isValidUrl() {
        return "file not found";
    }

2023-11-30

Maybe there really was validation code there once and it kept failing, so somebody trollishly replaced it with that one day?

Steve_The_Cynic · 2023-11-30

"send them a mail on the given address. If they answer, it was valid. Otherwise, ignore it."

In the end, for most(1) purposes, this is the right answer. The website operator shouldn't care about some abstract concept of what's valid, e.g. I remember seeing someone on a forum somewhere who commented on having worked for the UK's overall domain name system operator, where his email address was (I may have his given name wrong) "john@uk" - good luck getting that past any website-hosted email address "validator"...

Instead, concentrate on what's important: can the site use the address to send an email to the person? If yes, it's valid; otherwise, complain in some way.

2023-11-30

I would even argue that if you are looking for a complicated, exact validator you are most likely doing something wrong. Either you are looking for people to enter any possible valid value, in which case you are sure to be wrong at some point and restrict a valid input, like when the first person tries to post a URL attachment with an emoji in your 2005 forum software. Or, more likely, you have actual system requirements that mean supporting the entire standard would be nonsense. Your SSO redirect URL doesn't need to accept hex-IPs served over gopher with Arabic diacritics in the query string; that is nothing more than a security vulnerability waiting to happen. Azure AD allows anything that starts with https, uses port 443 and has a domain with a dot in it or localhost - that's perfectly fine for their use case and security profile. @.* will allow any practical address that can be served over the web and will catch most typos. Keep it simple and secure.
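The redirect-URL rule as paraphrased here can be written as a short allow-list check. This is a sketch of that paraphrase, not Microsoft's actual validation code, and `isAcceptableRedirectUri` and `parseAbsolute` are hypothetical names. One detail worth knowing: the URL parser drops a default port, so `https://example.com:443/` reports an empty `port`, and any other explicit port is non-empty.

```typescript
function parseAbsolute(candidate: string): URL | null {
  try {
    return new URL(candidate);
  } catch {
    return null; // not an absolute URL at all
  }
}

// Accept only: https scheme, port 443 (explicit or implied), and a host that
// is either "localhost" or contains at least one dot.
function isAcceptableRedirectUri(candidate: string): boolean {
  const url = parseAbsolute(candidate);
  if (url === null || url.protocol !== "https:") return false;
  // 443 is the https default, so the parser stores it as "".
  if (url.port !== "") return false;
  return url.hostname === "localhost" || url.hostname.includes(".");
}

console.log(isAcceptableRedirectUri("https://example.com:443/cb")); // true
console.log(isAcceptableRedirectUri("https://intranet/cb"));        // false
```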

2023-11-30

You must be new here.

Ralf · 2023-12-01

RFC 561 had to be some kind of April Fools' joke. RFC 524 clearly uses @.

WTFGuy · 2023-12-01

@Steve The Cynic ref:

The website operator shouldn't care about some abstract concept of what's valid, e.g. I remember seeing someone on a forum somewhere who commented on having worked for the UK's overall domain name system operator, where his email address was (I may have his given name wrong) "john@uk" - good luck getting that past any website-hosted email address "validator"...

Back in about 1993 I got my own domain for my company: MyCompanyName.US. Being based in the United States that seemed a good choice at the time. So my email address became [email protected]. No special characters: just some letters, one @ and one dot. Little did I know the trouble I unleashed upon myself.

"US" became a valid ccTLD in 1985, about 8 years before I got my domain. I was not an early adopter. But in 1993 my shiny new email address choked more than 30% of website email validators. Most choked if your TLD wasn't 3 letters, or worse yet wasn't "com", "net" or "org". Ouch. Even now in 2023 I occasionally encounter websites whose email validators don't like [email protected].

The idjits are many among us.
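The failure described here typically comes from a pattern that hard-codes the top-level domain. Both patterns below are illustrative, not taken from any real site, and `someone@mycompanyname.us` is a made-up address modelled on the domain in the comment. The first mimics a "com/net/org only" validator; the second just requires a non-empty label after the last dot and has no opinion on which TLDs exist.

```typescript
// The kind of validator that rejects perfectly good ccTLD addresses:
// it only believes in three generic TLDs.
const LEGACY_TLD_PATTERN = /^[^@\s]+@[^@\s]+\.(?:com|net|org)$/i;

// Something, "@", something, ".", then a final label with no dot, no "@"
// and no whitespace. Any TLD is fine.
const ANY_TLD_PATTERN = /^[^@\s]+@[^@\s]+\.[^@\s.]+$/;

const address = "someone@mycompanyname.us"; // hypothetical
console.log(LEGACY_TLD_PATTERN.test(address)); // false
console.log(ANY_TLD_PATTERN.test(address));    // true
```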

d-coder · 2023-12-02

I agree - I regularly use comments to distinguish between [email protected] and [email protected] to make sure I can see who sold my address.

I use panix.com, which has a workaround: I can use any number of email addresses of the form (eg) "[email protected]", since so many places don't understand the "+" format. They all end up at [email protected], but I can still filter on "thedailywtf" in the address. I've found at least one email database leak that way.

Disclaimer: Just a happy customer.
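For the common "+" convention (subaddressing, where `user+tag@domain` is delivered to `user@domain`), recovering the tag for filtering takes only a few lines. Panix's own alternative address form is not covered; this sketch handles only "+", assumes the provider supports it, and uses the hypothetical name `splitPlusTag`.

```typescript
interface TaggedAddress {
  mailbox: string;    // address with the tag removed, e.g. user@domain
  tag: string | null; // text after the first "+" in the local part, if any
}

// Split "user+tag@domain" into its base mailbox and tag. The domain is split
// off at the last "@"; only the local part is searched for "+".
function splitPlusTag(address: string): TaggedAddress | null {
  const at = address.lastIndexOf("@");
  if (at <= 0) return null;
  const local = address.slice(0, at);
  const domain = address.slice(at + 1);
  const plus = local.indexOf("+");
  if (plus <= 0) return { mailbox: address, tag: null };
  return { mailbox: `${local.slice(0, plus)}@${domain}`, tag: local.slice(plus + 1) };
}

console.log(splitPlusTag("fred+thedailywtf@example.com"));
// { mailbox: "fred@example.com", tag: "thedailywtf" }
```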

2023-12-02

You also send emails for verification when someone opens an account. With a big warning “don’t do anything if you didn’t try to open an account”.

So you have a really good chance of figuring out not only that the address is valid and that it exists, but also that it is the correct one.
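A minimal sketch of that confirm-by-email flow, assuming Node.js (`node:crypto`) and an in-memory store; actually sending the message is left out. The function names and the 24-hour lifetime are illustrative choices, and the optional `now` parameter exists only to make expiry testable. The address counts as verified only when someone follows the emailed link, which covers "valid", "exists" and "correct" in one step.

```typescript
import { randomBytes } from "node:crypto";

interface PendingVerification {
  email: string;
  expiresAt: number; // epoch milliseconds
}

const TOKEN_LIFETIME_MS = 24 * 60 * 60 * 1000; // illustrative: 24 hours
const pending = new Map<string, PendingVerification>();

// Create an unguessable token for the address; the caller would email a link
// containing it (e.g. https://example.com/verify?token=...).
function issueVerificationToken(email: string, now: number = Date.now()): string {
  const token = randomBytes(32).toString("hex");
  pending.set(token, { email, expiresAt: now + TOKEN_LIFETIME_MS });
  return token;
}

// Returns the verified address, or null if the token is unknown, already
// used, or expired. Tokens are single-use: they are deleted on first lookup.
function confirmVerificationToken(token: string, now: number = Date.now()): string | null {
  const entry = pending.get(token);
  if (entry === undefined) return null;
  pending.delete(token);
  return now <= entry.expiresAt ? entry.email : null;
}
```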

2023-12-02

"Valid e-mail" has nothing to do with RFCs or otherwise... It is simply: "Does a server exist which can receive an e-mail addressed in this way, and if not, could such a server be created and put on the network?" (Note: I did not specifically say the internet.)
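One automated approximation of "does a server exist which can receive it" is a DNS MX lookup on the domain part. This sketch assumes Node.js (`dns.promises.resolveMx`) and takes the resolver as a parameter so it can be exercised without a network. It treats an RFC 7505 "null MX" as "accepts no mail" and, as a simplification, ignores the RFC 5321 fallback to A/AAAA records when a domain has no MX at all. It says nothing about whether the mailbox exists; `domainHasMailServer` is a hypothetical name.

```typescript
import { promises as dns } from "node:dns";

type MxAnswer = { exchange: string; priority: number };
type MxResolver = (domain: string) => Promise<MxAnswer[]>;

// A null MX (RFC 7505) publishes the root "." as the exchange. Both "." and
// an empty string are checked rather than assuming how a resolver reports it.
function isNullMx(record: MxAnswer): boolean {
  return record.exchange === "." || record.exchange === "";
}

async function domainHasMailServer(
  domain: string,
  resolveMx: MxResolver = (d) => dns.resolveMx(d),
): Promise<boolean> {
  try {
    const records = await resolveMx(domain);
    return records.some((r) => !isNullMx(r)); // at least one real exchanger
  } catch {
    // NXDOMAIN, no MX records, timeouts, ... all count as "no" here.
    return false;
  }
}
```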

2023-12-03

As an iOS/macOS developer: I take the string and call the OS to create a URL object from it. If the result is nil, then it's not a valid URL. But that's not enough. There are absolute and relative URLs. Relative URLs need to be turned into an absolute URL by telling the OS what they are relative to. There are file system URLs; they are actually the preferred way to access files, but maybe not what you want. And there are schemes such as mailto:, tel:, ftp:, http: and https:, plus application-specific ones.

So you would have a more specific function name and more specific checks.
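The same distinctions show up with the WHATWG URL API in TypeScript. A rough sketch, with illustrative names (`resolveUrl`, `isWebPageUrl`) and an assumed `http:`/`https:` allow-list: try the input as an absolute URL, fall back to resolving it against a base, and then ask the more specific question the caller actually cares about.

```typescript
interface ResolvedUrl {
  url: URL;
  wasRelative: boolean; // true if the input only parsed against the base
}

// Parse as an absolute URL; if that fails and a base is provided, resolve the
// input relative to the base. Returns null if neither works.
function resolveUrl(input: string, base?: string): ResolvedUrl | null {
  try {
    return { url: new URL(input), wasRelative: false };
  } catch {
    // not absolute; fall through and try the base
  }
  if (base === undefined) return null;
  try {
    return { url: new URL(input, base), wasRelative: true };
  } catch {
    return null;
  }
}

// A more specific question than "is it a URL?": is it a web page address?
function isWebPageUrl(input: string, base?: string): boolean {
  const resolved = resolveUrl(input, base);
  return resolved !== null &&
    (resolved.url.protocol === "http:" || resolved.url.protocol === "https:");
}

console.log(resolveUrl("/docs", "https://example.com/a/b")?.url.href); // "https://example.com/docs"
console.log(isWebPageUrl("mailto:fred@example.com"));                // false
```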

Scarlet_Manuka · 2023-12-06

I dunno about using a REGEX for email -- the simplest thing is just if the thing includes an "@".

I do a little more than that: I look for z@z.z, where each "z" represents any string of at least one character. It's about the minimum requirement for something that we're actually going to be able to send an email to.

Looking in our database, the main benefit over just looking for an @ seems to be that we catch the ones who omit the domain part altogether (myemail@), think the domain part is just "yahoo", "gmail", "hotmail", etc., or typo it in some way (usually xxcom or xx,com instead of xx.com). Though I have to give props to "lexus of brighton" as a domain, and to the very special people who put the first part of the domain at the beginning (or the middle) of the username part instead of after the @.

Addendum 2023-12-06 05:19: No, I'm not too worried about the user@validTLD edge case. If we ever get one in our system and someone asks why that person isn't getting marketing emails, I'll happily add in an exception.
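The rule described above (some text, "@", some text, ".", some text, each part at least one character) fits in a single regular expression. This is a sketch of the description, not the commenter's actual code, and `looksSendable` is a hypothetical name. It still lets through oddities such as spaces or a second "@", and, as the addendum says, it rejects dotless domains like `user@validTLD`.

```typescript
// At least one character, "@", at least one character, a literal ".", and at
// least one more character, in that order, so the dot must follow the "@".
const MINIMUM_SENDABLE = /^.+@.+\..+$/;

function looksSendable(address: string): boolean {
  return MINIMUM_SENDABLE.test(address);
}

console.log(looksSendable("fred@example.com")); // true
console.log(looksSendable("f.red@gmail"));      // false (no dot after the "@")
```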

Input Validation is a Sure Thing
