• my name (unregistered)

    Frist? Sure

  • Officer Johnny Holzkopf (unregistered)

    It is as sure as XKCD 221 - https://xkcd.com/221/ - Are you sure? Ja, ich hab' Schuhe.

  • Abigail (unregistered)

    As someone who has written a regexp to validate the syntax of email addresses, I object to the statement that they cannot be parsed with a regex. Perl regular expressions have allowed for recursion for more than a decade. Anything for which there is a BNF grammar, like email addresses, can be turned into a regular expression almost mechanically.

  • Scragar (unregistered)

    Sad part is JS has a perfectly good URL class that includes validation on the constructor. You could just do:

        function isValidURL(potentialUrl:string): boolean {
            try {
                new URL(potentialUrl);
                return true;
            } catch(e) {
                return false;
            }
        }
    
  • Michael P (unregistered)

    In practice, one can validate email addresses using a regular expression: https://stackoverflow.com/questions/201323/how-can-i-validate-an-email-address-using-a-regular-expression

    Some people might say "but my Punycode!", which is halfway valid: IDN works (and people expect it to), but must be preprocessed before it becomes an SMTP-type email address. Less bright people might say "but RFC2822 comments!", which is not valid because they're definitionally not part of the email address.

    There was a fad some decades ago to assert, as this article does, that email addresses are not a regular language -- but I thought that myth had been dispelled.

  • Sauron (unregistered)

    Are you ${isValidUrl()}?

  • my name is missing (unregistered)

    Clearly, a brillant Paula bean solution.

  • (nodebb)

    The problem here is the uncalled-for decisiveness. A developer with a philosophical mindset would do something like this instead:

    function isValidUrl() {
        return "maybe";
    }
    
  • (nodebb)

    Sure is a great way to validate URLs!

  • Peter Smith (github) in reply to Abigail

    I dunno about using a REGEX for email -- the simplest thing is just if the thing includes an "@". Every time I crack open an RFC and then compare it to what's actually sent, I always see "bad" data sent over.

    • In Gopher, lots of servers don't correctly end lines with CR LF and ignore the "last line needs a dot"
    • In email, I've seen a major email vendor use SMTP type auth on their IMAP server.
    • I've seen more incorrect URLs than I care to remember (I was the PM for the URL parser and networking team for .NET)
    • non-ASCII domain names are far more touchy than one would like, especially when you look at enterprise name resolvers

    Fun EMAIL fact: email addresses in RFC 561 are things like "fred AT place"

  • Pauller (unregistered)

    "Are those URLs? Sure! Is this a terrible approach? Sure! Does the fact that it's been like this for years and nobody actually complained imply that they didn't need URL validation in the first place? Sure!"

    Now that's just funny. I'm still chuckling. Thumbs up!

  • (nodebb) in reply to Peter Smith

    Checking for just one @ and not in either the first or last place covers the really silly problems (you aren't going to be supporting non-SMTP naming schemes in this day and age). More than that... need to see if you can actually send an email to the mailbox and have someone pick it up and respond sensibly to it. You simply aren't going to validate all possible current user mailboxes on all email systems without doing something completely beyond regexps.
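
    The "exactly one @, not in first or last place" check described above could be sketched roughly like this. The function name is made up for illustration, and this only filters out the silly cases; it says nothing about whether the mailbox exists.

```typescript
// Minimal sketch: accept a string only if it contains exactly one "@"
// and that "@" is neither the first nor the last character.
function hasSingleInteriorAt(address: string): boolean {
  const first = address.indexOf("@");
  // No "@" at all, or more than one "@".
  if (first === -1 || first !== address.lastIndexOf("@")) {
    return false;
  }
  // The "@" must have at least one character on each side.
  return first > 0 && first < address.length - 1;
}
```

    Anything that passes still has to survive the real test, which is sending mail to it.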

  • Abigail (unregistered) in reply to Peter Smith

    https://github.com/Perl/perl5/blob/blead/t/re/reg_email.t

  • Fizzlecist (unregistered)

    I know I shouldn't, but I kinda like this for some reason

  • BaT (unregistered) in reply to Scragar

    There's even a URL.canParse(url, [base]) static method for this purpose. Although it's not supported by older JS engines.
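
    For engines that may not have it yet, a hedged sketch of feature-detecting URL.canParse and falling back to the constructor approach from the earlier comment. The wrapper name canParseUrl is an assumption, not a standard API.

```typescript
// Sketch: prefer the static URL.canParse when the engine provides it,
// otherwise fall back to try/catch around the URL constructor.
function canParseUrl(candidate: string, base?: string): boolean {
  // The cast avoids depending on type definitions that already declare canParse.
  const native = (URL as unknown as {
    canParse?: (url: string, base?: string) => boolean;
  }).canParse;
  if (typeof native === "function") {
    return native.call(URL, candidate, base);
  }
  try {
    new URL(candidate, base);
    return true;
  } catch {
    return false;
  }
}
```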

  • (nodebb) in reply to dkf

    I agree - I regularly use comments to distinguish between [email protected] and [email protected] to make sure I can see who sold my address. And most regexes are too stupid to accept this, so I am a big proponent for the "send them a mail on the given address. If they answer, it was valid. Otherwise, ignore it."

  • Anonymouse coward (unregistered)

    Surely the correct implementation is:

        return "file not found";
    }```
    
  • PluM (unregistered)

    Maybe there really was validation code there once and it kept failing, so somebody trollishly replaced it with that one day?

  • (nodebb) in reply to welcor

    "send them a mail on the given address. If they answer, it was valid. Otherwise, ignore it."

    In the end, for most(1) purposes, this is the right answer. The website operator shouldn't care about some abstract concept of what's valid, e.g. I remember seeing someone on a forum somewhere who commented on having worked for the UK's overall domain name system operator, where his email address was (I may have his given name wrong) "john@uk" - good luck getting that past any website-hosted email address "validator"...

    Instead, concentrate on what's important: can the site use the address to send an email to the person? Yes, then it's valid, otherwise complain in some way.

  • guest (unregistered) in reply to dkf

    I would even argue that if you are looking for a complicated, exact validator you are most likely doing something wrong. Either you are looking for people to enter any possible valid value, in which case you are sure to be wrong at some point and restrict a valid input, like when the first person tries to post a URL attachment with an emoji in your 2005 forum software. Or, more likely, you have actual system requirements that mean supporting the entire standard would be nonsense. Your SSO redirect URL doesn't need to accept hex-IPs served over gopher with Arabic diacritics in the query string, that is nothing more than a security vulnerability waiting to happen. Azure AD allows anything that starts with https, uses the 443 port and a domain with a dot in it or localhost - that's perfectly fine for their use case and security profile. @.* will allow any practical address that can be served over the web and will catch most typos. Keep it simple and secure.
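
    A rough sketch of a deliberately narrow redirect URL check in the spirit of the rule described above (https only, default port 443, host is localhost or contains a dot). The function name is hypothetical and this is not a reproduction of any vendor's actual validation logic.

```typescript
// Sketch of a narrow allow-rule for redirect URLs, assuming the requirements
// are: https scheme, port 443, and a host that is "localhost" or has a dot.
function isAcceptableRedirectUrl(candidate: string): boolean {
  let url: URL;
  try {
    url = new URL(candidate);
  } catch {
    return false; // not parseable as an absolute URL at all
  }
  if (url.protocol !== "https:") {
    return false;
  }
  // The URL class normalizes the default https port (443) to an empty string,
  // so any non-empty port here means a port other than 443.
  if (url.port !== "") {
    return false;
  }
  return url.hostname === "localhost" || url.hostname.includes(".");
}
```

    The point is that the rule follows from the system's own requirements, not from the full URL standard.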

  • Michael R (unregistered) in reply to Scragar

    You must be new here.

  • (nodebb) in reply to Peter Smith

    RFC 561 had to be some kind of April joke. RFC 524 clearly uses @.

  • (nodebb)

    @Steve The Cynic ref:

    The website operator shouldn't care about some abstract concept of what's valid, e.g. I remember seeing someone on a forum somewhere who commented on having worked for the UK's overall domain name system operator, where his email address was (I may have his given name wrong) "john@uk" - good luck getting that past any website-hosted email address "validator"...

    Back in about 1993 I got my own domain for my company: MyCompanyName.US. Being based in the United States that seemed a good choice at the time. So my email address became [email protected]. No special characters: just some letters, one @ and one dot. Little did I know the trouble I unleashed upon myself.

    "US" became a valid ccTLD in 1985, about 8 years before I got my domain. I was not an early adopter. But in 1993 my shiny new email address choked 30+% of website email validators. Most choked if your TLD wasn't 3 letters, or worse yet wasn't "com", "net" or "org". Ouch. Even now in 2023 I occasionally encounter websites whose email validators don't like [email protected].

    The idjits are many among us.

  • (nodebb) in reply to welcor

    I agree - I regularly use comments to distinguish between [email protected] and [email protected] to make sure I can see who sold my address.

    I use panix.com, which has a workaround: I can use any number of email addresses of the form (eg) "[email protected]", since so many places don't understand the "+" format. They all end up at [email protected], but I can still filter on "thedailywtf" in the address. I've found at least one email database leak that way.

    Disclaimer: Just a happy customer.

  • Gnasher729 (unregistered) in reply to welcor

    You also send emails for verification when someone opens an account. With a big warning “don’t do anything if you didn’t try to open an account”.

    So you have a really good chance to not only figure out that the address is valid, and that it exists, but also that it is the correct one.
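
    The "send them a mail and see if they answer" flow could be sketched roughly as below. Everything here is an assumption for illustration: sendMail and makeToken are hypothetical callbacks supplied by the caller, the store is an in-memory Map, and a real system would want cryptographically random tokens and an expiry.

```typescript
// Hypothetical mail-sending callback; not a real library API.
type SendMail = (to: string, body: string) => void;

// Sketch: an address counts as valid only once its owner returns the token
// that was mailed to it.
function createEmailVerifier(sendMail: SendMail, makeToken: () => string) {
  const pending = new Map<string, string>(); // token -> address awaiting confirmation

  return {
    requestVerification(address: string): void {
      const token = makeToken();
      pending.set(token, address);
      sendMail(
        address,
        `If you did not try to open an account, do nothing. Otherwise confirm with token: ${token}`
      );
    },
    // Returns the confirmed address, or undefined for an unknown or already used token.
    confirm(token: string): string | undefined {
      const address = pending.get(token);
      if (address !== undefined) {
        pending.delete(token);
      }
      return address;
    },
  };
}
```

    No syntax check is needed beyond whatever the mail system itself requires: an address that never confirms is simply ignored.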

  • TheCPUWizard (unregistered)

    "valid e-mail" - Has nothing to do with RFC or otherwise.... simply "Does a server exist which can receive an e-mail addressed in this way, and if not, then could such a server be created and put on the network?" [note, I did not specifically say internet]

  • (nodebb) in reply to Peter Smith

    I dunno about using a REGEX for email -- the simplest thing is just if the thing includes an "@".

    I do a little more than that - I look for z@z.z where each "z" represents any string of at least one character. It's about the minimum requirement for something that we're actually going to be able to send an email to.

    Looking in our database the main benefit over just looking for an @ seems to be that we catch the ones who omit the domain part altogether (myemail@), or think the domain part is just "yahoo", "gmail", "hotmail", etc., or typo it in some way (usually xxcom or xx,com instead of xx.com). Though I have to give props to "lexus of brighton" as a domain, and the very special people who put the first part of the domain at the beginning (or the middle) of the username part instead of after the @.

    Addendum 2023-12-06 05:19: No, I'm not too worried about the user@validTLD edge case. If we ever get one in our system and someone asks why that person isn't getting marketing emails, I'll happily add in an exception.
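
    A minimal sketch of the "something, then @, then something, then a dot, then something" check described in this comment. The function name and the exact regex are assumptions; the commenter's real implementation is not shown.

```typescript
// Sketch: at least one character, "@", at least one character, ".",
// at least one character. Deliberately loose; it only catches gross mistakes.
function looksDeliverable(address: string): boolean {
  return /^.+@.+\..+$/.test(address);
}
```

    This rejects "myemail@", "user@yahoo" and "user@xx,com", and, as the addendum says, it also rejects the user@validTLD edge case.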
