• Zatapatique (unregistered)

    This is outrageous! I'm not allowed to use my .porn email address on this site!

  • (nodebb)

    I'm extremely pleased that ".az" for Azerbaijan is on the list.

  • Hanzito (unregistered)

    I'm somewhat confused as to why an ipv4 address would qualify too.

  • (nodebb)

    Oh dear, poor john@uk is out of luck again...

  • Don Wright (unregistered)

    Which is why almost everyone validates emails by actually sending an email and asking the user to enter (or reply) with the random code found within. If thirty new ccTLDs are created overnight, poof!, they are supported already.

  • RLB (unregistered)

    There's another reason: you can, in theory, validate that an email address is technically correct, but you can't validate that it actually exists, is the right one, and isn't over its limits. You can only check that by sending an email and getting an answer back.

  • (nodebb) in reply to Don Wright

    And that's the only sensible way beyond a basic check for an @ and maybe a lexically correct domain name that is worth doing.
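
    A minimal sketch of the approach described in the comments above: a cheap shape check, then a random code that only the real mailbox owner can echo back. The in-memory `pending_codes` store and all function names are assumptions made for illustration, and actually sending the mail is left out.

```python
import secrets

# Hypothetical in-memory store: address -> code that was mailed to it.
pending_codes: dict[str, str] = {}


def passes_basic_check(address: str) -> bool:
    """Cheap sanity check: an "@" with at least one character on each side."""
    return 0 < address.rfind("@") < len(address) - 1


def start_confirmation(address: str) -> str:
    """Store a fresh random code for the address and return it for mailing."""
    if not passes_basic_check(address):
        raise ValueError("not shaped like an email address")
    code = secrets.token_urlsafe(16)
    pending_codes[address] = code
    return code  # the caller is assumed to mail this to the address


def finish_confirmation(address: str, submitted_code: str) -> bool:
    """Return True only if the submitted code matches the one stored."""
    expected = pending_codes.get(address)
    if expected is None:
        return False
    # Compare as bytes in constant time to avoid leaking partial matches.
    if not secrets.compare_digest(expected.encode(), submitted_code.encode()):
        return False
    del pending_codes[address]  # a code is good for one use only
    return True
```

    Nothing here needs a list of TLDs, so a newly created ccTLD works without a code change.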

  • Sauron (unregistered)

    How to test that the code works:

    Solution 1: Have an automated test framework test various common and uncommon cases, including tricky edge cases.

    Solution 2: The automated test framework has been broken for the last 3 years during an update, and never repaired. Have the testers test a few common cases manually.

    Solution 3: Automated test frameworks have been forbidden by the company policy "for security" for the last 10 years. The testers have all been laid off last year, because management decided testing was an unnecessary waste of time and money. Testing is now done by deploying the tickets straight to production and waiting for the users to report bugs.

  • (nodebb)

    I have always wondered what horrors would ensue if someone input a reasonable-length plain ASCII string as an email that was pure garbage and the website or app just accepted it and stored it in their database or whatever.

    I could never get the business side to articulate just what terrible thing would happen.

    But by howdy did the BA's want to validate those things.

  • Anonymous') OR 1=1; DROP TABLE wtf; -- (unregistered)

    This regex also has a slightly buggy check for IP addresses at the end:

    ...|(([0-9][0-9]?|[0-1][0-9][0-9]|[2][0-4][0-9]|[2][5][0-5])\.){3}([0-9][0-9]?|[0-1][0-9][0-9]|[2][0-4][0-9]|[2][5][0-5]))$
    

    This would allow leading zeros in each octet in some cases, so e.g. 01.023.07.089 would be accepted. (Some of these would even be accepted by inet_aton(), which would interpret them as octal! But that's a completely different WTF.)

    Of course we shouldn't be trying to validate this in the first place beyond some basic sanity checks, as Remy noted in the article & HTML comments.
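
    To see the leading-zero behaviour concretely, here is a small sketch that wraps the octet alternation quoted above in Python's `re` module. The `STRICT_OCTET` pattern is not from the article; it is one assumed way to reject leading zeros, shown only for comparison.

```python
import re

# The octet alternation from the quoted regex fragment, unchanged.
LOOSE_OCTET = r"(?:[0-9][0-9]?|[0-1][0-9][0-9]|[2][0-4][0-9]|[2][5][0-5])"
LOOSE_IPV4 = re.compile(rf"(?:{LOOSE_OCTET}\.){{3}}{LOOSE_OCTET}")

# Assumed alternative: 0-255 with no leading zeros allowed.
STRICT_OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])"
STRICT_IPV4 = re.compile(rf"(?:{STRICT_OCTET}\.){{3}}{STRICT_OCTET}")


def loose_accepts(text: str) -> bool:
    """True if the original alternation matches the whole string."""
    return LOOSE_IPV4.fullmatch(text) is not None


def strict_accepts(text: str) -> bool:
    """True if the no-leading-zeros variant matches the whole string."""
    return STRICT_IPV4.fullmatch(text) is not None
```

    With these definitions, `loose_accepts("01.023.07.089")` is true while `strict_accepts("01.023.07.089")` is false; both reject `256.1.1.1`.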

  • wiseguy (unregistered)

    But no validation is going to solve the actual problem: users typoing their email or entering a wrong one.

  • (nodebb)

    I do not understand why people object to the use of a simple regex to validate an email. The only way to completely validate the email is to wait for a reply and the regex, although not complete, is very simple and reduces bad user input.

  • (author) in reply to Rick

    A simple regex is the key word, there. And there are some very simple ones that cover 90% of cases. But you can go even simpler than a regex: has it got exactly 1 "@" in it?

    The reality is that the kinds of errors that can be made when entering an email address aren't going to get picked up by any realistic regex. You're catching the worst-case "they put their human name in, instead of their email address," which is worth doing, sure. But no regex is going to tell the difference between [email protected] and [email protected].

  • EmptyJay (unregistered) in reply to Mr. TA

    As is .al for Albania, which I use for my personal domain.

  • ZZartin (unregistered)

    If your custom email validation is more complicated than <some string>@<some string>.<some string> you're doing it wrong.

  • TS (unregistered)

    The current vogue among email collectors seems to be doing a DNS lookup to make sure the domain exists. Which means instead of getting [email protected], they get [email protected].

  • (nodebb) in reply to Remy Porter

    But you can go even simpler than a regex: has it got exactly 1 "@" in it?

    I'm pretty sure a spec-compliant email address can technically have more than one "@". Any extras have to be in the local part, and the local part has to be in quotes.

    ...Now, whether you will ever encounter a legitimate address like that in the wild that actually routes to a mailbox...

    Addendum 2024-04-15 10:52: Note: I acknowledge that this site of all places likely has a relatively high incidence of readers (and writers) who Already Knew That
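
    A short sketch of why counting "@" characters is stricter than it looks: an address with a quoted local part fails an "exactly one" test, while splitting on the last "@" still separates it sensibly. The function names are hypothetical, and this is not a parser; a comment containing "@" after the domain could still fool it.

```python
def has_exactly_one_at(address: str) -> bool:
    """The 'exactly 1 @' rule, taken literally."""
    return address.count("@") == 1


def split_on_last_at(address: str) -> tuple[str, str]:
    """Split into (local part, domain) at the last "@" in the string."""
    local, at, domain = address.rpartition("@")
    if not (local and at and domain):
        raise ValueError("need text on both sides of an '@'")
    return local, domain
```

    For example, `has_exactly_one_at('"@"@example.com')` is false, yet `split_on_last_at('"@"@example.com')` returns the quoted local part `"@"` and the domain `example.com`.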

  • Sou Eu (unregistered)

    When I was in college, the internet was still fairly new. I was told to do quick validation on the client side, but the server side should always validate (in case the browser disabled JavaScript, for example). For this reason, validating both server-side and client-side isn't a WTF in my opinion.

    The REGEX used to validate an email is bad and means the code must be updated (in two places) every time a new TLD is approved or whitelisted for this site.

    In modern HTML, I would use the email input type and let the browser validate it client-side; the server should still validate, but use a standard library or something.

  • Mailbox part is incorrect (unregistered)

    Not to mention that the mailbox part of the validator (the part of the address before the @) is incorrect. Even without dealing with non-US characters, the "+" is a valid char, yet it is not allowed by this bad regex.

  • (nodebb)

    By the way, the correct regex for email addresses is "@". Yep, nearly everything is allowed; you can have localhost and any other host name as the endpoint, and pretty much any character before it.

    And yes, the correct way to validate email addresses is sending a mail with a confirmation link. Everything else is pointless.

    Addendum 2024-04-15 11:55: For example this is a correct email address:

    "@"@[IPv6:2001:db8::1]

    Addendum 2024-04-15 12:18: This one is also a valid mail address:

    ß@_imap

  • (nodebb) in reply to Remy Porter

    There is a specification for email addresses. Either use it, or be extremely careful to ensure you do not exclude valid addresses. Far too much validation is done by code written by programmers who have seen a bunch of examples of something, figured out their own idea of what the rules are, and implemented their own wrong rules. In the case of email addresses, if you can safely assume they are non-local, then your rule should be that there is at least one @. If you need a better guarantee (e.g. so that you can email someone an important thing later) you must test actual operation.

    @Anonymous... Leading zeros are perfectly legitimate in IP addresses - see RFC790 for a whole table of them. The idea of using it to denote octal, however, is a bug and an immediately obvious one to anyone looking at the table of assigned network numbers. This is similar to my previous point: implementation should be by reading the specification and then faithfully following it - not making stuff up because you've seen many examples and had a guess at the underlying rule.

  • Anonymous') OR 1=1; DROP TABLE wtf; -- (unregistered) in reply to Charles-2

    The dotted quad text representation of IP addresses was never fully specified in any RFC. RFC790 and its successors that obsolete it give some examples with leading zeroes, sure, but they don't define the format.

    https://stackoverflow.com/questions/25543126/is-192-056-2-01-a-valid-representation-of-an-v4-ip goes into more detail about the history here. We can thank 4.2BSD for introducing octal/hexadecimal IP address parsing in its inet_aton() implementation, and now we're stuck with that in every major OS for compatibility. Whether or not they're officially legal, they should be avoided.

    In any case though, as was mentioned in the original article and in many comments, we shouldn't attempt to validate the email address by regex, whether containing a real domain name or an IP address. Just a basic sanity check that it contains an @, and then send an email to see if it gets accepted or rejected.
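
    To illustrate the octal/hexadecimal convention mentioned above, here is a simplified pure-Python sketch of C-style base detection per component. It is an illustration under stated assumptions, not the real inet_aton(): the function names are hypothetical, and the sketch insists on exactly four parts, whereas inet_aton() also accepts shorter forms.

```python
import string

# Digits allowed for each base the prefix convention can select.
ALLOWED_DIGITS = {16: string.hexdigits, 8: string.octdigits, 10: string.digits}


def parse_component_c_style(text: str) -> int:
    """Parse one component: '0x' prefix means hex, leading '0' means octal."""
    if text[:2].lower() == "0x":
        base, digits = 16, text[2:]
    elif len(text) > 1 and text[0] == "0":
        base, digits = 8, text[1:]
    else:
        base, digits = 10, text
    if not digits or any(ch not in ALLOWED_DIGITS[base] for ch in digits):
        raise ValueError(f"bad component: {text!r}")
    value = int(digits, base)
    if value > 255:
        raise ValueError(f"component out of range: {text!r}")
    return value


def parse_dotted_quad_c_style(text: str) -> tuple[int, ...]:
    """Parse 'a.b.c.d' with the prefix rules above (four parts only)."""
    parts = text.split(".")
    if len(parts) != 4:
        raise ValueError("this sketch only handles four components")
    return tuple(parse_component_c_style(part) for part in parts)
```

    Under these rules `01.023.07.077` parses as 1.19.7.63, while `01.023.07.089` is rejected because 8 and 9 are not octal digits.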

  • RLB (unregistered) in reply to Sou Eu

    For this reason, validating both server-side and client-side isn't seen as a WTF in my opinion.

    Definitely not a WTF. You need both, and mostly for different reasons. Client-side to save the user from his own typos, server-side to save your database from malicious users.

  • (nodebb)

    When I found out email addresses are allowed to contain comments I lost all hope

    john(hello @(world!) )@example.com and john@(my)example.com both mean [email protected]

