Email Validation Validity

  • Someone 2011-04-27 10:02
  • NYCNetworker 2011-04-27 10:10
    hmmm

    guess I'm gonna change my email address to

    hello?????@???.??

  • XXXXX 2011-04-27 10:14
    So what you're saying is, that I should return "CHAIN"?

    I suppose maintaining this code is better than getting stuck on a ^([l]{2,60})([@])([A-Za-z0-9\.|-|_]{1,60})(\.)([A-Za-z]{2,5})$ gang
  • derula 2011-04-27 10:15
    NYCNetworker:
    hmmm

    guess I'm gonna change my email address to

    hello?????@???.??


    Why not simply ".@ "?
  • Splognosticus 2011-04-27 10:15
    (?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])


    So easy a child could do it.
  • operagost 2011-04-27 10:15
    OK, maybe those of us who don't administer email systems haven't heard of plus-addressing. But not ever encountering a valid email address with a hyphen in it? Or more than two periods?
  • RobY 2011-04-27 10:20
    Not sure about before the @, but most school systems in the area I work all follow the common pattern of name@xxx.k12.in.us .
  • jnewton 2011-04-27 10:24
    RobY:
    Not sure about before the @, but most school systems in the area I work all follow the common pattern of name@xxx.k12.in.us .


    <firstName>.<lastName>@<domain>.com is pretty popular as well
  • Bryan the K 2011-04-27 10:26
    The WTF is that he used REGEX, right?
  • alnite 2011-04-27 10:27
    looks like somebody did not know what a regex is.
  • kongr45gpen 2011-04-27 10:29
    This is the most correct way of proving if an e-mail address is valid or not:


    '([^\\x00-\\x20\\x22\\x28\\x29\\x2c\\x2e\\x3a-\\x3c'.
    '\\x3e\\x40\\x5b-\\x5d\\x7f-\\xff]+|\\x22([^\\x0d'.
    '\\x22\\x5c\\x80-\\xff]|\\x5c[\\x00-\\x7f])*\\x22)'.
    '(\\x2e([^\\x00-\\x20\\x22\\x28\\x29\\x2c\\x2e'.
    '\\x3a-\\x3c\\x3e\\x40\\x5b-\\x5d\\x7f-\\xff]+|'.
    '\\x22([^\\x0d\\x22\\x5c\\x80-\\xff]|\\x5c\\x00'.
    '-\\x7f)*\\x22))*\\x40([^\\x00-\\x20\\x22\\x28'.
    '\\x29\\x2c\\x2e\\x3a-\\x3c\\x3e\\x40\\x5b-\\x5d'.
    '\\x7f-\\xff]+|\\x5b([^\\x0d\\x5b-\\x5d\\x80-\\xff'.
    ']|\\x5c[\\x00-\\x7f])*\\x5d)(\\x2e([^\\x00-\\x20'.
    '\\x22\\x28\\x29\\x2c\\x2e\\x3a-\\x3c\\x3e\\x40'.
    '\\x5b-\\x5d\\x7f-\\xff]+|\\x5b([^\\x0d\\x5b-'.
    '\\x5d\\x80-\\xff]|\\x5c[\\x00-\\x7f])*\\x5d))*'
    (http://www.iamcal.com/publish/articles/php/parsing_email/)
  • Anon 2011-04-27 10:32
    When I was in grad school I had an e-mail address with a + in it. I often found websites that would barf on accepting that as a valid e-mail address.
  • Paul 2011-04-27 10:34
    The easiest, and most correct way to validate an email address is to send an email to it.

    The most incorrect way is to use some trivial little regex written by someone who hasn't even heard of RFC822, and just intuits what they think an email address might be.

    I have never seen a regex in the wild that correctly validates email addresses.

    I would much rather read through and debug the isValidEmailAddress version in the post than a regular expression complex enough to be comprehensive.
  • dtobias 2011-04-27 10:36
    I've run into sites that refuse to accept my perfectly valid .name and .info addresses because they think that TLDs shouldn't be more than three letters.
  • wow 2011-04-27 10:38
    Paul:
    The easiest, and most correct way to validate an email address is to send an email to it.


    Wow. Just, well, wow. This is possibly the most amazingly dumb thing I have seen posted in a very long time, for more reasons than I can count.
  • Jeff 2011-04-27 10:39
    Well, yes, sending a test email (or nuking from orbit) is the only way to be sure. But short of that, I think you could just send an XML message to a web service somewhere and let it all be Somebody Else's Problem.
  • Bill G. 2011-04-27 10:40
    dtobias:
    I've run into sites that refuse to accept my perfectly valid .name and .info addresses because they think that TLDs shouldn't be more than three letters.
    Back when I invented the internet, 3 letters was enough for anybody.
  • Someone 2011-04-27 10:41
    A surprising number of websites also barf on firstname@lastname.name style addresses. I mean, .name has only been around 10 years, so I can see how they haven't had time to adjust to non 3-character TLDs. Lord knows that if it doesn't conform to [a-z]+@[a-z]+\.[a-z]{3} then it must be invalid!
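
    To make the failure concrete, here is a small Python sketch of how that pattern behaves. The sample addresses and the helper name naive_is_valid are made up for illustration; only the pattern itself comes from the comment above.

```python
import re

# The restrictive pattern quoted above: lowercase letters only, and
# exactly three letters after the single dot in the domain.
NAIVE_PATTERN = re.compile(r"[a-z]+@[a-z]+\.[a-z]{3}")


def naive_is_valid(address: str) -> bool:
    """Return True only if the whole address matches the naive pattern."""
    return NAIVE_PATTERN.fullmatch(address) is not None


# Made-up sample addresses (hypothetical, for illustration only).
samples = [
    "someone@example.com",
    "firstname@lastname.name",
    "user+tag@example.com",
    "first.last@example.info",
]
results = {address: naive_is_valid(address) for address in samples}
```

    Only the first sample gets through. The .name address, the plus-addressed one and the dotted one are all rejected.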
  • PedanticCurmudgeon 2011-04-27 10:42
    NYCNetworker:
    hmmm

    guess I'm gonna change my email address to

    hello?????@???.??

    A polite troll leaves a good hint for his target. With that in mind, this email would be a better choice:

    u.fail@regex
  • Zolcos 2011-04-27 10:43
    Email validation by regex is practically a canonical example of some problems just not having a good solution that everyone can agree on.
    If you validate down to the letter of the RFCs, you will accept addresses that are technically correct but possibly unreachable, like those based on internal subdomains and not fully qualified.
    Even once you get past the hurdle of defining exactly what constitutes validity there are still many weird gotchas and it is hard to guarantee you've considered them all.

    PHP to the rescue, I guess. Every standard library needs an equivalent to:
    filter_var($x, FILTER_VALIDATE_EMAIL)
  • Pat L 2011-04-27 10:45
    wow:
    Paul:
    The easiest, and most correct way to validate an email address is to send an email to it.


    Wow. Just, well, wow. This is possibly the most amazingly dumb thing I have seen posted in a very long time, for more reasons than I can count.


    Rare is the situation where you only care if an email address is theoretically valid, and not whether it actually exists and can be sent mail.
  • Justin 2011-04-27 10:46
    So I guess I'm safe with my aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa@bbb.cc?
  • HellKarnassus 2011-04-27 10:46
    wow:
    Paul:
    The easiest, and most correct way to validate an email address is to send an email to it.


    Wow. Just, well, wow. This is possibly the most amazingly dumb thing I have seen posted in a very long time, for more reasons than I can count.

    Why? I think it is a useful idea, you ask the user to input an e-mail address and to confirm it, then you send an activation/validation mail to finish whatever process the app must do. It is less annoying than a script or method that tells you the address is incorrect even though it does exist.
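
    A rough Python sketch of the bookkeeping behind such an activation mail, assuming an in-memory store. The names pending, start_confirmation and finish_confirmation are hypothetical, and actually sending the mail is left out entirely.

```python
import secrets
from typing import Dict, Optional

# Hypothetical in-memory store mapping one-time tokens to addresses.
# A real application would persist this and expire old entries.
pending: Dict[str, str] = {}


def start_confirmation(address: str) -> str:
    """Create a one-time token to embed in the activation link."""
    token = secrets.token_urlsafe(16)
    pending[token] = address
    return token


def finish_confirmation(token: str) -> Optional[str]:
    """Return the confirmed address, or None for an unknown or reused token."""
    return pending.pop(token, None)
```

    The address is never inspected here. It counts as confirmed only once someone who can read that mailbox hands the token back.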
  • Power Troll 2011-04-27 10:49
    Ah, OK. TRWTF is that the first example isn't really appropriate for Java. Here's a more OO version.


    import java.util.StringTokenizer;

    // IllegalEmailAddressException is assumed to be defined elsewhere (not shown in the post).
    public final class EmailAddress {
        private final String address;

        public EmailAddress (String address) throws IllegalEmailAddressException {

            // a null string is invalid
            if (address == null)
                throw new IllegalEmailAddressException("This is an invalid email address.");

            // a string without a "@" is an invalid email address
            if (address.indexOf("@") < 0)
                throw new IllegalEmailAddressException("This is an invalid email address.");

            // a string without a "." is an invalid email address
            if (address.indexOf(".") < 0)
                throw new IllegalEmailAddressException("This is an invalid email address.");

            if (lastEmailFieldTwoCharsOrMore(address) == false)
                throw new IllegalEmailAddressException("This is an invalid email address.");

            if (address.indexOf("!") > 0)
                throw new IllegalEmailAddressException("This is an invalid email address.");

            if (address.indexOf("#") > 0)
                throw new IllegalEmailAddressException("This is an invalid email address.");

            if (address.indexOf("$") > 0)
                throw new IllegalEmailAddressException("This is an invalid email address.");

            if (address.indexOf("%") > 0)
                throw new IllegalEmailAddressException("This is an invalid email address.");

            if (address.indexOf("&") > 0)
                throw new IllegalEmailAddressException("This is an invalid email address.");

            if (address.indexOf("*") > 0)
                throw new IllegalEmailAddressException("This is an invalid email address.");

            if (address.indexOf("+") > 0)
                throw new IllegalEmailAddressException("This is an invalid email address.");

            if (address.indexOf("-") > 0)
                throw new IllegalEmailAddressException("This is an invalid email address.");

            if (address.indexOf("~") > 0)
                throw new IllegalEmailAddressException("This is an invalid email address.");

            if (address.indexOf("ä") > 0)
                throw new IllegalEmailAddressException("This is an invalid email address.");

            if (address.indexOf("ö") > 0)
                throw new IllegalEmailAddressException("This is an invalid email address.");

            if (address.indexOf("å") > 0)
                throw new IllegalEmailAddressException("This is an invalid email address.");

            if (address.indexOf(";") > 0)
                throw new IllegalEmailAddressException("This is an invalid email address.");


            this.address = address;

        }

        private static final boolean lastEmailFieldTwoCharsOrMore(String emailAddress) {
            if (emailAddress == null)
                return false;
            StringTokenizer st = new StringTokenizer(emailAddress, ".");
            String lastToken = null;
            while (st.hasMoreTokens()) {
                lastToken = st.nextToken();
            }

            if (lastToken.length() >= 2) {
                return true;
            } else {
                return false;
            }
        }

        public String getAddress() {
            return this.address;
        }
    }
  • SeySayux 2011-04-27 10:49
    Actually, indexOf usually is a lot faster than regexes. This'd be my first stab at it, without regexes:


    bool isValidMail(String s) {
    return ((sidx_t i = s.indexOf('@')) != -1 &&
    s.indexOf('.',i) != -1 &&
    s.lastIndexOf('.') inr(s.length-1,s.length-3) &&
    [&s](){
    for(uchar c : s) {
    if(!(isalpha(c) || isnum(c) ||
    c ina({'@','.','+'})) return false;
    }
    return true;
    }());
    }


    Of course, this one wouldn't prevent addresses such as foo.@.stuff.bar, but hey, it'd make a great T-shirt!(*) :P

    - SeySayux

    (*) Not that that regex wouldn't.
  • XXXXX 2011-04-27 10:54
    Paul:
    The easiest, and most correct way to validate an email address is to send an email to it.

    The most incorrect way is to use some trivial little regex written by someone who hasn't even heard of RFC822, and just intuits what they think an email address might be.

    I have never seen a regex in the wild that correctly validates email addresses.

    I would much rather read through and debug the isValidEmailAddress version in the post than a regular expression complex enough to be comprehensive.


    I hope no one is still writing code against RFC 822. It was superseded 10 years ago by 2822.
  • Paul 2011-04-27 10:58
    Try and enumerate a few, please.

    If you are actually writing the actual email-sending part of an actual email-sending application, then I agree, you can't send an email to validate the address.

    If you are doing anything else, then the only reliable (and DRY) way to validate an email address is to try and use it as an email address.

    Why reinvent the wheel by doing some other stupid "validation" beforehand? You make the wheel less and less round every time you reinvent it. What if (as is normally the case) your validation incorrectly discards good addresses that your server can cope with? What if your validation is correct, but your email server is wrong? What if your validation deliberately matches your email server's faulty validation, but someone else fixes the server, leaving you with a pointlessly faulty validation?

    Even if you do get your syntactic validation correct, (which appears highly unlikely, given the prevalence of incorrect validation in the world); all you know is that it looks like an email address, you have no idea whether it is an email address (i.e. one that resolves to a mailbox)
  • frits 2011-04-27 11:01
    For those who don't already know, here's one way to verify whether an email address exists: http://www.labnol.org/software/verify-email-address/18220/.
    All this nonsense about input validation and sending actual messages isn't much better than the code in the article.
  • Paul 2011-04-27 11:02
    XXXXX:

    I hope no one is still writing code against RFC 822. It was superseded 10 years ago by 2822.


    Whoops, good point.
  • riiiiiiight 2011-04-27 11:02
    Justin:
    So I guess I'm safe with my spaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaace@spa.ce?


    FTFY
  • SG_01 2011-04-27 11:04
    dtobias:
    I've run into sites that refuse to accept my perfectly valid .name and .info addresses because they think that TLDs shouldn't be more than three letters.


    I think you need to try with the .museum and .travel TLDs then :D
  • grzlbrmft 2011-04-27 11:05
    wow:
    Paul:
    The easiest, and most correct way to validate an email address is to send an email to it.


    Wow. Just, well, wow. This is possibly the most amazingly dumb thing I have seen posted in a very long time, for more reasons than I can count.


    In case you can count that far, name three, please.
  • dude 2011-04-27 11:14
    frits:
    For those who don't already know, here's one way to verify whether an email address exists: <snip>.
    All this nonsense about input validation and sending actual messages isn't much better than the code in the article.


    The only difference between sending an email and what was done in the article is actually sending data to be delivered. It's a lot more work than just sending an email, and you have no way of verifying that the user actually got the email because you didn't send one...
  • Erik 2011-04-27 11:18
    Hmm, in the past I've had email addresses with ! and % in them.

    I've got domains with - in them. I use + all the time (and still see breakage).

    However, I was once <user>@<tld> . That one was fun. :)
  • jngeist@gmail.com 2011-04-27 11:20

    else {
    return "C-C-C-COMBO BREAKER!";
    }
  • N. Tufnel 2011-04-27 11:23
    Paul:
    The easiest, and most correct way to validate an email address is to send an email to it.


    In what sense does that validate anything, if you don't have access to that address's mailbox?
  • Zylon 2011-04-27 11:24
    wow:
    Paul:
    The easiest, and most correct way to validate an email address is to send an email to it.


    Wow. Just, well, wow. This is possibly the most amazingly dumb thing I have seen posted in a very long time, for more reasons than I can count.

    I was going to say this, but about Alex's regex comment. As others have pointed out, checking all possible valid email addresses using a regex is borderline impossible.

    http://www.regular-expressions.info/email.html

    Also fuck you Akismet, this isn't spam.
  • frits 2011-04-27 11:27
    dude:
    frits:
    For those who don't already know, here's one way to verify whether an email address exists: <snip>.
    All this nonsense about input validation and sending actual messages isn't much better than the code in the article.


    The only difference between sending an email and what was done in the article is actually sending data to be delivered. It's a lot more work than just sending an email, and you have no way of verifying that the user actually got the email because you didn't send one...


    How are you verifying again?
  • Paul 2011-04-27 11:39
    N. Tufnel:
    Paul:
    The easiest, and most correct way to validate an email address is to send an email to it.


    In what sense does that validate anything, if you don't have access to that address's mailbox?


    It validates that whatever you are using to actually send mail can parse the address.

    It validates that the address can be used to reach a mail server.

    Depending on the setup of the receiving server, it validates that mail can be sent to the specified mailbox (rather than returning an error). If there is a catch-all, then this doesn't work.

    As others have mentioned, sending a validation message ensures that the address entered reaches a mailbox that the intended recipient can read.
  • Andrew 2011-04-27 11:44
    It seems that 90% of sites restrict valid emails. I usually use something like this:

    .*[^ ]+.*@[^ ]+\.[^ ]
  • jonnyq 2011-04-27 11:47
    N. Tufnel:
    Paul:
    The easiest, and most correct way to validate an email address is to send an email to it.


    In what sense does that validate anything, if you don't have access to that address's mailbox?


    One would hope that the person providing the email address would have access to it for confirmation...

    I mean seriously... two people this dumb?

    In my opinion, if you're just trying to prevent typing mistakes: anything, followed by an @, followed by anything, followed by a ., followed by anything. That's all I ever use. If you're trying to validate that an email address is real - send a confirmation email. Anything in between is prone to mistakes and probably isn't helping anyone.
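
    As a sketch, that "anything @ anything . anything" check is a one-liner in Python. The name looks_like_email is made up for illustration, and the pattern is deliberately permissive.

```python
import re

# "anything, an @, anything, a dot, anything": catches typos, nothing more.
LOOSE_PATTERN = re.compile(r".+@.+\..+")


def looks_like_email(text: str) -> bool:
    """Return True if text has the rough shape something@something.something."""
    return LOOSE_PATTERN.fullmatch(text) is not None
```

    This happily accepts oddities like foo.@.stuff.bar, which is the point: it only guards against obvious slips, and the confirmation email does the real work.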
  • Jaime 2011-04-27 11:48
    Paul:
    N. Tufnel:
    Paul:
    The easiest, and most correct way to validate an email address is to send an email to it.


    In what sense does that validate anything, if you don't have access to that address's mailbox?


    It validates that whatever you are using to actually send mail can parse the address.

    It validates that the address can be used to reach a mail server.

    Depending on the setup of the receiving server, it validates that mail can be sent to the specified mailbox (rather than returning an error). If there is a catch-all, then this doesn't work.

    As others have mentioned, sending a validation message ensures that the address entered reaches a mailbox that the intended recipient can read.
    nobody@mailinator.com

    Mailinator's whole purpose in life is to accept mail and junk it after a very short time. EMail address validation is stupid as a general idea, but makes sense if you have a specific purpose for doing so. Examples:

    If you need to validate that the email address belongs to the user -- Send a one-off email that the user must respond to.

    If you are trying to help a user enter data on a form -- do any half-assed validation and throw up a kindly worded warning if it doesn't match. Allow the user to continue on validation failure.

    Notice in both cases that it is not necessary to have a robust validation procedure. I can't think of a reason, other than while writing mail router software, to strictly validate the format of an email address.
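
    The "warn but let the user continue" case could look roughly like this in Python. FieldResult and check_email_field are hypothetical names, and the plausibility test is intentionally crude.

```python
from typing import NamedTuple, Optional


class FieldResult(NamedTuple):
    value: str               # what gets stored, whether or not we warned
    warning: Optional[str]   # kindly worded hint, or None if it looks fine


def check_email_field(raw: str) -> FieldResult:
    """Never reject the input; at most attach a warning to it."""
    value = raw.strip()
    local, at, domain = value.rpartition("@")
    plausible = bool(local) and bool(at) and "." in domain
    warning = None if plausible else (
        "That doesn't look like an email address. Please double-check it."
    )
    return FieldResult(value, warning)
```

    An address such as user@localhost triggers the warning but is still returned unchanged, so a correct but unusual address never blocks the form.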
  • David Martensson 2011-04-27 11:48
    This one is actually better than most I have seen.

    I have actually tried to build a test that was 100% RFC compliant and the result was horrifyingly large, and we still found existing, working emails that were denied ;)

    We finally went with a commercial component that so far has correctly handled every address we feed it.

    For any lesser ambition, just check the very simplest format and if you need more, use a verification where you actually send an email to the address and require them to click a link.

    With the new Asian letters coming I would not even try to build my own again =)
  • hoodaticus 2011-04-27 11:53
    Paul:
    The easiest, and most correct way to validate an email address is to send an email to it.

    The most incorrect way is to use some trivial little regex written by someone who hasn't even heard of RFC822, and just intuits what they think an email address might be.

    I have never seen a regex in the wild that correctly validates email addresses.

    I would much rather read through and debug the isValidEmailAddress version in the post than a regular expression complex enough to be comprehensive.
    I agree 100%. The best way is to contact the purported email server and see if the account is on there.

    Of course, you still have to parse out the server from the email address.
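
    A minimal Python sketch of that parsing step, assuming it is enough to split at the last @ (a quoted local part may itself contain one). The name split_address is hypothetical, and no lookup or contact with the server is attempted here.

```python
from typing import Tuple


def split_address(address: str) -> Tuple[str, str]:
    """Split an address into (local part, domain) at the last '@'."""
    local, at, domain = address.rpartition("@")
    if not at or not local or not domain:
        raise ValueError(f"cannot split {address!r} into local part and domain")
    return local, domain
```

    For example, split_address("user@example.com") gives ("user", "example.com"), and the domain is what you would then try to reach.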
  • evilspoons 2011-04-27 12:07
    (I had to add a bunch of spaces to this to make the damn spam filter let me post, argh.)

    Gmail lets you arbitrarily use periods in your email address:

    bob dot frank at gmail dot com is the same inbox as bobfrank at gmail.com and b dot o dot b dot f dot r dot a dot n dot k at gmail dot com.

    Gmail also lets you use plus signs to assign labels to incoming messages - bob dot frank + noodles at gmail dot com will assign the message the label 'noodles' in your gmail inbox.

    I use it all the time for spam filtering (myaddress + nameofoffendingsite at gmail dot com) but I have run into many, many instances where it thinks my plus sign invalidates the address. Sigh.

    In the mid 90s I had an email address from a free webmail service that allowed you to choose your domain name. I think the site was "My Own Email" or something.

    I signed up with my.name at imatrekkie dot com (yeah yeah) and used it for a few months, signed up for a lot of newsletters. Then I had it forward all my messages to my new-fangled pre-Microsoft Hotmail account.

    Then My Own Email had a major overhaul. Their new software didn't accept email addresses with periods in them. My address was still valid and receiving newsletters and forwarding them to my Hotmail account... but I couldn't log in.

    My old email then got on a spam list and my Hotmail account was absolutely overwhelmed with forwarded crap... and I had no way to log in to delete my account, turn off forwarding, or anything. It was pretty special.
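
    For what it's worth, the Gmail dot and plus behaviour described above can be sketched in a few lines of Python. The function name gmail_canonical is made up, and it only encodes the two rules mentioned in this comment, nothing else about how Gmail treats addresses.

```python
def gmail_canonical(address: str) -> str:
    """Map a gmail.com address to the inbox it lands in, per the rules above."""
    local, at, domain = address.rpartition("@")
    if not at or domain.lower() != "gmail.com":
        return address  # leave other providers untouched
    # Drop the "+label" suffix, then the periods, which Gmail ignores.
    local = local.split("+", 1)[0].replace(".", "")
    return f"{local}@{domain}"
```

    So a validator that rejects the plus sign is rejecting a perfectly deliverable address.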
  • hoodaticus 2011-04-27 12:09
    N. Tufnel:
    Paul:
    The easiest, and most correct way to validate an email address is to send an email to it.


    In what sense does that validate anything, if you don't have access to that address's mailbox?
    You don't know much about SMTP, do you?
  • coop 2011-04-27 12:12
    This assumes the e-mail address is accessible on the internet, or is at least on the same network. Consider the case of networks that are completely disconnected from each other...
  • socknet 2011-04-27 12:22
    jonnyq:

    One would hope that the person providing the email address would have access to it for confirmation...

    I mean seriously... two people this dumb?

    In my opinion, if you're just trying to prevent typing mistakes: anything, followed by an @, followed by anything, followed by a ., followed by anything. That's all I ever use. If you're trying to validate that an email address is real - send a confirmation email. Anything in between is prone to mistakes and probably isn't helping anyone.


    Please provide contact details for your legal representation:

    Name: _____
    Phone: _____
    Email: ______


    Please provide contact details for your IT Support team:

    Phone: _____
    Email: ______


    Please provide your preferred email address to be created:

    Email: ______


    etc.

    There are MANY cases where you might ask someone to provide an email address which they may not have access to.

    Sending real emails to these addresses is pretty silly. 99.9% of the time, the simple validation will suffice. Whether or not you care about that missing 0.1%, and how you do so, would be completely dependent on context.
  • Validate This 2011-04-27 12:31
    socknet:
    jonnyq:

    One would hope that the person providing the email address would have access to it for confirmation...

    I mean seriously... two people this dumb?

    In my opinion, if you're just trying to prevent typing mistakes: anything, followed by an @, followed by anything, followed by a ., followed by anything. That's all I ever use. If you're trying to validate that an email address is real - send a confirmation email. Anything in between is prone to mistakes and probably isn't helping anyone.


    Please provide contact details for your legal representation:

    Name: _____
    Phone: _____
    Email: ______


    Please provide contact details for your IT Support team:

    Phone: _____
    Email: ______


    Please provide your preferred email address to be created:

    Email: ______


    etc.

    There are MANY cases where you might ask someone to provide an email address which they may not have access to.

    Sending real emails to these addresses is pretty silly. 99.9% of the time, the simple validation will suffice. Whether or not you care about that missing 0.1%, and how you do so, would be completely dependent on context.


    So how are you validating that they're entering the correct name and phone number? Or are you assuming that any (insert country specific format) sequence of digits is _the_ correct phone number for that person?

    In each of those cases the difference between entering a validly formatted but incorrect e-mail address and an invalidly formatted e-mail address is zilch. You still have garbage data either way.
  • Mason Wheeler 2011-04-27 12:37
    All I can say is, these are both quite a bit less ugly than PERL's version: http://www.ex-parrot.com/~pdw/Mail-RFC822-Address.html

  • Yazeran 2011-04-27 12:43
    Splognosticus:
    (?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])


    So easy a child could do it.


    Fail (according to the perl module Mail::RFC822::Address: regexp-based address validation):

    (?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t]
    )+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:
    \r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(
    ?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[
    \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\0
    31]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\
    ](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+
    (?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:
    (?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z
    |(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)
    ?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\
    r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[
    \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)
    ?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t]
    )*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[
    \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*
    )(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t]
    )+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)
    *:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+
    |\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r
    \n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:
    \r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t
    ]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031
    ]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](
    ?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?
    :(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?
    :\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?
    :(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?
    [ \t]))*"(?:(?:\r\n)?[ \t])*)*:(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\]
    \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|
    \\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>
    @,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"
    (?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t]
    )*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\
    ".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?
    :[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[
    \]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-
    \031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(
    ?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;
    :\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([
    ^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\"
    .\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\
    ]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\
    [\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\
    r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\]
    \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]
    |\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \0
    00-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\
    .|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,
    ;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?
    :[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*
    (?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".
    \[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[
    ^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]
    ]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)(?:,\s*(
    ?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\
    ".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(
    ?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[
    \["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t
    ])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t
    ])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?
    :\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|
    \Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:
    [^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\
    ]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n)
    ?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["
    ()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)
    ?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>
    @,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[
    \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,
    ;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t]
    )*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\
    ".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?
    (?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".
    \[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:
    \r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\[
    "()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])
    *))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])
    +|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\
    .(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z
    |(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(
    ?:\r\n)?[ \t])*))*)?;\s*)

    And even that is only with comments removed......
  • Rick 2011-04-27 12:59
    Paul:
    Try and enumerate a few, please.
    ...

    Why reinvent the wheel by doing some other stupid "validation" beforehand?

    ...

    In case you really don't know the answer, algorithmic validation can return a response in milliseconds without a context switch for the user. Full validation takes minutes.
  • redundantman 2011-04-27 13:01
    The better way is to pop up not one, but TWO confirmation dialogs.

    #1 Are you sure that "hugmy@bum.com" is your e-mail address?

    #2 Are you REEAALLY sure?

    And then the user will be all like "zOMG I can't believe I typed it wrong" and they'll totally fix it.

    /true story
  • trtrwtf 2011-04-27 13:02
    jonnyq:
    N. Tufnel:
    Paul:
    The easiest, and most correct way to validate an email address is to send an email to it.


    In what sense does that validate anything, if you don't have access to that address's mailbox?


    One would hope that the person providing the email address would have access to it for confirmation...

    I mean seriously... two people this dumb?

    In my opinion, if you're just trying to prevent typing mistakes: anything, followed by an @, followed by anything, followed by a ., followed by anything. That's all I ever use. If you're trying to validate that an email address is real - send a confirmation email. Anything in between is prone to mistakes and probably isn't helping anyone.


    I think there's some confusion about the meaning of "validate" here. Some people seem to think it means "make sure this email address is really the one that belongs to this person", others believe (correctly) that it means "make sure that this email address is a well-formed address".

    The former case might be a common scenario, but when you "validate" something you're checking to see if it's valid - not whether it's correct.
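
    For the "prevent typing mistakes" case, jonnyq's rule (anything, an @, anything, a ., anything) can be sketched in a few lines of Python. This is only a sketch, assuming "anything" means "at least one character"; the name looks_like_email is made up for illustration, and the check says nothing about whether the mailbox exists.

```python
import re

# Assumed reading of the rule: <something>@<something>.<something>,
# where every <something> is one or more characters of any kind.
SHAPE = re.compile(r".+@.+\..+", re.DOTALL)


def looks_like_email(text: str) -> bool:
    """Return True if text has the rough shape of an email address."""
    return SHAPE.fullmatch(text) is not None


print(looks_like_email("joe.smith@foo.com"))  # True
print(looks_like_email("joe,smith"))          # False
```

    It happily accepts hello?????@???.?? as well, which fits the distinction drawn above: it checks shape, not correctness.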
  • trtrwtf 2011-04-27 13:06
    Rick:
    Paul:
    Try and enumerate a few, please.
    ...

    Why reinvent the wheel by doing some other stupid "validation" beforehand?

    ...

    In case you really don't know the answer, algorithmic validation can return a response in milliseconds without a context switch for the user. Full validation takes minutes.


    And if you happen to be doing something other than real-time interaction with a single user checking a single email address, validation by send-an-email-and-see-if-they-bother-to-reply is not going to do you a lot of good.

    But of course, nobody would ever have occasion to check a list of addresses, or anything silly like that.
  • socknet 2011-04-27 13:17
    Validate This:

    So how are you validating that they're entering the correct name and phone number? Or are you assuming that any (insert country specific format) sequence of digits is _the_ correct phone number for that person?


    Not really relevant to this discussion on email validation, but perhaps there are common methods which people use for names and phone numbers, feel free to google it if you are interested.

    Validate This:

    In each of those cases the difference between entering a validly formatted but incorrect e-mail address and an invalidly formatted e-mail address is zilch. You still have garbage data either way.


    Correct, but it doesn't really add anything to the conversation. As mentioned, most of the time you don't have to prove with absolute certainty that an email address is correct; being 'reasonably sure' is usually close enough (asking people to input an email address twice seems to be common nowadays and probably catches a lot of entry mistakes). When you do need to be 100% sure of the email address, that is a good time to do things such as sending validation emails which require a response.
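
    The "type it twice" check mentioned above could look roughly like this in Python. It is a sketch under assumptions: the function names are hypothetical, surrounding whitespace is ignored, and only the domain is compared case-insensitively, because the local part is formally case-sensitive.

```python
def normalise(addr: str) -> str:
    """Trim whitespace and lower-case the domain; leave the local part alone."""
    addr = addr.strip()
    local, sep, domain = addr.rpartition("@")
    if not sep:  # no "@" at all: return the trimmed text unchanged
        return addr
    return local + sep + domain.lower()


def entries_match(first: str, second: str) -> bool:
    """Compare the two copies of the address the user typed."""
    return normalise(first) == normalise(second)


print(entries_match("hugmy@bum.com", " hugmy@BUM.com "))  # True
print(entries_match("hugmy@bum.com", "hugym@bum.com"))    # False
```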
  • socknet 2011-04-27 13:19
    redundantman:
    The better way is to pop up not one, but TWO confirmation dialogs.

    #1 Are you sure that "hugmy@bum.com" is your e-mail address?

    #2 Are you REEAALLY sure?

    And then the user will be all like "zOMG I can't believe I typed it wrong" and they'll totally fix it.

    /true story


    It is even better if you have a 3rd box which says: "so you are saying your email is hugym@bum.com ?" and then if they click 'yes', you can have a 4th saying "liar!"
  • Optimus Dime 2011-04-27 13:22
    trtrwtf:
    Rick:
    Paul:
    Try and enumerate a few, please.
    ...

    Why reinvent the wheel by doing some other stupid "validation" beforehand?

    ...

    In case you really don't know the answer, algorithmic validation can return a response in milliseconds without a context switch for the user. Full validation takes minutes.


    And if you happen to be doing something other than real-time interaction with a single user checking a single email address, validation by send-an-email-and-see-if-they-bother-to-reply is not going to do you a lot of good.

    But of course, nobody would ever have occasion to check a list of addresses, or anything silly like that.


    What, exactly, would you be 'checking' for on this mythical occasion?

    captcha: acsi - The only true and validated character set.
  • JB 2011-04-27 13:24
    How come the domain part doesn't allow a final dot like URIs?

    http://www.google.com./search?q=uri
  • Splognosticus 2011-04-27 13:32
    Yazeran:
    Splognosticus:
    (?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])


    So easy a child could do it.


    Fail (according to the perl module Mail::RFC822::Address: regexp-based address validation):

    (?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t]
    )+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:
    \r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(
    ?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[
    \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\0
    31]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\
    ](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+
    (?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:
    (?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z
    |(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)
    ?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\
    r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[
    \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)
    ?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t]
    )*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[
    \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*
    )(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t]
    )+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)
    *:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+
    |\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r
    \n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:
    \r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t
    ]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031
    ]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](
    ?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?
    :(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?
    :\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?
    :(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?
    [ \t]))*"(?:(?:\r\n)?[ \t])*)*:(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\]
    \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|
    \\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>
    @,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"
    (?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t]
    )*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\
    ".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?
    :[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[
    \]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-
    \031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(
    ?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;
    :\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([
    ^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\"
    .\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\
    ]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\
    [\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\
    r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\]
    \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]
    |\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \0
    00-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\
    .|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,
    ;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?
    :[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*
    (?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".
    \[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[
    ^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]
    ]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)(?:,\s*(
    ?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\
    ".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(
    ?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[
    \["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t
    ])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t
    ])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?
    :\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|
    \Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:
    [^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\
    ]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n)
    ?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["
    ()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)
    ?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>
    @,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[
    \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,
    ;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t]
    )*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\
    ".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?
    (?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".
    \[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:
    \r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\[
    "()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])
    *))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])
    +|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\
    .(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z
    |(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(
    ?:\r\n)?[ \t])*))*)?;\s*)

    And even that is only with comments removed......


    Heyyyy... Hard-coding every possible email address in an obfuscated expression is cheating.
  • SlainVeteran 2011-04-27 13:35
    domain_help+td-wtf@tk is a perfectly valid email address. I doubt many email address validators in the wild would OK it.
  • trtrwtf 2011-04-27 13:39
    Optimus Dime:
    trtrwtf:
    Rick:
    Paul:
    Try and enumerate a few, please.
    ...

    Why reinvent the wheel by doing some other stupid "validation" beforehand?

    ...

    In case you really don't know the answer, algorithmic validation can return a response in milliseconds without a context switch for the user. Full validation takes minutes.


    And if you happen to be doing something other than real-time interaction with a single user checking a single email address, validation by send-an-email-and-see-if-they-bother-to-reply is not going to do you a lot of good.

    But of course, nobody would ever have occasion to check a list of addresses, or anything silly like that.


    What, exactly, would you be 'checking' for on this mythical occasion?

    captcha: acsi - The only true and validated character set.


    One scenario might be checking data entry - someone types in a bunch of email addresses (from a handwritten sign-up sheet, perhaps) - and you want to verify that they haven't fat-fingered any of the addresses. Granted, you won't catch jeo.smith@foo.com, but you'd get joe,smith

    Or you might want to scan a document for potential email addresses (to automatically make them mail-to links, or to suck addresses into an address book, or whatnot). Being able to recognize a valid email address might be useful in that circumstance, no?

    Or you might want to make sure a user isn't just mashing the keyboard when "email address" is a required field on your form. Again, they could just enter joe.smith@foo.com, but you make them work a bit more. (this is not a bulk validation case, I know)

    Point is, "validate" does not mean the same thing as "verify".
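
    The sign-up-sheet scenario can be sketched in Python as below. The pattern is an assumption of mine and deliberately loose: no whitespace, no commas, exactly one @, and a dot somewhere in the domain. It wrongly rejects quoted local parts, so treat its output as "worth a second look", not "invalid".

```python
import re

# Loose, assumed pattern. It is nowhere near the RFC grammar.
LOOSE = re.compile(r"[^\s@,]+@[^\s@,]+\.[^\s@,]+")


def flag_suspect_entries(entries):
    """Return the entries that do not even look like an address."""
    return [e for e in entries if LOOSE.fullmatch(e.strip()) is None]


typed = ["joe.smith@foo.com", "jeo.smith@foo.com", "joe,smith", "joe,smith@foo.com"]
print(flag_suspect_entries(typed))  # ['joe,smith', 'joe,smith@foo.com']
```

    As expected, the transposed jeo.smith@foo.com sails through; only a confirmation email would catch that one.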
  • Anonymous 2011-04-27 13:40
    dtobias:
    I've run into sites that refuse to accept my perfectly valid .name and .info addresses because they think that TLDs shouldn't be more than three letters.


    I've seen code that rejected two-letter ccTLDs, since all TLDs are three letters and no one has heard of places called France or the United Kingdom ;)
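
    To make the failure concrete, here is a Python sketch contrasting a "TLDs are three letters" check with a per-label check that puts no limit on the TLD beyond the usual 63-character label length. Both function names are hypothetical, and the label rule (letters, digits, inner hyphens) ignores internationalised names apart from their ASCII form.

```python
import re

# The kind of check complained about above: exactly three letters after the last dot.
NAIVE_TLD = re.compile(r".*\.[A-Za-z]{3}")

# One DNS label: 1 to 63 letters, digits or hyphens, not starting or ending with a hyphen.
LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def naive_domain_ok(domain: str) -> bool:
    return NAIVE_TLD.fullmatch(domain) is not None


def labels_ok(domain: str) -> bool:
    """Check every dot-separated label, with no special rule for the last one."""
    return all(LABEL.fullmatch(label) is not None for label in domain.split("."))


for d in ("example.com", "example.fr", "example.info"):
    print(d, naive_domain_ok(d), labels_ok(d))
```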
  • trtrwtf 2011-04-27 13:40
    Splognosticus:
    Yazeran:
    Splognosticus:
    (?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])


    So easy a child could do it.


    Fail (according to the perl module Mail::RFC822::Address: regexp-based address validation):

    (?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t]
    )+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:
    \r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(
    ?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[
    \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\0
    31]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\
    ](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+
    (?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:
    (?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z
    |(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)
    ?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\
    r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[
    \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)
    ?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t]
    )*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[
    \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*
    )(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t]
    )+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)
    *:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+
    |\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r
    \n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:
    \r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t
    ]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031
    ]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](
    ?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?
    :(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?
    :\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?
    :(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?
    [ \t]))*"(?:(?:\r\n)?[ \t])*)*:(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\]
    \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|
    \\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>
    @,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"
    (?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t]
    )*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\
    ".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?
    :[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[
    \]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-
    \031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(
    ?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;
    :\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([
    ^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\"
    .\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\
    ]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\
    [\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\
    r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\]
    \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]
    |\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \0
    00-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\
    .|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,
    ;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?
    :[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*
    (?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".
    \[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[
    ^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]
    ]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)(?:,\s*(
    ?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\
    ".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(
    ?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[
    \["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t
    ])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t
    ])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?
    :\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|
    \Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:
    [^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\
    ]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n)
    ?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["
    ()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)
    ?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>
    @,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[
    \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,
    ;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t]
    )*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\
    ".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?
    (?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".
    \[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:
    \r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\[
    "()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])
    *))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])
    +|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\
    .(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z
    |(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(
    ?:\r\n)?[ \t])*))*)?;\s*)

    And even that is only with comments removed......


    Heyyyy... Hard-coding every possible email address in an obfuscated expression is cheating.


    It's perl. If it's not obfuscated, what's the point?
  • frits 2011-04-27 13:42
    JB:
    How come the domain part doesn't allow a final dot like URIs?

    http://www.google.com./search?q=uri


    Because it's covered by a different RFC?
  • Paul 2011-04-27 13:50
    trtrwtf:

    And if you happen to be doing something other than real-time interaction with a single user checking a single email address, validation by send-an-email-and-see-if-they-bother-to-reply is not going to do you a lot of good.

    But of course, nobody would ever have occasion to check a list of addresses, or anything silly like that.


    See-if-they-bother-to-reply is an added bonus feature of sending a confirmation email, which lets you know that it's a real mailbox.

    The first response, which will be marginally slower than a regular expression, comes from your email-sending function, stating that it managed to parse the address and extract enough information to be able to send to it.

    One advantage of this, over a regular expression, is that the validation will only pass for valid addresses, and will only fail for invalid addresses.

    Another advantage is that it actually checks against real-world usage. See my post about reinventing the wheel. If you insist on using a regular expression that does not conform to the RFC, how do you guarantee that it matches the foibles of your mail-sending application?

    Look at it this way - If you have to check a list of addresses, what is the point of running it through a validator that doesn't work?
  • Dan 2011-04-27 13:55
    Still better than just guessing the rules.

    The most annoying thing is unsubscribe pages that use different validation rules than the form that got you on the list - they tell you you can't unsubscribe because you are not giving them a valid email address, for an address they are sending daily emails to.
  • PoPSiCLe 2011-04-27 13:57
    Hm. I've been using this (found somewhere, origin unknown) with a fair amount of success on several websites - I haven't really looked too closely, and it's probably failing some valid emails.

    function check_email_address($email) {
        // Note: ereg() was deprecated in PHP 5.3 and removed in PHP 7.0; preg_match() is the replacement.
        // First, we check that there's one @ symbol, and that the lengths are right
        if (!ereg("^[^@]{1,64}@[^@]{1,255}$", $email)) {
            // Email invalid because wrong number of characters in one section, or wrong number of @ symbols.
            return false;
        }
        // Split it into sections to make life easier
        $email_array = explode("@", $email);
        $local_array = explode(".", $email_array[0]);
        for ($i = 0; $i < sizeof($local_array); $i++) {
            if (!ereg("^(([A-Za-z0-9!#$%&'*+/=?^_`{|}~-][A-Za-z0-9!#$%&'*+/=?^_`{|}~\.-]{0,63})|(\"[^(\\|\")]{0,62}\"))$", $local_array[$i])) {
                return false;
            }
        }
        if (!ereg("^\[?[0-9\.]+\]?$", $email_array[1])) { // Check if domain is IP. If not, it should be valid domain name
            $domain_array = explode(".", $email_array[1]);
            if (sizeof($domain_array) < 2) {
                return false; // Not enough parts to domain
            }
            for ($i = 0; $i < sizeof($domain_array); $i++) {
                if (!ereg("^(([A-Za-z0-9][A-Za-z0-9-]{0,61}[A-Za-z0-9])|([A-Za-z0-9]+))$", $domain_array[$i])) {
                    return false;
                }
            }
        }
        return true;
    }
  • trtrwtf 2011-04-27 13:59
    Paul:
    The first response, which will be marginally slower than a regular expression, comes from your email-sending function, stating that it managed to parse the address and extract enough information to be able to send to it.
    [snip]
    Look at it this way - If you have to check a list of addresses, what is the point of running it through a validator that doesn't work?


    So the mail processing software manages to parse an email address, and always does it correctly, but no other software is capable of this task?
    Hm. I smell magic.

    I agree with you that "I'll just whip up an email validator" is probably wtf thinking, but this is one of those functions that ought to live in a library. I'm pretty sure I don't like requiring all email validation to generate spam as a side effect.
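
    As an illustration of "this ought to live in a library", Python's standard library ships an address parser in email.utils. The sketch below leans on it instead of a home-grown regex. Bear in mind that parseaddr is a lenient header parser, not a strict validator, so the extra "@" test and the name extract_addr_spec are my own additions.

```python
from email.utils import parseaddr


def extract_addr_spec(text: str):
    """Return the bare address the library parsed out of text, or None."""
    _display_name, addr = parseaddr(text)
    # parseaddr yields an empty address when it cannot parse anything;
    # the "@" test is an extra, assumed sanity check on top of that.
    return addr if "@" in addr else None


print(extract_addr_spec("Joe Smith <joe.smith@foo.com>"))  # joe.smith@foo.com
print(extract_addr_spec(""))                               # None
```

    No mail is sent, so nothing here generates spam as a side effect; it only answers "did the library manage to pull an address out of this?"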
  • trtrwtf 2011-04-27 14:15
    PoPSiCLe:
    Hm. I've been using this (found somewhere, origin unknown) with a fair amount of success on several websites - I haven't really looked too closely, and it's probably failing some valid emails.

    function check_email_address($email) {
        // Note: ereg() was deprecated in PHP 5.3 and removed in PHP 7.0; preg_match() is the replacement.
        // First, we check that there's one @ symbol, and that the lengths are right
        if (!ereg("^[^@]{1,64}@[^@]{1,255}$", $email)) {
            // Email invalid because wrong number of characters in one section, or wrong number of @ symbols.
            return false;
        }
        // Split it into sections to make life easier
        $email_array = explode("@", $email);
        $local_array = explode(".", $email_array[0]);
        for ($i = 0; $i < sizeof($local_array); $i++) {
            if (!ereg("^(([A-Za-z0-9!#$%&'*+/=?^_`{|}~-][A-Za-z0-9!#$%&'*+/=?^_`{|}~\.-]{0,63})|(\"[^(\\|\")]{0,62}\"))$", $local_array[$i])) {
                return false;
            }
        }
        if (!ereg("^\[?[0-9\.]+\]?$", $email_array[1])) { // Check if domain is IP. If not, it should be valid domain name
            $domain_array = explode(".", $email_array[1]);
            if (sizeof($domain_array) < 2) {
                return false; // Not enough parts to domain
            }
            for ($i = 0; $i < sizeof($domain_array); $i++) {
                if (!ereg("^(([A-Za-z0-9][A-Za-z0-9-]{0,61}[A-Za-z0-9])|([A-Za-z0-9]+))$", $domain_array[$i])) {
                    return false;
                }
            }
        }
        return true;
    }



    Correct me if I'm wrong but doesn't this pass something like joe.smith@12243234.345345.4563465.34524353.23452354.2345?

  • Nagesh 2011-04-27 14:19
    In some email systems they sometimes use the ' character to screw things up.

    Just ask Sharon D'Souza!!!
  • Validate This 2011-04-27 14:20
    trtrwtf:
    Correct me if I'm wrong but doesn't this pass something like joe.smith@12243234.345345.4563465.34524353.23452354.2345?


    Numerical domain names are valid
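
    The reason the long numeric domain gets through is that the PHP snippet's "is the domain an IP?" test is purely lexical: any run of digits and dots matches, and a match skips the per-label checks entirely. A Python sketch of the difference, using the standard ipaddress module (the helper name is_ipv4_literal is hypothetical):

```python
import ipaddress
import re

# Same pattern as the PHP snippet's "Check if domain is IP" test.
LOOKS_NUMERIC = re.compile(r"^\[?[0-9\.]+\]?$")


def is_ipv4_literal(domain: str) -> bool:
    """True only for a real dotted quad, with or without [brackets]."""
    if domain.startswith("[") and domain.endswith("]"):
        domain = domain[1:-1]
    try:
        ipaddress.IPv4Address(domain)
    except ValueError:  # AddressValueError is a subclass of ValueError
        return False
    return True


odd = "12243234.345345.4563465.34524353.23452354.2345"
print(LOOKS_NUMERIC.match(odd) is not None)  # True: treated as "an IP"
print(is_ipv4_literal(odd))                  # False: six parts, all out of range
```

    Whether all-digit labels should be accepted as a host name is a separate question; the sketch only shows that the string is not an IP address.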
  • Nagesh 2011-04-27 14:28

    // Needs javax.mail.internet.InternetAddress and
    // javax.mail.internet.AddressException (JavaMail).
    public static boolean isValidEmailAddress(String aEmailAddress){
        if (aEmailAddress == null) return false;
        boolean result = true;
        try {
            // The constructor throws AddressException if parsing fails.
            InternetAddress emailAddr = new InternetAddress(aEmailAddress);
            if ( ! hasNameAndDomain(aEmailAddress) ) {
                result = false;
            }
        }
        catch (AddressException ex){
            result = false;
        }
        return result;
    }

    private static boolean hasNameAndDomain(String aEmailAddress){
        String[] tokens = aEmailAddress.split("@");
        return
            tokens.length == 2 &&
            Util.textHasContent( tokens[0] ) &&
            Util.textHasContent( tokens[1] ) ;
    }

    //..elided
    }


    Here's a simple function that will work for everyone.
  • O'Brien 2011-04-27 14:43
    Nagesh:
    In some email systems they sometimes use the ' character to screw things up.

    Just ask Sharon D'Souza!!!


    You! You're the incompetent retard that makes half the damn websites on the Internet fuck up my name!
  • Eevee 2011-04-27 14:52
    Given that "#\"@\"#"@[IPv6:::ffff:173.230.158.172] is a perfectly valid email address, I'd say the most reliable way to verify validity is:

    $email =~ /@/


    And then, yeah, just send an email to it.
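
    For comparison, the same test in Python, plus one detail that matters for addresses like the one above: if you go on to split local part from domain, split at the last @, because a quoted local part may contain one. A sketch with made-up function names:

```python
def has_at_sign(email: str) -> bool:
    """Python equivalent of the Perl test  $email =~ /@/  (an "@" anywhere)."""
    return "@" in email


def split_at_last_at(email: str):
    """Return (local, domain), splitting at the LAST "@" in the string."""
    local, _sep, domain = email.rpartition("@")
    return local, domain


odd = r'"#\"@\"#"@[IPv6:::ffff:173.230.158.172]'
print(has_at_sign(odd))          # True
print(split_at_last_at(odd)[1])  # [IPv6:::ffff:173.230.158.172]
```

    The dotless domain_help+td-wtf@tk passes this check too, which a "must contain a dot after the @" rule would not allow.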
  • Worf 2011-04-27 14:58
    SlainVeteran:
    domain_help+td-wtf@tk is a perfectly valid email address. I doubt many email address validators in the wild would OK it.


    Anyone remember ye olde uucp format as well with the ! in the email addresses? And I think # is also valid as well, but I suspect not too many validators accept it...
  • EvanED 2011-04-27 15:00
    socknet:
    Please provide contact details for your legal representation:

    Name: _____
    Phone: _____
    Email: ______


    This and the next case may make sense: you're giving the system an email address to be used later.

    Please provide your preferred email address to be created

    Email: ______


    This, however, doesn't IMO. After all, I can't put 'billg@microsoft.com' in there, and that'll pass your validation.

    First, it would almost always be better to just put a 'username' box there, and add on the @domain yourself. If there's a choice of multiple domains, then give a drop-down box of them. Only if there are a ton of choices (e.g. you'll let them create an arbitrary subdomain too) does it make sense to have them provide a full email address.

    Second, 'valid email' doesn't imply a valid entry for that box either. It has to not exist -- so you have the same problem as before, except the reverse. If you have to check that anyway, why not just feed whatever the user enters to the system and let it fail? Why are you doing extra work? (Of course, you have to if you want to impose constraints that your backing system wouldn't, e.g. you want to disallow . from your email addresses or something. But then you're doing something different anyway.)

    99.9% of the time, the simple validation will suffice. Whether or not you care about that missing 0.1%, and how you do so, would be completely dependent on context.


    See, I disagree. I think it'd be much closer to 99.9% the other way. How many times have you entered someone else's email address in a form somewhere vs your own? And almost all the time you enter your own, I think it makes sense to send a confirmation email.
  • Eevee 2011-04-27 15:01
    Worf:
    Anyone remember ye olde uucp format as well with the ! in the email addresses? And I think # is also valid as well, but I suspect not too many validators accept it...

    I tried, briefly, to use # as the local-part delimiter in my email address. It doesn't need quoting and it's not reserved for anything.

    I gave up when even TDWTF rejected it.
  • Jaime 2011-04-27 15:03
    trtrwtf:
    jonnyq:
    N. Tufnel:
    Paul:
    The easiest, and most correct way to validate an email address is to send an email to it.


    In what sense does that validate anything, if you don't have access to that address's mailbox?


    One would hope that the person providing the email address would have access to it for confirmation...

    I mean seriously... two people this dumb?

    In my opinion, if you're just trying to prevent typing mistakes: anything, followed by an @, followed by anything, followed by a ., followed by anything. That's all I ever use. If you're trying to validate that an email address is real - send a confirmation email. Anything in between is prone to mistakes and probably isn't helping anyone.


    I think there's some confusion about the meaning of "validate" here. Some people seem to think it means "make sure this email address is really the one that belongs to this person", others believe (correctly) that it means "make sure that this email address is a well-formed address".

    The former case might be a common scenario, but when you "validate" something you're checking to see if it's valid - not whether it's correct.


    That's because there are two different groups that have this problem. One group is going to sell the address to a data-farmer and they want to guard against the user lying to them. The other group is collecting email addresses for the user's benefit and they want to help the user fill out the form accurately. Every solution proposed by one group will be rejected by the other.

    Let's just face the simple fact that email validation isn't a big deal in the general case and probably doesn't even deserve a library method. In the specific case of knowing that the email address is truly valid and controlled by that person, a library method wouldn't be sufficient.

    Doing it "right" is almost always wrong. Almost nobody cares if an email address meets RFC2822. They either care that it connects to a person or that there is a hint that the address may have been mistyped. A verification email solves the former and a half-assed validation regex solves the latter. "Proper" validation does neither as many bad email addresses are technically valid and almost all valid email addresses are not in use.
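
    The "half-assed" check for typing mistakes can be as small as the sketch below. It follows the "anything, @, anything, ., anything" shape from the quoted post and treats a failure as a hint to show the user, not as a verdict; the name `looks_mistyped` is made up for illustration.

```python
import re

# "anything, @, anything, ., anything" as described in the quoted post.
LOOSE = re.compile(r".+@.+\..+", re.DOTALL)

def looks_mistyped(email: str) -> bool:
    """Return True when the input fails the loose shape check (a hint, not a verdict)."""
    return LOOSE.fullmatch(email.strip()) is None

for candidate in ["joe.smith@example.com", "joe.smithexample.com", "joe@example"]:
    print(candidate, looks_mistyped(candidate))
```

    Because dotless domains do exist, the result is better used for an "are you sure?" prompt than for rejecting the form outright.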
  • XXXXX 2011-04-27 15:26
    Nagesh:

    try {
    InternetAddress emailAddr = new InternetAddress(aEmailAddress);
    }
    catch (AddressException ex){
    result = false;
    }



    Here's a simple function that will work for everyone.

    Because everyone uses Java & JavaMail.
    Any other language/platform/API it won't work very well.
  • Meep 2011-04-27 15:34
    Paul:
    The easiest, and most correct way to validate an email address is to send an email to it.


    Oh, great, so you're going to get an email address like:


    little bobby mails
    DATA
    .
    MAIL From: sirspamsalot@auto
    RCPT To: yourmother@ho.com
    DATA
    ...
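
    The joke depends on line breaks in the "address" reaching the SMTP conversation. Independent of any format checking, a guard against that could look like the sketch below. It assumes the mail library in use does not already refuse such input, and `has_control_chars` is a made-up helper name.

```python
def has_control_chars(email: str) -> bool:
    """Return True if the string contains CR, LF, or any other ASCII control character."""
    return any(ord(ch) < 32 or ord(ch) == 127 for ch in email)

injected = "little bobby mails\r\nDATA\r\n.\r\nMAIL From: sirspamsalot@auto"
print(has_control_chars(injected))
print(has_control_chars("joe.smith@example.com"))
```

    Refusing control characters before the string gets anywhere near a mail command costs nothing and does not exclude any address a person could reasonably type into a form.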
  • Machtyn 2011-04-27 15:37
    Correct me if I'm wrong, but it appears that the first validator is trying to block the use of special characters. However, if that character is the first character in the address, it will be accepted. (i.e. *myname@domain.com will pass). Funny the difference between x<0 to x<=0 and x>0 to x>=0.
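
    A sketch of the kind of off-by-one being described. This is not the article's code: the character list and both function names are hypothetical, and Python's `str.find` stands in for whatever position lookup the original used.

```python
SPECIALS = "*!#$%"  # hypothetical blacklist, not the article's actual list

def buggy_has_special(email: str) -> bool:
    # find() returns 0 for a hit at the very first character, so "> 0" misses it.
    return any(email.find(ch) > 0 for ch in SPECIALS)

def fixed_has_special(email: str) -> bool:
    # find() returns -1 only when the character is absent, so ">= 0" catches every hit.
    return any(email.find(ch) >= 0 for ch in SPECIALS)

print(buggy_has_special("*myname@domain.com"), fixed_has_special("*myname@domain.com"))
print(buggy_has_special("my*name@domain.com"), fixed_has_special("my*name@domain.com"))
```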
  • socknet 2011-04-27 15:39
    EvanED:

    99.9% of the time, the simple validation will suffice. Whether or not you care about that missing 0.1%, and how you do so, would be completely dependent on context.


    See, I disagree. I think it'd be much closer to 99.9% the other way. How many times have you entered someone else's email address in a form somewhere vs your own? And almost all the time you enter your own, I think it makes sense to send a confirmation email.


    You misunderstood my statement here. I am saying that 99.9% of the time, a very basic validation check will be sufficient to ensure an email address is valid and that it is only 0.1% of the time that you have oddball cases (such as '+' characters in the address).

    I was in no way referring to the likelihood that a person is entering their own email address vs the email address of another user.
  • socknet 2011-04-27 15:45
    Machtyn:
    Correct me if I'm wrong, but it appears that the first validator is trying to block the use of special characters. However, if that character is the first character in the address, it will be accepted. (i.e. *myname@domain.com will pass). Funny the difference between x<0 to x<=0 and x>0 to x>=0.



    looks like "@...@@@@" would be valid too..
  • pfft 2011-04-27 15:56
    socknet:
    EvanED:

    99.9% of the time, the simple validation will suffice. Whether or not you care about that missing 0.1%, and how you do so, would be completely dependent on context.


    See, I disagree. I think it'd be much closer to 99.9% the other way. How many times have you entered someone else's email address in a form somewhere vs your own? And almost all the time you enter your own, I think it makes sense to send a confirmation email.


    You misunderstood my statement here. I am saying that 99.9% of the time, a very basic validation check will be sufficient to ensure an email address is valid and that it is only 0.1% of the time that you have oddball cases (such as '+' characters in the address).

    I was in no way referring to the likelihood that a person is entering their own email address vs the email address of another user.

    So what you're saying is that your validation code doesn't validate anything and possibly excludes valid email addresses?
  • Shishire 2011-04-27 16:04
    http://code.google.com/p/isemail/source/browse/trunk/is_email.php?r=6

    That is a link to what is quite possibly the only truly correct email validator out there. It takes into account rfc3696, rfc2822, rfc5322, rfc5321, rfc4291, and rfc1123, including errata. Rather than try to regex the whole thing (which is provably impossible), it separates everything into pieces and validates components. It also contains a number of flags to allow you to block emails that are probably incorrect, or decide to allow everything valid, if nonsensical. For example,
    x@x23456789.x23456789.x23456789.x23456789.x23456789.x23456789.x23456789.x23456789.x23456789.x23456789.x23456789.x23456789.x23456789.x23456789.x23456789.x23456789.x23456789.x23456789.x23456789.x23456789.x23456789.x23456789.x23456789.x23456789.x23456789.x234
    is a valid email address, but anyone who purports to have such an address is either lying or on a subnet. Also, an email such as pope@va is a valid (and at some point in the past, used) address.
  • socknet 2011-04-27 16:05
    pfft:
    socknet:
    EvanED:

    99.9% of the time, the simple validation will suffice. Whether or not you care about that missing 0.1%, and how you do so, would be completely dependent on context.


    See, I disagree. I think it'd be much closer to 99.9% the other way. How many times have you entered someone else's email address in a form somewhere vs your own? And almost all the time you enter your own, I think it makes sense to send a confirmation email.


    You misunderstood my statement here. I am saying that 99.9% of the time, a very basic validation check will be sufficient to ensure an email address is valid and that it is only 0.1% of the time that you have oddball cases (such as '+' characters in the address).

    I was in no way referring to the likelihood that a person is entering their own email address vs the email address of another user.

    So what you're saying is that your validation code doesn't validate anything and possibly excludes valid email addresses?


    No, most people could see that's not what I'm saying - not sure what logic you are using to think that. Please elaborate on how you reach that conclusion from my statement.
  • dnm 2011-04-27 16:06
    Why is this dumb?
  • dnm 2011-04-27 16:07
    wow:
    Paul:
    The easiest, and most correct way to validate an email address is to send an email to it.


    Wow. Just, well, wow. This is possibly the most amazingly dumb thing I have seen posted in a very long time, for more reasons than I can count.


    Why is this dumb?
  • pfft 2011-04-27 16:08
    socknet:
    pfft:
    socknet:
    EvanED:

    99.9% of the time, the simple validation will suffice. Whether or not you care about that missing 0.1%, and how you do so, would be completely dependent on context.


    See, I disagree. I think it'd be much closer to 99.9% the other way. How many times have you entered someone else's email address in a form somewhere vs your own? And almost all the time you enter your own, I think it makes sense to send a confirmation email.


    You misunderstood my statement here. I am saying that 99.9% of the time, a very basic validation check will be sufficient to ensure an email address is valid and that it is only 0.1% of the time that you have oddball cases (such as '+' characters in the address).

    I was in no way referring to the likelihood that a person is entering their own email address vs the email address of another user.

    So what you're saying is that your validation code doesn't validate anything and possibly excludes valid email addresses?


    No, most people could see that's not what I'm saying - not sure what logic you are using to think that. Please elaborate on how you reach that conclusion from my statement.


    Can you please cite this "most people" study?
  • csharptest.net 2011-04-27 16:13
    There are a lot of different ways to validate an email address input field. The easiest – and mostly correct – method is to use a regular expression.


    That couldn't be farther from the truth. Regex is NOT a valid way to verify an email address. You might as well use this regex:

    .+@.+

    That's as close as you can come. Anything else is just plain wrong. You might as well do an if index of '@' and be done with it. What? You don't believe me?

    http://tools.ietf.org/html/rfc2822#section-3.4.1
    3.4.1. Addr-spec specification

    An addr-spec is a specific Internet identifier that contains a locally interpreted string followed by the at-sign character ("@", ASCII value 64) followed by an Internet domain.


    The definition of "locally interpreted" is: you can't interpret it.

    Despite that opening remark I am LMAO at the attempts to validate there.

  • Christopher 2011-04-27 16:14
    Dice.com is the worst offender of email address validation. They don't just validate your address; they strip out any "invalid" characters without telling you that they did anything to it! For example, I used the '+' sign in my address ("myaddress+dice@gmail.com"), and they silently changed it to "myaddressdice@gmail.com". (I happened to go back into the user settings to change something else and noticed my stripped email address).

    It's doubleplus ungood that you have to log into the site using your email address, so if you ever change your address to something that they don't like for some arbitrary reason, make sure that it's correct in their system before you log out!
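
    The site's actual code is unknown, but the reported behaviour is easy to reproduce, which shows why silent "cleaning" is worse than a visible rejection. In the sketch below the whitelist is a guess and both function names are made up.

```python
import re

def strip_unexpected(email: str) -> str:
    """Mimic the reported behaviour: silently drop characters outside a narrow whitelist."""
    return re.sub(r"[^A-Za-z0-9@._-]", "", email)

def accept_verbatim(email: str) -> str:
    """Keep the address exactly as typed, or refuse it; never store a modified copy."""
    if strip_unexpected(email) != email:
        raise ValueError("address contains characters this site does not accept")
    return email

entered = "myaddress+dice@gmail.com"
print(strip_unexpected(entered))
print(strip_unexpected(entered) == entered)
```

    Refusing a plus-address is still wrong, but at least the user finds out at the form and not at the next login.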
  • sholdowa 2011-04-27 16:14
    dnm:
    wow:
    Paul:
    The easiest, and most correct way to validate an email address is to send an email to it.


    Wow. Just, well, wow. This is possibly the most amazingly dumb thing I have seen posted in a very long time, for more reasons than I can count.


    Why is this dumb?


    Because you've not taken into account any anti-spam methods that may be in place. What you can do is to confirm that the domain exists and accepts mail. However, if the server implements a greet pause as a part of its policy, it's going to be a very slow process. Alternatively, a DNS lookup of the MX record for that domain (whilst technically incorrect) may also suffice.
  • Paul 2011-04-27 16:15
    trtrwtf:

    So the mail processing software manages to parse an email address, and always does it correctly, but no other software is capable of this task?
    Hm. I smell magic.


    1) Not "No Other Software", "A Regular Expression" (i.e. the thing being touted as the right way to validate emails). A regular expression complex enough to do the job will be a pig to get right and a nightmare to debug. As I mentioned in an earlier post, I've never seen one in use that works.

    2) Not Magic. Coordination.

    If both your validator and your mail processing software are both perfectly compliant, then there is no problem.

    If, on the other hand, your mail processing software has foibles, then you should also ensure that your validator has exactly the same foibles, else you run the risk that your validator will pass a compliant address that your mail processing software cannot handle. That strikes me as rather difficult, particularly if your mail processor is a black box to you. That said, you may be content to suffer that if your validator is perfectly RFC-compliant.

    I do concede that if you are validating addresses from a list, rather than immediate user input, an email shouldn't actually be sent out to the recipient's mailbox. However, the DRY way to validate is still to use the same tool that will actually be sending the mail (but offline).
  • socknet 2011-04-27 16:15
    pfft:
    socknet:
    pfft:
    socknet:
    EvanED:

    99.9% of the time, the simple validation will suffice. Whether or not you care about that missing 0.1%, and how you do so, would be completely dependent on context.


    See, I disagree. I think it'd be much closer to 99.9% the other way. How many times have you entered someone else's email address in a form somewhere vs your own? And almost all the time you enter your own, I think it makes sense to send a confirmation email.


    You misunderstood my statement here. I am saying that 99.9% of the time, a very basic validation check will be sufficient to ensure an email address is valid and that it is only 0.1% of the time that you have oddball cases (such as '+' characters in the address).

    I was in no way referring to the likelihood that a person is entering their own email address vs the email address of another user.

    So what you're saying is that your validation code doesn't validate anything and possibly excludes valid email addresses?


    No, most people could see that's not what I'm saying - not sure what logic you are using to think that. Please elaborate on how you reach that conclusion from my statement.


    Can you please cite this "most people" study?


    so you are saying you're not a person and neither are your parents?

    </pfft logic>
  • pfft 2011-04-27 16:36
    socknet:
    pfft:
    socknet:
    pfft:
    socknet:
    EvanED:

    99.9% of the time, the simple validation will suffice. Whether or not you care about that missing 0.1%, and how you do so, would be completely dependent on context.


    See, I disagree. I think it'd be much closer to 99.9% the other way. How many times have you entered someone else's email address in a form somewhere vs your own? And almost all the time you enter your own, I think it makes sense to send a confirmation email.


    You misunderstood my statement here. I am saying that 99.9% of the time, a very basic validation check will be sufficient to ensure an email address is valid and that it is only 0.1% of the time that you have oddball cases (such as '+' characters in the address).

    I was in no way referring to the likelihood that a person is entering their own email address vs the email address of another user.

    So what you're saying is that your validation code doesn't validate anything and possibly excludes valid email addresses?


    No, most people could see that's not what I'm saying - not sure what logic you are using to think that. Please elaborate on how you reach that conclusion from my statement.


    Can you please cite this "most people" study?


    so you are saying you're not a person and neither are your parents?

    </pfft logic>


    I'm a person and I understand me. Therefore, most people can understand me.

    </socknet_logic>
  • Nagesh 2011-04-27 16:37
    In addition to sending mail, the true way of verification should also include one phone call made to person who has entered mail address. This is so the person does not use spam filter to block mail sent..

  • ICANNOT 2011-04-27 16:56
    Well, TLD obviously means "Three Letter Domain".
  • socknet 2011-04-27 16:58
    pfft:
    socknet:
    pfft:
    socknet:
    pfft:
    socknet:
    EvanED:

    99.9% of the time, the simple validation will suffice. Whether or not you care about that missing 0.1%, and how you do so, would be completely dependent on context.


    See, I disagree. I think it'd be much closer to 99.9% the other way. How many times have you entered someone else's email address in a form somewhere vs your own? And almost all the time you enter your own, I think it makes sense to send a confirmation email.


    You misunderstood my statement here. I am saying that 99.9% of the time, a very basic validation check will be sufficient to ensure an email address is valid and that it is only 0.1% of the time that you have oddball cases (such as '+' characters in the address).

    I was in no way referring to the likelihood that a person is entering their own email address vs the email address of another user.

    So what you're saying is that your validation code doesn't validate anything and possibly excludes valid email addresses?


    No, most people could see that's not what I'm saying - not sure what logic you are using to think that. Please elaborate on how you reach that conclusion from my statement.


    Can you please cite this "most people" study?


    so you are saying you're not a person and neither are your parents?

    </pfft logic>


    I'm a person and I understand me. Therefore, most people can understand me.

    </socknet_logic>


    So you are saying that cats can understand you?

    </pfft logic>

    (I am still waiting for you to explain the logical steps which you took to take the statement "a simple validation will work most of the time" and conclude "no validation will be performed, but some valid entries will be rejected")
  • Ã 2011-04-27 16:58
    A string without a "@" is an invalid email address
    A string without a "." is an invalid email address
    Partyin' Partyin'
    Partyin' Partyin'
    Fun Fun Fun FUN
    Tomorrow is Thursday
    And Friday comes after... wards
    FRIDAY FRIDAY
  • Herby 2011-04-27 17:01
    The REAL reason for "validating" email addresses is to harvest them for SPAM. I own a domain, and I get several SPAM emails to "addresses" that are in headers. Usually these are dumb timestamps, but some email harvester thinks that they are valid email addresses.

    As for validating an address, give up! If you are on a local machine with a couple of users, the domain name is implied, so you don't even need an '@', so IN THEORY you could on an email service (gmail) just send a message to "friend" and the implied address would be "friend@gmail.com". Fortunately most email places do check for the '@', but if they are smart, that is about all they do. In fact browser plug-ins highlight what they think are email addresses (simple test!) so you can click on them (like Firefox just did).

    Me, I frequently use a dash ('-') to make unique email addresses when requested. That way I can see who is spreading around email addresses (answer: quite a few people).

    Face it: there is no simple answer!
  • Shishire 2011-04-27 17:17
    Herby:
    The REAL reason for "validating" email addresses is to harvest them for SPAM. I own a domain, and I get several SPAM emails to "addresses" that are in headers. Usually these are dumb timestamps, but some email harvester thinks that they are valid email addresses.

    As for validating an address, give up! If you are on a local machine with a couple of users, the domain name is implied, so you don't even need an '@', so IN THEORY you could on an email service (gmail) just send a message to "friend" and the implied address would be "friend@gmail.com". Fortunately most email places do check for the '@', but if they are smart, that is about all they do. In fact browser plug-ins highlight what they think are email addresses (simple test!) so you can click on them (like Firefox just did).

    Me, I frequently use a dash ('-') to make unique email addresses when requested. That way I can see who is spreading around email addresses (answer: quite a few people).

    Face it: there is no simple answer!


    Incorrect. According to RFC3696, section 3, an email address must contain both a local part and a remote part, separated by an "@" symbol, even if the mail is going to a mailbox on the same system. Many mail programs violate this, and allow address without a remote part.
  • operagost 2011-04-27 17:24
    Dan:
    Still better than just guessing the rules.

    The most annoying thing is unsubscribe pages that use different validation rules than the form that got you on the list - they tell you you can't unsubscribe because you are not giving them a valid email address, for an address they are sending daily emails to.

    That gets them submitted to black lists, here.
  • Clayton Hughes 2011-04-27 17:26
    No no no.

    The easiest and most correct way to validate an e-mail address field is to try sending it an e-mail.

    E-mails do not need @ symbols, they do not need a . in the domain, they do not need to contain only letters or numbers.

  • Nagesh 2011-04-27 17:27
    Ã:
    A string without a "@" is an invalid email address
    A string without a "." is an invalid email address
    Partyin' Partyin'
    Partyin' Partyin'
    Fun Fun Fun FUN
    Tomorrow is Thursday
    And Friday comes after... wards
    FRIDAY FRIDAY


    Total false, you corrupted memory chip!
  • Nagesh2.0 2011-04-27 17:31
    You would also need a confirmation of your own existence from a third party. A verification by your mother or father would suffice.
  • Nagesh 2011-04-27 17:34
    Nagesh2.0:
    You would also need a confirmation of your own existence from a third party. A verification by your mother or father would suffice.


    Corrupted memory chip has come back to copy my style!!!
  • ÃÆâ€℠2011-04-27 17:37
    Nagesh2.0:
    You would also need a confirmation of your own existence from a third party. A verification by your mother or father would suffice.

    Nope, not even a birth certificate would suffice.
  • ÃÆâ€â†2011-04-27 17:38
    Nagesh:


    Total false, you corrupted memory chip!


    Corrupted memory chip? It was probably manufactured by one of your cousins. Tech support from one of your other cousins didn't help.
  • Nagesh 2011-04-27 17:40
    ÃÆâ€ââ€:
    Nagesh:


    Total false, you corrupted memory chip!


    Corrupted memory chip? It was probably manufactured by one of your cousins. Tech support from one of your other cousins didn't help.


    Are you saying we are related, 8086 procesor?
  • Bert 2011-04-27 17:41
    article:

    The easiest – and mostly correct – method is to use a regular expression


    But the second dude did use a regex
  • PB 2011-04-27 17:48
    Paul:
    The easiest, and most correct way to validate an email address is to send an email to it.

    The most incorrect way is to use some trivial little regex written by someone who hasn't even heard of RFC822, and just intuits what they think an email address might be.

    I have never seen a regex in the wild, that correctly validates email addresses.

    I would much rather read through the version in the post, isValidEmailAddress, to debug it, than a regular expression complex enough to be comprehensive.


    I seem to recall seeing a site that compared several different (user submitted, I believe) regular expressions that claimed to validate email.
    For each one they managed to find addresses that failed.

    I've never understood the obsession with email validation myself. It's easy enough to make up something like bobfred756@gmail.com - don't know whether it's valid or not, and neither will a validator...

    Validating email addresses is similar to validating dates - only the brave attempt it, and more often than not it is not as critical as people think. Having had to work with all sorts of name matching and identity resolution functionality, I've found that entered data - correct or not - is most useful in its rawest form.

    Why force users to make data appear more realistic? It only makes the job of verifying whether it's legitimate a lot more difficult...
  • K 2011-04-27 17:50
    What bothers me the most is the sites that think they need to verify that the email address is not too long. And who don't know how long an address is allowed to be.

    The spec is a bit inconsistent, but it is pretty clear that a 256 character long address can be valid. Loads of systems won't even permit 100 characters in the email address.

    As for the standard itself, I don't get how somebody after specifying that the part before the @ could be 64 characters, and the part after the @ could be 255, then go and specify that the entire concatenated string must be at most 256 characters.

    If you absolutely have to put a limit on how long an email address your system will support, you should make the limit 320 characters, as anything above that will be a clear violation of the spec.
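
    A length check along those lines might look like the sketch below. The 64 and 255 figures and the 320 ceiling are taken from the post above and not re-verified against the RFCs here; `length_problem` is a made-up name, and a stricter total (254 is often quoted) can be passed in by the caller.

```python
MAX_LOCAL = 64     # octets before the @, as cited above
MAX_DOMAIN = 255   # octets after the @, as cited above
MAX_TOTAL = 320    # 64 + 1 + 255, the permissive ceiling suggested above

def length_problem(email: str, max_total: int = MAX_TOTAL):
    """Return a short description of the first length problem found, or None."""
    local, at, domain = email.rpartition("@")
    if not at:
        return "no @"
    if len(local.encode("utf-8")) > MAX_LOCAL:
        return "local part too long"
    if len(domain.encode("utf-8")) > MAX_DOMAIN:
        return "domain too long"
    if len(email.encode("utf-8")) > max_total:
        return "address too long"
    return None

print(length_problem("a" * 65 + "@example.com"))
print(length_problem("a" * 64 + "@" + "d" * 200, max_total=254))
```

    With the default ceiling the total check can never fire on its own, since the two part limits already add up to 320; it only matters when a caller tightens it.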
  • PB 2011-04-27 17:57
    Paul:
    Try and enumerate a few, please.

    If you are actually writing the actual email-sending part of an actual email-sending application, then I agree, you can't send an email to validate the address.

    If you are doing anything else, then the only reliable (and DRY) way to validate an email address is to try and use it as an email address.

    Why reinvent the wheel by doing some other stupid "validation" beforehand? You make the wheel less and less round every time you reinvent it. What if (as is normally the case) your validation incorrectly discards good addresses that your server can cope with? What if your validation is correct, but your email server is wrong? What if your validation deliberately matches your email server's faulty validation, but someone else fixes the server, leaving you with a pointlessly faulty validation?

    Even if you do get your syntactic validation correct, (which appears highly unlikely, given the prevalence of incorrect validation in the world); all you know is that it looks like an email address, you have no idea whether it is an email address (i.e. one that resolves to a mailbox)

    Agree. Why validate at all? Simply send an email with further instructions. What? The email didn't reach you? Bad luck, there matey....
    Of course, it simply means people create a new address for themselves, but many people would be too lazy to create a new email account each time they want to submit a form (and lots seem paranoid that somehow an actual email account {even with bodgey detail} will be more traceable than their {possibly spoofed} IP address).

    Not even sure why email is needed in a lot of situations. We seem obsessed with arbitrarily collecting any information users are willing to give - often without thinking about why we collect it. Why, for example, does a forum want name, address, birth date, telephone number etc.? They might claim that it is to ensure people behave online, but how thoroughly do they actually check that this 'very critical' data is even remotely correct? Other than checking that the email address (and sometimes phone number) looks vaguely valid, not a lot.

    <off topic>
    For the overly paranoid reader, it's all because your personal information is worth a lot of money!!
    </off topic>
  • Johnno the Greek 2011-04-27 18:03
    frits:
    dude:
    frits:
    For those who don't already know, here's one way to verify whether an email address exists: <snip>.
    All this nonsense about input validation and sending actual messages isn't much better than the code in the article.


    The only difference between sending an email and what was done in the article is actually sending data to be delivered. It's a lot more work than just sending an email, and you have no way of verifying that the user actually got the email because you didn't send one...


    How are you verifying again?


    Why, click on the link in the email of course...Kinda got to have received the email to click the link, right?
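
    A minimal sketch of the link-in-the-email approach, using only the Python standard library. The URL, the key handling and both function names are assumptions made for illustration; a real system would keep the key stable across restarts, remember which address each pending signup used, and probably expire tokens.

```python
import hashlib
import hmac
import secrets

# Assumption: a server-side secret; generated here only so the sketch is runnable.
SECRET_KEY = secrets.token_bytes(32)

def make_token(email: str) -> str:
    """Return a hex HMAC tying the confirmation link to this exact address."""
    return hmac.new(SECRET_KEY, email.encode("utf-8"), hashlib.sha256).hexdigest()

def confirm(email: str, token: str) -> bool:
    """Return True only if the token was issued for this address."""
    return hmac.compare_digest(make_token(email), token)

# The link that would go into the confirmation email (placeholder host).
link = "https://example.invalid/confirm?token=" + make_token("joe.smith@example.com")
print(link)
```

    Clicking the link proves that someone could read mail sent to that address, which is the property the thread keeps circling around; it says nothing about whether the address was syntactically "valid" by any RFC.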
  • Gunslinger 2011-04-27 18:03
    N. Tufnel:
    Paul:
    The easiest, and most correct way to validate an email address is to send an email to it.


    In what sense does that validate anything, if you don't have access to that address's mailbox?


    If your mail server sends it, and it doesn't bounce, then it's valid enough to accept. This isn't rocket science.
  • Sing Tao Wong 2011-04-27 18:05
    David Martensson:
    This one is actually better than most I have seen.

    I have actually tried to build a test that was 100 % rfc compliant and the result was horrifyingly large, and we still found existing, working emails that were denied ;)

    We finally went with a commercial component that so far has correctly handled every address we feed it.

    For any lesser ambition, just check the very simplest format and if you need more, use a verification where you actually send an email to the address and require them to click a link.

    With the new Asian letters coming I would not even try to build my own again =)


    We haves new letters coming? That be complicate our language some.
  • Gerald 2011-04-27 18:11
    socknet:
    Validate This:

    So how are you validating that they're entering the correct name and phone number? Or are you assuming that any (insert country specific format) sequence of digits is _the_ correct phone number for that person?


    Not really relevant to this discussion on email validation, but perhaps there are common methods which people use for names and phone numbers, feel free to google it if you are interested.

    Validate This:

    In each of those cases the difference between entering a validly formatted but incorrect e-mail address and an invalidly formatted e-mail address is zilch. You still have garbage data either way.


    Correct, but doesn't really add anything to the conversation. As mentioned, most of the time you don't have to prove with absolute certainty that an email address is correct; being 'reasonably sure' is usually close enough (asking people to input an email address twice seems to be common nowadays and probably catches a lot of entry mistakes). When you do need to be 100% sure of the email address, that is a good time to do things such as send validation emails which require a response.


    Actually I always find the 'enter it twice' amusing - usually you can copy paste the first one.
  • ASDG 2011-04-27 18:24
    socknet:
    jonnyq:

    One would hope that the person providing the email address would have access to it for confirmation...

    I mean seriously... two people this dumb?

    In my opinion, if you're just trying to prevent typing mistakes: anything, followed by an @, followed by anything, followed by a ., followed by anything. That's all I ever use. If you're trying to validate that an email address is real - send a confirmation email. Anything in between is prone to mistakes and probably isn't helping anyone.


    Please provide contact details for your legal representation:

    Name: _____
    Phone: _____
    Email: ______


    Please provide contact details for your IT Support team:

    Phone: _____
    Email: ______


    Please provide your preferred email address to be created:

    Email: ______


    etc.

    There are MANY cases where you might ask someone to provide an email address which they may not have access to.

    Sending real emails to these addresses is pretty silly. 99.9% of the time, the simple validation will suffice. Whether or not you care about that missing 0.1%, and how you do so, would be completely dependent on context.


    And what does the user do if they haven't a valid email address for the legal representative, or for their IT support?

    Only the most basic validation is required on email addresses (that they contain the '@' and it's not first or last should be sufficient).
    Knowing whether an email address exists is difficult. Knowing whether it is legitimate is near impossible. Why go to any significant effort to validate an address that may not actually work anyway?

    I frequently use things like not@required.com or noneofyourbusiness@gmail.com when I don't think I should give an address. It seems to pass validation tests, but what use is it to the people who wanted it? At least if I wrote "Bugger off and leave me alone" they could easily work out that they haven't a valid address - of course this requires manual intervention, but why not use validation to flag addresses for someone to investigate, rather than to tell the user "You tell dirty great lies!!". I agree (for the most part) with an earlier comment that allowing silly addresses to get through (and perhaps having them flagged for manual intervention) allows a far better assessment of whether an email address is valid or not. It should be noted that quite often it seems the address won't necessarily be used ever (eg the legal representation - the email would most likely only be used if the phone number doesn't work) - so why bother validating it?
  • SomeYoungGuy 2011-04-27 18:36
    trtrwtf:
    jonnyq:
    N. Tufnel:
    Paul:
    The easiest, and most correct way to validate an email address is to send an email to it.


    In what sense does that validate anything, if you don't have access to that address's mailbox?


    One would hope that the person providing the email address would have access to it for confirmation...

    I mean seriously... two people this dumb?

    In my opinion, if you're just trying to prevent typing mistakes: anything, followed by an @, followed by anything, followed by a ., followed by anything. That's all I ever use. If you're trying to validate that an email address is real - send a confirmation email. Anything in between is prone to mistakes and probably isn't helping anyone.


    I think there's some confusion about the meaning of "validate" here. Some people seem to think it means "make sure this email address is really the one that belongs to this person", others believe (correctly) that it means "make sure that this email address is a well-formed address".

    The former case might be a common scenario, but when you "validate" something you're checking to see if it's valid - not whether it's correct.


    Question: Which way at the light
    Correct Answer: Left
    Valid, but incorrect answer: Right
    Invalid Answer: It's just a flesh wound
  • Eric TF Bat 2011-04-27 18:52
    wow:
    Paul:
    The easiest, and most correct way to validate an email address is to send an email to it.


    Wow. Just, well, wow. This is possibly the most amazingly dumb thing I have seen posted in a very long time, for more reasons than I can count.


    No - it's not wrong, or dumb, but you most certainly are. There is no way to validate an email address with a regex. None. Go read some actual literature on the subject and come back here when you're feeling humble and embarrassed. The easiest, and most correct way to validate an email address is to send an email to it. That is undeniably, eternally true, and you do damage every time you deny it.
  • Matt Westwood 2011-04-27 19:05
    Gerald:
    socknet:
    Validate This:

    So how are you validating that they're entering the correct name and phone number? Or are you assuming that any (insert country specific format) sequence of digits is _the_ correct phone number for that person?


    Not really relevant to this discussion on email validation, but perhaps there are common methods which people use for names and phone numbers, feel free to google it if you are interested.

    Validate This:

    In each of those cases the difference between entering a validly formatted but incorrect e-mail address and an invalidly formatted e-mail address is zilch. You still have garbage data either way.


    Correct, but doesn't really add anything to the conversation. As mentioned, most of the time you don't have to prove with absolute certainty that an email address is correct; being 'reasonably sure' is usually close enough (asking people to input an email address twice seems to be common nowadays and probably catches a lot of entry mistakes). When you do need to be 100% sure of the email address, that is a good time to do things such as send validation emails which require a response.


    Actually I always find the 'enter it twice' amusing - usually you can copy paste the first one.


    ... except for those irritating websites where they don't allow cut and paste from box a to box b, and you have to type the damn thing in twice.

    Whether the email is valid in format or not is utterly immaterial. The only thing that matters at the end of the day is whether or not it *works*. Anything in between is spoo.

    Validate the email address by sending an email saying "Please press this link if you are so-and-so, and yadayada ..." which is what most sites do. If you've typed in the email address of a pirate who is about to rape your account from here to buggery then sorry but Darwin's laws apply: you've got to learn to be more careful about what you type.

    If you are entering email addresses for functions you do not have direct access to (e.g. as above: "enter email address for your tech support department") then the same ought still to apply: you just send the email as "You have been identified as the tech support team for ..." blahblah.

    Am I alone in thinking this is all being made too complicated? Don't overthink.
  • Jaime 2011-04-27 19:21
    Eric TF Bat:
    wow:
    Paul:
    The easiest, and most correct way to validate an email address is to send an email to it.


    Wow. Just, well, wow. This is possibly the most amazingly dumb thing I have seen posted in a very long time, for more reasons than I can count.


    No - it's not wrong, or dumb, but you most certainly are. There is no way to validate an email address with a regex. None. Go read some actual literature on the subject and come back here when you're feeling humble and embarrassed. The easiest, and most correct way to validate an email address is to send an email to it. That is undeniably, eternally true, and you do damage every time you deny it.
    Look into mailinator. It's trivial to respond to a confirmation email without giving a single iota of useful information. If you're away from your own computer it's also easier than giving your real address. Anybody who thinks they can validate* an email address is fooling themselves. Mailinator will accept email for any address from any one, so it will pass any SMTP checks. The only undeniable truth is that email addresses can never be fully validated*, ever.

    * Assuming that by validate we mean to confirm that the address can be used in the future to contact this individual
  • SeanC 2011-04-27 19:28
    jnewton:
    RobY:
    Not sure about before the @, but most school systems in the area I work all follow the common pattern of name@xxx.k12.in.us .


    <firstName>.<lastName>@<domain>.com is pretty popular as well


    and, for gmail users, email+tag@gmail.com is gaining momentum
  • Darth Paul 2011-04-27 19:34

    ...(asking people to input an email address twice seems to be common nowadays and probably catches a lot of entry mistakes).


    This is a useless convention as well - it is such a pain in the ass that people simply cut and paste the email address into the duplicate field. Note that we don't do this for other data - why not type in everything twice to be sure?
  • M 2011-04-27 19:42
    By my understanding, the 100 character limit isn't validation per se, but defense against SQL injection (and other similar) attacks. As the overwhelming majority of email addresses (don't have a study for this, but c'mon) are under 100 characters, the odds that someone inputting something 500 characters long is doing something nefarious are fairly good. There was a study performed at MIT a few years back that you can effectively prevent virtually all injection attacks with a 100 character limit -- as it's hard to do much that's interesting in 100 characters.
  • Darth Paul 2011-04-27 19:49
    You misunderstood my statement here. I am saying that 99.9% of the time, a very basic validation check will be sufficient to ensure an email address is valid and that it is only 0.1% of the time that you have oddball cases (such as '+' characters in the address).


    Excluding a small percentage of valid data from being entered for no other reason than to have a dubious validation process that serves little or no purpose is total FAIL.

    Reminds me of the story of the drivers licensing system that could not issue licenses to people whose surname was less than 3 characters.
  • Rob 2011-04-27 20:32
    Eric TF Bat:

    No - it's not wrong, or dumb, but you most certainly are. There is no way to validate an email address with a regex. None. Go read some actual literature on the subject and come back here when you're feeling humble and embarrassed. The easiest, and most correct way to validate an email address is to send an email to it. That is undeniably, eternally true, and you do damage every time you deny it.


    I'm sure the school will appreciate it when it tries to deal with Little Bobby Relay and gets banned for porn spam.
  • anon+ymous@gmail.com 2011-04-27 21:17
    Honestly, anyone writing validation code that disallows subaddressing in the year 2011 should probably be taken out back and shot.

    At minimum, their code should be nuked, possibly from orbit.
  • Sysadmin 2011-04-27 21:35
    Testing to see if there is an MX value for the domain is a great way to determine if an email address is valid assuming a proper caching nameserver setup and some application side caching to boot.
  • Sysadmin 2011-04-27 21:36
    Sysadmin:
    Testing to see if there is an MX value for the domain is a great way to determine if an email address is valid assuming a proper caching nameserver setup and some application side caching to boot.

    That is, if the DOMAIN for an email address is valid. Now that finger doesn't work anymore there's next to no chance to test the user aside from sending it as one other said.
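
    A rough Python sketch of the cached domain check described above. The standard library has no MX lookup, so the resolver is passed in as a hypothetical `lookup_mx` callable (in practice it would wrap whatever DNS library you use); the function names and the cache size are assumptions made for illustration. One caveat: a domain with no MX record can still receive mail through its plain address record, so a "no MX" answer is a hint, not proof.

```python
from functools import lru_cache
from typing import Callable, List


def make_domain_checker(lookup_mx: Callable[[str], List[str]],
                        cache_size: int = 1024) -> Callable[[str], bool]:
    """Build a checker that reports whether an address's domain has MX records.

    lookup_mx is a hypothetical hook: it takes a domain name and returns a
    list of mail exchanger host names (an empty list when there are none).
    """

    @lru_cache(maxsize=cache_size)
    def domain_has_mx(domain: str) -> bool:
        # Application-side cache: each distinct domain is resolved only once.
        return len(lookup_mx(domain)) > 0

    def address_domain_ok(address: str) -> bool:
        # Split on the LAST '@' so a quoted local part containing '@' survives.
        local, sep, domain = address.rpartition("@")
        if not sep or not local or not domain:
            return False
        return domain_has_mx(domain.lower())

    return address_domain_ok
```

    This only says something about the domain; nothing here tests the user part of the address.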
  • penman 2011-04-27 22:34
    I'll take NullPointerException for 100, Alex.
  • SQLDave 2011-04-27 23:06
    Question for those on this forum who are smarter in the ways of The Web (tm), which is probably 97% of you as I'm just a lowly DBA.

    Like many here, I use the address+tag@gmail.com format. Whenever I encounter a site which tells me that's an invalid email address, I take the time to send a note to "webmaster" or "contact us" or whatever. Usually I get no reply. Recently, however, I got this reply back from a site that I had an otherwise good experience with:

    "Unfortuanalty [sic] hackers use the plus sign in code to hack websites so we strip the + sign out and throw it as an error. I cant change that."

    My question for you web/email experts is: is he blowing smoke up my skirt, or is there something to what he said? (My guess: total smoke.)

    Thanks!
  • EZMoney 2011-04-28 00:26
    This works for credit card numbers too. Why validate it when you can just run it?
  • lolwtf 2011-04-28 01:17
    Jaime:
    If you need to validate that the email address belongs to the user -- Send a one-off email that the user must respond to.

    If you are trying to help a user enter data on a form -- do any half-assed validation and throw up a kindly worded warning if it doesn't match. Allow the user to continue on validation failure.
    This. Exactly this.
  • Bloomer 2011-04-28 01:21
    EZMoney:
    This works for credit card numbers too. Why validate it when you can just run it?


    Yeah, credit cards are different - the rules about what is and isn't valid are simpler - and we are a little more concerned about accuracy.

    While I think of it, where does CC validation occur? On the website, or at some third-party site? (I'm guessing a bit of both, but the point is there would be little to stop you from doing no more than a basic sanity check that we only have numbers - the validity can always be tested by the bank.)
  • Noelymous Coward 2011-04-28 02:40
    HellKarnassus:
    wow:
    Paul:
    The easiest, and most correct way to validate an email address is to send an email to it.


    Wow. Just, well, wow. This is possibly the most amazingly dumb thing I have seen posted in a very long time, for more reasons than I can count.

    Why? I think it is a useful idea, you ask the user to input an e-mail address and to confirm it, then you send an activation/validation mail to finish whatever process the app must do. It is less annoying than a script or method that tells you the address is incorrect even though it does exist.


    Regex parsing to validate an email address *does* have a place in the web, but it is an advisory place only. A little piece of javascript or what have you which can indicate that the website thinks there *might* be a problem with the email address the user has supplied (after all, who doesn't make typos?) but allow the user to carry on anyway, so that the email server, which is the *only* way to be sure, can actually use the address. The same is true with postal addresses. The number/post code abbreviation (or zip code for the internationally challenged) is a shorthand which works in most but not all cases, as is any email address regex.

    Now for a real life example: SWMBO recently applied for a permanent visa. The first clerk checked that all our papers were in order and (before we paid) told us that the application would probably fail because certain information was missing BUT ALLOWED US TO CARRY ON ANYWAY (this is the validation regex). When we did just that, the actual clerk accepted the application regardless of the missing data because he was closer to the actual decision making process (this is the email server itself).

    tl;dr: Let me use a fucking + symbol in my email address you useless PHP ridden so-called websites.
  • will 2011-04-28 02:52
    SQLDave:
    Question for those on this forum who are smarter in the ways of The Web (tm), which is probably 97% of you as I'm just a lowly DBA.

    Like many here, I use the address+tag@gmail.com format. Whenever I encounter a site which tells me that's an invalid email address, I take the time to send a note to "webmaster" or "contact us" or whatever. Usually I get no reply. Recently, however, I got this reply back from a site that I had an otherwise good experience with:

    "Unfortuanalty [sic] hackers use the plus sign in code to hack websites so we strip the + sign out and throw it as an error. I cant change that."

    My question for you web/email experts is: is he blowing smoke up my skirt, or is there something to what he said? (My guess: total smoke.)

    Thanks!


    Using the + sign is common in SQL injection attacks. The main use is for trying to get into areas you are not supposed to, such as via an identification number. Send a 2+2 and, if the web site did not do proper programming and use parameterized queries, you are now bringing up ID number 4.

    It is smoke and mirrors because they are not solving the problem correctly and are directing the problem somewhere else.

    Or in other words don't trust that site with any important information.
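
    To make the 2+2 point concrete, here is a small sketch using Python's built-in sqlite3 module. The table, the rows and the two function names are invented for illustration; nothing here is taken from the site in question.

```python
import sqlite3

# Throwaway in-memory database with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO users (id, email) VALUES (?, ?)",
    [(2, "two@example.com"), (4, "four@example.com"),
     (7, "address+tag@gmail.com")],
)


def find_by_id_unsafe(user_input):
    # Vulnerable: the input is pasted into the SQL text, so "2+2" is
    # evaluated by the database as an arithmetic expression.
    sql = f"SELECT email FROM users WHERE id = {user_input}"
    return conn.execute(sql).fetchall()


def find_by_id_safe(user_input):
    # Parameterized: the input is bound as a value and never parsed as SQL.
    return conn.execute(
        "SELECT email FROM users WHERE id = ?", (user_input,)
    ).fetchall()


def find_by_email_safe(email):
    # With bound parameters a '+' in an address is just another character.
    return conn.execute(
        "SELECT id FROM users WHERE email = ?", (email,)
    ).fetchall()
```

    With the query parameterized, there is no reason to strip or reject the + in address+tag@gmail.com at all.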
  • Irritated user 2011-04-28 03:03
    Paul:
    I do concede that if you are validating addresses from a list, rather than immediate user input, an email shouldn't actually be sent out to the recipient's mailbox. However, the DRY way to validate is still to use the same tool that will actually be sending the mail (but offline).


    If you are typing emails from a list, the most correct you can possibly come is to verify that what is typed into the computer matches what is on the written list.

    What if an address as written, real or not, does not pass the validation technique du jour? Do you 'fix' it? Delete it?
  • Jibble 2011-04-28 04:20
    Jaime:

    If you are trying to help a user enter data on a form -- do any half-assed validation and throw up a kindly worded warning if it doesn't match. Allow the user to continue on validation failure.

    Notice in both cases that it is not necessary to have a robust validation procedure. I can't think of a reason, other than while writing mail router software, to strictly validate the format of an email address.


    This times a million.

    PS: If I screw up when typing my email address it's likely to be a misspelled domain or something which your dumbass email validator won't catch anyway.

    The ONLY thing you should be doing with input data is checking it for SQL injection attacks.

    Also: Don't ask me to "retype your email" in two different input boxes. It gets copied and pasted so any error will just be repeated.
  • Paul 2011-04-28 04:23
    Jaime:

    * Assuming that by validate we mean to confirm that the address can be used in the future to contact this individual


    No, you cannot ever confirm that the address can be used in the future, because that requires prescience. If, for example, someone uses their work email address, that is tied to their tenure at that organisation. The best you can do is check that it reaches them now. However, that is not what we mean by validate.

    By Validate, we mean confirm that it is formatted according to the RFC. We could go one step further and say we are confirming that it is formatted in such a way as to allow the mail-sending application to parse enough information out of it to be able to send to it.

    By actually sending it, even to a service like mailinator, you also get to check whether the receiving server specified in the domain-part can accept mail destined for the target specified in the local-part. It is not relevant whether this is a private mailbox, or a persistent one or any other feature that, simply by convention, one normally associates with mailboxes.

  • minime 2011-04-28 04:23
    Well, the correct regexp to validate an email address is shown here: http://www.ex-parrot.com/~pdw/Mail-RFC822-Address.html so to really validate an email address, using regexps is stupid. Better to write a proper parser, since the format is much more complicated than anyone can imagine, with all the special character stuff and inline comments.

    Using regexps on various websites is the reason why professional email users get their emails rejected all the time; most do not even accept foo+bar@domain.com as a valid address.
  • Grey 2011-04-28 05:17
    Duh... Character string processing WTFs are immortal :-P
  • Oik 2011-04-28 06:18
    I test e-mail addresses empirically. Connect to MX server and ask it. When it says the address is ok then you can politely close the connection.

    I do run a simple regex to sanitise the input first.

    Admittedly this assumes the mail server is up and that you're not validating a huge pile of addresses.
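
    A sketch of that "connect and ask" probe in Python using smtplib. The function name, the probe sender address and the `connect` parameter (there so the connection can be swapped out in tests) are assumptions; it talks to port 25 directly and never sends DATA. Keep in mind that catch-all servers such as mailinator accept every recipient and some servers refuse or greylist probes, so a positive answer is weak evidence at best.

```python
import smtplib


def mailbox_accepted(mx_host, address, probe_sender="probe@example.com",
                     connect=smtplib.SMTP, timeout=10):
    """Ask mx_host whether it would accept mail for address.

    Returns True for a 2xx reply to RCPT TO, False for a refusal, and None
    when the server is unreachable or gives a temporary (4xx) answer.
    """
    try:
        server = connect(mx_host, 25, timeout=timeout)
    except (OSError, smtplib.SMTPException):
        return None
    try:
        server.helo()
        server.mail(probe_sender)
        code, _message = server.rcpt(address)
    except (OSError, smtplib.SMTPException):
        return None
    finally:
        try:
            server.quit()  # close politely; no message body was ever sent
        except (OSError, smtplib.SMTPException):
            pass
    if 200 <= code < 300:
        return True
    if 400 <= code < 500:
        return None
    return False
```

    Treating "unreachable" and "try again later" as None rather than False keeps a server outage from rejecting a perfectly good address.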
  • Mike 2011-04-28 06:26
    The main problem with using the network to validate it is that email isn't instant - it can take a long time to find out whether the address is accepting mail or not.

    Personally I agree with the "why ask for an email address in the first place?" crowd - it's done out of habit these days. If you need it for something (e.g. password retrieval) then you also need to verify the person who entered the address actually wants to use your service and has access to that address, so sending an email with a confirmation link is required.

    But you can't actually run an SMTP session to do a very meaningful check in real-time, unless you just want to use your local SMTP server to tell you if it thinks the address is bogus or not.

    You also need to do some very basic level of parsing, since you can't just dump whatever the user entered straight to your database/mail server. But I wouldn't do any more than:

    - check there's no whitespace (including newlines)
    - check there's an @
    - check there's stuff before and after the @
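
    Those three checks fit in a few lines of Python. This is only a sketch of the list above, with a made-up function name; it is deliberately not an RFC validator, and it will reject the rare quoted local part that contains a space.

```python
def passes_basic_checks(address: str) -> bool:
    """Apply the three minimal checks: no whitespace, an @, text on both sides."""
    # 1. No whitespace anywhere (spaces, tabs, newlines).
    if any(ch.isspace() for ch in address):
        return False
    # 2. There is an @ (split on the last one), and
    # 3. there is something before and after it.
    local, sep, domain = address.rpartition("@")
    return bool(sep and local and domain)
```

    Anything that passes goes on to the confirmation email, which is the check that actually matters.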
  • Kiss me I'm Polish 2011-04-28 06:45
    Mike:
    The main problem with using the network to validate it is that email isn't instant - it can take a long time to find out whether the address is accepting mail or not.

    Personally I agree with the "why ask for an email address in the first place?" crowd - it's done out of habit these days. If you need it for something (e.g. password retrieval) then you also need to verify the person who entered the address actually wants to use your service and has access to that address, so sending an email with a confirmation link is required.

    But you can't actually run an SMTP session to do a very meaningful check in real-time, unless you just want to use your local SMTP server to tell you if it thinks the address is bogus or not.

    You also need to do some very basic level of parsing, since you can't just dump whatever the user entered straight to your database/mail server. But I wouldn't do any more than:

    - check there's no whitespace (including newlines)
    - check there's an @
    - check there's stuff before and after the @
    "Mike Clueless"@example.org is a valid email address.
    Oops.
  • Mike 2011-04-28 07:12
    Kiss me I'm Polish:
    "Mike Clueless"@example.org is a valid email address.
    Oops.
    You are of course technically correct (the best kind of correct), but nobody really uses email addresses with spaces in them, because very few systems support them, and even fewer email address "validators" will accept them. ;)

    RFC 5321 even says: a host that expects to receive mail SHOULD avoid defining mailboxes where the Local-part requires (or uses) the Quoted-string form

    I'm happy to reject the one guy who thinks it's fun to try to use such an address. If he actually wants to pass the validation, he'll just retry using the email address he uses for everything else that rejects that form.
  • Ashley Sheridan 2011-04-28 07:28
    The fact that anyone thinks this is something for a regex is the real WTF. Not one regex I've seen yet will check to see whether or not the email address is part of a valid TLD, which is pretty important I'd say!

    Sure, it's possible to do it with a regex if you really tried hard, but would you really want a 3000 character regex? Imagine trying to debug it.

    "this is still a valid email address @!~?"@someplace.com
  • Paul 2011-04-28 07:36
    Irritated user:

    If you are typing emails from a list, the most correct you can possibly come is to verify that what is typed into the computer matches what is on the written list.

    What if an address as written, real or not, does not pass the validation technique du jour? Do you 'fix' it? Delete it?


    I made no mention of typing things in from a written list, and nor did the post I was responding to. In fact, that would count as "immediate user input". I'm talking about iterating over addresses in a list that is already in the computer. Perhaps in an XML file or CSV, or maybe an array of addresses randomly generated in memory.

    However, since you ask - when things from a hard copy don't validate when turned into soft copy, the correct thing to do is normally to report it to a human, who can then choose the correct action by using their brains.
  • ParkinT 2011-04-28 08:27
    We need to return to the days of Compuserve, where ALL usernames were comprised of only numerals.
  • Anonymous 2011-04-28 08:42
    Don't forget the poor saps in the .museum domain that can never pass that 5 char TLD limit!
  • Rhialto 2011-04-28 08:46
    Shishire:
    http://code.google.com/p/isemail/source/browse/trunk/is_email.php?r=6

    That is a link to what is quite possibly the only truly correct email validator out there.

    It is not an email validator. At best, it would be an email ADDRESS validator. And I'm not so sure it would pass the previously given example "#\"@\"#"@[IPv6:::ffff:173.230.158.172], given the way it initially looks for a @.
  • eric76 2011-04-28 09:41
    I use + all the time on one account that gets enormous numbers of spam.

    I made a list of words to use after the plus and created a procmail filter to accept those combinations. For example, if the word list was maroon, acre, lightning, saturn, and piano, then the acceptable e-mail addresses for eric76@example.com would be eric76+maroon@example.com, eric76+acre@example.com, eric76+lightning@example.com, eric76+saturn@example.com, and eric76+piano@example.com. I then would keep a copy of the list with me and if someone needed an address, I would give them the next one on the list.

    Any e-mail coming in to those addresses was accepted, but if I started to get spammed at one combination, it would be a simple matter to reject any and all incoming e-mail to that address.

    For e-mail coming in without the +something to eric76@example.com, if it was encrypted with my PGP key, signed by the sender's PGP key, from a specific whitelist of individual addresses, or originating from anyone on the local network, the e-mail was delivered okay.

    All other e-mail is dumped into a trash folder. Originally I automatically responded back with a message telling the sender what it would take for the e-mail to be delivered, but that never seemed to do any good.

    The number of spams on that address went from 50 or more a day to 0.
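
    The tag routing described above, sketched in Python rather than procmail. The function name and the result labels are invented for illustration, and the rules for untagged mail (PGP, whitelist, local network) are not modeled; untagged mail is just reported as such.

```python
# Word list from the example above.
ALLOWED_TAGS = {"maroon", "acre", "lightning", "saturn", "piano"}


def route(recipient, blocked_tags=frozenset(),
          user="eric76", domain="example.com"):
    """Decide what to do with mail addressed to user+tag@domain."""
    local, sep, rcpt_domain = recipient.rpartition("@")
    if not sep or rcpt_domain.lower() != domain:
        return "not-ours"
    name, plus, tag = local.partition("+")
    if name != user:
        return "not-ours"
    if not plus:
        return "untagged"  # PGP / whitelist / local-network rules apply here
    if tag in blocked_tags:
        return "reject"    # a tag that started attracting spam
    if tag in ALLOWED_TAGS:
        return "deliver"
    return "trash"         # unknown tag: same fate as all other mail
```

    Retiring a spammed tag is then just a matter of adding it to the blocked set, for example `route("eric76+acre@example.com", blocked_tags={"acre"})`.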
  • ted 2011-04-28 09:55
    Wow, checking the link in the story and looking at the RFC 2822 email validation regex, I've confirmed that regex is dumb as shit and Perl programmers are even dumber.
  • socknet 2011-04-28 10:01
    will:
    SQLDave:
    Question for those on this forum who are smarter in the ways of The Web (tm), which is probably 97% of you as I'm just a lowly DBA.

    Like many here, I use the address+tag@gmail.com format. Whenever I encounter a site which tells me that's an invalid email address, I take the time to send a note to "webmaster" or "contact us" or whatever. Usually I get no reply. Recently, however, I got this reply back from a site that I had an otherwise good experience with:

    "Unfortuanalty [sic] hackers use the plus sign in code to hack websites so we strip the + sign out and throw it as an error. I cant change that."

    My question for you web/email experts is: is he blowing smoke up my skirt, or is there something to what he said? (My guess: total smoke.)

    Thanks!


    Using the + sign is common in SQL injection attacks. The main use is for trying to get into areas you are not supposed to, such as via an identification number. Send a 2+2 and, if the web site did not do proper programming and use parameterized queries, you are now bringing up ID number 4.

    It is smoke and mirrors because they are not solving the problem correctly and are directing the problem somewhere else.

    Or in other words don't trust that site with any important information.


    or.. (hypothetically speaking of course), people could do things like: select password + 4 from auth_table where user_id = 'admin'. Then you'd get an error saying something like: "Error: could not convert hunter2 to integer" or whatever.. the point being that you want to try and cause a type mismatch to get the error message printed to the screen.
  • ted 2011-04-28 10:03
    grzlbrmft:
    wow:
    Paul:
    The easiest, and most correct way to validate an email address is to send an email to it.


    Wow. Just, well, wow. This is possibly the most amazingly dumb thing I have seen posted in a very long time, for more reasons than I can count.


    In case you can count that far, name three, please.


    1. You can't confirm the email was received without access to the inbox.

    2. You get your sender's domain flagged on an RBL.

    3. Bandwidth waste.

    You must be a regex and Perl fan.
  • pjt33 2011-04-28 10:37
    Yazeran:
    Splognosticus:
    (?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])


    So easy a child could do it.


    Fail (according to the perl module Mail::RFC822::Address: regexp-based address validation):

    (?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t]
    )+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:
    \r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(
    ?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[
    \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\0
    31]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\
    ](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+
    (?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:
    (?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z
    |(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)
    ?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\
    r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[
    \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)
    ?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t]
    )*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[
    \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*
    )(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t]
    )+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)
    *:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+
    |\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r
    \n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:
    \r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t
    ]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031
    ]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](
    ?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?
    :(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?
    :\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?
    :(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?
    [ \t]))*"(?:(?:\r\n)?[ \t])*)*:(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\]
    \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|
    \\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>
    @,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"
    (?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t]
    )*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\
    ".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?
    :[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[
    \]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-
    \031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(
    ?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;
    :\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([
    ^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\"
    .\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\
    ]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\
    [\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\
    r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\]
    \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]
    |\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \0
    00-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\
    .|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,
    ;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?
    :[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*
    (?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".
    \[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[
    ^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]
    ]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)(?:,\s*(
    ?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\
    ".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(
    ?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[
    \["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t
    ])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t
    ])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?
    :\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|
    \Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:
    [^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\
    ]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n)
    ?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["
    ()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)
    ?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>
    @,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[
    \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,
    ;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t]
    )*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\
    ".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?
    (?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".
    \[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:
    \r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\[
    "()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])
    *))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])
    +|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\
    .(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z
    |(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(
    ?:\r\n)?[ \t])*))*)?;\s*)

    And even that is only with comments removed......

    I've never understood why that Perl regex is so long, and I've implemented validation myself using regexes and following the RFCs. If you can provide a test case which the shorter regex fails and the longer one doesn't then I'll be interested to check whether my validator passes or not.
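
    For anyone who wants to hunt for such a test case, here is a minimal Python sketch that runs candidate strings through the shorter regex quoted above. The candidate list and the helper name short_regex_accepts are assumptions made up for illustration. The longer pattern appears to cover the full RFC 822 address grammar (display names, angle brackets, groups), so those forms are a plausible place to look for differences; this sketch only exercises the shorter pattern.

```python
import re

# The shorter pattern from this thread, copied verbatim and split across
# raw-string pieces only for readability.
SHORT_PATTERN = re.compile(
    r"""(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*"""
    r"""|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]"""
    r"""|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")"""
    r"""@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?"""
    r"""|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}"""
    r"""(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:"""
    r"""(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]"""
    r"""|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])"""
)


def short_regex_accepts(candidate: str) -> bool:
    """Return True if the whole candidate matches the shorter pattern."""
    return SHORT_PATTERN.fullmatch(candidate) is not None


# Hypothetical candidates chosen for illustration.
CANDIDATES = [
    "john@example.com",
    "John@Example.com",               # upper case; the pattern is lower-case only
    '"John Doe" <john@example.com>',  # display-name form
]

for candidate in CANDIDATES:
    print(f"{candidate!r}: {short_regex_accepts(candidate)}")
```

    Compiling with re.IGNORECASE would change the upper-case result; the display-name form would still be rejected, since the shorter pattern only describes a bare address.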
  • dtobias 2011-04-28 10:39
    Christopher:

    It's doubleplus ungood that you have to log into the site using your email address,


    Why does every site seem to want to do this these days (use the email address as the username)? Just over the last few months I've had several sites that had perfectly good login systems where I had chosen short, memorable usernames which then suffered a redesign in which they insisted on changing the login system to use my email address instead. I dislike that for a number of reasons, including that I store my myriad usernames/passwords in an iPhone app, and typing all-letter usernames like 'dtobias' is much simpler than typing e-mail addresses, which requires switching to the symbols touchscreen keyboard with the at sign in it.
  • dohpaz42 2011-04-28 10:48
    TRWTF are all of the commenters who are confusing the validity of an email address with the ability to send an email to said address. Those are two separate, and distinct, functions that complement each other, but should be treated separately. In most real-world production systems it may be infeasible from a business perspective to waste the customer's valuable time trying to validate whether or not email can be sent to an email address; PHP, for example, does provide getmxrr() to test a domain for valid MX records. The problem is, with enough load and traffic, this will block your website until the function returns. So it generally is acceptable to only validate the format of an email, and worry about bounces on whatever system actually sends out emails (e.g., newsletters). This is why a lot of sites have adopted the paradigm of forcing a user to validate their account via email.
  • Paul 2011-04-28 10:58
    ted:

    1. You can't confirm the email was received without access to the inbox.


    Not relevant, you can't do that with a regular expression either. Whether it is received is a different matter to whether it was sent.

    By sending the email you have proven that the address is parseable enough to be able to send email to it. If the sender fails to send, you know that it is not parseable enough.

    Surely this is better than proving that the address only has alphabetic characters with an "@" in the middle and a "." 3-4 characters from the right-hand edge. Such proof has very little to do with whether it is a valid email address or not.

    ted:

    2. You get your sender's domain flagged on an RBL.


    If you use this method to validate a big list, without alerting the recipients, then set up some mechanism whereby the emails don't actually reach the recipients. Perhaps configure DNS so that the mail server sends everything to itself.

    That will still be easier and more correct than writing a regular expression that accepts all valid email addresses and rejects all invalid ones (considering that even the enormous one that people have already linked to above requires you to preprocess the address before testing it with the regex).

    If you are actually in charge of your mail server, then it may even be easier than writing a proper parser for emails, I'm not sure, but it's certainly more DRY.

    ted:

    3. Bandwidth waste.


    A) That depends on your definition of waste. I'm sure that for most applications, the bandwidth required to do it this way is much cheaper (in money) than the developer time involved in creating an effective email address validator.

    B) If you actually only have a small, finite amount of bandwidth and you absolutely must do the validation without using any, then see my response to 2, above.
  • Kuba 2011-04-28 11:02
    Paul:
    The most incorrect way is to use some trivial little regex written by someone who hasn't even heard of RFC822, and just intuits what they think an email address might be.

    I have never seen a regex in the wild, that correctly validates email addresses.
    Means you didn't look two posts before yours. FAIL.
  • dtobias 2011-04-28 11:12
    ASDG:

    I frequently use things like not@required.com or noneofyourbusiness@gmail.com when I don't think I should give an address.


    You ought to use an RFC-compliant fake address like nobody@test.example or test@example.org, instead of things like you mentioned which might be somebody's actual address.
  • Validate This 2011-04-28 12:05
    dohpaz42:
    TRWTF are all of the commenters who are confusing the validity of an email address with the ability to send an email to said address. Those are two separate, and distinct, functions that complement each other, but should be treated separately. In most real-world production systems it may be infeasible from a business perspective to waste the customer's valuable time trying to validate whether or not email can be sent to an email address; PHP, for example, does provide getmxrr() to test a domain for valid MX records. The problem is, with enough load and traffic, this will block your website until the function returns. So it generally is acceptable to only validate the format of an email, and worry about bounces on whatever system actually sends out emails (e.g., newsletters). This is why a lot of sites have adopted the paradigm of forcing a user to validate their account via email.


    If you are collecting an e-mail address for the sake of having an e-mail address and do not intend on ever sending an e-mail to that address there's no reason to validate. It's a waste of resources. Hell, if you're never going to use it, why even ask for it?
  • CyVan 2011-04-28 12:29
    I used this code for a site I was managing:
    http://www.dominicsayers.com/isemail

    It reduced the number of trouble calls due to invalid email addresses significantly once it was implemented. What I especially liked was the fact that it did an MX lookup on the domain to make sure it was valid as the final step. It catches all the hotmal.com, yhaoo.com, etc. typos that would pass normal validation.

    Of course they can still misspell the first part of their address, since we can't validate that, BUT we have them enter the email address twice and error if the two differ, to try and mitigate that as well :)
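
    A rough Python sketch of the same idea, for illustration only. It does not do an MX lookup; it swaps in a fuzzy comparison against a short list of well-known domains, which is a different technique. The KNOWN_DOMAINS list, the 0.8 cutoff, and the function names are all assumptions of this sketch, not part of the isemail code.

```python
import difflib

# Hypothetical list of popular mail domains; a real site would maintain its own.
KNOWN_DOMAINS = ["gmail.com", "hotmail.com", "yahoo.com"]


def suggest_domain(address: str, cutoff: float = 0.8):
    """Return a likely intended domain, or None if nothing looks like a typo."""
    _, _, domain = address.rpartition("@")
    domain = domain.lower()
    if not domain or domain in KNOWN_DOMAINS:
        return None
    matches = difflib.get_close_matches(domain, KNOWN_DOMAINS, n=1, cutoff=cutoff)
    return matches[0] if matches else None


def addresses_match(first: str, second: str) -> bool:
    """Compare the two entries from an 'enter your email twice' form."""
    return first.strip() == second.strip()


for entered in ["bob@hotmal.com", "bob@yhaoo.com", "bob@gmail.com"]:
    print(entered, "->", suggest_domain(entered))
```

    A suggestion like this is best shown as a "did you mean?" prompt rather than a hard rejection, since an unfamiliar domain may be perfectly valid.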
  • Dave 2011-04-28 13:49
    E-mail validation, like any other validation, is meant to help the user, not to slap him on the wrist when he makes a mistake.

    I think most errors in users' e-mail addresses are simple typos like alfred@einstien.com instead of alfred@einstein.com, and the mighty 100-line regex doesn't help against that.

    Also, if you need a valid e-mail address from a user, send a confirmation mail; this way you can be sure the user double-checks what he enters.

    If you don't want bogus data in your database, don't ask for data which is going to be bogus most of the time. Like if you make 'hobbies' a required field, you can bet it's going to be something like "adsf" most of the time.
  • Nagesh 2011-04-28 15:55
    Just for recording

    Guy posting image saying "First Post" is now annoying me lots.
  • Shinji 2011-04-28 16:43
    I guarantee you that I've seen that CodeSOD on a couple of sites. I have the domain elite-systems.org registered, but when I came across that I ended up having to register elitesystems.org as well and set up an alias.
  • Worf 2011-04-28 16:53
    ted:
    grzlbrmft:
    wow:
    Paul:
    The easiest, and most correct way to validate an email address is to send an email to it.


    Wow. Just, well, wow. This is possibly the most amazingly dumb thing I have seen posted in a very long time, for more reasons than I can count.


    In case you can count that far, name three, please.


    1. You can't confirm the email was received without access to the inbox.

    2. You get your sender's domain flagged on an RBL.

    3. Bandwidth waste.

    You must be a regex and Perl fan.


    There's something called "double opt-in" that's very useful if you're trying to maintain a mailing list.

    The way it works: the user subscribes (via website or email) to your mailing list. The server sends back a message, and the user simply hits reply and send. This validates the email path between the server and client (if there's an RBL in progress, the user won't get the confirmation email and the server won't bother sending emails to a blackhole'd address). This also confirms that the user WANTS the email. Perhaps the user made a typo and the email is going to someone else's account. All they have to do is ... nothing. Or hit delete. And they won't be bothered again. Hence, double opt-in. The user opted in once, and confirmed that yes, they really really really do want the email.
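
    The flow can be sketched in a few lines of Python. This is only an illustration under assumptions: the in-memory pending and confirmed containers stand in for a real database, the function names are made up, and actually mailing the token out is left to whatever sends your email.

```python
import secrets

# In-memory stand-ins for a real database (assumption for this sketch).
pending = {}       # token -> email address awaiting confirmation
confirmed = set()  # addresses that completed the second opt-in


def subscribe(email: str) -> str:
    """First opt-in: record the request and return the token to mail out."""
    token = secrets.token_urlsafe(16)
    pending[token] = email
    return token


def confirm(token: str) -> bool:
    """Second opt-in: called when the user replies or follows the link."""
    email = pending.pop(token, None)
    if email is None:
        return False  # unknown or already-used token
    confirmed.add(email)
    return True
```

    An address that never confirms simply stays out of confirmed, which is the "do nothing and you won't be bothered again" behaviour.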

    Yes, I have been "subscribed" to many email lists by those who don't check (probably spammers and the like). I write a procmail recipe to filter out those emails and delete them, because their unsubscribes don't work. Hell, I keep getting emails from British Telecom about some phone services. Never could figure out how to get access to the account so I could get some phone cards or upgrade the guy's bill or something.

    I also get the occasional joe-job with someone using a whitelist. I do whatever it takes to get that email accepted because they obviously don't know about backscatter spamming. (I've always wanted to use a public wifi and the like to send a pile of emails to those addresses with a fake header leading back to RBL honeypots to get those whitelist domains blocked...).
  • Kver 2011-04-28 23:13
    Please make the bad code go away.
  • Stefan 2011-04-29 02:31
    Even proper validators sometimes forbid '+', which pisses me off. Haven't they heard of plus addressing?
  • K 2011-04-29 02:48
    The ONLY thing you should be doing with input data is checking it for SQL injection attacks.
    No you shouldn't. As soon as you start checking input for any kind of injection attacks, you are going to end up with a terrible system.

    The way to code is to use proper escaping. (SQL can do this behind the scenes for you because the API has ways to pass statements and data as separate string arguments.)

    The only place where you need to check for injection attacks is in your unit tests. If your unit tests use all the special characters that could be used to perform attacks, and verify that they are properly escaped and unescaped, then you are well protected against injection.
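
    As a small illustration of passing statement and data separately, here is a Python sketch using the standard sqlite3 module. The table name, column name, and the hostile input string are assumptions made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (email TEXT)")

# A hostile-looking input full of characters people like to "forbid".
tricky = "o'brien+\"x\"@example.org'; DROP TABLE users; --"

# The statement and the data travel separately; no manual escaping needed.
conn.execute("INSERT INTO users (email) VALUES (?)", (tricky,))

# Round-trip check, the kind of thing a unit test would assert.
stored = conn.execute("SELECT email FROM users").fetchone()[0]
print(stored == tricky)
```

    The value comes back byte for byte, and the table is still there afterwards.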

    If OTOH you try to just forbid "invalid" characters at a higher level before saving the email address in your database, you will be rejecting perfectly valid email addresses.

    It may come as a surprise to many people, but the local part of an email address permits many characters. In fact the original spec permitted every single 7-bit ASCII character. Yes, all 128 of them, including the NUL character, which didn't even have to be escaped. Only CR, LF, ", and \ had to be escaped by putting a \ in front of them.

    The wording in the spec about valid characters was: "any one of the 128 ASCII characters (no exceptions)"

    A later update to the spec says that you should not define email addresses that require quoting. But you still have to support it for interoperability.
  • wow 2011-04-29 15:44
    This is what I was referring to, and it's very task-specific.

    If you want to confirm that the email address entered belongs to:

    1. a living person with
    2. some way to read mails sent to them and
    3. a mailbox which can accept mails (not full, etc) and
    4. isn't behind a spam filter which will eat your verification email and
    5. has some way to successfully confirm that they received the email (so can contact your server, or send a reply email, or whatever) and
    6. your outgoing mail server is working and
    7. you have some sort of extra persistent database to store the keys you're using to validate emails so you know which response is which and
    8. the time taken for mail to get from A to B and for the user to do their confirmation action is not important

    then sure, use the standard call-and-response stuff. If you just want to validate whether "blah@blah.com" is okay for possible future use, then all you have is syntax checking.

  • VxJasonxV 2011-04-29 17:05
    The real wtf is that a + is perfectly ok in e-mail addresses.
  • dtobias 2011-04-29 17:32
    If you want to prove that the address belongs to a living person, should you demand a birth certificate? Is the short form OK?
  • Bozo 2011-04-29 20:09
    ...and RFC 2822 was superseded in October 2008 by 5322.

    Cheers
  • Bozo 2011-04-29 20:21
    Bloomer:
    EZMoney:
    This works for credit card numbers too. Why validate it when you can just run it?


    Yah, credit cards are different - their rules about what is and isn't valid are simpler - and we are a little more concerned about accuracy.

    While I think of it, where does CC validation occur? On the website, or at some third-party site? (I'm guessing a bit of both, but the point is there would be little to stop you from doing no more than a basic sanity check that we only have numbers - the validity can always be tested by the bank)


    You could at least do a Luhn checksum. It works for most cards.
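
    For reference, a minimal Python sketch of the Luhn check. The function name luhn_valid and the choice to tolerate spaces and hyphens are assumptions of this sketch; passing the check only says the number is well-formed, not that the card exists.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    cleaned = number.replace(" ", "").replace("-", "")
    if not cleaned or not all(ch in "0123456789" for ch in cleaned):
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second one.
    for index, ch in enumerate(reversed(cleaned)):
        digit = int(ch)
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    return total % 10 == 0


print(luhn_valid("4111 1111 1111 1111"))
print(luhn_valid("4111 1111 1111 1112"))
```

    The first number is a widely used test card number and passes; changing the last digit makes it fail.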
  • Marnen Laibow-Koser 2011-05-02 11:44
    wow:
    Paul:
    The easiest, and most correct way to validate an email address is to send an email to it.


    Wow. Just, well, wow. This is possibly the most amazingly dumb thing I have seen posted in a very long time, for more reasons than I can count.


    It may seem dumb -- and it's certainly counterintuitive -- but it's true. RFC822 provides for a very wide range of valid syntax. The only correct e-mail address validation regexes are on the order of a page in length. Please see http://www.linuxjournal.com/article/9585 for an overview of some of the issues involved.
  • Ryan 2011-05-03 18:11
    Tears literally welled up in my eyes.
  • Ben 2011-05-03 20:33
    Most SANE people would use THIS regex to validate emails:

    http://www.ex-parrot.com/pdw/Mail-RFC822-Address.html

  • Pilum 2011-05-07 11:40
    If you want to get technical about it, this regexp should catch them all: .+@.+ (in theory it would be possible to have a mail server at a TLD :-))
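
    In Python that check is a one-liner; a minimal sketch follows, with the function name and sample addresses made up for illustration. Note how little it rules out: anything with at least one character on each side of an @ passes.

```python
import re

# At least one character, an @, and at least one more character.
PERMISSIVE = re.compile(r".+@.+")


def looks_like_address(candidate: str) -> bool:
    """Return True if the whole string has the shape something@something."""
    return PERMISSIVE.fullmatch(candidate) is not None


for sample in ["user+tag@elite-systems.org", "postmaster@ai", ".@ ", "no-at-sign"]:
    print(repr(sample), looks_like_address(sample))
```

    Only the last sample is rejected, which is roughly the point: the regex filters out obvious non-addresses and leaves the rest to a confirmation mail.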
  • Nagendra 2013-08-26 07:15
    ^[A-Za-z0-9](([_\.\-]?[a-zA-Z0-9]+)*)@([A-Za-z0-9]+)(([\.\-]?[a-zA-Z0-9]+)*)\.([A-Za-z]{2,})$
  • Sravan 2013-08-26 07:16
    Thank you so much, it's working