  • PJH (cs)
    it's a pet peeve of mine when people "validate" away perfectly valid addresses, for instance: websites that think all domains end in .com, .net, .edu, or .org; and agents that refuse to transfer mail with a + in the local-part. [...] And as I promised, here's my own RegExp for you to tear apart. (Yes, I know it doesn't handle a quoted local-part. No, I don't mind. Seriously, who does that?)
    Um - the people who might use + in the local part?
  • PJH (cs) in reply to Xeronos
    Xeronos:
    I can't tell you how many times I see my email address come in the mail from a website that supposedly never sells email addresses to third parties.
    I can count on the hands of one arm this has happened to me - and judging by the subsequent 'spam' I got, it appears to have been an 'inside job' where someone leaving the company acquired the address. All my other unique addresses have only ever been used by the companies they've been attributed to.
  • TheJasper (cs)

    The format for e-mail addresses is specified in a number of RFCs; it's a pet peeve of mine when people "validate" away perfectly valid addresses, for instance: websites that think all domains end in .com, .net, .edu, or .org; and agents that refuse to transfer mail with a + in the local-part. To that end, I wrote my own regular expression that (I believe) follows the specification, which I'll share below.

    And as I promised, here's my own RegExp for you to tear apart. (Yes, I know it doesn't handle a quoted local-part. No, I don't mind. Seriously, who does that?)

    so, you don't like it that valid addresses are invalidated, so you present a regex that follows spec... except you then admit it doesn't do everything because... well, who uses that anyway. Somewhere there is a person gnashing their teeth because their perfectly valid address isn't being validated by your code.

    btw, I don't know the exact spec, and haven't tried to figure out if your regexp follows it. I don't think my boss would appreciate the time spent ;}

  • Anonymous Tart (unregistered)

    use Mail::RFC822::Address qw(valid);

  • Dave (unregistered)

    The regexp for validating all compliant email addresses is too large to fit in this margin.

    (Seriously, it's pretty big.)

  • mol (unregistered)

    Don't tell me that you think that the ugly regex is more readable than the JavaScript version. The purpose of email validation is just to check for common errors; it makes no sense to try to validate perfectly, because that won't save you from a valid but nonexistent address (you just have to send the mail there and wait for the response).

  • Silex (cs)

    The email RFC is full of crazy ideas... Look at the RFC-compliant regex to validate an email address:

    http://www.ex-parrot.com/~pdw/Mail-RFC822-Address.html

  • Bert (unregistered)

    The regexp doesn't work with domain names starting with numbers. I think that might comply with RFC822, but try telling 3com that.

  • GeneWitch (cs)

    regex looks like garbage. is there a framework somewhere that can validate an email address?

    Don't FTP servers have constructs that allow them to verify email addresses as being valid without physically checking them on the internet?

  • Jeroen (unregistered)

    my ISP doesn't support the local part!

  • morry (cs)

    The Regexes are utterly unreadable and therefore unmaintainable. I'd hate to have to fix one of those monsters.

    Recently I overheard a colleague's phone conversation. He was babbling on about the email validation not being tight enough. "After the period, it should check for exactly 3 characters. You know: .com .org .net. But it shouldn't just check for those three; we don't want to limit ourselves if they come up with more TLDs. I'll raise a low priority defect for that after the call."

    So I shot him off an email (not wanting to interrupt his call) giving him some examples of .info and .co.uk email addresses. I didn't have the heart to show him the RFC.

  • TheD (unregistered) in reply to PJH
    PJH:
    I can count on the hands of one arm...

    For some reason, this was hilarious to me. Maybe I need more coffee?

  • LizardKing (cs)

    Hmm, email address validation is a nasty one. I remember trying to validate by doing a lookup on the hostname portion, only to get scuppered by mail servers that don't resolve but are valid. I forget the details as this was many aeons ago; however, a more experienced colleague pointed me at some RFCs (and would probably have submitted my code as a WTF if this site had been around).

  • Bill (unregistered) in reply to morry
    morry:
    The Regexes are utterly unreadable and therefore unmaintainable. I'd hate to have to fix one of those monsters.

    I read a quote somewhere that described regex as a "write once, read never" syntax. It's the poster child for the difference between "Clever" and "Wise".

    Captcha: tesla - good scientist, BAAAD band...

  • Tukaro (cs)

    Er... I use a much simpler check than most of you do; perhaps it doesn't cover everything, but this is an internal thing, so it doesn't need to.

    /^([a-zA-Z0-9_.-])+@(([a-zA-Z0-9-])+\.)+([a-zA-Z0-9]{2,4})+$/
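
For anyone wondering what a check like this lets through, here is a small Python sketch. It assumes the dot before the final label is meant to be a literal dot (written `\.` below), and the names `INTERNAL_CHECK` and `passes` are made up for the illustration.

```python
import re

# Python transcription of the simple pattern, assuming the dot before the
# final label is a literal dot. INTERNAL_CHECK is a hypothetical name.
INTERNAL_CHECK = re.compile(
    r"^([a-zA-Z0-9_.-])+@(([a-zA-Z0-9-])+\.)+([a-zA-Z0-9]{2,4})+$"
)


def passes(address):
    """Return True if the address matches the simple internal pattern."""
    return INTERNAL_CHECK.match(address) is not None


if __name__ == "__main__":
    samples = [
        "bob@example.com",       # plain address
        "bob@3com.com",          # domain label starting with a digit
        "bob+news@example.com",  # '+' is not in the local-part character class
        "o'brien@example.com",   # apostrophe is not in the class either
        "bob@example.museum",    # long final label
    ]
    for address in samples:
        verdict = "accepted" if passes(address) else "rejected"
        print(f"{address}: {verdict}")
```

Note that the `+` after `{2,4}` lets the final group repeat, so a final label longer than four characters can still match as several chunks; the length limit is looser than it looks.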

  • stevekj (cs) in reply to Steve
    Steve:
    Ooooh, ooh! I got it!

    The WTF (apart from this stupid comment box I'm typing in being only 20x2 characters) is that he thinks the 'at' sign is called an ampersand.

    I don't think that's it. I think he's using "amp" as a short form for "ampersat", which is indeed a more or less valid reference to "@". The real WTF is that no one besides this particular coder knows what an "ampersat" is.

    The other real WTF is that you can also refer to "@" as an "asperand". WTF?

    In a Google battle between "asperand" and "ampersat", "ampersat" comes out slightly ahead - but both are practically undefined, by Google standards, at just under 3k references each. So using either one in code that is going to be maintained by someone else is definitely a WTF.

    No, the real real WTF here is shortening "address" to "add"... that's clarity right there.

  • TSK (unregistered)

    To add insult to injury, there may in future be e-mail addresses on IDNs (Internationalized Domain Names) with umlauts, ogoneks, cedillas and such stuff... they will be transformed by nameprep and punycode into RFC addresses before use, but that won't help in validating them...

    On regexps: they violate the good old KISS principle ("Writing Solid Code"). They are hard to read (both visually and mentally), they cannot be properly commented (if you have qualms about spreading both comment and regexp over the page), AND they are fragile (you know what I mean if you have ever accidentally typed one more char than necessary)... sometimes it breaks, sometimes not. I think some pride is involved in being able to set up a mighty "all-cases-in-one" regexp, but for maintenance the long monsters are garbage. It's not so much fun, but keep the style boring; write so that the reader knows five pages beforehand what is going to happen. In this case, break the mail address into parts and verify them individually (with short regexps, yes) and comment what you are doing. You will be pleased when you are forced, under time pressure, to rewrite old routines that you haven't seen in a year.
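
A rough Python sketch of the "break it into parts" approach might look like the following. It is deliberately not RFC-complete (no quoted local parts, no address literals, no IDNs), and the names `LOCAL_ATOM`, `DOMAIN_LABEL` and `check_address` are invented for the example.

```python
import re

# One dot-separated piece of an unquoted local part (hypothetical name).
LOCAL_ATOM = re.compile(r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]+")
# One domain label: letters, digits and inner hyphens, at most 63 characters.
DOMAIN_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def check_address(address):
    """Return a list of problems found; an empty list means none were spotted."""
    # Split on the last '@' so the domain is whatever follows it.
    local, at_sign, domain = address.rpartition("@")
    if not at_sign:
        return ["missing @"]

    problems = []

    # Local part: dot-separated atoms, none of them empty.
    if not local:
        problems.append("empty local part")
    elif any(not LOCAL_ATOM.fullmatch(atom) for atom in local.split(".")):
        problems.append("bad local part")

    # Domain: at least two labels, each individually well formed.
    labels = domain.split(".")
    if len(labels) < 2:
        problems.append("domain has no dot")
    if any(not DOMAIN_LABEL.fullmatch(label) for label in labels):
        problems.append("bad domain label")

    return problems


if __name__ == "__main__":
    for candidate in ["bob+news@3com.com", "bob..smith@example.com", "bob@example"]:
        print(candidate, "->", check_address(candidate) or "looks plausible")
```

Each rule sits on its own line with its own comment, so a failure can name the part that was wrong instead of just returning false.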

  • Otto (cs) in reply to Buzz
    Buzz:
    Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems. Jamie Zawinski

    Some people, when confronted with regular expressions, like to quote jwz. Now they are fools who cannot cope with regular expressions.

  • another moron (unregistered)

    regex sucks. they are a real solo trip, to be trotted out after 15 cups of coffee and a similar number of cigarettes. maintainability? you're joking... just start again!

  • skington (cs) in reply to TSK
    TSK:
    On regexps: they violate the good old KISS principle ("Writing Solid Code"). They are hard to read (both visually and mentally), they cannot be properly commented (if you have qualms about spreading both comment and regexp over the page), AND they are fragile (you know what I mean if you have ever accidentally typed one more char than necessary)... sometimes it breaks, sometimes not.

    Perl has allowed comments and non-meaningful white space in regexes since 1998. That you can write regexes that look like line noise doesn't mean you have to.

    That big monster of a regex that validates RFC822 email addresses exists because, if you're going to validate email addresses, you may as well validate them properly - by dint of building up a regex bit by bit, and then, once you're happy it works, compiling it down to one big long humungous lump of code for performance, so other people can just say "use Email::Valid" or whatever other Perl modules use it. In the same way that ages ago people used to shorten variables and eliminate white space to fit more code into 32K, or however much RAM their machine had at the time. It doesn't mean you develop that way.
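
The point about comments and insignificant white space is not specific to Perl. As an illustration in Python (a language choice made for this example, not something from the thread), the `re.VERBOSE` flag gives the same effect. The pattern below is a deliberately simple, non-RFC-complete one, and `SIMPLE_ADDRESS` is a made-up name.

```python
import re

# A small commented pattern; white space and '#' comments inside the
# pattern are ignored because of re.VERBOSE.
SIMPLE_ADDRESS = re.compile(
    r"""
    [A-Za-z0-9_.+-]+          # local part: letters, digits, and _ . + -
    @                         # the separator
    (?: [A-Za-z0-9-]+ \. )+   # one or more domain labels, each followed by a dot
    [A-Za-z]{2,}              # final label: at least two letters
    """,
    re.VERBOSE,
)

if __name__ == "__main__":
    for candidate in ["bob+news@mail.example.org", "bob@localhost"]:
        matched = SIMPLE_ADDRESS.fullmatch(candidate) is not None
        print(candidate, "->", "match" if matched else "no match")
```

The compiled pattern behaves the same as its one-line form; only the source is easier to read and to change.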

  • Buzz (unregistered) in reply to Otto

    Some people, when defending regular expressions, like to show their arrogance because they know how to write obscure and usually unmaintainable code.

  • Ölbaum (unregistered)

    I own domain ölbaum.ch. Isn't there an RFC that allows it in an e-mail address? Then most of these regexps would have to be rewritten. Lucky no e-mail client (that I know of) supports IDNs.
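
One way to cope with such domains is to convert them to their ASCII form before running any ASCII-only check. A minimal sketch, assuming Python's built-in `idna` codec (which implements the older IDNA 2003 rules) is good enough for the case at hand; `to_ascii_domain` is a hypothetical helper name.

```python
def to_ascii_domain(domain):
    """Convert an internationalized domain name to its ASCII (xn--) form."""
    # The built-in "idna" codec applies nameprep and punycode label by label.
    return domain.encode("idna").decode("ascii")


if __name__ == "__main__":
    ascii_form = to_ascii_domain("ölbaum.ch")
    print(ascii_form)
    # Decoding with the same codec gives the Unicode form back.
    print(ascii_form.encode("ascii").decode("idna"))
```

After the conversion, an ASCII-only domain pattern can be applied to the result rather than being rewritten to accept umlauts directly.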

  • imMute (unregistered) in reply to Bill
    Bill:
    morry:
    The Regexes are utterly unreadable and therefore unmaintainable. I'd hate to have to fix one of those monsters.

    I read a quote somewhere that described regex as a "write once, read never" syntax. It's the poster child for the difference between "Clever" and "Wise".

    That regex was not written by a human; it was probably compiled using Parser::RecDescent or some other module.
  • Chris (unregistered) in reply to TSK

    When I've had to use fairly hairy regexes, they are always iteratively designed. I start simple, and add to it until it does what I need.

    The trick is I have each iteration as a comment beforehand.

    This lets me see what I've done, documents the limits of the regex, and lets me dig in and make changes without having to completely start over.

    Sure, that chunk of the code will have a large amount of comments relative to other places... but when you've got a tough chunk of code, isn't that a good thing?

  • Bill (unregistered) in reply to imMute
    imMute:
    Bill:
    morry:
    The Regexes are utterly unreadable and therefore unmaintainable. I'd hate to have to fix one of those monsters.

    I read a quote somewhere that described regex as a "write once, read never" syntax. It's the poster child for the difference between "Clever" and "Wise".

    That regex was not written by a human; it was probably compiled using Parser::RecDescent or some other module.

    Possibly, but it matters not. The fact remains that it's unmaintainable as-is. Just because the metadata that "documents" it might be maintained elsewhere, such as in a tool, doesn't mitigate the fact that no one reading the source can be sure of what it does. Also, if the tool were worth a damn, it would also give you comments to embed along with the regex.

    Hopefully this WAS simply the output of a builder class, where the method calls used to build it provide adequate documentation. But based on the OP, I doubt it.

    Captcha: tacos - with that suggestion, I'm off to lunch

  • Josh (unregistered)

    The real WTF is your comments section not word-wrapping. :P

    Maybe you should consider an HTML class?

  • Kalle (unregistered)

    The easiest and most likely to succeed way to validate an address is to establish an SMTP session to the primary MX of the domain and do an RCPT. If the address is invalid, either you cannot establish a connection or the SMTP server returns an error. Easy :)

    [And yes, I do know that the Internet mail doesn't work like that any more, more is the pity.]
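
For the curious, the RCPT probe described above can be sketched with Python's standard `smtplib`. This is only an illustration: finding the primary MX is not covered (the standard library has no MX lookup, so `mx_host` must come from elsewhere), the default sender is a placeholder, `rcpt_is_accepted` is a made-up name, and, as the caveat above says, many servers accept every recipient or refuse such probes.

```python
import smtplib


def rcpt_is_accepted(mx_host, address,
                     sender="postmaster@example.invalid",
                     smtp_factory=smtplib.SMTP):
    """Ask mx_host whether it would accept mail for address; nothing is sent."""
    try:
        server = smtp_factory(mx_host, 25, timeout=10)
    except OSError:
        return False  # could not connect at all

    try:
        server.helo()
        server.mail(sender)
        code, _reply = server.rcpt(address)
        # A 2xx reply means the server says it would take the message.
        return 200 <= code < 300
    except smtplib.SMTPException:
        return False
    finally:
        try:
            server.quit()
        except (smtplib.SMTPException, OSError):
            pass  # the connection may already be gone
```

The `smtp_factory` parameter exists only so the function can be exercised with a stand-in object instead of a live mail server.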
