The Daily WTF: Curious Perversions in Information Technology

2007-02-19

I use the + quite a bit to separate the wheat from the chaff. If I'm forced to sign up for something on a website and I have to get the unlock through a valid email address, I just slap the name of the website I used the email address on ([email protected]) and watch as they either ignore or honor their privacy policy. I can't tell you how many times I see my email address come in the mail from a website that supposedly never sells email addresses to third parties.

captcha: burned (indeed, indeed)
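
For anyone who wants to automate the trick above, here is a minimal sketch. The function names `tag_address` and `extract_tag` are made up for illustration, and it assumes a plain local@domain address whose mail provider actually delivers +tagged mail, which not every provider does.

```python
from typing import Optional


def tag_address(address: str, tag: str) -> str:
    """Insert +tag before the @ of a plain local@domain address."""
    local, _, domain = address.rpartition("@")
    return f"{local}+{tag}@{domain}"


def extract_tag(address: str) -> Optional[str]:
    """Return the text after the first + in the local part, or None."""
    local, _, _ = address.rpartition("@")
    _, plus, tag = local.partition("+")
    return tag if plus else None
```

When mail arrives from an unexpected sender, `extract_tag` on the recipient address tells you which sign-up the address came from.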

PJH · 2007-02-19

it's a pet peeve of mine when people "validate" away perfectly valid addresses, for instance: websites that think all domains end in .com, .net, .edu, or .org; and agents that refuse to transfer mail with a + in the local-part. [...] And as I promised, here's my own RegExp for you to tear apart. (Yes, I know it doesn't handle a quoted local-part. No, I don't mind. Seriously, who does that?)

Um - the people who might use + in the local part?

PJH · 2007-02-19

Xeronos:
I can't tell you how many times I see my email address come in the mail from a website that supposedly never sells email addresses to third parties.

I can count on the hands of one arm this has happened to me - and judging by the subsequent 'spam' I got, it appears to have been an 'inside job' where someone leaving the company acquired the address. All my other unique addresses have only ever been used by the companies they've been attributed to.

TheJasper · 2007-02-19

The format for e-mail addresses is specified in a number of RFCs; it's a pet peeve of mine when people "validate" away perfectly valid addresses, for instance: websites that think all domains end in .com, .net, .edu, or .org; and agents that refuse to transfer mail with a + in the local-part. To that end, I wrote my own regular expression that (I believe) follows the specification, which I'll share below.

And as I promised, here's my own RegExp for you to tear apart. (Yes, I know it doesn't handle a quoted local-part. No, I don't mind. Seriously, who does that?)

so, you don't like it that valid addresses are invalidated, so you present a regex that follows spec...except you then admit it doesn't do everything because...well who uses that anyway. Somewhere there is a person gnashing their teeth because their perfectly valid address isn't being validated by your code.

btw, I don't know the exact spec, and haven't tried to figure out if your regexp follows it. I don't think my boss would appreciate the time spent ;}

2007-02-19

Whitespace is allowed in email addresses, as are constructs like:

"Moose Brains !!!" @ (yes, this is my address) spam.la <MooseBrains>

which both would fall over on.

2007-02-19

use Mail::RFC822::Address qw(valid)

2007-02-19

Ooooh, ooh! I got it!

The WTF (apart from this stupid comment box I'm typing in being only 20x2 characters) is that he thinks the 'at' sign is called an ampersand.

Can't see anything else wrong with it. It validates my email address just fine: "@/.

Steve

PS. That RegEx would fail on email addresses that use an IP address instead of a FQ name.

2007-02-19

Your regexp doesn't support valid addresses such as billg@[131.107.115.212]
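
A rough sketch of how the bracketed form could be recognised without a regex, using Python's standard `ipaddress` module. The helper name `is_ipv4_domain_literal` is hypothetical, and the sketch only covers IPv4 literals, not the other domain-literal forms the RFCs allow.

```python
import ipaddress


def is_ipv4_domain_literal(domain: str) -> bool:
    """True if domain looks like [a.b.c.d] with a valid IPv4 address inside."""
    if not (domain.startswith("[") and domain.endswith("]")):
        return False
    try:
        # IPv4Address raises a ValueError subclass on malformed input.
        ipaddress.IPv4Address(domain[1:-1])
    except ValueError:
        return False
    return True
```

Called on the part after the @ of the address above, `is_ipv4_domain_literal("[131.107.115.212]")` accepts it, while an out-of-range octet is rejected.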

2007-02-19

The regexp for validating all compliant email addresses is too large to fit in this margin.

(Seriously, it's pretty big.)

2007-02-19

Don't tell me that you think that the ugly regex is more readable than the javascript version. The purpose of email validation is just to check for common errors; it makes no sense to try to validate perfectly, because that won't save you from valid but nonexistent addresses (you just have to send the mail there and wait for the response).

2007-02-19

^((\"[^\"\f\n\r\t\v\b]+\")|([\w\!\#\$\%\&\'\*\+\-\~\/\^\`\|\{\}]+(\.[\w\!\#\$\%\&\'\*\+\-\~\/\^\`\|\{\}]+)*))@((\[(((25[0-5])|(2[0-4][0-9])|([0-1]?[0-9]?[0-9]))\.((25[0-5])|(2[0-4][0-9])|([0-1]?[0-9]?[0-9]))\.((25[0-5])|(2[0-4][0-9])|([0-1]?[0-9]?[0-9]))\.((25[0-5])|(2[0-4][0-9])|([0-1]?[0-9]?[0-9])))\])|(((25[0-5])|(2[0-4][0-9])|([0-1]?[0-9]?[0-9]))\.((25[0-5])|(2[0-4][0-9])|([0-1]?[0-9]?[0-9]))\.((25[0-5])|(2[0-4][0-9])|([0-1]?[0-9]?[0-9]))\.((25[0-5])|(2[0-4][0-9])|([0-1]?[0-9]?[0-9])))|((([A-Za-z0-9\-])+\.)+[A-Za-z\-]+))$

From Regexlib.com

No?

2007-02-19

This one is much more fun:

(?:(?:\r\n)?[ \t])(?:(?:(?:[^()<>@,;:\".[] \x00-\x1F]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[["()<>@,;:\".[]]))|"(?:[^"\r\]|\.|(?:(?:\r\n)?[ \t]))"(?:(?:\r\n)?[ \t]))(?:.(?:(?:\r\n)?[ \t])(?:[^()<>@,;:\".[] \x00-\x1F]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[["()<>@,;:\".[]]))|"(?:[^"\r\]|\.|(?:(?:\r\n)?[ \t]))"(?:(?:\r\n)?[ \t])))@(?:(?:\r\n)?[ \t])(?:[^()<>@,;:\".[] \x00-\x1F]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[["()<>@,;:\".[]]))|[([^[]\r\]|\.)](?:(?:\r\n)?[ \t]))(?:.(?:(?:\r\n)?[ \t])(?:[^()<>@,;:\".[] \x00-\x1F]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[["()<>@,;:\".[]]))|[([^[]\r\]|\.)](?:(?:\r\n)?[ \t])))|(?:[^()<>@,;:\".[] \x00-\x1F]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[["()<>@,;:\".[]]))|"(?:[^"\r\]|\.|(?:(?:\r\n)?[ \t]))"(?:(?:\r\n)?[ \t]))<(?:(?:\r\n)?[ \t])(?:@(?:[^()<>@,;:\".[] \x00-\x1F]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[["()<>@,;:\".[]]))|[([^[]\r\]|\.)](?:(?:\r\n)?[ \t]))(?:.(?:(?:\r\n)?[ \t])(?:[^()<>@,;:\".[] \x00-\x1F]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[["()<>@,;:\".[]]))|[([^[]\r\]|\.)](?:(?:\r\n)?[ \t])))(?:,@(?:(?:\r\n)?[ \t])(?:[^()<>@,;:\".[] \x00-\x1F]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[["()<>@,;:\".[]]))|[([^[]\r\]|\.)](?:(?:\r\n)?[ \t]))(?:.(?:(?:\r\n)?[ \t])(?:[^()<>@,;:\".[] \x00-\x1F]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[["()<>@,;:\".[]]))|[([^[]\r\]|\.)](?:(?:\r\n)?[ \t])))):(?:(?:\r\n)?[ \t]))?(?:[^()<>@,;:\".[] \x00-\x1F]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[["()<>@,;:\".[]]))|"(?:[^"\r\]|\.|(?:(?:\r\n)?[ \t]))"(?:(?:\r\n)?[ \t]))(?:.(?:(?:\r\n)?[ \t])(?:[^()<>@,;:\".[] \x00-\x1F]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[["()<>@,;:\".[]]))|"(?:[^"\r\]|\.|(?:(?:\r\n)?[ \t]))"(?:(?:\r\n)?[ \t])))@(?:(?:\r\n)?[ \t])(?:[^()<>@,;:\".[] \x00-\x1F]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[["()<>@,;:\".[]]))|[([^[]\r\]|\.)](?:(?:\r\n)?[ \t]))(?:.(?:(?:\r\n)?[ \t])(?:[^()<>@,;:\".[] \x00-\x1F]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[["()<>@,;:\".[]]))|[([^[]\r\]|\.)](?:(?:\r\n)?[ \t])))>(?:(?:\r\n)?[ \t]))|(?:[^()<>@,;:\".[] \x00-\x1F]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[["()<>@,;:\".[]]))|"(?:[^"\r\]|\.|(?:(?:\r\n)?[ \t]))"(?:(?:\r\n)?[ 
\t])):(?:(?:\r\n)?[ \t])(?:(?:(?:[^()<>@,;:\".[] \x00-\x1F]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[["()<>@,;:\".[]]))|"(?:[^"\r\]|\.|(?:(?:\r\n)?[ \t]))"(?:(?:\r\n)?[ \t]))(?:.(?:(?:\r\n)?[ \t])(?:[^()<>@,;:\".[] \x00-\x1F]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[["()<>@,;:\".[]]))|"(?:[^"\r\]|\.|(?:(?:\r\n)?[ \t]))"(?:(?:\r\n)?[ \t])))@(?:(?:\r\n)?[ \t])(?:[^()<>@,;:\".[] \x00-\x1F]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[["()<>@,;:\".[]]))|[([^[]\r\]|\.)](?:(?:\r\n)?[ \t]))(?:.(?:(?:\r\n)?[ \t])(?:[^()<>@,;:\".[] \x00-\x1F]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[["()<>@,;:\".[]]))|[([^[]\r\]|\.)](?:(?:\r\n)?[ \t])))|(?:[^()<>@,;:\".[] \x00-\x1F]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[["()<>@,;:\".[]]))|"(?:[^"\r\]|\.|(?:(?:\r\n)?[ \t]))"(?:(?:\r\n)?[ \t]))<(?:(?:\r\n)?[ \t])(?:@(?:[^()<>@,;:\".[] \x00-\x1F]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[["()<>@,;:\".[]]))|[([^[]\r\]|\.)](?:(?:\r\n)?[ \t]))(?:.(?:(?:\r\n)?[ \t])(?:[^()<>@,;:\".[] \x00-\x1F]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[["()<>@,;:\".[]]))|[([^[]\r\]|\.)](?:(?:\r\n)?[ \t])))(?:,@(?:(?:\r\n)?[ \t])(?:[^()<>@,;:\".[] \x00-\x1F]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[["()<>@,;:\".[]]))|[([^[]\r\]|\.)](?:(?:\r\n)?[ \t]))(?:.(?:(?:\r\n)?[ \t])(?:[^()<>@,;:\".[] \x00-\x1F]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[["()<>@,;:\".[]]))|[([^[]\r\]|\.)](?:(?:\r\n)?[ \t])))):(?:(?:\r\n)?[ \t]))?(?:[^()<>@,;:\".[] \x00-\x1F]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[["()<>@,;:\".[]]))|"(?:[^"\r\]|\.|(?:(?:\r\n)?[ \t]))"(?:(?:\r\n)?[ \t]))(?:.(?:(?:\r\n)?[ \t])(?:[^()<>@,;:\".[] \x00-\x1F]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[["()<>@,;:\".[]]))|"(?:[^"\r\]|\.|(?:(?:\r\n)?[ \t]))"(?:(?:\r\n)?[ \t])))@(?:(?:\r\n)?[ \t])(?:[^()<>@,;:\".[] \x00-\x1F]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[["()<>@,;:\".[]]))|[([^[]\r\]|\.)](?:(?:\r\n)?[ \t]))(?:.(?:(?:\r\n)?[ \t])(?:[^()<>@,;:\".[] \x00-\x1F]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[["()<>@,;:\".[]]))|[([^[]\r\]|\.)](?:(?:\r\n)?[ \t])))>(?:(?:\r\n)?[ \t]))(?:,\s(?:(?:[^()<>@,;:\".[] \x00-\x1F]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[["()<>@,;:\".[]]))|"(?:[^"\r\]|\.|(?:(?:\r\n)?[ 
\t]))"(?:(?:\r\n)?[ \t]))(?:.(?:(?:\r\n)?[ \t])(?:[^()<>@,;:\".[] \x00-\x1F]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[["()<>@,;:\".[]]))|"(?:[^"\r\]|\.|(?:(?:\r\n)?[ \t]))"(?:(?:\r\n)?[ \t])))@(?:(?:\r\n)?[ \t])(?:[^()<>@,;:\".[] \x00-\x1F]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[["()<>@,;:\".[]]))|[([^[]\r\]|\.)](?:(?:\r\n)?[ \t]))(?:.(?:(?:\r\n)?[ \t])(?:[^()<>@,;:\".[] \x00-\x1F]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[["()<>@,;:\".[]]))|[([^[]\r\]|\.)](?:(?:\r\n)?[ \t])))|(?:[^()<>@,;:\".[] \x00-\x1F]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[["()<>@,;:\".[]]))|"(?:[^"\r\]|\.|(?:(?:\r\n)?[ \t]))"(?:(?:\r\n)?[ \t]))<(?:(?:\r\n)?[ \t])(?:@(?:[^()<>@,;:\".[] \x00-\x1F]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[["()<>@,;:\".[]]))|[([^[]\r\]|\.)](?:(?:\r\n)?[ \t]))(?:.(?:(?:\r\n)?[ \t])(?:[^()<>@,;:\".[] \x00-\x1F]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[["()<>@,;:\".[]]))|[([^[]\r\]|\.)](?:(?:\r\n)?[ \t])))(?:,@(?:(?:\r\n)?[ \t])(?:[^()<>@,;:\".[] \x00-\x1F]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[["()<>@,;:\".[]]))|[([^[]\r\]|\.)](?:(?:\r\n)?[ \t]))(?:.(?:(?:\r\n)?[ \t])(?:[^()<>@,;:\".[] \x00-\x1F]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[["()<>@,;:\".[]]))|[([^[]\r\]|\.)](?:(?:\r\n)?[ \t])))):(?:(?:\r\n)?[ \t]))?(?:[^()<>@,;:\".[] \x00-\x1F]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[["()<>@,;:\".[]]))|"(?:[^"\r\]|\.|(?:(?:\r\n)?[ \t]))"(?:(?:\r\n)?[ \t]))(?:.(?:(?:\r\n)?[ \t])(?:[^()<>@,;:\".[] \x00-\x1F]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[["()<>@,;:\".[]]))|"(?:[^"\r\]|\.|(?:(?:\r\n)?[ \t]))"(?:(?:\r\n)?[ \t])))@(?:(?:\r\n)?[ \t])(?:[^()<>@,;:\".[] \x00-\x1F]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[["()<>@,;:\".[]]))|[([^[]\r\]|\.)](?:(?:\r\n)?[ \t]))(?:.(?:(?:\r\n)?[ \t])(?:[^()<>@,;:\".[] \x00-\x1F]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[["()<>@,;:\".[]]))|[([^[]\r\]|\.)](?:(?:\r\n)?[ \t])))>(?:(?:\r\n)?[ \t]))))?;\s)

captcha: tastey (yum yum- if only it was spelt correctly!)

Silex · 2007-02-19

The email rfc is full of crazy ideas... Look at the RFC compliant regex to validate an email :

http://www.ex-parrot.com/~pdw/Mail-RFC822-Address.html

2007-02-19

And check this one

2007-02-19

Drat. I mean this:

And check this one: http://www.ex-parrot.com/~pdw/Mail-RFC822-Address.html

2007-02-19

The regexp doesn't work with domain names starting with numbers. I think that might comply with RFC822, but try telling 3com that.

2007-02-19

The regular expression for Mail::RFC822::Address is horrendous - and it doesn't even handle comments. If you squint at the bottom half of it, you can see a picture of a donkey.

2007-02-19

Crap, I give up - either copy/paste, or simply believe me that this regex is on that page:

(?:(?:\r\n)?[ \t])(?:(?:(?:[^()<>@,;:\".[] \000-\031]+(?:(?:(?:\r\n)?[ \t] )+|\Z|(?=[["()<>@,;:\".[]]))|"(?:[^"\r\]|\.|(?:(?:\r\n)?[ \t]))"(?:(?: \r\n)?[ \t]))(?:.(?:(?:\r\n)?[ \t])(?:[^()<>@,;:\".[] \000-\031]+(?:(?:( ?:\r\n)?[ \t])+|\Z|(?=[["()<>@,;:\".[]]))|"(?:[^"\r\]|\.|(?:(?:\r\n)?[ \t]))"(?:(?:\r\n)?[ \t])))@(?:(?:\r\n)?[ \t])(?:[^()<>@,;:\".[] \000-\0 31]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[["()<>@,;:\".[]]))|[([^[]\r\]|\.)
](?:(?:\r\n)?[ \t]))(?:.(?:(?:\r\n)?[ \t])(?:[^()<>@,;:\".[] \000-\031]+ (?:(?:(?:\r\n)?[ \t])+|\Z|(?=[["()<>@,;:\".[]]))|[([^[]\r\]|\.)](?: (?:\r\n)?[ \t])))|(?:[^()<>@,;:\".[] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z |(?=[["()<>@,;:\".[]]))|"(?:[^"\r\]|\.|(?:(?:\r\n)?[ \t]))"(?:(?:\r\n) ?[ \t]))<(?:(?:\r\n)?[ \t])(?:@(?:[^()<>@,;:\".[] \000-\031]+(?:(?:(?:
r\n)?[ \t])+|\Z|(?=[["()<>@,;:\".[]]))|[([^[]\r\]|\.)](?:(?:\r\n)?[ \t]))(?:.(?:(?:\r\n)?[ \t])(?:[^()<>@,;:\".[] \000-\031]+(?:(?:(?:\r\n) ?[ \t])+|\Z|(?=[["()<>@,;:\".[]]))|[([^[]\r\]|\.)](?:(?:\r\n)?[ \t] )))(?:,@(?:(?:\r\n)?[ \t])(?:[^()<>@,;:\".[] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[["()<>@,;:\".[]]))|[([^[]\r\]|\.)](?:(?:\r\n)?[ \t])* )(?:.(?:(?:\r\n)?[ \t])(?:[^()<>@,;:\".[] \000-\031]+(?:(?:(?:\r\n)?[ \t] )+|\Z|(?=[["()<>@,;:\".[]]))|[([^[]\r\]|\.)](?:(?:\r\n)?[ \t])))) :(?:(?:\r\n)?[ \t]))?(?:[^()<>@,;:\".[] \000-\031]+(?:(?:(?:\r\n)?[ \t])+ |\Z|(?=[["()<>@,;:\".[]]))|"(?:[^"\r\]|\.|(?:(?:\r\n)?[ \t]))"(?:(?:\r \n)?[ \t]))(?:.(?:(?:\r\n)?[ \t])(?:[^()<>@,;:\".[] \000-\031]+(?:(?:(?: \r\n)?[ \t])+|\Z|(?=[["()<>@,;:\".[]]))|"(?:[^"\r\]|\.|(?:(?:\r\n)?[ \t ]))"(?:(?:\r\n)?[ \t])))@(?:(?:\r\n)?[ \t])(?:[^()<>@,;:\".[] \000-\031 ]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[["()<>@,;:\".[]]))|[([^[]\r\]|\.)]( ?:(?:\r\n)?[ \t]))(?:.(?:(?:\r\n)?[ \t])(?:[^()<>@,;:\".[] \000-\031]+(? :(?:(?:\r\n)?[ \t])+|\Z|(?=[["()<>@,;:\".[]]))|[([^[]\r\]|\.)](?:(? :\r\n)?[ \t])))>(?:(?:\r\n)?[ \t]))|(?:[^()<>@,;:\".[] \000-\031]+(?:(? :(?:\r\n)?[ \t])+|\Z|(?=[["()<>@,;:\".[]]))|"(?:[^"\r\]|\.|(?:(?:\r\n)? [ \t]))"(?:(?:\r\n)?[ \t])):(?:(?:\r\n)?[ \t])(?:(?:(?:[^()<>@,;:\".[] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[["()<>@,;:\".[]]))|"(?:[^"\r\]| \.|(?:(?:\r\n)?[ \t]))"(?:(?:\r\n)?[ \t]))(?:.(?:(?:\r\n)?[ \t])(?:[^()<> @,;:\".[] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[["()<>@,;:\".[]]))|" (?:[^"\r\]|\.|(?:(?:\r\n)?[ \t]))"(?:(?:\r\n)?[ \t])))@(?:(?:\r\n)?[ \t] )(?:[^()<>@,;:\".[] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[["()<>@,;:\ ".[]]))|[([^[]\r\]|\.)](?:(?:\r\n)?[ \t]))(?:.(?:(?:\r\n)?[ \t])(? 
:[^()<>@,;:\".[] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[["()<>@,;:\".[ ]]))|[([^[]\r\]|\.)](?:(?:\r\n)?[ \t])))|(?:[^()<>@,;:\".[] \000- \031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[["()<>@,;:\".[]]))|"(?:[^"\r\]|\.|( ?:(?:\r\n)?[ \t]))"(?:(?:\r\n)?[ \t]))<(?:(?:\r\n)?[ \t])(?:@(?:[^()<>@,; :\".[] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[["()<>@,;:\".[]]))|[([ ^[]\r\]|\.)](?:(?:\r\n)?[ \t]))(?:.(?:(?:\r\n)?[ \t])(?:[^()<>@,;:\" .[] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[["()<>@,;:\".[]]))|[([^[
]\r\]|\.)](?:(?:\r\n)?[ \t])))(?:,@(?:(?:\r\n)?[ \t])(?:[^()<>@,;:\".
[] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[["()<>@,;:\".[]]))|[([^[]
r\]|\.)](?:(?:\r\n)?[ \t]))(?:.(?:(?:\r\n)?[ \t])(?:[^()<>@,;:\".[] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[["()<>@,;:\".[]]))|[([^[]\r\] |\.)](?:(?:\r\n)?[ \t])))):(?:(?:\r\n)?[ \t]))?(?:[^()<>@,;:\".[] \0 00-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[["()<>@,;:\".[]]))|"(?:[^"\r\]|\ .|(?:(?:\r\n)?[ \t]))"(?:(?:\r\n)?[ \t]))(?:.(?:(?:\r\n)?[ \t])(?:[^()<>@, ;:\".[] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[["()<>@,;:\".[]]))|"(? :[^"\r\]|\.|(?:(?:\r\n)?[ \t]))"(?:(?:\r\n)?[ \t])))@(?:(?:\r\n)?[ \t]) (?:[^()<>@,;:\".[] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[["()<>@,;:\". []]))|[([^[]\r\]|\.)](?:(?:\r\n)?[ \t]))(?:.(?:(?:\r\n)?[ \t])(?:[ ^()<>@,;:\".[] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[["()<>@,;:\".[] ]))|[([^[]\r\]|\.)](?:(?:\r\n)?[ \t])))>(?:(?:\r\n)?[ \t]))(?:,\s( ?:(?:[^()<>@,;:\".[] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[["()<>@,;:\ ".[]]))|"(?:[^"\r\]|\.|(?:(?:\r\n)?[ \t]))"(?:(?:\r\n)?[ \t]))(?:.(?:( ?:\r\n)?[ \t])(?:[^()<>@,;:\".[] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[ ["()<>@,;:\".[]]))|"(?:[^"\r\]|\.|(?:(?:\r\n)?[ \t]))"(?:(?:\r\n)?[ \t ])))@(?:(?:\r\n)?[ \t])(?:[^()<>@,;:\".[] \000-\031]+(?:(?:(?:\r\n)?[ \t ])+|\Z|(?=[["()<>@,;:\".[]]))|[([^[]\r\]|\.)](?:(?:\r\n)?[ \t]))(? :.(?:(?:\r\n)?[ \t])(?:[^()<>@,;:\".[] \000-\031]+(?:(?:(?:\r\n)?[ \t])+| \Z|(?=[["()<>@,;:\".[]]))|[([^[]\r\]|\.)](?:(?:\r\n)?[ \t])))|(?: [^()<>@,;:\".[] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[["()<>@,;:\".[
]]))|"(?:[^"\r\]|\.|(?:(?:\r\n)?[ \t]))"(?:(?:\r\n)?[ \t]))<(?:(?:\r\n) ?[ \t])(?:@(?:[^()<>@,;:\".[] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[[" ()<>@,;:\".[]]))|[([^[]\r\]|\.)](?:(?:\r\n)?[ \t]))(?:.(?:(?:\r\n) ?[ \t])(?:[^()<>@,;:\".[] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[["()<> @,;:\".[]]))|[([^[]\r\]|\.)](?:(?:\r\n)?[ \t])))(?:,@(?:(?:\r\n)?[ \t])(?:[^()<>@,;:\".[] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[["()<>@, ;:\".[]]))|[([^[]\r\]|\.)](?:(?:\r\n)?[ \t]))(?:.(?:(?:\r\n)?[ \t] )(?:[^()<>@,;:\".[] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[["()<>@,;:\ ".[]]))|[([^[]\r\]|\.)](?:(?:\r\n)?[ \t])))):(?:(?:\r\n)?[ \t]))? (?:[^()<>@,;:\".[] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[["()<>@,;:\". []]))|"(?:[^"\r\]|\.|(?:(?:\r\n)?[ \t]))"(?:(?:\r\n)?[ \t]))(?:.(?:(?: \r\n)?[ \t])(?:[^()<>@,;:\".[] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[[ "()<>@,;:\".[]]))|"(?:[^"\r\]|\.|(?:(?:\r\n)?[ \t]))"(?:(?:\r\n)?[ \t]) ))@(?:(?:\r\n)?[ \t])(?:[^()<>@,;:\".[] \000-\031]+(?:(?:(?:\r\n)?[ \t]) +|\Z|(?=[["()<>@,;:\".[]]))|[([^[]\r\]|\.)](?:(?:\r\n)?[ \t]))(?:
.(?:(?:\r\n)?[ \t])(?:[^()<>@,;:\".[] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z |(?=[["()<>@,;:\".[]]))|[([^[]\r\]|\.)](?:(?:\r\n)?[ \t])))>(?:( ?:\r\n)?[ \t]))))?;\s*)

GeneWitch · 2007-02-19

regex looks like garbage. is there a framework somewhere that can validate an email address?

Don't FTP servers have constructs that allow them to verify email addresses as being valid without physically checking them on the internet?

2007-02-19

my ISP doesn't support de local part!

morry · 2007-02-19

The Regexes are utterly unreadable and therefore unmaintainable. I'd hate to have to fix one of those monsters.

Recently I overheard a colleague's phone conversation. He was babbling on about the email validation not being tight enough. "after the period, it should check for exactly 3 characters. You know: .com .org .net. But it shouldn't just check for those three; we don't want to limit ourselves if they come up with more TLDs. I'll raise a low priority defect for that after the call."

So I shot him off an email (not wanting to interrupt his call) giving him some examples of .info and .co.uk email addresses. I didn't have the heart to show him the RFC.

2007-02-19

PJH:
I can count on the hands of one arm...

For some reason, this was hilarious to me. Maybe I need more coffee?

2007-02-19

I believe it's much better to not do any validation, really. Even better, let e-mail be an optional field, and only if it's filled, complain about a missing @ if it's not there - and if you really need that @ to be there (you might want to accept local accounts/aliases, ugly exchange/notes addresses, X.500, whatnot).

Other than that, the syntax is simply too complex. Even if you catch someone forgetting a bit of the FQDN or whatever, you still can't catch them making "valid" typos ([email protected]).

If you're going to be using the e-mail address for, well, sending e-mail, you still need to send a confirmation e-mail just so you won't be called a spammer anyway. So let their mail server do the checking for you.

The one exception I can think of, is if your e-mail system itself has some limitation (that isn't in the specs). For example, if your system simply can't handle IP addresses and quoted local parts, validate against those.
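
A minimal sketch of that policy, assuming a web form that hands you the raw field value as a string. The function name `email_field_problem` and the `require_at` switch are invented for illustration; anything that passes would still need a confirmation e-mail.

```python
from typing import Optional


def email_field_problem(value: str, require_at: bool = True) -> Optional[str]:
    """Return a complaint for the user, or None if the field is acceptable."""
    value = value.strip()
    if not value:
        return None  # the field is optional, so empty is fine
    if require_at and "@" not in value:
        return "This does not look like an e-mail address: no @ found."
    return None  # everything else is left to the confirmation e-mail
```

Setting `require_at=False` covers the case where local accounts or aliases without an @ should be accepted.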

LizardKing · 2007-02-19

Hmm, email address validation is a nasty one. I remember trying to validate by doing a lookup on the hostname portion, only to get scuppered by mail servers that don't resolve but are valid. I forget the details as this was many aeons ago, however a more experienced colleague pointed me at some RFCs (and would have probably submitted my code as a WTF if this site had been around).

2007-02-19

Actually, most validation algorithms disapprove of this perfectly valid address: me@se Why is that??

2007-02-19

morry:
The Regexes are utterly unreadable and therefore unmaintainable. I'd hate to have to fix one of those monsters.

I read a quote somewhere that described regex as a "write once, read never" syntax. It's the poster child for the difference between "Clever" and "Wise".

Captcha: tesla - good scientist, BAAAD band...

Tukaro · 2007-02-19

Er... I use a much simpler check than most of you do; perhaps it doesn't cover everything, but this is an internal thing, so it doesn't need to.

/^([a-zA-Z0-9_.-])+@(([a-zA-Z0-9-])+\.)+([a-zA-Z0-9]{2,4})+$/

2007-02-19

I just like how in the code they have the var "ampisthere" for ampersand (&) and they think that's what you call @

2007-02-19

The correct regexp for emails can be found there: http://examples.oreilly.com/regex/

stevekj · 2007-02-19

Steve:
Ooooh, ooh! I got it!
The WTF (apart from this stupid comment box I'm typing in being only 20x2 characters) is that he thinks the 'at' sign is called an ampersand.

I don't think that's it. I think he's using "amp" as a short form for "ampersat", which is indeed a more or less valid reference to "@". The real WTF is that no one besides this particular coder knows what an "ampersat" is.

The other real WTF is that you can also refer to "@" as an "asperand". WTF?

In a Google battle between "asperand" and "ampersat", "ampersat" comes out slightly ahead - but both are practically undefined, by Google standards, at just under 3k references each. So using either one in code that is going to be maintained by someone else is definitely a WTF.

No, the real real WTF here is shortening "address" to "add"... that's clarity right there.

2007-02-19

Tukaro:
Er... I use a much simpler check than most of you do; perhaps it doesn't cover everything, but this is an internal thing, so it doesn't need to.
/^([a-zA-Z0-9_.-])+@(([a-zA-Z0-9-])+\.)+([a-zA-Z0-9]{2,4})+$/

Yeah, I use something much simpler even than that in internal code.

!/^$/

If they don't get their email, it is highly likely that they didn't put in the right address :)

2007-02-19

From http://www.eskimo.com/~hottub/software/programming_quotes.html

Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.

Jamie Zawinski

2007-02-19

To add insult to injury, there may in future be e-mail addresses with IDNs (Internationalized Domain Names), with umlauts, ogoneks, cedillas and such stuff... they will be transformed by nameprep and punycode into RFC addresses before use, but that won't help with validating them...

To regexps: They violate the good old KISS principle ("Writing solid code"). They are hard to read (both visually and mentally), they cannot be properly commented (if you have qualms about spreading both comment and regexp over the page) AND they are fragile (you know what I mean if you accidentally typed one more char than necessary)... sometimes it breaks, sometimes not. I think there is some pride involved in being able to set up a mighty "all-cases-in-one" regexp, but for maintenance the long monsters are garbage. It's not so much fun, but keep the style boring; write so that you know five pages beforehand what is going to happen. In this case, break the mail address into parts and verify them individually (with short regexps, yes) and comment what you are doing. You will be pleased when you are forced, under time pressure, to rewrite old routines which you haven't seen for a year.
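
To make the "break it into parts" suggestion concrete, here is one possible sketch. The names `LOCAL_ATOM`, `DOMAIN_LABEL` and `looks_like_address` are invented for illustration, and the rules are deliberately narrower than the RFCs: no quoted local parts, no IP-address domains, and at least two domain labels are assumed.

```python
import re

# One dot-separated piece of an unquoted local part.
LOCAL_ATOM = re.compile(r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]+")
# One domain label: letters, digits, inner hyphens; may start with a digit.
DOMAIN_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?")


def looks_like_address(address: str) -> bool:
    # Step 1: split at the last @ into local part and domain.
    local, at, domain = address.rpartition("@")
    if not at or not local or not domain:
        return False
    # Step 2: every dot-separated piece of the local part must be an atom.
    if not all(LOCAL_ATOM.fullmatch(atom) for atom in local.split(".")):
        return False
    # Step 3: the domain needs two or more well-formed labels.
    labels = domain.split(".")
    if len(labels) < 2:
        return False
    return all(DOMAIN_LABEL.fullmatch(label) for label in labels)
```

Each step has its own short pattern and its own comment, so a rule can be loosened or tightened without touching the others.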

2007-02-19

After reading this I had a look at what the jakarta commons validator did. And found a nice mini WTF. The EmailValidator is a singleton despite it not having any state. God I wish they had never invented that pattern.

http://svn.apache.org/viewvc/jakarta/commons/proper/validator/trunk/src/main/java/org/apache/commons/validator/EmailValidator.java?view=markup

2007-02-19

Please spam my valid address: .@._ Thank you.

captcha: tastey (t .. to the a .. to the s-t-e-y girl, you tastey)

2007-02-19

unfortunately your regex isn't any better at actually validating email addresses, and here is why

http://www.regular-expressions.info/email.html

Otto · 2007-02-19

Buzz:
Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems. Jamie Zawinski

Some people, when confronted with regular expressions, like to quote jwz. Now they are fools who cannot cope with regular expressions.

2007-02-19

The only validation of e-mail addresses I do is to check it matches .+@.+\..+

i.e. there's an @, there are characters on both sides of the @, there's at least one . to the right of the @, there are characters on both sides of the .

Remember, no syntactic check is going to determine whether an e-mail address is actually valid and working. All you're checking for is obvious brokenness like putting in localhost-specific user IDs, or putting their name in the field instead of their e-mail address.
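
As a sketch, the check described above (an @ with characters on both sides, and a dot to the right of the @ with characters on both sides) might look like this in Python. The function name `passes_sanity_check` is made up, and the pattern is matched against the whole string.

```python
import re

# Something, an @, something, a literal dot, something.
SANITY = re.compile(r".+@.+\..+")


def passes_sanity_check(address: str) -> bool:
    """Reject only obvious brokenness; say nothing about deliverability."""
    return SANITY.fullmatch(address) is not None
```

A bare user ID or a person's name fails, while nearly anything shaped like an address passes, which is the whole point.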

2007-02-19

regex sucks. they are a real solo trip, to be trotted out after 15 cups of coffee and a similar number of cigarettes. maintainability? you're joking... just start again!

skington · 2007-02-19

TSK:
To regexps: They violate the good old KISS principle ("Writing solid code"). They are hard to read (both visually and mentally), they cannot be properly commented (if you have qualms about spreading both comment and regexp over the page) AND they are fragile (you know what I mean if you accidentally typed one more char than necessary)... sometimes it breaks, sometimes not.

Perl has allowed comments and non-meaningful white space in regexes since 1998. That you can write regexes that look like line noise doesn't mean you have to.

That big monster of a regex that validates RFC822 email addresses exists because, if you're going to validate email addresses, you may as well validate them properly - by dint of building up a regex bit by bit, and then, once you're happy it works, compiling it down to one big long humungous lump of code for performance, so other people can just say "use Email::Valid" or whatever other Perl modules use it. In the same way that ages ago people used to shorten variables and eliminate white space to fit more code into 32K, or however much RAM their machine had at the time. It doesn't mean you develop that way.
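
The same idea carries over to other languages. Below is a rough Python illustration using `re.VERBOSE`, which permits whitespace and comments inside a pattern, with the pattern assembled from named pieces. The names `ATOM`, `LABEL` and `ADDRESS` are invented here, and the grammar is a simplified subset (no quoted strings, comments or domain literals), not the RFC 822 one.

```python
import re

# Small building blocks, written and tested on their own.
ATOM = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]+"
LABEL = r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?"

# The pieces are combined into one compiled pattern; VERBOSE mode ignores
# the layout whitespace and allows a comment on each line.
ADDRESS = re.compile(
    rf"""
    (?P<local>  {ATOM} (?: \. {ATOM} )* )    # dot-separated atoms
    @                                        # the separator
    (?P<domain> {LABEL} (?: \. {LABEL} )+ )  # two or more labels
    """,
    re.VERBOSE,
)
```

The compiled `ADDRESS` object is what callers use; the readable source stays with the pieces that built it.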

2007-02-19

With the help of the wonderful RegExBuddy the splendid regex above can be translated into an approximation of English.

^-!#$%&'*+/0-9=?A-Z^_a-z{|}~ @a-zA-Z(.a-zA-Z)+$
Assert position at the start of the string «^»
Match a single character present in the list below «[-!#$%&'+/0-9=?A-Z^az{|}~] »
One of the characters "-!#$%&'*+/" «-!#$%&'*+/»
A character in the range between "0" and "9" «0-9»
One of the characters "=?" «=?»
A character in the range between "A" and "Z" «A-Z»
One of the characters "^" «^»
A character in the range between "a" and "z" «a-z»
One of the characters "{|}~" «{|}~»
Match the regular expression below and capture its match into backreference number 1 «(.?[-!#$%&'+/0-9=?A-Z^_a-z{|}~])»
Match the character "." literally «.?»
Between zero and one times, as many times as possible, giving back as needed (greedy) «?»
Match a single character present in the list below «[-!#$%&'+/0-9=?AZ^ a-z{|}~]»
One of the characters "-!#$%&'+/" «-!#$%&'+/»
A character in the range between "0" and "9" «0-9»
One of the characters "=?" «=?»
A character in the range between "A" and "Z" «A-Z»
One of the characters "^_" «^_»
A character in the range between "a" and "z" «a-z»
One of the characters "{|}~" «{|}~»
Match the character " " literally « »
Match the character " " literally « »
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «»
Match the character "@" literally «@»
Match a single character present in the list below «[a-zA-Z]»
A character in the range between "a" and "z" «a-z»
A character in the range between "A" and "Z" «A-Z»
Match the regular expression below and capture its match into backreference number 2 «(-?[a-zA-Z0-9])»
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «»
Note: You repeated the backreference itself. The backreference will capture only the last iteration. Put the backreference inside a group and repeat that group to capture all iterations. «»
Match the character "-" literally «-?»
Between zero and one times, as many times as possible, giving back as needed (greedy) «?»
Match a single character present in the list below «[a-zA-Z0-9]»
A character in the range between "a" and "z" «a-z»
A character in the range between "A" and "Z" «A-Z»
A character in the range between "0" and "9" «0-9»
Match the regular expression below and capture its match into backreference number 3 «(.a-zA-Z)+»
Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Note: You repeated the backreference itself. The backreference will capture only the last iteration. Put the backreference inside a group and repeat that group to capture all iterations. «+»
Match the character "." literally «.»
Match a single character present in the list below «[a-zA-Z]»
A character in the range between "a" and "z" «a-z»
A character in the range between "A" and "Z" «A-Z»
Match the regular expression below and capture its match into backreference number 4 «(-?[a-zA-Z0-9])»
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «»
Note: You repeated the backreference itself. The backreference will capture only the last iteration. Put the backreference inside a group and repeat that group to capture all iterations. «*»
Match the character "-" literally «-?»
Between zero and one times, as many times as possible, giving back as needed (greedy) «?»
Match a single character present in the list below «[a-zA-Z0-9]»
A character in the range between "a" and "z" «a-z»
A character in the range between "A" and "Z" «A-Z»
A character in the range between "0" and "9" «0-9»
Assert position at the end of the string (or before the line break at the end of the string, if any) «$»

2007-02-19

Some people, when defending regular expressions, like to show their arrogance because they know how to write obscure and usually unmaintainable code.

2007-02-19

I own domain ölbaum.ch. Isn't there an RFC that allows it in an e-mail address? Then most of these regexps would have to be rewritten. Lucky no e-mail client (that I know of) supports IDNs.
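
For what it's worth, converting such a domain to its ASCII form before any validation can be sketched with Python's built-in `idna` codec. The helper name `to_ascii_domain` is hypothetical, and the built-in codec follows the older IDNA rules of RFC 3490, so treat it as an illustration rather than a complete answer.

```python
def to_ascii_domain(domain: str) -> str:
    """Convert an internationalized domain name to its ASCII (xn--) form."""
    # The idna codec applies nameprep and punycode label by label.
    return domain.encode("idna").decode("ascii")


def to_unicode_domain(ascii_domain: str) -> str:
    """Reverse the conversion, for display purposes."""
    return ascii_domain.encode("ascii").decode("idna")
```

An ASCII-only regexp can then be run against the converted domain instead of the Unicode one.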

2007-02-19

Bill:
morry:
The Regexes are utterly unreadable and therefore unmaintainable. I'd hate to have to fix one of those monsters.

I read a quote somewhere that described regex as a "write once, read never" syntax. It's the poster child for the difference between "Clever" and "Wise".

That regex was not written by a human, it was compiled using probably Parser::RecDescent or some other module

2007-02-19

Just a little note to add: I literally go maniacal when a webpage refuses my e-mail address that starts with "i@". One letter local part still makes an e-mail, for the love of Pete.

Regards.

2007-02-19

When I've had to use fairly hairy regexes, they are always iteratively designed. I start simple, and add to it until it does what I need.

The trick is I have each iteration as a comment beforehand.

This lets me see what I've done, documents the limits of the regex, and lets me dig in and make changes without having to completely start over.

Sure, that chunk of the code will have a large amount of comments relative to other places... but when you've got a tough chunk of code, isn't that a good thing?

2007-02-19

imMute:
Bill:
morry:
The Regexes are utterly unreadable and therefore unmaintainable. I'd hate to have to fix one of those monsters.

I read a quote somewhere that described regex as a "write once, read never" syntax. It's the poster child for the difference between "Clever" and "Wise".

That regex was not written by a human, it was compiled using probably Parser::RecDescent or some other module

Possibly, but matters not. The fact remains that it's unmaintainable as-is. Just because the metadata that "Documents" it might be maintained elsewhere, such as a tool, doesn't mitigate the fact that no one reading the source can be sure of what it does. Also, if the tool were worth a damn, it would also give you comments to embed along with the regex.

Hopefully this WAS simply the output of a builder class, where the method calls used to build it provide adequate documentation. But based on the OP, I doubt it.

Captcha: tacos - with that suggestion, I'm off to lunch

2007-02-19

I don't see the point of validating email addresses at all... even if it vaguely resembles what an email address looks like, there is no guarantee that it is correct. It will be validated soon enough when you try to send an email to it, and at least you won't be alienating those users with non-standard-but-legal addresses.

(And even the argument that it is helping the user by telling them they've typoed... they are probably just as likely to typo the text part of the address as the @ and . parts!)

2007-02-19

The real WTF is your comments section not word wrapping. :P

Maybe you should consider a HTML class?

2007-02-19

The easiest and most likely to succeed way to validate an address is to establish an SMTP session to the primary MX of the domain and do an RCPT. If the address is invalid, either you cannot establish a connection or the SMTP server returns an error. Easy :)

[And yes, I do know that the Internet mail doesn't work like that any more, more is the pity.]
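
For the curious, the RCPT probe could be sketched with Python's standard `smtplib` roughly as follows. The function names `rcpt_accepted` and `probe` are invented, finding the primary MX is left out (the standard library has no MX resolver), and, as the caveat above says, many servers today accept every recipient or refuse such probes, so a result proves little.

```python
import smtplib


def rcpt_accepted(session, sender: str, recipient: str) -> bool:
    """Ask an already-connected SMTP session whether it accepts recipient."""
    session.ehlo_or_helo_if_needed()
    code, _ = session.mail(sender)
    if code != 250:
        return False
    code, _ = session.rcpt(recipient)
    session.rset()  # abandon the transaction; no message is sent
    return code in (250, 251)


def probe(mx_host: str, sender: str, recipient: str) -> bool:
    """Connect to mx_host (assumed to be the domain's MX) and test recipient."""
    with smtplib.SMTP(mx_host, 25, timeout=10) as session:
        return rcpt_accepted(session, sender, recipient)
```

Keeping `rcpt_accepted` separate from the connection code means it can be exercised against a stand-in session object without touching the network.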

Validating Email Addresses
