The Daily WTF: Curious Perversions in Information Technology

2007-02-19

This is how I do it in Java

try {
    InternetAddress foo = new InternetAddress(emailCandidate);
} catch (AddressException ex) {
    return false;
}
return true;

2007-02-19

Janek:
This is how I do it in Java
try{ InternetAddress foo = new InternetAddress(emailCandidate); } catch (AddressException ex) { return false; } return true;

How do we know she's a witch? Let's build a bridge of her.

2007-02-19

Hahaha! Nice.

mol:
... The purpose of email validation is just to check for common errors. It makes no sense to try to validate perfectly, because that won't save you from a valid but nonexistent email address (you just have to send the mail there and wait for the response).

Mol is right, validation should eliminate the basic errors in entering an email. If the user does not reply with a confirmation, too bad.

2007-02-19

About the only thing I ever bothered to check for in email validating regexps is that the email address won't trigger some weird kind of non-mailbox delivery on the local host (e.g. "/dev/sda@localhost" or "|/bin/sh@[127.0.0.1]"). This was using /usr/sbin/sendmail to submit mail.

Now I open an SMTP connection to a relay server. Modern SMTP servers are already adequately hardened against malicious or merely uncooperative email addresses, so it's a waste of my time to duplicate these features.

I only check for the absence of the following characters:

CR, LF, NUL - explicitly prohibited by RFC

> - end quote character for the RCPT command; it should not appear in an address, since half the MTAs on the planet wouldn't know what to do with it.

% ! - abused by spammers on open relays. Get a real email address if you are living behind one of these, you luddite.

/^[^\r\n\0>!%]+@[^\r\n\0>!%]+$/os
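For anyone who doesn't read Perl, here is a rough Python sketch of the same blacklist check. The function name `looks_deliverable` is made up for illustration. It uses `fullmatch`, so a trailing newline is rejected as well. Note that it deliberately lets through things like a leading pipe, since the point above is that the SMTP server deals with those.

```python
import re

# Python rendering of the blacklist above: no CR, LF, NUL, '>', '!' or '%'
# on either side of the '@'. Everything else is left to the SMTP server.
SAFE_ADDRESS = re.compile(r"[^\r\n\0>!%]+@[^\r\n\0>!%]+")


def looks_deliverable(address: str) -> bool:
    """Return True if the whole string passes the blacklist check."""
    return SAFE_ADDRESS.fullmatch(address) is not None
```

Usage would be something like `looks_deliverable("user@example.com")` before handing the address to the relay.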

Many MTAs don't support anything like full RFC822 email syntax, and only support "RCPT TO: <" string-of-some-characters ">" with various permitted values for the "string-of-some-characters", and various transformations done on the text if special characters like spaces or parens are used.

RFC822 was designed to allow any damn email address from any damn local email system to be encoded into an RFC822 email address. It's not feasible for me to validate addresses from some legacy email system that still runs on a PDP-10 somewhere, so I don't try.

If I'm required to determine the validity of the email address I'll send a token to the address and require the user to enter the token before I talk to them again. That tests not just the validity of the email address, but the reliability and availability of the whole return path to the requesting user and the user's willingness to cooperate with receiving mail at that address--a much more useful assertion.
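A minimal sketch of that token round trip, in Python. The in-memory `pending` dict and the names `issue_token` and `confirm` are assumptions for illustration only; a real system would persist the tokens, expire them, and actually send the mail.

```python
import hmac
import secrets

# Hypothetical in-memory store of address -> outstanding token.
pending: dict[str, str] = {}


def issue_token(address: str) -> str:
    """Create a random token for the address; the caller mails it out."""
    token = secrets.token_urlsafe(16)
    pending[address] = token
    return token


def confirm(address: str, supplied: str) -> bool:
    """Return True only if the user sends back the token we issued."""
    expected = pending.get(address)
    if expected is None:
        return False
    # Constant-time comparison; encode so non-ASCII input cannot raise.
    if hmac.compare_digest(expected.encode(), supplied.encode()):
        del pending[address]  # each token can be used once
        return True
    return False
```

The address is only treated as good after `confirm` returns True, which is the "whole return path" test described above.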

2007-02-19

Kalle:
The easiest and most likely to succeed way to validate an address is to establish an SMTP session to the primary MX of the domain and do an RCPT. If the address is invalid, either you cannot establish a connection or the SMTP server returns an error. Easy :)
[And yes, I do know that the Internet mail doesn't work like that any more, more is the pity.]

For those who haven't tried it, there are four cases:

It actually works, RCPT returns OK if the address is valid and an error otherwise.
It's totally broken, RCPT returns an error if the address is valid and OK otherwise. People with this kind of mail host don't get much mail. This breed is rare but not extinct.
The remote host graylists all SMTP hosts that contact it for the first time, in which case you'll get an RCPT temporary error on the first connection, then a correct answer when you retry between two minutes and 24 hours later.
The remote host is not the final destination host but a gateway without access to a database of local addresses for validation, so it says OK to all RCPT commands. Some time later a bounce message will be generated for the invalid ones and sent to the envelope sender address. This kind of host is really damn annoying and I get hundreds of messages from them every day bouncing messages containing Windows viruses because the message had my email address as the sender.
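To make the four cases concrete, here is a small Python sketch of how a caller might bucket the reply code that comes back from RCPT. The function name and the bucket labels are invented for illustration. The comments show why none of the buckets is a reliable verdict on its own.

```python
def classify_rcpt_reply(code: int) -> str:
    """Bucket an SMTP reply code returned for an RCPT command."""
    if 200 <= code < 300:
        # Either the address is valid, or this is a gateway that says OK
        # to everything and bounces later.
        return "accepted"
    if 400 <= code < 500:
        # Temporary failure, e.g. greylisting: retry later.
        return "retry"
    if 500 <= code < 600:
        # Permanent failure, unless the host is one of the broken ones.
        return "rejected"
    return "unknown"
```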

2007-02-19

mathew:
The only validation of e-mail addresses I do is to check it matches .+@.+\..+

It pains me to see you post this so soon after my .@._ comment. Your check would think that @@@.@ is a valid email address.
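This is easy to check. A quick Python sketch (the name `naive_check` is just for illustration) that applies the quoted pattern to the whole string:

```python
import re

# The pattern quoted above: something, '@', something, '.', something.
NAIVE = re.compile(r".+@.+\..+")


def naive_check(address: str) -> bool:
    """Return True if the whole string matches the naive pattern."""
    return NAIVE.fullmatch(address) is not None
```

`naive_check("@@@.@")` returns True, because `.` happily matches `@` as well.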

2007-02-19

The utterly insane regex listed earlier in this thread is actually NOT 100% RFC822-compliant. It takes a shortcut by placing an arbitrary restriction on the nesting depth of the comments.

A regular expression actually CANNOT validate an E-mail address according to RFC822. The language described in RFC822 is recursive and cannot be normalized to an iterative description. If you can't normalize it like this (that is, if there's no way to write the language in such a way that you never have to refer to a symbol that hasn't been defined yet and you never have a rule that refers to itself) then it is, technically, impossible to construct a regular expression for it.
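To illustrate what the recursion costs you: checking that comments nest properly needs a counter that can grow without bound, which is exactly what a classical regular expression lacks. A rough Python sketch follows; it is a simplification that only looks at parentheses and backslash escapes and ignores quoted strings, so it is not a full RFC822 parser.

```python
def comments_balanced(text: str) -> bool:
    """Check that parenthesised comments are properly nested."""
    depth = 0
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            i += 2  # a backslash quotes the next character; skip both
            continue
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False  # closing paren with nothing open
        i += 1
    return depth == 0
```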

That said, there's no value in validating an address against the full force of RFC822, as discussed earlier in this thread; not many MTAs -- and even fewer desktop mail applications -- conform to the full scope of the "requirements" and only implement the most commonly used subset.

CDarklock · 2007-02-19

Bill:
The fact remains that it's unmaintainable as-is.

Regular expressions are maintained by throwing them away and writing new ones. You do this when they don't work. When they do, just leave them alone.

If you don't know whether a given regular expression works, it doesn't.

2007-02-19

From the VMWare converter registration page:

function isValidEmail(str) {
  return (str.indexOf("@") > 0);
}

The frustrating thing is that I sent them an e-mail a while back to complain about their old e-mail 'validator', and they changed it to something more sensible, but have now regressed to this which is even worse than the original.

2007-02-19

Clearly this is not an appropriate use of regular expressions at all.

See the following articles:

http://blogs.msdn.com/oldnewthing/archive/2006/05/22/603788.aspx

http://blogs.msdn.com/larryosterman/archive/2005/01/07/348548.aspx

http://blogs.msdn.com/larryosterman/archive/2005/01/10/350135.aspx

2007-02-19

This is how I do it in Java
try{ InternetAddress foo = new InternetAddress(emailCandidate); } catch (AddressException ex) { return false; } return true;

MMMMM...using exceptions to handle expected program flow. Tasty. Not sure if Java would allow you to do that another way without using exceptions, but that is really bad form.

Now if you'll excuse me I'm going to go hang myself.

CDarklock · 2007-02-19

Janek:
This is how I do it in Java

This is one of the many reasons why I believe Java developers are evil and must be stopped.

Oh, did I say "evil"? I meant "stupid".

2007-02-19

As far as I understand, the following regexp will work perfectly, too:

([^\x00-\x20\x22\x28\x29\x2c\x2e\x3a-\x3c\x3e\x40\x5b-\x5d\x7f-\xff]+|\x22([^\x0d\x22\x5c\x80-\xff]|\x5c[\x00-\x7f])\x22)(\x2e([^\x00-\x20\x22\x28\x29\x2c\x2e\x3a-\x3c\x3e\x40\x5b-\x5d\x7f-\xff]+|\x22([^\x0d\x22\x5c\x80-\xff]|\x5c[\x00-\x7f])\x22))\x40([^\x00-\x20\x22\x28\x29\x2c\x2e\x3a-\x3c\x3e\x40\x5b-\x5d\x7f-\xff]+|\x5b([^\x0d\x5b-\x5d\x80-\xff]|\x5c[\x00-\x7f])\x5d)(\x2e([^\x00-\x20\x22\x28\x29\x2c\x2e\x3a-\x3c\x3e\x40\x5b-\x5d\x7f-\xff]+|\x5b([^\x0d\x5b-\x5d\x80-\xff]|\x5c[\x00-\x7f])\x5d))

It is an absolute hell to maintain but since the standards probably won't change in the next few hundred years (why should they?) I find it quite acceptable.

2007-02-19

... and of course, the real comedy here is that Igor's masterpiece code is actually better than the regex - there is some hope of maintaining it.

2007-02-19

New improvements to the Perl5 regexp engine allow us to write a much easier to read regexp to validate email addresses. Courtesy of Abigail:

use 5.9.5; # In fact, you need the newest blead.

my $email_address = qr {
  (?(DEFINE)
  (?<addr_spec>       (?&local_part) \@ (?&domain))
  (?<local_part>      (?&dot_atom) | (?&quoted_string))
  (?<domain>          (?&dot_atom) | (?&domain_literal))
  (?<domain_literal>  (?&CFWS)? \[ (?: (?&FWS)? (?&dcontent))* (?&FWS)?
                               \] (?&CFWS)?)
  (?<dcontent>        (?&dtext) | (?&quoted_pair))
  (?<dtext>           (?&NO_WS_CTL) | [\x21-\x5a\x5e-\x7e])

  (?<atext>           (?&ALPHA) | (?&DIGIT) | [!#\$%&'*+/=?^_`{|}~-])
  (?<atom>            (?&CFWS)? (?&atext)+ (?&CFWS)?)
  (?<dot_atom>        (?&CFWS)? (?&dot_atom_text) (?&CFWS)?)
  (?<dot_atom_text>   (?&atext)+ (?: \. (?&atext)+)*)

  (?<text>            [\x01-\x09\x0b\x0c\x0e-\x7f])
  (?<quoted_pair>     \\ (?&text))

  (?<qtext>           (?&NO_WS_CTL) | [\x21\x23-\x5b\x5d-\x7e])
  (?<qcontent>        (?&qtext) | (?&quoted_pair))
  (?<quoted_string>   (?&CFWS)? (?&DQUOTE) (?:(?&FWS)? (?&qcontent))*
                       (?&FWS)? (?&DQUOTE) (?&CFWS)?)

  (?<word>            (?&atom) | (?&quoted_string))
  (?<phrase>          (?&word)+)

  # Folding white space
  (?<FWS>             (?: (?&WSP)* (?&CRLF))? (?&WSP)+)
  (?<ctext>           (?&NO_WS_CTL) | [\x21-\x27\x2a-\x5b\x5d-\x7e])
  (?<ccontent>        (?&ctext) | (?&quoted_pair) | (?&comment))
  (?<comment>         \( (?: (?&FWS)? (?&ccontent))* (?&FWS)? \) )
  (?<CFWS>            (?: (?&FWS)? (?&comment))*
                      (?: (?:(?&FWS)? (?&comment)) | (?&FWS)))

  # No whitespace control
  (?<NO_WS_CTL>       [\x01-\x08\x0b\x0c\x0e-\x1f\x7f])

  (?<ALPHA>           [A-Za-z])
  (?<DIGIT>           [0-9])
  (?<CRLF>            \x0d \x0a)
  (?<DQUOTE>          ")
  (?<WSP>             [\x20\x09])
)

(?&addr_spec)

}x;

http://groups.google.com/group/comp.lang.perl.misc/browse_thread/thread/221ac5c8159f5ef4?hl=en

EvanED · 2007-02-19

Bill:
imMute:
That regex was not written by a human, it was compiled using probably Parser::RecDescent or some other module

Possibly, but matters not. The fact remains that it's unmaintainable as-is. Just because the metadata that "Documents" it might be maintained elsewhere, such as a tool, doesn't mitigate the fact that no one reading the source can be sure of what it does. Also, if the tool were worth a damn, it would also give you comments to embed along with the regex.

It matters not? If it's used correctly, it matters the world! Having generated parts of code is fine as long as you don't have to modify the generated parts. It's done all the time... witness Yacc and Lex. Heck, witness a compiler. Do you think that convoluted but terribly efficient assembly code produced by a compiler is bad because the fact that it was compiled matters not?

If all it does is plop out a RegEx and wrap "isvalid(email_address)" around it, then this is a perfectly valid approach.

(Of course, if it goes into a file that needs to be modified, then you can no longer regenerate the file if you make changes/fix bugs/find a better way/etc. without losing those changes. Then it's just as bad as if a human wrote it.)

2007-02-19

thrashaholic:
This is how I do it in Java
try {
    InternetAddress foo = new InternetAddress(emailCandidate);
} catch (AddressException ex) {
    return false;
}
return true;
MMMMM...using exceptions to handle expected program flow. Tasty. Not sure if Java would allow you to do that another way without using exceptions, but that is really bad form.

It won't. There's no "TryParse" equivalent to this. You shouldn't design code to use exceptions to handle expected conditions, but occasionally when there is an impedance mismatch like this with a library routine, it is necessary.

thrashaholic:
Now if you'll excuse me I'm going to go hang myself.

Good riddance.

2007-02-19

CDarklock:
Janek:
This is how I do it in Java

This is one of the many reasons why I believe Java developers are evil and must be stopped.

Oh, did I say "evil"? I meant "stupid".

Yes, using a standard library routine to validate an email address, as opposed to writing your own unreadable, unmaintainable, and broken regular expression to do it, is both evil and stupid. Idiot.

2007-02-19

LizardKing:
Hmm, email address validation is a nasty one. I remember trying to validate by doing a lookup on the hostname portion, only to get scuppered by mail servers that don't resolve but are valid. I forget the details as this was many aeons ago, however a more experienced colleague pointed me at some RFCs (and would have probably submitted my code as a WTF if this site had been around).

I would guess that the piece you were missing is the idea of an MX (mail exchanger) record. You can have a domain, such as email-handled-elsewhere.com, and it has an MX record in DNS of mail-handler.com (or several records, with several different hosts), and the mail goes not to the server mentioned in the email address, but to (one of) the one(s) in the MX record(s).
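Roughly, the lookup goes like this. The sketch below is Python with a plain dict standing in for DNS, so the table, the host name `backup.mail-handler.com`, and the function name are all made up for illustration. It assumes the usual convention that lower preference numbers are tried first, and that a domain with no MX record falls back to the host named in the address.

```python
def delivery_hosts(domain: str, mx_table: dict) -> list:
    """Return the hosts to try for mail to `domain`, in order.

    `mx_table` maps a domain to a list of (preference, host) pairs and
    is a stand-in for a real DNS lookup.
    """
    records = mx_table.get(domain, [])
    if not records:
        # No MX record: fall back to the domain itself.
        return [domain]
    # Lower preference value means higher priority.
    return [host for _preference, host in sorted(records)]


mx_table = {
    "email-handled-elsewhere.com": [
        (20, "backup.mail-handler.com"),
        (10, "mail-handler.com"),
    ],
}
```

So validating by resolving only the hostname in the address misses every domain that receives mail purely through its MX records.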

2007-02-19

darwin:
Yes, using a standard library routine to validate an email address, as opposed to writing your own unreadable, unmaintainable, and broken regular expression to do it, is both evil and stupid. Idiot.

Using anything coded in Java is evil and stupid. You whine because you can't read machine-generated code and then you load an entire virtual machine to make sure an e-mail address meets the approval of a library you've never read.

Let us know when you learn to code.

2007-02-19

darwin:
thrashaholic:
This is how I do it in Java
try {
    InternetAddress foo = new InternetAddress(emailCandidate);
} catch (AddressException ex) {
    return false;
}
return true;
MMMMM...using exceptions to handle expected program flow. Tasty. Not sure if Java would allow you to do that another way without using exceptions, but that is really bad form.
It won't. There's no "TryParse" equivalent to this. You shouldn't design code to use exceptions to handle expected conditions, but occasionally when there is an impedance mismatch like this with a library routine, it is necessary.

That's what I feared. However I see code like this used far too many times (when it shouldn't) to simply keep quiet about it. I honestly could not care less about the general WTF-ness of Java's standard libs (because I f-in hate Java), but seeing code like that makes me want to vomit. As it should.

darwin:

thrashaholic:
Now if you'll excuse me I'm going to go hang myself.

Good riddance.

Wow, mature there sir, real mature. Your momma's calling, she said you need to clean up the basement after you're done playing WoW.

savar · 2007-02-19

Tukaro:
Er... I use a much simpler check than most of you do; perhaps it doesn't cover everything, but this is an internal thing, so it doesn't need to.
/^([a-zA-Z0-9_\.\-])+@(([a-zA-Z0-9\-])+\.)+([a-zA-Z0-9]{2,4})+$/

You don't need to escape '.' inside a character class. But yes, there are many different ways to kinda validate an email address.

I wish they'd stop posting stories like this on TDWTF because every time they do, the usual furor erupts with 99% of people not being aware how complex the RFC actually is, or what a monstrous regex it takes to meet it.

Then a wise soul points out that validating the formatting of an email address isn't really doing anything in the first place, and we all end up right where we started.

2007-02-19

Regexes are not bad. Just because people don't make use of the /x modifier and comments doesn't make regexes themselves evil. They are incredibly useful, and those that think they add an extra problem to the task at hand just haven't bothered to learn how to wield the power of regexes.

If you find regexes confusing or difficult you need to read Friedl's book. If you still have trouble then you are not a geek and should not be programming.

2007-02-19

and here's another

    if (emailstr.length == 0) {
        alert("You must supply an email address.\nPlease try again");
        elm.focus();
        return false;
    }

    var matchArray = emailstr.match(emailPat);

    if (matchArray == null) {
        var sErrMsg;
        sErrMsg =  "<" + emailstr + "> is not a valid e-mail address.\n";
        sErrMsg += "You must supply a valid e-mail address.\n";
        sErrMsg += "Please try again.";
        alert(sErrMsg);
        elm.focus();
        return false;
    }

Iago · 2007-02-19

thrashaholic:
Not sure if Java would allow you to do that another way without using exceptions, but that is really bad form.

Why?

darwin:
You shouldn't design code to use exceptions to handle expected conditions

Why not?

The code is readable, maintainable, and acceptably efficient. And those are the criteria for determining what is good code, not whether or not it meets some arbitrary idea of "purity" that has no useful rationale or general applicability in the real world.

2007-02-19

Iago:
thrashaholic:
Not sure if Java would allow you to do that another way without using exceptions, but that is really bad form.
Why?
darwin:
You shouldn't design code to use exceptions to handle expected conditions
Why not?
The code is readable, maintainable, and acceptably efficient. And those are the criteria for determining what is good code, not whether or not it meets some arbitrary idea of "purity" that has no useful rationale or general applicability in the real world.

Try/Catch is expensive. (Severity of the expense depends on language/compiler/platform/VM/etc..) They should not be used in places where a simple If statement would suffice. That's all the reason I need right there.

The other reasons are mostly philosophical, if not accepted "form". Exceptions (as the name implies) shouldn't be used to control the flow of the normal happenings of your code. If you're doing this, you have a fundamental misunderstanding of exception handling. Exceptions are to be used for EXCEPTIONAL CASES that you can not plan for.

If you actually code the logic for something happening, you probably shouldn't use an exception to control the flow of said logic. "Is something null?" "Is this variable actually instantiated?", etc..are all cases where IF (WHATEVER) is a lot cheaper and vastly more "proper" than using exceptions to branch.

Here is a thesis you might find interesting: http://www.mortench.net/archive/eh.pdf

This is a higher-level blog post by the author of the thesis linked above.

Some of the more accepted Exception "Best Practices" include:

"Don't use exceptions to indicate absence of a resource" "Don't use exception handling as means of returning information from a method" "Use exceptions for errors that should not be ignored"

Etc..etc..etc..indeed, the most important rule for exception handling is: Don't do it.

(People who know WTF they're doing will understand that)

2007-02-19

stevekj:

... I think he's using "amp" as a short form for "ampersat", which is indeed a more or less valid reference to "@". The real WTF is that no one besides this particular coder knows what an "ampersat" is. ...

"@" better known as: "a human or elf"

EvanED · 2007-02-19

thrashaholic:
The other reasons are mostly philosophical, if not accepted "form". Exceptions (as the name implies) shouldn't be used to control the flow of the normal happenings of your code. If you're doing this, you have a fundamental misunderstanding of exception handling. Exceptions are to be used for EXCEPTIONAL CASES that you can not plan for.

So it's the name? What if the language called them something else? For instance, Common Lisp has something similar it calls conditions. Would it be appropriate to signal a condition?

And what's exceptional? Is a network connection going down exceptional and a good place for an exception? Why not malformed user input? Both will happen from time to time, it's just a difference of degree. If something happens 1 in every 100 requests is it exceptional, or does it need to be 1 in 1000?

"Don't use exceptions to indicate absence of a resource"

Hmm, what about C++'s bad_alloc exception, Java's OutOfMemoryError, or .Net's OutOfMemoryException? Are those aspects of those languages poorly designed? Or should memory be treated differently than other resources?

"Don't use exception handling as means of returning information from a method"

Exceptions ALWAYS return information from a method, specifically "this method could not complete normally." If he means as a substitute for return, yes.

I can understand a dislike for exceptions as a whole, but I don't see how you can approve of some uses and yet remain limited enough to think that malformed addresses is an inappropriate use. (At least absent profiling information that tells you so.)

2007-02-19

Since the RFC allows comments to be nested arbitrarily deeply, there can't be a single REGEX that works on every single valid email address. Even the 6598-byte-long REGEX in Appendix B of Mastering Regular Expressions assumes zero spaces and no hash marks, and only allows for doubly nested comments.

2007-02-19

EvanED:
So it's the name? What if the language called them something else? For instance, Common Lisp has something similar it calls conditions. Would it be appropriate to signal a condition?

In the case we're speaking about, probably not. I'm not familiar with CL, however.

EvanED:
And what's exceptional? Is a network connection going down exceptional and a good place for an exception? Why not malformed user input? Both will happen from time to time, it's just a difference of degree. If something happens 1 in every 100 requests is it exceptional, or does it need to be 1 in 1000?

The network going down is a prime case for exceptions. Malformed user input is not. IMO.

EvanED:
Hmm, what about C++'s bad_alloc exception, Java's OutOfMemoryError, or .Net's OutOfMemoryException? Are those aspects of those languages poorly designed? Or should memory be treated differently than other resources?

Of course not. Being out of memory is an exceptional condition. However, a variable being null is not. In C#, would you do :

try {
    object thrash = Request.QueryString["something"];
    // do branch 1
} catch (Exception e) {
    // do branch 2
}

or you would rather see:

if (Request.QueryString["something"] != null) {
    // do branch 1
} else {
    // do branch 2
}

?

These are the types of instances I'm speaking of. (And I, unfortunately, see a lot) Having a network connection unavailable, or being OOM are exception cases. Examples like above, are not.

EvanED:
Exceptions ALWAYS return information from a method, specifically "this method could not complete normally." If he means as a substitute for return, yes.

Of course that's what's meant here. Seeing try { crap } catch() { return false; } return true; makes my skin crawl.

EvanED:
I can understand a dislike for exceptions as a whole, but I don't see how you can approve of some uses and yet remain limited enough to think that malformed addresses is an inappropriate use. (At least absent profiling information that tells you so.)

If it's something that you can check for with a minimal amount of conditional code, then it's not a good case for exceptions. If it's something that no sane amount of conditional branching could ever solve, then a try...catch block is appropriate.

foxyshadis · 2007-02-19

MooseBrains:
Whitespace is allowed in email addresses, as are constructs like:
"Moose Brains !!!" @ (yes, this is my address) spam.la <MooseBrains>
which both would fall over on.

That's not an email address, that's an exercise in wankery.

Why do wonks always bring up the RFC's utter insanity whenever email comes up? It's 2007, not 1987. It's hard to find a specific place to draw the line, but it should have been done for RFC 2822. If at least 20% of MTAs online can't actually transfer your message, it's not an email address for all practical purposes, and in your case it's more like 99%.

The whole RFC is just an exercise in what goes wrong when standards are designed around including everyone and not forcing anyone to change, and practices that are long gone or were never used at all are left in because they sound cool.

BTW, why are you all referencing 822? 2822 obsoleted it.

2007-02-19

Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.

Some people, when faced with a regular expression, think "I know, I'll use Jamie Zawinski as an excuse and cherish my ignorance". Now they've got an infinite number of problems.

2007-02-19

Suggan:
Actually, most validation algorithms disapproves of this perfectly valid address: me@se Why is that??

I believe it's because all TLDs consist of 2, 3 or 4 characters, and so "se" would be considered a TLD. Is it? I dunno. Even if it were one, you can't just email that directly; it would be like emailing "me@com".

EvanED · 2007-02-19

thrashaholic:

EvanED:
And what's exceptional? Is a network connection going down exceptional and a good place for an exception? Why not malformed user input? Both will happen from time to time, it's just a difference of degree. If something happens 1 in every 100 requests is it exceptional, or does it need to be 1 in 1000?

The network going down is a prime case for exceptions. Malformed user input is not. IMO.

I think you're stricter than I am, then, about when exceptions are appropriate. ;-)

I don't think I'd explicitly throw an exception on malformed input, but it depends on the context, and it's not on its face something that I would consider bad style.

EvanED:
Hmm, what about C++'s bad_alloc exception, Java's OutOfMemoryError, or .Net's OutOfMemoryException? Are those aspects of those languages poorly designed? Or should memory be treated differently than other resources?

Of course not. Being out of memory is an exceptional condition. However, a variable being null is not. In C#, would you do :

[snip]

These are the types of instances I'm speaking of. (And I, unfortunately, see a lot) Having a network connection unavailable, or being OOM are exception cases. Examples like above, are not.

Gotcha. But on the other hand, there are plenty of times when not having a resource available IS a fine time for an exception, so I don't think that saying "Don't use exceptions to indicate absence of a resource" is good advice.

Seeing try { crap } catch() { return false; } return true; makes my skin crawl.

The one exception (hah) to that I would note is if you're at a module boundary and changing from exceptions to a return code scheme or something like that.

2007-02-19

Regex to support: RFC 2822 (Internet Message Format), RFC 2821 (SMTP), RFC 1123 (Requirements for Internet Hosts -- Application and Support)

Supports MSIE JavaScript client side .net validation. Also supports server side validation for other browsers.

^((([\t\x20][!#-'*+-/-9=?A-Z^-~]+[\t\x20]|"[\x01-\x09\x0B\x0C\x0E-\x21\x23-\x5B\x5D-\x7F]")+)?[\t\x20]<(([\t\x20][!#-'*+-/-9=?A-Z^-~]+(.[!#-'*+-/-9=?A-Z^-~]+)|"[\x01-\x09\x0B\x0C\x0E-\x21\x23-\x5B\x5D-\x7F]"))@(((a-zA-Z0-9.)+[a-zA-Z]{2,}|[(([0-9]?[0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5]).){3}([0-9]?[0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])]))>[\t\x20]|(([\t\x20][!#-'*+-/-9=?A-Z^-~]+(.[!#-'*+-/-9=?A-Z^-~]+)|"[\x01-\x09\x0B\x0C\x0E-\x21\x23-\x5B\x5D-\x7F]"))@(((a-zA-Z0-9*.)+[a-zA-Z]{2,}|[(([0-9]?[0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5]).){3}([0-9]?[0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])])))$

2007-02-19

I had a friend who met the "owner" of the .fr TLD. Despite some pretty consistent cajoling, he could not convince him to let him receive email as reed@fr.

2007-02-19

"It's done all the time... witness Yacc and Lex."

Actually, I think that would be the easiest way to accomplish the goal of validating an RFC-compliant address. Maintenance would be fairly easy with the original source files.

As others have pointed out, the goal may be a stupid one, but that's how I'd initially go about tackling it.

2007-02-19

Here's an idea...

get them to type it in, and take their word for it, or if that's too trusting for you, send them a confirmation email...

FIXED!

AssimilatedByBorg · 2007-02-19

Sigh

I wish I had a nickel for every time someone asked me to write code to validate email addresses, and thought it was simple.

I gently try to explain the incredible formats of addresses that are actually valid, and that eventually, they will really annoy somebody by using their restrictive idea of an email address.

The smart ones understand that.

It's the other kind of people that I don't know what to do with. The kind of people who ask, "why did your validation routine let through an email address with a typo in it?" (Seriously. This has happened.)

powerlord · 2007-02-19

Suggan:
Actually, most validation algorithms disapproves of this perfectly valid address: me@se Why is that??

The answer is rather simple: your domain name only has one part to it. As I understand RFC 921, domains with only one part to them are assumed to be in the .arpa TLD.

2007-02-19

Honestly, comments like "regexs are write once/read never" and "regular expressions are an exercise in arrogance on the part of the programmer" really rankle with me. How hard is it to compile the regex with the '/x' modifier (or with RegexOptions.IgnorePatternWhitespace in .NET) and write your regex like this:

^

# the first part of the email, we let it accept ',' because pointy haired 
# boss changed his name to 'wile e. coyote, genius' by deed poll and insisted 
# that we set 'wile e. coyote, genius@100%paradigmsynergies.com' up despite telling
# him it could never work. the upshot was that he praised us for our initiative in 
# reducing the amount of spam he receives.
[0-9A-z, \$\.]+

# fix by Maintenance Q. Programmer, Esq.:
# the cfo demanded we extend the validation so his email 
# john.citizen.the.greatest.cfo.ever@100%paradigmsynergies.com would be accepted.
# thank heavens this was so well documented otherwise I might never have had the 
# courage to pick up a manual on regular expressions and instead spent my time on
# message boards pissing and moaning about how hard they are to read.
(\.[0-9A-z]+)*

# for all of those lazy bum programmers out there who are too lazy to bother 
# learning regular expression syntax, this doesn't do anything fancy at all. it just
# matches an 'at' symbol.
@

# matches the domain name portion of the email address, although not very well. 
# Needs to accept the percent sign otherwise our company domain name won't work.
([0-9A-z%]+\.)*

# nobody yet knows what this bit does. if you work it out, drop me a line at 
# jed@100%paradigmsynergies.com
[0-9A-z]+

$

So there you have it. It's full of idiotic comments, thinly veiled insults, general silliness and a cameo appearance by an old friend, but as you can clearly see, Mr. Maintenance Q. Programmer, Esq. didn't have too much trouble working out what was going on and successfully made his change.

2007-02-19

Several people have said that as the standard for email addresses is recursive then there is no way to write a regular expression for it. Given that email addresses have a maximum length, can a regexp be used even though the standard is recursive? For example, there can only be a maximum of 127 full stops in the domain part.
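For what it's worth, once the nesting depth is bounded, a regex can be generated mechanically, one layer at a time. Here is a rough Python sketch. The function name `comment_pattern` is made up, and it only handles parenthesised comments with backslash escapes, not the whole address grammar. The pattern grows with every extra level, which is presumably why published regexes stop at two.

```python
import re


def comment_pattern(max_depth: int) -> str:
    """Build a regex for comments nested at most `max_depth` levels deep."""
    if max_depth < 1:
        raise ValueError("max_depth must be at least 1")
    # Plain comment text: any non-paren, non-backslash character,
    # or a backslash followed by any character.
    ctext = r"[^()\\]|\\."
    pattern = rf"\((?:{ctext})*\)"
    for _ in range(max_depth - 1):
        # Wrap one more level around whatever we have so far.
        pattern = rf"\((?:{ctext}|{pattern})*\)"
    return pattern


two_levels = re.compile(comment_pattern(2))
three_levels = re.compile(comment_pattern(3))
```

With these, `two_levels` accepts `(a(b)c)` but rejects `(a(b(c)))`, while `three_levels` accepts both.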

2007-02-19

thrashaholic:
Exceptions (as the name implies) shouldn't be used to control the flow of the normal happenings of your code. If you're doing this, you have a fundamental misunderstanding of exception handling. Exceptions are to be used for EXCEPTIONAL CASES that you can not plan for.

A malformed E-mail address IS an exceptional case, albeit one that you can plan for. When asked to provide their E-mail address, I suspect most users will enter it correctly.

Let's say that you do something like this:

if (!formattedCorrectly(address)) {
    print(error_message);
} else {
    sendEmailTo(address);
}

You're checking the E-mail address before you send the message, which at first glance seems a logical approach.

But that means that either (A) sendEmailTo() doesn't check the address itself, in which case it's accepting on faith that its input is valid (not generally a safe programming practice), or (B) sendEmailTo() also calls formattedCorrectly(), in which case the address is getting checked twice, which is redundant; this may very well offset any "cost" of try/catch.
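
The alternative being argued for can be sketched like this: sendEmailTo() checks its own input once and throws, and the caller just catches. This is only an illustration. `Mailer`, `InvalidAddressException` and `trySend` are hypothetical names, the `formattedCorrectly()` body is a placeholder check, and the "sending" is a stand-in that records the address in a list.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical checked exception for a malformed address.
final class InvalidAddressException extends Exception {
    InvalidAddressException(String address) {
        super("Malformed address: " + address);
    }
}

final class Mailer {
    final List<String> sent = new ArrayList<>();

    // Placeholder check (an assumption): non-empty text on both sides of a single '@'.
    static boolean formattedCorrectly(String address) {
        int at = address.indexOf('@');
        return at > 0 && at == address.lastIndexOf('@') && at < address.length() - 1;
    }

    // Validates its own input, so callers do not need a separate pre-check.
    void sendEmailTo(String address) throws InvalidAddressException {
        if (!formattedCorrectly(address)) {
            throw new InvalidAddressException(address);
        }
        sent.add(address); // stand-in for handing the message to a real mail transport
    }

    // Caller side: the address is checked exactly once, inside sendEmailTo().
    String trySend(String address) {
        try {
            sendEmailTo(address);
            return "sent";
        } catch (InvalidAddressException ex) {
            return ex.getMessage();
        }
    }
}
```

Here `trySend("a@b.com")` returns "sent", and `trySend("nope")` returns the exception message without anything being sent.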

davidh

2007-02-19 Reply Admin

Janek:
This is how I do it in Java
try{ InternetAddress foo = new InternetAddress(emailCandidate); } catch (AddressException ex) { return false; } return true;

I beg your pardon?

there are no classes 'InternetAddress' and 'AddressException' that I know of in the Java standard libraries.

there is a class 'InetAddress' with two subclasses 'Inet4Address' and 'Inet6Address' (for obvious reasons), but these are only usable for IP addresses, not for the full mail address scheme.

if these should be home-grown utility classes (and you do have control over it), it would be preferable to have a boolean 'isValid()' method in lieu of having to use exception handling for the control flow.

2007-02-19 Reply Admin

.....you know sometimes the reason it's not a big deal is because it isn't.......Mail servers validate email addresses.....users validate addresses by receiving the mail....why don't we build regexes to validate users' first names or middle initials....jesus....try to make sure it conforms > (@) TLD (uk.com, ws, biz, com, net, org.uk) but you can never be 100% sure, so move on, it's really not worth the hassle!

AssimilatedByBorg · 2007-02-19 Reply Admin

woohoo:
there are no classes 'InternetAddress' and 'AddressException' that I know of in the Java standard libraries.

javax.mail.internet.InternetAddress, found in J2EE libraries.

It's not standard in the sense of, "it's not J2 Standard Edition", but otherwise as close to standard as it gets :)

2007-02-19 Reply Admin

woohoo:
I beg your pardon?
there are no classes 'InternetAddress' and 'AddressException' that I know of in the Java standard libraries.

there is a class 'InetAddress' with two subclasses 'Inet4Address' and 'Inet6Address' (for obvious reasons), but these are only usable for IP addresses, not for the full mail address scheme.

if these should be home-grown utility classes (and you do have control over it), it would be preferable to have a boolean 'isValid()' method in lieu of having to use exception handling for the control flow.

They're both in JavaMail (specifically, javax.mail.internet).

2007-02-19 Reply Admin

And I should have added that there is a validate() method.

...But it's a void, that throws an exception if invalid.
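
If a boolean is what you want, a small adapter can turn any "void that throws" check into one. This is a sketch, not part of JavaMail: `Validators`, `ThrowingCheck` and `passes` are hypothetical names, and catching every `Exception` is a deliberate simplification.

```java
final class Validators {
    // A check that signals failure by throwing instead of returning a value.
    @FunctionalInterface
    interface ThrowingCheck {
        void run() throws Exception;
    }

    // Returns true if the check completes normally, false if it throws any exception.
    static boolean passes(ThrowingCheck check) {
        try {
            check.run();
            return true;
        } catch (Exception ex) {
            return false;
        }
    }
}
```

Assuming JavaMail is on the classpath, the call would look something like `Validators.passes(() -> new InternetAddress(emailCandidate).validate())`. The exception handling is still there, of course, just tucked away in one place.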

:\

2007-02-19 Reply Admin

LizardKing:
Hmm, email address validation is a nasty one. I remember trying to validate by doing lookup on the hostname portion, only to get scuppered by mail servers that don't resolve but are valid. I forget the details as this was many aeons ago, however a more experienced colleague pointed me at some RFCs (and would have probably submitted my code as a WTF if this site had been around).

I don't care about stupid regexes. I don't accept mail from hostnames that don't exist. I also reject mail from hosts that don't use fully-qualified domain names with the HELO, where the HELO FQDN doesn't resolve, and where the sender domain doesn't exist. If they don't want to ensure their mail can be bounced correctly if need be (much less replied to normally), I don't have to accept it. And putting in those simple rules has reduced our spam by 90% (giving the anti-spam engine a bit of a rest).
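
Those rules can be sketched roughly as follows. This is an illustration only, not how any particular mail server implements them: `SenderPolicy`, `looksFullyQualified` and `rejectionReason` are hypothetical names, the FQDN test is deliberately crude, and the DNS lookup is passed in as a predicate so the logic can be exercised without a network.

```java
import java.util.function.Predicate;

final class SenderPolicy {
    private final Predicate<String> resolves; // true if the name can be found in DNS

    SenderPolicy(Predicate<String> resolves) {
        this.resolves = resolves;
    }

    // Crude FQDN test (an assumption): at least one dot, no empty labels, no trailing dot.
    static boolean looksFullyQualified(String host) {
        if (host.isEmpty() || host.startsWith(".") || host.endsWith(".")) {
            return false;
        }
        return host.contains(".") && !host.contains("..");
    }

    // Returns null to accept the mail, or a short reason for rejecting it.
    String rejectionReason(String heloName, String senderAddress) {
        if (!looksFullyQualified(heloName)) {
            return "HELO name is not fully qualified";
        }
        if (!resolves.test(heloName)) {
            return "HELO name does not resolve";
        }
        int at = senderAddress.lastIndexOf('@');
        if (at < 0 || at == senderAddress.length() - 1) {
            return "sender has no domain";
        }
        if (!resolves.test(senderAddress.substring(at + 1))) {
            return "sender domain does not exist";
        }
        return null;
    }
}
```

In a real setup the predicate would wrap an actual DNS query, and in practice these checks are normally configured in the mail server itself rather than written by hand.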

2007-02-19 Reply Admin

Bat:
Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.
Some people, when faced with a regular expression, think "I know, I'll use Jamie Zawinski as an excuse and cherish my ignorance". Now they've got an infinite number of problems.

Every time someone thinks about quoting Jamie Zawinski, their computer should generate a cock-shaped sound wave and plunge it repeatedly through their skulls.

(PS Does anyone else find it funny that the quote about skins was never said by jwz?)

Validating Email Addresses
