Admin
This is how I do it in Java
try {
    InternetAddress foo = new InternetAddress(emailCandidate);
} catch (AddressException ex) {
    return false;
}
return true;
Admin
Hahaha! Nice.
Mol is right, validation should eliminate the basic errors in entering an email. If the user does not reply with a confirmation, too bad.
Admin
About the only thing I ever bothered to check for in email validating regexps is that the email address won't trigger some weird kind of non-mailbox delivery on the local host (e.g. "/dev/sda@localhost" or "|/bin/sh@[127.0.0.1]"). This was using /usr/sbin/sendmail to submit mail.
Now I open a SMTP connection to a relay server. Modern SMTP servers are already adequately hardened against malicious or merely uncooperative email addresses, so it's a waste of my time to duplicate these features.
I only check for the absence of the following characters:
CR, LF, NUL - explicitly prohibited by RFC
% ! - abused by spammers on open relays. Get a real email address if you are living behind one of these, you luddite.
/^[^\r\n\0>!%]+@[^\r\n\0>!%]+$/os
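A rough Python equivalent of the pattern above, as a sketch only. The function name `passes_character_check` is made up for illustration. `fullmatch()` is used instead of `^...$` because `$` also matches just before a trailing newline, which would let `"a@b\n"` through.

```python
import re

# Same character classes as the Perl pattern above: no CR, LF, NUL, '>', '!'
# or '%' on either side of an '@'. Nothing else about the address is checked.
_LOOSE_ADDRESS = re.compile(r"[^\r\n\x00>!%]+@[^\r\n\x00>!%]+")


def passes_character_check(address: str) -> bool:
    """Return True if the whole string fits the loose pattern (hypothetical helper)."""
    return _LOOSE_ADDRESS.fullmatch(address) is not None
```

This is deliberately permissive: it only screens out characters that cause trouble downstream and leaves real validation to the SMTP server.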
Many MTAs don't support anything like full RFC822 email syntax, and only support "RCPT TO: <" string-of-some-characters ">" with various permitted values for the "string-of-some-characters", and various transformations done on the text if special characters like spaces or parens are used.
RFC822 was designed to allow any damn email address from any damn local email system to be encoded into an RFC822 email address. It's not feasible for me to validate addresses from some legacy email system that still runs on a PDP-10 somewhere, so I don't try.
If I'm required to determine the validity of the email address I'll send a token to the address and require the user to enter the token before I talk to them again. That tests not just the validity of the email address, but the reliability and availability of the whole return path to the requesting user and the user's willingness to cooperate with receiving mail at that address--a much more useful assertion.
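The token round trip described above could be sketched roughly like this in Python. The class `PendingConfirmations` and its in-memory dictionary are assumptions for illustration; actually mailing the token is left to the caller, and a real system would persist and expire tokens.

```python
import hmac
import secrets


class PendingConfirmations:
    """In-memory token store (hypothetical; not persisted, no expiry)."""

    def __init__(self):
        self._tokens = {}

    def issue(self, address):
        """Create a fresh random token for the address; the caller mails it out."""
        token = secrets.token_urlsafe(16)
        self._tokens[address] = token
        return token

    def confirm(self, address, presented):
        """Return True once, when the presented token matches the issued one."""
        expected = self._tokens.get(address)
        if expected is None:
            return False
        # Constant-time comparison on bytes, so arbitrary user input is safe.
        if not hmac.compare_digest(expected.encode(), presented.encode()):
            return False
        del self._tokens[address]
        return True
```

A user who never enters the token simply stays unconfirmed, which covers malformed, mistyped and unreachable addresses in one step.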
Admin
For those who haven't tried it, there are four cases:
It actually works, RCPT returns OK if the address is valid and an error otherwise.
It's totally broken, RCPT returns an error if the address is valid and OK otherwise. People with this kind of mail host don't get much mail. This breed is rare but not extinct.
The remote host graylists all SMTP hosts that contact it for the first time, in which case you'll get an RCPT temporary error on the first connection, then a correct answer when you retry between two minutes and 24 hours later.
The remote host is not the final destination host but a gateway without access to a database of local addresses for validation, so it says OK to all RCPT commands. Some time later a bounce message will be generated for the invalid ones and sent to the envelope sender address. This kind of host is really damn annoying and I get hundreds of messages from them every day bouncing messages containing Windows viruses because the message had my email address as the sender.
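A sketch of how a caller might sort RCPT replies into those cases, in Python. The names `classify_rcpt_reply` and `probe_recipient` are hypothetical; only the standard `smtplib` calls are real. Note that an "accepted" reply cannot distinguish a valid mailbox from an accept-everything gateway, and the inverted-answer host cannot be detected at all.

```python
import smtplib


def classify_rcpt_reply(code):
    """Map an SMTP reply code for RCPT TO onto a coarse outcome."""
    if 200 <= code < 300:
        return "accepted"     # valid mailbox, or a gateway that says OK to everything
    if 400 <= code < 500:
        return "retry-later"  # temporary failure, e.g. greylisting
    if 500 <= code < 600:
        return "rejected"     # permanent failure
    return "unknown"


def probe_recipient(mx_host, sender, recipient):
    """Ask one mail host about one recipient without sending a message."""
    with smtplib.SMTP(mx_host, 25, timeout=30) as smtp:
        smtp.ehlo_or_helo_if_needed()
        smtp.mail(sender)
        code, _message = smtp.rcpt(recipient)
    return classify_rcpt_reply(code)
```

A "retry-later" result only means something after a second attempt some minutes or hours later, so a probe like this cannot give an immediate yes or no.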
Admin
It pains me to see you post this so soon after my .@._ comment. Your check would think that @@@.@ is a valid email address.
Admin
The utterly insane regex listed earlier in this thread is actually NOT 100% RFC822-compliant. It takes a shortcut by placing an arbitrary restriction on the nesting depth of the comments.
A regular expression actually CANNOT validate an E-mail address according to RFC822. The language described in RFC822 is recursive and cannot be normalized to an iterative description. If you can't normalize it like this (that is, if there's no way to write the language in such a way that you never have to refer to a symbol that hasn't been defined yet and you never have a rule that refers to itself) then it is, technically, impossible to construct a regular expression for it.
That said, there's no value in validating an address against the full force of RFC822, as discussed earlier in this thread; not many MTAs -- and even fewer desktop mail applications -- conform to the full scope of the "requirements" and only implement the most commonly used subset.
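To illustrate why nesting is the sticking point: checking that parenthesised comments balance needs a counter that can grow without bound, which a plain regular expression does not have. A minimal Python sketch, assuming a hypothetical helper `comments_are_balanced` that ignores quoted strings and every other part of the grammar:

```python
def comments_are_balanced(text):
    """Check that '(' ... ')' comments nest correctly, to any depth."""
    depth = 0
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and depth > 0:
            i += 2  # inside a comment, a backslash quotes the next character
            continue
        if ch == "(":
            depth += 1
        elif ch == ")":
            if depth == 0:
                return False  # closing paren with no open comment
            depth -= 1
        i += 1
    return depth == 0
```

The `depth` variable is exactly the piece of state a regex with a fixed nesting limit has to approximate by repeating itself.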
Admin
Regular expressions are maintained by throwing them away and writing new ones. You do this when they don't work. When they do, just leave them alone.
If you don't know whether a given regular expression works, it doesn't.
Admin
From the VMWare converter registration page:
function isValidEmail(str) {
}
The frustrating thing is that I sent them an e-mail a while back to complain about their old e-mail 'validator', and they changed it to something more sensible, but have now regressed to this which is even worse than the original.
Admin
Clearly this is not an appropriate use of regular expressions at all.
See the following articles:
http://blogs.msdn.com/oldnewthing/archive/2006/05/22/603788.aspx
http://blogs.msdn.com/larryosterman/archive/2005/01/07/348548.aspx
http://blogs.msdn.com/larryosterman/archive/2005/01/10/350135.aspx
Admin
MMMMM...using exceptions to handle expected program flow. Tasty. Not sure if Java would allow you to do that another way without using exceptions, but that is really bad form.
Now if you'll excuse me I'm going to go hang myself.
Admin
This is one of the many reasons why I believe Java developers are evil and must be stopped.
Oh, did I say "evil"? I meant "stupid".
Admin
As far as I understand, the following Regexp will perfectly work, too:
([^\x00-\x20\x22\x28\x29\x2c\x2e\x3a-\x3c\x3e\x40\x5b-\x5d\x7f-\xff]+|\x22([^\x0d\x22\x5c\x80-\xff]|\x5c[\x00-\x7f])\x22)(\x2e([^\x00-\x20\x22\x28\x29\x2c\x2e\x3a-\x3c\x3e\x40\x5b-\x5d\x7f-\xff]+|\x22([^\x0d\x22\x5c\x80-\xff]|\x5c[\x00-\x7f])\x22))\x40([^\x00-\x20\x22\x28\x29\x2c\x2e\x3a-\x3c\x3e\x40\x5b-\x5d\x7f-\xff]+|\x5b([^\x0d\x5b-\x5d\x80-\xff]|\x5c[\x00-\x7f])\x5d)(\x2e([^\x00-\x20\x22\x28\x29\x2c\x2e\x3a-\x3c\x3e\x40\x5b-\x5d\x7f-\xff]+|\x5b([^\x0d\x5b-\x5d\x80-\xff]|\x5c[\x00-\x7f])\x5d))
It is an absolute hell to maintain but since the standards probably won't change in the next few hundred years (why should they?) I find it quite acceptable.
Admin
... and of course, the real comedy here is that Igor's masterpiece code is actually better than the regex - there is some hope of maintaining it.
Admin
New improvements to the Perl5 regexp engine allow us to write a much easier to read regexp to validate email addresses. Courtesy of Abigail:
use 5.9.5; # In fact, you need the newest blead.
my $email_address = qr {
  (?(DEFINE)
    (?<addr_spec>      (?&local_part) @ (?&domain))
    (?<local_part>     (?&dot_atom) | (?&quoted_string))
    (?<domain>         (?&dot_atom) | (?&domain_literal))
    (?<domain_literal> (?&CFWS)? \[ (?: (?&FWS)? (?&dcontent))* (?&FWS)? \] (?&CFWS)?)
    (?<dcontent>       (?&dtext) | (?&quoted_pair))
    (?<dtext>          (?&NO_WS_CTL) | [\x21-\x5a\x5e-\x7e])
}x;
http://groups.google.com/group/comp.lang.perl.misc/browse_thread/thread/221ac5c8159f5ef4?hl=en
Admin
It matters not? If it's used correctly, it matters the world! Having generated parts of code is fine as long as you don't have to modify the generated parts. It's done all the time... witness Yacc and Lex. Heck, witness a compiler. Do you think that convoluted but terribly efficient assembly code produced by a compiler is bad because the fact that it was compiled matters not?
If all it does is plop out a RegEx and wrap "isvalid(email_address)" around it, then this is a perfectly valid approach.
(Of course, if it goes into a file that needs to be modified, then you can no longer regenerate the file if you make changes/fix bugs/find a better way/etc. without losing those changes. Then it's just as bad as if a human wrote it.)
Admin
It won't. There's no "TryParse" equivalent to this. You shouldn't design code to use exceptions to handle expected conditions, but occasionally when there is an impedance mismatch like this with a library routine, it is necessary.
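One way to contain that mismatch is to confine the try/catch to a single small adapter, so callers only ever see a boolean. A Python sketch of the pattern; `parse_address` is a made-up stand-in for a library routine that raises on bad input, not a real API:

```python
def parse_address(text):
    """Hypothetical stand-in for a library parser that raises on bad input."""
    local, sep, domain = text.partition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not an address: {text!r}")
    return local, domain


def is_parseable(text):
    """Boolean adapter: the only place that catches the parser's exception."""
    try:
        parse_address(text)
    except ValueError:
        return False
    return True
```

The rest of the code can then branch with an ordinary `if is_parseable(...)`.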
Good riddance.
Admin
Yes, using a standard library routine to validate an email address, as opposed to writing your own unreadable, unmaintainable, and broken regular expression to do it, is both evil and stupid. Idiot.
Admin
I would guess that the piece you were missing is the idea of an MX (mail exchanger) record. You can have a domain, such as email-handled-elsewhere.com, and it has an MX record in DNS of mail-handler.com (or several records, with several different hosts), and the mail goes not to the server mentioned in the email address, but to (one of) the one(s) in the MX record(s).
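As a small illustration in Python: given MX records that have already been looked up, the sending side tries the hosts in order of preference, lowest number first. The record list and the backup host name below are invented, and the DNS lookup itself is not shown because it needs a resolver library.

```python
def mail_domain(address):
    """Return the part of the address after the last '@'."""
    return address.rpartition("@")[2]


def delivery_order(mx_records):
    """Order (preference, host) pairs so the lowest preference comes first.

    Ties are broken alphabetically here for simplicity; real MTAs typically
    pick randomly among hosts with equal preference.
    """
    return [host for _preference, host in sorted(mx_records)]


# Hypothetical records for email-handled-elsewhere.com:
records = [(20, "backup.mail-handler.com"), (10, "mail-handler.com")]
```

With these records, mail for someone@email-handled-elsewhere.com would be offered to mail-handler.com first and to the backup only if that fails.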
Admin
Using anything coded in Java is evil and stupid. You whine because you can't read machine-generated code and then you load an entire virtual machine to make sure an e-mail address meets the approval of a library you've never read.
Let us know when you learn to code.
Admin
That's what I feared. However I see code like this used far too many times (when it shouldn't) to simply keep quiet about it. I honestly could not care less about the general WTF-ness of Java's standard libs (because I f-in hate Java), but seeing code like that makes me want to vomit. As it should.
Wow, mature there sir, real mature. Your momma's calling, she said you need to clean up the basement after you're done playing WoW.
Admin
You don't need to escape '.' inside a character class. But yes, there are many different ways to kinda validate an email address.
I wish they'd stop posting stories like this on TDWTF because every time they do, the usual furor erupts with 99% of people not being aware how complex the RFC actually is, or what a monstrous regex it takes to meet it.
Then a wise soul points out that validating the formatting of an email address isn't really doing anything in the first place, and we all end up right where we started.
Admin
Regexes are not bad. Just because people don't make use of the /x modifier and comments doesn't make regexes themselves evil. They are incredibly useful, and those that think they add an extra problem to the task at hand just haven't bothered to learn how to wield the power of regexes.
If you find regexes confusing or difficult you need to read Friedl's book. If you still have trouble then you are not a geek and should not be programming.
Admin
and here's another
function validateEMail(elm, bRequired) {
  if (arguments.length == 1) bRequired = true;
  var emailstr = elm.value;
  if (emailstr.length == 0 && !bRequired) return(true);
  var emailPat=/^(.+)@(.+)$/
  var specialChars="\\(\\)<>@,;:\\\\\\\"\\.\\[\\]";
  var validChars="[^\\s" + specialChars + "]";
  var firstChars=validChars;
  var quotedUser="(\"[^\"]*\")";
  var ipDomainPat=/^\[(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})\]$/;
  var atom="(" + firstChars + validChars + "*" + ")";
  var word="(" + atom + "|" + quotedUser + ")";
  var userPat=new RegExp("^" + word + "(\\." + word + ")*$");
  var domainPat=new RegExp("^" + atom + "(\\." + atom +")*$");
Admin
The code is readable, maintainable, and acceptably efficient. And those are the criteria for determining what is good code, not whether or not it meets some arbitrary idea of "purity" that has no useful rationale or general applicability in the real world.
Admin
Try/Catch is expensive. (Severity of the expense depends on language/compiler/platform/VM/etc..) They should not be used in places where a simple If statement would suffice. That's all the reason I need right there.
The other reasons are mostly philosophical, if not accepted "form". Exceptions (as the name implies) shouldn't be used to control the flow of the normal happenings of your code. If you're doing this, you have a fundamental misunderstanding of exception handling. Exceptions are to be used for EXCEPTIONAL CASES that you can not plan for.
If you actually code the logic for something happening, you probably shouldn't use an exception to control the flow of said logic. "Is something null?" "Is this variable actually instantiated?", etc..are all cases where IF (WHATEVER) is a lot cheaper and vastly more "proper" than using exceptions to branch.
Here is a thesis you might find interesting: http://www.mortench.net/archive/eh.pdf
This is a higher-level blog post by the author of the thesis linked above.
Some of the more accepted Exception "Best Practices" include:
"Don't use exceptions to indicate absence of a resource"
"Don't use exception handling as means of returning information from a method"
"Use exceptions for errors that should not be ignored"
Etc..etc..etc..indeed, the most important rule for exception handling is: Don't do it.
(People who know WTF they're doing will understand that)
Admin
"@" better known as: "a human or elf"
Admin
So it's the name? What if the language called them something else? For instance, Common Lisp has something similar it calls conditions. Would it be appropriate to signal a condition?
And what's exceptional? Is a network connection going down exceptional and a good place for an exception? Why not malformed user input? Both will happen from time to time, it's just a difference of degree. If something happens 1 in every 100 requests is it exceptional, or does it need to be 1 in 1000?
Hmm, what about C++'s bad_alloc exception, Java's OutOfMemoryError, or .Net's OutOfMemoryException? Are those aspects of those languages poorly designed? Or should memory be treated differently than other resources?
Exceptions ALWAYS return information from a method, specifically "this method could not complete normally." If he means as a substitute for return, yes.
I can understand a dislike for exceptions as a whole, but I don't see how you can approve of some uses and yet remain limited enough to think that malformed addresses is an inappropriate use. (At least absent profiling information that tells you so.)
Admin
Since the RFC allows comments to be indefinitely deeply nested, there can't be a single REGEX to work on every single valid email address. Even the 6598 byte long REGEX in Appendix B of Mastering Regular Expressions assumes zero spaces and no hash marks, and only allows for doubly nested comments.
Admin
In the case we're speaking about, probably not. I'm not familiar with CL, however.
The network going down is a prime case for exceptions. Malformed user input is not. IMO.
Of course not. Being out of memory is an exceptional condition. However, a variable being null is not. In C#, would you do :
try {
    object thrash = Request.QueryString["something"];
    // do branch 1
} catch (Exception e) {
    // do branch 2
}
or you would rather see:
if (Request.QueryString["something"] != null) {
    // do branch 1
} else {
    // do branch 2
}
?
These are the types of instances I'm speaking of. (And I, unfortunately, see a lot) Having a network connection unavailable, or being OOM are exception cases. Examples like above, are not.
Of course that's what's meant here. Seeing try { crap } catch() { return false; } return true; makes my skin crawl.
If it's something that you can check for with a minimal amount of conditional code, then it's not a good case for exceptions. If it's something that no sane amount of conditional branching could ever solve, then a try...catch block is appropriate.
Admin
Why do wonks always bring up the RFC's utter insanity whenever email comes up? It's 2007, not 1987. It's hard to find a specific place to draw the line, but it should have been done for RFC 2822. If at least 20% of MTAs online can't actually transfer your message, it's not an email address for all practical purposes, and in your case it's more like 99%.
The whole RFC is just an exercise in what goes wrong when standards are designed around including everyone and not forcing anyone to change, and practices that are long gone or were never used at all are left in because they sound cool.
BTW, why are you all referencing 822? 2822 obsoleted it.
Admin
Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.
Some people, when faced with a regular expression, think "I know, I'll use Jamie Zawinski as an excuse and cherish my ignorance". Now they've got an infinite number of problems.
Admin
I believe it's because all TLDs consist of 2, 3 or 4 characters, and so "se" would be considered a TLD. Is it? I dunno. Even if it were one, you can't just email that directly; it would be like emailing "me@com".
Admin
I think you're stricter than I am about when exceptions are appropriate. ;-)
I don't think I'd explicitly throw an exception on malformed input, but it depends on the context, and it's not on its face something that I would consider bad style.
Gotcha. But on the other hand, there are plenty of times when not having a resource available IS a fine time for an exception, so I don't think that saying "Don't use exceptions to indicate absence of a resource" is silly advice.
The one exception (hah) to that I would note is if you're at a module boundary and changing from exceptions to a return code scheme or something like that.
Admin
Regex to support: RFC 2822 (Internet Message Format), RFC 2821 (SMTP), RFC 1123 (Requirements for Internet Hosts -- Application and Support)
Supports MSIE JavaScript client side .net validation. Also supports server side validation for other browsers.
^((([\t\x20][!#-'*+-/-9=?A-Z^-~]+[\t\x20]|"[\x01-\x09\x0B\x0C\x0E-\x21\x23-\x5B\x5D-\x7F]")+)?[\t\x20]<(([\t\x20][!#-'*+-/-9=?A-Z^-~]+(.[!#-'*+-/-9=?A-Z^-~]+)|"[\x01-\x09\x0B\x0C\x0E-\x21\x23-\x5B\x5D-\x7F]"))@(((a-zA-Z0-9.)+[a-zA-Z]{2,}|[(([0-9]?[0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5]).){3}([0-9]?[0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])]))>[\t\x20]|(([\t\x20][!#-'*+-/-9=?A-Z^-~]+(.[!#-'*+-/-9=?A-Z^-~]+)|"[\x01-\x09\x0B\x0C\x0E-\x21\x23-\x5B\x5D-\x7F]"))@(((a-zA-Z0-9*.)+[a-zA-Z]{2,}|[(([0-9]?[0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5]).){3}([0-9]?[0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])])))$
Admin
I had a friend who met the "owner" of the .fr TLD. Despite some pretty consistent cajoling, he could not convince him to let him receive email as reed@fr.
Admin
"It's done all the time... witness Yacc and Lex."
Actually, I think that would be the easiest way to accomplish the goal of validating an RFC-compliant address. Maintenance would be fairly easy with the original source files.
As others have pointed out, the goal may be a stupid one, but that's how I'd initially go about tackling it.
Admin
Here's an idea...
get them to type it in, and take their word for it, or if that's too trusting for you, send them a confirmation email...
FIXED!
Admin
Sigh
I wish I had a nickel for every time someone asked me to write code to validate email addresses, and thought it was simple.
I gently try to explain the incredible formats of addresses that are actually valid, and that eventually, they will really annoy somebody by using their restrictive idea of an email address.
The smart ones understand that.
It's the other kind of people that I don't know what to do with. The kind of people who ask, "why did your validation routine let through an email address with a typo in it?" (Seriously. This has happened.)
Admin
The answer is rather simple: Your domain name only has one part to it. As I understand RFC 921, domains with only one part are assumed to be in the .arpa TLD.
Admin
Honestly, comments like "regexes are write once/read never" and "regular expressions are an exercise in arrogance on the part of the programmer" really rankle with me. How hard is it to compile the regex with the '/x' modifier (or with RegexOptions.IgnorePatternWhitespace in .NET) and write your regex like this:
So there you have it. It's full of idiotic comments, thinly veiled insults, general silliness and a cameo appearance by an old friend, but as you can clearly see, Mr. Maintenance Q. Programmer, Esq. didn't have too much trouble working out what was going on and successfully made his change.
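The same commenting style is available in Python through `re.VERBOSE`. The pattern below is a deliberately simplified sketch (nowhere near RFC-complete), and the name `SIMPLE_ADDRESS` is just for illustration; the point is only that each piece carries its own comment.

```python
import re

SIMPLE_ADDRESS = re.compile(
    r"""
    [^@\s]+              # local part: anything except '@' and whitespace
    @                    # the separator
    [^@\s.]+             # first domain label
    (?: \. [^@\s.]+ )*   # further labels, each introduced by a dot
    """,
    re.VERBOSE,
)


def looks_like_address(text):
    """Return True if the whole string fits the simplified pattern."""
    return SIMPLE_ADDRESS.fullmatch(text) is not None
```

Changing one rule, say allowing only letters in the last label, means editing one commented line rather than decoding the whole expression.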
Admin
Several people have said that as the standard for email addresses is recursive then there is no way to write a regular expression for it. Given that email addresses have a maximum length, can a regexp be used even though the standard is recursive? For example, there can only be a maximum of 127 full stops in the domain part.
Admin
Let's say that you do something like this:
if (!formattedCorrectly(address)) {
    print(error_message);
} else {
    sendEmailTo(address);
}
You're checking the E-mail address before you send the message, which at first glance seems a logical approach.
But that means that either (A) sendEmailTo() doesn't check the address itself, in which case it's accepting on faith that its input is valid (not generally a safe programming practice), or (B) sendEmailTo() also calls formattedCorrectly(), in which case the address is getting checked twice, which is redundant; this may very well offset any "cost" of try/catch.
davidh
Admin
I beg your pardon?
There are no classes 'InternetAddress' and 'AddressException' that I know of in the Java standard libraries.
There is a class 'InetAddress' with two subclasses 'Inet4Address' and 'Inet6Address' (for obvious reasons), but these are only usable for IP addresses, not for the full mail address scheme.
If these are home-grown utility classes (and you do have control over them), it would be preferable to have a boolean 'isValid()' method in lieu of having to use exception handling for the control flow.
Admin
...you know, sometimes the reason it's not a big deal is because it isn't... Mail servers validate email addresses... users validate addresses by receiving mail... why don't we build regexes to validate users' first names or middle initials... jesus... try to make sure it conforms > (@) TLD (uk.com, ws, biz, com, net, org.uk) but you can never be 100% so move on, it's really not worth the hassle!
Admin
javax.mail.internet.InternetAddress, found in J2EE libraries.
It's not standard in the sense of, "it's not J2 Standard Edition", but otherwise as close to standard as it gets :)
Admin
They're both in JavaMail (specifically, javax.mail.internet).
Admin
And I should have added that there is a validate() method.
...But it's a void, that throws an exception if invalid.
:\
Admin
I don't care about stupid regexes. I don't accept mail from hostnames that don't exist. I also reject mail from hosts that don't use fully-qualified domain names with the helo, where the helo FQDN doesn't resolve, and where the sender domain doesn't exist. If they don't want to ensure their mail can be bounced correctly if need be (much less replied to normally), I don't have to accept it. And putting in those simple rules has reduced our spam by 90% (giving the anti-spam engine a bit of a rest).
Admin
(PS Does anyone else find it funny that the quote about skins was never said by jwz?)