Admin
I made no mention of typing things in from a written list, and nor did the post I was responding to. In fact, that would count as "immediate user input". I'm talking about iterating over addresses in a list that is already in the computer. Perhaps in an XML file or CSV, or maybe an array of addresses randomly generated in memory.
However, since you ask - when things from a hard copy don't validate when turned into soft copy, the correct thing to do is normally to report it to a human, who can then choose the correct action by using their brains.
Admin
We need to return to the days of Compuserve, where ALL usernames were comprised of only numerals.
Admin
Don't forget the poor saps in the .museum domain that can never pass that 5 char TLD limit!
Admin
I use + all the time on one account that gets enormous numbers of spam.
I made a list of words to use after the plus and created a procmail filter to accept those combinations. For example, if the word list was maroon, acre, lightning, saturn, and piano, then the acceptable e-mail addresses for [email protected] would be [email protected], [email protected], [email protected], and [email protected]. I then would keep a copy of the list with me and if someone needed an address, I would give them the next one on the list.
Any e-mail coming in to those addresses was accepted, but if I started to get spammed at one combination, it would be a simple matter to reject any and all incoming e-mail to that address.
For e-mail coming in without the +something to [email protected], if it was encrypted with my PGP key, signed by the sender's PGP key, from a specific whitelist of individual addresses, or originating from anyone on the local network, the e-mail was delivered okay.
All other e-mail is dumped into a trash folder. Originally I automatically responded back with a message telling the sender what it would take for the e-mail to be delivered, but that never seemed to do any good.
The number of spams on that address went from 50 or more a day to 0.
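The accept/reject decision for the +word addresses above could be sketched roughly as follows. This is Python rather than procmail, and the tag list, the blocked set, and the function names are all made up for illustration; the real filter was a procmail recipe.

```python
# Hypothetical word list; the real list lived in a procmail recipe.
ACCEPTED_TAGS = {"maroon", "acre", "lightning", "saturn", "piano"}
# Tags that started attracting spam get moved here.
BLOCKED_TAGS = set()


def split_plus_address(address):
    """Return (user, tag, domain); tag is None when there is no '+' part."""
    local, _, domain = address.rpartition("@")
    user, plus, tag = local.partition("+")
    return user, (tag if plus else None), domain


def accept_tagged(address):
    """Accept only addresses whose +tag is on the list and not blocked."""
    _, tag, _ = split_plus_address(address)
    if tag is None:
        # Untagged mail is handled by the separate PGP/whitelist rules.
        return False
    tag = tag.lower()
    return tag in ACCEPTED_TAGS and tag not in BLOCKED_TAGS
```

Killing a spammed combination is then a one-line change: add the word to BLOCKED_TAGS and everything sent to that address is rejected.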
Admin
Wow, checking the link in the story and looking at the RFC 2822 email validation regex, I've confirmed that regex is dumb as shit and Perl programmers are even dumber.
Admin
or.. (hypothetically speaking of course), people could do things like: select password + 4 from auth_table where user_id = 'admin'. Then you'd get an error saying something like: "Error: could not convert hunter2 to integer" or whatever.. the point being that you want to try and cause a type mismatch to get the error message printed to the screen.
Admin
1. You can't confirm the email was received without access to the inbox.
2. You get your sender's domain flagged on an RBL.
3. Bandwidth waste.
You must be a regex and Perl fan.
Admin
Why does every site seem to want to do this these days (use the email address as the username)? Just over the last few months I've had several sites that had perfectly good login systems where I had chosen short, memorable usernames which then suffered a redesign in which they insisted on changing the login system to use my email address instead. I dislike that for a number of reasons, including that I store my myriad usernames/passwords in an iPhone app, and typing all-letter usernames like 'dtobias' is much simpler than typing e-mail addresses, which requires switching to the symbols touchscreen keyboard with the at sign in it.
Admin
TRWTF are all of the commenters who are confusing the validity of an email address with the ability to send an email to said address. Those are two separate and distinct functions that complement each other, but should be treated separately. In most real-world production systems it may be infeasible from a business perspective to waste the customer's valuable time trying to validate whether or not email can be sent to an email address; PHP, for example, does provide getmxrr() to test a domain for valid MX records. The problem is, with enough load and traffic, this will block your website until the function returns. So it generally is acceptable to only validate the format of an email, and worry about bounces on whatever system actually sends out emails (e.g., newsletters). This is why a lot of sites have adopted the paradigm of forcing a user to validate their account via email.
Admin
Not relevant, you can't do that with a regular expression either. Whether it is received is a different matter to whether it was sent.
By sending the email you have proven that the address is parseable enough to be able to send email to it. If the sender fails to send, you know that it is not parseable enough.
Surely this is better than proving that the address only has alphabetic characters with an "@" in the middle and a "." 3-4 characters from the right-hand edge. Such proof has very little to do with whether it is a valid email address or not.
If you use this method to validate a big list, without alerting the recipients, then set up some mechanism whereby the emails don't actually reach the recipients. Perhaps configure DNS so that the mail server sends everything to itself.
That will still be easier and more correct than writing a regular expression that accepts all valid email addresses and rejects all invalid ones (considering that even the enormous one that people have already linked to above requires you to preprocess the address before testing it with the regex).
If you are actually in charge of your mail server, then it may even be easier than writing a proper parser for emails, I'm not sure, but it's certainly more DRY.
A) That depends on your definition of waste. I'm sure that for most applications, the bandwidth required to do it this way is much cheaper (in money) than the developer time involved in creating an effective email address validator.
B) If you actually only have a small, finite amount of bandwidth and you absolutely must do the validation without using any, then see my response to 2, above.
Admin
You ought to use an RFC-compliant fake address like [email protected] or [email protected], instead of things like you mentioned which might be somebody's actual address.
Admin
If you are collecting an e-mail address for the sake of having an e-mail address and do not intend on ever sending an e-mail to that address there's no reason to validate. It's a waste of resources. Hell, if you're never going to use it, why even ask for it?
Admin
I used this code for a site I was managing: http://www.dominicsayers.com/isemail
It reduced the number of trouble calls due to invalid email addresses significantly once it was implemented. What I especially liked was the fact that it did an MX lookup on the domain to make sure it was valid as the final step. That catches all the hotmal.com, yhaoo.com, etc. that would pass normal validation.
Of course they can still misspell the first part of their address, since we can't validate that, BUT we have them enter the email address twice and error if the two differ to try and mitigate that as well :)
Admin
E-mail validation, like any other validation, is meant to help the user, not to slap him on the wrist when he makes a mistake.
I think most errors in users' e-mail addresses are simple typos like [email protected] instead of [email protected], and the mighty 100-line regex doesn't help against that.
Also, if you need a valid e-mail address from a user, send a confirmation mail; this way you can be sure the user double-checks what he enters.
If you don't want bogus data in your database, don't ask for data which is going to be bogus most of the time. Like if you make 'hobbies' a required field, you can bet it's going to be something like "adsf" most of the time.
Admin
Just for recording
Guy posting image saying "First Post" is now annoying me lots.
Admin
I guarantee you that I've seen that CodeSOD on a couple of sites. I have the domain elite-systems.org registered, but when I came across that I ended up having to register the domain elitesystems.org as well and set up an alias.
Admin
There's something called "double opt-in" that's very useful if you're trying to maintain a mailing list.
Here is how it works: the user subscribes (via website or email) to your mailing list. The server sends back a confirmation message, and the user simply hits reply and send. This validates the email path between the server and client (if there's an RBL in progress, the user won't get the confirmation email and the server won't bother sending emails to a blackholed address). This also confirms that the user WANTS the email. Perhaps the user made a typo and the email is going to someone else's account. All that person has to do is ... nothing. Or hit delete. And they won't be bothered again. Hence, double opt-in: the user opted in once, and then confirmed that yes, they really really really do want the email.
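The subscribe-then-confirm flow could be sketched roughly like this. It is only an in-memory illustration: the `pending` and `confirmed` stores and the `subscribe`/`confirm` names are invented here, and actually mailing the token to the address is left out.

```python
import hmac
import secrets

# Hypothetical in-memory stores; a real list manager would persist these.
pending = {}       # address -> confirmation token that was mailed out
confirmed = set()  # addresses that completed the second opt-in


def subscribe(address):
    """First opt-in: remember a random token to mail to the address."""
    token = secrets.token_urlsafe(16)
    pending[address] = token
    return token  # in practice this goes into the confirmation email


def confirm(address, token):
    """Second opt-in: only the holder of the mailed token can confirm."""
    expected = pending.get(address)
    if expected is None:
        return False
    # Constant-time comparison of the stored and supplied tokens.
    if not hmac.compare_digest(expected.encode("utf-8"), token.encode("utf-8")):
        return False
    del pending[address]
    confirmed.add(address)
    return True
```

If the address was a typo, whoever receives the message never calls `confirm`, so the address never reaches `confirmed` and gets no further mail.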
Yes, I have been "subscribed" to many email lists by those who don't check (probably spammers and the like). I write a procmail recipe to filter out those emails and delete them, because their unsubscribes don't work. Hell, I keep getting emails from British Telecom about some phone services. Never could figure out how to get access to the account so I could get some phone cards or upgrade the guy's bill or something.
I also get the occasional joe-job with someone using a whitelist. I do whatever it takes to get that email accepted because they obviously don't know about backscatter spamming. (I've always wanted to use a public wifi and the like to send a pile of emails to those addresses with a fake header leading back to RBL honeypots to get those whitelist domains blocked...).
Admin
Please make the bad code go away.
Admin
Even proper validators sometimes forbid '+', which pisses me off. Haven't they heard of plus addressing?
Admin
The way to code is to use proper escaping. (SQL can do this behind the scenes for you because the API has ways to pass statements and data as separate string arguments.)
The only place where you need to check for injection attacks is in your unit tests. If your unit tests use all the special characters that could be used to perform attacks, and verify that they are properly escaped and unescaped, then you are well protected against injection.
If OTOH you try to just forbid "invalid" characters at a higher level before saving the email address in your database, you will be rejecting perfectly valid email addresses.
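As a small sketch of passing the statement and the data as separate arguments, here is what that looks like with Python's built-in sqlite3 module. The `users` table and the `store_and_fetch` helper are invented for the example; the point is only that the address is never spliced into the SQL text.

```python
import sqlite3


def store_and_fetch(address):
    """Round-trip an address through a throwaway in-memory database."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE users (email TEXT NOT NULL)")
        # The '?' placeholder keeps the data out of the statement text.
        conn.execute("INSERT INTO users (email) VALUES (?)", (address,))
        row = conn.execute(
            "SELECT email FROM users WHERE email = ?", (address,)
        ).fetchone()
        count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
        return row[0], count
    finally:
        conn.close()
```

A unit test in the spirit described above would feed it something hostile-looking, such as an address containing quotes and a trailing `'; DROP TABLE users; --`, and check that exactly that string comes back and the table still holds one row.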
It may come as a surprise to many people, but the local part of an email address permits many characters. In fact the original spec permitted every single 7-bit ASCII character. Yes, all 128 of them, including the NUL character, which didn't even have to be escaped. Only CR, LF, ", and \ had to be escaped by putting a \ in front of them.
The wording in the spec about valid characters was: "any one of the 128 ASCII characters (no exceptions)"
A later update to the spec says that you should not define email addresses that require quoting. But you still have to support it for interoperability.
Admin
This is what I was referring to, and it's very task-specific.
If you want to confirm that the email address entered belongs to:
then sure, use the standard call-and-response stuff. If you just want to validate whether "[email protected]" is okay for possible future use, then all you have is syntax checking.
Admin
The real wtf is that a + is perfectly ok in e-mail addresses.
Admin
If you want to prove that the address belongs to a living person, should you demand a birth certificate? Is the short form OK?
Admin
...and RFC 2822 was superseded in October 2008 by 5322.
Cheers
Admin
You could at least do a Luhn checksum. It works for most cards.
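For anyone who hasn't seen it, a Luhn check is only a few lines. This is a minimal sketch (the function name is made up), and passing it only means the digits are self-consistent, not that the card exists.

```python
def luhn_valid(number):
    """Return True if the digit string passes the Luhn checksum."""
    cleaned = number.replace(" ", "").replace("-", "")
    if len(cleaned) < 2 or not (cleaned.isascii() and cleaned.isdigit()):
        return False
    total = 0
    # Walk from the rightmost digit; double every second digit.
    for index, char in enumerate(reversed(cleaned)):
        digit = int(char)
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    return total % 10 == 0
```

Like a syntax check on an email address, it catches typos cheaply but proves nothing about whether the thing is real.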
Admin
It may seem dumb -- and it's certainly counterintuitive -- but it's true. RFC 822 provides for a very wide range of valid syntax. The only correct e-mail address validation regexes are on the order of a page in length. Please see http://www.linuxjournal.com/article/9585 for an overview of some of the issues involved.
Admin
Tears literally welled up in my eyes.
Admin
Most SANE people would use THIS regex to validate emails:
http://www.ex-parrot.com/pdw/Mail-RFC822-Address.html
NOT SPAM AKISMET
Admin
If you want to get technical about it, this regexp should catch them all: .+@.+ (in theory it would be possible to have a mail server at a TLD :-))
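In Python that loose check might look like the sketch below (the helper name is invented). It deliberately accepts a lot of junk; the idea is only to reject input that cannot possibly be an address and leave the rest to an actual delivery attempt.

```python
import re

# Something, an at sign, something: nothing more is assumed.
LOOSE_EMAIL = re.compile(r".+@.+")


def looks_like_email(text):
    """True if text has at least one character on each side of an '@'."""
    return LOOSE_EMAIL.fullmatch(text) is not None
```

It has no opinion on TLD length, so a mailbox directly at a TLD or under .museum gets through just as well as one under .com.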
Admin
^A-Za-z0-9@([A-Za-z0-9]+)(([.-]?[a-zA-Z0-9]+)*).([A-Za-z]{2,})$
Admin
Thank you so much, it's working.