Admin
[quote user="Matt Westwood"] Especially since the translation of "regular expression" into French is a flamewar topic in itself. Some (including me) argue for "expression rationnelle", others argue for "expression régulière" (which roughly translates to "recurring expression").[/quote]
"expression norme" or "expression canonique" may be appropriate.
Although its etymology has now been called into question, I wonder what's so "regular" about a "regular expression".[/quote]
I believe "regular" is a reference to regular languages, which regular expressions are based on, although they are often extended beyond them.
New top-level domains might cause issues in the future, as we can no longer always expect a "." in the domain. And really, most validation is quite hard to do once you get into the details; why can't we just build some nice, simple systems for everything?
Admin
Strictly speaking, you cannot actually validate an email address with a regular expression. (That link there is basically a multi-paragraph explanation/excuse for why his regex doesn't match RFC 5321.) The RFC allows for infinite nesting in the local-part and as any fule kno, that requires a stack machine (i.e. a context-free grammar) to parse at minimum. You can sort of fake it with this one: http://www.ex-parrot.com/~pdw/Mail-RFC822-Address.html
But don't do that. Check for an "@" sign. Maaaaaaaaybe check for a "." in the domain, although that's technically stricter than the RFC and will break things for intranet applications.
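The "check for an @ sign" advice fits in a few lines. A minimal sketch, assuming Python; the function name `looks_like_email` and the optional `require_dot_in_domain` flag are illustrative, not from any library. The dot check is off by default because of the intranet case mentioned above, and the split is on the last "@" because a quoted local part may itself contain one.

```python
def looks_like_email(address: str, require_dot_in_domain: bool = False) -> bool:
    """Loose check: non-empty text on both sides of the last '@'.

    require_dot_in_domain is a hypothetical opt-in flag; it is off by default
    because dotless domains (intranet hosts) can still be deliverable.
    """
    # rpartition splits on the LAST '@', since a quoted local part may contain '@'
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        return False
    if require_dot_in_domain and "." not in domain:
        return False
    return True
```

Anything stricter than this is where the arguments in this thread start.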
Admin
Nice
Admin
The Real WTF is that you had a long-winded story about things getting lost in translation and could have skipped the fluffery.
Admin
I pity the foo - or more specifically foo@bar.com: http://bar.com/
Admin
Hm. I saw Lost in Translation on a recent flight to Europe or Africa or somewhere. Rather enjoyed it, even over the plane noise.
Admin
No, they can't. Stop trying, because you're wrong. It doesn't matter what regex you use, validating anything more than "There must be an @" is wrong.
And for a subset of cases, "There must be an @" is also wrong.
Admin
No, it isn't. Quite simply, if you develop for an intranet, you usually do not use the email address as the login identifier.
You use the email address as a login for services like Facebook or Pinterest. And those services will communicate with you by email over the internet. "foobar" won't work, even if it is theoretically a valid email address. From the point of view of the service, "valid" is defined as "an address I can send an email to", and by that definition "foobar" will not do. "foo@bar.com" will, however, because if the account "foo" at bar.com exists, then the email can be delivered.
Admin
Once you figure that part out, living in fear of regular expressions (which is a very good thing to do) just comes naturally.
Admin
Apart from that? Trim/normalise spaces, newlines, etc. as normal with user input, and watch out for commas, but otherwise whatever. (The number of non-malicious users with multiple spaces in their mailbox name is likely zero. And all of them could be considered to be malicious anyway for just existing.)
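The trim/normalise step might look like this. A minimal sketch, assuming Python; the function name is hypothetical, and two behaviours are my own assumptions rather than the comment's: runs of whitespace are collapsed to a single space, and any comma causes rejection, since a comma separates multiple addresses in headers such as To:.

```python
import re
from typing import Optional


def normalise_address_input(raw: str) -> Optional[str]:
    """Trim and tidy user input; return None if it should be rejected.

    Assumed policy (hypothetical): collapse whitespace runs to one space,
    reject empty input, and reject any comma outright.
    """
    # strip() removes leading/trailing spaces, tabs and newlines;
    # re.sub then collapses any remaining whitespace runs to one space
    cleaned = re.sub(r"\s+", " ", raw.strip())
    if not cleaned or "," in cleaned:
        return None
    return cleaned
```

Rejecting commas is the conservative choice; an alternative would be to split on them and treat the input as a list, which is a different feature entirely.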
Admin
Yeah, it’s unlikely that anyone would add an automatic translator to a web page (did the existing non-English-speaking developers not notice something wrong?)
I recently had to head off a localization effort here at work where a developer put all the UI strings through Google Translate rather than talk to our paid translators. I don’t know how he chose “Retten” (“rescue”) for “Save”, considering Google Translate gives “Speichern” as an alternative at or near the top of the list, but that’s how I noticed what he was doing. Luckily I had spent a few months in Germany 20 years ago. Although I couldn’t remember the correct words, some just didn’t feel right, and “Retten” was obviously wrong.
But an automatic translator is an order of magnitude worse.
Admin
The actual shortest single regex that can match any uncommented RFC 822 (et seq.) address is (remove newlines before using):
And yeah, I bet most people don't use something like this in their JavaScript. (By the way, this is a Perl regex, so your mileage may vary.)
Admin
So I went to my French colleague and asked him, "Is that how you say ... in French?", to which his answer was: "No, it isn't."
The twist of the story was that those messages had been translated from English to French not by Google Translate but by a French employee in one of our offices in France.
Sometimes human translators are no better than Google Translate.
Admin
I'd build a two-tiered system...
First, we'd validate email addresses that can be delivered to. If it has [text]@[text with at least one dot], assume it can be delivered to. If not, don't accept it at all.
Next, run a different validation to see if it's a "normal" email address. For example, "[email protected]" would technically pass that first check (and be valid according to the RFC), but doesn't end in one of the common TLDs. Give a warning if it fails this one (something like "Your email address appears invalid, please verify it is correct"). The user can ignore the warning if they have some super-eccentric address, but it would catch most people who do something like forget the .com...
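The two-tier idea separates "reject" from "warn". A minimal sketch, assuming Python; the regex is one reading of "[text]@[text with at least one dot]" (it also assumes no whitespace and no extra "@"), and `COMMON_TLDS` is a hypothetical, deliberately short list, not a real registry.

```python
import re
from typing import Optional, Tuple

# Tier 1: "[text]@[text with at least one dot]" (assumed: no whitespace, one '@')
DELIVERABLE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

# Tier 2: hypothetical allow-list of "common" TLDs; extend to taste
COMMON_TLDS = {"com", "org", "net", "edu", "gov", "uk", "de", "fr"}

WARNING = "Your email address appears invalid, please verify it is correct"


def check_email(address: str) -> Tuple[bool, Optional[str]]:
    """Return (accepted, warning): reject on tier 1, only warn on tier 2."""
    if not DELIVERABLE.match(address):
        return False, None
    # The last '.' is guaranteed to be in the domain after tier 1 passes
    tld = address.rsplit(".", 1)[-1].lower()
    if tld not in COMMON_TLDS:
        return True, WARNING
    return True, None
```

Note that tier 1 as described rejects dotless intranet domains; that trade-off comes from the design itself, not from the sketch.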
Captcha: damndum - damn that translation program was num[b] to the real meaning of that word...
Admin
Fails on fred&[email protected], a perfectly valid address (and an auto-responder… try it!).
Admin
Yeah. Tell that to the site that rejected my email address [email protected] because it didn't end in .com. After all, everyone knows that Internet addresses end in .com.
Admin
So now we're calling RFC-compliant email addresses "malicious" simply for existing?
You gonna start complaining about non-English languages, too, since they force you to use Unicode?
Admin
Wikipedia agrees with "expressions rationnelles" (and god damn, there are heavy trolls on French Wikipedia).
Admin
I once had a president's daughter who was a path-banging insensitive clod. I assure you it was no laughing matter.
Admin
Guys… "expression régulière" or "expression rationnelle" or whatever else, it's all "Smurf green" versus "green Smurf".
Admin
You have bang paths? We had to print our emails, photograph them on a wooden table, and hand-carry them to ihnp4 or uunet!
Admin
What does the whole email validation discussion have to do with the WTF? Context? I don't think so!
Admin
What I don't understand is why they had such big problems communicating. All programmers share at least one language (cursing), so they could easily have communicated like:
A: You just have to do the #%&!! job B: OK. I'll %*#!? do it.
(The result will then, as usual, be #?&%!)
Admin
Actually, while email addresses are relatively easy to describe in a grammar, it is impossible to create a regex which validates all syntactically correct addresses while rejecting all syntactically incorrect ones.
Usually, we just check to see if it looks like an email address.
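The usual reason given in this thread is that comments (text in parentheses) can nest inside an address, and recognizing arbitrarily deep nesting needs at least a counter, which a classic regular expression does not have. A minimal sketch, assuming Python; the helper name `comments_balanced` is hypothetical, and it only checks parenthesis balance with backslash escapes, ignoring quoted strings and the rest of the grammar.

```python
def comments_balanced(text: str) -> bool:
    """Return True if every '(' is closed by a matching ')'.

    Tracks nesting depth with a counter (a degenerate stack), which is what a
    classic regular expression cannot do for unbounded depth. A backslash
    escapes the next character; quoted strings are NOT handled.
    """
    depth = 0
    escaped = False
    for ch in text:
        if escaped:
            escaped = False          # this character is taken literally
        elif ch == "\\":
            escaped = True
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:            # ')' with no open comment
                return False
    return depth == 0 and not escaped
```

Engines with recursion support, such as Perl's and PCRE's `(?R)`, can match nested parentheses, but at that point they are no longer matching a regular language, which is also why the big RFC 822 regex only handles addresses with comments stripped.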
Admin
And we have TRWTF
Admin
Let's not forget the WTF of having a site that is translated into French (and presumably other languages) and pushing it out to production without first pushing it to staging and testing that the site works -- in each language!!!
Admin
This regex is fucking wrong.
Admin
I tried fucking your sister but she was too ugly even with a bag over her head
Admin
No, it does not require a stack. Nothing needs to be done with each section. The regular expression for that part might be something like:
Let CHARS be the set of characters acceptable in a part. This would exclude "." and other characters (and I do not know the full list).
Let PARTEXPR = [CHARS]+
The local part regex would be something like PARTEXPR (\.PARTEXPR)*, where \. is a literal dot.
AIUI, the problem is that there is a comment part that is not amenable to regex.
Sincerely,
Gene Wirchenko
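The PARTEXPR construction above is regular, and the character set it leaves open is spelled out in RFC 5322 as "atext". A minimal sketch, assuming Python; the names are hypothetical, and it covers only the unquoted dot-atom form, not quoted strings or comments.

```python
import re

# RFC 5322 "atext": characters allowed in an unquoted local-part atom
ATEXT = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]"
PART = ATEXT + "+"                     # PARTEXPR = [CHARS]+
DOT_ATOM = rf"{PART}(?:\.{PART})*"     # PARTEXPR (\.PARTEXPR)*
LOCAL_PART = re.compile(DOT_ATOM)


def local_part_ok(local: str) -> bool:
    """Check an unquoted local part; quoted strings and comments are not handled."""
    # fullmatch anchors both ends, so ".x", "x..y" and "x." are rejected
    return LOCAL_PART.fullmatch(local) is not None
```

Since "&" is in atext, a local part such as `a&b` passes, so addresses like the auto-responder mentioned earlier in the thread should not be rejected.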
Admin
No kidding. So many WTFs here.
- The author, obviously
- That lack of translation
- People thinking that not validating client-side when you can is even remotely a good idea
- Yvonne being tired all the time. She's probably pregnant with the president's kid. Is this a president's daughter origin story?
Admin
Oh la la la la, what backwater are you from? May the water carry you away.
Admin
Wouldn't the compiler stop when it got to an unrecognized word/command such as "POUR", or was their compiler bilingual? O.o
Admin
JavaScript is an interpreted language.
Admin
In JavaScript? It'll throw a notice for unknown function pour, another for missing ) before ;, a third for ) without (, and a final one for no ; before {. Probably.
Admin
hah
http://thedailywtf.com/Comments/Nobody-Likes-Andy.aspx?pg=2#430480
Akismet says that my link to this site is spam
Admin
Why bother validating e-mail addresses? Your code can, at best, reject "merde" and accept "[email protected]", but there is no user named "merde" on Yahoo. Sending a confirmation message takes a few seconds but validates not only the address format but also the account's existence. Lots of sites do this during signup.
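One common way to implement the confirmation step (assumed here, not described in the comment) is to mail a link carrying a signed, time-limited token. A minimal sketch, assuming Python's standard `hmac` module; `SECRET_KEY`, `make_token` and `verify_token` are hypothetical names, and actually sending the email is out of scope.

```python
import hashlib
import hmac
import secrets
import time
from typing import Optional

# Hypothetical server-side secret; in practice load it from configuration
SECRET_KEY = secrets.token_bytes(32)


def _sign(address: str, ts: int) -> str:
    # HMAC-SHA256 over the address and issue time
    return hmac.new(SECRET_KEY, f"{address}|{ts}".encode(), hashlib.sha256).hexdigest()


def make_token(address: str, issued_at: Optional[int] = None) -> str:
    """Token to embed in the confirmation link: '<timestamp>.<signature>'."""
    ts = int(time.time()) if issued_at is None else issued_at
    return f"{ts}.{_sign(address, ts)}"


def verify_token(address: str, token: str, max_age: int = 86400,
                 now: Optional[int] = None) -> bool:
    """Accept only if the token matches this address and has not expired."""
    try:
        ts_text, sig = token.split(".", 1)
        ts = int(ts_text)
    except ValueError:                 # malformed token
        return False
    current = int(time.time()) if now is None else now
    if current - ts > max_age:
        return False
    # constant-time comparison avoids leaking how many characters matched
    return hmac.compare_digest(_sign(address, ts), sig)
```

Only someone who actually receives the message can present a valid token, which is exactly the existence check the comment is after.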
Admin
But I agree with you on the spacing. "dkf" looks like initials. If dkf ever gets an e-mail address with two spaces in dkf's full name, dkf will be a malicious user.
Admin
A temperamental programmer in Normandy? That's more than believable, that's the way our industry is.
A polite programmer in Normandy? No way.
Admin
One of the rare occasions when I can say "Been there, done that" here.
I think most people who start applying an auto-translation function to the "whole page content" end up doing that. :P
Admin
Or, just as likely, the tool that extracted text from the page failed to recognise "length" as a word because it was preceded by a period instead of whitespace.
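That guess is easy to reproduce with a naive translator that swaps whole whitespace-separated tokens. A hypothetical sketch, assuming Python; the glossary and the loop being translated are illustrative, not taken from the actual site. "for" stands alone and becomes "pour", while "length" is glued to "s." and the ";" and is left untouched.

```python
# Hypothetical English-to-French glossary a careless page translator might use
GLOSSARY = {"for": "pour", "length": "longueur"}


def translate_tokens(text: str) -> str:
    """Replace whole whitespace-separated tokens found in GLOSSARY.

    Note: split() also collapses runs of whitespace to single spaces.
    """
    return " ".join(GLOSSARY.get(tok, tok) for tok in text.split())


print(translate_tokens("for (i = 0; i < s.length; i++)"))
```

Running it prints `pour (i = 0; i < s.length; i++)`: the keyword is mangled and the property name survives, which matches the symptom described.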