• QJo (unregistered) in reply to Norman Diamond
    Norman Diamond:
    Someone:
    Hey everyone, I'm having a problem related to this story. I'm trying to make a list of email addresses that I can validate entries against, but typing it all out is really slow. Can some people help me out?

    Here is what I have so far:

    [email protected]
    [email protected]
    [email protected]
    [email protected]
    [email protected]
    [email protected]
    [email protected]
    [email protected]
    [email protected]
    [email protected]
    [email protected]
    [email protected]
    At least you'll be immune from this prank: http://www.youtube.com/watch?v=gJuGKJaSyVU

    Akismet says I should sell you stolen credit cards instead.

    TRWTF there is going into hysterical panic at the thought of having an ordinary everyday insect on your back. The correct response (to being told "You've got a bee on your back") is to just carry on doing whatever you were doing, sure in the knowledge that once it has finished its innocuous and harmless business on your clothing, then it will simply fly away and go somewhere else.

  • Walky_one (unregistered)

    Funny thing that E-Mail addresses like

    MyEmail@[email protected] seem to be valid by the above code... (More funny that nobody pointed that out so far)

  • Tim (unregistered)

    TRWTF is that there's not much point in validating the syntax of the email address closely because that doesn't prove that the email address actually exists, let alone that it is the correct one.

    if someone makes a mistake typing in an email address, most of the time what they actually type will be a valid email address and will either bounce or go to the wrong person

  • (cs) in reply to Warren
    Warren:
    OK, so they should have had a return type of boolean and used exceptions for the errors....
    Not sure if serious or troll.
  • csrster (unregistered) in reply to faoileag
    faoileag:
    csrster:
    The real WTF is surely not using the Composite pattern to aggregate multiple validation rules in a single rule. Then each individual rule can be ruthlessly and independently unit-tested. Plus you're able to instantiate these generalised validation rules using an Abstract Factory Pattern and an appropriate dependency-injection framework. Here, let me show you some UML ...
    I'm missing the XML in your design. Without XML in it, it's definitely not enterprisey enough!

    How do you think I'm configuring my dependency-injection framework? Lots of lovely XML there ...

  • The Fury (unregistered) in reply to Kuba
    Kuba:
    Citron:
    The real WTF is "alphanumeric characters only". With all these possible e-mail-addresses out there, the only useful thing to do for e-mail validation is to check if the user may have mistyped his e-mail-address, by checking for '@' and '.'. Use an opt-in to check if the user has access to the address.
    I fucking don't get why on Earth one just won't point to the applicable RFCs and be done with it. Do we really have to paraphrase internet standards all the time? Don't people have better things to do? Writing "specs" for what is a valid email address is like writing "specs" as to how a valid TCP/IP connection should look on the wire. It's like going full retard and being proud of it.

    Actually, why would you even bother to do this? Why would you want a developer who wouldn't know to look up the relevant RFC writing your app anyway?

  • (cs)

    Seems to me about the only way of finding out for sure if an email address is valid (but not necessarily real) is to do an nslookup on the host portion. Of course if your code waits 30 seconds for the resolution to time out then . . .

    As a side note, one nice thing about LotusScript (I'll wait for all the Lotus Notes jokes to die down before continuing) is that you can use "", {} or || as string delimiters, so including " in your string is easy. Very handy when writing HTML codes.

    print {Hello World!}

  • Anon (unregistered) in reply to faoileag
    faoileag:
    pjt33:
    faoileag:
    think that the code ... does not represent a wtf per se.
    Regardless of the spec, any code which could be compressed by 90% with a loop or two is a WTF unless it's explicitly commented that the loop was unrolled with a significant impact on performance.
    For a peer review, I would agree with you completely. However, this is code delivered by an offshore team. In an ideal world, you run your pre-written unit-tests against it and tell the offshore team which have failed if any fail. You do not look at the codebase itself, unless somewhere in your contract with the overseas company you have a clause that explicitly states that the code itself must also meet certain standards. Which is normally not the case. So who cares if they do the loop unrolling themselves? Let them. Perhaps they get paid by lines of code.

    The dev who has to clean this up in a few years cares. As does the dev who has to work on the 400 bugs generated by the shit code.

  • Anon (unregistered) in reply to The Fury
    The Fury:
    Kuba:
    Citron:
    The real WTF is "alphanumeric characters only". With all these possible e-mail-addresses out there, the only useful thing to do for e-mail validation is to check if the user may have mistyped his e-mail-address, by checking for '@' and '.'. Use an opt-in to check if the user has access to the address.
    I fucking don't get why on Earth one just won't point to the applicable RFCs and be done with it. Do we really have to paraphrase internet standards all the time? Don't people have better things to do? Writing "specs" for what is a valid email address is like writing "specs" as to how a valid TCP/IP connection should look on the wire. It's like going full retard and being proud of it.

    Actually, why would you even bother to do this? Why would you want a developer who wouldn't know to look up the relevant RFC writing your app anyway?

    A dev who uses the RFC standard instead of the PHB standard is not long for most jobs.

  • Anon (unregistered) in reply to n9ds
    n9ds:
    Seems to me about the only way of finding out for sure if an email address is valid (but not necessarily real) is to do an nslookup on the host portion. Of course if your code waits 30 seconds for the resolution to time out then . . .

    Not sure if serious.

  • Vlad Patryshev (unregistered)

    They may be stupid, but they are also wrong. '+' SHOULD be allowed.

  • Herr Otto Flick (unregistered)

    Andrew completed his functional design document detailing valid email address requirements - the address must contain an "@" symbol, must include a domain name, alphanumeric characters only, and punctuation like underscores, hyphens, periods are all OK

    Andrew sounds exactly like the kind of tool who thinks he is clever for incorrectly specifying how to solve a problem that was fixed back in the 80s. No need to know anything, just spout shit and hope it sticks - much like his equally inept "remote coding team".

  • Jay (unregistered) in reply to XXI
    XXI:
    It is always much easier to make a whitelist of what you support than trying to blacklist all possible unsupported cases. This is the way it should be done

    Well, I'd say "often easier", not necessarily "always".

    Like if your function won't work with a quote mark but will work with any other Unicode character, it's sure easier to say char!='"' than char=='a' or char=='b' or char=='c' or ... or char==0x128 or char==0x129 or ... etc

    Or if you accept user ids of any length except 9, it's easier to write len!=9 rather than len==1 or len==2 or len==3 ... or len==8 or len==10 or ...
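
The two cases above can be sketched as deny-list checks. This is only an illustration in Python; the function names `is_allowed_char` and `is_allowed_length` are made up for the example and do not come from the article's code.

```python
def is_allowed_char(ch: str) -> bool:
    # Deny-list: reject only the double quote, accept every other character.
    return ch != '"'


def is_allowed_length(user_id: str) -> bool:
    # Deny-list: any length is acceptable except exactly 9.
    return len(user_id) != 9
```

Each check is a single comparison, whereas the equivalent allow-list would have to enumerate every acceptable value.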

  • Jay (unregistered) in reply to asd
    asd:
    insert rant about how data validation is over used....

    It's easier to spot a bodgey address if it's: No Way You gunna get my emial MF

    than if it's: [email protected]

    One of the questions that needs to be considered is "Why do they need an email address?". Normally it's so they can send you stuff (in which case send something with a link and see if it works). Sometimes it's under the pretext of verifying your identity (or at least somehow holding you to account for how you use their site - and again, if you NEED the address, verify it via an emailed link). Sometimes it's because they want to send you SPAM (actually in all cases this could be the case - and again SEND A LINK).

    The only time when you don't need to send a link to verify, is in cases where you don't intend to use the email. But if you don't intend using it, why ask for it?

    Of course everyone does, simply because they like having your information, but the only certain way to see that the address exists (even if for only a fleeting moment) is to have an email sent there and somehow verified.

    Who gives a shit about the actual format? This is a classic case of programmers overthinking the problem and reengineering the wheel.

    There's a difference between a valid email address and a real email address. Many people ask for valid when they mean real - at the end of the day, a valid email address that doesn't exist is about as useful as an invalid one. And if you demand a valid one, I'll either use my enemy's one (and hope you don't verify it) or make up some stupid one that doesn't exist anyway....

    If you really absolutely need to be sure that the email is real, sure.

    But for most purposes, sending the user an email and requiring a response is an unnecessary pain. We have to write code to create the emails, and then more code to receive the replies and update the database that this email is now confirmed. More important, the user has to reply to our email. Are we going to hold up processing his order until we get the email response? What if he never replies to the email? We could be losing an order, i.e. money, just because the user forgot to respond to the email or decided it was too much trouble or deliberately gave us a wrong address because he doesn't want to receive spam. If the purpose of collecting the email was just so we could send him order status messages or future advertising, do we want to lose an order for that? Etc.

    By the same reasoning, you could say that if we ask for a phone number, instead of validating format we should call the number and make sure someone answers. Or that if, say, a web site that sells auto parts asks the user what model car he needs parts for, instead of just verifying that this model is in our database we should send someone to his house to verify that he really owns such a car. Etc.

    There might be times when such additional rigor is necessary. But often it is just too much trouble.

  • Jay (unregistered) in reply to Someone
    Someone:
    Hey everyone, I'm having a problem related to this story. I'm trying to make a list of email addresses that I can validate entries against, but typing it all out is really slow. Can some people help me out?

    Here is what I have so far:

    [email protected]
    [email protected]
    [email protected]
    [email protected]
    [email protected]
    [email protected]
    [email protected]
    [email protected]
    [email protected]
    [email protected]
    [email protected]
    [email protected]

    I was just working on a program the other day that used a similar approach to validate that a date was within a date range. Instead of doing something lazy like "date >= start_date and date <= end_date", the programmer wrote a loop that generated all the dates between the start and the end dates, and then checked that every date given was found on this list.

    I cried.
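
For comparison, here is a rough Python sketch of both approaches. It is not the program described above (that code was not posted); the function names are invented for the illustration.

```python
from datetime import date, timedelta


def in_range_by_comparison(d: date, start: date, end: date) -> bool:
    # The "lazy" version: two comparisons, no allocation.
    return start <= d <= end


def in_range_by_enumeration(d: date, start: date, end: date) -> bool:
    # The approach described above: build every date in the range,
    # then search the list for the candidate.
    days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    return d in days
```

Both return the same answer for whole dates, but the second one does work proportional to the length of the range on every call.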

  • Jay (unregistered) in reply to Tim
    Tim:
    TRWTF is that there's not much point in validating the syntax of the email address closely because that doesn't prove that the email address actually exists, let alone that it is the correct one.

    if someone makes a mistake typing in an email address, most of the time what they actually type will be a valid email address and will either bounce or go to the wrong person

    Agree. If you check that it includes an @ sign and a period, that's a good indication that the person did indeed type an email address and not that he got confused and typed his zip code there by mistake or some such. Beyond that, yeah, if someone's email address is "[email protected]" and he bounces on the keyboard and types "[email protected]", no format test is going to catch that.
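
A minimal sketch of that kind of sanity check, in Python. The function name `looks_like_email` and the example.com addresses are placeholders, and requiring the period to come after the @ is a small tightening that the comment above does not spell out.

```python
def looks_like_email(text: str) -> bool:
    # Split on the last "@": something before it, and a dot somewhere after it.
    local, at, domain = text.strip().rpartition("@")
    return bool(at) and bool(local) and "." in domain
```

This catches a zip code typed into the wrong field, but a mistyped address such as `usre@example.com` still passes, which is the point being made above.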

  • neminem (unregistered)

    Oh... my god. I think I just found v2 of this function. I just got an email about a contest I could enter if I gave them my email and said they could spam me. I don't mind that, but I do like knowing if they gave out my email to anyone else, so I used the gmail trick where you add +[the site name] to the end of your email address. I was told it wasn't a valid email address. I was like... this is going to be good, I'm going to look at what they did. It was better than I expected:

    They do check to make sure you don't have any plus, dot, space, comma, semicolon, colon, forward slash, backslash, bang, open or close parens, hashes, open or close curly or square brackets, or dollar signs. They also check to make sure you don't have more than one @, and then after that, also check that you don't have two @s or two periods next to each other.

    Finally, and this is the huge kicker, they make sure that your domain isn't in a large list of misspellings of common domains, and that your tld isn't in a list of common tld misspellings (and a lot of these overlap). So if you wanted to run your own email server at, say, yaho.com, or al.com, or rocketmaill.com, or hoymail.com... tough luck. Also if you live in China, because .cn is apparently a misspelling of .com, and therefore invalid.

    If I saw this code written by someone at my company, you can frelling bet I'd submit it, cause holy frack is it awful. (There's even a commented out "endsWithGoodDomain" function that only accepts a handful of domains, with a comment that they used to call it, but they "had so many folks in Europe that wanted to subscribe", they had to comment it out. Apparently they only care about Europe, though, not China. :D)

    I don't think you can submit just random code you find online, though, sadly? I recommend you visit it and see for yourself, though, anyway, if you want a laugh. http://staticcdn13.tastingtable.com/javascript_v2/mc_main.js

  • anonymous (unregistered) in reply to Arne Nonymous
    Arne Nonymous:
    Meh, yet another email validator that doesn't accept "@ @"@example.com as valid (or the light version "@_@"@example.com ).
    Come back when you've got yourself a real e-mail address, son. RFC 5321: "a host that expects to receive mail SHOULD avoid defining mailboxes where the Local-part requires (or uses) the Quoted-string form".

    Yes, it's valid, but if it's really your e-mail address, you need a different one.

  • (cs)

    Here is an interesting fact: The asterisk (*) is valid in email addresses.

    my*[email protected] is valid.

  • foo (unregistered) in reply to anonymous
    anonymous:
    Arne Nonymous:
    Meh, yet another email validator that doesn't accept "@ @"@example.com as valid (or the light version "@_@"@example.com ).
    Come back when you've got yourself a real e-mail address, son. RFC 5321: "a host that expects to receive mail SHOULD avoid defining mailboxes where the Local-part requires (or uses) the Quoted-string form".

    Yes, it's valid, but if it's really your e-mail address, you need a different one.

    So if some site just wants my address for no purpose (or rather, just for spamming), I can use such an address because I don't really expect to receive mail from them.

  • Maarten (unregistered) in reply to henke37

    Before I was able to use aliases, I used + signs a lot in my only address. It was rejected by half of the contact forms. I'm not sure where and why this (in)validation is coming from but it seems all too common :(

  • Anon (unregistered) in reply to Maarten
    Maarten:
    Before I was able to use aliases, I used + signs a lot in my only address. It was rejected by half of the contact forms. I'm not sure where and why this (in)validation is coming from but it seems all too common :(

    Sites that reject valid email addresses should be blackholed.

  • anonymous (unregistered) in reply to foo
    foo:
    anonymous:
    Arne Nonymous:
    Meh, yet another email validator that doesn't accept "@ @"@example.com as valid (or the light version "@_@"@example.com ).
    Come back when you've got yourself a real e-mail address, son. RFC 5321: "a host that expects to receive mail SHOULD avoid defining mailboxes where the Local-part requires (or uses) the Quoted-string form".

    Yes, it's valid, but if it's really your e-mail address, you need a different one.

    So if some site just wants my address for no purpose (or rather, just for spamming), I can use such an address because I don't really expect to receive mail from them.
    If you're just giving them a fake e-mail address anyway, why would you use an address that probably won't pass validation?

  • FreddyFrogg (unregistered) in reply to Walky_one

    And why reject email with + in? That's so annoying.

  • (cs) in reply to QJo
    QJo:
    Aha! I know this - TRWTF is using Goto! Do I win a prize?

    Apart from that, all perfectly cromulent. Oh, apart from not leaving a neat space between the instances of the function names (Len, instr etc.) and their arguments.

    Look how much better 'If InStr (strEmail, "@") = 0 Then' looks.

    I hate people that code like that. Don't ever do that.

  • anonymous (unregistered) in reply to QJo
    QJo:
    Aha! I know this - TRWTF is using Goto! Do I win a prize?

    Apart from that, all perfectly cromulent. Oh, apart from not leaving a neat space between the instances of the function names (Len, instr etc.) and their arguments.

    Look how much better 'If InStr (strEmail, "@") = 0 Then' looks.

    There is already a neat space that's specifically designed to separate a function name from its arguments. It is called the space occupied by the left parenthesis.

  • (cs) in reply to anonymous
    anonymous:
    QJo:
    Aha! I know this - TRWTF is using Goto! Do I win a prize?

    Apart from that, all perfectly cromulent. Oh, apart from not leaving a neat space between the instances of the function names (Len, instr etc.) and their arguments.

    Look how much better 'If InStr (strEmail, "@") = 0 Then' looks.

    There is already a neat space that's specifically designed to separate a function name from its arguments. It is called the space occupied by the left parenthesis.

    People who use InStr() need to be slapped around a bit with a large trout.

    if(strEmail.IndexOf("@") == -1) { ... }
    
    If strEmail.IndexOf("@") = -1 Then
       ...
    End If
    
  • fuzzix (unregistered) in reply to Walky_one

    "MyEmail@somewhere"@url.com would be valid, some mail agents will do the quoting for you.

  • Iain (unregistered)

    I laughed for 10 minutes straight at this. If I hadn't laughed, I'd have cried, because I've just had the same experience with a (locally-based) contractor we hired.

  • gr (unregistered)

    ^[A-Z0-9._%+-]+@[A-Z0-9.-]+.[A-Z]{2,4}$

    Whoops.
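
To see what the pattern above actually accepts, here is a small Python sketch. It compiles the pattern exactly as posted and a second variant with the dot before the TLD escaped; whether the missing backslash was intended or eaten by the forum software is a guess. The case-insensitive flag and the example.com addresses are assumptions of this sketch.

```python
import re

# Exactly as posted: the "." before the TLD matches any character.
AS_POSTED = re.compile(r"^[A-Z0-9._%+-]+@[A-Z0-9.-]+.[A-Z]{2,4}$", re.IGNORECASE)

# Assumed intent: a literal dot before the TLD.
ESCAPED = re.compile(r"^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}$", re.IGNORECASE)


def matches(pattern: re.Pattern, address: str) -> bool:
    return pattern.match(address) is not None
```

With the unescaped dot, `user@examplecom` is accepted even though it has no dot in the domain. With the escaped dot, a plus-tagged address like `user+tag@example.com` passes, but any TLD longer than four letters (for example `.museum`) is rejected by the `{2,4}` limit.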

  • anonymous (unregistered) in reply to chubertdev
    chubertdev:
    anonymous:
    QJo:
    Aha! I know this - TRWTF is using Goto! Do I win a prize?

    Apart from that, all perfectly cromulent. Oh, apart from not leaving a neat space between the instances of the function names (Len, instr etc.) and their arguments.

    Look how much better 'If InStr (strEmail, "@") = 0 Then' looks.

    There is already a neat space that's specifically designed to separate a function name from its arguments. It is called the space occupied by the left parenthesis.

    People who use InStr() need to be slapped around a bit with a large trout.

    if(strEmail.IndexOf("@") == -1) { ... }
    
    If strEmail.IndexOf("@") = -1 Then
       ...
    End If
    
    If Not InStr(strEmail, "@") Then
       ...

  • (cs) in reply to anonymous
    anonymous:
    If Not InStr(strEmail, "@") Then
       ...

    And...............trout.

  • anonymous (unregistered) in reply to chubertdev
    chubertdev:
    anonymous:
    If Not InStr(strEmail, "@") Then
       ...

    And...............trout.

    Sorry, not really comprehending how or why you think that If InStr and If Not InStr are not good form. They say exactly what they mean.

    While I'm at it, what's the fascination with the magic number -1? At least zero is ubiquitously understood to be equivalent to the Boolean value False (and non-zero integers are similarly understood to be True). The meaning of -1 is typically True, which is entirely nonsensical: the index of needle within haystack is True?
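
One wrinkle worth checking here: in classic VB and VBScript, `Not` applied to an integer is a bitwise complement, and any nonzero integer converts to True. The Python sketch below only models those two rules; `vb_not` and `vb_truthy` are invented helper names, not real VB functions.

```python
def vb_not(n: int) -> int:
    # Models VB's Not on an integer operand: bitwise complement.
    return ~n


def vb_truthy(n: int) -> bool:
    # Models VB's integer-to-Boolean conversion: nonzero means True.
    return n != 0


def if_not_instr_branch_taken(instr_result: int) -> bool:
    # Models whether the body of "If Not InStr(...) Then" would run.
    return vb_truthy(vb_not(instr_result))
```

Under these rules the branch is taken both when InStr returns 0 (not found, `Not 0` is -1) and when it returns a position such as 5 (`Not 5` is -6, still nonzero), so the test does not distinguish the two cases.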

  • MDMoore313 (unregistered)

    The sad part is that domain names truly end with '.', just try navigating to 'google.com.'. By convention everyone leaves it off but it's the root.

  • urza9814 (unregistered) in reply to Kuba
    Kuba:
    Citron:
    The real WTF is "alphanumeric characters only". With all these possible e-mail-addresses out there, the only useful thing to do for e-mail validation is to check if the user may have mistyped his e-mail-address, by checking for '@' and '.'. Use an opt-in to check if the user has access to the address.
    I fucking don't get why on Earth one just won't point to the applicable RFCs and be done with it. Do we really have to paraphrase internet standards all the time? Don't people have better things to do? Writing "specs" for what is a valid email address is like writing "specs" as to how a valid TCP/IP connection should look on the wire. It's like going full retard and being proud of it.

    Well, in this example it can take thirty seconds to write a basic validator that will be good enough for 99% of cases...and maybe a couple hours to write one that fits the RFC. More possibilities for bugs too if you try to match the RFC, as there will be far more code. Also very easy for coders to misinterpret the RFC, as it's not the easiest thing to read. Unless you have some reason where you really need to be 100% certain the emails are valid to the RFC (like if you're coding a mail server or something), I'd argue you should almost never try to validate everything in the RFC -- when in doubt, accept it all. You should be sending a confirmation email to verify anyway, right?

    Even if you validate to the RFC, all you've verified is that it's a valid email address, not that it is their email address, which is what you really want. So you have to do extra steps which, as an extra bonus, will fully validate the address for you!
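
The confirmation step could look roughly like this in Python. It is a sketch under several assumptions: the in-memory `pending` dictionary stands in for a real database, the function names are made up, and actually sending the email is left out.

```python
import secrets

# Hypothetical in-memory store: token -> address awaiting confirmation.
pending = {}


def start_confirmation(address: str) -> str:
    # Generate an unguessable token to embed in the emailed link.
    token = secrets.token_urlsafe(16)
    pending[token] = address
    return token


def confirm(token: str):
    # Return the confirmed address, or None for an unknown or reused token.
    return pending.pop(token, None)
```

If the user clicks the link, the address is both deliverable and theirs, which no format check can establish.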

  • urza9814 (unregistered) in reply to anonymous
    anonymous:
    chubertdev:
    anonymous:
    If Not InStr(strEmail, "@") Then
       ...

    And...............trout.

    Sorry, not really comprehending how or why you think that If InStr and If Not InStr are not good form. They say exactly what they mean.

    While I'm at it, what's the fascination with the magic number -1? At least zero is ubiquitously understood to be equivalent to the Boolean value False (and non-zero integers are similarly understood to be True). The meaning of -1 is typically True, which is entirely nonsensical: the index of needle within haystack is True?

    They use -1 because zero is the first character in the string....
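
A short Python illustration of the zero-based case, using `str.find`, which returns -1 when the substring is absent. VB's `InStr` is one-based and returns 0 for "not found", which is why the zero test works there. The helper name and sample strings are placeholders.

```python
def contains_at(text: str) -> bool:
    # With zero-based indexes, 0 is a valid position, so compare against -1.
    return text.find("@") != -1


def contains_at_buggy(text: str) -> bool:
    # Treating the index as a Boolean is wrong twice over: a match at
    # index 0 is falsy, and the "not found" value -1 is truthy.
    return bool(text.find("@"))
```

For `"@example.com"` the index is 0, so the truthiness version wrongly reports no match, while the -1 comparison gets it right.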

  • anonymous (unregistered) in reply to urza9814
    urza9814:
    anonymous:
    chubertdev:
    anonymous:
    If Not InStr(strEmail, "@") Then
       ...

    And...............trout.

    Sorry, not really comprehending how or why you think that If InStr and If Not InStr are not good form. They say exactly what they mean.

    While I'm at it, what's the fascination with the magic number -1? At least zero is ubiquitously understood to be equivalent to the Boolean value False (and non-zero integers are similarly understood to be True). The meaning of -1 is typically True, which is entirely nonsensical: the index of needle within haystack is True?

    They use -1 because zero is the first character in the string....

    Returning -1 implies that you didn't even look at my string, you lazy bastard. If you're going to return an invalid index, at least return strlen so that I know you looked.

  • Peter Scott (unregistered) in reply to Walky_one

    Good point.

    If you want to check if an email address is valid and really exists, you should use a service like e.g. http://www.email-validator.net. We have been using their API for 6 months now and are really impressed by the quality of their service and the fast turnaround.
