Email Hyper-Validation

  • idisjunction 2013-08-12 06:34
    Making the frist post is also an agreed upon standard.
  • henke37 2013-08-12 06:35
    But plus signs and ampersands are legal!
  • Citron 2013-08-12 06:36
    The real WTF is "alphanumeric characters only". With all the possible e-mail addresses out there, the only useful thing e-mail validation can do is check whether the user may have mistyped the address, by checking for '@' and '.'. Use an opt-in to check whether the user has access to the address.
  • Damien 2013-08-12 06:39
    Meh. It has errors too:

    If Right(strEmail, 1) = "." Then
    strReturn = "Email address cannot end with '.'"
    GoTo ExitHandler

    An email address can actually end with a '.'. It's a fully qualified domain name.
  • Hannes 2013-08-12 06:39
    Right or wrong, this is what they agreed to do to the president's sick daughter. And let me assure you: It was no laughing matter!
  • Warren 2013-08-12 06:45
    OK, so they should have had a return type of boolean and used exceptions for the errors....
  • u 2013-08-12 06:46
    Citron:
    The real WTF is "alphanumeric characters only". With all the possible e-mail addresses out there, the only useful thing e-mail validation can do is check whether the user may have mistyped the address, by checking for '@' and '.'. Use an opt-in to check whether the user has access to the address.


    It states "In following with agreed upon standards" - it is nowhere said that they are following _RFC_ standards.
  • Grzechooo 2013-08-12 06:49
    Good that he didn't use a regular expression.
  • ratchet freak 2013-08-12 06:50
    u:
    Citron:
    The real WTF is "alphanumeric characters only". With all the possible e-mail addresses out there, the only useful thing e-mail validation can do is check whether the user may have mistyped the address, by checking for '@' and '.'. Use an opt-in to check whether the user has access to the address.


    It states "In following with agreed upon standards" - it is nowhere said that they are following _RFC_ standards.


    Isn't TRWTF that Andrew didn't specify that the RFC standard should be followed?

    Though I'm afraid of the code that would come out of that specification.
  • Tim B 2013-08-12 06:54
    An email address can actually end with a '.'

    Domain names can end with '.', but email addresses can't. See ttp://tools.ietf.org/html/rfc5321#section-4.1.2
  • Hannes 2013-08-12 07:00
    Grzechooo:
    Good that he didn't use a regular expression.


    Well, he had 99 problems once and used regular expressions. In the end, he had 100 problems.

    http://xkcd.com/1171/

    Also, I find it interesting that Akismet catches the URL in the QUOTE and thinks it's spam...
  • JimmyCrackedCorn 2013-08-12 07:01
    It would have been nice to have the routine aggregate all the validation errors so that they could be presented at once (up to a certain limit).
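
    A minimal sketch of that idea, in Python rather than the article's VB. The name `collect_errors`, the rule table and the default limit of 3 are assumptions for illustration, not the article's code: every rule is checked in turn and the messages are gathered instead of bailing out at the first failure.

```python
# Hypothetical rule table: (predicate that is True when the rule is violated, message).
RULES = [
    (lambda s: len(s) < 7, "Please fill in full email address"),
    (lambda s: s.count("@") != 1, "Address must contain exactly one '@' character"),
    (lambda s: s.startswith("."), "Email address cannot start with '.'"),
    (lambda s: s.endswith("."), "Email address cannot end with '.'"),
    (lambda s: ".." in s, "Email address cannot contain '..'"),
]


def collect_errors(email, limit=3):
    """Return up to `limit` error messages; an empty list means no rule failed."""
    errors = []
    for violated, message in RULES:
        if violated(email):
            errors.append(message)
            if len(errors) >= limit:
                break  # stop once the cap is reached
    return errors


print(collect_errors("test@test.com"))
print(collect_errors(".a@b."))
```

    Returning a list leaves it to the caller to decide how to present the messages (one per line, a bullet list, and so on).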
  • Ian Eiloart 2013-08-12 07:01
    But, hey, the specs were screwed up anyway. There's no point having great programming style if you're being told to write nonsense. Almost every character is valid on the left-hand side of an email address, if quoted. And the plus symbol in particular is widely used.
  • QJo 2013-08-12 07:04
    Aha! I know this - TRWTF is using Goto! Do I win a prize?

    Apart from that, all perfectly cromulent. Oh, apart from not leaving a neat space between the instances of the function names (Len, instr etc.) and their arguments.

    Look how much better 'If InStr (strEmail, "@") = 0 Then' looks.
  • QJo 2013-08-12 07:09
    But seriously folks, TRWTF is:

    "Thank goodness he has his own coding experience to fall back upon."

    A suboptimal approach. Better would be to communicate with the coder in question and explain in detail the shortcomings of the design used. Then the coder learns to code and the subject of this piece learns to delegate. Doing it himself is a complete waste of the effort taken to give him PM experience.
  • csrster 2013-08-12 07:22
    The real WTF is surely not using the Composite pattern to aggregate multiple validation rules in a single rule. Then each individual rule can be ruthlessly and independently unit-tested. Plus you're able to instantiate these generalised validation rules using an Abstract Factory Pattern and an appropriate dependency-injection framework. Here, let me show you some UML ...
  • faoileag 2013-08-12 07:36
    QJo:
    But seriously folks, TRWTF is:

    "Thank goodness he has his own coding experience to fall back upon."

    A suboptimal approach. Better would be to communicate with the coder in question and explain in detail the shortcomings of the design used.

    In a way it is even the completely wrong approach, considering that it is not his job to code the validity check but to supervise the offshore team.
  • faoileag 2013-08-12 07:38
    csrster:
    The real WTF is surely not using the Composite pattern to aggregate multiple validation rules in a single rule. Then each individual rule can be ruthlessly and independently unit-tested. Plus you're able to instantiate these generalised validation rules using an Abstract Factory Pattern and an appropriate dependency-injection framework. Here, let me show you some UML ...

    I'm missing the XML in your design. Without XML in it, it's definitely not enterprisey enough!
  • Floobart 2013-08-12 07:39
    I don't know about all the characters and weird combinations he checks for, but I do know that email addresses can contain + (plus) " (quotes) and ( ) (parentheses)


    CAPTCHA: immitto - post this immitto!
  • faoileag 2013-08-12 07:43
    strReturn = "Email address cannot contain " & Chr(34)

    Don't tell me Visual Basic has no other means to include a quote in a string?
  • wrojr 2013-08-12 07:48
    TRWTF is that code being widely used, since so many forms don't accept the + and so on...
  • Mattmon 2013-08-12 07:49
    If InStr(strEmail, "frist") > 0 Then
    strReturn = "Email address cannot contain 'frist'"
    GoTo ExitHandler
    End If
  • Christian 2013-08-12 07:53
    Hi,

    and this is my all time favourite ....

    > If InStr(1, strEmail, "+") > 0 Then
    > strReturn = "Email address cannot contain '+'"
    > GoTo ExitHandler
    > End If


    Why the hell shouldn't an email address contain a +? I use that all the time.

    Greetings
    Christian
  • JimmyCrackedCorn 2013-08-12 07:56
    I know VB, but perhaps this would have been on the way to better:


    Module VBModule

    Sub Main()
    Console.WriteLine(LibValidateEmail("test@test.com"))
    End Sub


    Function LibValidateEmail(ByVal strEmail As String) As String
    '
    ' Validate email address - if valid returns "".
    '
    Dim strReturn As String = ""

    If Len(strEmail) < 7 Then
    strReturn = MoreErrors(strReturn,"Please fill in full email address")
    End If

    If CharacterCount(strEmail, "@") <> 1 Then
    strReturn = MoreErrors(strReturn, "Address must contain only one '@' character")
    End If

    If Left(strEmail, 1) = "@" Then
    strReturn = MoreErrors(strReturn,"Email address cannot start with '@'")
    End If
    If Right(strEmail, 1) = "@" Then
    strReturn = MoreErrors(strReturn,"Email address cannot end with '@'")
    End If
    If InStr(strEmail, ".@") > 0 Then
    strReturn = MoreErrors(strReturn,"Email address cannot contain '.@'")
    End If
    If InStr(strEmail, "@.") > 0 Then
    strReturn = MoreErrors(strReturn,"Email address cannot contain '@.'")
    End If

    If InStr(strEmail, "..") > 0 Then
    strReturn = MoreErrors(strReturn,"Email address cannot contain '..'")
    End If
    If Left(strEmail, 1) = "." Then
    strReturn = MoreErrors(strReturn,"Email address cannot start with '.'")
    End If
    If Right(strEmail, 1) = "." Then
    strReturn = MoreErrors(strReturn,"Email address cannot end with '.'")
    End If




    If Not ValidateChars(strEmail) Then
    strReturn = MoreErrors(strReturn,"Email address cannot contain invalid characters")
    End If

    If Not ExcludeChars(strEmail) Then
    strReturn = MoreErrors(strReturn,"Email address cannot contain invalid characters")
    End If


    If InStr(strEmail, Chr(34)) > 0 Then
    strReturn = MoreErrors(strReturn,"Email address cannot contain " & Chr(34))
    End If


    If InStr(strEmail, Chr(127)) > 0 Then
    strReturn = MoreErrors(strReturn,"Email address cannot contain invalid characters")
    End If

    ' An empty string means no rule failed
    Return strReturn
    End Function

    ' Reject control characters and space (anything below ASCII 33)
    Function ValidateChars(ByVal value As String) As Boolean
    Dim okFlag As Boolean = True
    For Each c As Char In value
    If AscW(c) < 33 Then
    okFlag = False
    Exit For
    End If
    Next
    Return okFlag
    End Function

    ' Exclude specific characters
    Function ExcludeChars(ByVal value As String) As Boolean
    Dim okFlag As Boolean = true
    Dim excludedChars As String = "!#$%&^*()+,/:;<=>?[\]`~{|}"

    For Each c As Char In value
    If InStr(excludedChars,c) > 0 Then
    okFlag = false
    exit for
    end if
    Next
    Return okFlag
    End Function

    ' Simple count of a specific character
    Function CharacterCount(ByVal value As String, ByVal ch As Char) As Integer
    Dim cnt As Integer = 0
    For Each c As Char In value
    If c = ch Then cnt += 1
    Next
    Return cnt
    End Function

    ' Append a message to the accumulated errors, one message per line
    Function MoreErrors(ByVal strError As String, ByVal strMore As String) As String
    If strError = "" Then Return strMore
    Return strError & vbCrLf & strMore
    End Function
    End Module


    ' Let the flamage begin!
  • faoileag 2013-08-12 07:59
    Looking at the article, I cannot help but think that the code might have the odd bug regarding false negatives (as others have noticed before), but without knowledge of the documents Andrew sent to the offshore team, it does not represent a wtf per se.

    Perhaps Andrew did not tell the offshore team that the string to test would come from a web form and would therefore be highly unlikely to contain bell characters etc?

    Perhaps the return value was specified as "empty string if valid, error msg when not"? Then the developer would have had all the freedom to make the error message as verbose and specific as he wanted.

    You get what you specify. Unclear specs and this is what you get. Clear specs that state "gimme precise error messages on all failures" and this is also what you get.

    Give your spec like "Function must test a string for validity as email address against relevant RFC, and return TRUE if valid, FALSE if not" and you can run sample email addresses against the delivered function and complain if the sample email addresses give false positives or negatives.

    But this being Andrew's first stab at being an offshore team lead, I wouldn't even count any bad specs on his side as a wtf. "Puppy license" applies to all new recruits. Ok, make that should apply ;-)
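
    The "run sample email addresses against the delivered function" step could be sketched like this, in Python rather than VB. Everything named here is an assumption for illustration: `is_valid_email` is a hypothetical stand-in for whatever the offshore team delivers (it is not an RFC parser), and the sample table holds verdicts the spec author would have to supply.

```python
def is_valid_email(address):
    """Hypothetical stand-in for the delivered function (not an RFC parser)."""
    local, sep, domain = address.rpartition("@")
    return bool(local) and bool(sep) and "." in domain and not domain.endswith(".")


# Sample addresses with the verdict the spec author expects (assumed values).
SAMPLES = [
    ("user@example.com", True),
    ("user+tag@example.com", True),
    ("no-at-sign.example.com", False),
    ("@example.com", False),
    ("user@example.com.", False),
]


def failing_samples(validator, samples):
    """Return (address, expected, actual) for every sample the validator gets wrong."""
    failures = []
    for address, expected in samples:
        actual = validator(address)
        if actual != expected:
            failures.append((address, expected, actual))
    return failures


print(failing_samples(is_valid_email, SAMPLES))
```

    Any entry in the returned list is a false positive or false negative that can be reported back without ever opening the delivered source.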
  • faoileag 2013-08-12 08:02
    JimmyCrackedCorn:
    perhaps this would have been on the way to better:
    (...endless lines of VB code excluded...)

    You haven't heard of http://pastebin.com/ , have you?
  • JimmyCrackedCorn 2013-08-12 08:04
    faoileag:
    JimmyCrackedCorn:
    perhaps this would have been on the way to better:
    (...endless lines of VB code excluded...)

    You haven't heard of http://pastebin.com/ , have you?


    I thought some hadn't.
  • JimmyCrackedCorn 2013-08-12 08:06
    JimmyCrackedCorn:
    faoileag:
    JimmyCrackedCorn:
    perhaps this would have been on the way to better:
    (...endless lines of VB code excluded...)

    You haven't heard of http://pastebin.com/ , have you?


    I thought some hadn't.


    http://pastebin.com/TYX4Utax
  • Don 2013-08-12 08:12
    Damien:
    Meh. It has errors too:

    If Right(strEmail, 1) = "." Then
    strReturn = "Email address cannot end with '.'"
    GoTo ExitHandler

    An email address can actually end with a '.'. It's a fully qualified domain name.

    An FQDN cannot impose ambiguity, hence the name FULLY QUALIFIED in the definition. Ending or starting with a . creates ambiguity.

    I think you mean DNS RESOLVERS don't care about the dot...
  • pjt33 2013-08-12 08:17
    faoileag:
    Looking at the article, I cannot help but think that the code might have the odd bug regarding false negatives (as others have noticed before), but without knowledge of the documents Andrew sent to the offshore team, it does not represent a wtf per se.

    Regardless of the spec, any code which could be compressed by 90% with a loop or two is a WTF unless it's explicitly commented that the loop was unrolled with a significant impact on performance.
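
    As a rough illustration of the "loop or two", here is a Python sketch (the article's code is VB). The character list is an assumption, borrowed from the exclusion string posted earlier in this thread plus the double quote; one loop stands in for a separate If block per character while keeping the "first error wins" behaviour.

```python
# Characters the spec apparently disallowed; this exact list is illustrative.
FORBIDDEN = "!#$%&^*()+,/:;<=>?[\\]`~{|}\""


def first_forbidden_char_error(email):
    """Return "" if no forbidden character is present, else the first error message."""
    for ch in FORBIDDEN:
        if ch in email:
            return "Email address cannot contain '" + ch + "'"
    return ""


print(first_forbidden_char_error("user+tag@example.com"))
```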
  • Kuba 2013-08-12 08:19
    Citron:
    The real WTF is "alphanumeric characters only". With all the possible e-mail addresses out there, the only useful thing e-mail validation can do is check whether the user may have mistyped the address, by checking for '@' and '.'. Use an opt-in to check whether the user has access to the address.
    I fucking don't get why on Earth one just won't point to the applicable RFCs and be done with it. Do we really have to paraphrase internet standards all the time? Don't people have better things to do? Writing "specs" for what is a valid email address is like writing "specs" as to how a valid TCP/IP connection should look on the wire. It's like going full retard and being proud of it.
  • radarbob 2013-08-12 08:19
    Warren:
    OK, so they should have had a return type of boolean and used exceptions for the errors...


    Aaaarrrrggghhhhhh...
  • faoileag 2013-08-12 08:42
    pjt33:
    faoileag:
    think that the code ... does not represent a wtf per se.

    Regardless of the spec, any code which could be compressed by 90% with a loop or two is a WTF unless it's explicitly commented that the loop was unrolled with a significant impact on performance.

    For a peer review, I would agree with you completely. However, this is code delivered by an offshore team. In an ideal world, you run your pre-written unit-tests against it and tell the offshore team which have failed if any fail. You do not look at the codebase itself, unless somewhere in your contract with the overseas company you have a clause that explicitly states that the code itself must also meet certain standards. Which is normally not the case. So who cares if they do the loop unrolling themselves? Let them. Perhaps they get paid by lines of code.
  • Hannes 2013-08-12 08:56
    Don:
    Damien:
    Meh. It has errors too:

    If Right(strEmail, 1) = "." Then
    strReturn = "Email address cannot end with '.'"
    GoTo ExitHandler

    An email address can actually end with a '.'. It's a fully qualified domain name.

    An FQDN cannot impose ambiguity, hence the name FULLY QUALIFIED in the definition. Ending or starting with a . creates ambiguity.

    I think you mean DNS RESOLVERS don't care about the dot...


    DNS Resolvers DO care about the dot. If they didn't, they couldn't resolve a URL like http://thedailywtf(dot)com(dot). But -surprise surprise- they do resolve it.
  • Mike 2013-08-12 09:13
    This is why VB coders get a bad wrap. If your if statement doesn't get you all the way there just throw in another 100 or so for each possibility and you should be fine.
  • iaoth 2013-08-12 09:21
    bad rap*
  • faoileag 2013-08-12 09:25
    bad rep.
  • anon 2013-08-12 09:28
    It does; it is just ugly as sin, especially when it is at the end of a string. It would be something like:

    strReturn = "Email address cannot contain """

  • Dave 2013-08-12 09:35
    Kuba:
    I fucking don't get why on Earth one just won't point to the applicable RFCs and be done with it.


    Start by looking at the relevant RFC and showing us how you'd code for it. We could use a laugh.
  • English Man 2013-08-12 09:47
    henke37:
    But plus signs and ampersands are legal!
    Plus signs are a great way to see who has leaked your email to marketing/spam lists, but sadly they are only accepted by 25-50% of sites in my experience.
  • faoileag 2013-08-12 09:48
    Dave:
    Kuba:
    I fucking don't get why on Earth one just won't point to the applicable RFCs and be done with it.


    Start by looking at the relevant RFC and showing us how you'd code for it. We could use a laugh.

    Grzechooo already did that further up in
    Post 414825
  • Cant remember my damn login 2013-08-12 09:54
    no, no, no, no just NO!

    A user incorrectly entering an email address is not exceptional
  • Anonymoose 2013-08-12 09:56
    Sites that don't accept plus signs make me sad and usually turn me away.
  • Abigo 2013-08-12 09:57
    Grzechooo:
    Good that he didn't use a regular expression.


    I think I see it. It's a boat, right?
  • user+suffix@emaildomain 2013-08-12 10:14
    Beyond the ludicrous use of if-then statements instead of a regex, here is another point:

    The "+" character IS valid in the username part of an email address.

    It would be nice if programmers doing email validation would actually READ the documentation regarding this.

    RFC 2822 would be a good place to start.

    www.ietf.org/rfc/rfc2822.txt
  • anonymous 2013-08-12 10:22
    Tim B:
    An email address can actually end with a '.'

    Domain names can end with '.', but email addresses can't. See ttp://tools.ietf.org/html/rfc5321#section-4.1.2
    I tried typing that into my touch-tone phone, but the nice operator lady told me that it wasn't understood.
  • faoileag 2013-08-12 10:24
    user+suffix@emaildomain:
    It would be nice if programmers doing email validation would actually READ the documentation regarding this.

    RFC 2822 would be a good place to start.

    RFC 2822 is not exactly an easy read. Personally, I find en.wikipedia.org/wiki/Email_address#Local_part much more appealing.
  • jkupski 2013-08-12 10:32
    Hannes:
    DNS Resolvers DO care about the dot. If they didn't, they couldn't resolve a URL like http://thedailywtf(dot)com(dot). But -surprise surprise- they do resolve it.

    Actually, they do not, given that the above is a URL (as you yourself note) and not a domain name. The above is really a lot like misusing they're/their/there while being a grammar nazi.
  • da Doctah 2013-08-12 10:36

    If InStr(strEmail, "..") > 0 Then
    strReturn = "Email address cannot contain '..'"
    GoTo ExitHandler

    Wait. Why the hell can't an email address contain '..'?

    Are you going to tell me that fred..smith@jones.com is invalid?
  • Koch 2013-08-12 10:40
    This ^
  • C-Derb 2013-08-12 10:50
    Why waste so much effort telling the user exactly what is wrong with their email address? Just use a generic "Please enter a valid email address" message. I mean, 99.9% of the time you are expecting users to enter valid data, right?
  • DCRoss 2013-08-12 10:56
    Abigo:
    Grzechooo:
    Good that he didn't use a regular expression.

    I think I see it. It's a boat, right?

    It's not a boat. It's a schooner.

    On the plus side, at least it didn't try to parse HTML.
  • Hannes 2013-08-12 10:59
    jkupski:
    Hannes:
    DNS Resolvers DO care about the dot. If they didn't, they couldn't resolve a URL like thedailywtf(dot)com(dot). But -surprise surprise- they do resolve it.

    Actually, they do not, given that the above is a URL (as you yourself note) and not a domain name. The above is really a lot like misusing they're/their/there while being a grammar nazi.


    Well, what do URLs mostly contain? Maybe Domain Names?

    Also, a FQDN does indeed have a dot at the end: (dot)com(dot), so the DNS can resolve the root server responsible for "com".

    Or, you could just try and read the wiki article about FQDN. ;)
  • ebinezer dude 2013-08-12 11:19
    He was given this project to gain PM experience. Thus, the right action is to not review the code, allow the dev to check it in, compile and if no errors, launch. Then report to your boss that this project was successfully completed on time, within budget and the clients are incredibly happy.

    Next up-
    Project Management Lesson 2: The Perfect Project - How to Keep Client Complaints from Your Boss and Make Bug Reports Disappear

    Project Management Lesson 3: Ethics Schmethics - It's only wrong if you're caught and How to Say Sorry Like a Boss.
  • ANON 2013-08-12 11:39
    faoileag:
    pjt33:
    faoileag:
    think that the code ... does not represent a wtf per se.

    Regardless of the spec, any code which could be compressed by 90% with a loop or two is a WTF unless it's explicitly commented that the loop was unrolled with a significant impact on performance.

    For a peer review, I would agree with you completely. However, this is code delivered by an offshore team. In an ideal world, you run your pre-written unit-tests against it and tell the offshore team which have failed if any fail. You do not look at the codebase itself, unless somewhere in your contract with the overseas company you have a clause that explicitly states that the code itself must also meet certain standards. Which is normally not the case. So who cares if they do the loop unrolling themselves? Let them. Perhaps they get paid by lines of code.


    I don't agree with this. It's in the interest of the customer that the code meets the coding standards, not in the interest of the offshore company. The customer will have to pay the technical debt later. So you should take a look at the code to enforce the interests of your customer.

    Furthermore, I have read studies suggesting that white-box tests find more errors in the same amount of time than black-box tests.
  • fein 2013-08-12 12:03
    All you should check for is "x@y"

    This is the minimum spec for an email address.
  • Poen 2013-08-12 12:05
    The RWTF is they outsourced such a simple task...
  • cellocgw 2013-08-12 12:13
    TRWTF is that not only the dev team, but every comment in this thread chose the wrong meaning of "valid." A character string can pass RFC_whatevernumber but still not be valid because no such email address exists.

    Thus, the validation code needs to go like this:
    (pseudo-pseudocode)

    pingit := pine $address_to_be_validated
    if (pingit != NULL ) {
    # must have got some bounceback nastygram
    Table_reject($address_to_be_validated)
    }

  • herby 2013-08-12 12:30
    Couple of things:

    1) Quality of code:
    Today's lesson will be loops. This is a basic computer construct that everyone should know. Study it well, it will be used in a test later.

    2) On validating email addresses:
    Email addresses can range from the simple to the very complex. There are many standards. Read them. For the most part if you want to have an address that will be globally useful, it most likely contains two things:
    It contains an '@', and it has at least one '.' after the '@'. Sure there are other constraints, but for the most part this is enough. Ask the person who typed it in to do it twice, and they will usually get it right (or it will be intentionally wrong). If you want to validate further, send a confirming email and await a response. If you get one, it is probably good to go. Anything else is getting close to a waste of time.
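
    The two checks described above might look like this in Python (a sketch only; the function names are made up here, and using the last '@' as the split point is an assumption). The confirming email itself is left out, since it needs a mail server.

```python
def looks_like_email(address):
    """True if there is an '@' and at least one '.' somewhere after the last '@'."""
    at = address.rfind("@")
    return at != -1 and "." in address[at + 1:]


def entries_match(first, second):
    """Compare the two typed entries; surrounding whitespace is trimmed, case is kept."""
    return first.strip() == second.strip()


print(looks_like_email("fred@jones.com"))
print(looks_like_email("fred.smith@jones"))
print(entries_match("fred@jones.com ", "fred@jones.com"))
```

    Anything this lets through still has to survive the confirmation email, which is the only step that proves the address exists.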
  • ¯\(°_o)/¯ I DUNNO LOL 2013-08-12 12:30
    cellocgw:
    pingit := pine $address_to_be_validated
    An e-mail address domain is not necessarily ping-able. They use MX records, and an IP address is only a fallback.

    Or was that not a typo and you really meant pine the e-mail client?
  • meinders1337 2013-08-12 12:37
    The real WTF is writing code to validate an email address. Use existing libraries.
  • chubertdev 2013-08-12 12:39
    radarbob:
    Warren:
    OK, so they should have had a return type of boolean and used exceptions for the errors...


    Aaaarrrrggghhhhhh...


    Yeah, people like Warren that abuse exceptions need to be thrown in a deep, dark dungeon somewhere.
  • chubertdev 2013-08-12 12:40
    "Kerbleckistan Considered Harmful"
  • Doozerboy 2013-08-12 12:46
    faoileag:
    pjt33:
    faoileag:
    think that the code ... does not represent a wtf per se.

    Regardless of the spec, any code which could be compressed by 90% with a loop or two is a WTF unless it's explicitly commented that the loop was unrolled with a significant impact on performance.

    For a peer review, I would agree with you completely. However, this is code delivered by an offshore team. In an ideal world, you run your pre-written unit-tests against it and tell the offshore team which have failed if any fail. You do not look at the codebase itself, unless somewhere in your contract with the overseas company you have a clause that explicitly states that the code itself must also meet certain standards. Which is normally not the case. So who cares if they do the loop unrolling themselves? Let them. Perhaps they get paid by lines of code.



    Riiiiiiight. So are your unit tests going to catch that the thing runs like a bag of spanners?
  • Popeye 2013-08-12 12:51
    Surely even an idiotic offshore developer in Pakistan would know how to use Google to look up a regex for email validation?

    One line of code PEOPLE.
  • Kasper 2013-08-12 12:54
    meinders1337:
    The real WTF is writing code to validate an email address. Use existing libraries.
    I recently came across some code that did that. It didn't work. The library rejected any TLD longer than 6 characters.
  • cellocgw 2013-08-12 12:55
    ¯\(°_o)/¯ I DUNNO LOL:
    cellocgw:
    pingit := pine $address_to_be_validated
    An e-mail address domain is not necessarily ping-able. They use MX records, and an IP address is only a fallback.

    Or was that not a typo and you really meant pine the e-mail client?


    Yes, yes I did. You can use elm if you prefer :-).

    And daggonit, this is thedailywtf. Every post is assumed to have <sarcasm> and <satire> tagged
  • foo 2013-08-12 13:02
    C-Derb:
    Why waste so much effort telling the user exactly what is wrong with their email address? Just use a generic "Please enter a valid email address" message. I mean, 99.9% of the time you are expecting users to enter valid data, right?
    But you're not asking the user to enter a valid email address, but to enter an email address that conforms to your arbitrary rules. How is the user supposed to know those rules? It's like saying "I don't like what you entered, but I won't tell you why, keep guessing." Really that makes the WTF twice as bad.
  • gizmore 2013-08-12 13:05
    Looks quite solid and is way faster than a regular expression. no wtf ;)
  • Jay 2013-08-12 13:46
    Rejecting things like plus signs and two dots in a row comes from the philosophy of "I've never seen one like that, it's probably invalid", rather than actually checking the specs.

    I have a personal email address that ends in dot-us. One website I went to rejected that. I tried changing it to dot-com -- of course not my correct email address then, but just to see -- and it accepted it. I guess they don't want a lot of grubby foreigners on their system, but you'd think they'd allow in Americans who use the us TLD.

    I would think that for something like validating the user's email address, if for whatever reason you can't get exactly the right rules, you would want to err on the side of accepting too much rather than too little. The main point of a validation like that is to catch user brain freezes, like he accidentally types his phone number in the e-mail field. So just checking for "includes an @ and at least one period after the @" is probably a not-bad validation.
  • Jay 2013-08-12 13:49
    Hey, this brings to mind an actual serious thought: How tight should a validation be?

    If you are validating an email address, you COULD keep a list of all the valid TLDs and validate against that. Then if a user trying to type in "com" accidentally typed "cim", you'd catch it. But I've never done that and I doubt I ever will, because it would require keeping that list up to date, which would mean constantly monitoring for the creation of new TLDs. That sounds like way too much trouble.

    Of course on the flip side, some validations must be 100% tight. Like if I'm validating a user's password, I'm not going to say, "oh, okay, the last character was wrong, but that's probably just a typo, we'll let you in."
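
    For what it's worth, the TLD-list idea would be only a few lines, sketched here in Python with a deliberately tiny, made-up set. The names `KNOWN_TLDS`, `tld_of` and `has_known_tld` are assumptions for illustration. The code is the easy part; keeping the set current is the cost described above.

```python
# Tiny illustrative set; a real list would have to be refreshed as new TLDs are created.
KNOWN_TLDS = {"com", "org", "net", "us", "de"}


def tld_of(address):
    """Return the lower-cased text after the last '.', or "" if there is no '.'."""
    _, dot, tld = address.rpartition(".")
    return tld.lower() if dot else ""


def has_known_tld(address, known=KNOWN_TLDS):
    """True if the address ends in one of the known TLDs."""
    return tld_of(address) in known


print(has_known_tld("jay@example.us"))
print(has_known_tld("jay@example.cim"))
```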
  • Jay 2013-08-12 13:53
    Kasper:
    meinders1337:
    The real WTF is writing code to validate an email address. Use existing libraries.
    I recently came across some code that did that. It didn't work. The library rejected any TLD longer than 6 characters.


    Exactly. I don't know how many times I've heard, "What?! You're going to write that function yourself?! That's crazy. Just search for something on the Internet, then you don't have to debug it yourself."

    The assumption there, of course, is that anything downloaded from the Internet is guaranteed to not only be 100% correct but also to 100% meet my requirements. There's absolutely no reason to believe that's true, and plenty of reason to believe it's wildly false.
  • foo 2013-08-12 13:56
    Jay:
    Hey, this brings to mind an actual serious thought: How tight should a validation be?

    If you are validating an email address, you COULD keep a list of all the valid TLDs and validate against that. Then if a user trying to type in "com" accidentally typed "cim", you'd catch it. But I've never done that and I doubt I ever will, because it would require keeping that list up to date, which would mean constantly monitoring for the creation of new TLDs. That sounds like way too much trouble.

    Of course on the flip side, some validations must be 100% tight. Like if I'm validating a user's password, I'm not going to say, "oh, okay, the last character was wrong, but that's probably just a typo, we'll let you in."
    The latter is not validation, but rather checking. Something like validation is done when a user chooses a password, e.g. is it long enough, does it contain enough different characters, does it not contain some characters we don't like etc.
  • chubertdev 2013-08-12 13:58
    [a-zA-z]*[@]gmail[.]com


    Pssht, anyone who knows what they're doing should have a gmail account with normal characters, and type it all lowercase. At least anyone we want associated with our site.

    Addendum (2013-08-12 15:18):
    *
    [a-zA-Z]


    Muphry's Law
  • AN AWESOME CODER 2013-08-12 13:59
    ratchet freak:
    u:
    Citron:
    The real WTF is "alphanumeric characters only". With all the possible e-mail addresses out there, the only useful thing e-mail validation can do is check whether the user may have mistyped the address, by checking for '@' and '.'. Use an opt-in to check whether the user has access to the address.


    It states "In following with agreed upon standards" - it is nowhere said that they are following _RFC_ standards.


    Isn't TRWTF that Andrew didn't specify that the RFC standard should be followed?

    Though I'm afraid of the code that would come out of that specification.


    Not necessarily. You're assuming this email address field would be accepting all email addresses in the world.

    It's quite possible that the field is on an internal screen, and that the format of the email addresses is known ahead of time. It still might be sub-optimal to have custom validation considering the plethora of times validating email addresses has been solved, but it's reasonable that a company knows all of their employee email addresses are in the same format.

    Similarly, validating extensions on phone numbers would be dumb, but not if the phone number is the number to my desk, and every employee accessing the program should have one.
  • Rcx 2013-08-12 14:23
    Actually, by the email address RFC(s), it cannot, see

    http://www.faqs.org/rfcs/rfc1123.html section 5.2.18

    "Some systems over-qualify domain names by adding a trailing dot to some or all domain names in addresses or message-ids. This violates RFC-822 syntax."
  • Coyne 2013-08-12 14:35
    Shall we count the ways this violates RFC 2822? (Could take a while.)

    Let's just leave it at, "Very exhaustively wrong."
  • ubersoldat 2013-08-12 14:44
    Grzechooo:
    Good that he didn't use a regular expression.


    I wouldn't put my hands on fire for VB support of RegExp.

    OTOH, I've met my share of "developers" that don't even know what RegExp is. Ask around, it might surprise you.
  • Matt Westwood 2013-08-12 14:47
    chubertdev:
    [a-zA-z]*[@]gmail[.]com


    Pssht, anyone who knows what they're doing should have a gmail account with normal characters, and type it all lowercase. At least anyone we want associated with our site.


    [a-zA-z]

    What the FUCK is that?
  • foo 2013-08-12 14:47
    ubersoldat:
    Grzechooo:
    Good that he didn't use a regular expression.


    I wouldn't put my hands on fire for VB support of RegExp.

    OTOH, I've met my share of "developers" that don't even know what RegExp is. Ask around, it might surprise you.
    Probably nearly as many as those who do know but refuse to use them due to a misinterpreted quote.
  • The Bytemaster 2013-08-12 14:51
    faoileag:
    strReturn = "Email address cannot contain " & Chr(34)

    Don't tell me Visual Basic has no other means to include a quote in string?

    It does, but it is not any prettier
    strReturn = "Email address cannot contain """

    That's right, two quotes together make a single one, with the third to close the string. Similar syntax to T-SQL:
    DECLARE @SingleQuoteChar char(1) = ''''
  • The Bytemaster 2013-08-12 14:53
    ubersoldat:
    Grzechooo:
    Good that he didn't use a regular expression.


    I wouldn't put my hands on fire for VB support of RegExp.

    OTOH, I've met my share of "developers" that don't even know what RegExp is. Ask around, it might surprise you.
    VB.NET has "full" support of RegEx through the same .NET Framework classes as C#, though you are right, many "developers" don't even know what a RegEx is.

    Then again, a lot of developers I work with have been going at this for so long, technologies such as RegEx were just never introduced to them.
  • Arne Nonymous 2013-08-12 15:15
    Meh, yet another email validator that doesn't accept "@ @"@example.com as valid (or the light version "@_@"@example.com ).
  • chubertdev 2013-08-12 15:18
    Matt Westwood:
    chubertdev:
    [a-zA-z]*[@]gmail[.]com


    Pssht, anyone who knows what they're doing should have a gmail account with normal characters, and type it all lowercase. At least anyone we want associated with our site.


    [a-zA-z]

    What the FUCK is that?


    Muphry's Law
  • AN AWESOME CODER 2013-08-12 15:22
    Hannes:
    jkupski:
    Hannes:
    DNS Resolvers DO care about the dot. If they wouldn't they couldn't resolve a URL like thedailywtf(dot)com(dot). But -surprise surprise- they do resolve it.

    Actually, they do not, given that the above is a URL (as you yourself note) and not a domain name. The above is really a lot like misusing they're/their/there while being a grammar nazi.


    Well, what do URLs mostly contain? Maybe Domain Names?

    Also, a FQDN does indeed have a dot at the end: (dot)com(dot), so the DNS can resolve the root server responsible for "com".

    Or, you could just try and read the wiki article about FQDN. ;)



    And it has absolutely nothing to do with email.

    "myhomenetwork" is a valid domain.

    "an_awesome_coder@myhomenetwork" is a valid email address.

    Sending email to "an_awesome_coder@myhomenetwork" will succeed as long as the machine sending the email can resolve MX records for "myhomenetwork."


    I will concede, though, that any email address used in the real world that's input by a user in some non-geeky and non-ops related software will have at least one dot following the @ character.
  • Roger 2013-08-12 16:00
    You have to admit they didn't go nearly far enough with that method, though. Consider:

    if strEmail == "!@!.!" Or
    strEmail == "!@!.#" Or
    strEmail == "a@!.$" Or

    ... a modest amount of code left out
    strEmail == "zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz@zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz.zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz.zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz" Then
    return True


  • XXI 2013-08-12 16:11
    Roger:
    You have to admit they didn't go nearly far enough with that method, though. Consider:

    if strEmail == "!@!.!" Or
    strEmail == "!@!.#" Or
    strEmail == "a@!.$" Or

    ... a modest amount of code left out
    strEmail == "zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz@zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz.zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz.zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz" Then
    return True




    ^ This

    It is always much easier to make a whitelist of what you support than trying to blacklist all possible unsupported cases. This is the way it should be done

    Captcha aptent: I would not aptent it any other way
  • erazerazerazerazerazer 2013-08-12 16:51
    Matt Westwood:


    [a-zA-z]

    What the FUCK is that?


    That's all characters between A and z meaning:
    * "a", "b", "c", ... "z"
    * "A", "B", "C", ... "Z"
    * "a", "b", "c", ... "z" (again)
    * "["
    * "\"
    * "]"
    * "^"
    * "_"
    * "`"

    Or put another way: all characters between ASCII 65 and ASCII 122.
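
    A quick way to check the list above, as a minimal sketch in Python (the names `matched` and `extras` are just illustrative): the class [A-z] spans ASCII 65 through 122, so it picks up six punctuation characters along with the letters.

```python
import re

# Collect every ASCII character that the character class [A-z] matches.
matched = [chr(c) for c in range(128) if re.fullmatch(r"[A-z]", chr(c))]

# Whatever is in that set but is not a letter got in by accident.
extras = [ch for ch in matched if not ch.isalpha()]

print(len(matched))  # 52 letters plus the accidental extras
print(extras)
```

    Writing [A-Za-z] instead keeps only the 52 letters.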

  • jkupski 2013-08-12 17:55
    Hannes:
    Well, what do URLs mostly contain? Maybe Domain Names?

    Also, a FQDN does indeed have a dot at the end: (dot)com(dot), so the DNS can resolve the root server responsible for "com".

    Or, you could just try and read the wiki article about FQDN. ;)

    I did not correct your statement about FQDNs, merely your statement that a DNS resolver "resolves" URLs. It doesn't. Being snarky and directing me to wikipedia on another subject does not change your inaccuracy.

    To be even more pedantic, even "URLs mostly contain domain names" is also not necessarily an accurate statement. A URL contains protocol information (http), a separator (://), host information (www.example.com), an optional port number (:80), and the full path to a resource on that host (/fake/path/example.htm).

    Hell, the URL to reply to your comment, http://thedailywtf.com/Comments/AddComment.aspx?ArticleId=7636&ReplyTo=414871&Quote=Y, is a great example of your inaccuracy. My DNS resolver can't make heads nor tails of it. :)
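
    To make the distinction concrete, here is a small sketch using Python's standard urllib.parse (the variable names are only for illustration). Only the host part is something a DNS resolver could be asked about; the rest of the URL never reaches it.

```python
from urllib.parse import urlsplit

# Split the example URL from the breakdown above into its components.
parts = urlsplit("http://www.example.com:80/fake/path/example.htm")

print(parts.scheme)    # protocol information
print(parts.hostname)  # host information: the only piece DNS deals with
print(parts.port)      # optional port number
print(parts.path)      # path to the resource on that host

# This is what a program would hand to the resolver, not the whole URL.
host_for_dns = parts.hostname
```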



  • :o) 2013-08-12 17:56
    faoileag:
    csrster:
    The real WTF is surely not using the Composite pattern to aggregate multiple validation rules in a single rule. Then each individual rule can be ruthlessly and independently unit-tested. Plus you're able instantiate these generalised validation rules using an Abstract Factory Pattern and an appropriate dependency-injection framework. Here, let me show you some UML ...

    I'm missing the XML in your design. Without XML in it, it's definitely not enterprisey enough!

    troll!
  • Someone 2013-08-12 19:08
    Hey everyone, I'm having a problem related to this story. I'm trying to make a list of email addresses that I can validate entries against, but typing it all out is really slow. Can some people help me out?

    Here is what I have so far:

    aaaaaa@aaaaaa.com
    aaaaac@aaaaac.com
    aaaaad@aaaaad.com
    aaaaae@aaaaae.com
    aaaaaf@aaaaaf.com
    aaaaag@aaaaag.com
    aaaaah@aaaaah.com
    aaaaai@aaaaai.com
    aaaaaj@aaaaaj.com
    aaaaak@aaaaak.com
    aaaaal@aaaaal.com
    aaaaam@aaaaam.com

  • foxyshadis 2013-08-12 19:25
    C-Derb:
    Why waste so much effort telling the user exactly what is wrong with their email address? Just use a generic "Please enter a valid email address" message. I mean, 99.9% of the time you are expecting users to enter valid data, right?

    Because when they enter their perfectly valid email address, they're justified in at least knowing why you petulantly reject it.
  • Gaurav 2013-08-12 19:40
    It's better than the boring old regex approach; it brings spice to your life.
  • Friedrice The Great 2013-08-12 19:43
    Mike:
    This is why VB coders get a bad wrap. If your if statement doesn't get you all the way there just throw in another 100 or so for each possibility and you should be fine.

    Would a good wrap be in something like plastic? Or is canvas preferred when disposing of weighted bodies at sea?
  • Coyne 2013-08-12 21:01
    Grzechooo:
    Good that he didn't use a regular expression.


    Especially since that regex implements RFC 822, which is incredibly obsolete...having been superseded by RFC 2822 and then by RFC 5322.
  • grisle 2013-08-12 22:17
    Christian:
    Hi,

    and this is my all time favourite ....

    > If InStr(1, strEmail, "+") > 0 Then
    > strReturn = "Email address cannot contain '+'"
    > GoTo ExitHandler
    > End If


    Why the hell shouldn't an email address contain a +. I use that all the time.

    Greetings
    Christian
    oh, everyone is up in arms about it....

    The + is often used so you can filter SPAM.
    Why do people want your email address?
    Why do you think they don't want the '+' there?

    HINT: People who want your email address probably don't care much about the standards. They want to know they can send you email you will READ (or at least be forced to notice). That is, until the sender is blocked, or your Bayesian SPAM filter sees it for what it is.

    People always assume that "nobody knows that + is valid in email addresses" - I think it's more that "nobody CARES that + is valid in email addresses"
  • asd 2013-08-12 22:53
    insert rant about how data validation is over used....

    It's easier to spot a bodgey address if it's:
    No Way You gunna get my emial MF

    than if it's:
    BodgeyBrothersInc@fakestreet.com


    One of the questions that needs to be considered is "Why do they need an email address?".
    Normally it's so they can send you stuff (in which case send something with a link and see if it works). Sometimes it's under the pretext of verifying your identity (or at least somehow holding you to account for how you use their site - and again, if you NEED the address, verify it via an emailed link). Sometimes it's because they want to send you SPAM (actually in all cases this could be the case - and again SEND A LINK).

    The only time when you don't need to send a link to verify, is in cases where you don't intend to use the email. But if you don't intend using it, why ask for it?

    Of course everyone does, simply because they like having your information, but the only certain way to see that the address exists (even if for only a fleeting moment) is to have an email sent there and somehow verified.

    Who gives a shit about the actual format? This is a classic case of programmers overthinking the problem and reengineering the wheel.

    There's a difference between a valid email address and a real email address. Many people ask for valid when they mean real - at the end of the day, a valid email address that doesn't exist is about as useful as an invalid one. And if you demand a valid one, I'll either use my enemy's one (and hope you don't verify it) or make up some stupid one that doesn't exist anyway....
  • E 2013-08-13 00:25
    It does, but Chr(34) is more readable than """".
  • foo 2013-08-13 01:18
    Someone:
    Hey everyone, I'm having a problem related to this story. I'm trying to make a list of email addresses that I can validate entries against, but typing it all out is really slow. Can some people help me out?

    Here is what I have so far:

    aaaaaa@aaaaaa.com
    aaaaac@aaaaac.com
    aaaaad@aaaaad.com
    aaaaae@aaaaae.com
    aaaaaf@aaaaaf.com
    aaaaag@aaaaag.com
    aaaaah@aaaaah.com
    aaaaai@aaaaai.com
    aaaaaj@aaaaaj.com
    aaaaak@aaaaak.com
    aaaaal@aaaaal.com
    aaaaam@aaaaam.com

    You don't have to type it yourself. Just ask at N&N (Nigerian Scammers & NSA). Both have very good, up to date, lists of all email addresses in the world.
  • Norman Diamond 2013-08-13 02:07
    Someone:
    Hey everyone, I'm having a problem related to this story. I'm trying to make a list of email addresses that I can validate entries against, but typing it all out is really slow. Can some people help me out?

    Here is what I have so far:

    aaaaaa@aaaaaa.com
    aaaaac@aaaaac.com
    aaaaad@aaaaad.com
    aaaaae@aaaaae.com
    aaaaaf@aaaaaf.com
    aaaaag@aaaaag.com
    aaaaah@aaaaah.com
    aaaaai@aaaaai.com
    aaaaaj@aaaaaj.com
    aaaaak@aaaaak.com
    aaaaal@aaaaal.com
    aaaaam@aaaaam.com

    At least you'll be immune from this prank:
    http://www.youtube.com/watch?v=gJuGKJaSyVU

    Akismet says I should sell you stolen credit cards instead.
  • QJo 2013-08-13 03:40
    Norman Diamond:
    Someone:
    Hey everyone, I'm having a problem related to this story. I'm trying to make a list of email addresses that I can validate entries against, but typing it all out is really slow. Can some people help me out?

    Here is what I have so far:

    aaaaaa@aaaaaa.com
    aaaaac@aaaaac.com
    aaaaad@aaaaad.com
    aaaaae@aaaaae.com
    aaaaaf@aaaaaf.com
    aaaaag@aaaaag.com
    aaaaah@aaaaah.com
    aaaaai@aaaaai.com
    aaaaaj@aaaaaj.com
    aaaaak@aaaaak.com
    aaaaal@aaaaal.com
    aaaaam@aaaaam.com

    At least you'll be immune from this prank:
    http://www.youtube.com/watch?v=gJuGKJaSyVU

    Akismet says I should sell you stolen credit cards instead.

    TRWTF there is going into hysterical panic at the thought of having an ordinary everyday insect on your back. The correct response (to being told "You've got a bee on your back") is to just carry on doing whatever you were doing, sure in the knowledge that once it has finished its innocuous and harmless business on your clothing, then it will simply fly away and go somewhere else.
  • Walky_one 2013-08-13 04:04
    Funny thing that E-Mail addresses like

    MyEmail@somewhere@url.com seem to be valid by the above code... (More funny that nobody pointed that out so far)
  • Tim 2013-08-13 06:12
    TRWTF is that there's not much point in validating the syntax of the email address closely because that doesn't prove that the email address actually exists, let alone that it is the correct one.

    if someone makes a mistake typing in an email address, most of the time what they actually type will be a valid email address and will either bounce or go to the wrong person
  • Zecc 2013-08-13 06:36
    Warren:
    OK, so they should have had a return type of boolean and used exceptions for the errors....
    Not sure if serious or troll.
  • csrster 2013-08-13 07:05
    faoileag:
    csrster:
    The real WTF is surely not using the Composite pattern to aggregate multiple validation rules in a single rule. Then each individual rule can be ruthlessly and independently unit-tested. Plus you're able instantiate these generalised validation rules using an Abstract Factory Pattern and an appropriate dependency-injection framework. Here, let me show you some UML ...

    I'm missing the XML in your design. Without XML in it, it's definitely not enterprisey enough!


    How do you think I'm configuring my dependency-injection framework? Lots of lovely XML there ...
  • The Fury 2013-08-13 07:16
    Kuba:
    Citron:
    The real WTF is "alphanumeric characters only". With all these possible e-mail-addresses out there, the only useful thing to do for e-mail validation is to check, if the ser may have misstyped his e-mail-address, by checking for '@' and '.'. Use an opt-in to check if the user has access to the address.
    I fucking don't get why on Earth one just won't point to the applicable RFCs and be done with it. Do we really have to paraphrase internet standards all the time? Don't people have better things to do? Writing "specs" for what is a valid email address is like writing "specs" as to how a valid TCP/IP connection should look on the wire. It's like going full retard and being proud of it.


    Actually why would you bother to even do this. Why would you want a developer who wouldn't know to look up the relevant RFC writing your app anyway?
  • n9ds 2013-08-13 09:11
    Seems to me about the only way of finding out for sure if an email address is valid (but not necessarily real) is to do an nslookup on the host portion. Of course if your code waits 30 seconds for the resolution to time out then . . .

    As a side note, one nice thing about LotusScript (I'll wait for all the Lotus Notes jokes to die down before continuing) is that you can use "", {} or || as string delimiters, so including " in your string is easy. Very handy when writing HTML codes.

    print {<a class="HelloWorld" style="font-size: 12px; color:black" href="foo.html">Hello World!</a>}

  • Anon 2013-08-13 10:01
    faoileag:
    pjt33:
    faoileag:
    think that the code ... does not represent a wtf per se.

    Regardless of the spec, any code which could be compressed by 90% with a loop or two is a WTF unless it's explicitly commented that the loop was unrolled with a significant impact on performance.

    For a peer review, I would agree with you completely. However, this is code delivered by an offshore team. In an ideal world, you run your pre-written unit-tests against it and tell the offshore team which have failed if any fail. You do not look at the codebase itself, unless somewhere in your contract with the overseas company you have a clause that explicitly states that the code itself must also meet certain standards. Which is normally not the case. So who cares if they do the loop unrolling themselves? Let them. Perhaps they get paid by lines of code.


    The dev who has to clean this up in a few years cares. As does the dev who has to work on the 400 bugs generated by the shit code.
  • Anon 2013-08-13 10:02
    The Fury:
    Kuba:
    Citron:
    The real WTF is "alphanumeric characters only". With all these possible e-mail-addresses out there, the only useful thing to do for e-mail validation is to check, if the ser may have misstyped his e-mail-address, by checking for '@' and '.'. Use an opt-in to check if the user has access to the address.
    I fucking don't get why on Earth one just won't point to the applicable RFCs and be done with it. Do we really have to paraphrase internet standards all the time? Don't people have better things to do? Writing "specs" for what is a valid email address is like writing "specs" as to how a valid TCP/IP connection should look on the wire. It's like going full retard and being proud of it.


    Actually why would you bother to even do this. Why would you want a developer who wouldn't know to look up the relevant RFC writing your app anyway?


    A dev who uses the RFC standard instead of the PHB standard is not long for most jobs.
  • Anon 2013-08-13 10:05
    n9ds:
    Seems to me about the only way of finding out for sure if an email address is valid (but not necessarily real) is to do an nslookup on the host portion. Of course if your code waits 30 seconds for the resolution to time out then . . .


    Not sure if serious.
  • Vlad Patryshev 2013-08-13 11:30
    They may be stupid, but they are also wrong. '+' SHOULD be allowed.
  • Herr Otto Flick 2013-08-13 12:38
    Andrew completed his functional design document detailing valid email address requirements - the address must contain an "@" symbol, must include a domain name, alphanumeric characters only, and punctuation like underscores, hyphens, periods are all OK

    Andrew sounds exactly like the kind of tool who thinks he is clever for incorrectly specifying how to solve a problem that was fixed back in the 80s. No need to know anything, just spout shit and hope it sticks - much like his equally inept "remote coding team".
  • Jay 2013-08-13 12:54
    XXI:
    It is always much easier to make a whitelist of what you support than trying to blacklist all possible unsupported cases. This is the way it should be done


    Well, I'd say "often easier", not necessarily "always".

    Like if your function won't work with a quote mark but will work with any other Unicode character, it's sure easier to say char!='"' than char=='a' or char=='b' or char=='c' or ... or char==0x128 or char==0x129 or ... etc

    Or if you accept user ids of any length except 9, it's easier to write len!=9 rather than len==1 or len==2 or len==3 ... or len==8 or len==10 or ...

  • Jay 2013-08-13 13:06
    asd:
    insert rant about how data validation is over used....

    It's easier to spot a bodgey address if it's:
    No Way You gunna get my emial MF

    than if it's:
    BodgeyBrothersInc@fakestreet.com


    One of the questions that needs to be considered is "Why do they need an email address?".
    Normally it's so they can send you stuff (in which case send something with a link and see if it works). Sometimes it's under the pretext of verifying your identity (or at least somehow holding you to account for how you use their site - and again, if you NEED the address, verify it via an emailed link). Sometimes it's because they want to send you SPAM (actually in all cases this could be the case - and again SEND A LINK).

    The only time when you don't need to send a link to verify, is in cases where you don't intend to use the email. But if you don't intend using it, why ask for it?

    Of course everyone does, simply because they like having your information, but the only certain way to see that the address exists (even if for only a fleeting moment) is to have an email sent there and somehow verified.

    Who gives a shit about the actual format? This is a classic case of programmers overthinking the problem and reengineering the wheel.

    There's a difference between a valid email address and a real email address. Many people ask for valid when they mean real - at the end of the day, a valid email address that doesn't exist is about as useful as an invalid one. And if you demand a valid one, I'll either use my enemy's one (and hope you don't verify it) or make up some stupid one that doesn't exist anyway....


    If you really absolutely need to be sure that the email is real, sure.

    But for most purposes, sending the user an email and requiring a response is an unnecessary pain. We have to write code to create the emails, and then more code to receive the replies and update the database that this email is now confirmed. More important, the user has to reply to our email. Are we going to hold up processing his order until we get the email response? What if he never replies to the email? We could be losing an order, i.e. money, just because the user forgot to respond to the email or decided it was too much trouble or deliberately gave us a wrong address because he doesn't want to receive spam. If the purpose of collecting the email was just so we could send him order status messages or future advertising, do we want to lose an order for that? Etc.

    By the same reasoning, you could say that if we ask for a phone number, instead of validating format we should call the number and make sure someone answers. Or that if, say, a web site that sells auto parts asks the user what model car he needs parts for, instead of just verifying that this model is in our database we should send someone to his house to verify that he really owns such a car. Etc.

    There might be times when such additional rigor is necessary. But often it is just too much trouble.
  • Jay 2013-08-13 13:09
    Someone:
    Hey everyone, I'm having a problem related to this story. I'm trying to make a list of email addresses that I can validate entries against, but typing it all out is really slow. Can some people help me out?

    Here is what I have so far:

    aaaaaa@aaaaaa.com
    aaaaac@aaaaac.com
    aaaaad@aaaaad.com
    aaaaae@aaaaae.com
    aaaaaf@aaaaaf.com
    aaaaag@aaaaag.com
    aaaaah@aaaaah.com
    aaaaai@aaaaai.com
    aaaaaj@aaaaaj.com
    aaaaak@aaaaak.com
    aaaaal@aaaaal.com
    aaaaam@aaaaam.com



    I was just working on a program the other day that used a similar approach to validate that a date was within a date range. Instead of doing something lazy like "date >= start_date and date <= end_date", the programmer wrote a loop that generated all the dates between the start and the end dates, and then checked that every date given was found on this list.

    I cried.
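
    For contrast, the "lazy" comparison is about one line. A minimal sketch in Python, assuming the dates are already parsed into date objects (the function name `in_range` is made up for illustration):

```python
from datetime import date

def in_range(d, start, end):
    """Return True when d falls between start and end, inclusive."""
    # Two comparisons, no matter how many days the range spans.
    return start <= d <= end

start_date = date(2013, 8, 1)
end_date = date(2013, 8, 31)

print(in_range(date(2013, 8, 12), start_date, end_date))
print(in_range(date(2013, 9, 1), start_date, end_date))
```

    Generating every date in the range and searching the list gives the same answer, just with work that grows with the length of the range.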
  • Jay 2013-08-13 13:16
    Tim:
    TRWTF is that there's not much point in validating the syntax of the email address closely because that doesn't prove that the email address actually exists, let alone that it is the correct one.

    if someone makes a mistake typing in an email address, most of the time what they actually type will be a valid email address and will either bounce or go to the wrong person


    Agree. If you check that it includes an @ sign and a period, that's a good indication that the person did indeed type an email address and not that he got confused and typed his zip code there by mistake or some such. Beyond that, yeah, if someone's email address is "bob@foobar.com" and he bounces on the keyboard and types "bobb@foobar.com", no format test is going to catch that.
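
    A rough sketch of that kind of sanity check in Python (the function name `looks_like_email` is hypothetical, and this deliberately does not try to follow the RFCs):

```python
def looks_like_email(text):
    """Loose sanity check: something before an @, and a dot after it."""
    # Split on the last @ so quoted local parts containing @ still pass.
    local, sep, domain = text.rpartition("@")
    return bool(sep) and bool(local) and "." in domain

print(looks_like_email("bob@foobar.com"))       # plausible address
print(looks_like_email("bobb@foobar.com"))      # the typo passes too
print(looks_like_email("bob+site@foobar.com"))  # plus signs are not rejected
print(looks_like_email("90210"))                # zip code typed by mistake
```

    It catches the zip-code case and nothing more, which is the point. Note that it would also reject a dotless domain such as the "myhomenetwork" example earlier in the thread, so even this much is a judgment call.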
  • neminem 2013-08-13 13:49
    Oh... my god. I think I just found v2 of this function. I just got an email about a contest I could enter if I gave them my email and said they could spam me. I don't mind that, but I do like knowing if they gave out my email to anyone else, so I used the gmail trick where you add +[the site name] to the end of your email address. I was told it wasn't a valid email address. I was like... this is going to be good, I'm going to look what they did. It was better than I expected:

    They do check to make sure you don't have any plus, dot, space, comma, semicolon, colon, forward slash, backslash, bang, open or close parens, hashes, open or close curly or square brackets, or dollar signs. They also check to make sure you don't have more than one @, and then after that, also check that you don't have two @s or two periods next to each other.

    Finally, and this is the huge kicker, they make sure that your domain isn't in a large list of misspellings of common domains, and that your tld isn't in a list of common tld misspellings (and a lot of these overlap). So if you wanted to run your own email server at, say, yaho.com, or al.com, or rocketmaill.com, or hoymail.com... tough luck. Also if you live in China, because .cn is apparently a misspelling of .com, and therefore invalid.

    If I saw this code written by someone at my company, you can frelling bet I'd submit it, cause holy frack is it awful. (There's even a commented out "endsWithGoodDomain" function that only accepts a handful of domains, with a comment that they used to call it, but they "had so many folks in Europe that wanted to subscribe", they had to comment it out. Apparently they only care about Europe, though, not China. :D)

    I don't think you can submit just random code you find online, though, sadly? I recommend you visit it and see for yourself, though, anyway, if you want a laugh. http://staticcdn13.tastingtable.com/javascript_v2/mc_main.js
  • anonymous 2013-08-13 14:18
    Arne Nonymous:
    Meh, yet another email validator that doesn't accept "@ @"@example.com as valid (or the light version "@_@"@example.com ).
    Come back when you've got yourself a real e-mail address, son. RFC 5321: "a host that expects to receive mail SHOULD avoid defining mailboxes where the Local-part requires (or uses) the Quoted-string form".

    Yes, it's valid, but if it's really your e-mail address, you need a different one.
  • PiisAWheeL 2013-08-13 17:17
    Here is an interesting fact:
    The asterisk (*) is valid in email addresses.

    my*email@email.com is valid.
  • foo 2013-08-13 18:48
    anonymous:
    Arne Nonymous:
    Meh, yet another email validator that doesn't accept "@ @"@example.com as valid (or the light version "@_@"@example.com ).
    Come back when you've got yourself a real e-mail address, son. RFC 5321: "a host that expects to receive mail SHOULD avoid defining mailboxes where the Local-part requires (or uses) the Quoted-string form".

    Yes, it's valid, but if it's really your e-mail address, you need a different one.
    So if some site just wants my address for no purpose (or rather, just for spamming), I can use such an address because I don't really expect to receive mail from them.
  • Maarten 2013-08-14 06:18
    Before I was able to use aliases, I used + signs a lot in my only address. It was rejected by half of the contact forms. I'm not sure where and why this (in)validation is coming from but it seems all too common :(
  • Anon 2013-08-14 10:51
    Maarten:
    Before I was able to use aliases, I used + signs a lot in my only address. It was rejected by half of the contact forms. I'm not sure where and why this (in)validation is coming from but it seems all too common :(


    Sites that reject valid email addresses should be blackholed.
  • anonymous 2013-08-14 16:51
    foo:
    anonymous:
    Arne Nonymous:
    Meh, yet another email validator that doesn't accept "@ @"@example.com as valid (or the light version "@_@"@example.com ).
    Come back when you've got yourself a real e-mail address, son. RFC 5321: "a host that expects to receive mail SHOULD avoid defining mailboxes where the Local-part requires (or uses) the Quoted-string form".

    Yes, it's valid, but if it's really your e-mail address, you need a different one.
    So if some site just wants my address for no purpose (or rather, just for spamming), I can use such an address because I don't really expect to receive mail from them.
    If you're just giving them a fake e-mail address anyway, why would you use an address that probably won't pass validation?
  • FreddyFrogg 2013-08-15 04:29
    And why reject email with + in? That's so annoying.
  • Gunslnger 2013-08-15 05:32
    QJo:
    Aha! I know this - TRWTF is using Goto! Do I win a prize?

    Apart from that, all perfectly cromulent. Oh, apart from not leaving a neat space between the instances of the function names (Len, instr etc.) and their arguments.

    Look how much better 'If InStr (strEmail, "@") = 0 Then' looks.


    I hate people that code like that. Don't ever do that.
  • anonymous 2013-08-15 13:18
    QJo:
    Aha! I know this - TRWTF is using Goto! Do I win a prize?

    Apart from that, all perfectly cromulent. Oh, apart from not leaving a neat space between the instances of the function names (Len, instr etc.) and their arguments.

    Look how much better 'If InStr (strEmail, "@") = 0 Then' looks.
    There is already a neat space that's specifically designed to separate a function name from its arguments. It is called the space occupied by the left parenthesis.
  • chubertdev 2013-08-15 19:35
    anonymous:
    QJo:
    Aha! I know this - TRWTF is using Goto! Do I win a prize?

    Apart from that, all perfectly cromulent. Oh, apart from not leaving a neat space between the instances of the function names (Len, instr etc.) and their arguments.

    Look how much better 'If InStr (strEmail, "@") = 0 Then' looks.
    There is already a neat space that's specifically designed to separate a function name from its arguments. It is called the space occupied by the left parenthesis.


    People who use InStr() need to be slapped around a bit with a large trout.


    if(strEmail.IndexOf("@") == -1) { ... }

    If strEmail.IndexOf("@") = -1 Then
    ...
    End If
  • fuzzix 2013-08-16 10:33
    "MyEmail@somewhere"@url.com would be valid; some mail agents will do the quoting for you.
  • Iain 2013-08-19 10:19
    I laughed for 10 minutes straight at this. If I hadn't laughed, I'd have cried, because I've just had the same experience with a (locally-based) contractor we hired.
  • gr 2013-08-20 00:57
    ^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}$

    Whoops.
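
    One possible reading of the "Whoops": the pattern is tighter than it looks. A minimal Python sketch, assuming the pattern is applied as posted; the helper name matches() and the sample addresses are made up for illustration:

```python
import re

# The pattern from the comment above, copied verbatim.
PATTERN = r"^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}$"


def matches(address: str, ignore_case: bool = True) -> bool:
    """Return True if the address matches the posted pattern."""
    flags = re.IGNORECASE if ignore_case else 0
    return re.match(PATTERN, address, flags) is not None


if __name__ == "__main__":
    samples = [
        "user@example.com",             # plain address
        "user+tag@example.com",         # plus sign in the local part
        "user@example.museum",          # top-level domain longer than 4 letters
        '"MyEmail@somewhere"@url.com',  # quoted local part
    ]
    for sample in samples:
        print(sample, matches(sample))
```

    With case-insensitive matching it accepts the plus sign, but it turns away top-level domains longer than four letters and quoted local parts. Without the flag it also turns away anything in lowercase.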
  • anonymous 2013-08-20 10:54
    chubertdev:
    anonymous:
    QJo:
    Aha! I know this - TRWTF is using Goto! Do I win a prize?

    Apart from that, all perfectly cromulent. Oh, apart from not leaving a neat space between the instances of the function names (Len, instr etc.) and their arguments.

    Look how much better 'If InStr (strEmail, "@") = 0 Then' looks.
    There is already a neat space that's specifically designed to separate a function name from its arguments. It is called the space occupied by the left parenthesis.


    People who use InStr() need to be slapped around a bit with a large trout.


    if(strEmail.IndexOf("@") == -1) { ... }

    If strEmail.IndexOf("@") = -1 Then
    ...
    End If
    If Not InStr(strEmail, "@") Then
    
    ...
  • chubertdev 2013-08-20 18:41
    anonymous:
    If Not InStr(strEmail, "@") Then
    
    ...


    And...............trout.
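
    Presumably the trout is for this: in classic VB, Not applied to an integer is a bitwise complement, and InStr returns a 1-based position (0 when nothing is found). A rough Python imitation of those rules; instr(), vb_not() and vb_truthy() are stand-ins written for this sketch, not real VB calls:

```python
def instr(haystack: str, needle: str) -> int:
    """Imitate classic VB InStr: 1-based position, 0 when not found."""
    return haystack.find(needle) + 1


def vb_not(value: int) -> int:
    """Imitate VB's Not on an integer: a bitwise complement."""
    return ~value


def vb_truthy(value: int) -> bool:
    """Imitate VB's integer-to-Boolean coercion: any non-zero value is True."""
    return value != 0


if __name__ == "__main__":
    for email in ("a@b.com", "no-at-sign"):
        position = instr(email, "@")
        negated = vb_not(position)
        # The condition "If Not InStr(email, "@") Then" tests this value.
        print(email, position, negated, vb_truthy(negated))
```

    Found at position 2 gives Not 2 = -3, and not found gives Not 0 = -1. Both are non-zero, so under these rules the branch is taken either way. Comparing against 0 explicitly avoids the problem.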
  • anonymous 2013-08-21 13:56
    chubertdev:
    anonymous:
    If Not InStr(strEmail, "@") Then
    
    ...


    And...............trout.
    Sorry, not really comprehending how or why you think that If InStr and If Not InStr are not good form. They say exactly what they mean.

    While I'm at it, what's the fascination with the magic number -1? At least zero is ubiquitously understood to be equivalent to the Boolean value False (and non-zero integers are similarly understood to be True). The meaning of -1 is typically True, which is entirely nonsensical: the index of needle within haystack is True?
  • MDMoore313 2013-08-22 13:18
    The sad part is that domain names truly end with '.'; just try navigating to 'google.com.'. By convention everyone leaves it off, but it's the root.
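
    If a validator wants to tolerate the trailing dot instead of rejecting it, one option is to normalize the domain first. A small Python sketch; normalize_domain() is a hypothetical helper, and whether any particular mail server accepts the dotted form is a separate question:

```python
def normalize_domain(domain: str) -> str:
    """Drop a single trailing root dot and lowercase the name.

    Domain names are case-insensitive, so lowercasing is safe for comparison.
    """
    if domain.endswith("."):
        domain = domain[:-1]
    return domain.lower()


if __name__ == "__main__":
    print(normalize_domain("google.com."))
    print(normalize_domain("Google.com"))
```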
  • urza9814 2013-08-27 22:56
    Kuba:
    Citron:
    The real WTF is "alphanumeric characters only". With all these possible e-mail-addresses out there, the only useful thing to do for e-mail validation is to check, if the ser may have misstyped his e-mail-address, by checking for '@' and '.'. Use an opt-in to check if the user has access to the address.
    I fucking don't get why on Earth one just won't point to the applicable RFCs and be done with it. Do we really have to paraphrase internet standards all the time? Don't people have better things to do? Writing "specs" for what is a valid email address is like writing "specs" as to how a valid TCP/IP connection should look on the wire. It's like going full retard and being proud of it.


    Well, in this example it can take thirty seconds to write a basic validator that will be good enough for 99% of cases...and maybe a couple hours to write one that fits the RFC. More possibilities for bugs too if you try to match the RFC, as there will be far more code. Also very easy for coders to misinterpret the RFC, as it's not the easiest thing to read. Unless you have some reason where you *really* need to be 100% certain the emails are valid to the RFC (like if you're coding a mail server or something), I'd argue you should almost *never* try to validate everything in the RFC -- when in doubt, accept it all. You should be sending a confirmation email to verify anyway, right?

    Even if you validate to the RFC, all you've verified is that it's *a* valid email address, not that it is *their* email address, which is what you really want. So you have to do extra steps which, as an extra bonus, will fully validate the address for you!
  • urza9814 2013-08-28 18:00
    anonymous:
    chubertdev:
    anonymous:
    If Not InStr(strEmail, "@") Then
    
    ...


    And...............trout.
    Sorry, not really comprehending how or why you think that If InStr and If Not InStr are not good form. They say exactly what they mean.

    While I'm at it, what's the fascination with the magic number -1? At least zero is ubiquitously understood to be equivalent to the Boolean value False (and non-zero integers are similarly understood to be True). The meaning of -1 is typically True, which is entirely nonsensical: the index of needle within haystack is True?


    They use -1 because zero is the first character in the string....
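
    The same trap in Python, whose str.find is also 0-based and returns -1 when nothing is found. The two function names are made up for this sketch:

```python
def contains_at_buggy(address: str) -> bool:
    # Treats the index itself as a Boolean, which is wrong for 0-based indices:
    # index 0 (found at the start) is falsy and -1 (not found) is truthy.
    return bool(address.find("@"))


def contains_at(address: str) -> bool:
    # Compares against the "not found" sentinel explicitly.
    return address.find("@") != -1


if __name__ == "__main__":
    for address in ("@example.com", "nobody", "user@example.com"):
        print(address, contains_at_buggy(address), contains_at(address))
```

    With 1-based positions, as in InStr, 0 is free to mean "not found". With 0-based indices it is a real position, so something else has to serve as the sentinel.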
  • anonymous 2013-09-03 10:33
    urza9814:
    anonymous:
    chubertdev:
    anonymous:
    If Not InStr(strEmail, "@") Then
    
    ...


    And...............trout.
    Sorry, not really comprehending how or why you think that If InStr and If Not InStr are not good form. They say exactly what they mean.

    While I'm at it, what's the fascination with the magic number -1? At least zero is ubiquitously understood to be equivalent to the Boolean value False (and non-zero integers are similarly understood to be True). The meaning of -1 is typically True, which is entirely nonsensical: the index of needle within haystack is True?


    They use -1 because zero is the first character in the string....
    Returning -1 implies that you didn't even look at my string, you lazy bastard. If you're going to return an invalid index, at least return strlen so that I know you looked.
  • Peter Scott 2013-10-15 06:44
    Good point.

    If you want to check whether an email address is valid and really exists, you should use a service such as http://www.email-validator.net. We have been using their API for 6 months now and are really impressed by the quality of their service and the fast turnaround.