• configurator (unregistered)

    So it throws an Exception when invalid, returns 1 for valid and 0 for null.

    TRWTF is Java, of course.

  • Fred (unregistered)

    If this line is not blank then maybe 5 seconds can make a frist. If this line is not blank then maybe 5 seconds can make a frist. {snip} If this line is not blank then maybe 5 seconds can make a frist.

  • Victor (unregistered)

    /** Valid signs */

    You don't need to read any further than that to know you're not dealing with a programmer here.

    But when he went home at night I bet he was thinking yeah I'm sooooo cool I wrote some code today yeehaw!

  • drake (unregistered)

    If he had used Regular Expressions, he would have had 10 problems

  • Walter (unregistered)

    Well at least he implemented a whitelist not a blacklist. Even if I offered a $10,000 bonus, I don't think I could get our lead web "developer" to grasp that concept.

  • Smug Unix User (unregistered) in reply to drake

    10 problems are less than 101 problems.

  • Yasmin (unregistered)

    You might think this could be reduced to a single function, but you would be forgetting the unique messages:

    "Vsadr5 with invalid sign"

    (And, of course, your users are naturally intuitively going to know what Vsadr5 means... to say nothing of invalid sign.)

    WARNING: CAPTCHA not invalid!

  • Foo Bar (unregistered)

    If Cedille, Eszett, and Y acute are all considered valid address characters, either the company expects to ship all over Europe, or the "programmer" is having too much fun with keyboard symbols.

  • (cs)

    That's awesome how all the tests have the same logic, just the exception message and the variable names are different.

  • Dazed (unregistered)

    I suppose this might have been written with one of the early Java versions that didn't have regexes. But that's not much of an excuse ...

  • (cs)

    Ah the classic case of a clueless/lazy programmer who, rather than sit back after the first one and think "Hey wait a minute, this seems really inefficient. What if we need to validate more than one line?" and then goes and refactors out the code properly, does it once, says "It's done Boss" and then copy-pastes that function when they need to do it again.

    It's a damn shame these bozos are more common than they should be.

  • (cs) in reply to Dazed
    Dazed:
    I suppose this might have been written with one of the early Java versions that didn't have regexes. But that's not much of an excuse ...

    Even if it was, that's not an excuse for not refactoring this out into a method so the code doesn't have to be repeated. That's the real WTF here, not the way it's implemented but that it's not reusable so it has to be copied.

  • F (unregistered) in reply to ObiWayneKenobi
    ObiWayneKenobi:
    Ah the classic case of a clueless/lazy programmer who, rather than sit back after the first one and think "Hey wait a minute, this seems really inefficient. What if we need to validate more than one line?" and then goes and refactors out the code properly, does it once, says "It's done Boss" and then copy-pastes that function when they need to do it again.

    It's a damn shame these bozos are more common than they should be.

    That's what you should expect when people get rated on lines of code instead of quality.

    He's missed a trick, though. He could have added code for address lines 6 to 10, `"to allow for future enhancement".

  • That Guy (unregistered) in reply to Foo Bar
    Foo Bar:
    If Cedille, Eszett, and Y acute are all considered valid address characters, either the company expects to ship all over Europe, or the "programmer" is having too much fun with keyboard symbols.

    I was more worried about the semicolon, back slash, and the open and close parens. What, curly braces are too good for your addresses?

  • (cs)

    The clunking tedium of creating an array of characters: 'A', 'B', .... yecch. My immediate thought (in absence of any knowledge of the existence of regexps) would be:

    String validChars = "ABCDE ... "

    But apart from repeating the business end 5 times (oh, and failing to handle exceptions and null cases correctly) I've seen plenty worse.

  • (cs)

    He should have written it like this:

    for (i = 0; i < address.length(); i++) {
        if (address[i:i] != 'A' then {
            if  (address[i:i] != 'B' then {
                .... etc. ad barf
            }
        }
    }
    
  • MarkMc (unregistered)

    My eyes. The goggles aren't helping. How many times do we need to write something like the following?

      private boolean ValidateShippingAddress( String input )
      {
         String whitelist = 
         "^[a-zA-Z0-9\.,\'-/&!\"$%()*+:;=\üéâäàçêëèïîìßôöòûùßáíóúñÑÄÖÜ#åÿýÁÂÀÅÇÉÊËÈÍÎÏÌÓÔÒÚÛÙÝ]*$";				
         Pattern pattern = Pattern.compile(whitelist);
         return pattern.matcher(input).matches();
      }
    

    Or more likely refactor this to generalize for all the other input fields that need validation, providing a standardized regex pattern.

  • (cs) in reply to That Guy
    That Guy:
    Foo Bar:
    If Cedille, Eszett, and Y acute are all considered valid address characters, either the company expects to ship all over Europe, or the "programmer" is having too much fun with keyboard symbols.

    I was more worried about the semicolon, back slash, and the open and close parens. What, curly braces are too good for your addresses?

    I feel great joy when I ask how do we know what we want. Suddenly people realize that they never bothered to figure out what they actually want.

  • (cs) in reply to MarkMc
    MarkMc:
    My eyes. The goggles aren't helping. How many times do we need to write something like the following?
      private boolean ValidateShippingAddress( String input )
      {
         String whitelist = 
         "^[a-zA-Z0-9\.,\'-/&!\"$%()*+:;=\üéâäàçêëèïîìßôöòûùßáíóúñÑÄÖÜ#åÿýÁÂÀÅÇÉÊËÈÍÎÏÌÓÔÒÚÛÙÝ]*$";				
         Pattern pattern = Pattern.compile(whitelist);
         return pattern.matcher(input).matches();
      }
    

    Or more likely refactor this to generalize for all the other input fields that need validation, providing a standardized regex pattern.

    What, like this:

      private boolean ValidateBusinessString( String input, String whitelist )
      {
         Pattern pattern = Pattern.compile(whitelist);
         return pattern.matcher(input).matches();
      }
    

    Possibly even compile the pattern statically and pass the pattern in. At which stage there is a school of thought that suggests you might as well factor out the method itself as it contains but one line. I would probably keep the method just in case it needs to get more complicated in future.

  • jay (unregistered) in reply to Dazed
    Dazed:
    I suppose this might have been written with one of the early Java versions that didn't have regexes. But that's not much of an excuse ...

    It must also have been written with one of the early versions of Java where you couldn't call the same function in two different places with two different values for the parameter.

  • Adin (unregistered) in reply to drake

    Ha ha. Absolutely agree.

  • Jeff Grigg (unregistered)

    Well, I find it helpful that they allow the single quote character ('), for SQL injection attacks. And the ampersand character (&) for HTML injection/cross-site scripting attacks. They're always working harder to make my life easier!!!

    ;->

  • Meep (unregistered) in reply to @Deprecated
    @Deprecated:
    That's awesome how all the tests have the same logic, just the exception message and the variable names are different.

    Which is perfectly reasonable since you might want to change the logic for one field.

    What's awesome is how the author didn't simply put the common functionality in another method. "I'll write a method for each field, but I'd rather copy and paste my code five times than create a sixth."

  • instigator (unregistered) in reply to Matt Westwood

    If only ASCII was arranged in a logical manner, then you wouldn't even need regular expressions. You could do something like:

    for( int i = 0; i < vsadr1.length(); i++ ) { char c = vsadr1.charAt(i) if(!( (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9') || (c == ' ') // || ... )) return false; } return true;

    captcha: sagaciter: The SAGACITER loved to reference "Lord of the Rings"

  • Darth Paul (unregistered)

    The other WTF - yet another data modeller/BA who thinks that addresses consist of 5 fields.

    No doubt it is near impossible to create a report to the effect of "sales by state" and the excuse will be "no-one has ever asked for that in all the years I have been here."

  • Anonymous (unregistered)

    The requirement is mostly bogus, too... We've all seen the multi-page regex to (mostly) get email addresses right, and postal addresses are a lot more irregular. There is no way to construct a whitelist, except closely studying the postal guidelines of all the countries you're shipping to, going insane, then settle on a blacklist of the most esoteric Unicode blocks – it's most likely none of ⟺ or ☻ or 💂.

    Then again, why bother? Just print it on the label and have the postman figure it out.

  • urza9814 (unregistered) in reply to instigator
    instigator:
    If only ASCII was arranged in a logical manner, then you wouldn't even need regular expressions. You could do something like:

    for( int i = 0; i < vsadr1.length(); i++ ) { char c = vsadr1.charAt(i) if(!( (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9') || (c == ' ') // || ... )) return false; } return true;

    captcha: sagaciter: The SAGACITER loved to reference "Lord of the Rings"

    Wouldn't need a regex, but it's still far simpler...because you can do the same basic thing. Why type out "(c >= 'A' && c <= 'Z') ||" when you could just use "[A-Z]"?

  • Alex Emelianov (unregistered)

    And then you need to ship to Japan. Oops (times five)!

  • moreON (unregistered) in reply to instigator

    So that's:

    if(expr){
        return !expr;
    }
    return expr;
    

    why not just

    return !expr;
    ?

  • Mr. Shine (unregistered)

    I think y'all are being too hard on this programmer. He did recognize that he needed only one validSign array, after all.

  • (cs) in reply to Anonymous
    Anonymous:
    The requirement is mostly bogus, too... We've all seen the multi-page regex to (mostly) get email addresses right, and postal addresses are a lot more irregular. There is no way to construct a whitelist, except closely studying the postal guidelines of all the countries you're shipping to, going insane, then settle on a blacklist of the most esoteric Unicode blocks – it's most likely none of ⟺ or ☻ or 💂.

    Then again, why bother? Just print it on the label and have the postman figure it out.

    You don't know how true that is. I know someone who used to work on systems for automated reading of postal labels and routing of letters and parcels based on that (he might still be doing that, building post sorting machines for post offices) and he said that it was incredibly hard. He particularly noted that the UK was especially bad, as there is no standard address form (though we do have detailed postal codes to make things easier, and they're in common use). Ultimately, the best policy is indeed to just put whatever the user gave you on the label and let the post office deal with it.

  • Jeff Grigg (unregistered) in reply to dkf
    dkf:
    Anonymous:
    The requirement is mostly bogus, too... ... postal addresses are a lot more irregular. There is no way to construct a whitelist, except closely studying the postal guidelines of all the countries you're shipping to, going insane, then settle on a blacklist of the most esoteric Unicode blocks – it's most likely none of ⟺ or ☻ or 💂. ...

    You don't know how true that is. ... it was incredibly hard. ... the UK was especially bad, as there is no standard address form ... Ultimately, the best policy is indeed to just put whatever the user gave you on the label and let the post office deal with it.

    I would suggest that one should first trust the user to be able to type in an address that will get it to them. If they can't do that, then you have more serious problems than playing games with what characters to allow.

    OK, you should probably not allow non-printable characters.

    But I would suggest that instead of making up "requirements," maybe one should consult the appropriate standards. For United States addresses, it would be "Publication #28."

    http://pe.usps.gov/cpim/ftp/pubs/pub28/pub28.pdf

  • Norman Diamond (unregistered) in reply to Foo Bar
    Foo Bar:
    If Cedille, Eszett, and Y acute are all considered valid address characters, either the company expects to ship all over Europe, or the "programmer" is having too much fun with keyboard symbols.
    Not really. The company expects to ship to Scandinavia and Germany but not to Greece, Turkey, and some Slavic countries.
  • Norman Diamond (unregistered) in reply to Jeff Grigg
    Jeff Grigg:
    But I would suggest that instead of making up "requirements," maybe one should consult the appropriate standards. For United States addresses, it would be "Publication #28."

    http://pe.usps.gov/cpim/ftp/pubs/pub28/pub28.pdf

    USPS standards aren't there to be obeyed, you know. One standard says that senders should put their country name at the end of the return address. Government operations such as one called "IRS" (not that there's anything internal or service about it) put "PA" at the end of their return address, which is the ISO abbreviation for Panama. Whenever the sender puts a nondeliverable destination address on a letter, the receiving country has to send it back to Panama. Guess who gets blamed and who gets their refunds stolen.

  • (cs)

    5 x the lines 5 x the functions 5 x the productivity 5 x the maintenance

    5 x the pay, right?

    TRWTF is that he didn't make 5 copies of the array. WIW is wrong with the programmer?

    Sigh. How are we supposed to win when the turkeys outnumber the eagles 5:1?

  • Dotan Cohen (unregistered) in reply to Jeff Grigg
    Jeff Grigg:
    I would suggest that one should first trust the user to be able to type in an address that will get it to them. If they can't do that, then you have more serious problems than playing games with what characters to allow.

    OK, you should probably not allow non-printable characters.

    This is my address without nonprintable characters: 64/2 רמב"ם

    This is my address with nonprintable characters: ‫64/2 רמב"ם

    So even those are important.

  • Kasper (unregistered) in reply to Walter
    Walter:
    Well at least he implemented a whitelist not a blacklist.
    But a blacklist would have been easier to get right since it would have been a lot shorter than a whitelist.

    I have yet to come across a situation where the validity of an input field can be correctly validated using a list of valid characters. There are however cases, where a list of valid characters can provide a good enough approximation.

    If for example an input field contains a value such as a number or an IP address, then a whitelist of permitted characters could make sense. It would still permit lots of invalid inputs, but you'd rule out all inputs that could perform some sort of injection attack.

    As soon as your inputs start looking even slightly similar to free text, then forget about validating using a list of valid characters. The only proper approach to such fields is to consider all characters to be valid and apply proper escaping. Most of the possible escaping bugs can be caught by simply testing that you can indeed use all different characters.

  • David (unregistered) in reply to Jeff Grigg
    Jeff Grigg:
    Well, I find it helpful that they allow the single quote character ('), for SQL injection attacks. And the ampersand character (&) for HTML injection/cross-site scripting attacks. They're always working harder to make my life easier!!!

    ;->

    Because there's no need for anyone at Plummer's Landing to reliably get their mail, is there? Programmers don't get to mangle people's data for their convenience.

  • (cs) in reply to moreON
    moreON:
    So that's:
    if(expr){
        return !expr;
    }
    return expr;
    

    why not just

    return !expr;
    ?

    I hope you're trolling.

  • fjf (unregistered) in reply to dkf
    dkf:
    Anonymous:
    The requirement is mostly bogus, too... We've all seen the multi-page regex to (mostly) get email addresses right, and postal addresses are a lot more irregular. There is no way to construct a whitelist, except closely studying the postal guidelines of all the countries you're shipping to, going insane, then settle on a blacklist of the most esoteric Unicode blocks – it's most likely none of ⟺ or ☻ or 💂.

    Then again, why bother? Just print it on the label and have the postman figure it out.

    You don't know how true that is. I know someone who used to work on systems for automated reading of postal labels and routing of letters and parcels based on that (he might still be doing that, building post sorting machines for post offices) and he said that it was incredibly hard. He particularly noted that the UK was especially bad, as there is no standard address form (though we do have detailed postal codes to make things easier, and they're in common use). Ultimately, the best policy is indeed to just put whatever the user gave you on the label and let the post office deal with it.
    Interestingly, this is basically an extended version of the SPOT (Single Point Of Truth) principle. Though here, the point of truth is not in the application at all, but external (post office). I guess for some programmers and managers, this is too hard to swallow, so they rather do a half-assed, buggy job than not doing anything.

  • LK (unregistered)

    Of course TRWTF is that he replaces the valid characters in that string with spaces instead of empty strings. So he needs the 2nd loop to check each character instead of checking if the string is empty... :-)

  • fjf (unregistered) in reply to LK
    LK:
    Of course TRWTF is that he replaces the valid characters in that string with spaces instead of empty strings. So he needs the 2nd loop to check each character instead of checking if the string is empty... :-)
    If you think that's the RWTF, look again. For starters, the fact that he's doing a full string replacement for each valid character (whether it occurs or not) is much worse for performance. Once they support Unicode, these will be thousands.
  • Decius (unregistered)

    At least packages won't get lost in the æther.

  • Jimmy the Greek (unregistered) in reply to fjf
    fjf:
    dkf:
    Anonymous:
    The requirement is mostly bogus, too... We've all seen the multi-page regex to (mostly) get email addresses right, and postal addresses are a lot more irregular. There is no way to construct a whitelist, except closely studying the postal guidelines of all the countries you're shipping to, going insane, then settle on a blacklist of the most esoteric Unicode blocks – it's most likely none of ⟺ or ☻ or 💂.

    Then again, why bother? Just print it on the label and have the postman figure it out.

    You don't know how true that is. I know someone who used to work on systems for automated reading of postal labels and routing of letters and parcels based on that (he might still be doing that, building post sorting machines for post offices) and he said that it was incredibly hard. He particularly noted that the UK was especially bad, as there is no standard address form (though we do have detailed postal codes to make things easier, and they're in common use). Ultimately, the best policy is indeed to just put whatever the user gave you on the label and let the post office deal with it.
    Interestingly, this is basically an extended version of the SPOT (Single Point Of Truth) principle. Though here, the point of truth is not in the application at all, but external (post office). I guess for some programmers and managers, this is too hard to swallow, so they rather do a half-assed, buggy job than not doing anything.
    Generally, it is best to keep user data as is for things like this.
    Who are the experts in delivering to (hopefully) the right address - a programmer who thinks he understands addresses or a Post Office who deals with addresses every day? Of course, there may be some sense in ensuring there's no data injection, but beyond that the validity of the data requires an expert to assess it - so short of the the PO providing explicit rules (alla the pdf link someone posted) on how the address is parsed, the address should be left up to them to deal with....

  • Worf (unregistered)

    There is only one field that will be present on ALL postal addresses - the country. It's often omitted (and thus local is assumed), but it's present in all addresses.

    And this makes it a lot easier. If the country is your own, most post offices offer an address validation system - either online or offline subscription which will contain every valid address in the country. You can use it to validate in-country addresses.

    FOr out of country addresses, the postal service looks at the country field and routes it there and lets the local post office handle it.

    And let's not forget about Mojibake making things even more interesting.

  • A. Nonymous (unregistered) in reply to Victor
    Victor:
    /** Valid signs */

    You don't need to read any further than that ...

    This!

  • Some dude but not the other some dude (unregistered) in reply to Worf
    Worf:
    If the country is your own, most post offices offer an address validation system - either online or offline subscription which will contain every valid address in the country.
    ORLY? What is most? Do you mean most post offices in a certain country, or do you mean most postal sevices/systems?

    Pretty sure they don't in my country....

  • Norman Diamond (unregistered) in reply to Worf
    Worf:
    There is only one field that will be present on ALL postal addresses - the country.
    I wonder. When sending letters to Hong Kong now, I write Hong Kong, China. But in the past I didn't write Hong Kong, UK, I just wrote Hong Kong. I write Puerto Rico, USA, but if I were writing to Guam I'm not sure if I would write USA.

    Yahoo auctions resemble eBay. When I give my address to sellers, I omit the country because it's assumed that everyone is in Japan, and in case anyone ever doubted because of my foreign name they'd see my address starting with a postal code and 東京都 (Tokyo). Well, one time a seller shipped from China. They didn't insert 日本国 (Japan) which would go at the beginning in Chinese or Japanese order or at the end in common international addressing order. They just copied my address and mailed the package from China. The Chinese post office figured out what country Tokyo is in. (That predated war moves on the Senkaku islands. Who knows if they'd be cooperate today.)

  • argle bargle (unregistered) in reply to Kasper
    But a blacklist would have been easier to get right since it would have been a lot shorter than a whitelist.

    In the name of all that is holy, stay far away from me. Blacklists are a great way to ensure that your code is insecure. The fact that it may be easier to get right doesn't make it the right option. The submitter's code is terrible, but only because it is redundant, difficult to maintain, and needlessly inefficient. Doing it with a blacklist would make it redundant, difficult to maintain, inefficient AND insecure.

  • Jon Haugsand (unregistered) in reply to Norman Diamond
    Norman Diamond:
    Foo Bar:
    If Cedille, Eszett, and Y acute are all considered valid address characters, either the company expects to ship all over Europe, or the "programmer" is having too much fun with keyboard symbols.
    Not really. The company expects to ship to Scandinavia and Germany but not to Greece, Turkey, and some Slavic countries.

    Actually, not Norway and Denmark as the letters 'Æ' and 'Ø' are missing, e.g. "Færøy", http://www.faeroy.no/.

    /Jon

Leave a comment on “Classic WTF: Five Wrongs Don't Make a Right”

Log In or post as a guest

Replying to comment #:

« Return to Article