The Daily WTF: Curious Perversions in Information Technology

foxyshadis · 2005-04-18

At least his regex doesn't have a list of every possible valid area code and every valid exchange within each area code. ^_~

2005-04-18

Actually the code works. Those periods are inside double quotes, so they are not operators. Here is some sample PHP code:

$foo = "12";
$bar = "34";
echo $foo.$bar;

Outputs: 1234

But as in the example:

$foo = "12";
$bar = "34";
echo "$foo.$bar";

Outputs: 12.34

loneprogrammer · 2005-04-18

I don't think there is anything wrong with this code. Seriously.

If the regexp were matching against $phone1.$phone2.$phone3 (as in 1234567890) then it would have to be "^[[:digit:]]{10}$" to match 10 digits. But, doing it that way would mean you could supply a 2 digit area code, a one digit exchange, and a 7 digit number, and that would pass. The periods are there to prevent that. At the end, the periods are no longer needed, so the "normalized" phone number is stored as 10 digits.
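To make that point concrete, here is a minimal Python sketch (the thread's code is PHP; this translation and the function names `valid_delimited` and `valid_undelimited` are hypothetical). It compares the delimited check against a plain ten-digit check on the joined fields; `re.fullmatch` stands in for the `^...$` anchors.

```python
import re

def valid_delimited(area: str, exchange: str, line: str) -> bool:
    # Join with "." so each field's length is checked separately, like the
    # PHP pattern "(^[[:digit:]]{3})\.([[:digit:]]{3})\.([[:digit:]]{4})$".
    joined = f"{area}.{exchange}.{line}"
    return re.fullmatch(r"[0-9]{3}\.[0-9]{3}\.[0-9]{4}", joined) is not None

def valid_undelimited(area: str, exchange: str, line: str) -> bool:
    # Join with no delimiter: only the total digit count is checked.
    return re.fullmatch(r"[0-9]{10}", area + exchange + line) is not None

# A 2-digit area code, a 1-digit exchange and a 7-digit number:
print(valid_delimited("55", "5", "1234567"))    # False
print(valid_undelimited("55", "5", "1234567"))  # True
```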

Regexps are not a WTF, and the syntax of PHP using periods for string concatenation is not a WTF. If you find these things surprising or confusing, then the problem is with yourself, and not the code.

2005-04-18

loneprogrammer:

I don't think there is anything wrong with this code. Seriously.
Regexps are not a WTF, and the syntax of PHP using periods for string concatenation is not a WTF. If you find these things surprising or confusing, then the problem is with yourself, and not the code.

WTF is not preserved like mass or energy! One can easily combine two non-WTF parts into a very WTF whole. This guy has created a whole lot of WTF out of nothing.

Charles Nadolski · 2005-04-18

Whoever says this is NOT a WTF, look more carefully at the code:

"(^[[:digit:]]{3})\.([[:digit:]]{3})\.([[:digit:]]{4})$", "$phone1.$phone2.$phone3"

In the first half of the expression, the escape character is put before the period ( \. ), so it's a literal period, in the second half of the expression, no escape character is used before the period, so there's some ambiguity here.

Personally, something exotic that's not a syntactic character, like the vertical-pipe |, should have been used.

Beek · 2005-04-18

I agree, this seems like good code to me. Perhaps it needs a comment, and maybe using a character other than a period as the delimiter inside the tested string.

Beek · 2005-04-18

Charles Nadolski:
Personally, something exotic that's not a syntactic character, like the vertical-pipe |, should have been used.

I believe the vertical pipe is a syntactic character in a regex. At least in perl it is.

Charles Nadolski · 2005-04-18

Beek:
Charles Nadolski:
Personally, something exotic that's not a syntactic character, like the vertical-pipe |, should have been used.

I believe the vertical pipe is a syntactic character in a regex. At least in perl it is.

Okay nevermind then :) Are there any funky characters (accessible by the keyboard at least) not used in perl?

2005-04-18

He's talking about what is in the else block....

$phone = $phone1.$phone2.$phone3;

that's not in quotes, therefore $phone would just be a 10 digit number.

loneprogrammer · 2005-04-18

Charles Nadolski:
In the first half of the expression, the escape character is put before the period ( \. ), so it's a literal period, in the second half of the expression, no escape character is used before the period, so there's some ambiguity here.

I don't buy it. Not knowing the rules for how your language works is not an excuse, when it's clearly documented to work that way.

Manni · 2005-04-18

I believe the WTF is that it looks ugly. In reality, I can't really think of a better way to do it. I come from the VB world, so my first thought is "Just check every part to see if IsNumeric() is true!" That way you know if someone entered non-numeric characters. This problem is compounded by the fact that a 1- or 2-digit area code would pass the test, so now you have to check IsNumeric and verify the length. You're not done yet, though, because area codes of "3.2" or "-65" are both numeric and have the appropriate length. OK, so make sure there are no hyphens or periods in it either.

That's three checks you have to do for each of the three portions of the phone number...obviously you'd write a function to do that for you. If the regexp code works, then I think it's more efficient than the type of validation I mentioned. Regexp just looks ugly to the untrained eye (which I have two of).
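Those per-field checks can be sketched without regexes. This Python version is an assumption: `str.isdigit` stands in for VB's `IsNumeric`, and it differs from it in exactly the way that matters here, since it already rejects "3.2" and "-65". The helper names are hypothetical.

```python
def is_digit_field(part: str, length: int) -> bool:
    # isdigit() rejects "3.2" and "-65"; isascii() additionally rules out
    # non-ASCII digits (e.g. Arabic-Indic digits) that isdigit() accepts.
    return len(part) == length and part.isascii() and part.isdigit()

def valid_phone_fields(area: str, exchange: str, line: str) -> bool:
    # The "three checks" for each of the three fields, in one helper.
    return (is_digit_field(area, 3)
            and is_digit_field(exchange, 3)
            and is_digit_field(line, 4))

print(valid_phone_fields("555", "123", "4567"))  # True
print(valid_phone_fields("3.2", "123", "4567"))  # False
```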

Manni · 2005-04-18

Hrm, then again, in re-reading the initial post, maybe it's that the user is forced to enter the phone number into three separate boxes when ultimately the number is being stored as a single 10-digit number. I disagree with lone's post about being able to supply a "2 digit area code, a one digit exchange, and a 7 digit number", because there is a "maxlength" property on the text box. Sure there are ways around that, but for 99.9% of your user base (average non-malicious users who don't know crap about HTML), it's not an issue. Make a single text box, 10 character maxlength, and verify that you have a 10 digit number on the PHP side. Much easier regexp to deal with.

Besides, there is way too much emphasis put on this type of data validation. If the user isn't competent enough to type in their own phone number without screwing it up, then they're most likely at your site by accident when they were really searching for pancakes or something. To hell with 'em.

2005-04-18

And this is why British phone numbers (01xxx xxxxxxx) are so much easier to deal with. Pity they had to complicate things like 020 xxx xxxx for London... 07xxx xxxxxxx is also a small problem for mobile phones if you take 01 for granted.

2005-04-18

I don't see anything seriously wrong with the code.

For the regexp, you want the string to have '.', because it would be a much more complicated recipe to write if it weren't there. Any character could be used I suppose, but '.' is just as good as any other.

And for the final result, you don't want the '.' as it usually makes more sense to just store the numbers.

2005-04-18

The WTF is he is using PHP!

2005-04-18

Manni:
Hrm, then again, in re-reading the initial post, maybe it's that the user is forced to enter the phone number into three separate boxes when ultimately the number is being stored as a single 10-digit number. I disagree with lone's post about being able to supply a "2 digit area code, a one digit exchange, and a 7 digit number", because there is a "maxlength" property on the text box. Sure there are ways around that, but for 99.9% of your user base (average non-malicious users who don't know crap about HTML), it's not an issue. Make a single text box, 10 character maxlength, and verify that you have a 10 digit number on the PHP side. Much easier regexp to deal with.

Besides, there is way too much emphasis put on this type of data validation. If the user isn't competent enough to type in their own phone number without screwing it up, then they're most likely at your site by accident when they were really searching for pancakes or something. To hell with 'em.

I disagree that a 10-digit textbox is a better solution. Few people just enter 1234567890 as their phone number... they'll try entering 123-456-7890 or whatever and won't notice or care that their phone number is stored as "123-456-78".

If you split up the phone number into multiple, separate form fields, you're more likely to get a valid number in the first place. If you have valid numbers, you don't need to screw around with validation routines to catch the 50,000 different ways people like to write phone numbers.

IMHO, this isn't a WTF... it may not be pretty, but it works. I've seen crappier solutions where VB and Java jockeys create unintelligible object systems to capture stupid data like this.

2005-04-18

The WTF is the fact that they're still using PHP! Silly Hippies!

2005-04-18

Okay nevermind then :) Are there any funky characters (accessible by the keyboard at least) not used in perl?

Yes, but you need an APL keyboard for that...

2005-04-18

This kind of thing drives me nuts.... First off, the three input boxes force the user to tab between them to type in phone number, or use the mouse, and you have to remember to only type the numbers.... All to save what? a little validation code? People, we write software for a living, if you're afraid to validate the various combinations of common phone entries (and we're only talking US types in this example) then what the heck are you doing writing code anyway....

Secondly, the guy knows a little regex, but clearly not much, nor how to Google. For something as common as phone validation, there are a million regexes out there. Here's one I pulled off regexlib (great site for n00b regex help) after about fifteen seconds of searching, and it even does a little number validation:

^([\(]{1}[0-9]{3}[\)]{1}[\.| |\-]{0,1}|^[0-9]{3}[\.|\-| ]?)?[0-9]{3}(\.|\-| )?[0-9]{4}$

-Andy

Charles Nadolski · 2005-04-18

Anonymous:
^([\(]{1}[0-9]{3}[\)]{1}[\.| |\-]{0,1}|^[0-9]{3}[\.|\-| ]?)?[0-9]{3}(\.|\-| )?[0-9]{4}$

It's stuff like this that makes me glad I'm coding in C++/C# for non-web based applications :( It's like trying to read a whole paragraph written in 13375p33|<

loneprogrammer · 2005-04-18

Charles Nadolski:
Anonymous:
^([\(]{1}[0-9]{3}[\)]{1}[\.| |\-]{0,1}|^[0-9]{3}[\.|\-| ]?)?[0-9]{3}(\.|\-| )?[0-9]{4}$

It's stuff like this that makes me glad I'm coding in C++/C# for non-web based applications :( It's like trying to read a whole paragraph written in 13375p33|<

I'll break it down for you.

$three_digits    = "[0-9]{3}";
$four_digits     = "[0-9]{4}";
$separator       = "(\.| |\-)?";    # Matches a period, space, dash, or nothing
$paren_area_code = "\($three_digits\)";
$area_code       = "(${paren_area_code}|${three_digits})${separator}";
$seven_digit_no  = "${three_digits}${separator}${four_digits}";
$phone_no        = "^(${area_code})?${seven_digit_no}$";

Can you follow that more easily?
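The same breakdown translates directly into Python. This sketch assumes Python's `re` module, swaps the capturing `(\.| |\-)` for the equivalent character class `[. -]`, and uses non-capturing `(?:...)` groups; the constant names mirror the variables above, and `is_phone` is hypothetical.

```python
import re

THREE_DIGITS = r"[0-9]{3}"
FOUR_DIGITS = r"[0-9]{4}"
SEPARATOR = r"[. -]?"  # a period, space, dash, or nothing
PAREN_AREA_CODE = rf"\({THREE_DIGITS}\)"
AREA_CODE = rf"(?:{PAREN_AREA_CODE}|{THREE_DIGITS}){SEPARATOR}"
SEVEN_DIGIT_NO = rf"{THREE_DIGITS}{SEPARATOR}{FOUR_DIGITS}"
# fullmatch() below anchors both ends, replacing the "^...$".
PHONE_NO = re.compile(rf"(?:{AREA_CODE})?{SEVEN_DIGIT_NO}")

def is_phone(s: str) -> bool:
    return PHONE_NO.fullmatch(s) is not None

for s in ["(555) 123-4567", "555.123.4567", "123-4567", "555|123-4567"]:
    print(s, is_phone(s))
```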

Charles Nadolski · 2005-04-18

loneprogrammer:
Charles Nadolski:
Anonymous:
^([\(]{1}[0-9]{3}[\)]{1}[\.| |\-]{0,1}|^[0-9]{3}[\.|\-| ]?)?[0-9]{3}(\.|\-| )?[0-9]{4}$

It's stuff like this that makes me glad I'm coding in C++/C# for non-web based applications :( It's like trying to read a whole paragraph written in 13375p33|<

I'll break it down for you.
$three_digits    = "[0-9]{3}";
$four_digits     = "[0-9]{4}";
$separator       = "(\.| |\-)?";    # Matches a period, space, dash, or nothing
$paren_area_code = "\($three_digits\)";
$area_code       = "(${paren_area_code}|${three_digits})${separator}";
$seven_digit_no  = "${three_digits}${separator}${four_digits}";
$phone_no        = "^(${area_code})?${seven_digit_no}$";
Can you follow that more easily?

Yes! Thank you. My eyes have stopped bleeding now.

Brain · 2005-04-18

If everyone wrote regexps like you do, noobs wouldn't be so afraid of them!
I know a lot of guys who just freak out when they see a regexp, even a simple one.

loneprogrammer · 2005-04-18

Brain:
If everyone wrote regexps like you do, noobs wouldn't be so afraid of them!
I know a lot of guys who just freak out when they see a regexp, even a simple one.

This Perl library has a gigantic (80-line) expression on its web page, just to scare newbies. In the library's code, it is broken down into small, easy parts.

Brain · 2005-04-18

loneprogrammer:
Brain:
If everyone wrote regexps like you do, noobs wouldn't be so afraid of them!
I know a lot of guys who just freak out when they see a regexp, even a simple one.

This Perl library has a gigantic (80-line) expression on its web page, just to scare newbies. In the library's code, it is broken down into small, easy parts.

Wow, that's huge! Just imagine having to debug that expression without it being split into smaller, easier parts... I believe most of us would commit suicide :)

2005-04-18

Anonymous:
This kind of thing drives me nuts.... First off, the three input boxes force the user to tab between them to type in phone number, or use the mouse, and you have to remember to only type the numbers.... All to save what? a little validation code? People, we write software for a living, if you're afraid to validate the various combinations of common phone entries (and we're only talking US types in this example) then what the heck are you doing writing code anyway....

Actually, I don't think this is WTF-worthy at all...

True, he could have done a better job of it, but at least each segment of the phone number wasn't on a separate page ("enter first 3 digits, then press continue..."). Three input boxes are also not too bad: JavaScript can make the cursor automatically tab to the next box when the required number of characters is received.

If I had to implement something like this, I'd trigger a JS event every time a key was pressed and remove the character if it wasn't 0-9 or '-'. Makes the validation side MUCH easier.

foxyshadis · 2005-04-18

When did simply stripping punctuation characters from the input become so hard? It annoys me when I try to type a - or . and it won't work for no good reason, mainly because Firefox has some bugs where the textbox becomes uneditable in rare situations, so I think I'll have to refresh and start over. It's not quite as bad as the common "we're going to make you enter it one way and not tell you what that way is", but it's annoying. Server-side or post-insert normalization is friendlier. It's not as big a deal for phone numbers, but for, say, zip codes or name fields, I see some irritating restrictions on what's allowed (though on the other hand, their backend is probably too shabbily coded to handle it anyway).

Rant rant rant. Sorry.

I challenge you people to enter a phone number into three separate boxes on a cheap Nokia phone, then come back and tell me about things like JavaScript and tab. hahaha.

And the alternative implementation to the backend isn't merging, btw, it's validating using three simple regexes (basically this one chopped in three), but since this is what regexes were made to do, it's a little silly.

2005-04-18

Except for the (lack of) internationalization thinking, I don't think there is a major problem with the code.
In general, it is recommended not to validate phone numbers if you ever want to have someone from abroad as a customer.
Even in the U.S., why should I not be able to say 1-800-COLLECT? Or why no extension, in this case?

For international phones, if you need to split into fields, then provide four of them:
-country code
-area code
-phone number
-extension

Don't enforce the length or digits only. There are countries with no area codes.
The max. you can do is make sure the phone number is not empty.
And maybe ask for non-empty country code or area code (but not both, except when country is 1, US).

2005-04-18

I think how the data is being saved also has some bearing on how they choose to obtain it. If, for example, they are saving area codes specifically for statistics, let's say, it might make sense to have a separate entry field for the area code because it can be considered a separate piece of data...

Lithorien · 2005-04-18

I wonder why people don't just have one giant text box, and instructions? Such as:

Please enter your phone number in the form XXX-XXX-XXXX in the box below.
[ ] (space for 12 characters)

That would make more sense to me than trying to validate every single combination of phone number inputs there are..
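The check those instructions imply is a single strict pattern. A minimal Python sketch (the function name is hypothetical): anyone who ignores the instructions and types dots or parentheses is rejected rather than normalized.

```python
import re

def matches_instructions(s: str) -> bool:
    # Accept only the exact XXX-XXX-XXXX form named in the instructions.
    return re.fullmatch(r"[0-9]{3}-[0-9]{3}-[0-9]{4}", s) is not None

print(matches_instructions("555-123-4567"))    # True
print(matches_instructions("(555) 123-4567"))  # False
```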

2005-04-18

The WTF is mostly all those people who reply and don’t seem to understand what the dots mean in the regex, in the string, and as operators.

The code is bizarre, but not a WTF. I’ve seen much, much worse things than this. Typical pretzel logic committed by a beginner, or maybe it actually had a purpose in the beginning, f.ex, deal with cases where someone enters their entire phone number in the first field.

Drak · 2005-04-19

Lithorien:
I wonder why people don't just have one giant text box, and instructions? Such as:

Please enter your phone number in the form XXX-XXX-XXXX in the box below.
[ ] (space for 12 characters)

That would make more sense to me than trying to validate every single combination of phone number inputs there are..

There's but one reason for it (and sadly I get to fix 'bugs' like this every day...): people do not read.

You can put up warnings about deleting a file and it's unrecoverable and still someone will call our helpdesk saying 'ooh I accidentally deleted the file, can I get it back. Warning? What warning? I didn't see any warning, I just clicked yes blindly.'

Drak

felix · 2005-04-19

So, what's the problem with having one big textbox? Let the user type the number any way s/he likes, then, on the server side, strip everything that's not a digit then warn if it's too long or too short. And check for valid area codes. Or whatever.

KISS :-)

P.S. Who cares whether it's done with regexps or not? I find re's the easiest way to deal with such a task, but it's really a matter of personal preferences, just like the programming language.
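A minimal Python sketch of the strip-then-count approach, assuming US-style 10-digit numbers (the function name is hypothetical, and the area-code check is left out):

```python
import re
from typing import Optional

def normalize_phone(raw: str) -> Optional[str]:
    # Strip everything that is not an ASCII digit.
    digits = re.sub(r"[^0-9]", "", raw)
    # Too long or too short: return None so the caller can warn the user.
    return digits if len(digits) == 10 else None

print(normalize_phone("(555) 123-4567"))  # 5551234567
print(normalize_phone("555 1234"))        # None
```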

ammoQ · 2005-04-19

even if it was a bug, it would hardly qualify as a WTF

2005-04-19

Hear, hear! Thank you for pointing out the obvious to these lost souls mired in regular expressions.

I work on telecom software systems. I get to see all the interesting ways numbers are entered. Removing the punctuation, and checking for 10 digits, solves everything.

nsimeonov · 2005-04-19

The real WTF for me is the language itself, which makes you put an extra symbol in front of every variable and uses the dot character as the string concatenation operator.

Seeing this language reminds me of a real WTF situation: I heard of a local contest at a school here, where kids compete to write a Perl script with the fewest characters that does as much as possible (while being totally unreadable, even by the author a couple of hours later). To win, no one except the coder himself should be able to figure out what the hell is going on without actually running the script.

2005-04-19

lol - I agree.

loneprogrammer · 2005-04-19

Typically, there are two kinds of coding contests, golf, and obfuscation.

In Golf, you try to come up with a program to do a certain task in the fewest characters. The lowest score wins.

In Obfuscation, you write something where it is very difficult to figure out how the program works. Here is one in Perl. Not that Perl is the only way to write Obfuscation. The IOCCC is an annual contest for C programs.

nsimeonov · 2005-04-19

loneprogrammer:
Typically, there are two kinds of coding contests, golf, and obfuscation.

In Golf, you try to come up with a program to do a certain task in the fewest characters. The lowest score wins.

In Obfuscation, you write something where it is very difficult to figure out how the program works. Here is one in Perl. Not that Perl is the only way to write Obfuscation. The IOCCC is an annual contest for C programs.

Terrible! This competition is like "let's see who can produce the worst code", a real maintenance nightmare. When I was younger I went to a few contests where the only hard part was the task itself: all you had to do was solve the problem. No one actually looked at the code; the judges ran a couple of tests and compared the output to what it was supposed to be.

Using it as a joke is OK IMO, but taking it seriously and trying to produce code like this creates really bad habits, doesn't it?

I remember a meaningful contest about two decades ago about writing assembler code in less than X bytes. This actually makes sense, doesn't it? They also counted the CPU cycles needed to run it, and if two programs were about the same size, the faster one won. The first of these, as far as I recall, was to write 7 bytes of assembler code to divide a number by 7 on the MOS Technology 6502 CPU.

2005-04-19

Good, clean and compact code. I haven't yet heard a (good) argument that makes this a WTF.
(About the . and "" things... I mean... if you code PHP, surely you know by heart how . and "" work, so no problem there either.)

loneprogrammer · 2005-04-19

nsimeonov:
Using it as a joke is OK IMO, but taking it seriously and trying to produce code like this creates really bad habits, doesn't it?

You are right. I think the point isn't to learn how to write code like this, but to be able to figure it out when your evil coworker writes such code and you have to fix it ;-)

Charles Nadolski · 2005-04-19

One winner of the code obfuscation contest in C wrote a program consisting almost entirely of sequences of the letter A. I wish I had a link to it...

Stan Rogers · 2005-04-19

felix:
So, what's the problem with having one big textbox? Let the user type the number any way s/he likes, then, on the server side, strip everything that's not a digit then warn if it's too long or too short. And check for valid area codes. Or whatever.

KISS :-)

P.S. Who cares whether it's done with regexps or not? I find re's the easiest way to deal with such a task, but it's really a matter of personal preferences, just like the programming language.

The problem is scalability. Back-end (server-side) validation is a must for data integrity, but wilfully generating server-side requests that could have been avoided by a simple (and yes, this can be simple) client-side check is architecturally unforgivable. And makes the user hang around waiting for nothing.

Irrelevant · 2005-04-19

Andy:
^([\(]{1}[0-9]{3}[\)]{1}[\.| |\-]{0,1}|^[0-9]{3}[\.|\-| ]?)?[0-9]{3}(\.|\-| )?[0-9]{4}$

I would actually vote that to be more of a WTF than the TLP. Let's see...

"[$]{1}" -- OK, that's exactly one repetition "{1}" (eg one-to-three would be "{1,3}") of a character class "[]" (matches any of the chars in it) containing a literal "\" open-bracket "(". In other words, this could all be represented as "\(". The same for "[$]{1}". Also, if he _had_ to put the bracket in a character class, the escaping is unnecessary, as you only need to escape "]" and "-" in charclasses.

"[\.| |\-]{0,1}" that's between zero and one of [a literal dot, a pipe, a space, a pipe, or a literal dash]. "{0,1}" should be "?" (as the writer seems to become aware later on), and those pipes are clearly mixing the character class syntax with alternatives ("(foo|bar)", note parens).

"(\.|\-| )" is like "[\.| |\-]" done so it at least works, but still should be []s with no |s.

As a (relatively) minor point, "[0-9]" could be "\d" (digit), and unnecessary parts are repeated.

So, to recap...
^(\(\d{3}\)|\d{3})?[ .\-]?\d{3}[ .\-]?\d{4}$
would do the same job as
^([\(]{1}[0-9]{3}[\)]{1}[\.| |\-]{0,1}|^[0-9]{3}[\.|\-| ]?)?[0-9]{3}(\.|\-| )?[0-9]{4}$
but would actually work.
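The practical difference shows up with the stray pipes: "[\.| |\-]" and "[\.|\-| ]" are character classes, so they accept a literal "|" as a separator. A quick Python check (an assumption: Python's `re` stands in for whatever engine the site used) runs both patterns on the same inputs:

```python
import re

# The regexlib pattern, with the "\(" and "\)" restored.
ORIGINAL = re.compile(
    r"^([\(]{1}[0-9]{3}[\)]{1}[\.| |\-]{0,1}|^[0-9]{3}[\.|\-| ]?)?"
    r"[0-9]{3}(\.|\-| )?[0-9]{4}$")
# The simplified version.
SIMPLIFIED = re.compile(r"^(\(\d{3}\)|\d{3})?[ .\-]?\d{3}[ .\-]?\d{4}$")

for s in ["(123) 456-7890", "123|456-7890"]:
    print(s, bool(ORIGINAL.match(s)), bool(SIMPLIFIED.match(s)))
```

Both accept "(123) 456-7890"; only the original also accepts "123|456-7890".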

Jasp · 2005-04-19

Anonymous:
f.ex

WTF!? Let's reinvent the wheel :)

I'm of the opinion that the code isn't a WTF, just a bit silly

"$phone1$phone2$phone3" =~ /^\d{10}$/ (or its php equivalent) would have been much easier to understand.

Charles Nadolski · 2005-04-19

Jasp:
Anonymous:
f.ex

WTF!? Let's reinvent the wheel :)

I'm of the opinion that the code isn't a WTF, just a bit silly

"$phone1$phone2$phone3" =~ /^\d{10}$/ (or its php equivalent) would have been much easier to understand.

Yes, but as somebody already pointed out, you could have 1 digit in phone1, 2 digits in phone2, and 7 digits in phone3 and still wind up with ten digits, but it would not be a proper phone number. (The counter-argument, of course, would be that you implement a maximum/minimum character thingie in the boxes of the original web form.)

sas · 2005-04-19

Charles Nadolski:
Beek:
Charles Nadolski:
Personally, something exotic that's not a syntactic character, like the vertical-pipe |, should have been used.

I believe the vertical pipe is a syntactic character in a regex. At least in perl it is.

Okay nevermind then :) Are there any funky characters (accessible by the keyboard at least) not used in perl?

Well, no, actually, there aren't! :-) Some don't mean anything in a regex, but generally, you just escape them (as in our example).

I detest stupid friggin' input forms that force me to enter my phone, socialist security, or credit card number in a particular way. Even when coded correctly, which most are not. Look, all you need to do is slurp up 10 digits. Why should you care if your user likes to enter his phone number as say, (12) 34 56-789-9 ?

Extract the digits, format them and redisplay for confirmation if you like.
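A minimal Python sketch of the extract-format-confirm step, assuming US-style 10-digit numbers and an "(XXX) XXX-XXXX" display format (both the format and the function name are hypothetical choices):

```python
import re
from typing import Optional

def format_for_confirmation(raw: str) -> Optional[str]:
    # Keep only the digits, then lay them out in one canonical format.
    digits = re.sub(r"[^0-9]", "", raw)
    if len(digits) != 10:
        return None  # not ten digits: ask the user again
    return f"({digits[:3]}) {digits[3:6]}-{digits[6:]}"

print(format_for_confirmation("(12) 34 56-789-9"))  # (123) 456-7899
```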

2005-04-19

Why not just

$phone1 =~ /^\d{3}$/ && $phone2 =~ /^\d{3}$/ && $phone3 =~ /^\d{4}$/

Seems to me this avoids a concatenation while still ensuring correct semantics. Plus, it is probably going to be faster because of the simplicity of the regex (at least in perl it would be, not sure about php).
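The same per-field idea in Python, as a sketch (the function name is hypothetical; `re.fullmatch` replaces the `^...$` anchors and, unlike Perl's `$`, does not let a trailing newline through):

```python
import re

def fields_valid(p1: str, p2: str, p3: str) -> bool:
    # Each field is checked on its own, so the 3/3/4 split is enforced
    # without concatenating anything first.
    return (re.fullmatch(r"[0-9]{3}", p1) is not None
            and re.fullmatch(r"[0-9]{3}", p2) is not None
            and re.fullmatch(r"[0-9]{4}", p3) is not None)

print(fields_valid("555", "123", "4567"))  # True
print(fields_valid("5", "55", "1234567"))  # False
```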

Jasp · 2005-04-20

Charles Nadolski:
Jasp:
Anonymous:
f.ex

WTF!? Let's reinvent the wheel :)

I'm of the opinion that the code isn't a WTF, just a bit silly

"$phone1$phone2$phone3" =~ /^\d{10}$/ (or its php equivalent) would have been much easier to understand.

Yes, but as somebody already pointed out, you could have 1 digit in phone1, 2 digits in phone2, and 7 digits in phone3 and still wind up with ten digits, but it would not be a proper phone number. (The counter-argument, of course, would be that you implement a maximum/minimum character thingie in the boxes of the original web form.)

The counter-argument has already been implemented in the OP. That's why I suggested only the 10-digit match. Unless the submitter of the form has hacked it on his end, there's no other way for there to be ten digits.

2005-04-20

I agree it is needlessly obtuse. Regexlib is a graveyard of ugly broken hacks that have locale/usage-specific hidden assumptions.

But on the other hand, regex is a shorthand notation to describe some syntax or a token that would look piss awful if you wrote it out in real code, too. I wouldn't want to read 10-20 lines or whatever it takes to replicate the validation that this regex here will do.

But to be fair, this particular one is a tad offensive, partly because it does needless quoting and has useless character classes. It's written by a moron who doesn't know regex.

For instance, {1} is noise that you could just take away entirely. Also, it sometimes uses (a|b|c) for alternation and sometimes [abc], so it isn't consistent. It also sometimes confuses ( ) and [ ] with each other. And it has tons of unnecessary capturing groups ('( )') where it wants mere grouping ('(?: )').

If I were to rewrite this, it should look something like this:

^(?:\(\d{3}\)|\d{3})?[. -]?\d{3}[. -]?\d{4}$

But even then, I don't think it really pays off to write a silly regex whose purpose is just to allow strings of form:

(123)-123-1234

with each - replaceable by nothing, a period, or a space, and the parentheses omittable. This doesn't even work in Finland, because our phone numbers have varying numbers of digits and because people have always been allowed to group the digits freely. Try this regex and find out that half of the people don't have a "valid" phone number!

--
Alan

Doing The Splits