- Feature Articles
- CodeSOD
- Error'd
- Forums
-
Other Articles
- Random Article
- Other Series
- Alex's Soapbox
- Announcements
- Best of…
- Best of Email
- Best of the Sidebar
- Bring Your Own Code
- Coded Smorgasbord
- Mandatory Fun Day
- Off Topic
- Representative Line
- News Roundup
- Editor's Soapbox
- Software on the Rocks
- Souvenir Potpourri
- Sponsor Post
- Tales from the Interview
- The Daily WTF: Live
- Virtudyne
Admin
Paranoid programming strikes again, I see.
Admin
Zeroth
Admin
My own technique for ensuring a field is a number is to use the language's internal convert-to-number method and then to trap out the exception. :-)
Admin
Paranoid programming indeed. There's no way this snippet will throw an exception and the isset() check is useless as a null value will also be caught by the is_numeric() check.
Admin
But the regex will prevent the function from returning negative numbers or stuff like "1.2e12", which is_numeric would allow, right?
Admin
@Void: you are correct. This function only return positive number, which use the decimal mark ".". So, if the database connection uses another locale, this will never return a number.
Admin
Also the string "Infinity" is a number which is blocked by the regex. I bet this is not a WTF and this double-checking was include because of performance optimization: If is_numeric() is much faster than checking with the regEx directly, and if in most cases there would be a non-number string submitted (let's say in 90% of all calls) then first applying is_numeric() would return in 90% of the calls very fast and only in 10% of the calls it would return slower, so the average performance increased as if you would have called it with the regex only in all cases.
Admin
Even if you don't trust the data source, you could just replace this entire function with floatval($value)
Admin
This regard doesn't make sense to me
/^[0-9]+(.[0-9]*|)$/
Must start with one or more numbers. Then must be followed by a period. Then can be followed by 0 or more numbers. Then the unescaped pipe symbol meaning or? Should it be ^[0-9]+.?[0-9]*$
Admin
Passing a non-scalar value to floatval() will emit a E_NOTICE level error, so i would at the very least have to be acompanied by a call to is_scalar()
Admin
Regard = regex
Typed it on a tablet
Admin
Actually must be followed by any character. Not just a period
Admin
A regex? Will it summon Cthulhu if you pass HTML to it?
Admin
So it looks like this was actually meant to do something like returning only the decimal portion of a number (the parentheses around the decimal part of the regex seem to suggest that). There's no built-in way to do do this in PHP and in this StackOverflow question, half of the answers indeed suggest using string operations to do it: http://stackoverflow.com/questions/6619377/how-to-get-whole-and-decimal-part-of-a-number
The bug in the code is a) the | operator, which ensures that the decimal part is actually optional, and b) the fact that instead of returning the decimal portion, it returns the original value. -_-
As for it being called only on DB values that are already numbers, it was written as a generic function and coding defensively in case someone else decides to use it on input that wasn't pre-validated is actually a good thing in my opinion. But yeah, the code doesn't do what it thinks it does or needs to do.
Admin
Go and read the article again; that has a key backslash which Markdown swallowed…
Admin
Ah you're right, but still it means that a decimal must be included but doesn't have to by followed by a number. Also wtf is up with the pipe at the end?
Admin
Also, the pattern match will fail on negative numbers.
Admin
(.[0-9]|) means “either .[0-9] OR an empty string”, effectively making the decimal part optional.
Admin
Ha ha ha
Admin
Er didn't know that. Good to learn. Thanks!
Admin
Actually I still don't get it. Would it not match
Without anything after the decimal?
Admin
True, but if it is used only in the context noted (decimals coming from the database), the original remark of 'copy/paste' is probably correct. They find a nice bit of code, and just use it everywhere-- even in places it's not needed.
Admin
Then I would say it's not a reusable function anyway. It only works for values coming from a statically typed database and only for those specific columns. Maybe the function should be renamed to "get_decimal_value_from_numeric_database_column_value". Not saying it would fix the issue of having this monstrosity, but at least it would clarify intent.
Admin
"So it looks like this was actually meant to do something like returning only the decimal portion of a number (the parentheses around the decimal part of the regex seem to suggest that)"
Facepalm
I can't believe I'm reading this. There isn't even an emoticon at the end to indicate it is tongue-in-cheek
Admin
Remy's disdain for dynamically typed languages because of type errors is misplaced I feel. In the hands of disciplined coders conventions in naming and documentation are sufficient. (Of course this website is all about programmers who should be disciplined - preferably by someone dressed in rubber with a whip.) I code in Python every day and type errors are rare in my team. If they did rise to the level of a problem Python 3 has optional type annotations on function calls. Still not a compile time answer but good unit tests should be able to catch bad function usages.
Admin
DELEte this! THis Website???
Delete e l e t e
this shitty fuuuuuuuuuuuuuuuuuuuuuuuuuuuuuucking website
Admin
Paranoid programming is good if done right. This is actually naive and paranoid programming. One leads to the other.
For example, I just would not bother with is_numeric because I am paranoid about it. It does way too many things that might change between versions to really mean anything and in reality you almost never ever need all of those things altogether. It's far too easy to be naive about the method even when you don't want to be. When you aren't naive it has so many quirks and edge cases that you have to consider them all.
Proper paranoid programming might be as such:
Pre checks...
Check the type is a string (because otherwise why would you run this one anything but data to be deserialised, if you have an int you know it's an int and don't touch it). Some people might be lazy and not think about what they are doing though, just running their function all the time on input. That is a kind of paranoia through laziness. You filter untrusted data in reality, not trusted data, except in really sensitive hotspots on special occasions. You might expect another type or a couple of types. Either way you should check that it's exactly what you expect and you should know what to expect. Anything else, you haven't anticipated or coded for. Alternatively, why code for lots of cases that you don't need to?
Check that the length is within reasonable bounds. You will probably always want this. It basically helps protect against overflow exploits, DDOS, etc.
The next things depend on circumstances. You may want to escape, filter or convert it. Converting it to the right type such as double is fine but the converter should be strict (it should throw an error on anything that could not be produced by the reverse conversion). It should actually only be filtered or converted on demand except in very specific cases. You only escape for HTML just before interpolating into HTML. You only except for SQL just before interpolating into an SQL string. If you use DOM or parametised SQL then that can be done for you.
Admin
When will those spambots learn the difference between hackers and crackers? :(
Admin
Once upon a time, I think that sort of thing was called detective work. Of course, once upon a time, that sort of thing required getting out of your chair.
And then as now, you still get in trouble for doing it without a license.
Admin
All regexes do!
Admin
Just note that if you expect to get a lof failed (non-number) cases, throwing all those exceptions can get expensive. Some comparison here: http://stackoverflow.com/questions/237159/whats-the-best-way-to-check-to-see-if-a-string-represents-an-integer-in-java
Admin
Just note that if you expect to get a lof failed (non-number) cases, throwing all those exceptions can get expensive. Some comparison here: http://stackoverflow.com/questions/237159/whats-the-best-way-to-check-to-see-if-a-string-represents-an-integer-in-java
Admin
Ummm, ten points for discussing the regex, but i do declare a FAIL by all; read the article much? Read the bit where it said the $value came from a DB and was guaranteed to both exist and be a number, perhaps becvause it was stored in a number-only data type column?
Fail
Admin
suzan white (unregistered) - genius. Come at me, bro.
Admin
Close. GET_potentiallydecimalNUMBER_FROM_potentiallydecimalNUMBER_AND_CONFIRM_DATABASE_UNDERSTANDS_NUMBER_IS_NUMBER_AND_THROW_EXCEPTION_IF_NUMBER_IS_STORED_IN_WORDS_BUT_NOT_ROMAN_NUMERALS
Admin
Hey, Copy Paste is one of the most productive programmers out there. Don't knock him.
Admin
For those of you questioning why do the regex after the IsNumeric call, have you considered that it might be running in a multithreaded process? I think this is good practice and will instruct all my subordinates to use this method going forward.
Admin
/^[0-9]+(.[0-9]*|)$/
I didn't even know you could to that.
I assume it's the same as...
/^[0-9]+(.[0-9]*$|$)/
That . is broken though and it supports 1. without a trailing number. My guess is that it wants to be... /^[0-9]+(.[0-9]+)?$/D
None of this deals with if the input is already an it or float properly or things like overflow, sign, etc. In an ideal world PHP would expose strict conversion functions that would do the job well enough. Honestly in this case it would probably be better off just to replace the whole lot with +$var. To understand why /D, RTFM.
Admin
There is a subtle difference between
(...|)
and(...)?
, and that's that the first one always captures a result and the second one doesn't (in JavaScript the relevant value would be""
vs.undefined
depending).Admin
not sure about in this programming language, but some places (like some SQL variants) isnumeric also accepts hexadecimal numbers as numeric (like "1A23456"), potentially causing problems if you're trying to insert into a numeric column.