The Daily WTF: Curious Perversions in Information Technology

2016-10-24

Frist

2016-10-24

Is this missing an introductory paragraph?

2016-10-24

Yeah, you kind of dropped that comment out of nowhere.

2016-10-24

Would somebody explain the last, beautiful entry in more detail?

2016-10-24

No.

2016-10-24

What completely baffles me is how someone could create or maintain that regular expression in the last example without it occurring to them that this is a really stupid way to go about things.

2016-10-24

What if out there somewhere is the secret to ancient magic and curing cancer and infinite power, or perhaps just rainbow unicorns, and it's just hidden in enterprise code to find a document ID?

2016-10-24

After recently discovering that a valid RegEx pattern can cause an infinite loop (http://stackoverflow.com/questions/1200655/how-to-avoid-infinite-loops-in-the-net-regex-class), I've decided to never use RegEx again. Never.

2016-10-24

The last in-house regex-like language seems to be missing case-insensitive matching. So "hello" would become "[H|h][E|e][L|l][L|l][O|o]" - not very readable. Please note that in a real regex language [H|h] would match any one of the 3 characters "H", "|" and "h", whereas here you must write a pipe as an OR. Wouldn't [Hh] just work here, too? I guess not, so other characters in those square brackets most likely have special meaning, too.

Second, the in-house regex language is missing a question-mark (optional) operator and a \d shorthand. So they wrote: ([C|c][P|p][K,<|k,<][0-9]{11})||([:#.$",'#-/|][C|c][P|p][K,<|k,<][0-9]{11} )||( [C|c][P|p][K,<|k,<][0-9]{11}[ :.$",'#-/|l\])||([:.$",'#-/|][C|c][P|p][K,<|k,<][0-9]{11}[:.$",'#-/|l\]) But this is similar to: y|xy|yx|xyx with: let x=[:.$",'#-/|] let y=[C|c][P|p][K,<|k,<][0-9]{11}

In a real regex language you could just have written (x)?y(x)? or if you have back-references: (x)?y(\1)?

That means: instead of having a 403-character expression, you would have a 29-character expression in a real regex (with case-insensitive matching turned on): ([:.$",'#-/|])?cpk\d{11}(\1)?
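A quick way to sanity-check the compact version is to run it in JavaScript. This is only a sketch: it assumes the `i` flag is an acceptable stand-in for the `[C|c]`-style classes, and `findDocId` is a hypothetical helper name, not anything from the original code.

```javascript
// Compact form of the document-ID pattern from the comment above.
// Assumption: the "i" flag replaces the in-house [C|c][P|p]... classes.
// Group 1 captures an optional leading delimiter ("#-\/" is a range from # to /);
// (\1)? lets the same delimiter appear again after the 11 digits.
const docIdPattern = /([:.$",'#-\/|])?cpk\d{11}(\1)?/i;

// Hypothetical helper: return the first matching ID (with delimiters) or null.
function findDocId(text) {
  const match = docIdPattern.exec(text);
  return match ? match[0] : null;
}

console.log(findDocId("see CPK12345678901 for details")); // CPK12345678901
console.log(findDocId("#cpk12345678901#")); // #cpk12345678901#
console.log(findDocId("CPK1234")); // null
```

Note that `(\1)?` only consumes a trailing delimiter identical to the leading one, which is stricter than the original alternation, where the two delimiters could differ.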

The real WTF is their limited in-house regex. But maybe they don't need readability, because the regex is generated code.

2016-10-24

Maybe Kate should have read the documentation for parse_ini_file() before assuming it had the same behavior as using RegEx:

Note: There are reserved words which must not be used as keys for ini files. These include: null, yes, no, true, false, on, off, none. Values null, off, no and false result in "". Values on, yes and true result in "1". Characters ?{}|&~!()^" must not be used anywhere in the key and have a special meaning in the value.

Since PHP 5.3 you can use: parse_ini_file($iniFile, true, INI_SCANNER_RAW). Before that it was quite normal to use RegEx to avoid this behavior.

2016-10-24

Remy missed one other way that REs are like a multi-tool: if the job is difficult enough that you should use the "real" knife, screwdriver, or pliers, using the multitool results in a bloody mess of broken parts.

dkf · 2016-10-24

After recently discovering that a valid RegEx pattern can cause an infinite loop

Only because of the type of matching engine used. The engines that use stacks (as all the ones that trace a vague heritage from Perl do, most of them via PCRE) have a number of weaknesses, and this is one of those examples. Though I'm not sure that the loop is infinite; it might just be O!M!G! huge. (10¹⁰⁰ for sure isn't infinity, but it takes a long time to count through all the same.)
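To put a rough number on "huge": for a nested quantifier such as `(a+)+b` run against a string of n `a`s with no `b`, a naive backtracking engine can end up trying every way of carving the run of `a`s into consecutive chunks for the outer `+` before it gives up. The sketch below just counts those ways by brute force; it models the search space rather than implementing a real engine, and real engines may prune some of it.

```javascript
// Counts the ways the inner a+ can split a run of n "a"s into non-empty,
// consecutive chunks (one chunk per iteration of the outer +).
// This mirrors the search space of a naive backtracking matcher for (a+)+b
// when the input contains no "b" and the match must eventually fail.
function countSplits(n) {
  if (n === 0) return 1; // every "a" has been assigned to some chunk
  let total = 0;
  for (let chunk = 1; chunk <= n; chunk++) {
    total += countSplits(n - chunk); // try each length for the next chunk
  }
  return total;
}

for (const n of [5, 10, 20]) {
  console.log(n, countSplits(n));
}
```

The count is 2^(n-1), so it doubles with every extra `a`: finite, but growing exponentially, which fits the "not infinite, just enormous" reading.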

Engines that use automata-theoretic approaches don't have this weakness, but can take a lot longer to compile REs and are far harder to debug when anything goes wrong. Nobody really understands finite-state automata at the best of times…
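For contrast, here is a minimal sketch of the automata-theoretic approach for the same `(a+)+b` pattern. The NFA is built by hand (an assumption made to keep it short; a real engine compiles it from the pattern), and the simulation tracks the set of live states instead of backtracking, so the work grows linearly with the input.

```javascript
// Hand-built NFA for (a+)+b; a real engine would compile this from the pattern.
// State 0: waiting for the first "a" of an inner a+
// State 1: inside a+ (can read another "a", restart the outer +, or read "b")
// State 2: accepting, reached after the final "b"
const transitions = { 0: { a: [1] }, 1: { a: [1], b: [2] }, 2: {} };
const epsilon = { 0: [], 1: [0], 2: [] }; // 1 -> 0 models another outer iteration
const ACCEPT = 2;

// Adds every state reachable through epsilon moves.
function closure(states) {
  const result = new Set(states);
  const stack = [...states];
  while (stack.length > 0) {
    const s = stack.pop();
    for (const t of epsilon[s]) {
      if (!result.has(t)) {
        result.add(t);
        stack.push(t);
      }
    }
  }
  return result;
}

// Whole-string match: advance the set of live states one character at a time.
function nfaMatch(input) {
  let current = closure([0]);
  let steps = 0;
  for (const ch of input) {
    const next = [];
    for (const s of current) next.push(...(transitions[s][ch] || []));
    current = closure(next);
    steps++;
    if (current.size === 0) break; // no live states left, so fail early
  }
  return { matched: current.has(ACCEPT), steps };
}

console.log(nfaMatch("aaab"));
console.log(nfaMatch("a".repeat(30) + "c"));
```

On `"a".repeat(30) + "c"` the set of live states stays at {0, 1} for every `a` and becomes empty at the `c`, so the match fails after 31 steps rather than after an exponential number of retries.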

dkf · 2016-10-24

And holy shitballs, the code that drives this site is bad. Suddenly, proper forum software looks better than I thought previously…

2016-10-24

This site has saved my hide more than once. https://regex101.com/

2016-10-24

Speaking of .ini files... Does anyone know of a decent .ini parsing library in C? This would be for a microcontroller reading a microSD card.

Thanks, Wolf

2016-10-24

Testing that on regex101: https://regex101.com/r/D50Fr4/1

PCRE (PHP) gives "Catastrophic backtracking", while JavaScript produces a match.

2016-10-24

On regular expressions:

Now you have two problems.

In my experience, if a regular expression is over ONE line, you really do have problems. If it's more than 40 characters, you should start looking at your methodology.

2016-10-24

Best fun I ever had was writing a regular expression to validate a UK vehicle registration plate.

Sorry, I've lost the use of the key that inserts irony tags into a comment.

2016-10-24

Instead of using RegExs because "a valid RegEx pattern can cause an infinite loop", you're going to use a full programming language, where a valid program can include an infinite loop, or trashing any number of files, or opening a backdoor on the system? Seems that's out of the frying pan and into the fire.

2016-10-24

Regular expressions are like threading: it isn't horrible to write, but maintenance gets worse and worse as time goes on. Both are incredibly powerful, sometimes even necessary, but if you don't use them properly, they will crush you in technical debt.

My soft rule for RegEx: If you have one giant RegEx doing several things, try to split it up into several small, easy to recognize RegEx. If you can write a RegEx to do exactly what you need, but it takes 3 lines, you can probably also write a series of 4 RegEx that are much easier to read, and can be combined to produce the same result.

For example, that giant RegEx is just one huge friggin OR statement. Instead of one huge RegEx, you could easily split it into 50 smaller ones, one for each format of whatever you are searching for. It is probably less efficient, though not by much, but it also doesn't make the next guy's brain melt searching for that wayward close paren.
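A minimal sketch of that approach in JavaScript, using the four branches of the document-ID expression from earlier in the thread as the separate formats. The format names and the `matchFormat` helper are hypothetical, the `i` flag stands in for the `[C|c]`-style classes, and each pattern is anchored so it classifies a whole token.

```javascript
// One small, named pattern per accepted format instead of one giant alternation.
// Assumption: the same delimiter class on both sides ("#-\/" is a range from # to /).
const formats = [
  { name: "bare", pattern: /^cpk\d{11}$/i },
  { name: "prefixed", pattern: /^[:.$",'#-\/|]cpk\d{11}$/i },
  { name: "suffixed", pattern: /^cpk\d{11}[:.$",'#-\/|]$/i },
  { name: "wrapped", pattern: /^[:.$",'#-\/|]cpk\d{11}[:.$",'#-\/|]$/i },
];

// Returns the name of the first format that matches the token, or null.
// Knowing *which* format matched is a debugging bonus the single regex lacks.
function matchFormat(token) {
  const hit = formats.find(({ pattern }) => pattern.test(token));
  return hit ? hit.name : null;
}

console.log(matchFormat("CPK12345678901")); // bare
console.log(matchFormat(":cpk12345678901:")); // wrapped
console.log(matchFormat("cpk123")); // null
```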

2016-10-24

Also remember, do not parse html or xml with RegEx.

http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags

2016-10-24

This is highly regular, Dave.

2016-10-24

OK, the first two have bugs, but they're a reasonable use of regexen. That span in the first example could easily contain non-numbers as well as the number being incremented (e.g. a button or logo or text label), so it's not insane to fish out the number, increment or decrement it, and then put it back. And sanitising inputs is obviously a task for a regex.
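A minimal sketch of the "fish out the number, change it, put it back" approach. The label texts and the `bumpCount` name are made up for illustration, and it assumes the span holds one integer (only the first one found is changed).

```javascript
// Finds the first (optionally negative) integer in the text, adds delta to it,
// and writes it back in place, leaving any surrounding label text untouched.
function bumpCount(text, delta) {
  return text.replace(/-?\d+/, (digits) => String(Number(digits) + delta));
}

console.log(bumpCount("Likes: 41", 1)); // Likes: 42
console.log(bumpCount("Votes (3) total", -1)); // Votes (2) total
```

The `-?` means a hyphen directly before the digits is read as a minus sign, so a label like `item-5` would need a stricter pattern.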

The problem the larger regexen have is not using the /x switch (or equivalent in other languages). Obviously a regex is borderline illegible if you're not allowed whitespace or any other kind of formatting; but the same is true for any code.
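JavaScript has no /x flag, so one common workaround is to assemble the pattern from commented string pieces and pass them to the `RegExp` constructor. The sketch below does that for the prime-testing expression posted further down the thread, `/^1?$|^(11+?)\1+$/`; the variable name is only for illustration.

```javascript
// Each piece carries its own comment, standing in for /x-style formatting.
// Backslashes are doubled because these are ordinary string literals.
const nonPrimeUnary = new RegExp(
  [
    "^1?$", // zero or one "1": 0 and 1 are not prime
    "|", // or
    "^(11+?)", // a block of two or more "1"s (group 1)...
    "\\1+$", // ...repeated at least once more, filling the whole string
  ].join("")
);

// The assembled pattern has the same source as the one-line literal.
console.log(nonPrimeUnary.source === /^1?$|^(11+?)\1+$/.source); // true
```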

2016-10-24

Write it yourself, tailored to your needs; really, parsing INI files is as trivial as parsing can get, even if you get fancy and handle semicolon and hash comments.
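As a rough illustration of how small such a parser can be, here is a sketch in JavaScript. It assumes the simplest INI dialect: `[section]` headers, `key=value` pairs, and full-line `;` or `#` comments, with no quoting, escapes, or inline comments; `parseIni` is a hypothetical name.

```javascript
// Minimal INI parser sketch: [section] headers, key=value pairs, and
// full-line comments starting with ";" or "#". Keys that appear before
// any section header are stored under the "" section.
function parseIni(text) {
  const result = { "": {} };
  let section = "";
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith(";") || line.startsWith("#")) continue;
    if (line.startsWith("[") && line.endsWith("]")) {
      section = line.slice(1, -1).trim();
      if (!Object.prototype.hasOwnProperty.call(result, section)) result[section] = {};
      continue;
    }
    const eq = line.indexOf("=");
    if (eq === -1) continue; // not key=value; a stricter parser might report an error
    const key = line.slice(0, eq).trim();
    // Values stay raw strings: no yes/no/on/off coercion,
    // unlike parse_ini_file() in its default mode.
    result[section][key] = line.slice(eq + 1).trim();
  }
  return result;
}

const config = parseIni(`
; sample file
[server]
host = example.local
debug = off
`);
console.log(config.server.host); // example.local
console.log(config.server.debug); // off
```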

2016-10-25

I like to use The Regexp Coach for that, it works on Windows, but also in Wine.

2016-10-25

Note: There are reserved words which must not be used as keys for ini files. These include: null, yes, no, true, false, on, off, none.

What about "FILE_NOT_FOUND"?

2016-10-25

It takes a special kind of person to take the only class of formal languages where pretty much every interesting question is decidable (*) and implement it in a way where infinite loops are possible.

(*) The only counterexample that comes immediately to my mind is 'Is language L regular?', which is an undecidable problem.

2016-10-28

http://ndevilla.free.fr/iniparser/html/index.html

2016-11-03

RegExes can even test prime numbers: /^1?$|^(11+?)\1+$/

2016-11-13

"You have a problem and solve it with RegEx? Now you have two problem..."

urkerab · 2016-11-29

([^]]*(]")?)+ is a very confusingly written regex though; I think it means the same as ([^]]|]")*, which is much simpler for the regex engine too.
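A small check of that equivalence in JavaScript, with both patterns anchored so they must cover the whole string. One JavaScript-specific wrinkle: in JS `[^]` means "any character", so `[^]]` would be read as "any character followed by `]`"; the sketch escapes the bracket as `[^\]]` to get the intended class. The sample strings are made up.

```javascript
// Original, nested form: runs of non-"]" characters, each optionally followed by ]".
const nested = /^(?:[^\]]*(?:\]")?)+$/;
// Flattened form suggested above: each step consumes either one non-"]"
// character or the two-character sequence ]".
const flat = /^(?:[^\]]|\]")*$/;

const samples = ["", "abc", 'a]"b', ']"]"', "a]b", "]"];
for (const s of samples) {
  console.log(JSON.stringify(s), nested.test(s), flat.test(s));
}
```

The flat version also avoids a quantifier nested inside another quantifier, which is the shape that lets a backtracking engine blow up on a failing input.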

urkerab · 2016-11-29

Regexes can even verify Sudoku solutions! In Ruby, it's as short as ^(?!.*(?=(.))(.{9}+|(.(?!.{9}*$))+|(?>.(?!.{3}*$)|(.(?!.{27}*$)){7})+)\1).

anotherusername · 2016-12-28

RegExes can even test prime numbers: /^1?$|^(11+?)\1+$/

:wtf: that's extremely clever, but you left out the important part. It's looking for a sequence of one 1, or some subsequence of length > 1 repeated exactly some number of times > 1. In other words, a non-prime number of 1s. First you have to make a string of the digit 1 repeated n times. Then the regexp returns false if n is prime and true if it's not. So the test for whether a number is prime would be like:

function isPrime(n) { return n % 1 == 0 && !/^1?$|^(11+?)\1+$/.test('1'.repeat(n)); }

Of course, the number can't be negative, or larger than the maximum string size, or it'll throw a RangeError... also, I'm not sure what the purpose of that ? is in the (11+?) term. It seems like it's not doing anything.
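On that last question: the `?` after `+` makes the quantifier lazy, so the engine tries the shortest block (`11`) first and grows it, which amounts to trying small divisors first. It changes the order of the search, not which strings match. A quick check in JavaScript against a greedy variant `(11+)`; the `isPrimeWith` helper name is made up.

```javascript
// Same unary trick as isPrime above, parameterized by the regex so the
// lazy (11+?) and greedy (11+) versions can be compared.
function isPrimeWith(regex, n) {
  return n % 1 === 0 && !regex.test("1".repeat(n));
}

const lazy = /^1?$|^(11+?)\1+$/;
const greedy = /^1?$|^(11+)\1+$/;

// Both versions must agree on every n checked; only the search order differs.
const primesLazy = [];
for (let n = 0; n <= 30; n++) {
  const result = isPrimeWith(lazy, n);
  if (result !== isPrimeWith(greedy, n)) throw new Error(`disagree at ${n}`);
  if (result) primesLazy.push(n);
}
console.log(primesLazy.join(" ")); // 2 3 5 7 11 13 17 19 23 29
```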

anotherusername · 2016-12-28

(Replying, since it won't let me add an addendum to that.)

It's looking for a sequence of zero or one 1, or some subsequence of length > 1 repeated exactly some number of times > 1.

Keeping it Regular
