Admin
Frist
Admin
Is this missing an introductory paragraph?
Admin
Yeah, you kind of dropped that comment out of nowhere.
Admin
Somebody wants to explain the last, beautiful, entry more in detail?
Admin
No.
Admin
What completely baffles me is how someone could create or maintain that regular expression in the last example without it occurring to them that this is a really stupid way to go about things.
Admin
What if out there somewhere is the secret to ancient magic and curing cancer and infinite power, or perhaps just rainbow unicorns, and it's just hidden in enterprise code to find a document ID?
Admin
After recently discovering that a valid RegEx pattern can cause an infinite loop (http://stackoverflow.com/questions/1200655/how-to-avoid-infinite-loops-in-the-net-regex-class), I've decided to never use RegEx again. Never.
Admin
The last in-house regex-like language seems to be missing case-insensitive matching. So "hello" would become "[H|h][E|e][L|l][L|l][O|o]" - not very readable. Please note that in a real regex language [H|h] would match any of the 3 characters "H", "|" and "h", whereas here you must write a pipe as an or operator. Wouldn't [Hh] just work here, too? I guess not, so other characters in those square brackets most likely have special meaning, too.
Second, the in-house regex language is missing a question-mark function and a \d function. So they wrote: ([C|c][P|p][K,<|k,<][0-9]{11})||([:#.$",'#-/|][C|c][P|p][K,<|k,<][0-9]{11} )||( [C|c][P|p][K,<|k,<][0-9]{11}[ :.$",'#-/|l\])||([:.$",'#-/|][C|c][P|p][K,<|k,<][0-9]{11}[:.$",'#-/|l\]) But this is similar to: y|xy|yx|xyx with: let x=[:.$",'#-/|] let y=[C|c][P|p][K,<|k,<][0-9]{11}
In a real regex language you could just have written (x)?y(x)? or if you have back-references: (x)?y(\1)?
That means: instead of having a 403 character-expression, you would have an 11-character expression in a real regex: ([:.$",'#-/|])?cpk\d{11}(\1)?
The real WTF is their limited in-house regex. But maybe they don't need readability, because the regex is generated code.
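For what it's worth, here is a rough Python sketch of the compact form. It assumes the in-house pattern really just means "an ID of the form CPK plus 11 digits, optionally wrapped in one punctuation character on either side"; the `K,<` alternatives are ignored, and the names `SEP`, `DOC_ID` and `find_ids` are made up for illustration.

```python
import re

# Assumed separator class, copied from the article's pattern with the
# hyphen escaped so it is a literal rather than a range.
SEP = r"""[:.$",'#\-/|]"""

# Optional separator, the ID itself (captured), optional separator.
# re.IGNORECASE replaces all the [C|c][P|p]... pairs.
DOC_ID = re.compile(rf"{SEP}?(cpk\d{{11}}){SEP}?", re.IGNORECASE)


def find_ids(text):
    """Return every document ID found in text, without the separators."""
    return [m.group(1) for m in DOC_ID.finditer(text)]


if __name__ == "__main__":
    print(find_ids("ref #CPK12345678901, and cpk00000000000"))
```

Whether this is equivalent to the original depends entirely on how the in-house language treats `|` and `,` inside square brackets, which the article doesn't tell us.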
Admin
Maybe Kate should have read the documentation for parse_ini_file() before assuming it had the same behavior as using RegEx:
Note: There are reserved words which must not be used as keys for ini files. These include: null, yes, no, true, false, on, off, none. Values null, off, no and false result in "". Values on, yes and true result in "1". Characters ?{}|&~!()^" must not be used anywhere in the key and have a special meaning in the value.
Since PHP 5.3 you can use: parse_ini_file($iniFile, true, INI_SCANNER_RAW). Before that it was quite normal to use RegEx to avoid this behavior.
Admin
Remy missed one other way that REs are like a multi-tool: if the job is difficult enough that you should use the "real" knife, screwdriver, or pliers, using the multitool results in a bloody mess of broken parts.
Admin
Only because of the type of matching engine used. The engines that use stacks (as all the ones that trace a vague heritage from Perl do, most of them via PCRE) have a number of weaknesses, and this is one of those examples. Though I'm not sure that the loop is infinite; it might just be O!M!G! huge. (10^100 for sure isn't infinity, but it takes a long time to count through all the same.)
Engines that use automata-theoretic approaches don't have this weakness, but can take a lot longer to compile REs and are far harder to debug when anything goes wrong. Nobody really understands finite-state automata at the best of times…
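A small Python sketch of the "huge, not infinite" case, for anyone who wants to watch it happen. The pattern `(a+)+$` is a textbook example, not the one from the linked question, and `time_failed_match` is just a name I picked. Python's `re` module is a backtracking engine, so the time for a failed match grows very quickly with the input length.

```python
import re
import time

# Nested quantifiers: there are exponentially many ways to split a run
# of "a"s between the inner and outer "+", and a backtracking engine
# tries them all before giving up.
PATTERN = re.compile(r"(a+)+$")


def time_failed_match(n):
    """Time a match attempt that must fail: n "a"s followed by a "b"."""
    subject = "a" * n + "b"
    start = time.perf_counter()
    result = PATTERN.match(subject)
    elapsed = time.perf_counter() - start
    assert result is None  # the trailing "b" makes a match impossible
    return elapsed


if __name__ == "__main__":
    # Keep n small; each extra character roughly doubles the work.
    for n in (12, 16, 20):
        print(n, f"{time_failed_match(n):.4f}s")
```

The loop does terminate, it just takes longer than anyone is willing to wait once n gets into the thirties or so.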
Admin
And holy shitballs, the code that drives this site is bad. Suddenly, proper forum software looks better than I thought previously…
Admin
This site has saved my hide more than once. https://regex101.com/
Admin
Speaking of .ini files... Anyone know of a decent .ini parsing library in C? This would be for a microcontroller reading a micro SDcard.
Thanks, Wolf
Admin
Testing that on regex101: https://regex101.com/r/D50Fr4/1
pcre (php) gives: "Catastrophic backtracking" while Javascript produces a match.
Admin
On regular expressions:
Now you have two problems.
In my experience, if a regular expression is over ONE line, you really do have problems. If it is more than 40 characters, you should start looking at your methodology.
Admin
Best fun I ever had was writing a regular expression to validate a UK vehicle registration plate.
Sorry, I've lost the use of the key that inserts irony tags into a comment.
Admin
Instead of using RegExs because "a valid RegEx pattern can cause an infinite loop", you're going to use a full programming language, where a valid program can include an infinite loop, or trashing any number of files, or opening a backdoor on the system? Seems that's out of the frying pan and into the fire.
Admin
Regular expressions are like threading: it isn't horrible to write, but maintenance gets worse and worse as time goes on. Both are incredibly powerful, sometimes even necessary, but if you don't use them properly, they will crush you in technical debt.
My soft rule for RegEx: If you have one giant RegEx doing several things, try to split it up into several small, easy to recognize RegEx. If you can write a RegEx to do exactly what you need, but it takes 3 lines, you can probably also write a series of 4 RegEx that are much easier to read, and can be combined to produce the same result.
For example, that giant RegEx is just one huge friggin OR statement. Instead of one huge RegEx, you could easily split it into 50 smaller ones, one for each format of whatever you are searching for. It is probably less efficient, though not by much, but it also doesn't make the next guy's brain melt searching for that wayward close paren.
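Something like this, as a minimal Python sketch. Only the "cpk" format comes from the article; the "inv" format and the names `ID_FORMATS` and `classify` are hypothetical, just there to show the shape.

```python
import re

# One small, named pattern per format instead of one giant alternation.
ID_FORMATS = {
    "cpk": re.compile(r"cpk\d{11}", re.IGNORECASE),
    "inv": re.compile(r"inv-\d{6}", re.IGNORECASE),  # hypothetical format
}


def classify(text):
    """Return (format_name, matched_id) for the first format that matches."""
    for name, pattern in ID_FORMATS.items():
        match = pattern.search(text)
        if match:
            return name, match.group(0)
    return None


if __name__ == "__main__":
    print(classify("see CPK12345678901 for details"))
```

Adding a format is one new dictionary entry, and a broken pattern only breaks its own format.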
Admin
Also remember, do not parse html or xml with RegEx.
http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags
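For the task in that question (open tags, but not self-closing ones), a real parser does it in a few lines. A minimal sketch using Python's standard `html.parser`; the class and function names are my own.

```python
from html.parser import HTMLParser


class OpenTagCollector(HTMLParser):
    """Collect the names of opening tags, skipping self-closing ones."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Called for tags written as <br/>; deliberately ignored.
        pass


def open_tags(markup):
    collector = OpenTagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


if __name__ == "__main__":
    print(open_tags('<p><br/><a href="#">x</a></p>'))
```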
Admin
This is highly regular, Dave.
Admin
OK, the first two have bugs, but they're a reasonable use of regexen. That span in the first example could easily contain non-numbers as well as the number being incremented (e.g. a button or logo or text label), so it's not insane to fish out the number, increment or decrement it, and then put it back. And sanitising inputs is obviously a task for a regex.
The problem the larger regexen have is not using the /x switch (or equivalent in other languages). Obviously a regex is borderline illegible if you're not allowed whitespace or any other kind of formatting; but the same is true for any code.
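In Python the equivalent of /x is `re.VERBOSE`. A quick sketch with a made-up date pattern (nothing to do with the article's regexes) to show how much whitespace and comments help:

```python
import re

# With re.VERBOSE, whitespace in the pattern is ignored and "#" starts
# a comment, so each piece can sit on its own line.
ISO_DATE = re.compile(
    r"""
    (?P<year>\d{4})                 # four-digit year
    -                               # literal separator
    (?P<month>0[1-9]|1[0-2])        # month, 01 to 12
    -                               # literal separator
    (?P<day>0[1-9]|[12]\d|3[01])    # day, 01 to 31 (no per-month check)
    """,
    re.VERBOSE,
)

if __name__ == "__main__":
    match = ISO_DATE.fullmatch("2024-02-29")
    print(match.group("year"), match.group("month"), match.group("day"))
```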
Admin
Write it yourself, tailored to your needs; really, parsing INI files is as trivial as parsing can get, even if you get fancy and handle semicolon and hash comments.
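To show how little logic is involved, here is a sketch in Python; the same loop translates to C with `fgets` and a few string functions. It assumes one `key = value` pair per line, `[section]` headers, and whole-line comments starting with `;` or `#`. The function name `parse_ini` is mine.

```python
def parse_ini(text):
    """Parse INI text into {section: {key: value}}; values stay raw strings."""
    sections = {}
    current = sections.setdefault("", {})  # keys that appear before any section
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line[0] in ";#":
            continue  # blank line or comment
        if line.startswith("[") and line.endswith("]"):
            current = sections.setdefault(line[1:-1].strip(), {})
            continue
        key, sep, value = line.partition("=")
        if sep:  # silently skip lines without "="
            current[key.strip()] = value.strip()
    return sections


if __name__ == "__main__":
    print(parse_ini("; settings\n[db]\nhost = localhost\nenabled = no\n"))
```

Because values are kept as plain strings, "no" stays "no" and nothing gets reinterpreted behind your back.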
Admin
I like to use The Regexp Coach for that, it works on Windows, but also in Wine.
Admin
What about "FILE_NOT_FOUND"?
Admin
It takes a special kind of person to take the only class of formal languages where pretty much every interesting question is decidable (*) and implement it in a way where infinite loops are possible.
(*) The only counter example that comes immediately to my mind is 'Is language L regular?' which is an undecidable problem.
Admin
http://ndevilla.free.fr/iniparser/html/index.html
Admin
RegExes can even test prime numbers: /^1?$|^(11+?)\1+$/
Admin
"You have a problem and solve it with RegEx? Now you have two problem..."
Admin
([^]]*(]")?)+ is a very confusingly written regex though; I think it means the same as ([^]]|]")*, which is much simpler for the regex engine too.
Admin
Regexes can even verify Sudoku solutions! In Ruby, it's as short as
^(?!.*(?=(.))(.{9}+|(.(?!.{9}*$))+|(?>.(?!.{3}*$)|(.(?!.{27}*$)){7})+)\1)
Admin
:wtf: that's extremely clever, but you left out the important part. It's looking for a sequence of zero or one 1s, or some subsequence of length > 1 repeated a whole number of times > 1. In other words, a non-prime number of 1s. First you have to make a string of the digit 1 repeated n times. Then the regexp returns false if n is prime and true if it's not. So the test for whether a number is prime would be like:
function isPrime(n) { return n % 1 == 0 && !/^1?$|^(11+?)\1+$/.test('1'.repeat(n)); }
Of course, the number can't be negative, or larger than the maximum string size, or it'll throw a RangeError... also, I'm not sure what the purpose of that ? is in the (11+?) term. It seems like it's not doing anything.
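On the `?`: it makes the `+` lazy, so the engine tries the shortest candidate factor first (11, then 111, and so on) instead of the longest. That changes the order of the search, not the answer. A small Python sketch that compares the two variants; the names `LAZY`, `GREEDY` and `is_prime` are mine.

```python
import re

LAZY = re.compile(r"^1?$|^(11+?)\1+$")    # the pattern as posted
GREEDY = re.compile(r"^1?$|^(11+)\1+$")   # same pattern without the "?"


def is_prime(n, pattern=LAZY):
    """n is prime exactly when a string of n ones does NOT match."""
    return pattern.match("1" * n) is None


if __name__ == "__main__":
    print([n for n in range(30) if is_prime(n)])
    # Both variants agree on every n checked here.
    print(all(is_prime(n, LAZY) == is_prime(n, GREEDY) for n in range(200)))
```

The lazy version finds small factors sooner, so it tends to give up on even numbers immediately, which is presumably why it's there.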
Admin
(reply, since it's not letting me addendum that)