Admin
But wait - if he'd have fixed the translation class frist, it would have continued to work!
Admin
moral of the story: don't reinvent the wheel.
Admin
But then I'll be out of a job D:
Admin
I like to think of this as leading to Java classes like AbstractAbstractAbstractFactoryFactory
Admin
abstracturbation? abshitraction!
Admin
Ok, this was like a horror movie where you think 'now it can't get worse' and then Boom
The Boom thing here was that little line about 'So he did the next best thing- he wrote a “translation” module that would, using regular expressions, convert the new-style XML files back into the old-style XML files.' The horror
And I even work with Perl daily and by and large like it (yes I know, just look for my horns and all :-)
But even I know enough that you should never parse XML with regexes! (Obligatory stackoverflow link: https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags)
Yazeran.
Plan: To go to Mars one day with a hammer.
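For anyone wondering how quickly the regex route goes wrong, here is a minimal sketch in Python. The sample document and both helper names (`names_by_regex`, `names_by_parser`) are made up for illustration; nothing here is taken from the article's actual translation module. The document is perfectly valid XML, but it has a `>` inside an attribute value and a line break inside a tag, which is enough to defeat a plausible-looking pattern.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample: valid XML, but with a '>' inside an attribute value
# and a newline between the tag name and its attribute.
DOC = '<items><item name="a > b">first</item><item\n name="c">second</item></items>'

def names_by_regex(doc: str) -> list:
    # Naive pattern: assumes a single space after the tag name and
    # no '>' inside the attribute value.
    return re.findall(r'<item name="([^">]*)">', doc)

def names_by_parser(doc: str) -> list:
    # A real XML parser handles both cases without special treatment.
    return [el.get("name") for el in ET.fromstring(doc).iter("item")]

print(names_by_regex(DOC))   # finds neither item
print(names_by_parser(DOC))  # finds both
```

The regex can of course be patched for these two cases, but each patch only moves the failure to the next legal variation.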
Admin
Sometimes, when confronted with a problem, you think “I know, I'll use regular expressions.” Now you have two problems!
Admin
Mostly because your regular expressions become irregular expressions :P
Admin
Ah, one of those cases where abstraction actually just leads to obfuscation and confusion.
Admin
It's been a while since I looked at it, but I believe if you only use a subset of HTML and only use a well-formed version of that subset, you can create an HTML parser with regex. The issue is that HTML was standardized with tags that can nest out of order or in ways that can trip up a regex. That's not a bad thing, mind. Since HTML is a markup language, it works well for what it does.
If you write HTML that follows strict self-imposed rules created with regex parsing in mind, you can use a regex to parse it.
I also thought I read that XML could be parsed using regex, but the better question for either XML or HTML is: Why on earth would you want that?
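A small Python sketch of the "strict subset" point, using made-up sample strings: the same pattern that works on flat, well-formed markup gives a wrong answer as soon as one tag is nested inside another of the same kind.

```python
import re

# Matches the contents of a <b>...</b> pair, lazily.
BOLD = re.compile(r"<b>(.*?)</b>")

flat = "<p><b>one</b> and <b>two</b></p>"       # self-imposed rule: no nesting
nested = "<b>outer <b>inner</b> tail</b>"       # same tag nested inside itself

print(BOLD.findall(flat))    # ['one', 'two']
print(BOLD.findall(nested))  # ['outer <b>inner']  (pairs the wrong tags)
```

In the nested case the pattern pairs the first opening tag with the first closing tag, because it has no way to count how many tags are currently open.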
Admin
I think I'd be using some expressions regularly when considering this code…
“WTF?!” “WTF?!” “WTF?!” “WTF?!” “WTF?!” “WTF?!”
Admin
This guy didn't reinvent the wheel; he reinvented the whole Ford.
He also made it out of Lego because then you can upgrade aerodynamics on the fly, or maybe replace the wheels with a better shape than round just in case something better is ever found.
Admin
That is one great StackOverflow answer
Admin
"This guy didn't reinvent the wheel; he reinvented the whole Ford."
I think this guy actually invented a car transporter to put a Model T Ford on so it could keep up with modern traffic.
Admin
ahhhh, dammitol, I was going to post that SO page. Well, maybe we can write a combination XML_and_datestring regex converter now.
Admin
I've actually worked with someone who did that for a data import process. Instead of tweaking processes to match the format(s) provided so we'd have ones that worked with those specific processes, the data was imported, then the table structures were massaged to match one specific format so we could just import everything from that process. That kind of ignored the way different systems could handle more information in some cases so that was just lost. I was a bit sad when I saw how all of that data conversion work was being handled, but ... it was being handled and completely outside of my domain. It was the whole "when all you (think you) have is a hammer, everything's a nail" mindset.
Admin
b̵̨͎̞̬̂̓̚̚͠l̶̜̤͔̊a̵̯̍͛̊k̵̨̥͓̮̦̅̚͝ȅ̵̠̳̭̅̽ỷ̷̢̹̪͝r̴̹̙͖̘̼͌́̽̋̏a̸͈͐̾̂t̷͈͛͑̑̽ ̷̺͍͇̦̿k̶̨̥͉͖̑̓̌̐͝n̷̢̩̰͔͛̒̈͊̂o̴͈͇͋͛̔͌c̴̡̳̮͈̓̍͌̒̚ḳ̴̨͇̮̔͛͗͝s̷̩̦̗̥̉̽̂̈́ ̸̨̛̺̱̋̈̂͘ạ̵̛͎͚̀̿t̵̯̹̮͂ ̴̰̹̭̟̓̀͆̈̓t̷̪̱̬̱̾̇̃h̸̭͐é̸̪͕̌ ̸̟̰̇͝d̶̨̤̲̊͌ǒ̷̡̝̱̆͠ȍ̵̭̫͙͉̜̇̇́̕r̶̠̳̤̱̈́̚ ̶̢̭͍̙̯̓̅͑y̸͇̋̂̓͝ȍ̴͕̙̟̙͕̊͝ȕ̸͎̲̮ ̶͇͖̊̋̔d̷͚̺̾̿͛̐ờ̶̹̜̀ ̷̨̤̺̎̽̈́n̵̮̹͙̔̀ơ̵̯͔͔̐͌t̷̢̗̟̒ͅͅ ̵̝̹̗͗͐ằ̵͙̹̰̏̃͌n̶̛̫̮̣̍̑̾̈́s̸͉̓͌̿͆w̶͎͊ͅe̷̥̳̔r̴̨̗̯̫͊́ ̶̧͖́b̷̡̹̞̭̈́̈́u̴̳͙̤͇͚̇͛͠t̴̹̹̲̽͗̅̏̈ ̵̱̟̰͚̱̂ḧ̷̪̤́͛͘͝ȩ̷͖̜̇̈́̍̕ ̶͚̱͍̞̍̉͌̅͒k̷͈̠̟͙͐̍͊͠ͅṉ̴̡̅͆͘ͅo̶̦͎̍w̸͈̲̄̽̽͂̄s̸̰̟̄̈̆̉͝ ̷̨̣͙̈̈̈͘ú̸̲̒̕͘͝n̶̠̘̱̱̳̄̓͐̎d̸͓̣̟̣̝̋̀̋̎e̶̼̬̱̊̿̀͗̽r̷̨̟͓͎͂̈́̂͛ ̶̘̯̝͗t̵̛̩̩̙̝̮̋̓͌̉ẖ̴̆e̶̝̍̀̂͘ ̶͍̈́̽̈́͘͘d̴̜̻̼͒̇̊̀͝ͅo̵̢̲̜̲̺̔͋ǒ̶̼̘̘̈́̄r̴̹̥̝̀̕ ̴̫̠̀͌͌͛͠a̶̟͇͈̬̠̎̈̈́̎ ̷͔̀͌͂̀m̸̟͊̔̑̽a̷̲̫̜̓͋ŝ̷̳̱̾͊͊s̷̹̈́̚͠ȉ̶̧̨̢͖̣͂͑v̴̢̻̹̺͂ê̸̛̬͍̳̟͈̇̄̓ ̴͓̺̙̓͐̂r̴̠̮̙̘̅͝͠a̵̢͓̙̒͌̾̚ͅt̸̪͍̀̀̕ ̵̭̔͛̓͘͜ť̶̗͔̖̯ḁ̴̻̤̰͎̓͗͂̇̕ỉ̴̫̃͝l̷̬̝̓̄ ̵̨̤̌̈͛̄͒ǐ̵͎̘̤̌̉n̴̡͎̾̂̀̄t̴̢̟͍̼͇̄͒r̶͎͎̐̏̚ͅų̵̱̲̬̦́͑̓̈̕d̵̮͖͖̏͜͝ȩ̵͑͐s̸͍͓̼̳̀ ̶̙̯̻̳̓̽̉́̕g̵̨̠̭̋̉̅͋͠r̴̡̡̦̰̉̿͒ͅa̸̙̠͎̅s̷̮̫̮̍̄́̊͘͜p̶̜͈̩̔͐̀̅͘i̵̡̛̱͌͗̾͝ͅn̴̖͛g̷̤͛͛͠ ̷͕͑̌͒̽̃t̶͈̘̲̦̃ḣ̸̘̣é̶́͝ͅ ̷̹̪̯̗̕͝a̷̖͇͛ͅḯ̴̲̭̩̊̌̽͝ŗ̴̧̫̝̏͐̔ ̶̹̥͍̗͖̉ṷ̵̡̣̃̍̀͋͜͠ͅṉ̶̀͝͝ṫ̴̻̳̝̤͗̐͗i̴̘̖̣̭̠̇́̉͠l̶͉͔̺͆̈ ̷̩͈̬̹͛͝c̷̢̮͕̻͐̿̊͛̀ĺ̶̫̖̱̯͌͊͝ȉ̷͔ͅc̶̛̞͕̬̈́͛k̴̛̯̪͋̽͠ ̸̱̅͘g̵̢̠̠̺͆̿͐ͅo̷̩̩̐̚̚͝͝ĕ̷̹͎̘ś̸̛͙̼͎̈́͑ ̵̺̔̎̇̿y̸̺̅̆̾̈́̈́o̶̳͔͂̋̈̕͠ų̷̪̟̻͙̍͒ṛ̸̱̟̥̅͜ ̴͕̹͍̪͎̀̏͝d̶̪͕̩̒̚͜ò̶̳̙̀́̚o̵̥̞͌̄̅͝r̷̦͆ ̵̧̭̝́̓͊̐͘a̷̛̟̭̝̰̓͂̚ń̴̜̼̽̈̈͘d̶̼̬̳̟̳̓̽͋̆ ̴̦̘̍̄͐̿͐į̵͔̫̬͔̉͒͌n̵̘̪̊̽ ̵̥̲͉̏͜č̷̣̳́o̸͉̪͘m̴͎͓̥̓e̸̤̠͒͛̍̕s̶͔̹͘ ̶͙͕̒̄̊t̶̡͖͙̝̺͆h̵̦͍͛͒̈́̔͘ȩ̷̭̈́ ̵̖̭͖̙̟͑̊̅̓͝ŕ̸͙͚̝͕̥̔̉a̷̻͖͌́͐t̴̡̻͙̠͇͆͗͝ ̷̦̆̾͌̀n̵̦̘͙͆́́̋̿o̷̳̔ẁ̸̩̇̔ ̴͓̝̻͛͋i̶͉̻̣̟̍̃͂̒ͅt̴̹͒̀̚͘s̸͙̣̞̍̅̊ ̸̻̎̅̃͒i̴̘̓̄s̴̺̖͔̑ ̶̳̯͔͙̫̊͊̇̀̚ö̴͙́v̵͎̘̫̱̔̏̌̈́e̶͉̝̾̿̌̾̚r̴̨̨͙̈́̎͊ ̴̥̩̫͘t̷͎̅̕h̵̠̹̫͍̉́̈́͘͠e̷͕̹͔̬̅ ̷͚̭͇̙̭͝e̸̡̲͓͔̅̈̔́ǹ̵̛̩͚̖d̴̞̟͍̩̕
Admin
When I got to the phrase "...it was a bit of a crapshoot...", my brain read that as "crashpoot". Which I have decided I really like and will have to start using in the future for WTF code that randomly flakes out.
Admin
Sometimes, when trying to make a joke, you think, "I know, I'll quote XKCD." Now everyone knows you have no sense of humor or creativity.
Admin
Sometimes, when trying to attribute a quote, you think, "I know, I'll attribute XKCD." Now everyone and Jamie Zawinski knows you're an ignoramus.
Admin
https://stevehartken.wordpress.com/
"For all we know, Steven Hawkin, British Emperialist, holds back the very essence of time itself. Don't trust anyone named Steven Hawkin. He could easily be Hitler's Genes!"
Admin
oh, you made me think of a story about Regina Vacuum cleaners: they told their engineers to ignore the fact that Plastic melts at high temperatures, thus producing plastic motors that wore out VERY quickly, THEN tried to lie about their sales figures by NOT counting the numerous returns on their quarterly reports...when they got found out, that was pretty much IT for them!
Admin
Did Steven change the name of the system as well? How many people would want to keep working on a system with that name? How about "Pearl Utility for Systematic Survey and Investigation - Enhanced Synergy"?
Addendum 2018-02-16 00:14: I meant "Perl", but my fingers settled on "Pearl".
Admin
Huh-huh. DICS. Huh-huh.
Admin
https://twitter.com/urgentProgram
Admin
Inner Platform Effect anyone?
Also, best comment thread ever.
Admin
So... if you yourself create the HTML and only Netscape 2 HTML at that, in other words, if you have no reason to parse your HTML using regexes since you already know exactly what went into it in the first place, then and only then you can parse HTML using regexes. Meanwhile, I can write haiku in Finnish, provided someone write one Finnish haiku for me and I stick to copying that haiku.
Admin
It is only do you find that life is an inner platform of continually death until you really find peace. Do no harm, listen to Jesus, and you will live forever. Just a series of doors opening.
That doesn't mean you plan your day, or write code like you are writing the Bible!
Admin
@Andrew A. Gill
You sort of can and you can't represent everything in regex. It depends on what you mean by regex. Regex as it's defined theoretically has no memory other than its current state: it looks at the current byte, then moves on to the next one. People commonly get this wrong when they use regex with advanced features, or chain multiple regexes together, and end up doing in their language of choice what regex itself can't.
You can, however, make a regex representing all permutations, which will match a lot of things you can't match with the normal thinking that comes with regex, as long as your domain isn't "infinite". A regex can sometimes represent an infinite set of sequences finitely, but not always. For finite sets such as HTML nested to a depth of 10, you can represent all possible sequences in regex as literals. If you don't want to represent them literally, you need something more powerful.
I once saw someone use regex in such a way using webscale architecture. Essentially generating nearly all valid permutations which took over a petabyte, then storing it on the cloud. You see that's how the cloud works. It doesn't matter how inefficient your solution is anymore. If you need a brazillian gigabytes the cloud's got your back. It brings a new meaning to overhead. I noticed the generator already had the logic needed to validate the text being processed. Five hours and a brand spanking new 100 lines of code later, company operating expenses dropped by several million a year. I was then fired for making things all automatic. The CTO insisted that my solution had destroyed the company because it was no longer possible to change something in only one sequence. I learnt that day just how important webscale is and how empty our lives would be without it. I moved on to be a chef.
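The bounded-depth idea can be sketched in Python without a petabyte of literals. This is only an illustration with hypothetical helper names (`balanced_upto`, `matches`), using parentheses as a stand-in for tags: it builds an ordinary regex, with no recursion or other extensions, that accepts balanced nesting up to a fixed depth and nothing deeper.

```python
import re

def balanced_upto(depth: int) -> str:
    """Return regex source matching balanced parentheses nested at most `depth` deep."""
    pattern = r"[^()]*"  # depth 0: no parentheses at all
    for _ in range(depth):
        # One more level: plain text, interleaved with groups of the previous level.
        pattern = rf"[^()]*(?:\({pattern}\)[^()]*)*"
    return pattern

def matches(depth: int, text: str) -> bool:
    return re.fullmatch(balanced_upto(depth), text) is not None

print(matches(2, "a(b(c))"))  # True: nesting depth 2 is allowed
print(matches(1, "a(b(c))"))  # False: too deep for a depth-1 pattern
print(matches(3, "(()"))      # False: unbalanced
```

The pattern grows with every extra level, and no fixed pattern built this way covers unbounded depth, which is the theoretical limit described above.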
Admin
I would so have been tempted to create a Binary Alphanumeric Getter Of Files to incorporate into DICS.
Admin
So, what you are saying is that the important thing here is to pre-select a non-infinite domain, and then choose a flavor of "regexp" that extends the traditional Finite State Automaton by incorporating a theoretically unlimited look-ahead mechanism? Yes, that would work.
However, I beg leave to doubt that you have ever worked for a company that transfers petabytes of data to and from Teh Cloudz whenever such a regexp requires it. Best wishes in your new career as a chef!
Admin
http://peace4patience.s3-website-us-east-1.amazonaws.com/
Admin
There are some cases where it comes in useful to use such a subset. For example, imagine a program that finds prime numbers and prints them, one prime to a paragraph. Each time it detects a prime, it appends the new prime with p tags before and after to the already existing file and then tacks on the closing body and html tags in a footer. You know what goes in the file, and you're using a well formed subset. Other people on the internet can download your file and easily parse it with regular expressions.
It still doesn't explain WHY you'd want that in HTML as opposed to CSV or something else more useful, but the name of the game is Technically Possible.
Now if you'll excuse me, I have some soup waiting for me in a flour strainer.
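A rough Python sketch of the prime-number page described above. The function names and the exact layout (one `<p>` per line, fixed header and footer) are assumptions made for the example; the point is only that a file you generate yourself under strict rules can be read back with one simple pattern.

```python
import re

def primes_page(limit: int) -> str:
    """Write the primes up to `limit` as HTML, one prime per paragraph, one paragraph per line."""
    primes = [n for n in range(2, limit + 1)
              if all(n % d for d in range(2, int(n ** 0.5) + 1))]
    body = "\n".join(f"<p>{p}</p>" for p in primes)
    return f"<html>\n<body>\n{body}\n</body>\n</html>\n"

def read_primes(page: str) -> list:
    """Read the primes back; only safe because the layout above is fixed and known."""
    return [int(m) for m in re.findall(r"^<p>(\d+)</p>$", page, flags=re.MULTILINE)]

print(read_primes(primes_page(20)))  # [2, 3, 5, 7, 11, 13, 17, 19]
```

The reader depends entirely on the writer never changing its layout, which is the catch with the whole approach.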
Admin
HTML 2 (and pre-standard HTML) was far, far more of an unparseable abomination than XHTML, so I dunno where you're going with that.
Admin
And the Programmer spake unto his Computer, Three times shalt thou loop, and three times correspondingly shalt thou test thine exit condition. And on the First loop, He created the Factory and the XML Object. And on the Second loop, He Tested the Return from that Factory, and saw that verily, it was very True. And on the Third loop, He Rested, and entered a Sleep, and threescore thrice times ten Milliseconds slept He.
And on the Fourth loop, exited He the Loop, and looked upon his works, and saw that indeed, it was Object Oriented.
Admin
Worked with a contractor who wrote code like this. It's SOLID, you see, as in everything becomes a separate class and everything is abstracted such that it is meaningless. He basically admitted to me that the reason for doing so was to keep himself in a job. Something that could be done in a few lines would suddenly become 20 classes and 10 design patterns of abstraction away, such that you couldn't understand what was going on. The worst part was trying to help him fix bugs, as no simple fix was ever enough: every simple fix involved creating some other class and using some other obscure design pattern that didn't really fit. The biggest problem was that he was badly mismanaged and the project's requirements were constantly in flux, meaning he had plenty of excuses to keep writing code in this manner. To quote Dijkstra:
"The purpose of abstraction is not to be vague, but to create a new semantic level in which one can be absolutely precise."
Abstraction != Generic.
Admin
Perl has recursive regexes which can e.g. ensure XML tags match, and .NET has balancing groups which can be leveraged to a similar end, and there are probably other extensions around that give you enhanced capabilities beyond the usual regex.
Admin
Obfuscation-oriented programming at its finest!
Admin
Why, a class of course!
Admin
I literally gasped when I read the punchline.