Admin
\frist!\
Admin
regex with 42 lookaheads can parse answer to life, the universe and everything
Admin
Um, (?:...) is just a non-capturing group. (?=...) is a lookahead assertion.
Non-capturing groups are generally a good idea in regexes, since capturing groups need to allocate and copy the captured strings.
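For anyone who wants to see the difference rather than take it on faith, here is a minimal sketch in JavaScript. The patterns and the sample string 'foobar' are made up for illustration and are not taken from the regex in the article.

```javascript
// Capturing group: the text matched by (foo) is stored as group 1.
const capturing = /(foo)bar/.exec('foobar');

// Non-capturing group: (?:foo) groups the pattern but stores no extra group.
const nonCapturing = /(?:foo)bar/.exec('foobar');

// Lookahead: (?=bar) only asserts that 'bar' follows; it consumes nothing,
// so the overall match stops after 'foo'.
const lookahead = /foo(?=bar)/.exec('foobar');

console.log(capturing.length, capturing[1]);        // full match plus one group
console.log(nonCapturing.length, nonCapturing[0]);  // full match only
console.log(lookahead[0]);                          // 'bar' is not part of the match
```

The non-capturing version matches the same text as the capturing one; only the lookahead changes what ends up in the match.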
Admin
Then when some company threatens to sue due to trademark infringement, P's peer will take it down, 75% of the web will automatically update their dependencies for some reason, and the npm people will have to put it back so everything doesn't break.
TRWTF is npm and almost everything about it, especially its users. The number of dependencies that are installed (400 subfolders for one project in particular) the few times I've run npm install is astounding.
Admin
This is why we can't have nice things anymore, such as URLs for FTP: ftp://mysite....
Admin
And now tell me how this expression makes sure that the URL is valid in the way that a fetch succeeds with 100% accuracy.
Admin
Just decided to have a look through those 400 folders; there's some quality stuff.
I could do this all day. And of course each of these has its own README, LICENSE, package.json and even test cases in some instances. WTF? These people almost make me ashamed to say I write JavaScript.
(and is it just me, or do yanks have this really odd obsession with appending "-ize" to everything they can?)
Admin
crap, those were meant to be on separate lines. Sorry. Is TRWTF me or the lack of a preview button?
Admin
No, TRWTF is Markdown.
Admin
I wonder how many registered TLDs (https://www.iana.org/domains/root/db) it's failing on so far?
(And I agree: (?:...) isn't a lookahead).
Admin
"Make your terminal beep caseless" Do upper-case beeps sound different to lower-case ones?
Admin
When Wiley drops a crate of Acme anvils on the beep-beep, it will sound flat.
Admin
As Nope already said, qualifier ?: just means non-capturing group.
Admin
Remy is TRWTF. Nope is right. Read the docs: (?:x) is not a lookahead. (?=x) is.
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp
Admin
And people think APL is a write-only language.
Admin
When he attempts to drop a crate of Acme anvils on the beep-beep, HE will sound flat.
Admin
You don't tend to get those 400 subfolder deep dependencies any more because it uses a flat structure now. But your point still stands, far too many dependencies.
Admin
It is possible to write readable useful regex. Mostly by not overusing it. But APL?
Admin
They do in the assert module beeper.
Admin
This story seems familiar. There have been many repeats of this story on here. Some time ago I needed a regex to validate an IP and optionally a port. I decided to have some fun with it and spend an extra half hour trolling in the code to essentially make a regex that would ensure the integers are in the right range. This creates some unpleasant regex. Next step was to not have to repeat it for each IP address segment. I ended up doing this with lookaheads and encountered some nasty quirks and peculiarities of the regex implementation along the way. I submitted this with the comment "This is ZALGO." expecting to generate some amusement later from other developers but apparently it went through. I've made another version that's ever so slightly simpler. I'm going to make one more and then have the code run all of them and then choose whether it matches or not by sum(matches) > 1 and see if that makes a stir. I guess the problem is that no matter what I do, it works.
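For contrast, the boring way to validate an IPv4 address with an optional port is to let a small regex handle the shape and leave the range checks to ordinary code. This is only a sketch, not the regex described above: the function name parseIpv4WithPort is invented here, and it deliberately accepts leading zeros and ignores IPv6.

```javascript
// Hypothetical helper: returns { octets, port } or null if the text is invalid.
function parseIpv4WithPort(text) {
  // Four groups of 1-3 digits, then an optional ":port" in a non-capturing group.
  const shape = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})(?::(\d{1,5}))?$/;
  const match = shape.exec(text);
  if (match === null) return null;

  // Range checks are done with numbers, not with regex alternations.
  const octets = match.slice(1, 5).map(Number);
  if (octets.some(octet => octet > 255)) return null;

  const port = match[5] === undefined ? null : Number(match[5]);
  if (port !== null && port > 65535) return null;

  return { octets, port };
}

console.log(parseIpv4WithPort('192.168.1.1:8080'));
console.log(parseIpv4WithPort('256.1.1.1'));
```

It is less fun to submit for code review, but each rule can be read and changed on its own.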
Admin
Similar to the difference between a girl beep and a boy beep.
Admin
But the cactus that Wiley will inevitably run into will feel sharp. Oh look, a Mach 2 bird.
Admin
Do any of them cope with IPs like 2002:abef:abef:abef:abef:abef:abef:192.168.1.1 or ::FFFF:192.168.1.1 or FF02::1 ?
Admin
I was visiting this page from my phone, and I was like: "wow, that's a huge regex!"
And then I realized the code snippet scrolls horizontally...
Admin
What kind of world do we live in if .pron is not valid?
Admin
This deserves a retweet!
Admin
Here it is on Debuggex
Admin
So I was learning how to program, and I was using PHP. I made a link shortener, and I'm sure that there are many WTFs in my code, but I had one particular idea to check if a URL was valid. Make a request myself and see what gets returned. So I used cURL, and only got the headers (thus invalidating links that didn't direct to a web server).
Then I generate a short ID with some random base 36 data (first checking to see if that ID is in the database) and a simple lookup when presented with either the short ID or the original URL.
Surprisingly robust as it still works, despite having moved around many servers, database types (it's now redis powered instead of mysql), and even PHP versions. XD
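The ID generation and two-way lookup described above could look roughly like this. It is a sketch in JavaScript rather than the original PHP, with two in-memory Maps standing in for the database; the names randomBase36Id and shorten, and the ID length of 6, are assumptions made for illustration.

```javascript
// Build a random ID from base 36 digits (0-9, a-z).
// Math.random() is fine for a sketch but is not cryptographically secure.
function randomBase36Id(length = 6) {
  let id = '';
  while (id.length < length) {
    id += Math.floor(Math.random() * 36).toString(36);
  }
  return id;
}

// store is a hypothetical stand-in for the database: { byId: Map, byUrl: Map }.
function shorten(url, store) {
  // Reuse the existing ID if this URL was shortened before.
  if (store.byUrl.has(url)) return store.byUrl.get(url);

  // Regenerate until the ID is not already taken.
  let id;
  do {
    id = randomBase36Id();
  } while (store.byId.has(id));

  store.byId.set(id, url);
  store.byUrl.set(url, id);
  return id;
}

const store = { byId: new Map(), byUrl: new Map() };
const id = shorten('https://example.com/some/long/path', store);
console.log(id, store.byId.get(id));
```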
Admin
A related question: do upper case numerals compute the same as lower case numerals?
Admin
Much better to have spaghetti code and write all the code yourself than to rely on modular dependencies, am I right? I mean, just look at Linux -- a little utility for almost everything you can imagine.
Admin
Original poster here. I tried to talk sense into them, but they do it anyway. The codebase is also a mess with bad overall design, the practices used in various places are not consistent, and there isn't even proper error handling (error handling in JS isn't just try/catch; Promise rejections need .catch, and streams need .on('error', handler)). Once an error occurs the bot just crashes, often without any logs. Their solution? Run a .bat file that runs the bot in an infinite loop. As for this WTF, I did some research after they showed me this monstrosity, and it's indeed the most comprehensive of any such regex in public. I guess you can say they really out-performed the competition.
Unfortunately I think this is the reality we have today: Reddit and Discord are filled with teens who write tiny pet projects, but have no awareness of good practices, or the discipline to look for or follow them. Their code is dubious, badly designed, and often clueless. Unlike typical CS students, they have the time and passion to go very, very far down the wrong path (like this WTF). They treat anyone who can write barely functional code as their "peer teacher". The list goes on. If you think junior devs/interns are clueless, think again.
At least in the era of IRC bots in Perl, we didn't have a place where everyone could post all the bad code collectively for others to see. Enter GitHub: piles upon piles of such code are posted there every day, with lots of the fellows copying or following this code because, well, they don't know better either.
(Meanwhile, at the opposite end we have Minecraft modding: the ecosystem really dislikes open sourcing your mods for some reason. However! Since Minecraft is an obfuscated Java build, every mod release only works for one version of Minecraft, and now Forge because nobody writes non-Forge mods anymore. Once the author abandons the mod or is gone, the mod is dead. Dead mods are everywhere. But that's for another WTF.)
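To illustrate the error-handling point from the start of this comment, here is a minimal Node.js sketch of the three styles: try/catch for synchronous code, .catch() for promise rejections (the standard method on native promises), and an 'error' listener for emitters such as streams. The function names are invented for the example and have nothing to do with the bot's actual code.

```javascript
const { EventEmitter } = require('events');

// 1. Synchronous errors: plain try/catch.
function parseConfig(text) {
  try {
    return JSON.parse(text);
  } catch (err) {
    return null; // fall back instead of crashing
  }
}

// 2. Promise rejections: try/catch around the call site does not see them
//    (unless you await), so attach .catch() to the promise.
function withFallback(promise) {
  return promise.catch(err => `recovered: ${err.message}`);
}

// 3. Streams and other emitters: an 'error' event with no listener throws,
//    so register a handler.
function logErrors(emitter, log) {
  emitter.on('error', err => log.push(err.message));
}

const log = [];
const emitter = new EventEmitter();
logErrors(emitter, log);
emitter.emit('error', new Error('boom'));

console.log(parseConfig('{not json'), log);
withFallback(Promise.reject(new Error('offline'))).then(value => console.log(value));
```

None of this replaces logging, but it is the difference between a bot that records a failure and one that needs a .bat file to restart it.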
Admin
You mean the snowflake that decides to unpublish all his popular npm modules because one of his modules has the same name as an IM software, and hence got claims?? The WTF is hardly npm, it's authors who can't bother just renaming their modules or anything. Instead of trying to communicate like a normal person, they have to scream oppression and dirty capitalism with ears covered, of course. Just like how they treat Microsoft all the time.
But to be fair, on the other side of the coin, lots of JS devs are very clueless and they contribute to the huge dependencies of npm modules too. They can't be bothered to solve even the most trivial problems that can be done with basic knowledge of JS features, they have to add another module that happens to be on the top of google search results. It's how you get the likes of isArray.
Admin
The most WTF thing is the article itself mistaking a non-capturing group for a lookahead.
The regex is still crappy, though. They haven't tested it against more recent TLDs and internationalized domain names. It might be understandable if that regex was used years ago, when support for validating and parsing URLs was scant, but on modern browsers a URL can be validated with the JavaScript API https://developer.mozilla.org/en-US/docs/Web/API/URL/URL or a
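A minimal sketch of the URL constructor approach mentioned above: the constructor throws a TypeError on input it cannot parse, so validation is a try/catch. The helper name isParsableHttpUrl and the restriction to http and https are assumptions for this example. Note that this only checks that the text parses as a URL; it says nothing about whether a fetch would succeed.

```javascript
// Hypothetical helper: true if the text parses as an http(s) URL.
function isParsableHttpUrl(text) {
  try {
    const url = new URL(text); // throws TypeError if the text is not a valid URL
    return url.protocol === 'http:' || url.protocol === 'https:';
  } catch {
    return false;
  }
}

console.log(isParsableHttpUrl('https://example.com/path?x=1'));
console.log(isParsableHttpUrl('not a url'));
console.log(isParsableHttpUrl('ftp://example.com/file.txt'));
```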
Admin
Frenk is watching you
Admin
actually, there is a whole Perl class. See the beauty: https://metacpan.org/source/ABIGAIL/Regexp-Common-2017060201/lib/Regexp/Common/URI Taking the RFCs into account.. It may not be small or whatever, but it is accurate.
Admin
Or better still, my favourite IP address, 2130706433
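For anyone wondering why that number is an IP address: an IPv4 address is just a 32-bit integer, and dotted-quad notation prints its four bytes separately. A quick sketch of the conversion, with an invented function name:

```javascript
// Split a 32-bit unsigned integer into its four bytes, most significant first.
function intToDottedQuad(n) {
  return [24, 16, 8, 0].map(shift => (n >>> shift) & 0xff).join('.');
}

console.log(intToDottedQuad(2130706433)); // 127.0.0.1
```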
Admin
Yeah, one goes beep and the other one goes BEEP.
Admin
I also agree with the two commenters that "(?:" is just a non-capturing group. It lets you specify exactly which data you want extracted. So IMHO I would say this posted WTF is a WTF itself.