• RLB (unregistered)

    \frist!\

  • bvs23bkv33 (unregistered)

    regex with 42 lookaheads can parse answer to life, the universe and everything

  • Nope (unregistered)

    Um, (?:...) is just a non-capturing group. (?=...) is a lookahead assertion.

    Non-capturing groups are generally a good idea in regexes, since capturing groups need to allocate and copy the captured strings.
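
    For anyone who wants to see the difference rather than take it on faith, here is a minimal sketch. The patterns and the "foobar" input are made up for illustration and are not taken from the regex in the article.

```javascript
// (?:...) is a non-capturing group: it consumes text like any group,
// but it does not get a numbered slot in the match result.
const nonCapturing = /(?:foo)bar/.exec("foobar");

// (...) is a capturing group: the text it matched is stored in slot 1.
const capturing = /(foo)bar/.exec("foobar");

// (?=...) is a lookahead assertion: it checks what follows,
// but the text it looks at is not part of the match.
const lookahead = /foo(?=bar)/.exec("foobar");

// Replacing makes the difference visible: the lookahead leaves "bar" alone,
// the non-capturing group swallows it.
const replacedWithLookahead = "foobar".replace(/foo(?=bar)/, "X");
const replacedWithGroup = "foobar".replace(/foo(?:bar)/, "X");

console.log(nonCapturing, capturing, lookahead);
console.log(replacedWithLookahead, replacedWithGroup);
```

    In short: a non-capturing group changes what is remembered, while a lookahead changes what is consumed.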

  • someone (unregistered)

    Then when some company threatens to sue due to trademark infringement, P's peer will take it down, 75% of the web will automatically update their dependencies for some reason, and the npm people will have to put it back so everything doesn't break.

    TRWTF is npm and almost everything about it, especially its users. The number of dependencies that are installed (400 subfolders for one project in particular) the few times I've run npm install is astounding.

  • LCrawford (unregistered)

    This is why we can't have nice things anymore, such as URLs for FTP: ftp://mysite....

  • Error404 (unregistered)

    And now tell me how this expression makes sure that the URL is valid in the way that a fetch succeeds with 100% accuracy.

  • someone (unregistered) in reply to someone

    Just decided to have a look through those 400 folders; there's some quality stuff.

    assert-plus: This library is a super small wrapper over node's assert module; beeper: Make your terminal beep; caseless: wrap an object to set and get property with caseless semantics but also preserve caseing; decamelize: Convert a camelized string into a lowercased one with a custom separator; detect-newline: Detect the dominant newline character of a string; getpass: Get a password from the terminal; isarray: Array#isArray for older browsers; jsonify: This module provides Douglas Crockford's JSON implementation without modifying any globals; once: Only call a function once; randomatic: Generate randomized strings of a specified length, fast; slash: Convert Windows backslash paths to slash paths

    I could do this all day. And of course each of these has its own README, LICENSE, package.json and even test cases in some instances. WTF? These people almost make me ashamed to say I write JavaScript.

    (and is it just me, or do yanks have this really odd obsession with appending "-ize" to everything they can?)

  • someone (unregistered) in reply to someone

    crap, those were meant to be on separate lines. Sorry. Is TRWTF me or the lack of a preview button?

  • RLB (unregistered) in reply to someone

    No, TRWTF is Markdown.

  • (nodebb)

    I wonder how many registered TLDs (https://www.iana.org/domains/root/db) it's failing on so far?

    (And I agree: (?:...) isn't a lookahead).

  • (nodebb) in reply to someone

    "Make your terminal beep caseless" Do upper-case beeps sound different to lower-case ones?

  • (nodebb) in reply to Watson

    When Wiley drops a crate of Acme anvils on the beep-beep, it will sound flat.

  • No use for a name (unregistered)

    As Nope already said, qualifier ?: just means non-capturing group.

  • Andrew (unregistered)

    Remy is TRWTF. Nope is right. Read the docs: (?:x) is not a lookahead. (?=x) is.

    https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp

  • my name is missing (unregistered)

    And people think APL is a write-only language.

  • trainbrain27 (unregistered) in reply to Applied Mediocrity

    When he attempts to drop a crate of Acme anvils on the beep-beep, HE will sound flat.

  • that other guy (unregistered) in reply to someone

    You don't tend to get those 400 subfolders of dependencies any more, because npm uses a flat structure now. But your point still stands: far too many dependencies.

  • Rick Poleshuck (google) in reply to my name is missing

    It is possible to write readable useful regex. Mostly by not overusing it. But APL?

  • Foo AKA Fooo (unregistered) in reply to Watson

    They do in the assert module beeper.

  • isthisunique (unregistered)

    This story seems familiar. There have been many repeats of this story on here. Some time ago I needed a regex to validate an IP and optionally a port. I decided to have some fun with it and spent an extra half hour trolling in the code, essentially making a regex that would ensure the integers are in the right range. This creates some unpleasant regex. The next step was to avoid repeating it for each IP address segment. I ended up doing this with lookaheads and encountered some nasty quirks and peculiarities of the regex implementation along the way. I submitted this with the comment "This is ZALGO.", expecting to generate some amusement later from other developers, but apparently it went through. I've made another version that's ever so slightly simpler. I'm going to make one more, have the code run all of them, choose whether the input matches by sum(matches) > 1, and see if that makes a stir. I guess the problem is that no matter what I do, it works.
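
    For the curious, a range-checked IPv4-plus-optional-port regex might look roughly like the sketch below. This is not the commenter's regex (which was never posted) and it does not use the lookahead trick described above; it avoids repeating the segment by building the pattern from a string instead. The names OCTET, PORT and isIpv4WithOptionalPort are invented for this example, and rejecting leading zeros is my own assumption about what "valid" should mean.

```javascript
// One IPv4 segment, 0-255, without leading zeros (assumption: "01" is rejected).
const OCTET = "(?:25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)";

// A port number, 0-65535, without leading zeros.
const PORT =
  "(?:6553[0-5]|655[0-2]\\d|65[0-4]\\d{2}|6[0-4]\\d{3}|[1-5]\\d{4}|[1-9]\\d{0,3}|0)";

// Four segments separated by dots, then an optional ":port".
const IPV4_WITH_PORT = new RegExp(`^${OCTET}(?:\\.${OCTET}){3}(?::${PORT})?$`);

function isIpv4WithOptionalPort(text) {
  return IPV4_WITH_PORT.test(text);
}

console.log(isIpv4WithOptionalPort("192.168.1.1:8080"));
console.log(isIpv4WithOptionalPort("256.1.1.1"));
```

    Even this tame version is unpleasant to read, which rather supports the point. It also knows nothing about IPv6.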

  • Little Bobby Tables (unregistered) in reply to Watson

    Similar to the difference between a girl beep and a boy beep.

  • (nodebb) in reply to Applied Mediocrity

    But the cactus that Wiley will inevitably run into will feel sharp. Oh look, a Mach 2 bird.

  • (nodebb) in reply to isthisunique

    Do any of them cope with IPs like 2002:abef:abef:abef:abef:abef:abef:192.168.1.1 or ::FFFF:192.168.1.1 or FF02::1 ?

  • Luis (unregistered)

    I was visiting this page from my phone, and I was like: "wow, that's a huge regex!"

    And then I realized the code snippet scrolls horizontally...

  • And .pron ? (unregistered)

    What kind of world do we live in if .pron is not valid?

  • (nodebb) in reply to Applied Mediocrity

    When Wiley drops a crate of Acme anvils on the beep-beep, it will sound flat.

    This deserves a retweet!

  • Ross Presser (google)

    Here it is on Debuggex

  • fox (unregistered)

    So I was learning how to program, and I was using PHP. I made a link shortener, and I'm sure that there are many WTFs in my code, but I had one particular idea to check if a URL was valid. Make a request myself and see what gets returned. So I used cURL, and only got the headers (thus invalidating links that didn't direct to a web server).

    Then I generate a short ID with some random base 36 data (first checking to see if that ID is in the database) and a simple lookup when presented with either the short ID or the original URL.

    Surprisingly robust as it still works, despite having moved around many servers, database types (it's now redis powered instead of mysql), and even PHP versions. XD
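
    The ID scheme described above can be sketched in a few lines. The original was PHP with MySQL and later redis; this sketch uses in-memory Maps as a stand-in for the database and leaves out the cURL header check entirely. The function names and the six-character ID length are assumptions made for the example.

```javascript
// In-memory stand-ins for the database tables (assumption, not the original storage).
const urlsById = new Map();
const idsByUrl = new Map();

// Build a random ID out of base 36 digits (0-9, a-z).
function randomBase36Id(length = 6) {
  let id = "";
  while (id.length < length) {
    id += Math.floor(Math.random() * 36).toString(36);
  }
  return id;
}

// Return the existing ID for a known URL, otherwise allocate an unused one.
function shorten(url) {
  const existing = idsByUrl.get(url);
  if (existing !== undefined) {
    return existing;
  }
  let id;
  do {
    id = randomBase36Id();
  } while (urlsById.has(id)); // retry on the rare collision
  urlsById.set(id, url);
  idsByUrl.set(url, id);
  return id;
}

// Look up the original URL for a short ID, or null if it is unknown.
function resolve(id) {
  return urlsById.get(id) ?? null;
}

const shortId = shorten("https://example.com/some/long/path");
console.log(shortId, resolve(shortId));
```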

  • DCL (unregistered) in reply to Watson

    A related question: do upper case numerals compute the same as lower case numerals?

  • Name (unregistered) in reply to that other guy

    Much better to have spaghetti code and write all the code yourself than to rely on modular dependencies, am I right? I mean, just look at Linux -- a little utility for almost everything you can imagine.

  • P (unregistered)

    Original poster here. I tried to talk them out of it. They did it anyway. The codebase is also a mess with bad overall design, the practices used in various places are not consistent, and there isn't even proper error handling (error handling in JS isn't just try/catch; Promise errors are handled with .catch, and for streams, .on('error', handler)). Once an error occurs the bot just crashes, often without any logs. Their solution? Run a .bat file that runs the bot in an infinite loop.

    As for this WTF, I did some research after they showed me this monstrosity, and it's indeed the most comprehensive of all the publicly available regexes of this kind. I guess you can say they really out-performed the competition.
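
    To make the error-handling point concrete, here is a minimal Node.js sketch of the two channels that a plain synchronous try/catch does not cover. It is not code from the bot; fetchThing, runTask and watchStream are hypothetical names invented for the example.

```javascript
const { Readable } = require("stream");

// Hypothetical asynchronous task that always fails.
function fetchThing() {
  return Promise.reject(new Error("network down"));
}

// Promise rejections are not seen by a surrounding synchronous try/catch.
// They are handled with .catch (or with try/catch around an awaited call).
function runTask(log) {
  return fetchThing().catch((err) => {
    log(`task failed: ${err.message}`);
    return null; // fall back instead of crashing
  });
}

// Streams report failures through the 'error' event. An 'error' event
// with no listener attached is thrown and takes the process down.
function watchStream(stream, log) {
  stream.on("error", (err) => log(`stream failed: ${err.message}`));
  return stream;
}

const messages = [];
const demoStream = watchStream(new Readable({ read() {} }), (m) => messages.push(m));
demoStream.destroy(new Error("disk gone")); // emits 'error' asynchronously
runTask((m) => messages.push(m));
setImmediate(() => console.log(messages));
```

    With handlers like these in place, a failure gets logged and the process keeps running, which is presumably what the .bat loop was trying to paper over.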

    Unfortunately I think this is the reality we live with every day now: Reddit and Discord are filled with teens who write tiny pet projects, but have no awareness of good practices, or the discipline to look for or follow them. Their code is dubious, badly designed, and often clueless. Unlike typical CS students, they have the time and passion to go very, very far down the wrong path (like this WTF). They treat anyone who can write barely functional code as their "peer teacher". The list goes on. If you think junior devs/interns are clueless, think again.

    At least in the era of IRC bots in Perl, we didn't have a place where everyone could post all the bad code collectively for others to see. Enter GitHub: piles upon piles of such code are posted there every day, with lots of these fellows copying and following this code because, well, they don't know better either.

    (Meanwhile, at the opposite end we have Minecraft modding: the ecosystem really dislikes open sourcing your mods for some reason. However! Since Minecraft is an obfuscated Java build, every mod release only works for one version of Minecraft, and now of Forge, because nobody writes non-Forge mods anymore. Once the author abandons the mod or is gone, the mod is dead. Dead mods are everywhere. But that's for another WTF.)

  • P (unregistered) in reply to someone

    Then when some company threatens to sue due to trademark infringement, P's peer will take it down, 75% of the web will automatically update their dependencies for some reason, and the npm people will have to put it back so everything doesn't break. TRWTF is npm and almost everything about it, especially its users. The number of dependencies that are installed (400 subfolders for one project in particular) the few times I've run npm install is astounding.

    You mean the snowflake who decided to unpublish all his popular npm modules because one of them had the same name as an IM app and hence drew a claim? The WTF is hardly npm; it's authors who can't be bothered to just rename their modules or anything. Instead of trying to communicate like a normal person, they have to scream oppression and dirty capitalism with ears covered, of course. Just like how they treat Microsoft all the time.

    But to be fair, on the other side of the coin, lots of JS devs are very clueless and they contribute to the huge dependency trees of npm modules too. They can't be bothered to solve even the most trivial problems that can be handled with basic knowledge of JS features; they have to add another module that happens to be at the top of the Google search results. It's how you get the likes of isArray.

  • I Have A Regex Hammer (unregistered)

    The most WTF thing is the article itself mistaking a non-capturing group for a lookahead.

    The regex is still crappy, though. They haven't tested it against more recent TLDs and internationalized domain names. It might be understandable if the regex had been written years ago, when support for validating and parsing URLs was scant, but in a modern browser a URL can be validated with the JavaScript API https://developer.mozilla.org/en-US/docs/Web/API/URL/URL or a
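
    A minimal sketch of that approach: the URL constructor throws a TypeError on input it cannot parse, so a try/catch around it is the whole validator. The function name parseHttpUrl and the decision to accept only http and https are assumptions made for this example.

```javascript
// Returns a URL object for well-formed http(s) URLs, otherwise null.
function parseHttpUrl(text) {
  let url;
  try {
    url = new URL(text); // throws TypeError if the text is not a valid absolute URL
  } catch {
    return null;
  }
  // Assumption for this sketch: only web links are wanted.
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    return null;
  }
  return url;
}

console.log(parseHttpUrl("https://example.com/path?x=1"));
console.log(parseHttpUrl("not a url"));
```

    This checks syntax only. It says nothing about whether the TLD is registered or whether a fetch would succeed.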

  • Frenk's buddy (unregistered)

    Frenk is watching you

  • anthrpopmorph (unregistered)

    Actually, there is a whole Perl class. See the beauty: https://metacpan.org/source/ABIGAIL/Regexp-Common-2017060201/lib/Regexp/Common/URI It takes the RFCs into account. It may not be small or whatever, but it is accurate.

  • (nodebb) in reply to Steve_The_Cynic

    Or better still, my favourite IP address, 2130706433
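
    For anyone who doesn't read 32-bit integers fluently, a small sketch that unpacks the number into dotted-quad form. The function name ipv4FromInteger is made up for the example.

```javascript
// Convert an unsigned 32-bit integer into dotted-quad IPv4 notation.
function ipv4FromInteger(value) {
  if (!Number.isInteger(value) || value < 0 || value > 0xffffffff) {
    throw new RangeError("expected an integer between 0 and 4294967295");
  }
  // Take the four bytes from most significant to least significant.
  return [24, 16, 8, 0].map((shift) => (value >>> shift) & 0xff).join(".");
}

console.log(ipv4FromInteger(2130706433)); // prints "127.0.0.1"
```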

  • aaaaaa123456789 (github) in reply to Watson

    Yeah, one goes beep and the other one goes BEEP.

  • Chris (unregistered) in reply to Watson

    I also agree with the two commenters that "(?:" is just a non-capturing group. It lets you specify exactly which data you want extracted. So IMHO this posted WTF is a WTF itself.

Leave a comment on “Look Ahead. Look Out!”