• (disco)

    At least they didn't use a hash symbol to separate all those branches...

  • (disco)

    This hurts my soul.

  • (disco)

    [:#.$",'#-/|]? [B|b|C|c|G|g|H|h|J|j|K|k|L|l|M|m|S|s|U|u|Y|y]? How does this syntax work, exactly? The regex parser implementation must be pretty interesting...

  • (disco)

    [B|b|C|c|G|g|H|h|J|j|K|k|L|l|M|m|S|s|U|u|Y|y]

    I can see case-insensitivity doesn't exist here.
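
    For comparison, here is a sketch of how the quoted class could shrink in a dialect that does have a case-insensitive flag. It uses Python's `re` module rather than the system's own tweaked syntax, and it assumes the `|` characters were meant as separators, not as literal pipes to match.

```python
import re

# The class as quoted in the article: both cases spelled out, '|' between them.
ORIGINAL = re.compile(r"[B|b|C|c|G|g|H|h|J|j|K|k|L|l|M|m|S|s|U|u|Y|y]")

# Assumed intent: one letter from the set, in either case.
COMPACT = re.compile(r"[BCGHJKLMSUY]", re.IGNORECASE)


def same_on_letters() -> bool:
    """Check that both classes agree on every ASCII letter."""
    letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    letters += letters.lower()
    return all(
        bool(ORIGINAL.fullmatch(ch)) == bool(COMPACT.fullmatch(ch))
        for ch in letters
    )


print(same_on_letters())               # True: identical behaviour on letters
print(bool(COMPACT.fullmatch("g")))    # True: lower case accepted via the flag
```

    The two only differ on the pipe itself, which the original class accepts and the compact one does not.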

  • (disco)

    My reaction after seeing that regex: [image]

  • (disco)

    When it comes to "write only languages", APL (with the original Greek character set) is still king!!!!

  • (disco)

    Tried putting it into Debuggex

    Debuggex

    Did not work. Did I do something wrong?

    Filed Under: I need a graphical representation of that regex, please

  • (disco) in reply to Kuro
    Kuro:
    Tried putting it into Debuggex

    Debuggex

    Did not work. Did I do something wrong?

    Filed Under: I need a graphical representation of that regex, please

    the system doesn’t use widely implemented “Perl-compatible regular expressions” syntax, but instead, uses its own, slightly tweaked version.

    Online tools ain't gonna work.

  • (disco) in reply to Zacrath

    the system doesn’t use widely implemented “Perl-compatible regular expressions” syntax, but instead, uses its own, slightly tweaked version.

    :wtf::question: Why are they using their own syntax?

  • (disco) in reply to RaceProUK

    I suppose it's to anonymize the WTF...

    Filed Under: Or because this tweaked regex is so beautiful

  • (disco) in reply to RaceProUK
    RaceProUK:
    >the system doesn’t use widely implemented “Perl-compatible regular expressions” syntax, but instead, **uses its own, slightly tweaked version.**

    :wtf::question: Why are they using their own syntax?

    Because they're professionals. They're too good for regular regex. They need… Enterprise regex.


    Filed under: Regular regular expressions

  • (disco) in reply to RaceProUK

    At least it's not Asterisk's "Kinda-regex thing that's grossly underpowered. PCRE? Oh, we have that as well! No, you can't use it here where you really need it, what are you, crazy?"


    Filed under: Bonus: every phone manufacturer has their own incompatible implementation. Documented in Engrish with half of an example provided

  • (disco)

    'Perl is jokingly referred to as a “write-only language”'

    It's no joke.

  • (disco) in reply to RFoxmich
    RFoxmich:
    'Perl is jokingly referred to as a “write-only language”'

    It's no joke.

    Actually, there was one time when I successfully read Perl code. True story.


    Filed under: Myth Busted

  • (disco) in reply to Zacrath
    Zacrath:
    Actually, there was one time when I successfully read Perl code
    The 'Hello World' example doesn't count :stuck_out_tongue:
  • (disco) in reply to RFoxmich

    OK, I have to confess that a lot of our code is written in Perl, and we do use REs (mainly for sanitising user input in conjunction with taint mode). And yes, Perl code can be written so that it is maintainable by people other than the original coder, but where is the fun in that... :-)

    But even I would say that that piece of RE is an abomination in the extreme. It is almost as bad as the email address validation one (http://www.ex-parrot.com/pdw/Mail-RFC822-Address.html)

  • (disco)

    I like regular expressions. They're handy and useful, really not that hard to understand, and the hate against them is poorly-misdirected cargo cultism. But after seeing today's article, last night's partially-digested dinner is soaking into my keyboard and the submitter owes me a new one.

  • (disco)

    So, was the keyboard still able to be used after it got sat on or did they need to get a new one?

  • (disco) in reply to Kuro
    Kuro:
    Filed Under: I need a graphical representation of that regex, please

    https://www.debuggex.com/r/lZb4WgK4HfkOHoWQ ?

  • (disco)

    Does the regex take more or less than 2 minutes to compile?

  • (disco) in reply to boomzilla

    I dunno but it is definitely bigger than a bread box.

  • (disco)

    There were a lot of repetitions, in a way that makes me think it may have been automatically generated. In any case, I've factored out the common parts into pseudocode. All constants are encased in tildes, e.g. ~CONST~

    Preserving newlines in the source and the order of operations with ||, we get http://pastebin.com/iMd3xUVA

    Ignoring newlines and reordering the || operations, we get http://pastebin.com/iMd3xUVA

    PAT_1 =24[A|a|B|b|C|c|F|f|K|k|M|m|T|t][0-9]
    PAT_2 =35[A|a|B|R|r|S|s|T|t|U|u][0-9]
    PAT_3 =04[C|c|D|d|F|f|V|v][0-9]
    PAT_4 =02[A|a|B|b|C|c|D|d|E|e|F|f][0-9]
    PAT_5 =21[A|a|C|c|D|d][0-9]
    PAT_6 =32[A|a|F|f|H|h|X|x|Y|y|Z|z][0-9]
    PAT_7 =40[A|a|C|c|D|d|S|s][0-9]
    PAT_8 =23[A|a|B|b|C|c|D|d|L|l|M|m][0-9]
    PAT_9 =15[D|d|E|e|R|r|T|t][0-9]
    PAT_10=[:#.$",'#-/|]
    PAT_11=[:#.$",'#-/|l\\]
    PAT_12=06[B|b|C|c|G|g|H|h|J|j|K|k|L|l|M|m|S|s|U|u|Y|y][0-9]
    PAT_13=01[A|a|C|c|D|d|E|e|R|r][0-9]
    PAT_14=[C|c][P|p][K,<|k,<][0-9]
    PAT_15=05[M|m|A|a][0-9]
    PAT_16=17[A|a|E|e|L|l|M|m|P|p|S|s|U|u|W|w][0-9]
    PAT_17=07[U|u][0-9]
    PAT_18=08[A|a][0-9]
    PAT_19=35[A|a|B|b|R|r|S|s|T|t|U|u][0-9]
    PAT_20=09[A|a|B|b|C|c|D|d|F|f][0-9]
    PAT_21=34[A|a][0-9]
    PAT_22=10[M|m|F|f][0-9]
    PAT_23=13[A|a][0-9]
    PAT_24=14[A|a][0-9]
    PAT_25=39[C|c|P|p][0-9]
    PAT_26=25[A|a][0-9]
    PAT_27=18[A|a][0-9]
    PAT_28=46[A|a|B|b][0-9]

    FUNC_1(ARG_1)= ~( ~ARG_1~{7} ) ~||(~PAT_10~~ARG_1~{7} ) ~||( ~ARG_1~{7}~PAT_11~) ~||([:#-/|]~ARG_1~{7}~PAT_11~)

    FUNC_2(ARG_1)= ~( ~ARG_1~{7} ) ~||(~PAT_10~~ARG_1~{7} ) ~||( ~ARG_1~{7}~PAT_11~) ~||(~PAT_10~~ARG_1~{7}~PAT_11~)

    FUNC_3(ARG_1)= ~( ~ARG_1~{9} ) ~||(~PAT_10~~ARG_1~{9} ) ~||(~ARG_1~{9}~PAT_11~) ~||(~PAT_10~~ARG_1~{9}~PAT_11~)

    FUNC_4(ARG_1)= ~( ~ARG_1~{9} ) ~|| (~PAT_10~~ARG_1~{9} ) ~||( ~ARG_1~{9}~PAT_11~) ~||(~PAT_10~~ARG_1~{9}~PAT_11~)

    FUNC_7(ARG_1)= ~( ~ARG_1~{11} ) ~||(~PAT_10~~ARG_1~{11} ) ~||( ~ARG_1~{11}[:.$",'#-/|l\]) ~||([:.$",'#-/|]~ARG_1~{11}[:.$",'#-/|l\])

    FUNC_8(ARG_1)= ~FUNC_2(~ARG_1~) ~||FUNC_3(~ARG_1~)


    RESULTING "REGEXP"

    ([:-.,;/\(]{0,2}(
        FUNC_8(~PAT_1~)
      ||FUNC_2(~PAT_2~)
      ||FUNC_8(~PAT_3~)
      ||FUNC_3(~PAT_4~)
      ||FUNC_1(~PAT_4~)
      ||FUNC_8(~PAT_5~)
      ||FUNC_8(~PAT_6~)
      ||FUNC_8(~PAT_7~)
      ||FUNC_8(~PAT_8~)
      ||FUNC_8(~PAT_9~)
      ||FUNC_8(~PAT_12~)
      ||FUNC_1(~PAT_13~)
      ||FUNC_3(~PAT_13~)
      ||FUNC_7(~PAT_14~)
      ||FUNC_8(~PAT_15~)
      ||FUNC_8(~PAT_16~)
      ||FUNC_8(~PAT_17~)
      ||FUNC_8(~PAT_18~)
      ||FUNC_3(~PAT_19~)
      ||FUNC_4(~PAT_20~)
      ||FUNC_2(~PAT_20~)
      ||FUNC_8(~PAT_21~)
      ||FUNC_8(~PAT_22~)
      ||FUNC_8(~PAT_23~)
      ||FUNC_8(~PAT_24~)
      ||FUNC_8(~PAT_25~)
      ||FUNC_8(~PAT_26~)
      ||FUNC_8(~PAT_27~)
      ||FUNC_8(~PAT_28~)
    )[-.,;:Il|/\]{0,2} )

    ... Nope, Still makes no sense.
    Filed under: Using quotes because the code block's preview is threatening to mangle the output with all kinds of discoshit
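
    The same factoring can be done in real code, building the pattern from a table instead of pasting it. A rough sketch in Python follows; it is not the original system's syntax. The names `PREFIXES`, `PUNCT`, `code_pattern`, `wrapped` and `build` are made up for illustration, and only three prefixes are carried over. The sketch assumes that `[A|a|B|b]` was meant as "one of these letters, either case", that `{7}` applies to the trailing digit the way a standard engine would read it, that `||` is plain alternation, and it simplifies PAT_10 and PAT_11 into a single punctuation class.

```python
import re

# Hypothetical subset of the prefix table: two-digit prefix -> allowed letters.
# Copied from PAT_1, PAT_5 and PAT_18; the remaining prefixes are omitted.
PREFIXES = {"24": "ABCFKMT", "21": "ACD", "08": "A"}

# Assumed reading of PAT_10: a single stray punctuation mark around the code.
PUNCT = "[" + re.escape(":#.$\",'/-") + "]"


def code_pattern(prefix: str, letters: str, digits: int) -> str:
    """Prefix, one letter from `letters`, then exactly `digits` digits."""
    return f"{re.escape(prefix)}[{letters}][0-9]{{{digits}}}"


def wrapped(body: str) -> str:
    """The FUNC_n shape: optional punctuation on either side of the body."""
    return f"{PUNCT}?(?:{body}){PUNCT}?"


def build(digit_counts=(7, 9)) -> re.Pattern:
    """FUNC_8 shape for every prefix: 7-digit or 9-digit variants."""
    alternatives = [
        code_pattern(prefix, letters, n)
        for prefix, letters in PREFIXES.items()
        for n in digit_counts
    ]
    return re.compile(wrapped("|".join(alternatives)), re.IGNORECASE)


PATTERN = build()
print(bool(PATTERN.fullmatch("24a1234567")))     # True: prefix 24, letter a, 7 digits
print(bool(PATTERN.fullmatch("#08A123456789")))  # True: leading '#', 9 digits
print(bool(PATTERN.fullmatch("99Z1234567")))     # False: unknown prefix
```

    Adding a prefix then means adding one table entry, not four more hand-copied branches.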
  • (disco) in reply to mott555

    You might want to invest in a keyboard like this. It can be rinsed under water after such incidents:

    [image]
  • (disco)

    If that's not Codethulu, I don't know what is.

  • (disco) in reply to chreng
    chreng:
    It can be rinsed under water

    Most keyboards survive a trip through the dishwasher quite well, as long as you don't use detergent and give them a week to dry out again before plugging them back in.

  • (disco) in reply to RaceProUK

    Doctor, I feel sharp pain every morning when I put on my slippers. Any idea what it might be?

  • (disco) in reply to VinDuv
    VinDuv:
    [:#.$",'#-/|]? [B|b|C|c|G|g|H|h|J|j|K|k|L|l|M|m|S|s|U|u|Y|y]? How does this syntax work, exactly?

    Well, either they're using a nonstandard regex parser that overloads the [ ] operator to mean both "character class match" and "non-capturing group to limit the scope of the | operator", or whoever wrote that regex doesn't really understand how character classes work.

    It would be interesting to run the big ugly regex against input containing a lot of | characters to see if it matches a whole bunch of things it shouldn't. And by "interesting" I mean "something it would be amusing to hear about over morning coffee" as opposed to "something I am even vaguely motivated to do myself".
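
    The experiment is cheap to run in a standard engine. A minimal sketch in Python's `re` follows; the choice of engine is an assumption, since the system's own dialect may treat `[ ]` differently. In standard syntax `|` inside a character class is just another literal, so the class happily matches a pipe.

```python
import re

# A shortened class in the article's style, as a standard engine reads it.
char_class = re.compile(r"[B|b|C|c]")

# What the author presumably meant: a group with real alternation.
group = re.compile(r"(?:B|b|C|c)")

for text in ["B", "c", "|", "x"]:
    print(repr(text), bool(char_class.fullmatch(text)), bool(group.fullmatch(text)))

# Seven pipes satisfy seven repetitions of the class, but not of the group.
print(bool(re.fullmatch(r"[B|b|C|c]{7}", "|||||||")))      # True
print(bool(re.fullmatch(r"(?:B|b|C|c){7}", "|||||||")))    # False
```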

  • (disco) in reply to flabdablet
    flabdablet:
    whoever wrote that regex doesn't really understand how character classes work.

    My bet is on this one. :ive_seen_some_things.dll:

  • (disco)

    Yes, regular expressions are nice, but if they take up more than a line (for various definitions of line) they become VERY write only. If you want the next guy (or yourself in 6 months) to understand it, keep it SHORT!

    Obviously this wasn't a consideration for the expression in the article. The author should be kicked to the curb!
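
    When a pattern really cannot stay short, some engines offer a free-spacing mode where it can span lines and carry comments. A small sketch using Python's `re.VERBOSE` follows; the code shape (prefix 24, one letter, seven digits) is a made-up example loosely based on the article's regex, not a translation of it.

```python
import re

CODE = re.compile(
    r"""
    24              # two-digit prefix
    [ABCFKMT]       # one letter from the allowed set
    [0-9]{7}        # exactly seven digits
    """,
    re.VERBOSE | re.IGNORECASE,
)

print(bool(CODE.fullmatch("24k7654321")))  # True
print(bool(CODE.fullmatch("24k765")))      # False: too few digits
```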

  • (disco) in reply to chreng

    That doesn't look clicky enough.

  • (disco) in reply to Yazeran

    There's nothing really wrong with that one, past being complex. It's pretty legible:

    sub make_rfc822re {
    #   Basic lexical tokens are specials, domain_literal, quoted_string, atom, and
    #   comment.  We must allow for lwsp (or comments) after each of these.
    #   This regexp will only work on addresses which have had comments stripped 
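    #   and replaced with lwsp.
    
    #   $lwsp is not defined in this excerpt; the definition below is an
    #   assumption (optional CRLF followed by a space or tab), added so the
    #   sub compiles on its own.
        my $lwsp = "(?:(?:\\r\\n)?[ \\t])";
    
        my $specials = '()<>@,;:\\\\".\\[\\]';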
        my $controls = '\\000-\\031';
    
        my $dtext = "[^\\[\\]\\r\\\\]";
        my $domain_literal = "\\[(?:$dtext|\\\\.)*\\]$lwsp*";
    
        my $quoted_string = "\"(?:[^\\\"\\r\\\\]|\\\\.|$lwsp)*\"$lwsp*";
    
    #   Use zero-width assertion to spot the limit of an atom.  A simple 
    #   $lwsp* causes the regexp engine to hang occasionally.
        my $atom = "[^$specials $controls]+(?:$lwsp+|\\Z|(?=[\\[\"$specials]))";
        my $word = "(?:$atom|$quoted_string)";
        my $localpart = "$word(?:\\.$lwsp*$word)*";
    
        my $sub_domain = "(?:$atom|$domain_literal)";
        my $domain = "$sub_domain(?:\\.$lwsp*$sub_domain)*";
    
        my $addr_spec = "$localpart\@$lwsp*$domain";
    
        my $phrase = "$word*";
        my $route = "(?:\@$domain(?:,\@$lwsp*$domain)*:$lwsp*)";
        my $route_addr = "\\<$lwsp*$route?$addr_spec\\>$lwsp*";
        my $mailbox = "(?:$addr_spec|$phrase$route_addr)";
    
        my $group = "$phrase:$lwsp*(?:$mailbox(?:,\\s*$mailbox)*)?;\\s*";
        my $address = "(?:$mailbox|$group)";
    
        return "$lwsp*$address";
    }
    
  • (disco) in reply to flabdablet
    flabdablet:
    Most keyboards survive a trip through the dishwasher quite well, as long as you don't use detergent and give them a week to dry out again before plugging them back in.
    [I would prefer to take them apart though](http://www.howtogeek.com/65915/how-to-clean-your-filthy-keyboard-in-the-dishwasher-without-ruining-it/)...

    Even the trusty IBM Model M can have water seeping into the conductive layers, and I don't think those will dry unless you take them apart.

  • (disco) in reply to JBert
    JBert:
    Even the trusty IBM Model M can have water seeping into the conductive layers, and I don't think those will dry unless you take them apart.
    Bake in an autoclave for a while.
  • (disco) in reply to Zacrath
    Zacrath:
    Because they're professionals. They're too good for regular regex. They need… ~~Enterprise~~ advanced regex.
    [image]
  • (disco) in reply to mott555

    I think people are often too afraid to actually parse text into tokens, you know, take advantage of actual loops and conditionals that computers are good at, rather than limiting themselves to a mathematical model of computers (nondeterministic finite automata) that's often too limited and hard to think about. Maybe programming languages don't make it as easy as they should.
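
    As a rough illustration of that approach, here is a hand-rolled check in Python for one hypothetical code shape (two-digit prefix, one letter, then 7 or 9 digits), loosely modelled on the patterns discussed in this thread. The table, the function name and the messages are all made up; the point is only that each rule becomes an explicit step that can carry a comment.

```python
# Hypothetical rule table: two-digit prefix -> letters allowed after it.
ALLOWED_LETTERS = {"24": "ABCFKMT", "21": "ACD", "08": "A"}
STRAY_PUNCTUATION = ":#.$\",'/-"


def check_code(text: str) -> str:
    """Return 'ok' or a human-readable reason why `text` is rejected."""
    # Step 1: drop at most one stray punctuation mark from each end.
    if text and text[0] in STRAY_PUNCTUATION:
        text = text[1:]
    if text and text[-1] in STRAY_PUNCTUATION:
        text = text[:-1]

    # Step 2: split into the three tokens.
    prefix, letter, digits = text[:2], text[2:3], text[3:]

    # Step 3: validate each token with an ordinary conditional.
    if prefix not in ALLOWED_LETTERS:
        return f"unknown prefix {prefix!r}"
    if len(letter) != 1 or letter.upper() not in ALLOWED_LETTERS[prefix]:
        return f"letter {letter!r} not allowed after prefix {prefix}"
    if len(digits) not in (7, 9):
        return f"expected 7 or 9 digits, got {len(digits)}"
    if not (digits.isascii() and digits.isdigit()):
        return "non-digit characters after the letter"
    return "ok"


print(check_code("24a1234567"))      # ok
print(check_code("#21D123456789."))  # ok
print(check_code("24Z1234567"))      # letter 'Z' not allowed after prefix 24
print(check_code("24A12345"))        # expected 7 or 9 digits, got 5
```

    Unlike a failed regex match, a rejection here says which rule was broken.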

  • (disco)
    TDWTF:
    "Perl is jokingly referred to as a “write-only language”. This is because Perl’s primary solution to any problem is to throw a regular expression at it."
    False. Perl has this reputation because of its dense, cryptic, shortcut-loving syntax. Insanely complicated regexes can be written in any language.
  • (disco) in reply to Kuro
    Kuro:
    I need a graphical representation of that regex, please
    operagost:
    If that's not Codethulu, I don't know what is.

    Right.

    http://tvtropes.org/pmwiki/pmwiki.php/Main/EldritchAbomination

  • (disco) in reply to oesor

    I think you can sanitise that even more by using qr() instead of ""

  • (disco) in reply to oesor
    oesor:
    It's pretty legible

    @Maciejasjmj's law #1: every discussion about regexes, no matter how insane, will always have at least one person stating that it's easy, makes perfect sense, and that they have no idea what everyone else's complaining about.

  • (disco) in reply to herby
    herby:
    Yes, regular expressions are nice, but if they take up more than a line (for various definitions of line) they become VERY write only.
    This makes them fun to use in InDesign, where the text box for the regex for a GREP style (character formatting that’ll be applied to what the regex matches) is maybe 30 characters wide, so if you’re trying to do anything more complex than matching some characters — like doing lookbehinds and lookaheads — you’ll often find yourself not being able to even see the whole regex.

    Add some poor UI design (or a bug, not sure what it is) so that if you press left arrow key when the insertion point is at the end of the text box (not the end of the regex), the box contents scroll right as well as moving the insertion point — keeping the insertion point at the end, so you can’t see what’s to the right of it. Oh, and a minor bug which makes the arrow keys not move the insertion point at all if you’ve changed focus from InDesign and back while the style edit window is open.

  • (disco)

    Trying to parse the regex, but I just can't understand it at all.

    https://www.youtube.com/watch?v=2aegP8j5al0

    EDIT: This regex... it truly is the code of noodles.

  • (disco)

    Okay, so take a closer look at the start of this regex ...

    ([:-.,;/\

    That's not actually a regex -- it's an emoticon of someone puking up.

  • (disco) in reply to dkf
    dkf:
    :ive_seen_some_things.dll:

    I wrote a regex that parses a regex and runs it part by part because the only function that supports regexes can only handle a single backreference...

    :ive_done_some_things.deleteme:

  • (disco) in reply to operagost
    TheCPUWizard:
    When it comes to "write only languages", APL (with the original Greek character set) is still king!!!!

    Wrong-o! APL is quite understandable once you get used to it. Just read right to left and the import of any statement is mathematically clear.

    This stuff goes ...well I don't know how it goes ... I don't think this is even anchored to our universe.

    Not to mention that it gives one a ... contaminated feeling...

    Case in point:

    chreng:
    You might want to invest in a keyboard like this. It can be rinsed under water after such incidents:

    See, at least APL is clean, if a little dense. This is: Unclean! Unclean!

    How unclean?

    operagost:
    If that's not Codethulu, I don't know what is.

    See? Really, really: Unclean!

  • (disco) in reply to CoyneTheDup
    CoyneTheDup:
    Wrong-o! APL is quite understandable once you get used to it.
    Maciejasjmj:
    @Maciejasjmj's law #1: every discussion about regexes, no matter how insane, will always have at least one person stating that it's easy, makes perfect sense, and that they have no idea what everyone else's complaining about.

    Corollary: this can be extended to any other language.

  • (disco) in reply to anonymous234
    anonymous234:
    I think people are often too afraid to actually parse text into tokens, you know, take advantage of actual loops and conditionals that computers are good at

    It certainly is possible to do it and it can be much more easily maintainable, because the logic can be made explicit and so can the tokens. The difficulty is that the next person along the line may not be the kind of programmer that understands natural language parsing, whereas so many think that regexes are kewl. And they will then be faced with the need to extend the model and start trying to use regexes, thus producing a complete fustercluck.

  • (disco) in reply to Maciejasjmj
    Maciejasjmj:
    Corollary: this can be extended to any other language.

    Has anyone ported PCRE to Brainfuck yet?

  • (disco)

    Are we going to have a fight about who used / knows the worst language?

    No PHP, that's no fair, like Chuck Norris.

  • (disco)
    kupfernigk:
    and I don't want to turn into an orangutan
    Ook. Ook? Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook.
    Ook. Ook. Ook. Ook. Ook! Ook? Ook? Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook.
    Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook? Ook! Ook! Ook? Ook! Ook? Ook.
    Ook! Ook. Ook. Ook? Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook.
    Ook. Ook. Ook! Ook? Ook? Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook?
    Ook! Ook! Ook? Ook! Ook? Ook. Ook. Ook. Ook! Ook. Ook. Ook. Ook. Ook. Ook. Ook.
    Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook! Ook. Ook! Ook. Ook. Ook. Ook. Ook.
    Ook. Ook. Ook! Ook. Ook. Ook? Ook. Ook? Ook. Ook? Ook. Ook. Ook. Ook. Ook. Ook.
    Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook! Ook? Ook? Ook. Ook. Ook.
    Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook? Ook! Ook! Ook? Ook! Ook? Ook. Ook! Ook.
    Ook. Ook? Ook. Ook? Ook. Ook? Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook.
    Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook! Ook? Ook? Ook. Ook. Ook.
    Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook. Ook.
    Ook. Ook? Ook! Ook! Ook? Ook! Ook? Ook. Ook! Ook! Ook! Ook! Ook! Ook! Ook! Ook.
    Ook? Ook. Ook? Ook. Ook? Ook. Ook? Ook. Ook! Ook. Ook. Ook. Ook. Ook. Ook. Ook.
    Ook! Ook. Ook! Ook! Ook! Ook! Ook! Ook! Ook! Ook! Ook! Ook! Ook! Ook! Ook! Ook.
    Ook! Ook! Ook! Ook! Ook! Ook! Ook! Ook! Ook! Ook! Ook! Ook! Ook! Ook! Ook! Ook!
    Ook! Ook. Ook. Ook? Ook. Ook? Ook. Ook. Ook! Ook.
    
  • (disco) in reply to Maciejasjmj

    You lost me at "Ook!", could you provide a little contextualisation? Or a banana?

Leave a comment on “Regularly Expressing Hate”
