• (disco)

    I imagine this is what movie producers would like to see on hackers' screens...

  • (disco)

    I saw regexes and bailed. (I do love regex abuse myself, but it's only okay when I do it.)

  • (disco)

    Those variable names smell decompiled. Or obfuscated.

  • (disco)

    Just be glad it wasn't regex replaces transforming XSLT to HTML.

  • (disco) in reply to PleegWat
    PleegWat:
    Those variable names smell decompiled. Or obfuscated.
    A decompiler would likely not tag nearly everything with `tempX` and then leave other names untouched, or use short names like `obj3` in some places and names like `pattern1` in others.

    All signs point to exit stage left.

  • (disco) in reply to PleegWat

    There's a person on my team who likes to join subqueries using aliases like TEMP, TEMP1, TEMP2, etc.

  • (disco)

    I'm not sure which was a better read, the article, or the Stack Overflow comment linked in it. I'm leaning towards the SO comment.

  • (disco) in reply to boomzilla
    boomzilla:
    join subqueries using aliases like TEMP, TEMP1, TEMP2, etc
    Ah, I must have met his padawan, who uses T1, T2, T3, etc.

    Filed under: But it saves three bytes for each query! That's optimized!

  • (disco) in reply to Fox
    Fox:
    I'm not sure which was a better read, the article, or the Stack Overflow comment linked in it. I'm leaning towards the SO comment.

    Apart from the fact that it has been linked here pretty much every time regexes or parsing HTML were mentioned ...

  • (disco) in reply to aliceif

    Well I'm newish here, so sue me for not having seen it before.

  • (disco) in reply to Fox

    It has been linked all over the internet in every regex discussion.

    Better now?

    Just look for HTML Regex Zalgo.

    Oh and: http://blog.codinghorror.com/parsing-html-the-cthulhu-way/

    And: http://meta.stackoverflow.com/questions/261561/please-stop-linking-to-the-zalgo-anti-cthulhu-regex-rant

  • (disco) in reply to aliceif

    Okay, well, I haven't had the misfortune of seeing or being involved with anything regex+HTML related, so it's new to me.

    Better?

  • (disco) in reply to boomzilla
    boomzilla:
    There's a person on my team who likes to join subqueries using aliases like TEMP, TEMP1, TEMP2, etc

    I used to work with someone who aliased tables in joins by going through the alphabet:

    select columns
    from table1 a
    join table2 b
    on things
    join table3 c
    on things
    

    Trying to work out where g.column was coming from in a complicated join was a nightmare.

  • (disco)

    can't believe this is real...

  • (disco) in reply to PleegWat
    tmepList7
    

    You sure about that?

  • (disco) in reply to Jaloopa

    You must work for one of our clients

  • (disco) in reply to giammin
    giammin:
    can't believe this is real...

    I rewrote much worse than this when I was starting out in independent contracting in the early 2000s. LOTS of VB6 and VBA that wasn't even up to this level. Some of those were fodder for the early days of TDWTF, when it was still an MSFN blog! (And MSFN was weblogs.asp.net, for that matter.)

  • (disco) in reply to Vault_Dweller

    Mental self-defence, I guess: several arguments have been made here suggesting someone actually typed this garbage by hand.

  • (disco)

    I woke up unhappy about the state of things. Now I have a seething hatred for this clumsy world.

  • (disco)

    I still maintain a project that parses HTML with regexes. It just scrapes a few bits of text so it's not that bad. CSS selector syntax like with jQuery would be a bit easier to use though.

  • (disco) in reply to Fox

    Congratulations! You are one of the Ten Thousand!!!! Ignore anyone who bitches and moans that it's new to you.

  • (disco) in reply to hifi

    Just last week I had to extract some attributes from a collection of HTML markup documents. My initial reaction was to use a regex because that's the quickest solution, right? In this case I would not have cared about the two problems and all that. The markup was somewhat predictable.

    It turned out that the regex solution was harder. After failing for half an hour to get the regexes working, I threw in a DOM parser and was done in ten minutes. The reason I didn't try that in the first place was that I was worried about it rejecting some documents as malformed. Yet the tolerant parser didn't fail on a single document. As a bonus, it will be much more robust.

    I realize that my main reservation about using proper parsers was that they were too strict. Also, early DOM interfaces wanted to look like a C interface, which made them very cumbersome to use. (That part hasn't been true for ten years now, but it still poisons my experience.)
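    For anyone wanting to try the same switch, here is a minimal sketch in Python. It uses the standard library's `html.parser`, which is an event-based parser rather than a DOM, so treat it as a stand-in for whatever parser was actually used; `AttributeCollector` and `extract_attribute` are hypothetical names. The relevant property is the same: it accepts sloppy markup (unclosed tags, unquoted values, mixed-case attribute names) without raising.

```python
from html.parser import HTMLParser


class AttributeCollector(HTMLParser):
    """Collect every value of one attribute, tolerating malformed markup."""

    def __init__(self, attr_name):
        super().__init__()
        self.attr_name = attr_name.lower()  # HTMLParser lowercases attribute names
        self.values = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes
        for name, value in attrs:
            if name == self.attr_name and value is not None:
                self.values.append(value)
    # Self-closing tags (<img ... />) reach handle_starttag via the default
    # handle_startendtag, so they are covered as well.


def extract_attribute(markup, attr_name):
    parser = AttributeCollector(attr_name)
    parser.feed(markup)
    parser.close()
    return parser.values


if __name__ == "__main__":
    # Unclosed <p>, an unquoted value, and an upper-case attribute name.
    doc = '<p><a HREF="/one">one<p><a href=/two>two</a><img src="x.png"/>'
    print(extract_attribute(doc, "href"))  # ['/one', '/two']
```

    Unlike a regex, the parser already knows where a tag ends and how attribute values are quoted, so none of that has to be encoded in a pattern.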

    hifi:
    CSS selector syntax like with jQuery would be a bit easier to use though.

    That's what xpath is for. I know xpath syntax is dense like a regex, but at the same time it's more powerful than a CSS selector.
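    As a rough illustration of that trade-off, Python's `xml.etree.ElementTree` supports a limited XPath subset. It is a stand-in here, not a full XPath 1.0 engine, and it needs well-formed XML rather than tag soup. The first query is roughly the CSS selector `ul.nav a`; the second selects by text content, which CSS selectors cannot express. The document and variable names are made up for the example.

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed (XHTML-style) document; ElementTree rejects tag soup.
DOC = """
<html>
  <body>
    <ul class="nav">
      <li><a href="/home">Home</a></li>
      <li><a href="/docs">Docs</a></li>
    </ul>
    <ul class="footer">
      <li><a href="/legal">Legal</a></li>
    </ul>
  </body>
</html>
""".strip()

root = ET.fromstring(DOC)

# Roughly CSS "ul.nav a". Note [@class='nav'] compares the whole attribute value,
# whereas CSS .nav matches one class token among several.
nav_links = [a.get("href") for a in root.findall(".//ul[@class='nav']//a")]

# Select by text: every <li> whose child <a> has the text "Docs".
# CSS selectors have no equivalent for matching on text content.
docs_items = root.findall(".//li[a='Docs']")

if __name__ == "__main__":
    print(nav_links)  # ['/home', '/docs']
    print([li.find("a").get("href") for li in docs_items])  # ['/docs']
```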

  • (disco) in reply to WernerCD

    I'm another of the 10,000. And, I threw up in my mouth, a little.

  • (disco) in reply to gleemonk
    gleemonk:
    hifi:
    CSS selector syntax like with jQuery would be a bit easier to use though.

    That's what xpath is for. I know xpath syntax is dense like a regex, but at the same time it's more powerful than a CSS selector.

    We used to have a rudimentary xpath parser implemented in a regex-based ruling engine. I'm glad we're rid of it, though the xpath library has its own problems.

  • (disco)

    The three hardest things in computer software development are naming things.

  • (disco)

    https://texaslynn.files.wordpress.com/2014/02/concept-welcome-to-my-world.jpg

  • (disco) in reply to Vault_Dweller

    Wait, so @accalia is the one who committed this monstrosity?

  • (disco)

    Man, that thing is like fractally bad. The more I look at it, the worse it seems. Right off the bat, I had a feeling that it could be replaced by a couple dozen lines using any decent XML parser. Then when I started looking, I also noticed that I can't tell what it's supposed to be doing. Where's the output of this thing? I can't see... wait a second, is that... yes, yes they really are. The only output appears to be messing with the XML in the StringBuilder via StringBuilder.Replace. And that was when I noticed the GOTOs. Yup, there are liberally sprinkled GOTOs. I think I'd better stop trying to read that thing before I really lose my sanity.

  • (disco) in reply to PleegWat
    PleegWat:
    We used to have a rudimentary xpath parser implemented in a regex-based ruling engine.

    DOES NOT COMPUTE

    PleegWat:
    I'm glad we're rid of it

    EMPATHY

    PleegWat:
    though the xpath library has its own problems.

    I wouldn't want to implement xpath myself. After years of occasional use of xpath with various libraries, I'm still not sure whether differing results come from my incomplete understanding of the standard or from incomplete implementations of it. I've always ended up rewording the query until the results matched my expectations :smiley:

  • (disco) in reply to gleemonk

    Back when that xpath parser was implemented, there had to be an absolute cap on parsing state, including buffered document contents. You cannot implement full xpath under those conditions, and indeed our current parser requires building the full XML/DOM tree before it'll start applying the xpath expression.

    The old parser only accepted patterns of the form /foo/bar/baz, but it handled them on an input stream with less than 100 bytes of parsing state.
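    A sketch of how a `/foo/bar/baz`-only matcher can run with constant matching state, assuming a SAX-style event stream (Python's `xml.sax` here; the original engine's implementation isn't described, and `AbsolutePathMatcher`/`match_path` are hypothetical names). Apart from the path itself, the matcher tracks just two integers: the current depth and how many leading path steps the open ancestor chain has matched. Text inside matches is buffered as output, and the SAX parser keeps its own internal buffers, so this shows the matching logic rather than a byte-exact memory budget.

```python
import xml.sax


class AbsolutePathMatcher(xml.sax.ContentHandler):
    """Stream-match a plain absolute path like /foo/bar/baz (child steps only)."""

    def __init__(self, path):
        super().__init__()
        self.steps = [step for step in path.split("/") if step]
        self.depth = 0      # depth of the element currently open
        self.matched = 0    # leading path steps matched by the open ancestor chain
        self.results = []   # text content of each matching element (output, not state)

    def startElement(self, name, attrs):
        self.depth += 1
        # Extend the match only if every ancestor matched and this name is the next step.
        if (self.matched == self.depth - 1
                and self.depth <= len(self.steps)
                and name == self.steps[self.depth - 1]):
            self.matched = self.depth
            if self.matched == len(self.steps):
                self.results.append("")  # start collecting a new hit

    def endElement(self, name):
        # Closing an element that was part of the match shortens the match by one step.
        if self.matched == self.depth:
            self.matched -= 1
        self.depth -= 1

    def characters(self, content):
        # Inside a full match (including its descendants), keep the text.
        if self.steps and self.matched == len(self.steps):
            self.results[-1] += content


def match_path(document, path):
    handler = AbsolutePathMatcher(path)
    xml.sax.parseString(document.encode("utf-8"), handler)
    return handler.results


if __name__ == "__main__":
    doc = ("<foo><bar><baz>1</baz><qux><baz>no</baz></qux>"
           "<baz>2</baz></bar><baz>no</baz></foo>")
    print(match_path(doc, "/foo/bar/baz"))  # ['1', '2']
```

    Supporting more of XPath (predicates such as `[last()]`, sibling or reverse axes) generally needs more state than two counters, which fits with the current library building the full tree before evaluating anything.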

  • (disco) in reply to ufmace
    ufmace:
    And that was when I noticed the GOTOs. Yup, there are liberally sprinkled GOTOs

    Holy crap, I didn't even notice that before!

  • (disco) in reply to rc4

    regex abuse, like all abuse, is only fun when it's consensual :stuck_out_tongue_winking_eye:

  • (disco) in reply to WernerCD

    If this is also your first time seeing that particular strip, then you're doubly lucky. That would make you one of today's 0.318, or something like that.

  • (disco) in reply to DCRoss
    DCRoss:
    seeing that particular strip

    :giggity:. Also, what is a particular? Can we just noun a word like that?

  • (disco) in reply to Tsaukpaetra

    *insert appropriate xkcd comic here*

  • (disco) in reply to aliceif
    aliceif:
    Oh and: http://blog.codinghorror.com/parsing-html-the-cthulhu-way/
    Is it time to mention how the bbhtmarkcode parser used in pissforce handles HTML?

  • (disco) in reply to ufmace
    ufmace:
    before I really lose my sanity

    YMBNH. I'd like to know how you can believe you had any left long before your arrival at TDWTF, though.

    gohere:
        // stuff
        goto gohere;

    Looks like a clumsy workaround for the lack of ComeFrom.

    And here we have entry 3+7i

    Yay for Gaussian indices!


    And that predecessor must be a regex newbie: (s)he uses the string methods IndexOf and Replace instead of the far more appropriate Regex methods, and I didn't find any groups in the pattern, not even simple ones.

    Without those, and maybe lookahead and lookbehind, how could this code ever hope to make it into a book about code maintainability as a positive example?

Leave a comment on “Shadow Over XML”
