• (cs) in reply to Cbuttius
    Cbuttius:
    Tab is commonly used which is ok except that Microsoft Word seems to be the only text editor that allows you to find-replace using it
    Wait what?!?
    Cbuttius:
    Microsoft Word seems to be the only text editor
    OK, you clearly don't know what a text editor is!

    (Hint: Notepad is not a text editor; as the name suggests, it's a pad for your notes; also Wordpad is not a text editor).

    Just three examples of real text-editors: Ultraedit; Programmer's file editor; Notepad++.

    A Google search for "Windows text editor" will come up with lots and lots of alternatives.

  • Hogan (unregistered) in reply to rss is broken
    rss is broken:
    article breaks rss
    Or RSS breaks article.
  • Captcha:vereor (unregistered) in reply to Nagesh
    Nagesh:
    Also, at a more principled level, UTF-8 has no business deciding that any codepoints Unicode defines are unworthy of being encoded. Its mandate is to be able to express any imaginable sequence of codepoints, and it takes that reasonably seriously.
    Well, not UTF-8 then, just Unicode. Whatever.
    Also also, it's very easy to think up scenarios where it could open a pretty serious security problem if UTF-8 decoders began producing spaces from unwanted byte combinations, or even removing them silently. Imagine someone thinking themselves secure because they have verified that the UTF-8 input doesn't contain ".." anywhere and then later convert the checked string to UTF-16, silently removing the spurious bytes in ".\x01./.\x01./.\x01./etc/passwd"?

    Oh, but that happens all the time today. You can trick such filters with percent-encoding, or other tricks. The solution is of course to actually test after it has been converted.
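
    A minimal sketch of that point, assuming the filter is looking for ".." path segments and the later conversion step is percent-decoding. The function names and the payload are made up for illustration; only `urllib.parse.unquote` is a real library call.

```python
from urllib.parse import unquote


def naive_is_safe(raw: str) -> bool:
    """Checks the raw input only, before any decoding happens."""
    return ".." not in raw.split("/")


def is_safe_path(raw: str) -> bool:
    """Decodes first, then validates the decoded form."""
    decoded = unquote(raw)
    return ".." not in decoded.split("/")


# Hypothetical payload: "../../etc/passwd" with the dots percent-encoded.
payload = "%2e%2e/%2e%2e/etc/passwd"

print(naive_is_safe(payload))  # True: the raw string has no ".." segment
print(is_safe_path(payload))   # False: the decoded string does
```

    This only covers a single layer of decoding. If something downstream decodes again, the check has to run on that final form too.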

    HIBT?
    Probably not.
  • (cs)

    Surely TRWTF is WCF.

  • big picture thinker (unregistered) in reply to PleegWat
    PleegWat:
    I wouldn't expect a chr() character in C#, but I'd expect "\a" or "\009" works?

    There is a Chr() in VB.NET, but it does not exist in C# (also .NET).

    In C#, char is a 2-byte value that represents a single UTF-16 code unit.

    To use a hexadecimal escape sequence, use \u followed by four hex digits:

    "For whom the \u0007 tolls"

    Or cast the codepoint to a char:

    char myBel = (char)7;
    string myString = String.Format("For whom the {0} tolls", myBel);

  • Randy Snicker (unregistered)
    UPDATE: Google Feedburner, our RSS feed host, apparently doesn't like BEL characters, so I removed it from the article in hopes that it will fix the broken feed.

    Wait? They worked at Google?

  • A Gould (unregistered) in reply to Gurth
    Gurth:
    The BEL character was originally used to cause an audible beep or buzz on terminals
    No, it was originally used to ring the physical bell on a teletype. That is, a low-tech version of this.

    Ack, I suddenly feel old for being around when you used BEL to get the attention of the Sysop (or for them to get your attention). (Post-physical bell, but we got the point).

  • spstanley (unregistered) in reply to Rootbeer
    Rootbeer:
    The Real What TF is that they misused BEL as a delimiter when there's already an ASCII Unit Separator non-printable control character (0x1F) that fits the purpose exactly, right?
    Now I want to use Emoji as delimiters.
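
    For reference, a rough sketch of the separator approach Rootbeer describes, assuming US (0x1F) between fields and, as an extra assumption not mentioned in the thread, RS (0x1E) between records. The `encode` and `decode` helpers are hypothetical names, and the sketch rejects fields that contain a separator rather than escaping them.

```python
US = "\x1f"  # ASCII Unit Separator, between fields
RS = "\x1e"  # ASCII Record Separator, between records


def encode(rows):
    """Join fields with US and records with RS; refuse embedded separators."""
    for row in rows:
        for field in row:
            if US in field or RS in field:
                raise ValueError("field contains a separator character")
    return RS.join(US.join(row) for row in rows)


def decode(blob):
    """Split a blob produced by encode() back into rows of fields."""
    return [record.split(US) for record in blob.split(RS)]


rows = [["Smith, John", "a|b"], ["tab\there", "x"]]
blob = encode(rows)
print(decode(blob) == rows)  # True: commas, pipes and tabs need no quoting
```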
  • spstanley (unregistered) in reply to A Gould
    A Gould:
    Gurth:
    The BEL character was originally used to cause an audible beep or buzz on terminals
    No, it was originally used to ring the physical bell on a teletype. That is, a low-tech version of this.

    Ack, I suddenly feel old for being around when you used BEL to get the attention of the Sysop (or for them to get your attention). (Post-physical bell, but we got the point).

    WHADDAYA MEAN "POST"? I'M USING A TTY 33!
  • Luiz Felipe (unregistered) in reply to Cbuttius
    Cbuttius:
    Aargle Zymurgy:
    Wow... this many comments and not a peep about using CSV?

    I did earlier; you just weren't reading properly.

    Comma really is a very poor choice of delimiter.

    There are alternatives that are human-readable and typeable but rarely used in data, e.g. ` (very infrequently required) and | (usually infrequent).

    Tab is commonly used which is ok except that Microsoft Word seems to be the only text editor that allows you to find-replace using it (you put in ^t for it), which often leads me to copy-pasting text into blank Word documents just to "process" it into tab-separated before copying it back (to Excel or wherever).

    ç is a good delimiter character; no one uses it. Yes, my native language has that abortion of a character. I cleanse all my text of it before serialization.

  • KMag (unregistered) in reply to qbolec
    qbolec:
    We have our own home-grown NoSQL database at nk, and it uses spaces for separation which provides even more fun with escaping and unescaping.

    We also use our own scheme of serialization of table rows for caching, which uses \xFF as a separator. That byte is quite unlikely to happen anywhere in UTF-8 strings or numbers (and most columns are of one of these types), but in case it happens we escape it by doubling (\xFF becomes \xFF\xFF).

    We thought for a minute about using \0 which also doesn't happen inside regular strings, but that would be problematic for almost every part of the stack.

    I've never heard about the unit separator ASCII character, but it sounds elegant, if it doesn't occur in UTF-8 strings too often.

    I've heard about guys who were implementing some MMORPG internet protocol which chose 37 (or some other odd byte) as a separator, which was found by experiment to be the least probable byte value. It was like 12 years ago, but I am still wondering WTF was wrong with them, their priorities, their architecture, their stats analysis, and their data distribution.

    No no no. Prefix (or postfix) the individual elements with a variable-length encoded integer. That way (1) you don't need to scan in order to parse and (2) no quoting is necessary. In-band signaling was a bad idea before phreakers figured out how to break it. Magic bytes aren't even quaint or retro at this point, they're just broken. If you need a particular lexical order, then use 0x00 or 0xFF as delimiters and append multiple variable-length integers to the very end of your string, so you still don't need to quote magic bytes, or scan every byte in order to parse your data structure. This is faster and more robust than using delimiters.
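
    A rough sketch of the length-prefix idea, assuming a LEB128-style varint (7 payload bits per byte, high bit set on every byte except the last). The comment above doesn't name a specific integer encoding, so that choice and the `pack`/`unpack` helper names are illustrative only.

```python
from typing import List, Tuple


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer, 7 bits per byte, low bits first."""
    if n < 0:
        raise ValueError("varint must be non-negative")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Return (value, next position) for the varint starting at pos."""
    result = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def pack(fields: List[bytes]) -> bytes:
    """Prefix each field with its length; no byte value needs escaping."""
    return b"".join(encode_varint(len(f)) + f for f in fields)


def unpack(buf: bytes) -> List[bytes]:
    """Walk the buffer by length prefixes instead of scanning for delimiters."""
    fields = []
    pos = 0
    while pos < len(buf):
        length, pos = decode_varint(buf, pos)
        end = pos + length
        if end > len(buf):
            raise ValueError("truncated field")
        fields.append(buf[pos:end])
        pos = end
    return fields


fields = [b"", b"\xff\x00,|\t", b"x" * 300]
print(unpack(pack(fields)) == fields)  # True
```

    Fields containing \xFF, \0, commas or tabs round-trip unchanged, which is the "no quoting" property. The reader jumps from prefix to prefix and never inspects payload bytes.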

  • AN AMAZING CODER (unregistered) in reply to Captcha:ideo
    Captcha:ideo:
    ASCII is the perfect example of a standard that was designed with lots of features that are just unnecessary today (though I suppose they were used back then). The other example is HTTP (PUT, DELETE, OPTIONS, PATCH?).

    Can't tell if serious or troll

Leave a comment on “For Whom the BEL Tolls”