• (disco) in reply to Jerome_Grimbert

    And once restore works, test it regularly.

    $PARENT is not a computer engineer but knew she needed backups, so she contracted someone to set them up, got an instruction sheet, followed it religiously, and all was well. Several years later some other contractor upgraded something. The tape backup kept working through the tapes without any errors at all, but it was only writing zeroes. Then a third contractor installed a new CD-ROM reader (yes, this was some time ago) and managed to overwrite the hard disk. It cost several months' salary to get the data back.

  • (disco) in reply to rc4
    rc4:
    Because different countries have different "standards"?

    I don't see why these "standards" should affect one's data format. CSV files are made for computers, and computers don't have nationalities.

  • (disco) in reply to Spectre

    CSV files also contain languages, which differ in how various things are represented. Can you imagine trying to represent european currency values with comma delimiters?

  • (disco) in reply to rc4
    rc4:
    Can you imagine trying to represent european currency values with comma delimiters?

    Are you talking about currency values that are part of arbitrary text in a natural language, or standalone currency values? In the former case, you'd have more or less the same amount of pain with culture-dependent delimiters, because arbitrary text can contain any delimiter. In the latter case, it's just a number, and whether it's European or a currency value is irrelevant. You should just encode it in a culture-independent way.

  • (disco) in reply to Spectre

    Should and what actually happens are different. Thus, different delimiters. Why don't we all use comma dot instead of dot comma in currency? Why don't we all speak the same language? You are failing to account for the fact that people can be different.

  • (disco) in reply to Protoman
    Protoman:
    How about ASCII 0x1E (record separator) or 0x1F (unit separator)?

    I was going to suggest those myself, along with their fellows:

    Dec | Hex | Acronym | Symbol | Name | Usage
    --- | --- | --- | --- | --- | ---
    25 | 0x19 | EM | ␙ | End of Medium | Intended as means of indicating on paper or magnetic tapes that the end of the usable portion of the tape had been reached. Not needed, but may be useful in the case of files with multiple ␜.
    26 | 0x1A | SUB | ␚ | Substitute | Need to keep a particular symbol (or set of symbols) from appearing in data? Use substitute to mark their replacement character(s)!*
    27 | 0x1B | ESC | ␛ | Escape | What better to start escape sequences with than the actual, official, "Escape"?*
    28 | 0x1C | FS | ␜ | File Separator | End of file. Or between a concatenation of what might otherwise be separate files.
    29 | 0x1D | GS | ␝ | Group Separator | Between sections of data. Not needed in simple data files.
    30 | 0x1E | RS | ␞ | Record Separator | End of a record or row.
    31 | 0x1F | US | ␟ | Unit Separator | Between fields of a record, or members of a row.
    32 | 0x20 | SP | ␠ | ~~Word Separator~~ Space | Between words of a field.

    *: Not necessarily official usage semantics. (Mis)Use at your own risk.

    Unicode also has a line separator (U+2028) and paragraph separator (U+2029) so one could even embed multi-line and multi-paragraph fields in a record in a platform-independent way! I might be somewhat unserious here.
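
As a rough illustration of the RS/US idea, here is a minimal Python sketch. It is not an established file format, and the names `dump_records` and `load_records` are made up for this example. It assumes fields are strings that never contain the two separator characters themselves, and it rejects any that do.

```python
US = "\x1f"  # Unit Separator: goes between fields of a record
RS = "\x1e"  # Record Separator: goes after each record


def dump_records(rows):
    """Join rows of string fields with US between fields and RS after each row."""
    for row in rows:
        for field in row:
            if US in field or RS in field:
                raise ValueError("field contains a separator character")
    return "".join(US.join(row) + RS for row in rows)


def load_records(text):
    """Inverse of dump_records: split on RS, then split each record on US."""
    # The text ends with RS, so the last piece after splitting is empty.
    return [record.split(US) for record in text.split(RS)[:-1]]


rows = [
    ["Alice", "1,50 €", "line one\nline two"],
    ["Bob", "2.75", 'says "hi"; really'],
]
encoded = dump_records(rows)
decoded = load_records(encoded)
```

Commas, semicolons, quotes and even newlines inside fields need no escaping here, because none of them is a delimiter.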

    Protoman:
    Everyone loves unprintable control characters in their text files, right?

    No reason for them to be unprintable. For one, they have official symbols (see above), and for another an editor could show them specially (e.g., colored barriers between sections).

    rc4:
    You are failing to account for the fact that people can be different.

    Maybe out there. Around here, we're all just @boomzilla.

  • (disco) in reply to Dreikin

    Well, except for Fox.

  • (disco) in reply to Dreikin
    Dreikin:
    I was going to suggest those myself, along with their fellows:
    Dec | Hex | Acronym | Symbol | Name | Usage
    --- | --- | --- | --- | --- | ---
    25 | 0x19 | EM | ␙ | End of Medium | Intended as means of indicating on paper or magnetic tapes that the end of the usable portion of the tape had been reached. Not needed, but may be useful in the case of files with multiple ␜.
    26 | 0x1A | SUB | ␚ | Substitute | Need to keep a particular symbol (or set of symbols) from appearing in data? Use substitute to mark their replacement character(s)!*
    27 | 0x1B | ESC | ␛ | Escape | What better to start escape sequences with than the actual, official, "Escape"?*
    28 | 0x1C | FS | ␜ | File Separator | End of file. Or between a concatenation of what might otherwise be separate files.
    29 | 0x1D | GS | ␝ | Group Separator | Between sections of data. Not needed in simple data files.
    30 | 0x1E | RS | ␞ | Record Separator | End of a record or row.
    31 | 0x1F | US | ␟ | Unit Separator | Between fields of a record, or members of a row.
    32 | 0x20 | SP | ␠ | ~~Word Separator~~ Space | Between words of a field.

    No no no no no no no!

    FS doesn't stand for "File Separator". Since time in a memorial, it has stood for "Field Separator", which makes it the only reasonable choice to delimit values within a single data record.

  • (disco) in reply to da_Doctah
    da_Doctah:
    FS doesn't stand for "File Separator". Since time in a memorial, it has stood for "Field Separator", which makes it the only reasonable choice to delimit values within a single data record.

    Blame ASCII ("most recent update during 1986"). "Page Separator" might've been a better choice, but "Unit Separator" and "File Separator" aren't bad.

  • (disco) in reply to chubertdev
    chubertdev:
    Hong Kong

    I didn't know we were a country now.

    And sad to see Macau is MIA. (Both places are of equal status in China)

  • (disco) in reply to dkf
    dkf:
    Yazeran:
    I usually prefer ; as it is rare that that could show up in data other than text.

    If you've got control over what is being generated, tabs are a really good choice

    It pisses me off that ASCII defines four perfectly good control codes specifically for delimiting structured text and nobody ever uses them.

  • (disco) in reply to da_Doctah
    da_Doctah:
    Since time in a memorial, it has stood for "Field Separator"

    Never seen that. Eggcorn suspected. Cite required.

  • (disco) in reply to chubertdev

    You missed South Africa

  • (disco) in reply to rc4

    It's Wikipedia... you're supposed to use a little bit of skepticism. With or without any banners about content quality! :smiley:

  • (disco) in reply to Jerome_Grimbert

    Yes! You're not even supposed to have a backup strategy... you're supposed to have a restore strategy!
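
A restore test can be partly automated. The sketch below is only an assumption about what such a check might look like (the function names are invented for this example): it compares a restored file with its original by SHA-256 and also flags the "only writing zeroes" failure from the tape story earlier in the thread.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_all_zero(path):
    """True if the file is non-empty and every byte in it is zero."""
    data_seen = False
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            data_seen = True
            if chunk.strip(b"\x00"):  # anything left means a non-zero byte
                return False
    return data_seen


def restore_looks_good(original, restored):
    """Compare a restored file against the original it was backed up from."""
    return not is_all_zero(restored) and sha256_of(original) == sha256_of(restored)


# Small demonstration with throwaway files.
with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp, "data.bin")
    good_copy = Path(tmp, "restored_ok.bin")
    zero_copy = Path(tmp, "restored_zeroes.bin")
    original.write_bytes(b"payroll records" * 1000)
    good_copy.write_bytes(original.read_bytes())
    zero_copy.write_bytes(bytes(original.stat().st_size))  # same size, all zeroes
    good_result = restore_looks_good(original, good_copy)
    zero_result = restore_looks_good(original, zero_copy)
```

A check like this only helps if the restored copy really comes from the backup medium and not from the live disk.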

  • (disco) in reply to Protoman
    Protoman:
    I've never used Apple Numbers, but I suspect if you tried launching it from the console after setting the environment variable `LC_ALL` to something like `en_US.utf-8`, it would probably work (and just `LC_NUMERIC` may also be sufficient, too).

    I suspect not, because according to printenv, neither of those variables are defined in OS X. Even if they were, I doubt many native applications refer to them.

    It’s a WTF in any case: the program assuming that because formulas in the spreadsheet don’t use , as a separator, neither do files with comma-separated values. Far better would be to pop up a dialog before opening the file to ask what separator should be used to parse the file, or provide some other (simple) method of selecting that.
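
Short of a dialog, a program can at least guess the separator from the file's contents instead of from the UI locale. A minimal sketch using the `csv.Sniffer` class from Python's standard library follows; the sample data is invented, and a guess like this can be wrong on small or irregular files, so asking the user remains the safer fallback.

```python
import csv
import io

# Invented sample: semicolon-separated fields, comma as the decimal mark.
sample = "name;amount\nAlice;1,50\nBob;2,75\n"

# Restrict the guess to separators that are plausible for this kind of file.
dialect = csv.Sniffer().sniff(sample, delimiters=";,\t")

rows = list(csv.reader(io.StringIO(sample), dialect))
```

Here the semicolon appears exactly once on every line while the comma does not, which is what the sniffer's consistency check keys on.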

    rc4:
    Why don't we all speak the same language?

    At least that is explained by history ;)

  • (disco) in reply to Yazeran
    Yazeran:
    Especially since some country-settings specify to use , as the decimal point (we do in Denmark) causing all manner of fun when some rows contain floating point values

    The most fun is when the app gets confused and just uses a comma as both the separator and the decimal point. Have fun untangling that mess!

  • (disco) in reply to Maciejasjmj

    Simple solution. Just replace the delimiter comma by c0cb5f0fcf239ab3d9c1fcd31fff1efc and you are done.

  • (disco) in reply to flabdablet

    you mean those around 30?

  • (disco) in reply to Yazeran
    Yazeran:
    Actually the worst is that people use , blindly as a separator.

    I usually prefer ; as it is rare that that could show up in data other than text.

    Especially since some country-settings specify to use , as the decimal point (we do in Denmark) causing all manner of fun when some rows contain floating point values (other places it is used as a 1000 - separator which also can result in hilarity).....

    Only last year I was using a new extension of our flagship product that had i18n added to it, where the various language variants were handled in a home-rolled csv parser. No escaping for commas, so if you happened to have a string with a comma in, it interpreted the comma as a separator.
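
For contrast, here is a small Python sketch of what goes wrong with a split-on-comma parser and how a real CSV library avoids it. The rows are invented; the `csv` module is Python's standard library, not the parser from the product described above.

```python
import csv
import io

rows = [
    ["greeting", "Hello, world"],
    ["price", "1,50"],
    ["quote", 'She said "hi"'],
]

# csv.writer quotes any field containing the delimiter or a quote character.
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
text = buffer.getvalue()

# A home-rolled parser that just splits on commas sees too many fields.
naive = [line.split(",") for line in text.splitlines()]

# csv.reader understands the quoting and recovers the original fields.
parsed = list(csv.reader(io.StringIO(text)))
```

The naive split turns the first row into three fields, while the library round-trips all three rows unchanged.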

  • (disco) in reply to Gurth
    Gurth:
    Far better would be to pop up a dialog before opening the file to ask what separator should be used to parse the file, or provide some other (simple) method of selecting that.

    You mean like how LibreOffice does it? (I'm ignoring its ancestors on purpose)

  • (disco) in reply to accalia
    accalia:
    the kraw and the lac....or however you spell them....

    Conventionally, crore and lakh in the Latin alphabet.

    Or करोड़ and लाख in Hindi.

  • (disco) in reply to Steve_The_Cynic
    Steve_The_Cynic:
    Conventionally, crore and lakh in the Latin alphabet.

    hmm....

    completely off base for one, but surprisingly close for the other...

    still no wonder google was no help in getting the correct spellings given how far off i was on the first one.

  • (disco) in reply to rc4
    rc4:
    You are failing to account for the fact that people can be different.

    You are failing to account for the fact that computers are not people. If you represent data in a culture-neutral format, it's easier to write, easier to read, and can be transferred between users with different locale settings.

    If you're saying that some systems already use culture-dependent CSV files and you may have to be compatible with those - yeah, sure, that's an argument. But it's not The Right Thing, and you should not do it in the absence of compatibility constraints.

  • (disco) in reply to Spectre

    This. Don't serialize in a locale-dependent way. Otherwise you may as well ignore endianness in your binary files and network streams...
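
One way to read that advice, sketched in Python: keep the stored form fixed (here a `.` decimal mark and ISO 8601 dates) and apply local conventions only when displaying. The function names and the `;` field separator are assumptions made for this example, and the display helper imitates a Danish-style format by hand instead of consulting the system locale.

```python
from datetime import date
from decimal import Decimal


def serialize(amount, day):
    """Write a (Decimal, date) pair in a form that ignores the user's locale."""
    return f"{amount};{day.isoformat()}"


def deserialize(text):
    """Parse the output of serialize back into a (Decimal, date) pair."""
    amount, day = text.split(";")
    return Decimal(amount), date.fromisoformat(day)


def display_danish_style(amount):
    """Presentation only: '.' groups thousands and ',' marks the decimals."""
    return f"{amount:,.2f}".translate(str.maketrans(",.", ".,"))


wire = serialize(Decimal("1234.50"), date(2015, 1, 19))
round_trip = deserialize(wire)
shown = display_danish_style(Decimal("1234.50"))
```

The stored text is the same on every machine, so a file written under one locale setting reads back correctly under another.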

  • (disco) in reply to Dlareg
    Dlareg:
    you mean those around 30?

    31 = 0x1F = ctrl-_ = US = Unit Separator: what should be used instead of tabs, commas, pipes or *ing asterisks
    30 = 0x1E = ctrl-^ = RS = Record Separator: what should be used instead of CR, CRLF or LF
    29 = 0x1D = ctrl-] = GS = Group Separator: for delimiting groups of records within a file
    28 = 0x1C = ctrl-\ = FS = File Separator: for delimiting files within an archive or stream (CP/M and DOS should have used this instead of ctrl-Z = SUB, and Unix should have used it instead of ctrl-D = EOT, for marking EOF for data entered from the keyboard)
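
The ctrl-key names in that list follow from ASCII arithmetic: the control code is the key's code with only the low five bits kept. A tiny Python sketch (the helper name `ctrl` is made up for this example):

```python
def ctrl(key):
    """Return the ASCII control character conventionally written Ctrl+<key>."""
    return chr(ord(key.upper()) & 0x1F)


# The four separators from the list above, keyed by their acronyms.
SEPARATORS = {
    name: ctrl(key)
    for name, key in {"FS": "\\", "GS": "]", "RS": "^", "US": "_"}.items()
}
```

The same rule gives ctrl-Z = 0x1A (SUB) and ctrl-D = 0x04 (EOT), matching the codes mentioned in the post.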

  • (disco) in reply to Spectre

    You somehow fail to understand what I'm saying in one paragraph and then understand it in the next, but dismiss it immediately afterwards because it's :doing_it_wrong:. Okay, Jeff.

  • (disco)

    So, I'm an Active Directory Engineer and my name is Tommy and yeah, this is about me. I like all of your arguments, but here's a good solution.

    http://blogs.technet.com/b/activedirectoryua/archive/2015/01/19/ad-magic-restore-script-published-on-codeplex.aspx

    or

    http://blogs.technet.com/b/ashleymcglone/archive/2014/04/24/oh-snap-active-directory-attribute-recovery-with-powershell.aspx

    Just my two cents.

  • (disco) in reply to Maciejasjmj

    Oh yea, I remember having an issue like that some 15 years ago when I tried to import some data (might have been into origin) and had some WTF moments when the data looked all weird until I noticed all those big integers in the raw data tables.....

    I think i ended with having to do a series of search-replace on the data prior to import before i got it to work.....

  • (disco) in reply to Dlareg
    Dlareg:
    Simple solution. Just replace the delimiter comma by ```c0cb5f0fcf239ab3d9c1fcd31fff1efc``` and you are done.

    Your solution is to Discourse the separator value? *

  • (disco) in reply to flabdablet

    so yes those around thirty

    hungrier:
    Your solution is to Discourse the separator value? *dae361af79b04c9c8e7057f60cc6**

    that was the Bad Idea

  • (disco) in reply to Gurth
    Gurth:
    I suspect not, because according to printenv, neither of those variables are defined in OS X. Even if they were, I doubt many native applications refer to them.

    No, they're not, but they fall back to $LANG by default when unset, see locale(1), which should be your system locale setting (mine is en_US.UTF-8). Most applications don't directly refer to the locale variables, but a number of functions in the C runtime library do, like printf.
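
The fallback order can be written down in a few lines. This Python sketch only models the lookup described in POSIX (`LC_ALL` first, then the category's own variable, then `LANG`, with empty values treated as unset); it does not call the C library, and the final `"C"` default is an assumption, since the real default is implementation-defined.

```python
def effective_locale(category, environ):
    """Resolve a locale category such as 'LC_NUMERIC' from environment variables."""
    for name in ("LC_ALL", category, "LANG"):
        value = environ.get(name)
        if value:  # unset and empty both fall through to the next variable
            return value
    return "C"  # assumed default for this sketch


only_lang = effective_locale("LC_NUMERIC", {"LANG": "en_US.UTF-8"})
with_category = effective_locale(
    "LC_NUMERIC", {"LANG": "en_US.UTF-8", "LC_NUMERIC": "da_DK.UTF-8"}
)
with_override = effective_locale(
    "LC_NUMERIC",
    {"LANG": "en_US.UTF-8", "LC_NUMERIC": "da_DK.UTF-8", "LC_ALL": "C"},
)
```

Passing `os.environ` as the second argument would show what a process started from the current shell would see.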

    Gurth:
    It’s a WTF in any case

    Completely agree.

  • (disco) in reply to Chesspiece_Face

    As someone who does BASH/Unix and PowerShell scripting regularly this looks to me like a Unix guy trying to program PowerShell without learning how it works first. The problem here is not PowerShell, the problem is whoever wrote that mess.

    Agreed. I was also immediately suspicious when the article claimed PowerShell was "executed poorly". I have my complaints about PowerShell and the PowerShell community, but if you think it was poorly executed you're just making yourself look like a fool. This script author clearly doesn't understand what an object is.

    The difference between Unix and Windows is: in Unix, everything is a character string, in Windows, everything is an object. That's it. If you can grok what that means, you're 95% of the way to understanding how the two systems work. The biggest problem Unix people have when they work in Windows is to try to make everything into a character string, and then they blame the OS when that doesn't work. If you think Windows administration involves installing Cygwin, you're doing it wrong.

  • (disco) in reply to Spectre

    Blaming PowerShell because Active Directory and LDAP property names are both numerous and long is hardly fair.

  • (disco)

    The RWTF is people who somehow transition into adulthood without coming to the realization that Animaniacs sucked.

  • (disco) in reply to Zemm

    I guess that makes me a non-adult [spoiler]as I've never seen a single second of that show.[/spoiler]

  • (disco) in reply to Zylon
    Zylon:
    The RWTF is people who somehow transition into adulthood without coming to the realization that Animaniacs sucked.
    [image]
  • (disco) in reply to antiquarian
    antiquarian:
    Zylon:
    The RWTF is people who somehow transition into adulthood without coming to the realization that Animaniacs sucked.
    [image]

    i always add a second 't' into that and giggle......

  • (disco) in reply to accalia
  • (disco) in reply to antiquarian

    that's the bunny

  • (disco) in reply to dkf
    dkf:
    tabs are a really good choice. They virtually never turn up in structured data or user input.

    Ah, if only that were the case. Had to modify a couple of reports recently to explicitly remove tabs from selected fields, because someone had entered data with a tab included. (The really evil one was the one with a trailing tab on the last field of the record - you're not going to find that from a visual inspection.)

    Still, tabs are often one of the best available choices and I generally do prefer them.
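
A sketch of the kind of cleanup described, in Python. The helper names and the sample record are invented, and replacing embedded tabs with spaces is just one possible policy (escaping them would preserve the data).

```python
import re


def clean_field(value):
    """Replace tabs and line breaks inside a field with a space, then trim."""
    return re.sub(r"[\t\r\n]+", " ", value).strip()


def report_line(fields):
    """Build one tab-delimited report line from cleaned fields."""
    return "\t".join(clean_field(field) for field in fields)


record = ["42", "Widget\tdeluxe", "in stock\t"]  # note the trailing tab

# repr() makes the otherwise invisible trailing tab show up as \t.
suspicious = [repr(field) for field in record if "\t" in field]

unsafe_fields = "\t".join(record).split("\t")
safe_fields = report_line(record).split("\t")
```

Without the cleanup, the three-field record reads back as five fields; with it, the field count matches again.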

  • (disco) in reply to Scarlet_Manuka
    Scarlet_Manuka:
    someone had found a field that wasn't sanitized and entered data with a tab included.

    FTFF

  • (disco) in reply to Tsaukpaetra

    Well, I don't know the specifications of that field offhand, not being an Oracle developer, but it's very possible that they decided to allow the user to put in whatever they like and that it works perfectly well within the application. (No doubt it's equally possible that they just didn't bother thinking about it.) It's just that when we then include that field in a tab-delimited report, parsing the output becomes harder.

  • (disco) in reply to Tsaukpaetra

    Most fields don't even allow you to enter a tab as data... normally pressing tab just shifts the focus to the next element in the tab index.

  • (disco) in reply to anotherusername
    anotherusername:
    don't even allow you to use the keyboard's Tab key to enter a tab as data

    PTFY. Many text-entry controls that allow pasting, especially those that allow multi-line, allow you to paste in a Tab character from the clipboard.

  • (disco) in reply to Scarlet_Manuka
    Scarlet_Manuka:
    Ah, if only that were the case. Had to modify a couple of reports recently to explicitly remove tabs from selected fields, because someone had entered data with a tab included.

    You're giving your users text fields? Are you mad?

    ;)

  • (disco) in reply to flabdablet
    flabdablet:
    ctrl-\

    On Unix, that's short for "crash the program".

  • (disco) in reply to accalia

    There already are two Ts in that.

  • (disco) in reply to ben_lubar
    ben_lubar:
    flabdablet:
    ctrl-\

    On Unix, that's short for "crash the program".

    That's because Ctrl+Letter on a terminal doesn't code for a character by default. It codes for a signal. In the case of Ctrl+\, I believe the signal is SIGABRT, which generates a coredump, though I'm not sure of that or what in general the mapping is between letters and signals. I do know Ctrl+V is escape, so Ctrl+V, Ctrl+\ will input FS.

  • (disco) in reply to PleegWat
    PleegWat:
    I believe the signal is SIGABRT

    It's actually SIGQUIT.
