  • Richard@Home (unregistered) in reply to Noam Samuel

    'nuff said.

  • Paul Bowman (unregistered) in reply to WTF Batman

    I can see the requirement now: Put all our flatfiles into a database....

    so they did!

  • maht (unregistered) in reply to Casiotone
    Casiotone:
    Anonymous:
    Anonymous:
    ... then XML is probably the better choice.


    May I ask which other schemes you considered before making this decision?

    CSV is not a "standard" anywhere so it is a poor choice for unambiguous data transmission.

    in XML

    <row><cell>cell1</cell></row>
    and
    <row>
        <cell>
              cell1
        </cell>
    </row>

    are *not* the same

    I hope that worked out; XML makes it hard to post examples, another reason to stand idly by, not calling 911, when it looks like it might die!!



    Am I missing something here or are the two examples exactly the same in XML?


    In the first example the row has one child - the cell.
    In the second example, the row has 3 children - 1 cell & 2 text nodes: one contains the "\n\t" before the cell and the other the "\n" after the cell.

    Same goes for the cell: the text inside the cell is "\n\t\tcell1\n\t" in the second example, not "cell1" as expected.

    Process it with SAX and you'll see the fun you can have.
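
    A minimal demonstration (Python's minidom here - a DOM parser rather than SAX, but it reports the same nodes; the <row>/<cell> markup comes from the example above):

    from xml.dom.minidom import parseString

    compact = "<row><cell>cell1</cell></row>"
    pretty = "<row>\n    <cell>\n          cell1\n    </cell>\n</row>"

    for doc in (compact, pretty):
        row = parseString(doc).documentElement
        cell = row.getElementsByTagName("cell")[0]
        # child count of <row>, and the raw text inside <cell>
        print(len(row.childNodes), repr(cell.firstChild.data))

    # Output:
    # 1 'cell1'
    # 3 '\n          cell1\n    '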



  • wang (unregistered) in reply to maht

    Actually this is rather sensible on the whole (without getting into column names, table names or the width limit on a row).

    As someone mentioned, this is almost certainly a staging table for a large, brutal ETL process.  You know, the ones where the format file doesn't match the data, and the data types don't match (a column should contain values 1-5, but actually also contains Y, N, T and F).  Dates in a wide variety of unformats.  General garbage.

    What you do is create a simple staging table, load it in (that way the shat data doesn't break a 6-hour load 5 hours in) and happily manipulate what you want in SQL - extracting the data into a sensible structure.  It's quite normal really, if you are into that kind of thing.  Roughly like the sketch below.
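
    A minimal sketch of the load step (Python with sqlite3; the file, table and column names are invented for illustration):

    import csv
    import sqlite3

    conn = sqlite3.connect("staging.db")

    # One dumb staging table: everything is text, so nothing can fail to load.
    conn.execute("CREATE TABLE IF NOT EXISTS csv_import_temp "
                 "(col1 TEXT, col2 TEXT, col3 TEXT)")

    with open("feed.csv", newline="") as f:
        # pad/trim each line to exactly three columns so bad rows still load
        rows = [tuple((r + ["", "", ""])[:3]) for r in csv.reader(f)]

    conn.executemany("INSERT INTO csv_import_temp VALUES (?, ?, ?)", rows)
    conn.commit()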

  • Ron Pakston (unregistered) in reply to temp

    --Minor quibble. The CSV format does allow for comma-separated data within a field. That field, however, must be surrounded by double quotes. Double quotes inside such a field must be escaped using another double-quote character.

    Dude, you need a life. Imagine quibbling over something as pathetic as this. It's people like you that offer solutions to the WTFs. You are so anal!!!!!

  • Ant (unregistered)

    I've actually seen tables laid out a bit like this before. Not as the main data storage, but as a staging area.

    The idea was to import a CSV data file by reading it in: one line of the file -> one row in the unstructured table. The data was then massaged into its final destination by stored procedures - MS SQL Server, version 7 I think it was. This is actually not such a terribly bad way of doing it if you know your SQL well. Tables with this kind of layout have their uses when named, say, "CSVImportTemp". But please, not as the final data storage place. Nobody would do anything that awful. Right?
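
    The massaging step looks roughly like this (Python with sqlite3 standing in for the stored procedures; the staging table csv_import_temp comes from the sketch above, and the destination table and its columns are invented):

    import sqlite3

    conn = sqlite3.connect("staging.db")

    # Typed destination table; the staging table is all text.
    conn.execute("CREATE TABLE IF NOT EXISTS orders "
                 "(id INTEGER, quantity INTEGER, placed_on TEXT)")

    conn.execute("""
        INSERT INTO orders (id, quantity, placed_on)
        SELECT CAST(col1 AS INTEGER),
               CAST(col2 AS INTEGER),
               col3
        FROM csv_import_temp
        WHERE col1 GLOB '[0-9]*'   -- skip rows whose id isn't numeric
    """)
    conn.commit()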

  • foo (unregistered)

    It is truly an exciting time to be alive.

  • (cs) in reply to maht
    Anonymous:
    Casiotone:
    Anonymous:
    Anonymous:
    ... then XML is probably the better choice.


    May I ask which other schemes you considered before making this decision?

    CSV is not a "standard" anywhere so it is a poor choice for unambiguous data transmission.

    in XML

    <row><cell>cell1</cell></row>
    and
    <row>
        <cell>
              cell1
        </cell>
    </row>

    are *not* the same

    I hope that worked out; XML makes it hard to post examples, another reason to stand idly by, not calling 911, when it looks like it might die!!



    Am I missing something here or are the two examples exactly the same in XML?


    In the first example the row has one child - the cell.
    In the second example, the row has 3 children - 1 cell & 2 text nodes: one contains the "\n\t" before the cell and the other the "\n" after the cell.

    Same goes for the cell: the text inside the cell is "\n\t\tcell1\n\t" in the second example, not "cell1" as expected.

    Process it with SAX and you'll see the fun you can have.





    Ahh, I was mixing up the job of an XML processor and an HTML user agent, oops. You're right :)
  • (cs)

    F'ing sweet, baby! This is definitely thinking outside the box. It is totally Object Oriented, since any kind of object can appear anywhere, and so totally Enterprise, since there is nothing tying that data down to a server. So brilliant it makes me cry with envy. Where can I contact this maestro, so that I may sit at the feet of this master and learn?!?

  • (cs) in reply to Moo

    "I store data on the conference room projector..." <-- That was a good one.

  • RobLyman (unregistered) in reply to Casiotone

    I really thought this was a complete WTF until I remembered a project I did a year or so ago. We got a CSV export from another system and had to report on it and also place it into an Excel spreadsheet available on a website. The database I imported it into had very similar columns (i.e. all varchar 4000), but at least I named the columns, or tried to, so they indicated their content. In reality, I didn't much care WHAT was in the columns. Most were "XYZIndicatorCode" or something meaningless like that. I treated it all as text fields because, frankly, the data was all crap anyway. I then spit it back out again on a webpage. I must also mention that in my case (and maybe in this WTF's case), the person at the other end, providing the flat file, was VERY uncooperative.

    Had I actually had to apply business logic to this data, then I would have validated it as I imported it, producing error reports, etc. So, flame me if you will, but I could ALMOST see myself doing something like this if I was in a hurry and importing a meaningless flat file into a database for reporting.

    ha ha...captcha = bozo

  • :-O (unregistered) in reply to masidani
    masidani:
    Isn't this just a flat text file with some transaction management bolted on?



    I doubt that the person who came up with this design knows anything about transaction management.
  • (cs) in reply to :-O

    I feel pretty certain that this design exists because they swapped the old CSV files they used as data storage for a database and didn't want to rewrite the app more than they had to. This way it should be possible to change only how they interface with the data storage and leave the rest of the logic intact. Maybe the application was OK for them except for performance problems, so they swapped to a real DB instead. Even if this design has built-in performance problems once you hit a few thousand rows, I would expect it's still better than Excel. You can put the DB on a separate server and throw hardware at any performance problem they get.

  • Christian (unregistered)

    The real WTF is that I have to turn on JavaScript to go to the next WTF page.

    WTF...

    Christian

  • Danix Defcon 5 (unregistered) in reply to WTF Batman
    WTF Batman:
    Anonymous:

    Name the popular RDBMS this *sample* code is in the documentation of:

    SELECT f1[1][-2][3] as e1, f1[1][-1][5] AS e2 FROM (SELECT '[1:1][-2:-1][3:5]={{{1,2,3},{4,5,6}}}'::int[] AS f1) AS ss;

     e1 | e2
    ----+----
      1 |  6
    (1 row)



    Hmm. That reminds me of PostgreSQL.



    factura_siana=# SELECT f1[1][-2][3] as e1, f1[1][-1][5] AS e2 FROM (SELECT '[1:1][-2:-1][3:5]={{{1,2,3},{4,5,6}}}'::int[] AS f1) AS ss;
    ERROR: missing dimension value


    Not PostgreSQL - at least not the version I tried. Still, I fear this in documentation as much as I feared the smart-ass commentary by the MySQL folks: "we don't need no stinkin' transactions. MyISAM will NEVAH support transactions!" (Wonder why InnoDB is so popular now???)
  • Kiss me, I'm Polish (unregistered) in reply to Ron Pakston
    Anonymous:

    <font face="Garamond">--Minor quibble. The CSV format does allow for comma seperated data within a field. That field, however, must be surrounded by double quotes. Double quotes inside of such a field must be escaped using another double quotes character. </font>

     

    Dude, you need a life. Imagine quibbling over something as pathetic as this. Its people like you that offer solutions to the wtfs. You are so anal !!!!!


    We have the next TDWTF's author. Actually, anything starting with "Dude, " and ending with a number of exclamation points not equal to 1 or 3 should be dealt with using euthanasia.
    And castration.

  • Imagine the resume (unregistered) in reply to Kiss me, I'm Polish

    I should add to my CV

    5 years experience in "Transactional MS Excel datastorage with failover and clustering (multiple sheets)"
    4 years experience in "Implementing highly fault tolerant CSV multi-terrabyte datastorage systems"

  • Kiss me, I'm Polish (unregistered) in reply to Imagine the resume
    Anonymous:
    I should add to my CV

    5 years experience in "Transactional MS Excel datastorage with failover and clustering (multiple sheets)"
    4 years experience in "Implementing highly fault tolerant CSV multi-terrabyte datastorage systems"
    That's terabyte, not terrabyte, Earthling.
  • Michael (unregistered) in reply to analysis

    You laugh?  I recently worked with a vendor who was asked to provide updates from their system via an XML file.  It needed to contain information for various aspects of a design element.  Guess what format we got?  Just as you described...

    My favorite was when they repeated name/value pairs instead of using sequences:

    <designs>
        <design>
            <id>12345</id>
            <param>name</param>
            <value>john</value>
        </design>
        <design>
            <id>12345</id>
            <param>phone</param>
            <value>555-1212</value>
        </design>
        <design>
            <id>12345</id>
            <param>address</param>
            <value>123 main street</value>
        </design>
        <design>
            <id>12345</id>
            <param>Last Sold Date|Quantity|Price 1</param>
            <value>03/01/05|34|12.2</value>
        </design>
        <design>
            <id>12345</id>
            <param>Last Sold Date|Quantity|Price 2</param>
            <value>03/17/04|5|2.34</value>
        </design>
    </designs>

    To add to the WTF, it turns out this was the design of their Access database, which was used as the backend for this system...

  • wgc (unregistered) in reply to Michael

    I'd vote for budget cuts on data cleanup.  Some PHB decreed the project must use the old data floating around in CSV, spreadsheets, etc., then some other PHB looked at the time on the schedule for data cleanup and said: "look at how much we could save if we didn't do that".

  • (cs) in reply to bullseye


    How about another approach: we create the "WTF Employee of the Month!"
    Each month we nominate three of the biggest WTFs. Voting is handled by those with login IDs.
    Maybe we won't know their real names, but it would be fun to come up with a nickname for the 'winning' coder, and we can have a contest for who can come up with an appropriate "portrait" to go along with the name.

    For the First Entry and Hall of Fame Nominee I give you: Paula Brillant
    <insert cheering crowd noises here>
    I have some pictures from my old high school yearbook (1975) that might do great as the portrait.

    unlisted_error

  • (cs) in reply to Gio
    Anonymous:
    > Don't be ridiculous.  You're not supposed to store CSV in the fields...  That's supposed to be XML data.

    XML data? Why limit yourself to XML data? Store serialized Java objects in the database!
    Think of a database where you can store a full hashtable in a single field... wouldn't it be a dream?

    (I've actually seen it done - in an enterprise web application of course)


    Oh god, I once got handed a CMS for a website that consisted of a DB full of serialised PHP classes for each of the posts, users, etc.
    The pain, the pain.
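
    For the curious, the antipattern looks roughly like this (a minimal Python sketch; the table, keys and values are invented):

    import pickle
    import sqlite3

    conn = sqlite3.connect("cms.db")
    # The whole "schema": one blob column per serialized object.
    conn.execute("CREATE TABLE IF NOT EXISTS objects (id TEXT, blob BLOB)")

    user = {"name": "john", "phone": "555-1212"}   # a "full hashtable"
    conn.execute("INSERT INTO objects VALUES (?, ?)",
                 ("user:100", pickle.dumps(user)))
    conn.commit()

    # Want to find a user by phone number? Deserialize every row and scan.
    for obj_id, blob in conn.execute("SELECT id, blob FROM objects"):
        if pickle.loads(blob).get("phone") == "555-1212":
            print(obj_id)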
  • meh (unregistered) in reply to Kiss me, I'm Polish
    Anonymous:
    Anonymous:
    I should add to my CV

    5 years experience in "Transactional MS Excel datastorage with failover and clustering (multiple sheets)"
    4 years experience in "Implementing highly fault tolerant CSV multi-terrabyte datastorage systems"
    That's terabyte, not terrabyte, Earthling.


    Excuse me for not using American bastardised english. Maybe American bastardised english isn't everybody's first language.  Try loading up Google Earth sometime; you may see there is more out there than yankyland.


  • Timbo Jones (unregistered) in reply to sjfsjf
    meh:

    Excuse me for not using American bastardised english.


    Zuh?

    Main entry: tera-
    Function: combining form
    Etymology: International Scientific Vocabulary, from Greek terat-, teras monster
    : trillion (terawatt)

    Settle down, friend.  Since when do SI unit prefixes have anything to do with America?

    Continue to enjoy your bitter day!

  • HAHA (unregistered) in reply to Timbo Jones
    Anonymous:
    meh:

    Excuse me for not using American bastardised english.


    Zuh?

    Main entry: tera-
    Function: combining form
    Etymology: International Scientific Vocabulary, from Greek terat-, teras monster
    : trillion (terawatt)

    Settle down, friend.  Since when do SI unit prefixes have anything to do with America?

    Continue to enjoy your bitter day!




    Only an asshat would paste a dictionary entry to have a comeback LOL
  • Trevor (unregistered) in reply to temp

    Ah, you beat me to it. 
    I get a CSV datafeed. 
    Lucky for me, they put quotes around the string fields. 
    Unfortunately, they forgot about escaping the quotes. 
    My complaints go unanswered. 
    So... there are always several records I am unable to import.
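
    For anyone who hasn't hit this: here is a minimal Python sketch of the difference (field values invented).  A proper writer doubles embedded quotes; a feed that skips the escaping produces lines no conforming reader can split reliably.

    import csv
    import io

    # Correct CSV: the writer escapes the embedded quotes by doubling them.
    buf = io.StringIO()
    csv.writer(buf).writerow(["100", 'John "Bud" Smith', "555-1212"])
    print(buf.getvalue())   # 100,"John ""Bud"" Smith",555-1212

    # And it round-trips cleanly:
    print(next(csv.reader(io.StringIO(buf.getvalue()))))
    # ['100', 'John "Bud" Smith', '555-1212']

    # A feed that forgot about escaping produces lines like this one,
    # which come back with mangled fields:
    broken = '100,"John "Bud" Smith",555-1212'
    print(next(csv.reader(io.StringIO(broken))))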

  • (cs) in reply to HAHA
    Anonymous:
    Anonymous:
    meh:
    Excuse me for not using American bastardised english.


    Zuh?

    Main entry: tera-
    Function: combining form
    Etymology: International Scientific Vocabulary, from Greek terat-, teras monster
    : trillion (terawatt)

    Settle down, friend.  Since when do SI unit prefixes have anything to do with America?

    Continue to enjoy your bitter day!


    Only an asshat would paste a dictionary entry to have a comeback LOL


    No, he posted a cite.  It is not just that he is saying something is so; it is international agreement on units of measure.

    You just whined.

    Sincerely,

    Gene Wirchenko

  • Kiss me, I'm Polish (unregistered) in reply to Gene Wirchenko
    Gene Wirchenko:
    Anonymous:
    Anonymous:
    meh:
    Excuse me for not using American bastardised english.

    Zuh?
    Main entry: tera-
    Function: combining form
    Etymology: International Scientific Vocabulary, from Greek terat-, teras monster
    : trillion (terawatt)
    Settle down, friend.  Since when do SI unit prefixes have anything to do with America?
    Continue to enjoy your bitter day!

    Only an asshat would paste a dictionary entry to have a comeback LOL

    No, he posted a cite.  It is not just that he is saying something is so; it is international agreement on units of measure.
    You just whined.
    Sincerely,
    Gene Wirchenko

    People just won't believe you if it doesn't come from a dictionary. And when you get angry and show them one that says you're right, they tell you to get a life.
    Anybody saying LOL should meet YIA - Youth In Asia, that is.
  • Kiss me, I'm Polish (unregistered) in reply to Timbo Jones
    Anonymous:

    Excuse me for not using American bastardised english. Maybe American bastardised english isn't everybody's first language. Try loading up Google Earth sometime; you may see there is more out there than yankyland.

    That's actually very funny; even my nickname states clearly where I'm from. No, really. That's not even a county in Oklahoma.
  • Petrified Eyes (unregistered)

    #include <string.h>

    /* one 8000-char "varchar" per cell, naturally */
    char *func1(char var1[8000], char var2[8000]) { return strcpy(var1, var2); }

    int main(int argc, char *argv[]) { static char var1[8000][8000], var2[8000][8000], var3[8000][8000]; for (var1[34][567] = '0'; var1[34][567] < '9'; var1[34][567]++) func1(var1[76], var2[76]); return 0; }

    ^o)

  • Sam (unregistered) in reply to Ron Pakston
    Anonymous:

    --Minor quibble. The CSV format does allow for comma-separated data within a field. That field, however, must be surrounded by double quotes. Double quotes inside such a field must be escaped using another double-quote character.

    Dude, you need a life. Imagine quibbling over something as pathetic as this. It's people like you that offer solutions to the WTFs. You are so anal!!!!!

    Computers tend to disagree.  So do I.  In fact, I've seen this "minor quibble" as a major issue in a lot of CSV data transmissions.

  • Sam (unregistered) in reply to MB

    Anonymous:
    Sure - just as long as sender and receiver agree on line endings, quoted forms, column headers, datatypes, and do some fancy footwork on master detail relationships - CSV will do you just fine. OTOH, if you'd rather the data come with that description, or have a schema you can validate against, or write queries against the data without importing it into a database, etc. - then XML is probably the better choice.

    Actually, I've rarely seen a case where you want all the information needed to decode a transmission included in every single transmission, and typically someone sending you a file will simply give you the format ("agreement" often isn't hard to come to).  XML does help with validation, however, and there are some limited circumstances under which being able to query the data directly from the file can be useful.  Mostly, however, XML use is overblown for things like this.

    However, commas are a crappy way to delimit data.  I've always preferred pipes or fixed-length data.

  • (cs) in reply to Kiss me, I'm Polish

    Anonymous:
    Anonymous:

    Excuse me for not using American bastardised english. Maybe American bastardised english isn't everybody's first language. Try loading up Google Earth sometime; you may see there is more out there than yankyland.

    That's actually very funny; even my nickname states clearly where I'm from. No, really. That's not even a county in Oklahoma.

    Also, I understand - from various sources - that American English is nearer to the 'English English' [:S] that was spoken in England when the Pilgrim Fathers left for America, and it is 'English English' that has become bastardised.

    Now - whether the difference between the American billion (9 zeroes) and the English billion (12 zeroes) is a result of this is anybody's guess. (Although it would explain the fact that America has more billionaires per head of population [:)])

    BTW: It would appear that most of the English-speaking world now agrees that there are 12 zeroes in a billion - boy, that Tony Blair will do anything to fool us into thinking that we're better off under his governance [;)]

  • (cs) in reply to belugabob

    Correction...

    BTW: It would appear that most of the English-speaking world now agrees that there are 9 zeroes in a billion - boy, that Tony Blair will do anything to fool us into thinking that we're better off under his governance [;)]

    (Just how quick do you have to be to use the 'edit' facility on this forum?)

  • (cs) in reply to Sam
    Anonymous:

    However, commas are a crappy way to delimit data.  I've always preferred pipes or fixed-length data.

    Why are commas a crappy way to delimit data?

    This reminds me of a time when someone I worked with chose #. I asked him why; his response was that he thought a # would be less likely to occur in the data than a comma. Of course, if he had used a comma I wouldn't have had to do anything; it would have been tokenized automatically. But since he used # I had to parse it myself.

  • (cs) in reply to chrismcb
    chrismcb:
    Anonymous:

    However, commas are a crappy way to delimit data.  I've always preferred pipes or fixed-length data.

    Why are commas a crappy way to delimit data?

    You think commas are bad? Shell commands use spaces to delimit data.

  • John Hensley (unregistered) in reply to Sam
    Anonymous:
    However, commas are a crappy way to delimit data.  I've always preferred pipes or fixed-length data.

    Please send a sample of your code to Alex.

    When a standard has been thought out, tested, and works, you use the standard, dammit. You don't make up your own just to get better feng shui in the encoding.

  • Scatters (unregistered) in reply to RJ

    You mean "more worthless", right.  "Less worthless" would indicate more worth...

  • (cs) in reply to John Hensley

    I remember a few years back there was a forum program that used flatfiles with all sorts of funky delimiters. The forum list used |, thread lists used |^|, threads used ||, the members list used |!!|, and moderator lists used ||^||. I was a regular at one of these forums when someone discovered that it didn't deal with || inside posts properly (it would insert a space between them, but that would still leave two pipes together if you put three in a row), and you could put whatever you wanted in the IP field that came after the post text field. For a while we used the exploit to put funny messages in place of our IPs until the admin fixed the bug.

    This was state-of-the-art back in 2000. :)

  • wccdbah (unregistered) in reply to analysis

    Anonymous:
    The real WTF is why didn't they just go all the way and do...

    create table alldata (
       table varchar(80) not null,
       column varchar(80) not null,
       id varchar(80) not null,
       data varchar(8000) null
    )

    Example usage to insert customer with id of 100:

    insert into alldata values ('customer' , 'name' , '100' , 'john' )
    insert into alldata values ('customer' , 'phone' , '100' , '555-1212' )
    insert into alldata values ('customer' , 'address' , '100' , '123 main street' )

    Congratulations, you've invented XML (or something slightly more efficient)!

  • (cs) in reply to wccdbah
    Anonymous:

    Anonymous:
    The real WTF is why didn't they just go all the way and do...

    create table alldata (
       table varchar(80) not null,
       column varchar(80) not null,
       id varchar(80) not null,
       data varchar(8000) null
    )

    Example usage to insert customer with id of 100:

    insert into alldata values ('customer' , 'name' , '100' , 'john' )
    insert into alldata values ('customer' , 'phone' , '100' , '555-1212' )
    insert into alldata values ('customer' , 'address' , '100' , '123 main street' )

    Congratulations, you've invented XML (or something slightly more efficient)!

    It is called "Entity Attribute Value", usually just abbreviated "EAV", and it is horrid.  I will not challenge your remark that it is slightly more efficient than XML.

    Sincerely,

    Gene Wirchenko

  • (cs) in reply to Danix Defcon 5

    Anonymous:
    WTF Batman:
    Anonymous:

    Name the popular RDBMS this *sample* code is in the documentation of:

    SELECT f1[1][-2][3] as e1, f1[1][-1][5] AS e2 FROM (SELECT '[1:1][-2:-1][3:5]={{{1,2,3},{4,5,6}}}'::int[] AS f1) AS ss;

     e1 | e2
    ----+----
      1 |  6
    (1 row)



    Hmm. That reminds me of PostgreSQL.



    factura_siana=# SELECT f1[1][-2][3] as e1, f1[1][-1][5] AS e2 FROM (SELECT '[1:1][-2:-1][3:5]={{{1,2,3},{4,5,6}}}'::int[] AS f1) AS ss;
    ERROR: missing dimension value


    Not PostgreSQL - at least not the version I tried. Still, I fear this in documentation as much as I feared the smart-ass commentary by the MySQL folks: "we don't need no stinkin' transactions. MyISAM will NEVAH support transactions!" (Wonder why InnoDB is so popular now???)

    Danix Defcon, or Daniel, you're a hard dude to find - keep in touch...
