• (cs) in reply to Aaron
    Aaron:
    It could have been an indexed view, no?

    It would still make more sense to parse during import, but if it took less than a second to look up records, the view could very well have been indexed even if the table wasn't.

    You could create function-based indexes (Oracle) on it, but it does seem like a pretty crappy way to do it.

  • JoeTheNonPlumber (unregistered)

    This story is gay.

  • IMSoP (unregistered) in reply to Thuktun
    Thuktun:
    First, the NPI list is published in a non-standard CSV format
    Hang on, there's a standard for CSV files now?
    where every field is quoted, regardless of the contents of the field.
    Sounds good to me - no need to have awkward code to detect whether a field is quoted or not.
    When there are special characters in the field like double-quotes (") or commas (,), they are NOT ESCAPED.
    That'll be what the quoting's for, then. Although, if the quoting is done with double-quotes, and there's then no escaping of them, then yeah, this bit's broken.
    On a more amusing note, empty fields are also ALWAYS quoted, even though there's clearly no reason.
    No reason, apart from the fact that every field is quoted. Sure, this particular special case would save a lot of bytes, but:
    Needless to say, it has a pretty impressive compression ratio in the ZIP file that we download.
    So that's alright then.

    My personal unfavourite? CSV files with embedded newlines in some of the fields. Try parsing that with a line-by-line loop. :(
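    For what it's worth, a quote-aware reader copes with those embedded newlines where a line-by-line loop falls over. A minimal sketch in Python (the sample record below is made up, not the actual NPI feed):

        import csv, io

        # One record whose quoted third field contains an embedded newline.
        data = '"1234567890","ACME CLINIC","SUITE 5\nMAIN ST"\r\n'

        # Splitting on line endings sees two half-records...
        print(data.splitlines())
        # ['"1234567890","ACME CLINIC","SUITE 5', 'MAIN ST"']

        # ...while a quote-aware CSV reader reassembles the single record.
        print(next(csv.reader(io.StringIO(data))))
        # ['1234567890', 'ACME CLINIC', 'SUITE 5\nMAIN ST']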

  • IMSoP (unregistered) in reply to JoeTheNonPlumber
    JoeTheNonPlumber:
    This story is gay.

    This comment is a submissive bisexual female-to-male transsexual.

  • (cs) in reply to Andy Goth
    Andy Goth:
    Quango:
    raid disk array
    I always chuckle inwardly whenever someone says "RAID array". Spelling it out even further ("RAID disk array") made me laugh out loud. Now I'm just waiting for someone to start talking about their redundant RAID. :^)

    I was going to write a lengthy message chiding you for being so nitpicky, but I have to run to the ATM machine (I just hope I can remember my PIN number).

  • Redundant Array of Inane Comments Array (unregistered) in reply to IMSoP
    IMSoP:
    Thuktun:
    First, the NPI list is published in a non-standard CSV format
    Hang on, there's a standard for CSV files now?

    Yeah, the standard way is to use commas. This one used 'lol :p' as the delimiter.

    As far as RAID goes, in typical use RAID refers to specific technological implementations. For instance, if I were to configure mirroring over two cheap RAID 5 arrays, I would have a Redundant Array of Inexpensive Redundant Arrays of Inexpensive Disks. A RAIRAID?

    More seriously, RAID as a technology would describe the type of an array - as opposed, for instance, to a lolcat array, or a toaster array. A RAID disk is usefully distinguished from a disk that is not part of an array, and redundant RAID is usefully distinguished from a single array. Essentially, even though the phrase "Redundant RAID" contains duplication when the term RAID is expanded, it contains additional information not present in the dupe-free form. As it is simply deleted, rather than providing an efficient means of restoration, such linguistic collapse is too lossy for many applications, and insisting on it is too whiny for all others.

    As the problem lies in the naming of RAID technology itself, rather than in any improper application of the words redundant, array, inexpensive, independent, or disk, those complaining about such uses should clearly shut up and quit being whiny bitches.

  • (cs) in reply to Redundant Array of Inane Comments Array
    Redundant Array of Inane Comments Array:
    IMSoP:
    Hang on, there's a standard for CSV files now?

    Yeah, the standard way is to use commas. This one used 'lol :p' as the delimiter.

    And it is thus a DSV file. DSV, of course, stands for Delimiter Separated Values.

  • David (unregistered) in reply to Paladin
    Paladin:
    Boy you guys really have no idea of the power materialized views have in MS SQL 2000. They are awesome.

    Some fools will tell you they don't exist but ignore the haters. Indexed views and plain indexed derived tables with triggers are for chumps.

    Indeed, we rely so much on materialised views that we never even bother with database design. We just shove all of our enterprise data into a single row of a single column and let the views sort it out while we kick back and play Xbox.

  • DiRadical (unregistered)

    Whoohoo, this brings back memories. But COBOL redefines should really be left to COBOL and flat files.

    <Flat file DBA mode> This is a very flexible solution. A new record type can be up and running in a few hours </Flat file DBA mode>
  • IHasYerCheezburger (unregistered) in reply to Andy Goth
    Andy Goth:
    Quango:
    raid disk array
    I always chuckle inwardly whenever someone says "RAID array". Spelling it out even further ("RAID disk array") made me laugh out loud. Now I'm just waiting for someone to start talking about their redundant RAID. :^)
    Do you mean my redundant RAID array of inexpensive disks?
  • IHasYerCheezburger (unregistered) in reply to IHasYerCheezburger
    IHasYerCheezburger:
    Andy Goth:
    Quango:
    raid disk array
    I always chuckle inwardly whenever someone says "RAID array". Spelling it out even further ("RAID disk array") made me laugh out loud. Now I'm just waiting for someone to start talking about their redundant RAID. :^)
    Do you mean my redundant RAID array of inexpensive disks?
    That comment was redundant.

    Anyway, about the inexpensive/independent. At the time of the Patterson paper, you had these really huge enterprisey hard disks with storage capacities up to perhaps 100 MiB (w00t!). They cost you a limb back then. At the same time, "cheap" (as in cost per megabyte) consumer-grade hard disks started to appear, with sizes like 5 or 10 MiB. Back then, the total cost per megabyte of a RAID was smaller than that of one disk of the same size as the array.

    Today it's the other way around, with hard disks in the range of 0.5 TiB to 1.5 TiB. They are cheaper in €/GiB (or $/GiB) than an equal-sized (usable space) RAID.

  • (cs) in reply to IHasYerCheezburger
    IHasYerCheezburger:
    At the time of the Patterson paper, you had these really *huge* enterprisey hard disks with storage capacities up to perhaps 100 MiB (w00t!). They cost you a limb back then. At the same time, "cheap" (as in cost per megabyte) consumer-grade hard disks started to appear, with sizes like 5 or 10 MiB. [...] Today it's the other way around with hard disks in the range of 0.5 TiB to 1.5 TiB. They are cheaper in €/GiB (or $/GiB)...
    Wow, they store Men in Black on hard disks now? I also am not quite sure what TiB and GiB are though...Turtles in Black and Griffins in Black maybe?

    (Incidentally, hard-disk marketroids created the prefix confusion because they used base-10 numbers to inflate their capacity (knowing that their block sizes were always in base-2 units anyway, because every operating system has to make a block fit in base-2 addressed memory), and you used the base-2 prefixes... I find this supremely amusing. I can't fault you for it, though, as the IEEE were trying to find a solution to a problem that didn't exist in the first place.)
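    The gap the marketroids exploit is easy to put a number on. A quick back-of-the-envelope in Python (the drive size is just an example):

        # A "1 TB" drive as advertised (decimal) versus the binary TiB it shows up as:
        advertised = 10 ** 12           # 1 TB  = 1,000,000,000,000 bytes
        binary_tib = 2 ** 40            # 1 TiB = 1,099,511,627,776 bytes
        print(advertised / binary_tib)  # ~0.909, i.e. roughly 9% "missing"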

  • MCS (unregistered) in reply to Quango

    <cfset comment1 = ReplaceNoCase(Quangocomment, 'aware about', 'aware of')>

    It's a common English faux pas. No offense meant, it just bugs me. Almost as much as it bugs me that Federal DB info has to be parsed out the way described in the article.

    You'd think the government, at least, would have separate fields for data integrity. Sheesh.

  • DBA (unregistered)

    Was the SNIP required?

    How many columns were in the leftover space between 52 and 78?

    SELECT rtrim(substring(DataType, 1, 25)) AS ProviderName,
           rtrim(substring(DataType, 26, 15)) AS ProviderAddress,
           rtrim(substring(DataType, 41, 10)) AS ProviderNPI,
           /* SNIP more columns */
    FROM dbo.ProviderTable
    WHERE RecordType = '02'

    The underlying table had just two columns; DataType (VARCHAR(78)) and RecordType (VARCHAR(2)).
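    Parsing during import, as suggested near the top of the thread, would only need the offsets already visible in that SELECT. A rough sketch in Python (the function name is made up, and the snipped columns between the NPI and position 78 are left out because the article doesn't show them):

        def parse_provider_row(data, record_type):
            """Split one DataType value (up to 78 characters) into real fields."""
            if record_type != '02':
                return None
            return {
                'ProviderName':    data[0:25].rstrip(),   # substring(DataType, 1, 25)
                'ProviderAddress': data[25:40].rstrip(),  # substring(DataType, 26, 15)
                'ProviderNPI':     data[40:50].rstrip(),  # substring(DataType, 41, 10)
            }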

  • Sane Person (unregistered) in reply to Thuktun
    Thuktun:
    This hits close to home. We have a DTS package that I wrote that checks every night to see if there's an updated NPI list on the source website. If there is, it downloads it and replaces the contents of our local NPI table with the new list.

    First, the NPI list is published in a non-standard CSV format where every field is quoted, regardless of the contents of the field. When there are special characters in the field like double-quotes (") or commas (,), they are NOT ESCAPED.

    We couldn't use the off-the-shelf CSV parsing capabilities via DTS, so I had to write some elaborate C# code to parse the blasted thing.

    On a more amusing note, empty fields are also ALWAYS quoted, even though there's clearly no reason. In a file with thousands of fields per record and millions of records per file, there are A BLOODY LOT OF QUOTES in that file. Needless to say, it has a pretty impressive compression ratio in the ZIP file that we download.

    I cried because I had a CSV with every field quoted. Then I met a man who had XML.

  • A Wilson (unregistered) in reply to Andy Goth

    I believe the correct term is "redundant RAID array of inexpensive disks."

  • George (unregistered) in reply to Andy L
    Andy L:
    That reminds me of the time I was trying to figure out who stole my lunch. After I investigated for a while, I discovered that the refrigerator's compressor had a slow leak and was wasting electricity.

    Case Solved.

    Priceless.

  • (cs) in reply to greg
    greg:
    Maybe I'm just dumb, but what does the beginning of the story have to do with the end?

    What does a retarded table that is super slow to query have to do with the missing data?

    Just more examples of IT professionals solving the wrong problem. They'll argue to the death over the most efficient way to do the import, run the query, break it down into clock cycles and throw around details of the CPU pipelines... meanwhile we're still missing a record.
