• Walleye (unregistered) in reply to Rimm-Kaufman
    Rimm-Kaufman:
    This story doesn't ring true to me.

    Could the IRS be that backward? Really?

    And how could several CREATE INDEX commands run in seconds on several huge tables?

    I'm not an IRS or government employee, but this sounds urban mythish...

    You would not believe how backward the IRS is. Or how afraid they are to make any changes to their systems. Most of their systems are firmly mired in the 1960's and 70's.

  • SQL Design Nazi (unregistered)

    I'm really sorry to see so many comments blaming the DBAs. Granted, in this particular case he should have investigated. But defining PKs, FKs, and indexes isn't the DBA's job. PKs and FKs are purely a design issue (a PK defines what makes a row unique). Indexes depend on the queries you issue, and are therefore a development issue. When a DBA has 100 databases to manage (I know of such cases), do you think he has the leisure to analyse how each one is designed? Most of his time is consumed by refreshing development databases from backups of production, checking that the backups worked, patching, and the like. It's a little facile to say "my program is as slow as a snail, and the DBA didn't create the indexes!"
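    The index-vs-no-index effect the whole thread circles around can be sketched with a throwaway SQLite table. This is purely illustrative; the table and column names are invented:

    ```python
    import sqlite3
    import time

    # Illustrative sketch: an unindexed lookup vs the same lookup after
    # CREATE INDEX. Table and column names are invented for the demo.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, customer_id INTEGER, total REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        ((i, i % 10_000, i * 0.5) for i in range(200_000)),
    )

    def lookup_seconds():
        """Time one filtered COUNT against the table."""
        start = time.perf_counter()
        conn.execute("SELECT COUNT(*) FROM orders WHERE customer_id = 42").fetchone()
        return time.perf_counter() - start

    before = lookup_seconds()   # no index: full table scan
    conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
    after = lookup_seconds()    # index seek on customer_id
    print(f"scan: {before:.5f}s  indexed: {after:.5f}s")
    ```

    On any recent machine the indexed lookup should be orders of magnitude faster, which is the whole story behind a few CREATE INDEX commands rescuing a query.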

  • (cs) in reply to Joe
    Joe:
    The Chief Architect for Digg would beg to differ with you. In a podcast interview ("Deep Fried Bits") he stated that they don't normalize every chance they can, because all those JOINS are too much of a performance hit. There IS such a thing as over-normalizing. Even a DB 101 textbook will tell you that.

    And that would explain the use of "properly" in RealDatabaseProgrammer's post you were replying to, wouldn't it?

    Over-normalizing isn't proper normalization; it's over-normalization. Learn to read.

    Joe:
    Both of you: stay away from absolutist statements. Please.

    Posters named "Joe" who can't comprehend the posts shouldn't reply to them. Yep. That's an "absolutist" [sp] statement. Please.

  • The IRS (unregistered) in reply to Peter
    Peter:
    Jonatan:
    might be a bit off-topic, but you are not required to pay taxes. The Constitution and the Supreme Court say that clearly.

    If you investigate this topic you will be surprised.

    Tell you what - you try it and let us know how it goes for you. I'd be curious to find out. :)

    From what I've read on the subject, people who actually challenge this by refusing to pay taxes tend to lose in court and head to jail. I can see where you're coming from and definitely feel that the majority of my tax money is wasted, but it's still a legal obligation. Once again, let us know how many years you get away with earning money in the US and not paying your taxes. We'll follow the story with interest.

    We too.

  • Walleye (unregistered) in reply to KenW
    KenW:
    Joe:
    The Chief Architect for Digg would beg to differ with you. In a podcast interview ("Deep Fried Bits") he stated that they don't normalize every chance they can, because all those JOINS are too much of a performance hit. There IS such a thing as over-normalizing. Even a DB 101 textbook will tell you that.

    And that would explain the use of "properly" in RealDatabaseProgrammer's post you were replying to, wouldn't it?

    Over-normalizing isn't proper normalization; it's over-normalization. Learn to read.

    Joe:
    Both of you: stay away from absolutist statements. Please.

    Posters named "Joe" who can't comprehend the posts shouldn't reply to them. Yep. That's an "absolutist" [sp] statement. Please.

    As Dave Barry says: "I realize I'm generalizing here, but as in most cases when I generalize, I don't care."

  • 35% Genius (unregistered) in reply to Random Guy
    Random Guy:
    Just on the database note, does anyone know what the upper limits for sizes in most modern databases are? I don't mean the max rows that they support, but how usable they are at extreme sizes?

    I have always wanted to know, since the largest database I have worked with is about 4 million rows and about 1 GB in size, ignoring indexes (MySQL). I would be curious to know how large a usable MySQL/PostgreSQL/Oracle database can be. I tried googling it, but can't find anyone talking about extreme database sizes and how to get performance (such as real-time performance) out of them.

    We have a table that is over 600 million rows. Full table scans on this table can be done in real time. A lot of real time.

  • Jonatan (unregistered) in reply to The IRS

    http://video.google.com/videoplay?docid=-1656880303867390173

    This might be interesting. Although normally you don't get away with it. To be honest, I don't live in America.

    America is an interesting field to learn from, since methods like that are slowly coming to Germany too. (That's where I live.)

  • DeeJay1 (unregistered) in reply to Jay
    Jay:
    This is the government. To be a DBA you have to, (a) have a computer science degree, and (b) be very skilled at filling out the paperwork.

    It's the same in every country. We have a multimillion-euro system here that looks like it was built by high-school students and, sadly, also works like it. Of course, with every new release it's more broken than before...

  • Matthew (unregistered) in reply to Jonatan
    Jonatan:
    might be a bit off-topic. But you are not required to pay taxes. The Constitution and the Supreme Court say that clearly.

    If you investigate this topic you will be surprised.

    Tell Wesley Snipes "Hi" for us when you try this.

  • Mulder (unregistered) in reply to akatherder

    Yeah I once tried something similar, typing "rm -rf /" as root on our production database. And I almost hit "Enter" instead of "Backspace" when I wanted to delete the line... Go figure.

  • Also a big geek (unregistered) in reply to Michael
    Michael:
    Indexes, keys, who needs them? I tend to store my data in columns rather than rows. Easier to read.

    Hey thanks, I'm maintaining your code right now, you bastard.

  • Franz Kafka (unregistered) in reply to Anonymous
    Anonymous:
    Michael:
    Indexes, keys, who needs them? I tend to store my data in columns rather than rows. Easier to read.
    We actually have a database like that :(

    1 row, 60 columns :(

    Is it domain data, or does it get updated from time to time? It's perfectly reasonable to do this sometimes, so you can reference the data in a query and update it in one place infrequently.

  • Andrew (unregistered) in reply to Also a big geek
    Also a big geek:
    Michael:
    Indexes, keys, who needs them? I tend to store my data in columns rather than rows. Easier to read.

    Hey thanks, I'm maintaining your code right now, you bastard.

    There are column-major databases. They have some advantages over the row-major storage in common use.

    I don't think column-oriented databases eliminate the need for indexes or keys, though. They just perform certain scans and aggregates (like COUNT over a column) faster.
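    The row-major vs column-major distinction can be sketched in a few lines. This is a toy in-memory model, not a real columnar engine:

    ```python
    # Toy model: the same five records stored row-major and column-major.
    rows = [{"id": i, "amount": i * 2} for i in range(5)]                 # row-major
    cols = {"id": list(range(5)), "amount": [i * 2 for i in range(5)]}    # column-major

    # An aggregate over one attribute touches every record in the row-major
    # layout, but only one contiguous list in the column-major layout.
    total_row_major = sum(r["amount"] for r in rows)
    total_col_major = sum(cols["amount"])
    assert total_row_major == total_col_major == 20

    # A lookup by key, however, is a linear search in both layouts unless an
    # index (here, a dict from id to position) is added -- which is the point:
    # the columnar layout doesn't remove the need for indexes.
    index = {v: pos for pos, v in enumerate(cols["id"])}
    assert cols["amount"][index[3]] == 6
    ```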

  • Brynn (unregistered) in reply to Jay

    I work for the Canadian government, and it's universal here too. I'm a student doing web design for this ministry, and it's frustrating me almost to the point of insanity. I tried to put forward the idea of introducing - god forbid - a wiki, and it has blossomed into 8 months of paperwork, proposals, working groups, and meetings. Thank god I go back to school in September.

    Not to mention that all their dev work is outsourced, and I'm strictly verboten from doing anything more complicated than a mail script.

    Don't work in government unless you want a rock-solid stable job with little chance of innovation.

  • TInkerghost (unregistered) in reply to JJ
    JJ:
    I've used databases before, of course, but I'm surprised by what a HUGE difference a few indices would make.
    I have a database that is such a CF that indexing the primary key on one table flips the query time from under 2 seconds to over 2 minutes. I know it's down to the SQL optimizer's ordering choice, but it's a pain in the ass to work with. Conversely, I added a multi-field index on another table and took a query from over 5 minutes to under 15 seconds.

    In short - idiots who don't know how to use a relational database shouldn't design them.

    while (conscious){head.desk();}
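    The multi-field index fix described above can be sketched by asking SQLite for its query plan. The schema here is invented for illustration:

    ```python
    import sqlite3

    # Illustrative sketch: a composite (multi-column) index matching a
    # two-column WHERE clause lets the engine seek instead of scanning.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (user_id INTEGER, kind TEXT, payload TEXT)")
    conn.executemany(
        "INSERT INTO events VALUES (?, ?, ?)",
        ((i % 1000, "click" if i % 2 else "view", "x") for i in range(50_000)),
    )

    conn.execute("CREATE INDEX idx_events_user_kind ON events (user_id, kind)")

    # EXPLAIN QUERY PLAN shows whether the index is used or the table scanned.
    plan = conn.execute(
        "EXPLAIN QUERY PLAN SELECT COUNT(*) FROM events "
        "WHERE user_id = 7 AND kind = 'click'"
    ).fetchall()
    print(plan)  # the plan should name idx_events_user_kind, not a table scan
    ```

    Checking the plan before relying on an index is also the cheapest way to confirm whether the optimizer is making the "ordering choice" complained about above.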
    
  • TInkerghost (unregistered)

    Right now I am looking at a refund checkstub in one hand & a notice that I have failed to file my taxes in the other.

  • (cs) in reply to Random Guy
    Random Guy:
    Just on the database note, does anyone know what the upper limits for sizes in most modern databases are? I don't mean the max rows that they support, but how usable they are at extreme sizes?

    I have always wanted to know, since the largest database I have worked with is about 4 million rows and about 1 GB in size, ignoring indexes (MySQL). I would be curious to know how large a usable MySQL/PostgreSQL/Oracle database can be. I tried googling it, but can't find anyone talking about extreme database sizes and how to get performance (such as real-time performance) out of them.

    Try googling for "real-time performance," then. RDBMSs don't do real-time. They don't do pre-emptive multitasking, and they don't do guaranteed latency.

    Of course, if you were insane, you could put them on top of a real-time operating system.

    In general, the limits (as observed many times above) are O(n) and O(n log n). When it comes to scalability, though, you're not talking about algorithms; you're talking about system architecture. Ignoring distributed databases over an FDDI ring, and assuming a single server, it basically comes down to the design of the schema itself and the amount of RAM you have handy.

    You're probably better off re-specifying the problem.

  • (cs) in reply to ClaudeSuck.de
    ClaudeSuck.de:
    real_aardvark:
    Andrew:
    No, good normalization does not degrade performance. Suppose a table has 2 million rows & two FOREIGN KEYs. Both speed & space can be improved.

    One key points to a VARCHAR(200), which is stored only once. The table's disk space holds only a 4-byte integer, or at least 200/4 = 50 times less space. <snip/>

    The database performs faster due to this normalization.

    Yes, the database will perform faster.

    No, this is not normalisation in any way, shape or form.

    No ...<snip/>

    I believe the term you are searching for is "caching."<snip/>

    The difference between an INT and a VARCHAR is that an INT is always 4 bytes (these days), whereas for a VARCHAR the number of bytes to read must always be determined (for each row), either by a number indicating the length of the data to be read or by parsing until ASCII 0 is met. Even a hash should give worse results than a stupid lookup of an INT.

    You force me to use my Powers of Observation, Captain Obvious. Much against my will, I might add.

    I was replying to an apparent idiot (anonymous) who believes that adding a (presumably auto-incrementing) int as a foreign key is some form of normalisation. It is not. It is merely a sensible thing to do. It doesn't actually help if the users persist in submitting queries based upon the original VARCHAR(200), but there's not much you can do about that, other than castrating the silly bastards for the good of mankind or (even crueller) encapsulating all queries into a stored procedure and allowing access only via the One True DBA.

    Thank you for explaining the semantics of a VARCHAR to me. That was most useful. It was particularly useful in that I had always assumed (what with being an ancient VOS and PL/1 programmer) that the whole point of a VARCHAR was that it didn't care about ASCII 0. Otherwise, why waste two bytes for the length? I must remember that.

    The idiot to whom I replied did have at least one valid point, however. Those naughty 3GHz cpus do have an O(log n) problem calculating a hash on a VARCHAR, whereas they only have an O(1) problem calculating an, er, integer from an integer.

    And then hashing (or indeed PATRICIAing) that integer, btw. You seem to be under the misapprehension that there's some sort of cretinous linear search going on there. Or maybe not -- I'm guessing at your translation from German to English here; in which case, verzeihen Sie mir.

    Actually, I'm surprised that the idiot in question didn't use a BLOB as his example. Now, there's a wonderful example of a well-chosen foreign key.

    All of this is trivial. Can we get back to real programming (eg VB.Net) now?
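    The INT-vs-VARCHAR hashing cost being argued about can be illustrated with a toy FNV-1a hash. This is a sketch of the complexity argument only, not how any particular RDBMS actually hashes its keys:

    ```python
    # Toy FNV-1a hash: one step per byte, so hashing a key is O(len(key)).
    def fnv1a(data: bytes) -> int:
        h = 0xCBF29CE484222325          # FNV-1a 64-bit offset basis
        for byte in data:
            h = ((h ^ byte) * 0x100000001B3) & 0xFFFFFFFFFFFFFFFF
        return h

    int_key = (42).to_bytes(4, "little")   # 4-byte integer key: constant work
    varchar_key = b"x" * 200               # VARCHAR(200) key: 50x the bytes to walk

    assert fnv1a(int_key) != fnv1a(varchar_key)
    ```

    Real engines hash and compare keys in more sophisticated ways, but the per-byte cost of long string keys versus the fixed cost of integer keys is the kernel of the argument.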

  • synp (unregistered) in reply to Mulder
    Mulder:
    Yeah I once tried something similar, typing "rm -rf /" as root on our production database. And I *almost* hit "Enter" instead of "Backspace" when I wanted to delete the line... Go figure.

    almost is the difference between a comment on this site, and a featured article

  • Aris (unregistered)

    Am I alone in finding it completely irresponsible to have launched such a dangerous request (ALTER TABLE) on a production system without even trying it on a test system?

  • (cs) in reply to real_aardvark
    real_aardvark:
    ClaudeSuck.de:
    real_aardvark:
    Andrew:
    No, good normalization does not degrade performance. Suppose a table has 2 million rows & two FOREIGN KEYs. Both speed & space can be improved.

    One key points to a VARCHAR(200), which is stored only once. The table's disk space holds only a 4-byte integer, or at least 200/4 = 50 times less space. <snip/>

    The database performs faster due to this normalization.

    Yes, the database will perform faster.

    No, this is not normalisation in any way, shape or form.

    No ...<snip/>

    I believe the term you are searching for is "caching."<snip/>

    The difference between an INT and a VARCHAR is that an INT is always 4 bytes (these days), whereas for a VARCHAR the number of bytes to read must always be determined (for each row), either by a number indicating the length of the data to be read or by parsing until ASCII 0 is met. Even a hash should give worse results than a stupid lookup of an INT.

    You force me to use my Powers of Observation, Captain Obvious. Much against my will, I might add.

    I was replying to an apparent idiot (anonymous) who believes that adding a (presumably auto-incrementing) int as a foreign key is some form of normalisation. It is not. It is merely a sensible thing to do. It doesn't actually help if the users persist in submitting queries based upon the original VARCHAR(200), but there's not much you can do about that, other than castrating the silly bastards for the good of mankind or (even crueller) encapsulating all queries into a stored procedure and allowing access only via the One True DBA.

    Thank you for explaining the semantics of a VARCHAR to me. That was most useful. It was particularly useful in that I had always assumed (what with being an ancient VOS and PL/1 programmer) that the whole point of a VARCHAR was that it didn't care about ASCII 0. Otherwise, why waste two bytes for the length? I must remember that.

    The idiot to whom I replied did have at least one valid point, however. Those naughty 3GHz cpus do have an O(log n) problem calculating a hash on a VARCHAR, whereas they only have an O(1) problem calculating an, er, integer from an integer.

    And then hashing (or indeed PATRICIAing) that integer, btw. You seem to be under the misapprehension that there's some sort of cretinous linear search going on there. Or maybe not -- I'm guessing at your translation from German to English here; in which case, verzeihen Sie mir.

    Actually, I'm surprised that the idiot in question didn't use a BLOB as his example. Now, there's a wonderful example of a well-chosen foreign key.

    All of this is trivial. Can we get back to real programming (eg VB.Net) now?

    I'm sorry that I was again stepping on your toes without taking the time to actually understand what was going on. In fact, Andrew's "point" makes so little sense that it confused me completely. He seems to mix up several concepts and jumps from one keyword to another without finishing his thoughts. I suspect he is an Access newbie. Using a VARCHAR(200) as a key is a WTF on its own, which can only be topped by using a BLOB (that was a good one).

    And now let's really get back to it: TRWTF is using VB, or .NET, or Java, or ColdFusion, or PHP, or whatever language. Sometimes I think TRWTF is writing programs at all.

  • (cs) in reply to Aris
    Aris:
    Am I alone in finding it completely irresponsible to have launched such a dangerous request (ALTER TABLE) on a production system without even trying it on a test system?

    Yes, open-heart surgery is a really thrilling thing that you should try at least once before you die.

  • I know more than you do (unregistered) in reply to andrewbadera

    In a properly designed database, normalizing leads to the construction of indexes, which improve performance. I was pleased to discover, after a few YEARS of use, that PostgreSQL had been busy building indexes for every single foreign key and check constraint I created.

    In short, the more specific I was about normalizing the database, the better the overall query time got, simply from defining how the relationships should occur.

    I really can't give a better example of effective software design. This is simply fantastic!!!

  • TInkerghost (unregistered) in reply to Mulder
    Yeah I once tried something similar, typing "rm -rf /" as root on our production database. And I *almost* hit "Enter" instead of "Backspace" when I wanted to delete the line... Go figure.
    Worse, I did
     rm -rf .* 
    to remove about 30 hidden files from my home machine.... I highly suggest you NOT do that on any machine.
  • Kuba (unregistered) in reply to F
    F:
    I have a database with 800 MB of data and around 6 million rows, and "a few minutes" for a schema change is 30+. The indexes take 600 MB, and the first query takes a full minute (while it loads the index).

    So much for relational speed.

    On a run-of-the-mill PC, loading such an index would take ~12 seconds. WTF?

  • (cs) in reply to TInkerghost

    This is why, on modern systems, .* will not expand to . or .., and rm -r will not descend into them.

  • (cs) in reply to ClaudeSuck.de
    ClaudeSuck.de:
    And BTW, this is, of course for DBs with several million rows, not an address book.

    What about an address book with several million rows?

  • Ninkendo (unregistered)

    The real WTF is that the existence of indexes isn't the very first thing he checked for.

  • Anonymous Person (unregistered) in reply to akatherder

    I would be tempted to type "DROP DATABASE database_name" and just leave it in the query window.

    I wouldn't execute it, but just to give the DBA sitting there a heart attack.

    EVIL! I love it!

  • Lynx (unregistered) in reply to TInkerghost
    TInkerghost:
    Yeah I once tried something similar, typing "rm -rf /" as root on our production database. And I *almost* hit "Enter" instead of "Backspace" when I wanted to delete the line... Go figure.
    Worse, I did
     rm -rf .* 
    to remove about 30 hidden files from my home machine.... I highly suggest you NOT do that on any machine.
    Worse: I unofficially know of an incident at my company where a sysadmin did hit "Enter" on something similar while using a superuser account on a production server (an application server, IIRC). Much cussing and swearing followed, as I understand it.
  • (cs) in reply to Jay
    Jay:
    or that there should be more programmers than managers on a project.

    Heh. I think any large company comes close to that, too. I've seen it as my company has grown from 60 to over 600 people.

    BTW, didn't the IRS have oh, I dunno, a development copy of the database lying around for him to work on...?

  • Eddy (unregistered) in reply to andrewbadera
    andrewbadera:
    Peter:
    I bet the tables also weren't properly normalized.

    Define "properly normalized." You do know that normalization is anti-performance, right?

    It's indeed faster to do a lookup on a 0NF table <g>

  • AdT (unregistered) in reply to andrewbadera
    andrewbadera:
    Define "properly normalized." You do know that normalization is anti-performance, right?

    Don't wear a helmet when motorcycling. It increases drag.

  • What is REALLY wrong! (unregistered) in reply to Code Slave

    Actually, the REAL WTF is why don't they fire the whole lot of them, ESPECIALLY those DBAs! They should also forget the taxes. All it does is buy PORK! I swear, sometimes I just want to go KOSHER!

  • What is REALLY wrong! (unregistered)

    BTW, regarding the speed of creating the indexes: sometimes adding an index on a small table can make a fantastic difference, ESPECIALLY if other tables are partitioned in some way and/or excluded by criteria in the smaller table, and/or it saves enough space to use a quicker method, etc.

  • John (unregistered) in reply to akatherder

    You wouldn't PLAN to execute it. When the DBA tackled you out of the chair and one of you accidentally hit the enter key though... It'd all get blamed on you.

  • MidNiteSk8r (unregistered) in reply to Jay

    Or... you could be like my current lead (and others I've seen before): you've been around since before there was such a thing as a "database".
    How many gov't folks know of the Peter Principle (rising to one's own level of incompetence)?

    You don't care to aim for more - and you never will. After all, it "works" now, doesn't it?

Leave a comment on “Hastening an Inevitable”
