• Matthew (unregistered) in reply to b1xml2
    b1xml2:
    when it comes to the magic figure of millions of rows per table, indexing can be the difference between hours and seconds.

    It can also be the difference between seconds and hours. I once worked on a SQL Server database for an application that built a report by populating a table with what could end up being in excess of a million rows. The "beauty" of this temporary table (it was truncated when the report completed) was that it had a clustered index on it. Sample reports ran very quickly, but a full report could only be started at close of business, because the drag caused by physically reordering the data on every insert would bring a server with three 3 GHz Xeon CPUs to its knees. I was hired to "find out why the report is so slow." I deleted the clustered index, and the next day management thought I was a genius.
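
    What the fix amounted to, roughly; a minimal sketch assuming SQL Server 2005+ syntax, with made-up table and index names:

        -- The report table was a staging table rebuilt for every report run.
        -- Dropping its clustered index turns it into a heap:
        DROP INDEX IX_ReportStaging_Clustered ON dbo.ReportStaging;
        -- A heap just appends new rows, so a million INSERTs no longer pay
        -- the cost of keeping the data physically sorted on the cluster key.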

  • Edward Royce (unregistered)

    Hmmmmm.

    So now I know who to blame!!

  • (cs) in reply to Mike Dimmick
    Mike Dimmick:
    No, they use a simple linear search which is O(n). The engine will only sort if a sort has been specified in an ORDER BY clause. It may sort to do a join, or it may use hashing.<snip>

    This is one of the few times that Alex highlighted a good reply. Most people in here have no idea what they are talking about, especially when it comes to RDBMS. Thanks for the dose of sanity.
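
    To make Mike's point concrete, a sketch (table, column, and index names invented):

        -- Without an index, the engine has no choice but a full scan, O(n):
        SELECT * FROM dbo.Person WHERE ssn = '123-45-6789';
        -- Plan shows a Table Scan: every row is examined; nothing is kept sorted.

        CREATE INDEX IX_Person_ssn ON dbo.Person (ssn);

        -- The same query now becomes an Index Seek: a walk down the B-tree,
        -- a handful of page reads instead of millions.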

  • David Walker (unregistered) in reply to Wayne
    Wayne:
    Bah... these are the DBAs who didn't know enough to build an index? I doubt they'd be phased by "DROP DATABASE" or even "DROP TABLE".

    Phased?

  • ergegs (unregistered) in reply to F

    You suck. I have a database that's 5 gigs big, and runs like a champ.

  • (cs) in reply to ergegs
    ergegs:
    You suck. I have a database that's 5 gigs big, and runs like a champ.

    Did I ever mention I have a 12 foot penis?

  • Corey (unregistered) in reply to andrewbadera
    andrewbadera:
    Peter:
    I bet the tables also weren't properly normalized.

    Define "properly normalized." You do know that normalization is anti-performance, right?

    Whether normalized or denormalized is more performant depends entirely on what you are trying to do.

    Normalized data is very performant for writes. Denormalized data is very performant for reads that need all the details, and for aggregation.
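
    A minimal sketch of the trade-off, with made-up tables and columns:

        -- Normalized: a fact lives in exactly one place, so a write touches
        -- one narrow row...
        UPDATE dbo.Customers SET phone = '555-0100' WHERE customer_id = 42;
        -- ...but a detailed read has to JOIN the pieces back together.

        -- Denormalized: the same fact is copied onto every related row, so
        -- reads and aggregates need no JOIN, but the write fans out:
        UPDATE dbo.OrderReport SET customer_phone = '555-0100'
        WHERE customer_id = 42;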

  • (cs) in reply to akatherder
    akatherder:
    I would be tempted to type "DROP DATABASE database_name" and just leave it in the query window.

    I wouldn't execute it, but just to give the DBA sitting there a heart attack.

    Careful! You might accidentally execute it while running another command: a query window executes every statement in it as one batch if you haven't selected a particular one.
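
    For example (database and table names hypothetical), if the window contains:

        SELECT COUNT(*) FROM dbo.Accounts;  -- the query you meant to run
        DROP DATABASE payroll;              -- the "joke" left behind

    then hitting Execute with nothing highlighted runs both statements as one batch. Only highlighting the SELECT first keeps the DROP from running too.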

  • (cs) in reply to my name is missing
    my name is missing:
    So what did the DBA do for his zillion dollars per hour?

    Since when do DBAs earn zillions of dollars? Normally they earn even less than programmers.

  • Jonatan (unregistered)

    Might be a bit off-topic, but you are not required to pay taxes. The constitution and the supreme curt say that clearly.

    If you investigate this topic you will be surprised.

  • (cs) in reply to David Walker
    David Walker:
    Wayne:
    Bah... these are the DBAs who didn't know enough to build an index? I doubt they'd be phased by "DROP DATABASE" or even "DROP TABLE".

    Phased?

    I think he means as in "The DBA was phased by the magic touch of the mains plug which may or may not have been placed by the BOFH." Triple-phased, even. ;)

    If I had been Bobby (Tables?), I would've added a primary key as well. I'd chill at the sight of something like this! PKs/indexes really help, as I found out when managing a 4-million-row table eating up about 3 GB on our 'test' server. Good thing PostgreSQL was able to cope with it ;)

  • patreh (unregistered)

    I can verify Jay is absolutely correct. I too have way too much experience with government IT.

  • Peter (unregistered) in reply to Jonatan
    Jonatan:
    Might be a bit off-topic, but you are not required to pay taxes. The constitution and the supreme curt say that clearly.

    If you investigate this topic you will be surprised.

    Tell you what - you try it and let us know how it goes for you. I'd be curious to find out. :)

    From what I've read on the subject, people who actually challenge this by refusing to pay taxes tend to lose in court and head to jail. I can see where you're coming from and definitely feel that the majority of my tax money is wasted, but it's still a legal obligation. Once again, let us know how many years you get away with earning money in the US and not paying your taxes. We'll follow the story with interest.

  • (cs) in reply to ClaudeSuck.de
    ClaudeSuck.de:
    akatherder:
    I would be tempted to type "DROP DATABASE database_name" and just leave it in the query window.

    I wouldn't execute it, but just to give the DBA sitting there a heart attack.

    Careful! You might accidentally execute it while running another command: a query window executes every statement in it as one batch if you haven't selected a particular one.

    Well, isn't that the idea? Those idiots would deserve it anyhow!

  • Mr.'; Drop Database -- (unregistered) in reply to Peter
    Peter:
    Jonatan:
    Might be a bit off-topic, but you are not required to pay taxes. The constitution and the supreme curt say that clearly.

    If you investigate this topic you will be surprised.

    Tell you what - you try it and let us know how it goes for you. I'd be curious to find out. :)

    From what I've read on the subject, people who actually challenge this by refusing to pay taxes tend to lose in court and head to jail. I can see where you're coming from and definitely feel that the majority of my tax money is wasted, but it's still a legal obligation. Once again, let us know how many years you get away with earning money in the US and not paying your taxes. We'll follow the story with interest.

    Something like this has already happened. http://en.wikipedia.org/wiki/Edward_Lewis_Brown

  • (cs) in reply to Mr.'; Drop Database --
    Mr.'; Drop Database --:
    Something like this has already happened. http://en.wikipedia.org/wiki/Edward_Lewis_Brown

    Someone added a reference to The Daily WTF on that page. WTF would someone do that?

  • Random Guy (unregistered)

    Just on the database note, does anyone know what the upper limits for sizes in most modern databases are? I don't mean the max rows they support, but how usable they are at extreme sizes?

    I have always wanted to know, since the largest database I have worked with is about 4 million rows and about 1 gig in size, ignoring indexes (MySQL). I would be curious to know how large a usable MySQL/PostgreSQL/Oracle database can be. I tried googling it, but can't find anyone talking about extreme database sizes and how to get performance (such as real-time performance) out of them.

  • (cs) in reply to Rob
    Rob:
    andrewbadera:
    You do know that normalization is anti-performance, right?

    I don't, and I have a hard time believing this. Can you explain?

    Rob

    Just try a report built from a query containing some 10 or more JOINs, then compare it with a flat query against a single reporting table. Of course your DB should still be normalized, but running the report from a "reporting table" will, for sure, increase performance. Creating this reporting table can be done overnight, for example, when it doesn't hurt too much. Think of gathering data out of 10 or more boxes versus gathering it from a single one.

    And BTW, this is, of course, for DBs with several million rows, not an address book.
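
    A sketch of the overnight job, with invented names:

        -- Flatten the normalized schema into one wide reporting table:
        IF OBJECT_ID('dbo.OrderReport') IS NOT NULL DROP TABLE dbo.OrderReport;

        SELECT o.order_id, o.order_date, c.customer_name,
               p.product_name, o.quantity
        INTO dbo.OrderReport
        FROM dbo.Orders o
        JOIN dbo.Customers c ON c.customer_id = o.customer_id
        JOIN dbo.Products  p ON p.product_id  = o.product_id;

        -- The daytime report is then a single-table query, no JOINs at all:
        SELECT customer_name, SUM(quantity) AS total_units
        FROM dbo.OrderReport
        GROUP BY customer_name;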

  • (cs) in reply to TheDude
    TheDude:
    DeLos:
    The real WTF is that they aren't using flat files to store the data instead of a slow database!

    Excel spreadsheets are even faster than flat files.

    Don't know about the latest version, but Excel 5 could hold 16,384 rows and the most recent version I know of holds 65,536. Which is lame for a "database".

  • (cs) in reply to w
    w:
    andrewbadera:
    Peter:
    I bet the tables also weren't properly normalized.

    Define "properly normalized." You do know that normalization is anti-performance, right?

    Not always

    This also depends on the definition of performance. For me it means speed, but looking at the MS definition of performance, it means the number of features, the ease of access, and the like. Normalizing a DB makes the data easier to handle and manage, but as I wrote in a previous post, it doesn't necessarily mean speed of execution.

  • (cs) in reply to F
    F:
    I have a database with 800 MB of data and around 6 million rows, and "a few minutes" for a schema change is 30+. The indexes take 600 MB, and the first query takes a full minute (while it loads the index).

    So much for relational speed.

    I find it quite surprising that 800 MB of data needs 600 MB of indexes. Did you create some 20 indexes? Don't forget that indexes have to be updated on every INSERT and UPDATE, so writes get slower the more indexes you have.
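
    If you're on SQL Server, sp_spaceused will show the split (table name hypothetical):

        EXEC sp_spaceused 'dbo.BigTable';
        -- Returns rows, reserved, data, index_size, and unused. When
        -- index_size rivals data, every INSERT and UPDATE is paying to
        -- maintain all of those indexes.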

  • (cs) in reply to TheDude
    TheDude:
    Ken B:
    Don't blame "relational" for your implementation's poor speed. My DBMS can find a record via index on a multi-million-row table in less than a second, even for the first search.

    Ken, no-one really cares about your db. Let's not get into a "my database is faster than yours" argument.

    Why not? Lyle can do it better, anyway.

  • Chad (unregistered) in reply to Itzac
    Itzac:
    In all this talk about normalization, how is it no one's brought up 3rd normal form? Of course, even a perfect 3rd-normal-form database isn't necessarily going to give you peak performance. But generally, a sensibly normalized database combined with the right indexes will make a huge difference.

    Having said that, I've run into a case or two where denormalizing was the answer.

    With regard to foreign/primary keys, I know that Oracle enforces their uniqueness by, drum roll... creating indexes, and I suspect most other databases do the same.

    SQL Server 2000 creates indexes on keys by default.
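
    A quick illustration, with a made-up table:

        CREATE TABLE dbo.Taxpayer (
            taxpayer_id INT NOT NULL PRIMARY KEY, -- backed by a unique index
                                                  -- (clustered, by default)
            ssn CHAR(9) NOT NULL UNIQUE           -- UNIQUE is likewise
                                                  -- enforced by an index
        );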

  • (cs) in reply to real_aardvark
    real_aardvark:
    Andrew:
    andrewbadera:
    Peter:
    I bet the tables also weren't properly normalized.

    Define "properly normalized." You do know that normalization is anti-performance, right?

    No, good normalization does not degrade performance. Suppose a table has 2 million rows and two FOREIGN KEYs. Both speed and space can be improved.

    One key points to a VARCHAR(200), which is stored only once. The table's disk space holds only a 4-byte integer instead of up to 200 bytes, as much as 200/4 = 50 times less space.

    Users often search on both these FOREIGN KEYs. The database engine can much more easily index integer keys than VARCHAR(200) strings or most other datatypes. Also, some databases automatically create indexes on PRIMARY or FOREIGN KEYs. The database performs faster due to this normalization.

    Yes, the database will perform faster.

    No, this is not normalisation in any way, shape or form.

    No, in the non-degenerate case, a database will create indices with equal ease, be the key an integer or a VARCHAR(200).

    I believe the term you are searching for is "caching." Since this applies equally as well to in-memory data structures, most mere programmers are just as comfortable with the concept as the mighty DBA.

    The difference between an INT and a VARCHAR is that an INT is always 4 bytes (these days), whereas for a VARCHAR the number of bytes to read must be determined for each row, either from a length prefix or by scanning for a terminator. Even a hash lookup should give worse results than a plain integer comparison.
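
    The layout being described, as a sketch with invented names:

        -- The 200-character string is stored once...
        CREATE TABLE dbo.Employer (
            employer_id   INT NOT NULL PRIMARY KEY,
            employer_name VARCHAR(200) NOT NULL
        );

        -- ...and the 2-million-row table carries only a 4-byte key:
        CREATE TABLE dbo.TaxReturn (
            return_id   INT NOT NULL PRIMARY KEY,
            employer_id INT NOT NULL
                REFERENCES dbo.Employer (employer_id),
            filed_on    DATETIME NOT NULL
        );
        -- Searches and joins on employer_id compare 4-byte integers
        -- instead of VARCHAR(200) strings.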

  • Chad (unregistered) in reply to Anonymous Vector Hacker
    Anonymous Vector Hacker:
    Try ~175 billion rows in a vector db (nine years of stock market records).

    Data warehouses are starting to use cubes now. All the processing is done up front, so data retrieval can be much faster than with a regular RDBMS.

  • (cs) in reply to Bappi
    Bappi:
    ergegs:
    You suck. I have a database that's 5 gigs big, and runs like a champ.

    Did I ever mention I have a 12 foot penis?

    So, in winter you can wrap it around you and it keeps you warm. In summer you might fold it above your head to give you some shade.

  • TheHat (unregistered)

    "Primary keys: none"

    I admit I'm a database n00b but I didn't realize this was possible.
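
    It is. Nothing in SQL forces a table to have one; this is a perfectly legal key-less heap (hypothetical table):

        CREATE TABLE dbo.AuditLog (
            logged_at DATETIME,
            message   VARCHAR(4000)
        );
        -- No key, no index: every lookup is a full table scan, which is
        -- exactly the WTF in the article.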

  • Kaemaril (unregistered) in reply to Jonatan
    Jonatan:
    Might be a bit off-topic, but you are not required to pay taxes. The constitution and the supreme curt say that clearly.

    If you investigate this topic you will be surprised.

    Wesley Snipes certainly was.

  • david (unregistered)

    Nine minutes? That replaces a system where the best case was probably nine hours, or perhaps nine weeks. The users probably loved it when it was installed.

  • Gabe da Silveira (unregistered) in reply to andrewbadera
    Define "properly normalized." You do know that normalization is anti-performance, right?

    I'm sorry, but this is painfully ignorant. There are standard normal forms that represent a theoretically optimal data representation. In general, normalization improves performance on a great number of ad-hoc queries. In practice we denormalize selectively to optimize certain queries, but that comes at a cost in disk space, data maintenance, ad-hoc querying ability, and risk of data corruption.

  • Mr.'; Drop Database -- (unregistered) in reply to Gabe da Silveira
    Gabe da Silveira:
    Define "properly normalized." You do know that normalization is anti-performance, right?
    ...In practice we de-normalize selectively to optimize certain queries...
    So you agree with him?
  • (cs)

    I will be the first to call BS on this. While I don't work for the IRS, I do work across the street at another agency that deals with sensitive data. We have development databases, QA databases, training databases, production databases, etc. I find it very, very, very hard to believe that the Dept. of Treasury does not require similar standards of its bureaus.

    My take is that the OP did this in a development database and realized that it wasn't anything special, so the story was spiced up to take place in production.

    And to the person who said that we don't have to pay taxes because it isn't in the Constitution: this has been debunked many times. You do realize that laws passed by Congress, such as the Internal Revenue Code (Title 26), are still legally binding, right? I suppose murder is legal, since it doesn't say anything about it in the Constitution. And I guess I can drive on whichever side of the road I choose. And I suppose it would be okay for me to rape your wife, since the Constitution doesn't prohibit it.

  • Peter (unregistered) in reply to andrewbadera

    That's not entirely true. The level of Normalization/Denormalization can have a dramatic impact on performance. It largely depends on the data that's being queried and a good DBA would work with developers to understand how the data is being used and define a structure that has the proper level of normalization.

  • Mik (unregistered) in reply to Random Guy
    Random Guy:
    Just on the database note, does anyone know what the upper limits for sizes in most modern databases are? I don't mean the max rows they support, but how usable they are at extreme sizes?

    I have always wanted to know, since the largest database I have worked with is about 4 million rows and about 1 gig in size, ignoring indexes (MySQL). I would be curious to know how large a usable MySQL/PostgreSQL/Oracle database can be. I tried googling it, but can't find anyone talking about extreme database sizes and how to get performance (such as real-time performance) out of them.

    Tens of billions of rows, hundreds of gigs, a thousand transactions/sec and three thousand queries/sec would be medium sized. I'm not sure where 'big' starts.

  • Manic Mailman (unregistered) in reply to TheSabre
    TheSabre:
    I find it very, very, very hard to believe that the Dept. of Treasury does not require similar standards of its bureaus.

    Things may have been somewhat different 15 years ago when a Pyramid 4-way 150MHz computer would have been state of the art.

  • Enzo (unregistered) in reply to andrewbadera

    ...and you can guarantee they have no idea of statistics or Informix in general.

  • Captcha eros (unregistered)

    He forgot to insert sleep(1000*60*MINUTES); into the code so he could just reduce MINUTES by one each month and keep collecting money.

  • Brian (unregistered) in reply to FlySwat

    You're aware that primary keys are automatically added to tables upon creation and there's a simple "add_index" method when writing a migration, right?

    Just because you dislike a language/framework (for practically no reason, I'm guessing) doesn't mean you should ignorantly talk crap about it. Educated crap, sure, but cut the stupid comments.

  • lokey (unregistered) in reply to Jonatan
    Jonatan:
    Might be a bit off-topic, but you are not required to pay taxes. The constitution and the supreme curt say that clearly.

    If you investigate this topic you will be surprised.

    Who is this "supreme curt" and where does he live? The IRS wants to talk to him...

  • Tourist (unregistered) in reply to Peter
    Peter:
    I bet the tables also weren't properly normalized.

    I bet you're a fresh graduate.

  • mark (unregistered)

    Wow! I almost cried a little when I read this ;) What a wonderful story. I bet it must have felt pretty sweet when you were done :)

  • synp (unregistered) in reply to Code Slave
    Code Slave:
    More importantly, Bobby's changes should first have been tested on a test server, documented, handed over to the DB Admin, implemented on the beta server, re-tested, and then implemented on production.

    The Real WTF is that they were experimenting on a production system.

    This was pretty much spelled out in the story: "Of course, it's highly unorthodox for someone to be given absolute access to one of the IRS's production databases, but no one wanted to question the big cheese. "

    I'd bet he could just as easily have caused damage rather than making things so much better. Just a few days ago everyone here was dissing "The Process", but there should be a middle ground between "Bobby wants a change, so Bobby changes production" and "we need a 9-hour process to allow someone to click F1".

  • Rimm-Kaufman (unregistered)

    This story doesn't ring true to me.

    Could the IRS be that backward? Really?

    And how could several CREATE INDEX commands run in seconds on several huge tables?

    I'm not an IRS or government employee, but this sounds urban mythish...

  • (cs) in reply to Rob Williams
    Rob Williams:
    A general rule of thumb that I hear and see practiced in the industry is to design for 3rd Normal Form, then "turn the dial back a quarter turn" to roughly 2.75 Normal Form (meaning denormalize just a little in the handful of places where performance is measurably subpar).

    Wait, you would go backwards to a lower NF than 3rd?

    You would have attributes in a relation not directly dependent on the Key(s)? WTF?

    So in your systems, remember when updating an Order Line item, don't forget to update the Customer Contact Details!

    sigh...

  • Anonymous (unregistered) in reply to Michael
    Michael:
    Indexes, keys, who needs them? I tend to store my data in columns rather than rows. Easier to read.
    We actually have a database like that :(

    1 row, 60 columns :(

  • JohnLocke (unregistered) in reply to Bappi
    Bappi:
    ergegs:
    You suck. I have a database that's 5 gigs big, and runs like a champ.

    Did I ever mention I have a 12 foot penis?

    What, your penis has feet and walks around?

  • Pawel (unregistered) in reply to Jay

    How long did it take to fill out the paperwork to leave the job? 12.5 years?

  • cklam (unregistered) in reply to ClaudeSuck.de
    ClaudeSuck.de:
    Bappi:
    ergegs:
    You suck. I have a database that's 5 gigs big, and runs like a champ.

    Did I ever mention I have a 12 foot penis?

    So, in winter you can wrap it around you and it keeps you warm. In summer you might fold it above your head to give you some shade.

    If you don't mind looking like the Michelin Man, that is.

  • (cs)
    There's no way it could've run that fast. I must've totally destroyed the production database.

    The real WTF is that he was optimizing the production database without any testing or backups.

  • (cs) in reply to Random Guy
    Random Guy:
    Just on the database note, does anyone know what the upper limits for sizes in most modern databases are? I don't mean the max rows they support, but how usable they are at extreme sizes?

    I have always wanted to know, since the largest database I have worked with is about 4 million rows and about 1 gig in size, ignoring indexes (MySQL). I would be curious to know how large a usable MySQL/PostgreSQL/Oracle database can be. I tried googling it, but can't find anyone talking about extreme database sizes and how to get performance (such as real-time performance) out of them.

    We run numerous SQL Server 2000 and 2005 databases here that run 35+ GB and growing. Insurance eligibility and claim records. And that's still nothing -- we haven't come anywhere close to needing partitioning, which would allow truly huge numbers.
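
    For the curious, SQL Server 2005 partitioning looks roughly like this (names and boundaries invented):

        -- Split rows into per-year partitions by claim date:
        CREATE PARTITION FUNCTION pf_ClaimYear (DATETIME)
            AS RANGE RIGHT FOR VALUES ('2005-01-01', '2006-01-01', '2007-01-01');

        CREATE PARTITION SCHEME ps_ClaimYear
            AS PARTITION pf_ClaimYear ALL TO ([PRIMARY]);

        CREATE TABLE dbo.Claim (
            claim_id   BIGINT   NOT NULL,
            claim_date DATETIME NOT NULL
        ) ON ps_ClaimYear (claim_date);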
