- Feature Articles
- CodeSOD
- Error'd
- Forums
-
Other Articles
- Random Article
- Other Series
- Alex's Soapbox
- Announcements
- Best of…
- Best of Email
- Best of the Sidebar
- Bring Your Own Code
- Coded Smorgasbord
- Mandatory Fun Day
- Off Topic
- Representative Line
- News Roundup
- Editor's Soapbox
- Software on the Rocks
- Souvenir Potpourri
- Sponsor Post
- Tales from the Interview
- The Daily WTF: Live
- Virtudyne
Admin
I bet the tables also weren't properly normalized.
Admin
How the heck did this DBA earn his title?
(First!)
Admin
the real wtf is a government agency actually had a win!
Admin
OK, nearly first.
Admin
Define "properly normalized." You do know that normalization is anti-performance, right?
Admin
(Fail!)
Admin
Setting or adding a primary key should help just a tad. Doing lots of joins without a key ? Doesn't seem like its relational.
Admin
I've used databases before, of course, but I'm surprised by what a HUGE difference a few indices would make.
Admin
I'll be they managed to combine anti-performance and no normalization just fine, though.
Admin
And yet, even today the Ruby on Rails guys don't know what a index or a primary key is.
Admin
When I read [i]"There's no way it could've run that fast. I must've totally destroyed the production database."[i], I thought the DBA would then say: "Damn, I'm gonna have to restore the database from a backup, and revoke your database access until we find out what really happened."
Admin
We're getting taxed faster! Awesome! The Process be praised!
Admin
when it comes to the magic figure of millions of rows per table, indexing can be the difference between hours and seconds.
but as they say, seeing is believing and when you touch your hands on production grade data in excess of say 2 million rows, then you will know the difference.
Admin
Race past rebel base. Lt. Rat so startles able Bert's ape car.
Admin
This guy's heart was in the right place but it's clear that he's just a rookie. The real solution is to point out the fact the DB server is running on a 30Mhz processor, then say you can get them a deal on one of these. Don't work smarter, work harder!
Whoops, it was supposed to say 150Mhz. Fixed! -ed.
Admin
Indexs, Keys, who needs them. I tend to store my data in colums rather than rows. Easier to read.
Admin
Oh. Yes. Little Bobby Tables, we call him.
Admin
I would be tempted to type "DROP DATABASE database_name" and just leave it in the query window.
I wouldn't execute it, but just to give the DBA sitting there a heart attack.
Admin
I find lots of people don't like me, especially on the Web, but I've no idea why.
Admin
Clearly, the REAL WTF™ is that he was rewarded for fixing a problem.
Admin
Even having done this before (This database could have been the ERP database for an insurance company I used to work for), I'm still amazed at the performance gains you get through proper indexing and Primary/Foreign Key relationships.
And sadly, no longer surprised when I encounter a database with no keys, indexes, or defined relationships.
Admin
I bet you could throw a really good party with the money saved from sacking the "DBA".
Admin
For a moment, I thought this was a success story. Thanks for pointing that out.
Admin
At least this could be fixed with a few CREATE INDEX statements
I was expecting it to look like this: http://thedailywtf.com/Articles/The_Flat-File_Society.aspx
I bet he was sweating bullets waiting for those indexes to be generated. On my Athlon 2500+, generating an index (in SQLite) on a table with 1-2M rows takes 2-4 minutes.
Admin
Indexes: none
Primary keys: none
Didn't it say they had DBAs? Exactly what qualifies one to be a DBA if they don't even know enough to have something as basic as PKs and Indexes?
Admin
Admin
The difference indexing makes is pretty simple if you think about it. Indexing basically stores another table ordered by whatever column you indexed. With this, you can do a binary search, which is O(log n), which means searching 1,000,000 records takes only twice the time that searching 1000 records. Without indexing, the table will have to be reordered with every query. As far as I know, most databases use mergesort, which is O(n log n), then the binary search O(log n).
so, with 2 Million records in the table: indexed lg(2,000,000) ~ 21 instructions non-indexed 2,000,000 lg(2,000,000) + lg(2,000,000) = ~42,000,021 instructions
Admin
OTOH, we get to read about them here :)
Admin
Let's see, table scanning each 2M row table at least once, possibly many times. Possibly creating temp indexes or hash tables on any joined table. Versus a few index reads and table probes for the relevant records with proper indexes. And you are surprised by the difference?
Admin
Hence doing denormalization afterwards.
Admin
The real WTF is that they aren't using flat files to store the data instead of a slow database!
Admin
Sir, why do you hate America so much? You hate us for our normalized tables, isn't it?
Admin
"I would be tempted to type "DROP DATABASE database_name" and just leave it in the query window."
Yeah, and making sure that you don't somehow automatically press "Enter" after that instruction as you are used to do countless times a day.
Remind me not to let you near a live terminal, I prefer other types of excitement in my life..
Admin
I guess in the government, DBA stands for Dumb Blonde Attraction.
Admin
And you should be fired for such a stunt. It's a production system, you don't even type the command unless you're sure it's what you want to do. It's too easy to hit enter and before you know it, goodnight vienna to the system.
Admin
So what did the DBA do for his zillion dollars per hour?
Admin
Admin
only 150mhz each core, 4 cores. Something like an index on such a large database will DEFINITELY improve performance...
Admin
The IRS's main database is a huge entity called the 'Master Tax File' and consists of assembly language modules which were originally written in 1962 and have been continually patched ever since. He might be a genius at manipulating these tables without knowing anything about relational databases.
Admin
you know you're in trouble when you ask enterprise mangler to generate a db diagram, and you get a little row of ants, off into the distance...
Admin
OT, what is with all the dumb SQL injection comments in here? Are people really so ignorant of databases that they see more than a superficial connection between this story and those ones?
Admin
You know that "properly normalized" is pro-performance?
It's comments like yours that make real DBA's watch developers like a hawk. There is a reason we run Relational Database Engines everywhere these days, when you model the problem correctly and index it properly it runs very very well.
Admin
No, they use a simple linear search which is O(n). The engine will only sort if a sort has been specified in an ORDER BY clause. It may sort to do a join, or it may use hashing.
Far more important than the number of CPU instructions is the amount of disk I/O. An operation on a database is rarely CPU-bound - only if the data will fit in the processor cache. It may be memory-bandwidth-bound if the dataset fits in main memory and the data is already cached. Otherwise you're in the (relatively) glacial arena of disk.
Remember that even a 150MHz processor could manage to execute 42m instructions in well under a second if all the instructions and data fit in its caches! Even the Pentium was superscalar and pipelined, and able to execute simple arithmetic and logical instructions at more than one per clock cycle.
Programmers need to understand the hierarchy of bandwidths and latencies in a modern system. Watch Herb Sutter's presentation (will make most sense to C++ programmers, but managed/Java/scripting programmers should also take note).
Admin
Sooo true. It's hard to believe he was letting that slide. Was he/she using the lack of indexes, etc as a bargaining chip for the future?
Admin
Wow! A story where a developer got the credit they deserve, I almost wept a tear of joy!
Admin
You get a star for effort, good sir
Admin
Are you working with a POS system?
Admin
Bah... these are the DBA's who didn't know enough to build an index? I doubt they'd be phased by "DROP DATABASE" or even "DROP TABLE".
Admin
There's a trade-off to be made between putting something in a different table, and increasing the number of rows per page in the main table, but incurring increased I/O cost to do the join to put them back together for one row, and putting more related things in one table, decreasing the number of rows per page in this table and increasing the table scan cost, but reducing the cost when fetching one row.
Both 'fully normalized' and 'fully denormalized' are usually wrong. Somewhere in between is commonly necessary.
Admin
More importantly, Bobby's changes should have first been tested on a test server, tested, documented, handed over to the DB Admin, implemented on the beta server, re-tested and then implemented on production .
The Real WTF, is that they were experimenting with a production system.