Admin
This is a cluster**** of a different kind. How do people get paid to develop this sort of thing?
Admin
Those who don't understand the past are doomed to reinvent it badly. Obviously, rather than clustered indexes for dummies, we have table partitioning for dummies. He is in good company, however.
Admin
It seems to me that these folks were trying to use the database just like the local filesystem. You know, a table equals a file, and we don't want to have files too big so they don't take forever to load in Notepad.exe. WTF indeed...
Admin
Rule #1 for not creating more WTFs: do not make pre-emptive "optimizations".
Admin
Well, I'm not so sure I'd be blaming MySQL here... I think maybe the people who used MySQL incorrectly should receive the blame...
Admin
I love it. Reading this WTF is kind of like watching Fox's COPS. It makes you feel really good about yourself.
Admin
MySQL does not have clustered indexes-
Admin
Clustering? Surely they just needed normalisation
Admin
Perhaps MySQL is not the strongest choice for many purposes, but it does not force you to do insane database structures either.
Admin
At least they didn't try to implement it in Access... I can't even imagine how slow this must run, even with MySQL. I thought that had to be a typo: several thousand UNIONs? Just one UNION could be bad enough if the tables were large!
Admin
If your database has dynamic tables then your design is probably screwed up.
Admin
As crazy as this sounds, it's very close to the design that the WordPress MU team decided to go with: http://mu.wordpress.org/ Basically they have a set of 11 tables they create for every blog, named [prefix][blogid][table name], instead of just using one table. To do any type of site-wide analysis (like most recent posts) you have to do some crazy unions.
I asked them about this, and the responses I got were that I didn't understand MySQL and databases, and that backing up a blog was so much easier if you had a separate table for every blog. They ALSO told me that it was MUCH faster to get the posts for a blog from a table that only contained posts for that blog.
Here's a google cache of the topic: http://72.14.209.104/search?q=cache:24MCgqZboDYJ:www.workfriendly.net/browse/Office2003Blue/http/mu.wordpress.org//forums/topic.php%3Fid%3D1834+drdamour+lyceum&hl=en&ct=clnk&cd=1&gl=us
I told them that they obviously didn't understand indexes. My topic was then deleted. Seriously, check it out: you can tell it was manually deleted, as my post was topic #1834 and the one before (http://mu.wordpress.org/forums/topic.php?id=1833) and the one after (http://mu.wordpress.org/forums/topic.php?id=1835) are still there.
Luckily the Lyceum project looks to be more sane about this: http://lyceum.ibiblio.org/about/
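For anyone who hasn't seen what those "crazy unions" look like, here's a rough sketch; the table and column names are illustrative, not the exact MU schema:

    -- One SELECT arm per blog; the query grows with every blog created on the site
    SELECT post_date, post_title FROM wp_1_posts
    UNION ALL
    SELECT post_date, post_title FROM wp_2_posts
    UNION ALL
    SELECT post_date, post_title FROM wp_3_posts
    -- ...and so on, one UNION ALL per blog...
    ORDER BY post_date DESC
    LIMIT 10;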
Admin
That's very strange... are you sure they were not trying to use partitioning, perhaps with a clustered index on another column in these tables? [Perhaps they went with a unique clustered index instead and partitioned on customer due to the volume of data, and perhaps MySQL has some means to grab only those partitions related to the query at hand.] Hell, does MySQL even support partitioning??!?
Bleh. I'll go with WTF, FTW. But it does "sound" like someone was trying to partition the data; it doesn't sound like they were rolling their own clustered index.
Admin
Maybe they get paid by the table...
Admin
When designing a database, try to get it to at least first normal form. Although, that may be a lofty pie-in-the-sky dream for that particular system.
Admin
You can always file a bug report about it:
http://bugs.mysql.com/bug.php?id=22037
Admin
By the table, obviously.
Admin
I suppose we should be grateful that there wasn't some inner loop using a cursor to iterate over each row in each table. Or worse, that it somehow involved xml.
Admin
If you want to make fun of them, just ask why the database function is the one that hits the memory limit while dumping the query on the stack :) Or why the entire thing is 10x slower than, for example, Joomla (which has a superset of its functionality). You don't even have to bring up the argument handling in the functions; just search for the number of 'memory limit' topics on their site :P
Admin
I think WordPress MU's structure grew out of the logical step from how people typically install multiple copies of WordPress on one DB. (Here at my work, we have a few blogs running on standard WordPress, and it's simply 4 different sets of tables all in the same DB -- it just made more sense to do that, to encapsulate all of the blog tables into one DB.)
Would it really be much more efficient to have a single wp_posts table and wp_users table (etc.) instead of wp_joe_posts / wp_joe_users, wp_john_posts / wp_john_users, etc.? That is, is it slower / less optimal to have more tables in the long run? I had never really thought much about it before.
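For comparison, the single-table layout I'm asking about would presumably look something like this (a sketch with invented names, not the real WordPress schema):

    -- One posts table for every blog; the blog is just another indexed column
    CREATE TABLE wp_posts (
        post_id    INT UNSIGNED NOT NULL AUTO_INCREMENT,
        blog_id    INT UNSIGNED NOT NULL,
        post_date  DATETIME     NOT NULL,
        post_title VARCHAR(255) NOT NULL,
        post_body  MEDIUMTEXT,
        PRIMARY KEY (post_id),
        KEY idx_blog_date (blog_id, post_date)  -- serves "posts for blog X" about as well as a dedicated table
    );

    -- One blog's posts: the composite index does what a per-blog table was supposed to do
    SELECT post_date, post_title
    FROM wp_posts
    WHERE blog_id = 42
    ORDER BY post_date DESC;

    -- Site-wide "most recent posts": trivial, no UNIONs
    SELECT post_date, post_title
    FROM wp_posts
    ORDER BY post_date DESC
    LIMIT 10;

And the per-blog backup argument doesn't seem to hold much water either, since mysqldump's --where option can export just one blog's rows from the shared table.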
Admin
Amen to that. Although there could conceivably be cases where you need dynamic tables, I'd say that generally you don't.
Admin
This WTF is just perfect on a perfect day. I'm teaching the last night of a DBM380 class and this WTF will make the perfect closing discussion for the class on how to really screw things up because you're not smart enough to look up Best Practices.
CAPTCHA: doom - as in my career if I tried a design like this.
Admin
[cries softly]
Admin
Is DeVry offering DBA degrees now? Or are these DBAs from that online University of Phoenix thing?
Admin
This is a UOP class. The program definitely is not deep enough for a DBA, but I hope I'm passing on enough knowledge so that designs like this WTF are seriously curtailed.
Admin
Hell yeah!
yeah!!
YEEEEEEEEEEEEEEEAAAAAAAAAAAAAAHHHHH BABY!!!!
Admin
If it makes you feel any better, Wordpress programmers also don't seem to know anything about programming. The front-end code is a disgrace, and their rich editor is a buggy piece of garbage they should be ashamed of.
Admin
As for Wordpress, DB tables for a blog are nowhere near this kind of size; for partitioning we're talking about tables that are several GB or even a few hundred GB, to the point where the index itself is so large that it takes forever to search. Unless you need to split the load across servers or disks, you should never do manual partitioning; this is precisely what the database was designed to do, and it's way better at it than you or I.
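And if you really do reach that point, newer MySQL versions can partition natively; a rough sketch, assuming MySQL 5.1 or later and invented column names:

    -- Native range partitioning: one logical table, storage split by month under the hood
    CREATE TABLE events (
        event_id    BIGINT UNSIGNED NOT NULL,
        customer_id INT UNSIGNED    NOT NULL,
        event_month INT             NOT NULL,  -- e.g. 200701 for January 2007
        payload     VARCHAR(255),
        PRIMARY KEY (event_id, event_month)    -- the partition column must be part of every unique key
    )
    PARTITION BY RANGE (event_month) (
        PARTITION p200701 VALUES LESS THAN (200702),
        PARTITION p200702 VALUES LESS THAN (200703),
        PARTITION pmax    VALUES LESS THAN MAXVALUE
    );

    -- Queries stay ordinary SQL; the optimizer only touches the relevant partitions
    SELECT COUNT(*) FROM events WHERE customer_id = 42 AND event_month = 200701;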
Admin
I don't understand how this can even work in the real world. What if more than one thing happens for a given customer in a given month? How will you fit that in one table?
By having separate tables for each customer and month, they've achieved the first abnormal form. But note all tables have the same schema. That's wrong. To achieve the second abnormal form, adding an event to the table should add a column.
Having achieved the second abnormal form, it's easy enough to implement the third abnormal form: each event column contains a comma-delimited list of attributes for the event.
In the fourth abnormal form, the attributes are not values but file names of files that give the values.
In the fifth abnormal form, the files are snippets of php that generate or calculate the values, returning XML.
--Rank
Admin
Those who don't understand the POST are doomed to repeat it.
Admin
sixth abnormal form: those snippets are generated by a single large perl script.
Admin
Sure... one union per customer, per month.
Complete cockup. One customer table, one product table (probably with some property subtables), one order table, queried by date range. End of story.
If you get zillions of orders a month, maybe dump the old orders to an "Orders_2006" table or whatever at the end of the year... If you're using MyISAM in MySQL you're limited to 2^32 records, but if you're using InnoDB there is no limit outside of the one imposed by your file system... unless you were stupid enough to do something like load image data into individual rows of a table with billions of lines.
Stuff like this is almost always the result of bad design. I work with commercial database products all the time, and they may come with a hundred tables, but chances are you only use 20.
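Something along those lines, sketched with made-up names:

    -- One orders table; "per customer, per month" is a WHERE clause, not a table name
    CREATE TABLE orders (
        order_id    BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
        customer_id INT UNSIGNED    NOT NULL,
        order_date  DATE            NOT NULL,
        total       DECIMAL(10,2)   NOT NULL,
        PRIMARY KEY (order_id),
        KEY idx_cust_date (customer_id, order_date)
    ) ENGINE=InnoDB;

    -- A customer's orders for one month: an indexed range scan, no UNIONs anywhere
    SELECT order_id, order_date, total
    FROM orders
    WHERE customer_id = 42
      AND order_date >= '2006-01-01' AND order_date < '2006-02-01';

    -- If the table really does get huge, archiving a year is three statements, once a year
    CREATE TABLE orders_2006 LIKE orders;
    INSERT INTO orders_2006 SELECT * FROM orders WHERE order_date < '2007-01-01';
    DELETE FROM orders WHERE order_date < '2007-01-01';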
Admin
Like anyone, I learned about normalized tables at school, but at a previous job we needed a data warehouse. So we had training on Red Brick (data warehousing software from Informix, now owned by IBM), and we were taught that in this case it's usual to have a massively indexed, redundant, read-only database, because it's only used for reporting and it's optimal this way.
New records were added in batches importing fresh data from the production database. There were even tables split by month, implying several unions in some reports.
So I don't know if using MySQL is the WTF here. If the production database is split that way, that's indeed stupid, but apparently the system described here only does reports, with new data being imported each month (again, OK, that's a WTF if it's not automated and documented). But either I was told bullshit by an IBM trainer or I don't get the point.
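If you're stuck with month-split tables in a warehouse, a view at least keeps the unions in one place so report writers don't have to repeat them (names invented for the sketch):

    -- Hide the monthly split behind a view; reports query sales_all like one table
    CREATE VIEW sales_all AS
        SELECT * FROM sales_200611
        UNION ALL
        SELECT * FROM sales_200612
        UNION ALL
        SELECT * FROM sales_200701;
    -- (the view has to be recreated whenever a new monthly table is loaded)

MySQL's MERGE storage engine does much the same thing for sets of identical MyISAM tables.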
Admin
Bad tables, bad tables, whatcha gonna doooo whatcha gonna doooo when DBA comes for you!!!
Man, now I will get that song stuck into my head for weeks.
BTW, MySQL defaulting to MyISAM is one of the reasons I switched back to PostgreSQL. The other one was the MySQL team bragging about why they chose not to implement transactions, referential integrity, and such. Funny thing: somewhere between 2002 and 2006, the MySQL team whisked away every single comment like that from its documentation, even from the older releases. Maybe they grew up??
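For anyone who hasn't run into it, a minimal sketch of what the MyISAM default quietly throws away (invented table names; assumes an InnoDB-enabled build):

    -- MyISAM parses this foreign key clause and silently ignores it; InnoDB enforces it
    CREATE TABLE accounts (
        account_id INT UNSIGNED NOT NULL PRIMARY KEY
    ) ENGINE=InnoDB;

    CREATE TABLE payments (
        payment_id INT UNSIGNED NOT NULL PRIMARY KEY,
        account_id INT UNSIGNED NOT NULL,
        FOREIGN KEY (account_id) REFERENCES accounts (account_id)
    ) ENGINE=InnoDB;

    -- Real transactions: all-or-nothing. On MyISAM every statement commits
    -- immediately and there is nothing to roll back.
    START TRANSACTION;
    INSERT INTO accounts VALUES (1);
    INSERT INTO payments VALUES (100, 1);
    COMMIT;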
Admin
aaaaaaaaaaaaaaaaaaaaaaaaa the idiocracy is alive and well....someone, ahem highly paid management should know better....
<captcha: idiots are everywhere they keep following me around/>
Admin
OMFG... WTF??? Obviously these people don't understand what a relational database is, or have never read any books... or any documentation on basic table design... or have never spoken to a real database programmer.
Admin
And you can feel your IQ dropping...
Admin
As someone who isn't a DBA but has had to do quite a bit of DB design and programming in the past year, I've been proud of compressing things into the smallest set of tables possible (without duplicating data needlessly; no over-the-top normalisation here).
The entire concept of having a core schema that is duplicated EVERY MONTH makes me want to kill someone. I've seen schemas (and databases) that are duplicated each year, and they make me cry. How much effort would that year INT NOT NULL field have cost you? Or maybe a table of years? Do you even understand indexes? They're not rocket science!
A design for one person/one year can be extended to multiple people trivially, without needing per-user/per-year copies of the schema with tables prefixed by the username. Did the programmer not stop and think, "Wait a minute, this seems wrong, let me think about this," even once?
It's called up-front design. Get your requirements. Design. It works: X tables, not nX tables.
How far can you take it? 1 million users? 20 tables per user? 20 million tables! Each year? 10 years? 200 million tables? Don't people think that maybe just the 20 tables would have been enough, with a couple of extra fields to identify the user and year? No more complex unions across tables holding the same data just to find that friend-of-a-friend.
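The sane version is barely more work; a sketch with invented names:

    -- One schema, forever: the user and the year are data, not part of a table name
    CREATE TABLE measurements (
        user_id INT UNSIGNED NOT NULL,
        year    SMALLINT     NOT NULL,
        metric  VARCHAR(64)  NOT NULL,
        value   DECIMAL(12,4),
        PRIMARY KEY (user_id, year, metric)
    );

    -- "Per user, per year" is a primary-key lookup, not a new copy of the schema
    SELECT metric, value FROM measurements WHERE user_id = 12345 AND year = 2006;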
Oh, where's my gun. I'm going on a rampage.
Admin
I especially loved the part where some guy tried to prove his point (that zillions of small tables are faster than one big table) with some maths:
Well, let's just say that a very basic property of logarithms allows me to write that lg(32000 * m) = lg(32000) + lg(m)... Nice try, though!
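To put rough numbers on that identity (a back-of-the-envelope sketch, assuming a B-tree index lookup costs on the order of lg(n) comparisons, and say m = 1000 monthly tables of 32,000 rows each):

    lg(32000)         ≈ 15 comparisons per index lookup
    lg(32000 * 1000)  =  lg(32000) + lg(1000)  ≈ 15 + 10 = 25 comparisons

So merging a thousand such tables into one costs about ten extra comparisons per lookup; the split does not make anything a thousand times faster.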
CAPTCHA : muhahaha - how appropriate
Admin
After reading this site for a couple of months, I have started to think that I should learn something about databases. Can anyone recommend a good book or website? After reading it, I would like to know what things like clustered index and normal form mean. If it explains SQL from the start, that would be great too. I would prefer it either not be about a particular DB, or be about MySQL or Postgres, as I am unlikely to get my hands on a more expensive DB any time soon. I have tried looking around on Wikipedia, but for some reason the articles about databases are very poor.
Admin
I'm not sure how easy or hard this is to swallow for someone who's never used a DBMS before, but here you go: http://www.amazon.com/Introduction-Database-Systems-C-Date/dp/0201385902
Admin
http://en.wikipedia.org/wiki/Database_normalization
Admin
Actually, I downloaded MySQL 5 in March 2006, and though it does have InnoDB, the documentation still does a lot of whining about how you really don't need transactions.
At least they are available now in MySQL, so I use it.
<captcha gygax is a "word"?!? Now that is truly a WTF!/>
Admin
IMHO, deeply understanding good database design is one of the three or so things that justifies getting a quality formal computer science education. Seriously -- people who teach themselves get it wrong much too much of the time.
(Also, if you're only going to learn MySQL or PostgreSQL, learn PostgreSQL; it's much closer to Oracle in terms of the set of available features that have been supported long enough to be exhaustively covered even in somewhat aging literature, whereas MySQL gained most serious database features only as comparatively recent additions -- and if you don't learn to use SQL's more serious features, you won't understand what they're good for; Paul Graham's argument about the differences in power between programming languages, and how they shape the way coders think, applies here.)
Admin
Today, you have the choice of getting "express" versions of commercial products from companies such as IBM, Oracle, Sybase or Microsoft for free. Aside from their limitations on storage, memory and feature set compared to the enterprise versions, they're a good starting point. And companies like Oracle also offer the option to download enterprise versions for prototype development.
l.
Admin
Wow. This is actually an example where I'd recommend they use flat files. At least those are easy to understand and hard to mess up, if one has no knowledge whatsoever of how relational databases and normalization work.