Comment On Roll Your Own Clustered Index

It was a slow day for Levi C. His company had an excellent staff and meticulous development and testing procedures, so calls for maintenance were rare. He was beginning to feel like the Maytag repairman of maintenance programmers. Fortunately, he was delighted to get a call from one of his favorite clients. [expand full text]
« PrevPage 1 | Page 2Next »

Re: Roll Your Own Clustered Index

2007-02-20 13:34 • by my name is missing (unregistered)
This is a cluster**** of a different kind. How do people get paid to develop this sort of thing?

Re: Roll Your Own Clustered Index

2007-02-20 13:36 • by lefty (unregistered)
Those that don't understand the past are doomed to reinvent it badly. Obviously, rather than clustered indexes for dummies we have table partitioning for dummies. He is in good company however.

Re: Roll Your Own Clustered Index

2007-02-20 13:47 • by TheRider
It seems to me that these folks were trying to use the database just like the local filesystem. You know, a table equals a file, and we don't want to have files too big so they don't take forever to load in Notepad.exe. WTF indeed...

Re: Roll Your Own Clustered Index

2007-02-20 13:53 • by Beavis (unregistered)
Rule #1 for not creating more WTFs - do not make pre-emptive "optimizations"

Re: Roll Your Own Clustered Index

2007-02-20 14:01 • by kswanton
wayne:
Sounds as if MySQL was a WTF of it's own in this case.


Well not so sure I'd be blaming MySQL here... I think maybe the people who used MySQL incorrectly should receive the blame...

Re: Roll Your Own Clustered Index

2007-02-20 14:01 • by Red5
I love it.
Reading this WTF is kind of like watching Fox's COPS.
It makes you feel really good about yourself.

Re: Roll Your Own Clustered Index

2007-02-20 14:03 • by anon (unregistered)
MySQL does not have clustered indexes-

Re: Roll Your Own Clustered Index

2007-02-20 14:03 • by Ciarán (unregistered)
Clustering? Surely they just needed normalisation

Re: Roll Your Own Clustered Index

2007-02-20 14:07 • by finnn (unregistered)
Perhaps MySQL is not the strongest choice for many purposes, but it does not force you to do insane database structures either.

Re: Roll Your Own Clustered Index

2007-02-20 14:15 • by Antony Curtis (unregistered)
121522 in reply to 121518
anon:
MySQL does not have clustered indexes-

Yes it does, when you use InnoDB for your storage engine.

Re: Roll Your Own Clustered Index

2007-02-20 14:17 • by sir_flexalot
At least they didn't try to implement in Access... I can't even imagine how slow this must run, even with MySQL. I thought that had to be a typo, several thousand UNION's? just 1 UNION could be bad enough if the tables were large!

Re: Roll Your Own Clustered Index

2007-02-20 14:25 • by DustinMichaels
121524 in reply to 121507
If your database has dynamic tables then your design is probably screwed up.

Re: Roll Your Own Clustered Index

2007-02-20 14:44 • by drdamour
As crazy as this sounds, it's very close to the design that the wordpress mu team decided to go with: http://mu.wordpress.org/ Basically they have a set of 11 tables they create for every blog that is like [prefix]_[blogid]_[table name] instead of just using one table. To do any type of site wide analysis (like most recent posts) you have to do some crazy unions.

I asked them about this and the responses i got were that i didn't understand mysql and databases, and that backing up a blog was so much easier if you had a seperate table for every blog. They ALSO told me that it was MUCH faster to get the posts for a blog from a table that only contained posts for that blog.

Here's a google cache of the topic: http://72.14.209.104/search?q=cache:24MCgqZboDYJ:www.workfriendly.net/browse/Office2003Blue/http/mu.wordpress.org//forums/topic.php%3Fid%3D1834+drdamour+lyceum&hl=en&ct=clnk&cd=1&gl=us

I told them that they obviously didn't understand indexes. My topic was then deleted. Seriously check it out: You can tell it was manually deleted as my post was topic #1834 and the one before (http://mu.wordpress.org/forums/topic.php?id=1833) and the one after (http://mu.wordpress.org/forums/topic.php?id=1835) are still there.

Luckily the lyceum project looks to be more sane about this: http://lyceum.ibiblio.org/about/

Re: Roll Your Own Clustered Index

2007-02-20 14:50 • by barf indeedy... (unregistered)
that's very strange... sure they were not trying to use partitioning, perhaps with a clustered index existing on another column in these tables? [perhaps they went with a unique clustered index instead, and partitioned on customer due to volume of data, and perhaps MySQL has some sort of means to grab only those paritions related to the query at hand] Hell, does MySQL even support partitioning??!?

bleh. I'll go with WTF, FTW. But it does "sound" like someone was trying to parition the data, doesn't sound like they were rolling their own clustered index..

Re: Roll Your Own Clustered Index

2007-02-20 14:57 • by Johnny (unregistered)
Maybe they get paid by the table...

Re: Roll Your Own Clustered Index

2007-02-20 15:00 • by Some newbie (unregistered)
When designing a database, try to get it to at least first normal form. Although, that may be a lofty pie-in-the-sky dream for that particular system.

Re: Roll Your Own Clustered Index

2007-02-20 15:04 • by Pap
121535 in reply to 121526
drdamour:
As crazy as this sounds, it's very close to the design that the wordpress mu team decided to go with: http://mu.wordpress.org/ Basically they have a set of 11 tables they create for every blog that is like [prefix]_[blogid]_[table name] instead of just using one table. To do any type of site wide analysis (like most recent posts) you have to do some crazy unions.

I asked them about this and the responses i got were that i didn't understand mysql and databases, and that backing up a blog was so much easier if you had a seperate table for every blog. They ALSO told me that it was MUCH faster to get the posts for a blog from a table that only contained posts for that blog.

Here's a google cache of the topic: http://72.14.209.104/search?q=cache:24MCgqZboDYJ:www.workfriendly.net/browse/Office2003Blue/http/mu.wordpress.org//forums/topic.php%3Fid%3D1834+drdamour+lyceum&hl=en&ct=clnk&cd=1&gl=us

I told them that they obviously didn't understand indexes. My topic was then deleted. Seriously check it out: You can tell it was manually deleted as my post was topic #1834 and the one before (http://mu.wordpress.org/forums/topic.php?id=1833) and the one after (http://mu.wordpress.org/forums/topic.php?id=1835) are still there.

Luckily the lyceum project looks to be more sane about this: http://lyceum.ibiblio.org/about/


You can always file a bug report about it:

http://bugs.mysql.com/bug.php?id=22037

Re: Roll Your Own Clustered Index

2007-02-20 15:15 • by Your Name (unregistered)
121540 in reply to 121507
my name is missing:
This is a cluster**** of a different kind. How do people get paid to develop this sort of thing?


By the table, obviously.

Re: Roll Your Own Clustered Index

2007-02-20 15:18 • by snoofle
121542 in reply to 121540
I suppose we should be grateful that there wasn't some inner loop using a cursor to iterate over each row in each table. Or worse, that it somehow involved xml.

Re: Roll Your Own Clustered Index

2007-02-20 15:18 • by mbvlist
121543 in reply to 121526
drdamour:
As crazy as this sounds, it's very close to the design that the wordpress mu team decided to go with: http://mu.wordpress.org/ Basically they have a set of 11 tables they create for every blog that is like [prefix]_[blogid]_[table name] instead of just using one table. To do any type of site wide analysis (like most recent posts) you have to do some crazy unions.

I asked them about this and the responses i got were that i didn't understand mysql and databases, and that backing up a blog was so much easier if you had a seperate table for every blog. They ALSO told me that it was MUCH faster to get the posts for a blog from a table that only contained posts for that blog.

Here's a google cache of the topic: http://72.14.209.104/search?q=cache:24MCgqZboDYJ:www.workfriendly.net/browse/Office2003Blue/http/mu.wordpress.org//forums/topic.php%3Fid%3D1834+drdamour+lyceum&hl=en&ct=clnk&cd=1&gl=us

I told them that they obviously didn't understand indexes. My topic was then deleted. Seriously check it out: You can tell it was manually deleted as my post was topic #1834 and the one before (http://mu.wordpress.org/forums/topic.php?id=1833) and the one after (http://mu.wordpress.org/forums/topic.php?id=1835) are still there.

Luckily the lyceum project looks to be more sane about this: http://lyceum.ibiblio.org/about/


If you want to make fun with them, just ask why the database-function is the function that hits the memory limit while dumping the query on the stack :) Or why the entire thing is 10x slower than, for example, Joomla (which has a superset of functionality). You don't even have to start about the argument handling in the functions, just search the amount of 'memory limit'-topics on their site :P

Re: Roll Your Own Clustered Index

2007-02-20 15:19 • by aaron (unregistered)
I think Wordpress MU's structure grew out of the logical step of how people typically install multiple copies of wordpress on one DB. (Here at my work, we have a few blogs running on Wordpress standard, and it's simply 4 different sets of tables all in the same DB -- it just made more sense to do that, to encapsulate all of the blog tables into one DB).

Would it really be much more efficient to do a single wp_posts table and wp_users table (etc.) instead of wp_joe_posts / wp_joe_users, wp_john_posts / wp_john_users, etc? That is, is it slower / less optimal to have more tables in the long run? I had never really thought much into it before.

Re: Roll Your Own Clustered Index

2007-02-20 15:28 • by SomeCoder (unregistered)
121547 in reply to 121524
DustinMichaels:
If your database has dynamic tables then your design is probably screwed up.


Amen to that. Although there could concievably be cases where you need dynamic tables, I'd say generally, you don't.

Re: Roll Your Own Clustered Index

2007-02-20 15:28 • by DaDon (unregistered)
This WTF is just perfect on a perfect day. I'm teaching the last night of a DBM380 class and this WTF will make the perfect closing discussion for the class on how to really screw things up because you're not smart enough to look up Best Practices.

CAPTCHA: doom - as in my career if I tried a design like this.

Re: Roll Your Own Clustered Index

2007-02-20 15:36 • by Tukaro
[cries softly]

Re: Roll Your Own Clustered Index

2007-02-20 15:39 • by dustin (unregistered)
Is DeVry offering DBA degrees now? Or are these dbas from that online University of Pheonix thing?

Re: Roll Your Own Clustered Index

2007-02-20 15:49 • by DaDon (unregistered)
121551 in reply to 121550
This is a UOP class. The program definately is not deep enough for a dba, but I hope I'm passing enough knowledge so that designs like this WTF are seriously curtailed.


Re: Roll Your Own Clustered Index

2007-02-20 16:03 • by Steve B (unregistered)
121552 in reply to 121517
Red5:
I love it.
Reading this WTF is kind of like watching Fox's COPS.
It makes you feel really good about yourself.


Hell yeah!

yeah!!

YEEEEEEEEEEEEEEEAAAAAAAAAAAAAAHHHHH BABY!!!!

Re: Roll Your Own Clustered Index

2007-02-20 16:44 • by rogthefrog
121560 in reply to 121526
drdamour:
As crazy as this sounds, it's very close to the design that the wordpress mu team decided to go with: http://mu.wordpress.org/ Basically they have a set of 11 tables they create for every blog that is like [prefix]_[blogid]_[table name] instead of just using one table. To do any type of site wide analysis (like most recent posts) you have to do some crazy unions.

I asked them about this and the responses i got were that i didn't understand mysql and databases, and that backing up a blog was so much easier if you had a seperate table for every blog. They ALSO told me that it was MUCH faster to get the posts for a blog from a table that only contained posts for that blog.

Here's a google cache of the topic: http://72.14.209.104/search?q=cache:24MCgqZboDYJ:www.workfriendly.net/browse/Office2003Blue/http/mu.wordpress.org//forums/topic.php%3Fid%3D1834+drdamour+lyceum&hl=en&ct=clnk&cd=1&gl=us

I told them that they obviously didn't understand indexes. My topic was then deleted.


If it makes you feel any better, Wordpress programmers also don't seem to know anything about programming. The front-end code is a disgrace, and their rich editor is a buggy piece of garbage they should be ashamed of.

Re: Roll Your Own Clustered Index

2007-02-20 16:46 • by Aaron (unregistered)
121561 in reply to 121544
aaron:
That is, is it slower / less optimal to have more tables in the long run? I had never really thought much into it before.

It's definitely slower. There are some rare cases where a table gets so ridiculously massive that you need to partition it, but that partitioning is done across drives or servers so the operation becomes parallelizable - and even then it's almost always put back together with an indexed view so it "looks" like a single table. It's never the goal to have logical separation of relational data, even if it has to be physically separated for some reason.

As for Wordpress, DB tables for a blog are nowhere near this kind of size; for partitioning we're talking about tables that are several GB or even a few hundred GB, to the point where the index itself is so large that it takes forever to search. Unless you need to split the load across servers or disks, you should never do manual partitioning; this is precisely what the database was designed to do, and it's way better at it than you or I.

Re: Roll Your Own Clustered Index

2007-02-20 16:59 • by Carl T (unregistered)
121563 in reply to 121547
SomeCoder:
DustinMichaels:
If your database has dynamic tables then your design is probably screwed up.

Amen to that. Although there could concievably be cases where you need dynamic tables, I'd say generally, you don't.

I believe I've worked with one of those cases. Microarray data. One set of data is maybe a few hundred times fifty thousand rows times a few column. Tens or hundreds of these in an experiment, mostly with the same columns. Then maybe a thousand experiments, usually (but not necessarily) with the same set of columns again. Making one table per experiment gave us lots of gigabyte-sized files rather than one huge lump (using MySQL with one file per table). If nothing else, incremental backups with a granularity of whole files is a lot easier this way, as is deleting whole experiments.
We *could* of course have done it the proper way, but then again we *could* have had more time and money, etc.

Re: Roll Your Own Clustered Index

2007-02-20 17:02 • by Rank Amateur
I don't understand how this can even work in the real world. What if more than one thing happens for a given customer in a given month? How will you fit that in one table?

By having separate tables for each customer and month, they've achieved the first abnormal form. But note all tables have the same schema. That's wrong. To achieve the second abnormal form, adding an event to the table should add a column.

Having achieved the second abnormal form, it's easy enough to implement the third abnormal form: each event column contains a comma-delimited list of attributes for the event.

In the fourth abnormal form, the attributes are not values but file names of files that give the values.

In the fifth abnormal form, the files are snippets of php that generate or calculate the values, returning XML.

--Rank

Re: Roll Your Own Clustered Index

2007-02-20 17:32 • by Wene Gerchinko (unregistered)
Those who don't understand the POST are doomed to repeat it.

Re: Roll Your Own Clustered Index

2007-02-20 17:35 • by anonymous (unregistered)
121572 in reply to 121564
sixth abnormal form: those snippets are generated by a single large perl script.

Re: Roll Your Own Clustered Index

2007-02-20 17:49 • by Satanicpuppy
121573 in reply to 121523
sir_flexalot:
At least they didn't try to implement in Access... I can't even imagine how slow this must run, even with MySQL. I thought that had to be a typo, several thousand UNION's? just 1 UNION could be bad enough if the tables were large!


Sure...One union per customer, per month.

Complete cockup. One customer table, one product table (probably with some property subtables) one order table, query based on date ranges. End of story.

If you get zillions of orders a month, maybe dump the old orders table to a "Orders_2006" or whatever at the end of the year...If you're using MyISAM in mysql you're limited to 2^32 records, but if you're using INNOdb, there is no limit outside of the one imposed by your file system...Unless you were stupid enough to do something like load image data into individual rows of a table with billions of lines.

Stuff like this is almost always a result of bad design. I work with commerical database products all the time, and they may come with a hundred tables, but chances are, you only use 20.

Re: Roll Your Own Clustered Index

2007-02-20 18:02 • by newt0311 (unregistered)
121575 in reply to 121524
DustinMichaels:
If your database has dynamic tables then your design is probably screwed up.

No, if your design has dynamic tables, you can be damn sure that your design is screwed up. Dynamic tables violate the entire intent of RM. The people who developed this system shoud be shot, hanged, and publicly beaten.

Re: Roll Your Own Clustered Index

2007-02-20 18:03 • by aikii
Like anyone I learned about normalized tables at school, but in a previous job, we had the need for a data warehouse. So we had a training about redbrick ( data warehousing software from informix, now owned by IBM ) and we were taught that in this case it's usual to have a massively indexed, redundant and read-only database because it's only used for reporting, and it's optimal this way.

New records were added by batches importing fresh data from production database. There were even tables split by month, implying several unions in some reports.

So I don't know if using mysql is a WTF here. If the production database is split that way that's indeed stupid, but apparently the system described here does only reports, new data being imported each month ( again, ok, that's a WTF if it's not automated and documented ). But either I was told bullshit by an IBM trainer or I don't get the point.

Re: Roll Your Own Clustered Index

2007-02-20 18:45 • by danixdefcon5
121580 in reply to 121517
Red5:
I love it.
Reading this WTF is kind of like watching Fox's COPS.
It makes you feel really good about yourself.


Bad tables, bad tables,
whatcha gonna doooo
whatcha gonna doooo when DBA comes for you!!!

Man, now I will get that song stuck into my head for weeks.

BTW, MySQL defaulting to MyISAM is one of the reasons I switched back to PostgreSQL. The other one was the mySQL team bragging about why they chose not to implement transactions, referential integrity and such. Funny thing: somewhere between 2002 and 2006, the mySQL team whisked away every single comment like that from its documentation; even from the older releases. Maybe they grew up??

Re: Roll Your Own Clustered Index

2007-02-20 18:53 • by Ghost Ware Wizard
aaaaaaaaaaaaaaaaaaaaaaaaa
the idiocracy is alive and well....someone, ahem highly paid *management* should know better....

<captcha: idiots are everywhere they keep following me around/>

Re: Roll Your Own Clustered Index

2007-02-20 18:56 • by Terminator (unregistered)
OMFG .... WTF ??? Obviously these people dont understand what a relational database is or have never read any books ..... or any documentation on basic table design ... or have ever spoken to a real database programmer..

Re: Roll Your Own Clustered Index

2007-02-20 18:59 • by grg (unregistered)
121583 in reply to 121517
Red5:
I love it.
Reading this WTF is kind of like watching Fox's COPS.
It makes you feel really good about yourself.


And you can feel your IQ dropping...

Re: Roll Your Own Clustered Index

2007-02-20 20:16 • by htg (unregistered)
As someone who isn't a DBA, but has had to do quite a bit of DB design and programming in the past year, I've been proud of compressing as many things into the fewest set of tables (without duplicating data needlessly, no over-the-top normalisation here) as possible.

The entire concept of having a core schema that is duplicated EVERY MONTH makes me want to kill someone. I've seen schemas (and databases) that are duplicated each year, and they make me cry. How much effort would that year int NOT NULL field have cost you? Or maybe a table of years? Do you even understand indexes - they're not rocket science!

A design for one person/one year can be extended to multiple people trivially without needing per-user/year copies of the schema with tables prefixed with the username. Did the programmer not stop and think "Wait a minute, this seems wrong, let me think about this" once?

It's called up-front-design. Get your requirements. Design. It works, X tables, not nX tables.

How far can you take it? 1 million users? 20 tables per user? 20 million tables! Each year? 10 years? 200 million tables? Don't people think that maybe just the 20 tables would have been enough, with a couple of extra fields to identify the user and year? No more complex unions across tables holding the same data just to find that friend-of-a-friend.

Oh, where's my gun. I'm going on a rampage.

Re: Roll Your Own Clustered Index

2007-02-20 20:59 • by Florent (unregistered)
121589 in reply to 121526
drdamour:
I asked them about this and the responses i got were that i didn't understand mysql and databases, and that backing up a blog was so much easier if you had a seperate table for every blog. They ALSO told me that it was MUCH faster to get the posts for a blog from a table that only contained posts for that blog.

Here's a google cache of the topic: http://72.14.209.104/search?q=cache:24MCgqZboDYJ:www.workfriendly.net/browse/Office2003Blue/http/mu.wordpress.org//forums/topic.php%3Fid%3D1834+drdamour+lyceum&hl=en&ct=clnk&cd=1&gl=us


I especially loved the part where some guy tried to prove its point (zillions of small tables are faster than one big table) with some maths :

How would:

finding your table = lg(32000) PLUS

finding your info = lg(m)

for m = number of elements in a user table

compare to

finding your info in a big table lg(n)

for n = 32000 * m


Well, let's just say that a very basic property of logarithms allows me to write that lg(32000 * m) = lg(32000) + lg(m) ... Nice try, though !

CAPTCHA : muhahaha - how appropriate

Re: Roll Your Own Clustered Index

2007-02-20 21:03 • by triso
121590 in reply to 121582
Terminator:
OMFG .... WTF ??? Obviously these people dont understand what a relational database is or have never read any books ..... or any documentation on basic table design ... or have ever spoken to a real database programmer..
Well, the problem is that these things work okay with a few hundred records but it does not scale.

Re: Roll Your Own Clustered Index

2007-02-20 21:18 • by TK (unregistered)
After reading this site for a couple of months, I have started to think that I should learn something about databases. Can anyone recommend a good book or website? After reading it, I would like to know what things like clustered index and normal form means. If it explains SQL from the start that would be great too. I would prefer if it's either not about a particular db, or about MySQL or Postgres, as I am unlikely to get my hand on a more expensive db any time soon. I have tried to look around on Wikipedia, for for some reason the articles about databases are very poor.

Re: Roll Your Own Clustered Index

2007-02-20 21:39 • by mountain (unregistered)
121594 in reply to 121592
I'm not sure how easy or hard this is to swallow for someone who's never used a DBMS before, but here you go: http://www.amazon.com/Introduction-Database-Systems-C-Date/dp/0201385902

Re: Roll Your Own Clustered Index

2007-02-20 23:34 • by Pap

Re: Roll Your Own Clustered Index

2007-02-21 00:49 • by Joseph Newton (unregistered)
121602 in reply to 121580
Actually, I wdwonloaded MySQL 5 in March, 2006, and though it does have InnoDB, the documentation still does a lot of whining about how you really don't need transactions.

At least they are available now in MySQL, so I use it.

<captcha gygax is a "word"?!? Now that is truly a WTF!/>

Re: Roll Your Own Clustered Index

2007-02-21 01:16 • by Duff (unregistered)
121604 in reply to 121592
TK:
After reading this site for a couple of months, I have started to think that I should learn something about databases. Can anyone recommend a good book or website?


IMHO, deeply understanding good database design is one of the three or so things that justifies getting a quality formal computer science education. Seriously -- people who teach themselves get it wrong much too much of the time.

(Also, if you're only going to learn MySQL or PostgreSQL, learn PostgreSQL; it's much closer to Oracle in terms of the set of available features which have been supported long enough to have them exhaustively covered in even somewhat aging literature, whereas MySQL has most serious database features only as comparatively recent additions -- and if you don't learn to use SQL's more serious features, you won't understand what they're good for; Paul Graham's argument about the differences in power between programming languages and the impact of such on how coders think when using them applies).

Re: Roll Your Own Clustered Index

2007-02-21 01:19 • by lofwyr
121605 in reply to 121592
TK:
I would prefer if it's either not about a particular db, or about MySQL or Postgres, as I am unlikely to get my hand on a more expensive db any time soon.


Today, you'll have the choice of getting "express" versions of commercial products from companies such as IBM, Oracle, Sybase or Microsoft for free. Aside from their limitations regarding storage, memory and feature set - compared to their enterprise versions - they're a good starting point. And companies like Oracle offer you the option to download also enterprise versions for prototype development.

l.

Re: Roll Your Own Clustered Index

2007-02-21 02:14 • by Arancaytar
121608 in reply to 121507
Wow. This is actually an example where I'd recommend they use flat files. At least those are easy to understand and hard to mess up, if one has no knowledge whatsoever of how relational databases and normalization work.
« PrevPage 1 | Page 2Next »

Add Comment