• my name is missing (unregistered)

    This is a cluster**** of a different kind. How do people get paid to develop this sort of thing?

  • lefty (unregistered)

    Those who don't understand the past are doomed to reinvent it badly. Obviously, rather than clustered indexes for dummies, we have table partitioning for dummies. He is in good company, however.

  • (cs)

    It seems to me that these folks were trying to use the database just like the local filesystem. You know, a table equals a file, and we don't want to have files too big so they don't take forever to load in Notepad.exe. WTF indeed...

  • Beavis (unregistered)

    Rule #1 for not creating more WTFs: do not make pre-emptive "optimizations".

  • (cs)
    wayne:
    Sounds as if MySQL was a WTF of its own in this case.

    Well not so sure I'd be blaming MySQL here... I think maybe the people who used MySQL incorrectly should receive the blame...

  • (cs)

    I love it. Reading this WTF is kind of like watching Fox's COPS. It makes you feel really good about yourself.

  • anon (unregistered)

    MySQL does not have clustered indexes.

  • Ciarán (unregistered)

    Clustering? Surely they just needed normalisation.

  • finnn (unregistered)

    Perhaps MySQL is not the strongest choice for many purposes, but it does not force you to do insane database structures either.

  • Antony Curtis (unregistered) in reply to anon
    anon:
    MySQL does not have clustered indexes-
    Yes it does, when you use InnoDB for your storage engine.
  • (cs)

    At least they didn't try to implement it in Access... I can't even imagine how slow this must run, even with MySQL. I thought that had to be a typo: several thousand UNIONs? Just one UNION could be bad enough if the tables were large!
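
    A rough sketch of the query explosion being mocked here (all table and column names are invented; the article doesn't show the real schema): with one table per customer per month, a single report needs one UNION branch per table, while a normalized schema needs a single WHERE clause.

```python
# Hypothetical illustration of the per-customer-per-month anti-pattern.
# Every name here is made up for the sake of the sketch.

def per_table_report(customer_ids, months):
    """Build the giant UNION that the fragmented schema forces on you."""
    branches = [
        f"SELECT * FROM events_{c}_{m:02d}"
        for c in customer_ids
        for m in months
    ]
    return "\nUNION ALL\n".join(branches)

def normalized_report(year):
    """The same report against a single normalized table."""
    return f"SELECT * FROM events WHERE year = {year}"

query = per_table_report(customer_ids=range(1, 501), months=range(1, 13))
# 500 customers x 12 months = 6000 SELECT branches in one statement
print(query.count("UNION ALL"))  # 5999
```

    With even modest customer counts the generated statement runs to thousands of lines, which matches the "several thousand UNIONs" claim in the article.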

  • (cs) in reply to my name is missing

    If your database has dynamic tables then your design is probably screwed up.

  • (cs)

    As crazy as this sounds, it's very close to the design that the wordpress mu team decided to go with: http://mu.wordpress.org/ Basically they have a set of 11 tables they create for every blog, named like [prefix]_[blogid]_[table name], instead of just using one table. To do any type of site-wide analysis (like most recent posts) you have to do some crazy unions.

    I asked them about this and the responses I got were that I didn't understand MySQL and databases, and that backing up a blog was so much easier if you had a separate table for every blog. They ALSO told me that it was MUCH faster to get the posts for a blog from a table that only contained posts for that blog.

    Here's a google cache of the topic: http://72.14.209.104/search?q=cache:24MCgqZboDYJ:www.workfriendly.net/browse/Office2003Blue/http/mu.wordpress.org//forums/topic.php%3Fid%3D1834+drdamour+lyceum&hl=en&ct=clnk&cd=1&gl=us

    I told them that they obviously didn't understand indexes. My topic was then deleted. Seriously check it out: You can tell it was manually deleted as my post was topic #1834 and the one before (http://mu.wordpress.org/forums/topic.php?id=1833) and the one after (http://mu.wordpress.org/forums/topic.php?id=1835) are still there.

    Luckily the lyceum project looks to be more sane about this: http://lyceum.ibiblio.org/about/

  • barf indeedy... (unregistered)

    that's very strange... sure they were not trying to use partitioning, perhaps with a clustered index existing on another column in these tables? [perhaps they went with a unique clustered index instead, and partitioned on customer due to volume of data, and perhaps MySQL has some sort of means to grab only those partitions related to the query at hand] Hell, does MySQL even support partitioning??!?

    bleh. I'll go with WTF, FTW. But it does "sound" like someone was trying to partition the data; it doesn't sound like they were rolling their own clustered index...

  • Johnny (unregistered)

    Maybe they get paid by the table...

  • Some newbie (unregistered)

    When designing a database, try to get it to at least first normal form. Although, that may be a lofty pie-in-the-sky dream for that particular system.

  • (cs) in reply to drdamour
    drdamour:
    As crazy as this sounds, it's very close to the design that the wordpress mu team decided to go with: http://mu.wordpress.org/ Basically they have a set of 11 tables they create for every blog that is like [prefix]_[blogid]_[table name] instead of just using one table. To do any type of site wide analysis (like most recent posts) you have to do some crazy unions.

    I asked them about this and the responses I got were that I didn't understand MySQL and databases, and that backing up a blog was so much easier if you had a separate table for every blog. They ALSO told me that it was MUCH faster to get the posts for a blog from a table that only contained posts for that blog.

    Here's a google cache of the topic: http://72.14.209.104/search?q=cache:24MCgqZboDYJ:www.workfriendly.net/browse/Office2003Blue/http/mu.wordpress.org//forums/topic.php%3Fid%3D1834+drdamour+lyceum&hl=en&ct=clnk&cd=1&gl=us

    I told them that they obviously didn't understand indexes. My topic was then deleted. Seriously check it out: You can tell it was manually deleted as my post was topic #1834 and the one before (http://mu.wordpress.org/forums/topic.php?id=1833) and the one after (http://mu.wordpress.org/forums/topic.php?id=1835) are still there.

    Luckily the lyceum project looks to be more sane about this: http://lyceum.ibiblio.org/about/

    You can always file a bug report about it:

    http://bugs.mysql.com/bug.php?id=22037

  • Your Name (unregistered) in reply to my name is missing
    my name is missing:
    This is a cluster**** of a different kind. How do people get paid to develop this sort of thing?

    By the table, obviously.

  • (cs) in reply to Your Name

    I suppose we should be grateful that there wasn't some inner loop using a cursor to iterate over each row in each table. Or worse, that it somehow involved xml.

  • (cs) in reply to drdamour
    drdamour:
    As crazy as this sounds, it's very close to the design that the wordpress mu team decided to go with: http://mu.wordpress.org/ Basically they have a set of 11 tables they create for every blog that is like [prefix]_[blogid]_[table name] instead of just using one table. To do any type of site wide analysis (like most recent posts) you have to do some crazy unions.

    I asked them about this and the responses I got were that I didn't understand MySQL and databases, and that backing up a blog was so much easier if you had a separate table for every blog. They ALSO told me that it was MUCH faster to get the posts for a blog from a table that only contained posts for that blog.

    Here's a google cache of the topic: http://72.14.209.104/search?q=cache:24MCgqZboDYJ:www.workfriendly.net/browse/Office2003Blue/http/mu.wordpress.org//forums/topic.php%3Fid%3D1834+drdamour+lyceum&hl=en&ct=clnk&cd=1&gl=us

    I told them that they obviously didn't understand indexes. My topic was then deleted. Seriously check it out: You can tell it was manually deleted as my post was topic #1834 and the one before (http://mu.wordpress.org/forums/topic.php?id=1833) and the one after (http://mu.wordpress.org/forums/topic.php?id=1835) are still there.

    Luckily the lyceum project looks to be more sane about this: http://lyceum.ibiblio.org/about/

    If you want to make fun of them, just ask why the database function is the one that hits the memory limit while dumping the query on the stack :) Or why the entire thing is 10x slower than, for example, Joomla (which has a superset of functionality). You don't even have to start on the argument handling in the functions; just search for the number of 'memory limit' topics on their site :P

  • aaron (unregistered)

    I think Wordpress MU's structure grew out of the logical step of how people typically install multiple copies of wordpress on one DB. (Here at my work, we have a few blogs running on Wordpress standard, and it's simply 4 different sets of tables all in the same DB -- it just made more sense to do that, to encapsulate all of the blog tables into one DB).

    Would it really be much more efficient to do a single wp_posts table and wp_users table (etc.) instead of wp_joe_posts / wp_joe_users, wp_john_posts / wp_john_users, etc? That is, is it slower / less optimal to have more tables in the long run? I had never really thought much into it before.

  • SomeCoder (unregistered) in reply to DustinMichaels
    DustinMichaels:
    If your database has dynamic tables then your design is probably screwed up.

    Amen to that. Although there could conceivably be cases where you need dynamic tables, I'd say generally, you don't.

  • DaDon (unregistered)

    This WTF is just perfect on a perfect day. I'm teaching the last night of a DBM380 class and this WTF will make the perfect closing discussion for the class on how to really screw things up because you're not smart enough to look up Best Practices.

    CAPTCHA: doom - as in my career if I tried a design like this.

  • (cs)

    [cries softly]

  • dustin (unregistered)

    Is DeVry offering DBA degrees now? Or are these DBAs from that online University of Phoenix thing?

  • DaDon (unregistered) in reply to dustin

    This is a UOP class. The program definitely is not deep enough for a DBA, but I hope I'm passing on enough knowledge so that designs like this WTF are seriously curtailed.

  • Steve B (unregistered) in reply to Red5
    Red5:
    I love it. Reading this WTF is kind of like watching Fox's COPS. It makes you feel really good about yourself.

    Hell yeah!

    yeah!!

    YEEEEEEEEEEEEEEEAAAAAAAAAAAAAAHHHHH BABY!!!!

  • (cs) in reply to drdamour
    drdamour:
    As crazy as this sounds, it's very close to the design that the wordpress mu team decided to go with: http://mu.wordpress.org/ Basically they have a set of 11 tables they create for every blog that is like [prefix]_[blogid]_[table name] instead of just using one table. To do any type of site wide analysis (like most recent posts) you have to do some crazy unions.

    I asked them about this and the responses I got were that I didn't understand MySQL and databases, and that backing up a blog was so much easier if you had a separate table for every blog. They ALSO told me that it was MUCH faster to get the posts for a blog from a table that only contained posts for that blog.

    Here's a google cache of the topic: http://72.14.209.104/search?q=cache:24MCgqZboDYJ:www.workfriendly.net/browse/Office2003Blue/http/mu.wordpress.org//forums/topic.php%3Fid%3D1834+drdamour+lyceum&hl=en&ct=clnk&cd=1&gl=us

    I told them that they obviously didn't understand indexes. My topic was then deleted.

    If it makes you feel any better, Wordpress programmers also don't seem to know anything about programming. The front-end code is a disgrace, and their rich editor is a buggy piece of garbage they should be ashamed of.

  • Aaron (unregistered) in reply to aaron
    aaron:
    That is, is it slower / less optimal to have more tables in the long run? I had never really thought much into it before.
    It's definitely slower. There are some rare cases where a table gets so ridiculously massive that you need to partition it, but that partitioning is done across drives or servers so the operation becomes parallelizable - and even then it's almost always put back together with an indexed view so it "looks" like a single table. It's never the goal to have logical separation of relational data, even if it has to be physically separated for some reason.

    As for Wordpress, DB tables for a blog are nowhere near this kind of size; for partitioning we're talking about tables that are several GB or even a few hundred GB, to the point where the index itself is so large that it takes forever to search. Unless you need to split the load across servers or disks, you should never do manual partitioning; this is precisely what the database was designed to do, and it's way better at it than you or I.

  • Carl T (unregistered) in reply to SomeCoder
    SomeCoder:
    DustinMichaels:
    If your database has dynamic tables then your design is probably screwed up.
    Amen to that. Although there could conceivably be cases where you need dynamic tables, I'd say generally, you don't.
    I believe I've worked with one of those cases. Microarray data. One set of data is maybe a few hundred times fifty thousand rows times a few columns. Tens or hundreds of these in an experiment, mostly with the same columns. Then maybe a thousand experiments, usually (but not necessarily) with the same set of columns again. Making one table per experiment gave us lots of gigabyte-sized files rather than one huge lump (using MySQL with one file per table). If nothing else, incremental backups with a granularity of whole files are a lot easier this way, as is deleting whole experiments. We *could* of course have done it the proper way, but then again we *could* have had more time and money, etc.
  • (cs)

    I don't understand how this can even work in the real world. What if more than one thing happens for a given customer in a given month? How will you fit that in one table?

    By having separate tables for each customer and month, they've achieved the first abnormal form. But note all tables have the same schema. That's wrong. To achieve the second abnormal form, adding an event to the table should add a column.

    Having achieved the second abnormal form, it's easy enough to implement the third abnormal form: each event column contains a comma-delimited list of attributes for the event.

    In the fourth abnormal form, the attributes are not values but file names of files that give the values.

    In the fifth abnormal form, the files are snippets of php that generate or calculate the values, returning XML.

    --Rank

  • Wene Gerchinko (unregistered)

    Those who don't understand the POST are doomed to repeat it.

  • anonymous (unregistered) in reply to Rank Amateur

    sixth abnormal form: those snippets are generated by a single large perl script.

  • (cs) in reply to sir_flexalot
    sir_flexalot:
    At least they didn't try to implement it in Access... I can't even imagine how slow this must run, even with MySQL. I thought that had to be a typo: several thousand UNIONs? Just one UNION could be bad enough if the tables were large!

    Sure...One union per customer, per month.

    Complete cockup. One customer table, one product table (probably with some property subtables) one order table, query based on date ranges. End of story.

    If you get zillions of orders a month, maybe dump the old orders table to an "Orders_2006" or whatever at the end of the year... If you're using MyISAM in MySQL you're limited to 2^32 records, but if you're using InnoDB, there is no limit outside of the one imposed by your file system... unless you were stupid enough to do something like load image data into individual rows of a table with billions of lines.

    Stuff like this is almost always a result of bad design. I work with commercial database products all the time, and they may come with a hundred tables, but chances are, you only use 20.

  • newt0311 (unregistered) in reply to DustinMichaels
    DustinMichaels:
    If your database has dynamic tables then your design is probably screwed up.
    No, if your design has dynamic tables, you can be damn sure that your design is screwed up. Dynamic tables violate the entire intent of the relational model. The people who developed this system should be shot, hanged, and publicly beaten.
  • (cs)

    Like anyone, I learned about normalized tables at school, but in a previous job we had the need for a data warehouse. So we had a training on Red Brick (data warehousing software from Informix, now owned by IBM) and we were taught that in this case it's usual to have a massively indexed, redundant, read-only database, because it's only used for reporting, and it's optimal this way.

    New records were added by batches importing fresh data from production database. There were even tables split by month, implying several unions in some reports.

    So I don't know if using MySQL is a WTF here. If the production database is split that way, that's indeed stupid, but apparently the system described here only does reports, with new data being imported each month (again, OK, that's a WTF if it's not automated and documented). But either I was told bullshit by an IBM trainer or I don't get the point.

  • (cs) in reply to Red5
    Red5:
    I love it. Reading this WTF is kind of like watching Fox's COPS. It makes you feel really good about yourself.

    Bad tables, bad tables, whatcha gonna doooo whatcha gonna doooo when DBA comes for you!!!

    Man, now I will get that song stuck into my head for weeks.

    BTW, MySQL defaulting to MyISAM is one of the reasons I switched back to PostgreSQL. The other one was the MySQL team bragging about why they chose not to implement transactions, referential integrity and such. Funny thing: somewhere between 2002 and 2006, the MySQL team whisked away every single comment like that from its documentation, even from the older releases. Maybe they grew up?

  • (cs)

    aaaaaaaaaaaaaaaaaaaaaaaaa the idiocracy is alive and well....someone, ahem highly paid management should know better....

    <captcha: idiots are everywhere they keep following me around/>

  • Terminator (unregistered)

    OMFG... WTF??? Obviously these people don't understand what a relational database is, or have never read any books... or any documentation on basic table design... or have ever spoken to a real database programmer.

  • grg (unregistered) in reply to Red5
    Red5:
    I love it. Reading this WTF is kind of like watching Fox's COPS. It makes you feel really good about yourself.

    And you can feel your IQ dropping...

  • htg (unregistered)

    As someone who isn't a DBA, but has had to do quite a bit of DB design and programming in the past year, I've been proud of compressing as many things into the fewest set of tables (without duplicating data needlessly, no over-the-top normalisation here) as possible.

    The entire concept of having a core schema that is duplicated EVERY MONTH makes me want to kill someone. I've seen schemas (and databases) that are duplicated each year, and they make me cry. How much effort would that year int NOT NULL field have cost you? Or maybe a table of years? Do you even understand indexes - they're not rocket science!

    A design for one person/one year can be extended to multiple people trivially without needing per-user/year copies of the schema with tables prefixed with the username. Did the programmer not stop and think "Wait a minute, this seems wrong, let me think about this" once?

    It's called up-front design. Get your requirements. Design. It works: X tables, not nX tables.

    How far can you take it? 1 million users? 20 tables per user? 20 million tables! Each year? 10 years? 200 million tables? Don't people think that maybe just the 20 tables would have been enough, with a couple of extra fields to identify the user and year? No more complex unions across tables holding the same data just to find that friend-of-a-friend.

    Oh, where's my gun. I'm going on a rampage.
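
    The fix described above ("just the 20 tables... with a couple of extra fields to identify the user and year") can be sketched in a few lines. This is a minimal illustration using SQLite, with an invented `posts` schema; the point is the composite index on the discriminator columns, which replaces the per-user, per-year table copies.

```python
import sqlite3

# Minimal sketch (all names invented) of a single shared table with
# user/year columns plus a composite index, instead of one schema copy
# per user per year.
con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE posts (
        user_id INTEGER NOT NULL,
        year    INTEGER NOT NULL,
        body    TEXT
    )
""")
con.execute("CREATE INDEX idx_posts_user_year ON posts (user_id, year)")

con.executemany(
    "INSERT INTO posts (user_id, year, body) VALUES (?, ?, ?)",
    [(u, y, f"post by user {u} in {y}")
     for u in range(100)        # 100 users...
     for y in (2005, 2006)],    # ...two years, still just one table
)

# One indexed query serves any user and any year; no per-user table
# names, no thousands of UNIONs.
(count,) = con.execute(
    "SELECT COUNT(*) FROM posts WHERE user_id = ? AND year = ?",
    (42, 2006),
).fetchone()
print(count)  # 1
```

    Adding a millionth user is one more `user_id` value, not twenty more tables.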

  • Florent (unregistered) in reply to drdamour
    drdamour:
    I asked them about this and the responses I got were that I didn't understand MySQL and databases, and that backing up a blog was so much easier if you had a separate table for every blog. They ALSO told me that it was MUCH faster to get the posts for a blog from a table that only contained posts for that blog.

    Here's a google cache of the topic: http://72.14.209.104/search?q=cache:24MCgqZboDYJ:www.workfriendly.net/browse/Office2003Blue/http/mu.wordpress.org//forums/topic.php%3Fid%3D1834+drdamour+lyceum&hl=en&ct=clnk&cd=1&gl=us

    I especially loved the part where some guy tried to prove his point (zillions of small tables are faster than one big table) with some maths:

    How would:

    finding your table = lg(32000) PLUS

    finding your info = lg(m)

    for m = number of elements in a user table

    compare to

    finding your info in a big table = lg(n)

    for n = 32000 * m

    Well, let's just say that a very basic property of logarithms allows me to write that lg(32000 * m) = lg(32000) + lg(m)... Nice try, though!

    CAPTCHA : muhahaha - how appropriate
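
    The identity can be checked numerically in a couple of lines (32000 is the poster's figure, not a measurement): the cost of finding the right small table plus searching it equals the cost of searching one big table, so the split buys no asymptotic win.

```python
import math

# lg(32000) + lg(m) versus lg(32000 * m): the log product rule says
# these are equal for every m, which is the whole counter-argument.
for m in (100, 10_000, 1_000_000):
    split = math.log2(32000) + math.log2(m)
    single = math.log2(32000 * m)
    assert math.isclose(split, single)
    print(f"m={m:>9}: split={split:.4f}  single={single:.4f}")
```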

  • (cs) in reply to Terminator
    Terminator:
    OMFG .... WTF ??? Obviously these people dont understand what a relational database is or have never read any books ..... or any documentation on basic table design ... or have ever spoken to a real database programmer..
    Well, the problem is that these things work okay with a few hundred records, but they do not scale.
  • TK (unregistered)

    After reading this site for a couple of months, I have started to think that I should learn something about databases. Can anyone recommend a good book or website? After reading it, I would like to know what things like clustered index and normal form mean. If it explains SQL from the start, that would be great too. I would prefer it to be either not about a particular DB, or about MySQL or Postgres, as I am unlikely to get my hands on a more expensive DB any time soon. I have tried to look around on Wikipedia, but for some reason the articles about databases are very poor.

  • mountain (unregistered) in reply to TK

    I'm not sure how easy or hard this is to swallow for someone who's never used a DBMS before, but here you go: http://www.amazon.com/Introduction-Database-Systems-C-Date/dp/0201385902

  • (cs)
  • Joseph Newton (unregistered) in reply to danixdefcon5

    Actually, I downloaded MySQL 5 in March 2006, and though it does have InnoDB, the documentation still does a lot of whining about how you really don't need transactions.

    At least they are available now in MySQL, so I use it.

    <captcha gygax is a "word"?!? Now that is truly a WTF!/>

  • Duff (unregistered) in reply to TK
    TK:
    After reading this site for a couple of months, I have started to think that I should learn something about databases. Can anyone recommend a good book or website?

    IMHO, deeply understanding good database design is one of the three or so things that justifies getting a quality formal computer science education. Seriously -- people who teach themselves get it wrong much too much of the time.

    (Also, if you're only going to learn MySQL or PostgreSQL, learn PostgreSQL; it's much closer to Oracle in terms of the set of available features which have been supported long enough to have them exhaustively covered in even somewhat aging literature, whereas MySQL has most serious database features only as comparatively recent additions -- and if you don't learn to use SQL's more serious features, you won't understand what they're good for; Paul Graham's argument about the differences in power between programming languages and the impact of such on how coders think when using them applies).

  • (cs) in reply to TK
    TK:
    I would prefer if it's either not about a particular db, or about MySQL or Postgres, as I am unlikely to get my hand on a more expensive db any time soon.

    Today, you have the choice of getting "express" versions of commercial products from companies such as IBM, Oracle, Sybase or Microsoft for free. Aside from their limitations regarding storage, memory and feature set compared to the enterprise versions, they're a good starting point. And companies like Oracle offer you the option to also download enterprise versions for prototype development.

    l.

  • (cs) in reply to my name is missing

    Wow. This is actually an example where I'd recommend they use flat files. At least those are easy to understand and hard to mess up, if one has no knowledge whatsoever of how relational databases and normalization work.

Leave a comment on “Roll Your Own Clustered Index”
