Admin
Just think if we had to add some... now everything might not even align correctly depending on the sequence of events.
Admin
I'm very surprised that, being so concerned about space optimization, the original database developer didn't at least use simple RLE compression on this column's data.
Admin
One database I worked on had to be unnormalized because the normalized form would have taken maybe 10 times the disk space of the unnormalized form.
RDBMSs are a tool, and sometimes they're the wrong tool. Fortunately, the major vendors are catching on; even Oracle now has repeating groups.
Look at the typical example: customers, orders, and payments. Say each customer could have thousands of orders and thousands of payments, but only one balance-due column per customer. Why would you keep totalling up those orders and payments instead of just maintaining a balance column per customer every time a new order or payment is added? Yet this is exactly the sort of thing I hear database purists rail against.
Admin
Segal's Law
Admin
LOOOOL - been a while since I laughed this well.
Admin
Ummm,wtf... unormalized = duplication, normalized = no duplication... so how the **** would a normalized database take up 10x the ammount of an unormalized.
Sum(orderAmmount)
It's not hard... so why bother maintaining a separate record? Unless we're talking about many disparate systems, such as in a bank, where a dozen or so systems need to run batch jobs to aggregate information from various data sources to calculate the end-of-day balance.
Admin
Oh, hush. The "SQL by string concatenation" anti-pattern can be implemented in ANY language, and PHP has had DB packages that support parameter substitution for years now.
Admin
That can't be right. Normalization, by definition, REDUCES the amount of storage a database requires, since the amount of redundant information is minimized; any atomic piece of data is stored exactly once, and relational references to that data almost always take up less storage than copies of the data (consider a varchar(2000) vs. an integer).
Denormalization is useful in the opposite case of what you described -- when you need to simplify complex relational queries in order to gain raw performance, and you have extra disk space to spare.
Admin
I dunno... if the assumption does hold that there would never be more than 2000 product lines, and you only ever need to get the status of particular items (as opposed to items matching...), that's really not the worst way it could have been done.
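To be fair, a per-item lookup against that layout really is a one-liner. A minimal sketch, assuming the agency/strProductLines/intAgentId names that appear in the view posted later in this thread (T-SQL syntax; the two @ parameters would be supplied by the application):

    -- Look up the flag character for one product line by its position in the string.
    SELECT SUBSTRING(strProductLines, @intProductLineId, 1) AS chrStatus
    FROM   agency
    WHERE  intAgentId = @intAgentId;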
Admin
Heh, now there's an idea for a web site...sort of like the opposite of LinkedIn.com.
The only thing is, I think I'd get too depressed by the number of idiots within 3 degrees of me.
Admin
That's so elitist - assuming that people who write in a language don't know about the dangers of SQL injection and how to get around them.
(Hint: in that case, if it's a known integer -- i.e. after settype($prodID, "int"), (int)$prodID, or $prodID += 0, it's an int -- you won't have to worry about injection there.)
Admin
The total is always equal to the sum of the individual records. If you were unable to update the total despite updating one of the individual records being summarized by it, then your update failed and you should roll back. That's what transactions are for.
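A minimal sketch of that idea, assuming hypothetical orders/customer tables with a balance_due summary column:

    BEGIN TRANSACTION;

    -- Add the detail record...
    INSERT INTO orders (customer_id, order_amount)
    VALUES (42, 19.95);

    -- ...and update the summary in the same transaction.
    UPDATE customer
    SET    balance_due = balance_due + 19.95
    WHERE  customer_id = 42;

    COMMIT TRANSACTION;
    -- If either statement fails, issue ROLLBACK instead and neither change sticks.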
Additionally, I could have a watch that was an hour off. Even if it was my only watch, I would be pretty foolish to ignore all the other watches that were actually correct. So obviously Segal was one silly fellow; he should leave lawmaking to those a bit more in touch with reality!
Admin
Not a problem if the database does the work of updating the column for you... although there are still many reasons not to do this (it makes the parent row a concurrency/locking hot spot, for one).
Admin
Has this technique been patented? If not, I would like to use it in my next DB design.
Also I'm trying to use a FOR NEXT loop in the WHERE clause of a SQL statement but I get a syntax error. Our DBA is a real jerk and won't help, does anyone have any code they can send me?
Admin
Here's a simplification of the database; it was an attendance-tracking system. For each instance of "attendance", I need the student id, the course id, the date of the meeting, and the attendance value. The student id in this case is 9 digits, the date is 8 digits, the class is 8 digits, and the attendance value could be as small as ONE BIT (but I went ahead and used one digit).
While you can optimize somewhat by reducing the size of the combined key, you can instead track attendance as a repeating group for a big win on storage space once your class size exceeds just a few students or meetings. (If you do allow repeating groups, then the entire string of attendance values could be just one variable (an array), and in that case it would be easily normalized. However, repeating groups (arrays) in a record throw some RDBMS weenies into fits!)
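Roughly, a repeating-group layout like that might look like this in an engine that supports array columns. This is only a sketch in PostgreSQL syntax with made-up table and column names, not the schema actually described above:

    -- Attendance stored as a repeating group (array) per enrolment.
    CREATE TABLE course_attendance (
        student_id  CHAR(9)    NOT NULL,
        course_id   CHAR(8)    NOT NULL,
        attendance  SMALLINT[] NOT NULL,  -- one entry per class meeting
        PRIMARY KEY (student_id, course_id)
    );

    -- Append one more meeting's value for one student:
    UPDATE course_attendance
    SET    attendance = array_append(attendance, CAST(1 AS SMALLINT))
    WHERE  student_id = '123456789' AND course_id = 'CS101-FA';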
Admin
Thanks. Now everyone within the sound of my laughter knows that I am not working.
:)
Admin
The proper way to perform non-trivial refactoring like this is as follows:
Convert programs, one by one, from accessing the stupid format to accessing the sane format. Take as long as you need.
Convert the sane format to a real table and ditch the stupid format:
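A minimal sketch of that second step, assuming the agency/strProductLines/productLines names from the view posted elsewhere in this thread and a hypothetical agent_product_line target table (T-SQL syntax):

    -- The "sane format": one row per agent per product line.
    CREATE TABLE agent_product_line (
        intAgentId       INT      NOT NULL,
        intProductLineId INT      NOT NULL,
        chrStatus        NCHAR(1) NOT NULL,
        PRIMARY KEY (intAgentId, intProductLineId)
    );

    -- Explode each character position of the flag string into its own row.
    INSERT INTO agent_product_line (intAgentId, intProductLineId, chrStatus)
    SELECT a.intAgentId,
           pl.intProductLineId,
           SUBSTRING(a.strProductLines, pl.intProductLineId, 1)
    FROM   agency a
    JOIN   productLines pl
      ON   pl.intProductLineId <= LEN(a.strProductLines);

    -- Once every program reads the new table, ditch the stupid format.
    ALTER TABLE agency DROP COLUMN strProductLines;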
Admin
Because row and column storage (and retrieval) costs are not the same.
The level of ignorance displayed in the main article is impressive, but the ignorance displayed here by DBAs concerning the implementation details of their own databases is downright astonishing.
As a rough guide, the marginal cost of adding a redundant integer column to a row in a table you already have is between 10 and 100 times less than the cost of adding a non-redundant column of the same size to a row in a table you don't have yet. This factor depends on whether you need to have indexes and how much overhead your RDBMS tacks onto each table row. In some cases other details like whether your rows are fixed width or not come into play. (*)
VARCHAR(2000) columns may take anywhere from 2 to 8000 bytes of amortized average storage space, depending on the actual length of the string stored and details of the implementation. Some RDBMS engines actually do common-prefix optimizations, LZO, or similar compression on large string and array objects in order to trade I/O time against CPU time, while other engines translate ASCII strings into fixed-length uncompressed UTF-16, add a few bytes of overhead, and then round up to the nearest page size. Some do things that are even less efficient.
Adding 2000 child rows containing two integer columns and an enum type with indexes to support joins will occupy megabytes of space per row in the parent table, and lose advantages that could be gained from having locality of reference or transparent data compression in the RDBMS. If the application requirements only include this information for application-side decision making (e.g. for enabling UI features as opposed to generating reports) then the huge costs of normalizing this particular data structure result in no actual benefit.
There isn't a single rule that can be applied here. "Pure" normalization is nice, but like anything it has appropriate and inappropriate uses. It could turn out that the original developer did some prototypes and discovered that the best approach (at least at the time) is the one that is in place now. On the other hand, maybe there are only 100 rows in the parent table, so the cost of 100 times more storage is acceptable. "Huge storage costs" might mean an actual increase in database size of less than 1GB. We simply don't have enough information to judge.
(*) Presumably the RDBMS implementation could (in theory) translate the "normalized" DDL into an internal implementation that looks like this CodeSOD's data structure. The closest I've seen to that capability in a general-purpose RDBMS in the field only supports read-only tables, though, and everything people have posted here so far about slow queries applies to this approach.
Admin
Wow. You're totally right, but did you dip your pants in hydrazine this morning?
Admin
Please post the DB design so we can tear it apart.
Repeating groups with native support from the language might not be a WTF, e.g. "select * from the_table where list_of_foo contains element"
As for a balance column in the customer table, provided that it's maintained automatically via triggers, that's not a WTF either. However, for other reasons (e.g. printing monthly statements), you should think about giving the orders/invoices/payments/etc. tables an Open/Closed status flag with an index on it.
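A minimal sketch of that kind of trigger, assuming hypothetical customer/orders tables and T-SQL syntax (a real version would also need DELETE/UPDATE triggers):

    CREATE TRIGGER trg_orders_balance
    ON orders
    AFTER INSERT
    AS
    BEGIN
        -- Keep the denormalized balance in step with the newly inserted detail rows.
        UPDATE c
        SET    c.balance_due = c.balance_due + i.total_amount
        FROM   customer c
        JOIN   (SELECT customer_id, SUM(order_amount) AS total_amount
                FROM   inserted
                GROUP BY customer_id) i
          ON   i.customer_id = c.customer_id;
    END;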
Admin
mysql_query allows exactly one query per call, so unless you're executing an entire client-side generated sql string, the worst you'll get is an error.
If the original example in your quote had evaluated to a string containing a second, injected statement, PHP would return an error.
I'm ok with bashing languages and all, but at least learn how the standard tools in the language work before complaining about them.
Admin
So, you're saying you don't approve?
Admin
Wow, someone got up on the wrong side of the keyboard this morning. His solution works - it's just prohibitively expensive. This, coincidentally, is how any good solution starts. Then it is revised until it is acceptable, right?
Admin
How anonymized was this example? I was given a site to maintain that had products/product lines available to agents, with a database "designed" by someone who created columns with names like AgentThreeAddressFive, and his initials were the same as a common abbreviation for a bodily function...
Admin
Wow... looks like someone needs a college education; obviously high school wasn't enough.
Admin
Actually, I'd rather work with Emil than with you. Emil proposed a solution to the problem that might, after fixing minor syntax bugs, actually work. It's not a very beautiful solution, but in the given setup, nothing is beautiful.
As Emil suggested, it might even be faster than the normalized database schema every normal person (including you, Emil, and me) would choose. Of course the query requires a full table scan, but since there is (most likely) at most one row per agent, and the number of agents is most likely in the range of a few hundred to a few thousand, the database will easily handle that.
So there is Emil, who proposes a solution, and KenW, a complacent blowhard who just berates other people. And to add insult to injury, said KenW's post gets featured.
Admin
Nothing can school developers about maintainability - and finding the all-important balance between "getting it done" and "doing it right" - quite so effectively as a few solid years spent maintaining and adding new features to their own houses of cards. A few gray hairs later, and you've got a perfectly serviceable intermediate programmer.
Admin
What is a possible solution in a case like this? A cronjob that creates a REAL relation once a day? I think I'd have to rip that apart and do it over.
Sometimes you don't want to mess with code that "works" but other times just rewriting will save you so much pain later.
Admin
I totally disagree with you. If you totalize everything yourself and maintain the number in the DB, it defeats the purpose. A query can easily return the correct data. If you do it through code, it's easy to get everything out of sync: one failed write and you would have to actually reread the items and total them again to get the correct information. Databases are meant to store data and provide a mechanism to use that data in any way you see fit (queries, reports) for information and analysis purposes. Maybe back in high school an unnormalized database would have fit the bill, but for a scalable solution, any good DB developer would agree normalization is imperative.
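For example, a minimal sketch of the query-side approach, assuming a hypothetical orders table:

    -- Derive each customer's balance from the detail rows on demand,
    -- instead of maintaining a stored total.
    SELECT customer_id,
           SUM(order_amount) AS balance_due
    FROM   orders
    GROUP BY customer_id;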
Admin
What's wrong with refactoring the code? This is terrible and unmaintainable, so get rid of it. Besides, someone's already proposed a migration strategy off of this POS.
Admin
The true brillance (tm) of the design is the fact that it uses Nvarchar, rather than just regular varchar. Sure, he may be "wasting" an extra byte for every row and product line, but with over 65,000 characters to choose from this thing must scale to unimaginable (or is that "unmanageable"?) levels of complexity! The original designer really thought through that whole "takes up too much space" problem.
Admin
You cannot fail the write with an RDBMS that supports transactions. If you could, then a simple accounting app could debit one account but fail to credit another account with the balancing entry, leaving the entire chart of accounts out of balance. Yet according to you, that is somehow more acceptable than updating a summary record in the same transaction that adds a detail record being totalled in it?
Ok then maybe you mean that I forgot to include the update in my code, but even then, isn't that just like forgetting the balancing entry in my accounting application? Heck, I'm in even better shape than your accounting app, because I can simply put my code in a trigger and forget about it, whereas you have to actually SPECIFY TWO SEPARATE ACCOUNTS to update ALL THE TIME.
This is like an application changing my street address but somehow "failing" to update my city, state, and zip. Either your RDBMS can do atomic updates to a practical level, or it can't. Make up your mind, THEN come back and school me if you want.
Admin
Sounds like someone forgot their Midol today...
Admin
I don't think you can create indexes like that (and even if you could, I wouldn't create 200 indexes on it). Much better to add a product line table and a join table between sales critters and product lines:
- SALES_GUY: existing table - nuke the varchars after we migrate the code to not use them
- PRODUCT_LINE: stuff describing a product line
- SALES_GUY_PROD_PERMISSION_DESC: 10 rows describing what each state means
- SALES_GUY_PROD_PERMISSIONS: FK to each of SALES_GUY and PRODUCT_LINE, a column for the permission (FK to SALES_GUY_PROD_PERMISSION_DESC)
add an index on the join table and now everything is fast and sane.
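A minimal sketch of that layout (column names are assumptions, and SALES_GUY is assumed to already have a sales_guy_id key):

    CREATE TABLE PRODUCT_LINE (
        product_line_id INT          PRIMARY KEY,
        line_name       VARCHAR(100) NOT NULL
    );

    CREATE TABLE SALES_GUY_PROD_PERMISSION_DESC (
        permission_code CHAR(1)      PRIMARY KEY,
        description     VARCHAR(200) NOT NULL   -- the ~10 states
    );

    CREATE TABLE SALES_GUY_PROD_PERMISSIONS (
        sales_guy_id    INT     NOT NULL REFERENCES SALES_GUY (sales_guy_id),
        product_line_id INT     NOT NULL REFERENCES PRODUCT_LINE (product_line_id),
        permission_code CHAR(1) NOT NULL
            REFERENCES SALES_GUY_PROD_PERMISSION_DESC (permission_code),
        PRIMARY KEY (sales_guy_id, product_line_id)
    );

    -- The index that makes "who can sell product line X?" fast.
    CREATE INDEX ix_sgpp_line_permission
        ON SALES_GUY_PROD_PERMISSIONS (product_line_id, permission_code);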
Admin
Here's my crummy view that pivots things like this:
create view viewAgentProductLines as
select a.intAgentId,
       substr(a.strProductLines, pl.intProductLineId, 1) as strAgentProductFlag,
       pflag.description
from   agency as a,
       productLines as pl,
       productLineFlags as pflag
where  substr(a.strProductLines, pl.intProductLineId, 1) = pflag.allowance_code;
Please Note:
Admin
I'm so glad someone said this. I don't work as a programmer by trade, but KenW and all the others screaming about this strike me as incredibly lucky, incredibly naive, or they've just never held down a real job.
I'm betting that emil knows the definition of "deadline" and knows to get a working solution first, THEN ask to have a meeting about refactoring, involving the people affected by such a refactoring...if you dive in like a maverick and start refactoring without a care in the world about the repercussions, well, I hear McDonald's is hiring...
Admin
If they stored the SQL strings required to access all the information in the agents database in another table called "agent_lookup_SQL", then it would be easy to access the info!
Admin
Well, I would use something like this:
INSERT INTO old_dev SET rectum = largebluntOBJ WHERE largebluntOBJ = '1964 Cadillac Hearse';
Yes, I know it's not exactly a valid SQL statement, but it makes me feel better.
Admin
Consider this: once upon a time I saw a dev team deliver an app to a client. This app was quite pretty, and worked very well, for a few days. Client management was so impressed with this app, they decided to let it handle all their archival data, too, and dumped it all into the database, all 12 gigs worth.
Now, a properly designed database would take that 12 gigs and say "Yeah, what else ya got? Bring it on!" But this database was designed by a naive development team who figured that things like normalization were just something some fat grey-beard pulled out of his ass one day when he was bored. They were also absolutely certain that this app would never have to handle more than a few thousand records.
I found out about all this when the client called me up dismayed. When the app was doing some task and it was going to take more than a few seconds, it would helpfully display an estimated time 'till completion. For a couple of common tasks, the time 'till completion displayed was ... I shit you not ... several days.
Sometimes things change, businesses change, go in different directions. Robust solutions just keep on working, brittle ones break.
Admin
Please, for the sake of all that's good in the world, find yourself a line of work far removed from other people.
Admin
So I give up... do you pronounce "unormalized" as "Un" + "ormalized" or "U" + "normalized"?
At my last job, I was asked to help the guy in charge of the Payroll project. When I saw columns like the one in the article in his database, I nicely explained to him that he could handle the data retrieval and I would be glad to help with the UI and business tiers.
Admin
This, however, is a ludicrous piece of self-justification.
To be interested in the amount of disk space required, one of two things should apply:
(a) You can't afford an 80G disk, or (b) your application gets so many hits per second that locality problems ensue.
I doubt either applies, in your case. Or in any comparable case. But let's just give you the benefit of the (huge) doubt.
Your date fits into a BIGINT (strangely enough, it even fits into a DATE); your class fits into a BIGINT; even your student id (and I applaud any university or college that plans to have more than 4,000,000,000 students attend somewhere between foundation and having the plug pulled because of a moronic computer-based accounting system) can be fitted into a BIGINT using fairly simple compression techniques. Thanks for losing the seven bits, though -- that'll buy you two coffees and a double espresso latte.
Have you considered the other benefits of using an RDBMS (properly, that is)? The reason that repeating groups in a record throw some RDBMS weenies (also non-weenies, also ordinary programmers like me) into fits is that they're just plain cretinous: no gain, much pain. I'd advise a simple ISAM database, but even that fails at the same hurdle.
Are you insane?
Admin
YAGNI!
Admin
So you get a kick out of insulting other people, huh?
Admin
Heh. Show me the tiny, trivial databases you work on. This is nothing, and it reflects that the guy who built it didn't want to use or depend on the SQL database.
It's trivial to take that field and create a display of who gets what on what lines programmatically. I'll bet that when he wrote it, it got the job done. New guys always thumb their noses at the guys who came before them.
Rule #1 for assessing programmers: The ones who are condescending suck at getting things done. I usually get their jobs (or at least their projects)...
Databases are quite cool, but naive programmers often unload their work on the DB and stuff it full of records - which bog down the system. Normalization is not always the solution - but you'll learn.
Anyone want to speed test the string thing? And why the hell would you index a column containing 1 character, especially when it only has 11 or 12 possible values? That's just nuts!