Admin
"The client instead bought a much faster and clustered database server."
If all else fails, just throw more money at it. That's the real WTF here.
Admin
Hey, they're the clients. Who are we to blame them if they wish to spend massive amounts of cash on a new server instead of improving the design to avoid such trouble from square one? :)
Admin
Well, at least the consultants have something new to screw up.
Admin
Oh dear Elvis, that's so sad. Too bad the ones who can't code are paid more. But it's not the coding skills that count, it's how well you can sell yourself... It's a hard knock life for us...
PS. Not gonna tell you what my captcha was...
Admin
I think I'm going to cry.
Admin
The fact that tens of thousands of people (and I presume more) actually buy these services... that depresses me to no end.
Q. But is it scalable?
A. What? Of course it's scalable - you can always spend more money!
Admin
So are these two SQL statements functionally identical? (and identical speed to run?)
SELECT *
FROM table1, table2
WHERE table1.key = table2.key
and
SELECT *
FROM table1
INNER JOIN table2
ON table1.key = table2.key
I usually use the second, but the example uses the first--is this a secondary wtf? Or just a usage convention difference?
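For what it's worth, here's a quick way to check with throwaway tables (SQLite via Python; the tables and data are made up, not from the article): both spellings return the same rows.

```python
import sqlite3

# Hypothetical two-table setup to compare the implicit and explicit join forms.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE table1 (key INTEGER, name TEXT);
    CREATE TABLE table2 (key INTEGER, value TEXT);
    INSERT INTO table1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO table2 VALUES (1, 'x'), (2, 'y');  -- no match for key 3
""")

implicit = con.execute(
    "SELECT * FROM table1, table2 WHERE table1.key = table2.key"
).fetchall()
explicit = con.execute(
    "SELECT * FROM table1 INNER JOIN table2 ON table1.key = table2.key"
).fetchall()

print(sorted(implicit) == sorted(explicit))  # True -- same result set either way
```

Whether they run at the same *speed* depends on the optimizer, but the result sets are identical.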
Admin
Of course it wasn't scalable. The consultants blamed the problem on the lack of hardware, and when they put in the hardware they stole JM's fix. So of course now it scales, but not because there is a cluster of servers.
Admin
I had to do some SMS integration on an existing project as my first assignment at a new job. The previous programmer opted to leave it unfinished. I'm guessing he discovered how much of a pain in the ass it was going to be. The service is slow, unreliable, and - best of all - runs in a browser window for *gasp* hours. Sadly, it is the most well-coded project I have taken over to date.
Admin
I'm sure that somewhere out there are competent, professional, and dedicated consultants who really are worth every bit as much as they're paid. However, reading this site almost convinces me that they don't exist. I'm developing a bitter prejudice against outsourcing and consulting. Which is bad, because my boss is trying to convince me that we should outsource our next component...
(Ah, but our business is one giant WTF, so what do I care?)
Admin
Fools! I am pretty sure the optimal solution involved TRUNCATE TABLE or perhaps DROP DATABASE.
Admin
Not if you're the hardware vendor... then it's a FTW.
Admin
The first one is easier to understand for those who don't use SQL regularly.
Admin
Yes, they are functionally identical.
However, on some DB systems they will execute differently.
Command #1: SELECT * FROM table1, table2 WHERE table1.key = table2.key
Command #2: SELECT * FROM table1 INNER JOIN table2 ON table1.key = table2.key
Most of the large (expensive) DBs like Oracle and MSSQL will convert both to the same execution plan.
Admin
So what I don't get (don't know much about databases myself) is:
Why does the non-WTF query still take nearly five minutes for 2000 SMS on a presumably large server? An SMS is no longer than ~160 characters AFAIK; that's not THAT much data.
Admin
In the world of Oracle, the first version is more common. And yes, the query optimizer usually understands how to execute that efficiently.
Admin
The wtf is the loop to make individual inserts. (If I'm reading correctly.)
Admin
I'm obviously not reading correctly. The non-wtf still takes a long time because it's still a crapload of data to move? Maybe there's more info we don't see? I dunno.
captcha = bedtime. I wish.
Admin
Yes, sure. I was talking about the sane and optimized version of the query... ?
Admin
This whole thing makes no sense.... even fetching each of the 2000 rows individually should NOT take 3 hours. 3 seconds, maybe. Were they running the db on an old palm pilot or something?
Admin
I'll chalk it up to anonymizing taking out necessary info.
Admin
The INNER JOIN suppresses any records from either table that don't have a matching key field in the other (pretty sure - been a while since I SQL'ed); IIRC, this is the default behavior for some databases, so this *may* be functionally equivalent. Speed-wise, an inner join could be faster than an outer join depending on index usage.
Vague enough for you? Good. Now I'm gonna go order me a batch of business cards that say "Doer of Things".
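To un-vague it slightly, a quick sketch (SQLite via Python, made-up tables): the INNER JOIN drops the unmatched key, while a LEFT (outer) JOIN keeps it.

```python
import sqlite3

# Toy tables: t2 has no match for key 3.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE t1 (key INTEGER);
    CREATE TABLE t2 (key INTEGER);
    INSERT INTO t1 VALUES (1), (2), (3);
    INSERT INTO t2 VALUES (1), (2);
""")

inner = con.execute(
    "SELECT t1.key, t2.key FROM t1 INNER JOIN t2 ON t1.key = t2.key"
).fetchall()
outer = con.execute(
    "SELECT t1.key, t2.key FROM t1 LEFT JOIN t2 ON t1.key = t2.key"
).fetchall()

print(len(inner), len(outer))  # 2 3 -- the outer join keeps key 3, paired with NULL
```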
Admin
Honestly, I think the real WTF here is using the technique of copying a bunch of rows from one table to another to indicate a change in status. This sounds like a really stupid design. I guess something like this might need to be done if there were two different databases and systems involved, but the optimization done by our hero would indicate that these two tables are in the same database.
Why not use just ONE table, and have some sort of STATUS field on it, and set that status field to "SENT" or similar after each message is processed/sent/whatever? Really, given the need to fix the problem quickly, our hero's solution was great, but WTF, what kind of design is this in the first place?
Please fill me in if I'm completely missing something here!
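Something like this is what I mean (rough sketch in Python/SQLite; the schema is invented, not from the article):

```python
import sqlite3

# One table with a status column instead of copying rows between tables.
con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE sms_queue (
        id     INTEGER PRIMARY KEY,
        body   TEXT,
        status TEXT NOT NULL DEFAULT 'PENDING'
    )
""")
con.executemany(
    "INSERT INTO sms_queue (body) VALUES (?)",
    [("message one",), ("message two",)],
)

# The sender picks up pending rows and flips the flag in place -- no copy.
for msg_id, body in con.execute(
    "SELECT id, body FROM sms_queue WHERE status = 'PENDING'"
).fetchall():
    # ... hand `body` to the SMS gateway here ...
    con.execute("UPDATE sms_queue SET status = 'SENT' WHERE id = ?", (msg_id,))

print(con.execute(
    "SELECT COUNT(*) FROM sms_queue WHERE status = 'SENT'"
).fetchone()[0])  # 2
```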
Admin
Actually, you'll notice that they join to several different tables. If some of these tables are large (e.g. a users table) and they do not have proper indexes (entirely possible), then this query could end up taking quite some time.
However, given the information we have, I believe that this query should perform faster.
--doc0tis
Admin
This bothers me too. 5 minutes for 2000 insert commands? That seems really slow. I must be missing something.
Admin
It's not only that they are individual inserts; it's also the fact that each row must first be returned to the client, and then the client sends each row back to the server. Even using a CURSOR on the SQL Server itself would be more efficient than forcing the client to process each row, because at least the data never needs to leave SQL. Of course, the simple set-based INSERT that was never implemented is the easiest and most efficient solution.
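To illustrate the contrast (toy schema, SQLite via Python - not the article's actual tables): the row-at-a-time round trip versus one set-based INSERT ... SELECT.

```python
import sqlite3

# Made-up broadcast schema: pending messages are expanded per group member.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE pending (msg TEXT, grp TEXT);
    CREATE TABLE members (grp TEXT, usr TEXT);
    CREATE TABLE outbox_slow (msg TEXT, usr TEXT);
    CREATE TABLE outbox_fast (msg TEXT, usr TEXT);
    INSERT INTO pending VALUES ('M1', 'G1'), ('M2', 'G1');
    INSERT INTO members VALUES ('G1', 'U1'), ('G1', 'U2');
""")

expand = "SELECT p.msg, m.usr FROM pending p JOIN members m ON p.grp = m.grp"

# Slow: pull every row to the client, then send each one back as an INSERT.
for row in con.execute(expand).fetchall():
    con.execute("INSERT INTO outbox_slow VALUES (?, ?)", row)

# Fast: one statement; the rows never leave the database server.
con.execute("INSERT INTO outbox_fast " + expand)

print(con.execute("SELECT COUNT(*) FROM outbox_slow").fetchone()[0],
      con.execute("SELECT COUNT(*) FROM outbox_fast").fetchone()[0])  # 4 4
```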
Admin
I'm with you! I didn't realize how fortunate I was to be at a company (or at least a group within the company) where getting the best solution matters more than making yourself look good, but I promise never to take it for granted again.
(captcha: craptastic -- you'd only see that on thedailywtf!)
Admin
In addition to the slowness caused by what I mentioned in my previous post, I can only guess that maybe a cross join was accidentally involved in the SQL statement constructed... we can't see the whole thing, but it is doing all joins in the WHERE clause, so it's very possible. That's perhaps the biggest consequence of not using ANSI syntax -- it is not always clear when all tables are not joined properly, and it is very easy to miss a relation or two.
Admin
The WTF is if the contractor who is more qualified than the consultants remains a contractor.
Admin
So lemme get this straight, Alex couldn't post the WTF until 4pm today because his joke sending SMS service was backed up for 3 hours? Now THERE'S the WTF!
;-P
Admin
Not that this is related to the WTF, but... most modern databases will treat those two queries identically. The second form is usually preferable, though, since some outer joins are ambiguous when expressed the first way. Also, at least in my opinion, the second form is easier to understand for joins with many tables.
Alex: this is the first time I've literally gasped at a daily WTF. This really, really hurt. As a DBA myself, I hate to see the way companies are penny wise but pound foolish about getting in someone with database expertise.
Admin
I noticed it... but still, 2 hours? I suspect it's the inserts that are the time killer here, since the same select query is run in both cases. The difference is 2000 inserts vs. 1. If the DB is over-indexed/super-enterprisey, low on memory or disk space, or just in the proximity of a 'consultant', the performance can get dragged down.
Admin
Fools - Alex anonymized out the cursor that was created to handle the 2000 row result set
Admin
I want my new title to be:
"Grand Imperial PooBah of all Things Great and Small"
Admin
Separate queue tables are good for processing transactions that may fail and be re-tried multiple times. You can have a single "INSERT INTO foo SELECT FROM ..." that executes in a couple of seconds. Then you have a process (or set of processes) that work through those sending the messages. This is why MSMQ exists.
Admin
Such designs aren't uncommon, and have merit. Think about it. There's a queue table (Pending) and a history table (Complete). The history table may be date oriented or partitioned for manageability and performance, especially for lookups and purges. However, your batch broadcast shouldn't have to look through all your historical data to find only the Pending records.
If one were designing it now, for a DBMS such as Oracle, you'd use composite partitioning (range by date, list by status), or you'd have a function-based index to allow you to highlight only the records in a Pending status.
But if you want something simple, and you don't really understand all the features and functions of the underlying RDBMS -- and exactly how many code-monkey contractors do? -- you'd design it exactly in such a way.
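For example (sketch in Python/SQLite; SQLite has no function-based indexes or partitioning, but its partial indexes express the same idea of indexing only the Pending rows):

```python
import sqlite3

# Partial index: only rows with status = 'PENDING' are indexed, so the
# broadcast query never has to wade through historical (SENT) rows.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE sms (id INTEGER PRIMARY KEY, status TEXT);
    CREATE INDEX idx_pending ON sms (id) WHERE status = 'PENDING';
    INSERT INTO sms (status) VALUES ('SENT'), ('PENDING'), ('SENT');
""")

# The planner can satisfy this lookup because the query's WHERE clause
# implies the index's WHERE clause.
plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM sms WHERE status = 'PENDING'"
).fetchall()
print(plan)
```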
Admin
My eyes...the goggles do nothing!
As for the forms of select, on MS SQL Server 2000, the first form is about 6 times as slow as the second form. They will return the same results. I'm basing this off of a stored procedure that would take 3 hours to run and when I switched it from the layout of #1 to #2 it took only 30 minutes. I think 7 tables were being joined together. I can't say anything about the speed of MS SQL Server 2005 as I keep being put on adding new features to old systems, reusing old code for a new site, or bug fixing.
Admin
If you're dealing with large volumes of data, archiving old rows out of a table sometimes makes sense as a way to optimize queries over that table. It's not ideal normalized design, but there is a logical argument for it.
Of course, we don't know what details were anonymized/edited out, either... and we obviously can't assume everyone working on this project was competent!
Admin
you are obviously a contractor. the consultant secrets are not for your eyes...
Admin
They're copying a bunch of rows from a set of tables to another table. Consider:
Table 1 contains records like
"send message M1 to group G1"
"send message M2 to group G2"
...
"send message M99 to group G1"
"send message M100 to group G2"
Table 2 contains records like
"group G1 contains user U1"
"group G1 contains user U2"
...
"group G1 contains user U100"
"group G2 contains user U101"
...
"group G2 contains user U150"
What this query does is throw a bunch of expanded data into a third table:
"send message M1 to user U1"
"send message M1 to user U2"
etc.
Then another process pulls a row at a time from the third table, actually sends the message, and then (probably) deletes the row.
Admin
Isn't that typical?
Don't solve the software problem. Just throw more hardware at it and hope it keeps working!
How many times have you seen an example of this in your career?
Yes, I am asking for some responses to the question.
Admin
Are you sure it wasn't just taking advantage of memory cache the second time? As a sanity check, try doing #2 and then #1.
Even if they run at equal speeds, I still prefer #2 because it separates the conditions into useful blocks:
select (list of fields)
from A
join B on (list of conditions that join B to A)
join C on (list of conditions that join C to A and/or B)
where (list of non-join conditions, e.g. the ever-popular "salary >= 80000")
Admin
Well yes, but since you still have to 'look through all your historical records' to find the records with which to populate the queue, why not just kill two birds with one stone and have the process that finds these messages actually be the one that also sends the message and marks it as sent?
Or, if there really is a need to send many in bulk in a short amount of time (so, for example, the time between the first and last message is minimized and delivery is as simultaneous as possible for the recipients), why not just create a flat file of messages to send and have a second process grab that file and send them quickly? Even stranger about this WTF is that they still have not removed and/or marked the messages in the first table as "sent to queue", so I imagine this has to be done as well. Given you still have to do this in some way, all these multi-table solutions seem like a huge amount of DB processing, and the one table/status field solution I describe seems like it would still have better performance.
Admin
Maybe the SMS DB is nowhere near the web server or something
Admin
Not necessarily. In this case, history of attempts normalizes differently (e.g. "send message M to group G" x 1) from pending/complete/failed transmissions (e.g. "send message M to user U" x number of users in group G), and you probably wouldn't explicitly store complete transmissions (except during testing) - you would simply assume that anything derivable from the history of attempts, but not in the transmission table, was transmitted without incident and thus deleted from the transmission table.
Admin
LOL, You picked up on another minor WTF, job titles and their ambiguous meanings.
I choose "Commander of the Known Universe" for my next title :)
Admin
The real WTF(tm) is that the client was performing these SQL Queries at all. Why didn't they have a data layer, developed and maintained by a competent DBA? Then he could have written Stored Procedures to do all of this kind of transactional stuff, it would have been fast and all that money spent for new systems could be saved.
I will say however, that all that aside, the original WTF with the query is a very worthy WTF indeed.
Admin
JOIN is evil, indeed very evil.
You can conclude that JOIN was invented by Oracle :D
Admin
The first syntax was used before Oracle 9i came out; what it means is simply: perform a Cartesian product of the two tables and select only the rows that refer to the same key.
There's virtually no difference other than the syntax. I personally like the old way better. The only problem is when you want to do a full outer join (and other outer join variants), which you can't really do (you have to use the UNION operator).
Admin