Admin
> The wtf is the loop to make individual inserts. (If I'm reading correctly.)
Yes, because this not only prevents the database server from creating a proper execution plan, it also forces the database server and the business logic server to transfer a gazillion records over the network. I/O is expensive. The posted solution keeps everything on the database server, which not only avoids that network traffic but also lets the database server decide on the optimal execution plan.
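To make that concrete, here is a minimal sketch of the set-based version; the table and column names are invented, since the article's schema isn't reproduced in the comments. One statement, one round trip, and the server gets to plan the whole thing itself.
using System.Data.SqlClient;

// Hypothetical sketch of the set-based approach (table/column names invented).
// One round trip and one execution plan instead of a loop of single-row inserts.
static void QueueMessageForAllSubscribers(string connstring, int senderId, string messageText)
{
    using (SqlConnection conn = new SqlConnection(connstring))
    using (SqlCommand cmd = new SqlCommand(
        "INSERT INTO Outbox (Sender_Id, Recipient_Id, Message_Text) " +
        "SELECT @Sender_Id, s.Subscriber_Id, @Message_Text " +
        "FROM Subscribers s WHERE s.Active = 1", conn))
    {
        cmd.Parameters.AddWithValue("@Sender_Id", senderId);
        cmd.Parameters.AddWithValue("@Message_Text", messageText);
        conn.Open();
        cmd.ExecuteNonQuery();   // the per-recipient loop collapses into one statement
    }
}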
Admin
That design is a small WTF in itself: if anything goes wrong in the sending loop, you've just lost track of which messages have already been sent.
Admin
OK, the WTF is that you could use a set-based insert, but something else smells here!
Two hours for 2000 inserts; let's see, that's 7200 secs / 2000 = 3.6 secs per insert.
Well, either the times are a bit inflated or the database is a really crappy one!!
[pr]
Admin
Must be a Java developer
Admin
The real WTF is that I didn't even go "WTF"; it's just how it goes in any company.
Managers don't listen to people smarter than them; it makes them feel dumb. In order for a manager to make the right choice, you have to manipulate him into slowly coming to the same conclusion you could have just told him right away. Obviously you won't get credit for the idea and the manager will probably get promoted, but at least you won't be suffering from the dumber choice.
Admin
More often than not. I'd say in 90% of the cases, more than 90% of the problem solving was done by throwing more hardware at it. ;-) Hardware is ALWAYS cheaper than people. Probably because the cost is easy to calculate, even in Excel.
But seriously, I've seen similar code in a similar case. I had the privilege of being the doer of things, and my first idea was to use just SQL Server tables for inbox, outbox and sent. I ended up with a combination of flagging and queuing in the db, sending in bulk with flat files.
A couple of notes though: Nothing wrong with joins and tables, just keep it simple. Nothing wrong with loops either. Nothing wrong with adding layers of logic. But the idea of dynamic SQL in any logic layer is almost always bad. Like here, it would have been better to put that in the db too. Performance is a good reason to use stored procedures.
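For instance, once the set-based insert lives in a stored procedure, the logic layer just calls it. A rough sketch, with the proc name and parameter invented for illustration:
using System.Data;
using System.Data.SqlClient;

// Hypothetical: the set-based insert lives in a stored procedure on the server,
// so no dynamic SQL gets built in the logic layer at all.
static void QueueDailyMessage(string connstring, string messageText)
{
    using (SqlConnection conn = new SqlConnection(connstring))
    using (SqlCommand cmd = new SqlCommand("usp_QueueMessageForSubscribers", conn))
    {
        cmd.CommandType = CommandType.StoredProcedure;
        cmd.Parameters.AddWithValue("@MessageText", messageText);
        conn.Open();
        cmd.ExecuteNonQuery();
    }
}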
In my case I was amazed at the speed with which the db engine handled the stuff on a couple of el cheapo PC boxes. I also needed to "architect" my initial solution into smaller individual services instead of one SMS server service. You are very likely to need an SMS dispatcher type of service to route to different queues/operators etc.
And here's the real WTF in my two-cent SMS solution: routing to different operators was easy, you can even buy a gateway module instead of figuring that crap out yourself. And yes, the actual bottleneck is there, not in your db engine, whatever it is. But there was this one operator which made you use their gateway, a VB6 ActiveX of a quality you wouldn't put on your production server...and it had code examples and everything just like this WTF story...AND it took hours to work through a miserable amount of msgs, say 2000....and it would crash from time to time (yes, you'd better think about retry options!)...and their "service" was the most expensive one...and they managed to put most of the smaller competitors (who did things much better from the code point of view) out of business. AND all the time, financial analysts kept hyping this one operator's software as their crown jewel.
So, adding groups and supporting, say, 20 million users is a different story. But then you can probably afford to throw in some serious hardware to handle all those extra tables...
Admin
Funny you would put it that way :)
In Mike Resnick's novel "Birthright", the human race, at its apex of power while controlling the entire galaxy, tries to cross over to other galaxies. They figure out an engine technology that can get them there, but it can't take enough payload for the massive life support systems necessary for the years it would still take. Then they find a lifeform that can act as symbiotic life-support system for a human: it can live off human waste and produces edible food and breathable air as its own waste. Several human pilots are mentally conditioned to overcome their disgust at eating alien poo and set out in separate ships (redundancy). About half-way, they all commit suicide by disconnecting from the symbiont because the conditioning could not overcome their aversion to having to share this great achievement with a non-human.
Admin
This is not Java. It's .NET. How do I know? DataTable is a .NET thing. They obviously implemented their own Data Access Block.
And yes, the WTF is the loop. The bigger WTF is that this could simply be done as the author suggested, and that it is not done through a Windows Service.
But if the design is to do this through a Service because this or some other operation must hit another database server, this is the way to go (unless you can DTS it).
Okay, now back to the WTF of why it takes so long. From the code that is presented, the custom DataAccess helper could be opening and closing a connection for each insert. This is the standard behavior of most DataAccess helpers, because the rule is that you should always close a connection so that you don't run out of connections. Opening and closing connections is a time-consuming process.
To improve performance, the developer can open and close the connection outside of that loop and use that single connection. Or, in C#, use the using statement to automatically dispose of the connection.
using (SqlConnection conn = DataHelper.OpenConnection(connstring))
{
    // run every insert on this one open connection instead of opening
    // and closing per row; Dispose() closes it when the block ends
}
Another WTF is that the command is being built and executed each time. Even with a single connection, each command has to be built and has to initiate communication with the DB server (ExecuteNonQuery) so it can prepare to receive and run the query. So if it must be done this way, the developer should have batched the statements up and executed them all at once (hint: StringBuilder).
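Roughly something like this, assuming the ids are trusted integers (for anything user-supplied you'd still want parameters) and with the table and column names invented:
using System.Data.SqlClient;
using System.Text;

// Rough sketch of batching the inserts into a single command (hypothetical schema).
// Assumes conn is already open; one ExecuteNonQuery covers the whole batch.
static void InsertBatch(SqlConnection conn, int senderId, int[] recipientIds)
{
    if (recipientIds.Length == 0) return;
    StringBuilder sql = new StringBuilder();
    foreach (int recipientId in recipientIds)
    {
        sql.AppendFormat("INSERT INTO Outbox (Sender_Id, Recipient_Id) VALUES ({0}, {1});",
                         senderId, recipientId);
    }
    using (SqlCommand cmd = new SqlCommand(sql.ToString(), conn))
    {
        cmd.ExecuteNonQuery();   // one round trip for the entire batch
    }
}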
In the end, there is no defending doing it this way if it can all be done on the database as the author suggested.
Admin
you're talking about 1 transaction vs 2001 transactions, not to mention all of the network traffic, opening/closing connections, etc.
Admin
This is the first time for a while that I've physically winced at a WTF. What's so painful is not the original coding stupidity - there are plenty of people who don't understand diddly about databases after all - it's the fact that JM identified a perfectly good fix but there was no procedure to put it in place. Idiocy happens, the real WTF is not allowing non-idiots to sort out the ensuing mess.
Admin
A quick test I ran on some (sufficiently large) data I had handy showed that not only did both forms produce identical execution plans, but the timing differences, both in terms of median and average run time, were statistically insignificant, which leads me to believe that what I heard was correct.
I tested this on MySQL 4.1.10.
I wish programmers would check things like this out first, and maybe learn a thing or two about statistics.
Capcha: knowhutimean
Admin
Please say it was also expressed using a decimal point, possibly after the first digit...
Admin
BTW, SQL Server is explicitly named as the DBMS in question. Not surprising, seeing as the app is written in C# and MS shops will tend to go with MSSQL rather than a DBMS from some other vendor.
Captcha: mustache
Admin
I have several vbs files, developed by contractors, that do similar things with recordsets, looping through one table's recordset to query a second table rather than running a join.
The files also connect to an Oracle database via shelling out and running SQLPlus... The query to extract data from the Oracle instance is run, and then run again with a count(*) to get the number of records... yes, the number of records.
There are no prepared statements being used on the inserts or updates, there are no stored procedures, and at last count there are 12 round trips with the data for a relatively simple ETL. My boss asked me to look at why it was taking 5 hours to complete a run, after the PM and development lead had no answers. It turns out there was never a code review of the code from the offshore team, only a glance at the Unit Test results, and then a release to production.
My recommendation was to pour bleach in my eyes so I never see that code again, and start over with a script to Extract the data, and then some stored procedures to insert, and transform the data.
CAPTCHA is quality... PERFECT!
Admin
Buwahahaha, WTF post of the year!
How do you recommend getting relational data out of a relational database?
CAPTCHA - clueless - another PERFECT!
Admin
No, the WTF is, the nitwit who submitted this doesn't even know why the "consultants" were paid more than the "contractors", and thinks the two groups did "the same thing".
Now, I'm not saying a consultant who thinks a procedural loop through a result set is the right way to perform a join in SQL should be working on DBs. But, a simple code monkey should know his place. This is like a brain surgeon not knowing how to change the oil in his car, and the mechanics laughing at him for being stupid.
Admin
Captcha: null
Admin
And a system can be configured to log in a specific user when it boots up. Not that this is a good thing, but it does work around idiocies like this. For example, a stock quote system I used to have to work with was a desktop app; we had to make the system autologin the user so a reboot would bring the system back to a working state. Oh, and it was really several apps that had to be started in the right order, waiting until the previous app was completely initialized before starting the next, and the times weren't dependable. This is where tools similar to the PFB, along with window scrapers, come in handy--using our own WTFery to beat the vendors' WTFery.
Admin
Sort of, but not entirely. The "consultants" should have known better than to try their hand at programming if that wasn't what they were there for.
To continue the analogy, it's like the brain surgeon changing his own oil in his car and using olive oil instead of engine oil and damaging his car in the process. Then the mechanics are justified in laughing at him because he should have known not to try to change the oil if he didn't know what he was doing.
Admin
Ahh........ Here's mine:
"Stupendous Thinker-Upper of Stuff That Costs a LOT"
Admin
At a company I worked for a few years back, my boss moved and left the company. They hired a friend of the upper boss to take my old boss's place. The new guy was completely inept (as opposed to my old 'ept' boss), and it was clear he would never change. I decided to get rid of him with ingenuity. I busted my butt to get a highly visible pet project of the CEO done waaaay ahead of schedule, and publicly gave my boss 100% of the credit. He got promoted, and we got a new boss.
Admin
There's another WTF in the code - the insCmd variable should have been created once outside the loop, then bound to different values and executed in the loop. This would probably cut the time by an order of magnitude, even with the otherwise brain-dead design. Array-insert would be better, and of course the consultant came up with the most efficient way - tell the optimizer the whole problem and let it figure out how to do it.
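In other words, roughly this (the table and column names are invented for illustration, and it assumes the connection is already open):
using System.Data;
using System.Data.SqlClient;

// Sketch of building the command once, then rebinding and executing inside the loop
// (hypothetical schema). Still row-by-row, but no per-iteration command rebuild.
static void InsertMessages(SqlConnection conn, int senderId, int[] recipientIds)
{
    using (SqlCommand insCmd = new SqlCommand(
        "INSERT INTO Outbox (Sender_Id, Recipient_Id) VALUES (@Sender_Id, @Recipient_Id)", conn))
    {
        insCmd.Parameters.Add("@Sender_Id", SqlDbType.Int).Value = senderId;
        insCmd.Parameters.Add("@Recipient_Id", SqlDbType.Int);
        insCmd.Prepare();                                     // plan once
        foreach (int recipientId in recipientIds)
        {
            insCmd.Parameters["@Recipient_Id"].Value = recipientId;
            insCmd.ExecuteNonQuery();                         // rebind and execute
        }
    }
}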
Admin
Most databases will optimise the query and execute exactly the same in both cases....
Most, but not all. Sometimes Oracle will run faster with query #1, sometimes Oracle will run faster with query #2. It all depends on how the CBO works the query out.
Sad but true: sometimes going from query #2 to query #1 would speed up slow queries by 200-300%, with the cache cleared.
At least that was the case the last time I had to use RAC (9i) in a data warehouse environment.
Admin
"Text 782537 Your $1.99 Joke-A-Day Subscription" services."
The real WTF is that anyone uses those services. They deserve 3 hour delays.
CAPTCHA: craptastic
Admin
NOOOOOOOOOOOOOOO!!! This is a WTF in itself. You're just promoting idiocy. Your action will result in many future WTFs.
You remind me of that Jerry Seinfeld episode where Elaine promoted that mail guy because she was afraid of him.
Admin
I know this one, you select from each table individually, then connect the data together in the code. Apparently it's faster than using those nasty joins.
Admin
Have you never heard of autologin?
:D (again)
Rich
Admin
This is an example of knowing your tools. JM knows his database tool. The other blokes obviously didn't. Either that, or they were putting in obvious inefficiencies so that they could later bid an upgrade project that was n times faster.
Admin
Plenty of times, but it's not a black and white question.
"Oh, god this part of the system is bringing the web cluster down once a week. To fix it we need to re-architect the thingamajog workflow."
"Ok, how many hours will that take to code, QA and release?"
"At least 200 hours"
"I see, and if i throw another $2000 computer in the farm, it will spread the load to alleviate the problem."
"I guess so."
"Well, what's it gonna be, new computer or will you work for less than $10/hour?"
Sure some things are systemic and will cause untold costs down the road if not fixed, but others are not worth fixing with engineering time, when hardware will solve the problem.
Admin
Or, just put all the data into one table, of course...
Admin
Three hours to five minutes for 2000 rows...
mmmm... been there, done that.
Some index missing, loads of tables that strictly follow all possible DB normalizations,
and I guess that in one of those tables, although there was a primary key, there have been duplicate rows...
Admin
Where I work, we have a table that's set up much like the one CodeRage described. It was put together that way about 5 years ago, when it only had to track the status of a few thousand entries.
That table has nearly a million rows in it now, and almost all of them have been processed and are VERIFIED=1; the query that searches for VERIFIED=0 gets slower and slower as we add new clients, and it now times out when the database server is busy doing other things.
verified      count
-----------   -----------
0             2625
1             935590
If the original developer had created a separate 'processing' table instead, the query would be much, much faster. Plus, it would be easier to see if something has been waiting a very long time to be processed (of those 2625 unverified entries, 2527 are from before October 1st; while this is not unusual, it normally means that the item is stale and should be removed).
Having a separate 'processing' table that's very small has another advantage: you don't need as many indices on it (you may not need any, if it's small enough), which speeds up inserts and deletes.
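A minimal sketch of that idea, with hypothetical table names: pending work lives in its own small table, and finishing an item is a delete rather than flipping a VERIFIED flag in the million-row table.
using System.Data.SqlClient;

// Hypothetical sketch: instead of UPDATE BigTable SET VERIFIED = 1, the pending
// row simply gets deleted, so the "what's still waiting?" scan only ever touches
// the handful of unprocessed rows. Assumes conn is already open.
static void MarkProcessed(SqlConnection conn, int itemId)
{
    using (SqlCommand done = new SqlCommand(
        "DELETE FROM PendingItems WHERE Item_Id = @Item_Id", conn))
    {
        done.Parameters.AddWithValue("@Item_Id", itemId);
        done.ExecuteNonQuery();
    }
}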
Admin
CodeRage's solution doesn't really fit, because this process was obviously sending out new messages to subscribers. The code sample is where the message is created and placed into the queue. Anon's solution has some merit. You have to remember that these systems can have a history that involves millions upon millions of messages. If the database is not well engineered, then that is going to be darn slow.
I worked at a company in the same field which solved this problem this way. But they weren't content to leave it there, because they decided that the historical tables should be broken up to improve speed. There were twelve tables: log1, log2, log3 ... log12. The logs for the current month went into the appropriate table. Of course that meant that each table would hold data from multiple years, for example January 2003 and January 2004.
I had to create an application that created reports based on this data across any specified date range. This was of course made much more interesting when I realised that all dates were stored in mm/dd/yyyy format as strings.
Admin
WTF? What about all the beneficial bacteria in the gut? We're never alone.
Rich
Admin
The real WTF in all of this is that nobody suggested they use a View. You know, a View ... a big query that pretends to be a table. The DBA sets it up and the programmers think they have a magic table that automatically changes with the other tables.
Admin
Yes! First normal form, because first is the best, right? Who'd want to be second, or sixth or whatever?
Admin
MS-SQL actually runs both of these statements in exactly the same way. Same performance, same CPU, same RAM, because they have the same optimization.
I use the second format myself, but only for readability. It makes more sense to me to have the condition of the join as part of the join statement. Makes it clear that that part of the script is to handle the join, while the Where statement determines which subset of rows to return from either table. Other DBAs use the other because it makes more sense to them and is more readable to them.
Since it executes the same, it's just a consistency question.
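For anyone following along, the two forms being compared look roughly like this; the table and column names are invented here, since the article's actual query isn't reproduced in the comments:
// Hypothetical illustration of the two equivalent join syntaxes under discussion.
class JoinSyntaxExample
{
    // Form #1: old-style comma join, with the join condition in the WHERE clause.
    public const string Query1 =
        "SELECT m.Message_Id, s.Phone_Number " +
        "FROM Messages m, Subscribers s " +
        "WHERE m.Recipient_Id = s.Subscriber_Id AND m.Sent = 0";

    // Form #2: ANSI JOIN syntax, with the join condition in the ON clause.
    public const string Query2 =
        "SELECT m.Message_Id, s.Phone_Number " +
        "FROM Messages m " +
        "INNER JOIN Subscribers s ON m.Recipient_Id = s.Subscriber_Id " +
        "WHERE m.Sent = 0";
}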
Admin
It's more like the brain surgeon changing the oil in someone else's car, using olive oil instead of engine oil and damaging the car, and, then as the mechanics all laugh at him, he charges the other guy some money to have it fixed.
Admin
USE TDWTF;
DROP WTF WHERE WTF.WTF = "Not optimized" and WTF.LEVEL < SEVERE;
COMMIT;
Admin
My eyes...the goggles do nothing!
As for the forms of select, on MS SQL Server 2000, the first form is about 6 times as slow as the second form. They will return the same results. I'm basing this off of a stored procedure that would take 3 hours to run and when I switched it from the layout of #1 to #2 it took only 30 minutes. I think 7 tables were being joined together. I can't say anything about the speed of MS SQL Server 2005 as I keep being put on adding new features to old systems, reusing old code for a new site, or bug fixing.
Wow. Any DB worth its salt should generate the same execution plan. Even MySQL and Postgres are smart enough to do this.
Admin
In Oracle terms they are the exact same thing, just using different syntax forms. The INNER JOIN / NATURAL JOIN / CROSS JOIN keywords are the comparatively newer ANSI syntax; internally they get mapped the same way.
Admin
I was hired for a job as a developer, mainly because I knew VB3. The old system for this non-profit org was written in VB3, talking to a db2 database. They decided they needed to upgrade the system. The front end they got was written in VB5, talking to MSSQL 6.5 (I think; this was circa 1998). The front end used RDO to communicate with the database, and the geniuses involved in the development of this Wondrously Fine Thing decided they needed to write their own classes to wrap the RDO objects (adding another layer of communication and slowing things down). Performance was pretty craptacular. Their solution was to get more hardware. So we, as a non-profit org mind you, ended up with a quad-processor box with 4 gigs of RAM and around 500 gigs of drive space. Eight years ago, that was a hefty box. Performance increased by maybe 2-3%, at most.
Of course, this vendor also asked us for some specs on a later project we wanted them to do. After sending them a 5-6 page document explaining what was needed, they tacked on a cover sheet, a chart, and a $4000 bill for the document I wrote and sent to them.
Admin
Don't know if anybody caught this but ....
+ " VALUES (@Sender_Id, @Recipient_Id, ...))";
Admin
ya for real .. :) problem solved .. 10 seconds .. :)
Admin
Inserting into a second table violates normal form and is generally (as in nearly always) a BAD idea. A query should be used instead.
Admin
Hell, even MySQL will run the same query plan for both queries. PostgreSQL can do join-ish optimization on WHERE clauses more complex than equality, in some cases, too.
Admin
Well, it only executes the same if there are only two tables in the query, or, when there are more than two tables involved, if the database vendor has coded the query processor to "optimize" both queries the same way. In one special case, when there are more than two tables involved and the joins are outer joins rather than inner joins, it is not logically possible to do this and the two syntaxes cannot execute the same way.
Admin
There are things you can do in the second form (with joins) which are just not possible in the first. I can't remember the example off the top of my head; it may be when you have more than two tables...
Rich
Admin
I suppose you could fudge your Windows server with all your precious data to autologin too, maybe not the best plan though.
Admin
Sadly, this sort of thing is not only not unusual, but commonplace. I recall optimizing a loop somewhere that reduced search time from 75 hours to 6 minutes. A small code reorganization then overlapped those 6 minutes with 90 minutes of data acquisition time, effectively reducing the search time to zero. As a result, the code was infinitely faster. ;-)