- Feature Articles
- CodeSOD
- Error'd
- Forums
-
Other Articles
- Random Article
- Other Series
- Alex's Soapbox
- Announcements
- Best of…
- Best of Email
- Best of the Sidebar
- Bring Your Own Code
- Coded Smorgasbord
- Mandatory Fun Day
- Off Topic
- Representative Line
- News Roundup
- Editor's Soapbox
- Software on the Rocks
- Souvenir Potpourri
- Sponsor Post
- Tales from the Interview
- The Daily WTF: Live
- Virtudyne
Admin
isTableEmpty('/t/1000')
Filed Under: DTFY
Admin
So let me guess, the other method basically calls the same thing. So it's making two trips to the database for every table. Not to mention the real WTF being using raw ADO.NET
Admin
ewwwww!
[image]Doing a select all followed by
return ds.Tables[0].Rows.Count > 0
would be bad enough, but iterating through as well?Admin
Tina Two Times:
Admin
11 posts were split to a new topic: Mod sign
Admin
Doh!
Why do people do stuff like that???
What is wrong with getting the list to process (with appropriate where clause/aggregates) an then bail out if the result is empty ?????
Then again, if no one made stuff like that, this site would be boring....
Admin
Running a COUNT(*) upfront with a simplified query and using that to short circuit out before running the Big Kahuna query with all sorts of aggregates and shit can help you avoid excessive database loads, particularly in cases where indexing is not optimized.
Now, TRWTF is a system that I inherited. The reporting component took an arbitrary query (the result of unsanitized concatenation if you must know), ran it once. It would count the rows it returned and determine how many threads it wanted to run and throw that away. Each thread would then mangle the query to get the correct 'page' of output. Each thread would then run the mangled query and output its results to a text file. The master thread would then merge the text files back together and continue onwards. Records would fall through the cracks, get duplicated, etc., things would take WAY longer than they needed to be, getting worse as number of rows increased. Nobody could figure out why.
Admin
All we need now is for there to be a critical table somewhere that has to have data or it will cause the app to crash. Oh no! The app's crashed! Make it not crash!
Okay, says the developer, I'll make it Not Crash. I'll add some extra functionality to IsTableEmpty to add a row of dummy data. Here's the pseudocode:
If (table = thistable) then Insert into thistable ... else if (table = thattable) then Insert into thattable ... else if (table = theothertable) then Insert into theothertable ... else ... end if end if end if
There. I stopped it crashing for you.
Admin
Actually I've found it. TRWTF is that IsTableEmpty is broken. This line:
"SELECT * FROM "+s,"server=;database=; User ID=; Password="
... will result in this string being built for e.g s = users:
SELECT * FROM usersserver=;database=; User ID=; Password=
(Extra space added so as to prevent semicolon P being interpreted as an emoticon.)
Admin
True, but in that case the shift from simple get it to get the count should be based on some profiling beforehand.
As regards your Inherited system, then that sounds just horrible.....
I mean what was the original rationale for splitting the query in the first place?? I can not see that any performance gain could in any way result of that....
Admin
Use code blocks. Also, you misread the code, turning a
,
into a+
. The function takes two aruments, not one.Admin
Discourse inquired about employing a Cleaner, but the agency told Jeff that he would be referred to The Hague on charges if he forced a Cleaner to plumb through the Discocode.
Admin
So "pretty please... with sugar on top" quit trying to be a goddamn programmer.
Admin
The first rule of SqlDataAdapter is never to use SqlDataAdapter.
And - a beautiful TDWTF - the more you look at the code, the more "beauty" is revealed.
Admin
Looks like this software developed in Maerica!
Admin
Shouldn't there be one of those
8 months later
banners around that post or something?
Admin
Whilst running a count(*) can be useful, in this case they're counting everything and not using the count. All you need is select top 1 * (or equivalent in the db of your choice) if you really have to know whether the table is empty or not.
Admin
A bit inconsistent about the tables checked vs tables searched.
But - points deducted for not reading the table names from query string parameters - needs more SQL injection.
Admin
Depending on the WHERE clause, running a
SELECT COUNT(*) FROM TABLE
can require actually retrieving every matching row from the table internally. It's much better toSELECT COLUMN FROM TABLE FETCH FIRST ROW ONLY
(or equivalent), which performs better for any WHERE, even one that matches a million rows.(
FETCH FIRST ROW ONLY
is DB2 lingo for retrieve any 1 row; most databases have some similar construct.)See, semantically, SELECT COUNT(*) is efficient if you actually need a count of all the rows. It is inefficient if all you want to know is if ANY matching row exists.
But don't do this, the limit doesn't help:
SELECT COUNT(*) FROM TABLE FETCH FIRST ROW ONLY
One of the best cleanups I've done on a program as a cleaner was one that had logic similar to that in the article:
That section was a little slow and I thought, why waste time doing that? So my first change was:
That killed half the runtime, which was about 3½ hours. Then, well, the query wasn't really bad; the database profiler projected it would take only 0.02 seconds to run.
Hmmm...do you suppose the program could be slow because it's looping to that query 1,500,000 times? 1,500,000x2x0.02 = 16 hours. Well, the database was doing a bit better than the profiler said, but still.
So let's add a targeted index. Hmmm....now the profiler says 0.0001 seconds. Suppose that will help?
Runtime of 3½ hours reduced to, seriously, 12 minutes. Or less. Because of that one stupid query logic section.
Admin
That sounds like time to merge separate updates to the same table into one call?
Admin
In keeping with the spirit of the article image...
"I'm sorry, did I break your concentration? I didn't mean to do that. Please, continue, you were saying something about best intentions. What's the matter? Oh, you were finished! Well, allow me to retort. Is the table empty or not?
"SELECT * FROM TABLE"
"What database are you using?"
"What? What? Wh - ?"
""What" ain't no database I've ever heard of. They speak SQL in What?"
"SELECT * FROM --"
"SQL, motherfucker, do you speak it?"
"Yes! Yes!"
"Then you know what I'm sayin'!"
"Yes!"
"Tell me if the table is empty or not."
"SELECT * FROM --"
"Query the entire table again! Query the entire table again, I dare you, I double dare you motherfucker, query the entire table one more Goddamn time!"
Admin
I look back with delightful nostalgia on the day when I found someone had implemented, in a Java program, a bubble-sort on the elements of an ArrayList. In fact, the method in question had two bubble-sorts, side by side, differentiated by an "if", which were dependent on which property the elements of the list were to be sorted on.
Admin
This is just the cost of abstraction. You don't want to do a CPU-intensive join and retrieve all rows from the other tables when one of the tables is empty.
It works after all. If it's slow you just but another couple of servers.
Admin
This?
If so, no, it's not.
Admin
I remember a section of code which, in order to determine if a table exists, would run the query:
And return 0 if any SQL error occurred while executing the query, 1 otherwise.
Admin
It's the sort of thing you get when someone has no idea what a database does, and so decides that it's just a key-value store with a funny syntax.
Admin
For future articles, please anonymize data instead of censoring it. Maybe throw in a
hunter2
or abrillant
or make references to Telligent or Initrode or other companies that don't exist.Admin
For that matter, why does that method have connection parameters and credentials hard coded in it?
https://youtu.be/OWwOJlOI1nU
Admin
Yeah, I was wondering if C# would build prepared statements using
****
:wtf:Admin
ahhh, it's good to have the swearing @Matt_Westwood back.
Admin
Actually, in a lot of these cases, you don't have to check whether the table is empty or not: If you try to perform a select or update from an empty table, you'll know from the response that it's empty (either you get no rows back, or you get back an indication that no rows were updated).
Also, if you have software that goes haywire if a list of something turns out to be empty, then the fact that you are using your database insanely inefficiently is only one of your problems.
Admin
Would be nice if I could.
Admin
I think that depends on the database and its optimizer. I'll confess, I don't have a lot of expertise outside of Microsoft SQL Server, but I've used that for 15 years across multiple versions and seen how optimizer changes have made it capable of smartly realizing when you are just doing an existence check and it only needs one row. Of course, you need to be doing the check on the server side for the optimizer to realize it. At the risk of digressing, if you run this in any recent version of SQL Server with execution plans on, you can see that you only have problems if you just try to get a straight count (the third query):
So of course, if you don't know what you're doing (as it's clear was the case with the original writer of IsTableEmpty), or you don't know what the back end database is going to be and how the optimizer will work, yours is probably a good rule of thumb; but possibly a better rule of thumb is Involve A Database Expert In Data Intensive Development! :smile: (Based on the rest of your post, it's obvious that you are very aware of how tiny tweaks can make the difference between lightning fast and agonizingly slow database operations.)
I mean, in the ideal world, not only would you write a much better IsTableEmpty, but... you might not even need it. Filling up table objects and doing the joins yourself in the front end is probably a terrible idea versus just calling a database stored procedure (or view or even raw query) to get those, and any remotely good optimizer is going to already intelligently handle some of the tables possibly being empty and getting and joining all that data more efficiently than the client app. But that probably goes without saying too!
Admin
That's exactly the thing.
DB2 is smart about this as well; if you use EXISTS(SELECT *) it will do the top 1.
But the COUNT(*) function requires the determination of row count, which means that, properly, the database engine must find and enumerate all rows matching the WHERE clause.
Our people developed a bad practice in the early days of DB2 (V1 and V2) of doing a SELECT INTO (one-step query/return) in this form (COBOL):
In that usage, the database engine is forced to count all the rows because the test for zero is outside its domain of operations. If all the WHERE clause predicates are indexable, the performance isn't usually too bad. If there are a lot of rows and any non-indexable column is involved, the performance can suck big time.
And this was pure laziness, because even then, when FETCH FIRST ROW ONLY had not been added as a feature, there were ways to do this efficiently. One is, for example, set up a cursor and fetch the first row.
"But why go to all that trouble when DB2 can count 100K rows really fast?" (I should emphasize that this is a paraphrase quote from a lead programmer.)
Sigh.
I published a strong anti-pattern warning; despite that, we're still finding and fixing these crummy things.
Admin
I think my eyes got just a little bit wider with each line of "IsTableEmpty" until they were about ready to pop out of my head.
A previous co-worker at my current employer had this weird anti-pattern where should define a Linq2Sql query, then do if(query.Count() > 0) dataGrid.DataBind(). I repeatedly tried to point out to her that this just resulted in the grid not being updated if her query returned no results. Despite the fact that all she did for the project was CRUD screens, so none of these queries were particularly expensive to require a COUNT(*) query first, and the grid should always be updated if a search resulted in no records. I even tried introducing her to the "EmptyText" property of the data grid, but she insisted on using this anti-pattern for every page she worked on.
And that was the milder stuff she did. After she left, everything she had worked on had to be redone, due to swallowed exceptions and bad (often, really bad) logic. Some of her work is even on TDWTF ;)
Admin
Involuntarily?
Admin
^^ So much this.
The only scenario where this kind of optimization would help is a situation where the actual joins caused the slow-down, rather than the WHERE condition. If you have that problem, it's time to revisit your database design.
The implementation in this article was probably due to the original dev thinking "I don't want all these 'for' statements to run if there's no data, because that will just waste a bunch of CPU power." I have actually worked with programmers who would think that way.