• (disco)

    isTableEmpty('/t/1000')


    Filed Under: DTFY

  • (disco)

    So let me guess, the other method basically calls the same thing. So it's making two trips to the database for every table. Not to mention the real WTF being using raw ADO.NET

  • (disco)

    ewwwww!

    [image]

    Doing a select all followed by return ds.Tables[0].Rows.Count > 0 would be bad enough, but iterating through as well?

  • (disco)

    Tina Two Times:

    I'm gonna go get the tables, get the tables.

  • (disco)

    11 posts were split to a new topic: Mod sign

  • (disco)

    Doh!

    Why do people do stuff like that???

    What is wrong with getting the list to process (with appropriate where clause/aggregates) an then bail out if the result is empty ?????

    Then again, if no one made stuff like that, this site would be boring....

  • (disco) in reply to Yazeran
    Yazeran:
    What is wrong with getting the list to process (with appropriate where clause/aggregates) an then bail out if the result is empty ?????
    The way it's implemented in the article is awful, but the sentiment is occasionally useful.

    Running a COUNT(*) upfront with a simplified query and using that to short circuit out before running the Big Kahuna query with all sorts of aggregates and shit can help you avoid excessive database loads, particularly in cases where indexing is not optimized.

    Now, TRWTF is a system that I inherited. The reporting component took an arbitrary query (the result of unsanitized concatenation if you must know), ran it once. It would count the rows it returned and determine how many threads it wanted to run and throw that away. Each thread would then mangle the query to get the correct 'page' of output. Each thread would then run the mangled query and output its results to a text file. The master thread would then merge the text files back together and continue onwards. Records would fall through the cracks, get duplicated, etc., things would take WAY longer than they needed to be, getting worse as number of rows increased. Nobody could figure out why.

  • (disco)

    All we need now is for there to be a critical table somewhere that has to have data or it will cause the app to crash. Oh no! The app's crashed! Make it not crash!

    Okay, says the developer, I'll make it Not Crash. I'll add some extra functionality to IsTableEmpty to add a row of dummy data. Here's the pseudocode:

    If (table = thistable) then Insert into thistable ... else if (table = thattable) then Insert into thattable ... else if (table = theothertable) then Insert into theothertable ... else ... end if end if end if

    There. I stopped it crashing for you.

  • (disco)

    Actually I've found it. TRWTF is that IsTableEmpty is broken. This line:

    "SELECT * FROM "+s,"server=;database=; User ID=; Password="

    ... will result in this string being built for e.g s = users:

    SELECT * FROM usersserver=;database=; User ID=; Password=

    (Extra space added so as to prevent semicolon P being interpreted as an emoticon.)

  • (disco) in reply to Weng
    Weng:
    What is wrong with getting the list to process (with appropriate where clause/aggregates) an then bail out if the result is empty ?????

    The way it's implemented in the article is awful, but the sentiment is occasionally useful.

    Running a COUNT(*) upfront with a simplified query and using that to short circuit out before running the Big Kahuna query with all sorts of aggregates and shit can help you avoid excessive database loads, particularly in cases where indexing is not optimized.

    Now, TRWTF is a system that I inherited. The reporting component took an arbitrary query (the result of unsanitized concatenation if you must know), ran it once. It would count the rows it returned and determine how many threads it wanted to run and throw that away. Each thread would then mangle the query to get the correct 'page' of output. Each thread would then run the mangled query and output its results to a text file. The master thread would then merge the text files back together and continue onwards. Records would fall through the cracks, get duplicated, etc., things would take WAY longer than they needed to be, getting worse as number of rows increased. Nobody could figure out why.

    True, but in that case the shift from simple get it to get the count should be based on some profiling beforehand.

    As regards your Inherited system, then that sounds just horrible.....

    I mean what was the original rationale for splitting the query in the first place?? I can not see that any performance gain could in any way result of that....

  • (disco) in reply to Matt_Westwood

    Use code blocks. Also, you misread the code, turning a , into a +. The function takes two aruments, not one.

  • (disco)

    Discourse inquired about employing a Cleaner, but the agency told Jeff that he would be referred to The Hague on charges if he forced a Cleaner to plumb through the Discocode.

  • (disco)

    So "pretty please... with sugar on top" quit trying to be a goddamn programmer.

  • (disco)

    The first rule of SqlDataAdapter is never to use SqlDataAdapter.

    And - a beautiful TDWTF - the more you look at the code, the more "beauty" is revealed.

  • (disco)

    Looks like this software developed in Maerica!

  • (disco) in reply to Nagesh

    Shouldn't there be one of those

     

    <!-- -->

     

    <!-- -->

    8 months later

     

    <!-- -->

     

    <!-- -->

    banners around that post or something?

  • (disco) in reply to Weng

    Whilst running a count(*) can be useful, in this case they're counting everything and not using the count. All you need is select top 1 * (or equivalent in the db of your choice) if you really have to know whether the table is empty or not.

  • (disco)

    A bit inconsistent about the tables checked vs tables searched.

    But - points deducted for not reading the table names from query string parameters - needs more SQL injection.

  • (disco) in reply to Weng
    Weng:
    Running a COUNT(*) upfront with a simplified query and using that to short circuit out before running the Big Kahuna query with all sorts of aggregates and shit can help you avoid excessive database loads, particularly in cases where indexing is not optimized.

    Depending on the WHERE clause, running a SELECT COUNT(*) FROM TABLE can require actually retrieving every matching row from the table internally. It's much better to SELECT COLUMN FROM TABLE FETCH FIRST ROW ONLY (or equivalent), which performs better for any WHERE, even one that matches a million rows.

    (FETCH FIRST ROW ONLY is DB2 lingo for retrieve any 1 row; most databases have some similar construct.)

    See, semantically, SELECT COUNT(*) is efficient if you actually need a count of all the rows. It is inefficient if all you want to know is if ANY matching row exists.

    But don't do this, the limit doesn't help: SELECT COUNT(*) FROM TABLE FETCH FIRST ROW ONLY


    One of the best cleanups I've done on a program as a cleaner was one that had logic similar to that in the article:

       if (rowExistsTable()) updateRowTable();
    

    That section was a little slow and I thought, why waste time doing that? So my first change was:

       updateRowTabley(); /* and ignore not found error */
    

    That killed half the runtime, which was about 3½ hours. Then, well, the query wasn't really bad; the database profiler projected it would take only 0.02 seconds to run.

    Hmmm...do you suppose the program could be slow because it's looping to that query 1,500,000 times? 1,500,000x2x0.02 = 16 hours. Well, the database was doing a bit better than the profiler said, but still.

    So let's add a targeted index. Hmmm....now the profiler says 0.0001 seconds. Suppose that will help?

    Runtime of 3½ hours reduced to, seriously, 12 minutes. Or less. Because of that one stupid query logic section.

  • (disco) in reply to CoyneTheDup
    CoyneTheDup:
    looping to that query 1,500,000 times

    That sounds like time to merge separate updates to the same table into one call?

  • (disco)

    In keeping with the spirit of the article image...

    "I'm sorry, did I break your concentration? I didn't mean to do that. Please, continue, you were saying something about best intentions. What's the matter? Oh, you were finished! Well, allow me to retort. Is the table empty or not?

    "SELECT * FROM TABLE"

    "What database are you using?"

    "What? What? Wh - ?"

    ""What" ain't no database I've ever heard of. They speak SQL in What?"

    "SELECT * FROM --"

    "SQL, motherfucker, do you speak it?"

    "Yes! Yes!"

    "Then you know what I'm sayin'!"

    "Yes!"

    "Tell me if the table is empty or not."

    "SELECT * FROM --"

    "Query the entire table again! Query the entire table again, I dare you, I double dare you motherfucker, query the entire table one more Goddamn time!"

  • (disco)

    I look back with delightful nostalgia on the day when I found someone had implemented, in a Java program, a bubble-sort on the elements of an ArrayList. In fact, the method in question had two bubble-sorts, side by side, differentiated by an "if", which were dependent on which property the elements of the list were to be sorted on.

  • (disco)

    This is just the cost of abstraction. You don't want to do a CPU-intensive join and retrieve all rows from the other tables when one of the tables is empty.

    It works after all. If it's slow you just but another couple of servers.

  • (disco) in reply to martijntje
    martijntje:
    This is just the cost of abstraction.

    This?

        private bool IsTableEmpty(string s){
                SqlDataAdapter adapter = new SqlDataAdapter("SELECT * FROM "+s,"server=****;database=****;User ID=****;Password=****");
                DataSet ds = new DataSet();
                adapter.Fill(ds);
                int count=0;
                foreach(DataRow row in ds.Tables[0].Rows){
                        count+=1;
                }
                if(count>0)
                        return false;
                return true;
        }
    

    If so, no, it's not.

  • (disco)

    I remember a section of code which, in order to determine if a table exists, would run the query:

    select 1 from %s where 1=0
    

    And return 0 if any SQL error occurred while executing the query, 1 otherwise.

  • (disco) in reply to PJH
    PJH:
    If so, no, it's not.

    It's the sort of thing you get when someone has no idea what a database does, and so decides that it's just a key-value store with a funny syntax.

  • (disco)
    Remy:
    `"server=****;database=****;User ID=****;Password=****"`

    For future articles, please anonymize data instead of censoring it. Maybe throw in a hunter2 or a brillant or make references to Telligent or Initrode or other companies that don't exist.

  • (disco) in reply to ben_lubar

    For that matter, why does that method have connection parameters and credentials hard coded in it?

    https://youtu.be/OWwOJlOI1nU

  • (disco) in reply to ben_lubar

    Yeah, I was wondering if C# would build prepared statements using **** :wtf:

  • (disco) in reply to Matt_Westwood
    Matt_Westwood:
    Actually I've found it. TRWTF is that IsTableEmpty is broken. This line:

    "SELECT * FROM "+s,"server=;database=; User ID=; Password="

    ... will result in this string being built for e.g s = users:

    SELECT * FROM usersserver=;database=; User ID=; Password=

    (Extra space added so as to prevent semicolon P being interpreted as an emoticon.)

    ahhh, it's good to have the swearing @Matt_Westwood back.

  • (disco) in reply to TVJohn

    Actually, in a lot of these cases, you don't have to check whether the table is empty or not: If you try to perform a select or update from an empty table, you'll know from the response that it's empty (either you get no rows back, or you get back an indication that no rows were updated).

    Also, if you have software that goes haywire if a list of something turns out to be empty, then the fact that you are using your database insanely inefficiently is only one of your problems.

  • (disco) in reply to PleegWat
    PleegWat:
    That sounds like time to merge separate updates to the same table into one call?

    Would be nice if I could.

  • (disco) in reply to CoyneTheDup
    CoyneTheDup:
    Depending on the WHERE clause, running a `SELECT COUNT(*) FROM TABLE` can require actually retrieving every matching row from the table internally. It's much better to `SELECT COLUMN FROM TABLE FETCH FIRST ROW ONLY` (or equivalent), which performs better for any WHERE, even one that matches a million rows.

    I think that depends on the database and its optimizer. I'll confess, I don't have a lot of expertise outside of Microsoft SQL Server, but I've used that for 15 years across multiple versions and seen how optimizer changes have made it capable of smartly realizing when you are just doing an existence check and it only needs one row. Of course, you need to be doing the check on the server side for the optimizer to realize it. At the risk of digressing, if you run this in any recent version of SQL Server with execution plans on, you can see that you only have problems if you just try to get a straight count (the third query):

    -- do it on the server, SQL Server realizes they are the same
    if exists (select * from SomeBigTable) print 'YES' else print 'NO'
    if exists (select top 1 * from SomeBigTable) print 'YES' else print 'NO'
    
    -- just get a value to do the check in the client, and the first one is slower
    select count(*) from SomeBigTable  
    select top 1 'dummy value' from SomeBigTable
    

    So of course, if you don't know what you're doing (as it's clear was the case with the original writer of IsTableEmpty), or you don't know what the back end database is going to be and how the optimizer will work, yours is probably a good rule of thumb; but possibly a better rule of thumb is Involve A Database Expert In Data Intensive Development! :smile: (Based on the rest of your post, it's obvious that you are very aware of how tiny tweaks can make the difference between lightning fast and agonizingly slow database operations.)

    I mean, in the ideal world, not only would you write a much better IsTableEmpty, but... you might not even need it. Filling up table objects and doing the joins yourself in the front end is probably a terrible idea versus just calling a database stored procedure (or view or even raw query) to get those, and any remotely good optimizer is going to already intelligently handle some of the tables possibly being empty and getting and joining all that data more efficiently than the client app. But that probably goes without saying too!

  • (disco) in reply to beeporama
    beeporama:
    So of course, if you don't know what you're doing (as it's clear was the case with the original writer of IsTableEmpty), or you don't know what the back end database is going to be and how the optimizer will work, yours is probably a good rule of thumb; but possibly a better rule of thumb is Involve A Database Expert In Data Intensive Development!

    That's exactly the thing.

    DB2 is smart about this as well; if you use EXISTS(SELECT *) it will do the top 1.

    But the COUNT(*) function requires the determination of row count, which means that, properly, the database engine must find and enumerate all rows matching the WHERE clause.

    Our people developed a bad practice in the early days of DB2 (V1 and V2) of doing a SELECT INTO (one-step query/return) in this form (COBOL):

         EXEC SQL
            SELECT
               COUNT(*)
            INTO
               :WS400-COUNT
            FROM TABLE
            WHERE criteria
         END-EXEC
    
         IF WS400-COUNT > 0 ....
    

    In that usage, the database engine is forced to count all the rows because the test for zero is outside its domain of operations. If all the WHERE clause predicates are indexable, the performance isn't usually too bad. If there are a lot of rows and any non-indexable column is involved, the performance can suck big time.

    And this was pure laziness, because even then, when FETCH FIRST ROW ONLY had not been added as a feature, there were ways to do this efficiently. One is, for example, set up a cursor and fetch the first row.

    "But why go to all that trouble when DB2 can count 100K rows really fast?" (I should emphasize that this is a paraphrase quote from a lead programmer.)

    Sigh.

    I published a strong anti-pattern warning; despite that, we're still finding and fixing these crummy things.

  • (disco)

    I think my eyes got just a little bit wider with each line of "IsTableEmpty" until they were about ready to pop out of my head.

    A previous co-worker at my current employer had this weird anti-pattern where should define a Linq2Sql query, then do if(query.Count() > 0) dataGrid.DataBind(). I repeatedly tried to point out to her that this just resulted in the grid not being updated if her query returned no results. Despite the fact that all she did for the project was CRUD screens, so none of these queries were particularly expensive to require a COUNT(*) query first, and the grid should always be updated if a search resulted in no records. I even tried introducing her to the "EmptyText" property of the data grid, but she insisted on using this anti-pattern for every page she worked on.

    And that was the milder stuff she did. After she left, everything she had worked on had to be redone, due to swallowed exceptions and bad (often, really bad) logic. Some of her work is even on TDWTF ;)

  • (disco) in reply to tenshino
    tenshino:
    After she left

    Involuntarily?

  • (disco) in reply to EatenByAGrue

    ^^ So much this.

    The only scenario where this kind of optimization would help is a situation where the actual joins caused the slow-down, rather than the WHERE condition. If you have that problem, it's time to revisit your database design.

    The implementation in this article was probably due to the original dev thinking "I don't want all these 'for' statements to run if there's no data, because that will just waste a bunch of CPU power." I have actually worked with programmers who would think that way.

Leave a comment on “The Cleaner”

Log In or post as a guest

Replying to comment #:

« Return to Article