• jub (unregistered) in reply to Cody

    > The wtf is the loop to make individual inserts.  (If I'm reading correctly.)

    Yes, because this not only prevents the database server to create a proper execution plan, it also forces the database server and the business logic server to transfer a gazillion records over the network. I/O is expensive. The posted solution keeps everything on the database server, which will not only prevent this network access but in addition allows the database server to decide on the optimal execution plan.

  • nutscrape (unregistered) in reply to foxyshadis
    foxyshadis:
    Anonymous:

    It requires three lines, but its pretty easy to do using atomicitiy and transactions:


    1. Select out the messages destined for the queue out of the input join into a temporary table.
    2. Delete or flag those rows from the input table as being queued.
    3. Insert from the temp table into the queue table. 

    Or without using a temp table:

     1. INSERT ... SELECT .... into the queue table, use a column in the queue table to flag a row to indicate that it is newly added.
     2. Flag or delete rows in the input tables if they're in the queue marked with the newly added flag.
     3. Remove the newly added flag from the rows in the queue table.
     

    Why be so circuitous? Just drop the queue table and recreate it every time. Make it temp, if you have the memory. When the SMS engine starts, it'll drop and populate the queue, then start pumping them out and marking them sent in the main database, and when done (or nearly done) can start the process over. (The SMS engine need not be the SMS server, it should be a load-balancing distributor if there's more than one sending server.)



    That design is a small WTF because any problem in the sending loop and you've just lost track of which messages have already been sent.


  • Ok with the WTF but something else smells here... (unregistered)

    Ok, the wtf is you can use  the set insert bbut something else smells here!

     2 hours for a 2000 insert, let's see it's 7200 secs / 2000 = >3 sec per insert,

     well, either numbers (for times) are a bit inflated or the database is really a crappy one!!

    [pr] 

     

  • Pablo (unregistered) in reply to maweki
    maweki:

    JOIN is evil, indeed very evil.

    You can conclude that JOIN was invented by Oracle :D
     

    Must be a Java developer  

     

  • Martijn (unregistered) in reply to RH

    The real WTF is that I didn't even go "WTF"; it's just how it goes in any company.

    Managers don't listen to people smarter than them; it makes them feel dumb. In order for a manager to make the right choices, you have to manipulate him into slowly coming to the same conclussion you could have just told him right away. Obviously you won't get credit for the idea and the manager will probably get promoted, but atleast you won't be suffering from the dumber choice.

  • U-wing (unregistered) in reply to ParkinT
    ParkinT:

    Isn't that typical?

    Don't solve the software problem.  Just throw more hardware at it and hope it keeps working!

     

    How many times have you seen an example of this in your career?

    Yes, I am asking for some responses to the question.

    More often than not. I'd say in 90% of the cases more than 90% of the problem solving was done by throwing more hardware. ;-) Hardware is ALWAYS cheaper than people. Probably because the cost is easy to calculate, even in Excel.

    But seriously, I've seen similar code in a similar case. I had the privilege of being the doer of things and my first idea was to use just SQL Server tables for inbox, outbox and sent. I ended up having a combination of flagging and queing in db, sending bulk with flat files.

    Couple of notes though: Nothing wrong with joins and tables, just keep it simple. Nothing wrong with loops either. Nothing wrong with adding layers of logic. But the idea of dynamic SQL in any logic layer is almost always bad. Like here, would have been better to put that in db too. Performance is a good reason to use stored procedures.

    In my case I was amazed with the speed the db engine handled the stuff with a couple of el cheapo pc boxes. I also needed to "architect" my initial solution to smaller individual services instead of one SMS server service. You are very likely to need a SMS dispatcher type of service to route to different queues/operators etc.

    And here's the real WTF in my two-cent SMS solution: routing to different operators was easy, you can even buy a gateway module, instead of figuring that crap out yourself. And yes, the actual bottleneck is there, not in your db engine, whatever it is. But there was this one operator which made you use their gateway, a VB6 ActiveX, quality not something you put on your production server...and it had code examples and everything just like this WTF story...AND it took hours to work through a miserable amount of msgs, say 2000....and it would crash from time to time (yes, you better think of retry options!)...and their "service" was the most expensive one...and they managed to put most of the smaller competitors (who did things much better from the code point of view) out of business.  AND all the time, financial analysts kept hyping this one operator's software as their crown jewel.

    So, to add groups and support, say, 20 million users is a different story. But then you can probably afford to throw in some serious hardware to handle all those extra tables...

  • brazzy (cs) in reply to RogerC
    RogerC:

    Where "all else fails" is defined as "we have a good solution but we don't like where it came from". This attitude alone is sufficient to keep mankind from ever reaching the stars.

    Funny you would put it that way :)

    In Mike Resnick's novel "Birthright", the human race, at its apex of power while controlling the entire galaxy, tries to cross over to other galaxies. They figure out an engine technology that can get them there, but it can't take enough payload for the massive life support systems necessary for the years it would still take. Then they find a lifeform that can act as symbiotic life-support system for a human: it can live off human waste and produces edible food and breathable air as its own waste. Several human pilots are mentally conditioned to overcome their disgust at eating alien poo and set out in separate ships (redundancy). About half-way, they all commit suicide by disconnecting from the symbiont because the conditioning could not overcome their aversion to having to share this great achievement with a non-human.

  • Noogen (unregistered) in reply to U-wing

    This is not Java.  It's dotNet.  How do I know?  DataTable is a dotNet thing.  They obviously implement their own Data Access Block.

    And yes, the WTF is a loop thing.  The WTF perspective is that this can simply done as the author suggested and that this is not done through a Windows Service.

    But if the design is to do this through a Service because this or other operation must hit another database server, this is the way to go (unless you can DTS it).

    Okay, now back to the WTF of why it takes so long.  From the code that is presented, the custom DataAccess helper can be openning and closing connection for each insert.  This is the standard behavior of most DataAccess helper because the rule is that you should always close a connection so that you don't run out of connections.  Open and closing connection is a time consuming process.

    To improve performance, the developer can open and close connection outside of that loop and use that single connection.  Or in c#, use the using statement to automatically dispose a connection.

    using (SqlConnection conn = DataHelper.OpenConnection(connstring))
    {

    //loop goes here

    }

    Another WTF is that the command is being built and execute each time.  Even using a single connection, a command has to be build and initiate communication with the DB Server (ExecuteNonQuery) so it can prepare to receive and run the query.  So if it must be done this way, the developer should've batch up and execute all at once (hint: StringBuilder).

    In the end, there is no defend for doing it this way if it can all be done on the database as the author suggested.

  • tom (unregistered) in reply to Cody

    you're talking about 1 transaction vs 2001 transactions, not to mention all of the network traffic, opening/closing connections, etc.

  • Tragomaskhalos (unregistered)

     

    This is the first time for a while that I've physically winced at a WTF. What's so painful is not the original coding stupidity - there are plenty of people who don't understand diddly about databases after all - it's the fact that JM identified a perfectly good fix but there was no procedure to put it in place. Idiocy happens, the real WTF is not allowing non-idiots to sort out the ensuing mess.

     

  • Keith Gaughan (can't be bothered to log in) (unregistered) in reply to Cheong
    Anonymous:
    adamsane:
    Yes, they are identical functionally.

    However, on some db systems they will work differently.

     Command #1:

    1. Join all rows together
    2. Use the where clause to exclude rows
    3. Return the result set

    Command #2:

    1. Join only the rows were the On clause is true
    2. Return the result set

    Most of the large (expensive) DB like oracle and MSSQL will convert this to the sane execution plan.

    Yes. In MySQL(as of 4.X) the second statement is almost always faster than the first one at some degree.

    Haven't studied if there's optimization in 5.X yet.

    I dunno, Chuckie. I remember reading that one of the trivial initial query optimisations is convert queries in the former form into the latter. It is, after all, a trivial mechanical optimisation that requires no knowledge of the DB schema itself.

    A quick test I ran on some (sufficiently large) data I'd handy and not only did both forms produce identical execution plans, but the timing differences, both in terms of median and average run time, were statistically insignificant, which leads me to believe that what I heard was correct.

    I tested this on MySQL 4.1.10.

    I wish programmers would check things like this out first, and maybe learn a thing or two about statistics.

    Capcha: knowhutimean 

  • Matt (unregistered)
    Alex Papadimoulis:

    Consultants had minimal supervision and an hourly billing rate that could only be expressed using three digits.

    Please say it was also expressed using a decimal point, possibly after the first digit...
     

  • Keith Gaughan (can't be bothered to log in) (unregistered) in reply to Ok with the WTF but something else smells here...
    Anonymous:
    Ok, the wtf is you can use  the set insert bbut something else smells here!

     2 hours for a 2000 insert, let's see it's 7200 secs / 2000 = >3 sec per insert,

     well, either numbers (for times) are a bit inflated or the database is really a crappy one!!

    [pr]

    Or it's a slow network; or something is making getting an exclusive lock on the Queue table expensive; or connection pooling might not be enabled, which would ramp up the amount to time required to run a simple insert significantly (this depends on the internals of DataHelper, which are probably equally nasty); or Alex elided a few details and the actual number was more like 20,000; or any number of other things or a combination of factors.

    BTW, SQL Server is explicitly named as the DBMS in question. Not surprising, seeing as the app is written in C# and MS shops will tend to go with MSSQL rather than a DBMS from some other vendor.

    Captcha: mustache 

  • Jason (unregistered)

    I have several vbs files that were developed by contractors that do similar things with record sets, and looping through single table record sets to query a second table - rather than run a join. 

    The files also connect to an Oracle database via shelling out and running SQLPlus...  The query to extract data from the Oracle instance is run, and then run again with a count(*) to get the number of records...  yes, the number of records. 

    There are no prepared statements being used on the inserts or updates, there are no stored procedures, at last count, there are 12 round trips with the data for a relatively simple ETL.  My boss asked me to look at why it was taking 5 hours to complete a run, after the PM, and development lead had no answers.  It turns out there was never a code review on the code from the off shore team, only a glance at the Unit Test results, and then a release to production.

    My recommendation was to pour bleach in my eyes so I never see that code again, and start over with a script to Extract the data, and then some stored procedures to insert, and transform the data.

     

    CAPTCHA is quality...   PERFECT! 

  • Jason (unregistered) in reply to maweki
    maweki:

    JOIN is evil, indeed very evil.

    You can conclude that JOIN was invented by Oracle :D
     

     

    Buwahahaha, WTF post of the year!

    How do you recommend getting relational data out of a relational database? 

     CAPTCHA - clueless - another PERFECT!
     

  • anonymous (unregistered) in reply to SJ

    Anonymous:
    The WTF is if the contractor who is more qualified than the consultants remains a contractor.

    No, the WTF is, the nitwit who submitted this doesn't even know why the "consultants" were paid more than the "contractors", and thinks the two groups did "the same thing".

    Now, I'm not saying a consultant who thinks a procedural loop through a result set is the right way to perform a join in SQL should be working on DBs.  But, a simple code monkey should know his place.  This is like a brain surgeon not knowing how to change the oil in his car, and the mechanics laughing at him for being stupid. 

  • Keith Gaughan (can't be bothered to log in) (unregistered) in reply to Noogen

    Anonymous:
    Okay, now back to the WTF of why it takes so long.  From the code that is presented, the custom DataAccess helper can be openning and closing connection for each insert.  This is the standard behavior of most DataAccess helper because the rule is that you should always close a connection so that you don't run out of connections.  Open and closing connection is a time consuming process.
    Or... it could do what most well-written large systems do and maintain a connection pool to avoid the overhead. They're pretty trivial to write, and the pool need only be small.
    Anonymous:
    To improve performance, the developer can open and close connection outside of that loop and use that single connection.  Or in c#, use the using statement to automatically dispose a connection.
    Yup, that would indeed be better, though far from perfect. Then again, DataHelper itself could be keeping the connection open between calls; we have no way of knowing.
    Anonymous:
    Another WTF is that the command is being built and execute each time.  Even using a single connection, a command has to be build and initiate communication with the DB Server (ExecuteNonQuery) so it can prepare to receive and run the query.  So if it must be done this way, the developer should've batch up and execute all at once (hint: StringBuilder).
    Or, better yet, use a prepared statement. That way, not only is there less data sent over the wire (this depends on whether the DBMS's wire protocol has support for prepared statements), but the execution plan is only built once and you're automatically protected from SQL injection attacks. The SQL statement's also a lot easier to read and maintain too.

    Captcha: null 

  • poochner (cs) in reply to An apprentice
    Anonymous:
    Anonymous:

    the ones that don't automatically restart when the computer is rebooted.

    WTF? Have you never heard the startup folder in the start menu?

    :D 

    Rich 

    I think programs in the startup folder (and registry Run* entries) are not processed until someone actually logs in. While a service can be run in the background even without logged users.

    And a system can be configured to login a specific user when it boots up.  Not that this is a good thing, but it does work around idiocies like this.  For example, a stock quote system I used to have to work with was a desktop app; we had to make the system autologin the user so a reboot would bring the system back to a working state.  Oh, and it was really several apps, that had to be started in the right order, waiting until the previous app was completely initialized before starting the next, and the times weren't dependable.  This is where tools similar to the PFB, along with window scrapers, come in handy--using our own WTFery to beat the venders' WTFery.
     

  • Wombat (unregistered) in reply to anonymous
    Anonymous:

    Anonymous:
    The WTF is if the contractor who is more qualified than the consultants remains a contractor.

    No, the WTF is, the nitwit who submitted this doesn't even know why the "consultants" were paid more than the "contractors", and thinks the two groups did "the same thing".

    Now, I'm not saying a consultant who thinks a procedural loop through a result set is the right way to perform a join in SQL should be working on DBs.  But, a simple code monkey should know his place.  This is like a brain surgeon not knowing how to change the oil in his car, and the mechanics laughing at him for being stupid. 

    Sort of, but not entirely.  The "consultants" should have known better than to try their hand at programming if that wasn't what they were there for.

    To continue the analogy, it's like the brain surgeon changing his own oil in his car and using olive oil instead of engine oil and damaging his car in the process.  Then the mechanics are justified in laughing at him because he should have known not to try to change the oil if he didn't know what he was doing.

     

  • I8ABug2Day (cs) in reply to Not a consultant

    Ahh........  Here's mine:

     

    "Stupendous Thinker-Upper of  Stuff That Costs a LOT"

     

  • Positively Evil Person (unregistered) in reply to Martijn
    Anonymous:

    The real WTF is that I didn't even go "WTF"; it's just how it goes in any company.

    Managers don't listen to people smarter than them; it makes them feel dumb. In order for a manager to make the right choices, you have to manipulate him into slowly coming to the same conclussion you could have just told him right away. Obviously you won't get credit for the idea and the manager will probably get promoted, but atleast you won't be suffering from the dumber choice.

    At a company I worked for a few years back, my boss moved and left the company. They hired a friend of the upper boss to take my old boss's place. The new guy was completely inept (as opposed to my old 'ept' boss), and it was clear he would never change. I decided to get rid of him with ingenuity. I busted my butt to get a highly visible pet project of the CEO done waaaay ahead of schedule, and publicly gave my boss 100% of the credit. He got promoted, and we got a new boss.

     

  • Don (unregistered)

    There's another WTF in the code - the insCmd variable should have been created once outside the loop, then bound to different values and executed in the loop.  This would probably cut the time by an order of magnitude, even with the otherwise brain-dead design.  Array-insert would be better, and of course the consultant came up with the most efficient way - tell the optimizer the whole problem and let it figure out how to do it.

     

  • anonymous (unregistered) in reply to achille
    achille wrote the following post at 10-18-2006 5:25 PM:
    [image] rbriem:

    The INNER JOIN suppresses any records from either table that don't have a matching key field in the other (pretty sure - been a while since I SQL'ed); IIRC, this is the default behavior for some databases, so this *may* be functionally equivalent.

    Most databases will optimise the query and execute exactly the same in both cases....

     

    most, but not all.  sometimes oracle will run faster with query #1, sometimes oracle will run faster with query #2.  it all depends on how the CBO works the query out.

     

    sad but true, sometimes going from query #2 to query #1 would speed up slow queries by 200-300%, with cache cleared.

    at least that was the case last time i had to use RAC (9i) in a data warehouse environment. 

  • sms is a joke (unregistered)

    "Text 782537 Your $1.99 Joke-A-Day Subscription" services."

    The real WTF is that anyone uses those services.  They deserve 3 hour delays.

     CAPTCHA: craptastic
     

  • Noogen (unregistered) in reply to Positively Evil Person
    Anonymous:
    Anonymous:

    The real WTF is that I didn't even go "WTF"; it's just how it goes in any company.

    Managers don't listen to people smarter than them; it makes them feel dumb. In order for a manager to make the right choices, you have to manipulate him into slowly coming to the same conclussion you could have just told him right away. Obviously you won't get credit for the idea and the manager will probably get promoted, but atleast you won't be suffering from the dumber choice.

    At a company I worked for a few years back, my boss moved and left the company. They hired a friend of the upper boss to take my old boss's place. The new guy was completely inept (as opposed to my old 'ept' boss), and it was clear he would never change. I decided to get rid of him with ingenuity. I busted my butt to get a highly visible pet project of the CEO done waaaay ahead of schedule, and publicly gave my boss 100% of the credit. He got promoted, and we got a new boss.

     

     NOOOOOOOOOOOOOOO!!!  This is a WTF in itself.  You're just promoting idiocy.  Your action result in many future WTF.

    You remind me of that Jerry Seinfield episode where Elaine promoted that mail guy because she was afraid of him.

  • Pablo (unregistered) in reply to Jason
    Anonymous:
    maweki:

    JOIN is evil, indeed very evil.

    You can conclude that JOIN was invented by Oracle :D
     

     

    Buwahahaha, WTF post of the year!

    How do you recommend getting relational data out of a relational database? 

     CAPTCHA - clueless - another PERFECT!
     

    I know this one, you select from each table individually, then connect the data together in the code. Apparently it's faster than using those nasty joins.

     

  • Rich (unregistered) in reply to An apprentice
    Anonymous:
    Anonymous:

    the ones that don't automatically restart when the computer is rebooted.

    WTF? Have you never heard the startup folder in the start menu?

    :D 

    Rich 

    I think programs in the startup folder (and registry Run* entries) are not processed until someone actually logs in. While a service can be run in the background even without logged users.

    Have you never heard of autologin?

     :D (again)

     

    Rich  
     

  • foo (unregistered)
    Alex Papadimoulis:
    INSERT INTO Queue (...) SELECT ... FROM Messages m, Recipients r, ...

    This is an example of knowing your tools. JM knows his database tool. The other blokes obviously didn't.  Either that or they were putting in obvious inefficiencies so that they could bid and upgrade project that was n-times faster.

  • sdether (unregistered) in reply to ParkinT
    ParkinT:

    Isn't that typical?

    Don't solve the software problem.  Just throw more hardware at it and hope it keeps working!

     

    How many times have you seen an example of this in your career?

    Yes, I am asking for some responses to the question.

     

    Plenty of times, but it's not a black and white question.

    "Oh, god this part of the system is bringing the web cluster down once a week. To fix it we need to re-architect the thingamajog workflow."

    "Ok, how many hours will that take to code, QA and release?"

    "At least 200 hours"

    "I see, and if i throw another $2000 computer in the farm, it will spread the load to alleviate the problem."

    "I guess so."

    "Well, what's it gonna be, new computer or will you work for less than $10/hour?"

    Sure some things are systemic and will cause untold costs down the road if not fixed, but others are not worth fixing with engineering time, when hardware will solve the problem.
     

  • Charles (unregistered) in reply to Pablo
    Anonymous:
    Anonymous:
    maweki:

    JOIN is evil, indeed very evil.

    You can conclude that JOIN was invented by Oracle :D
     

     

    Buwahahaha, WTF post of the year!

    How do you recommend getting relational data out of a relational database? 

     CAPTCHA - clueless - another PERFECT!
     

    I know this one, you select from each table individually, then connect the data together in the code. Apparently it's faster than using those nasty joins.

     

    Or, just put all the data into one table, of course... 

  • Luciano Mollea (unregistered)

    three hours to five minutes for 2000 rows...

    mmmm... been there,, done that.

    some index missing, loads of tables that strictly follo all possible DB normalizations,

    I guess that in one of that tables although there was a primary key there has beeb duplicate rows... 

  • Oliver Klozoff (cs) in reply to Anon
    Anonymous:

    CodeRage:
    Why not use just ONE table, and have some sort of STATUS field on it, and set such status field to "SENT" or similiar after each messages was processed/sent/whatever?  Really, given the need to fix the problem quickly, our hero's solution was great, but WTF, what kind of design is this in the first place?

    Separate queue tables are good for processing transactions that may fail and be re-tried multiple times.  You can have a single "INSERT INTO foo SELECT FROM ..." that executes in a couple of seconds.  Then you have a process (or set of processes) that work through those sending the messages.  This is why MSMQ exists.

    Where I work, we have a table that's setup much like the one CodeRage described.  It was put together that way about 5 years ago, when it only had to track the status of a few thousand entries.

    That table has nearly a million rows in it now, and almost all of them have been processed and are VERIFIED=1; the query that searches for VERIFIED=0 gets slower and slower as we add new clients, and it now times out when the database server is busy doing other things.

    verified    count      
    ----------- -----------
    0           2625
    1           935590

    If the original developer had created a separate 'processing' table instead, the query would be much, much shorter. Plus, it would be easier to see if something has been waiting a very long time to be processed (of those 2625 unverified entries, 2527 of them are from before October 1st. While this is not unusual, this normally means that the item is stale and should be removed.)

    Having a separate 'processing' table that's very small has another advantage: You don't need as many indices on it (you may not need any, if it's small enough), which speeds up the insertion/deletion speed.

  • Keith (unregistered) in reply to ahmusch
    Anonymous:
    CodeRage:

    Honestly, I think the real WTF here is using the technique of copying a bunch of rows from one table to another to indicate a change in status. This sounds like a really stupid design.  I guess something like this might need to be done if there were two different databases and systems involved, but the optimization done by our hero would indicate that these two tables are in the same database.

    Why not use just ONE table, and have some sort of STATUS field on it, and set such status field to "SENT" or similiar after each messages was processed/sent/whatever?  Really, given the need to fix the problem quickly, our hero's solution was great, but WTF, what kind of design is this in the first place?

    Please fill me in if I'm completely missing something here!

     


    Such designs aren't uncommon, and have merit.  Think about it.  There's a queue table (Pending) and a history table (Complete).  The history table may be date oriented or partitioned for manageability and performance, especially for lookups and purges.  However, your batch broadcast shouldn't have to look through all your historical data to find only the Pending records.

     If one were designing it now, for a DBMS such as Oracle, you'd use composite partitioning (range by date, list by status), or you'd have a function-based index to allow you to highlight only the records in a Pending status.

     But if you want something simple, and you don't really understand all the features and functions of the underlying RDBMS -- and exactly how many code-monkey contractors do? -- you'd design it exactly in such a way.


    CodeRage's colution doesn't really fit, because this process was obviously sending out new messages to subscribers. The code sample is where the message is created and placed into the queue. Anon's solution has some merit. You have to remember that these systems can have a history that involves millions upon millions of messages. If the database is not well-engineered then that is going to be darn slow.

    I worked at a company in the same field which solved this problem this way. But they weren't content to leave it there, because they decided that the historical tables should be broken up to improve speed. There were twelve tables: log1, log2, log3 ... log12. The logs for the current month went into the appropriate tables. Of course that meant that each table would hold data from multiple years, for example, January 2003 and January 2004.

    I have to create an application that created reports based on this data across any specified date range. This was of course made much more interesting when I realised that all dates were stored in mm/dd/yyyy format as strings.

  • Rich (unregistered) in reply to brazzy
    brazzy:

    Funny you would put it that way :)

    In Mike Resnick's novel "Birthright", the human race, at its apex of power while controlling the entire galaxy, tries to cross over to other galaxies. They figure out an engine technology that can get them there, but it can't take enough payload for the massive life support systems necessary for the years it would still take. Then they find a lifeform that can act as symbiotic life-support system for a human: it can live off human waste and produces edible food and breathable air as its own waste. Several human pilots are mentally conditioned to overcome their disgust at eating alien poo and set out in separate ships (redundancy). About half-way, they all commit suicide by disconnecting from the symbiont because the conditioning could not overcome their aversion to having to share this great achievement with a non-human.

    WTF? What about all the beneficial bacteria in the gut? We're never alone.  

     

    Rich 

  • foo (unregistered) in reply to RH
    Anonymous:
    CodeRage:

    Honestly, I think the real WTF here is using the technique of copying a bunch of rows from one table to another to indicate a change in status. This sounds like a really stupid design.  I guess something like this might need to be done if there were two different databases and systems involved, but the optimization done by our hero would indicate that these two tables are in the same database.

    Why not use just ONE table, and have some sort of STATUS field on it, and set such status field to "SENT" or similiar after each messages was processed/sent/whatever?  Really, given the need to fix the problem quickly, our hero's solution was great, but WTF, what kind of design is this in the first place?

    Please fill me in if I'm completely missing something here!

     


    I'm not defending the WTF code, but CodeRage, you missed the fact that the data is coming from several tables, not just one.  They were cherry-picking values from across the data model and inserting them into a single record.  A good coder with deficient data modeling skills would see this as a good solution.

    However, since this is a "batch" and the data seems to be coming from far and wide, I would imagine that there is a whole lot of data duplication here.  A single record probably looks something like this:  
    (sender, recipient, time_sent, subject, message, etc.)  I draw this conclusion from the OP: "...the web UI would send the service a message with a list of recipients and the service would convert it to individually-addressed SMS messages and then send the messages to the appropriate queue."  The last three columns in the record are dupes.

    Additionally:  3 hours for 2000 records....16 rows a minute?  BS.  Something else must have been wrong.  Even 2000 records in 5 minutes sounds slow on a reasonably busy system.  And don't respond with "Well, perhaps the sender service was hammering the DB with updates and deletes."  It couldn't hammer the DB; It's quite obvious that they couldn't populate the queues fast enough to give the sender service any considerable amount of work to do to generate your scape-goat updates and deletes.

     

    The real WTF to all of this is that nobody suggested that they use a View. You know a View ... a big query that pretends to be a table. The DBA sets it up and the programmers think they have a magic table that automatically changes with the other tables. 

  • chocolate biscuit (unregistered) in reply to Charles
    Anonymous:
    Anonymous:
    Anonymous:
    maweki:

    JOIN is evil, indeed very evil.

    You can conclude that JOIN was invented by Oracle :D
     

     

    Buwahahaha, WTF post of the year!

    How do you recommend getting relational data out of a relational database? 

     CAPTCHA - clueless - another PERFECT!
     

    I know this one, you select from each table individually, then connect the data together in the code. Apparently it's faster than using those nasty joins.

     

    Or, just put all the data into one table, of course... 

     

    Yes!  First normal form, because first is the best, right?  Who'd want to be second, or sixth or whatever? 

  • Gsquared (cs) in reply to BradC

    MS-SQL actually runs both of these statements in exactly the same way.  Same performance, same CPU, same RAM, because they have the same optimization.

    I use the second format myself, but only for readability.  It makes more sense to me to have the condition of the join as part of the join statement.  Makes it clear that that part of the script is to handle the join, while the Where statement determines which subset of rows to return from either table.  Other DBAs use the other because it makes more sense to them and is more readable to them.

    Since it executes the same, it's just a consistency question.

  • anonymous (unregistered) in reply to Wombat
    Anonymous:
    Anonymous:

    Anonymous:
    The WTF is if the contractor who is more qualified than the consultants remains a contractor.

    No, the WTF is, the nitwit who submitted this doesn't even know why the "consultants" were paid more than the "contractors", and thinks the two groups did "the same thing".

    Now, I'm not saying a consultant who thinks a procedural loop through a result set is the right way to perform a join in SQL should be working on DBs.  But, a simple code monkey should know his place.  This is like a brain surgeon not knowing how to change the oil in his car, and the mechanics laughing at him for being stupid. 

    Sort of, but not entirely.  The "consultants" should have known better than to try their hand at programming if that wasn't what they were there for.

    To continue the analogy, it's like the brain surgeon changing his own oil in his car and using olive oil instead of engine oil and damaging his car in the process.  Then the mechanics are justified in laughing at him because he should have known not to try to change the oil if he didn't know what he was doing.

     

     

    It's more like the brain surgeon changing the oil in someone else's car, using olive oil instead of engine oil and damaging the car, and, then as the mechanics all laugh at him, he charges the other guy some money to have it fixed.

     

  • anonymous (unregistered) in reply to Gsquared

    USE TDWTF; 

    DROP WTF WHERE  WTF.WTF = "Not optimized" and WTF.LEVEL < SEVERE;

    COMMIT; 

  • Your Name is missing (unregistered) in reply to Hanneth

    My eyes...the goggles do nothing!

    As for the forms of select, on MS SQL Server 2000, the first form is about 6 times as slow as the second form. They will return the same results. I'm basing this off of a stored procedure that would take 3 hours to run and when I switched it from the layout of #1 to #2 it took only 30 minutes. I think 7 tables were being joined together. I can't  say anything about the speed of MS SQL Server 2005 as I keep being put on adding new features to old systems, reusing old code for a new site, or bug fixing.

    [image] BradC:

    So are these two SQL statements functionally identical? (and identical speed to run?)

    SELECT *
    FROM table1, table2
    WHERE table1.key = table2.key

    and

    SELECT *
    FROM table1
    INNER JOIN table2
    ON table1.key = table2.key

     I usually use the second, but the example uses the first--is this a secondary wtf? Or just a usage convention difference?

    Wow. Any DB worth its salt should generate the same execute plan. Even MySQL and Postgres are smart enough to do this.

     

  • Toger (cs) in reply to BradC
    BradC:

    So are these two SQL statements functionally identical? (and identical speed to run?)

    SELECT *
    FROM table1, table2
    WHERE table1.key = table2.key

    and

    SELECT *
    FROM table1
    INNER JOIN table2
    ON table1.key = table2.key

     I usually use the second, but the example uses the first--is this a secondary wtf? Or just a usage convention difference?

     

    In Oracle terms they are the exact same thing, just using different syntax forms. The INNER JOIN / NATURAL JOIN /CROSS JOIN  keywords are a comparatively newer ANSI syntax, internally they get mapped the same way.

     

  • Anonymouse (unregistered) in reply to ParkinT

    I was hired for a job as a developer, mainly because I knew VB3. The old system for this non-profit org was written in VB3, talking to a db2 database. They decided they needed to upgrade the system. The front end they got was written in VB5, talking to MSSQL 6.5 (I think. This was circa 1998). The front end used RDO to communicate to the database, and the geniuses involved in the development of this Wondrously Fine Thing decided they needed to write their own classes to wrap the RDO objects (adding another layer of communication and slowing things down). Performance was pretty craptacular. Their solution was to get more hardware. So we, as a non-profit org mind you, ended up with a quad processor box, with 4gig of ram and around 500 gigs of drive space. Eight years ago, that was a hefty substantial box. Performance increased by maybe 2-3%, at most.

    Of course, this vendor also asked us for some specs on a later project we wanted them to do. After sending them a 5-6 page document explaining what was needed, they tacked on a cover sheet, a chart, and a $4000 bill for the document I wrote and sent to them.

  • jminkler (unregistered) in reply to achille

    Don't know if anybody caught this but ....

     

    for(int i=0; i<dt.Rows.Count; i++)
    {
    //Insert Sql command
    SqlCommand insCmd = DataHelper.CreateCmd(
    "INSERT INTO Queue (Sender_Id, Recipient_Id, ...) "
     +      " VALUES (@Sender_Id, @Recipient_Id, ...))";
     
    This is setting up the Prepared statement EVERY TIME in the loop the create command should be OTUSIDE the loop !!! 

     

  • jminkler (cs) in reply to foo
    Anonymous:

    The real WTF to all of this is that nobody suggested that they use a View. You know a View ... a big query that pretends to be a table. The DBA sets it up and the programmers think they have a magic table that automatically changes with the other tables. 

    ya for real .. :)  problem solved .. 10 seconds .. :)   

  • mjk (unregistered)

    Inserting into a second table violates normal form and is generally (as in nearly always) a BAD idea.  A query should be used instead.

  • Angstrom (cs) in reply to achille
    achille:

    rbriem:

    The INNER JOIN suppresses any records from either table that don't have a matching key field in the other (pretty sure - been a while since I SQL'ed); IIRC, this is the default behavior for some databases, so this *may* be functionally equivalent.

    Most databases will optimise the query and execute exactly the same in both cases.... 



    Hell, even MySQL will run the same query plan for both queries.  PostgreSQL can do join-ish optimization on WHERE clauses more complex than equality, in some cases, too.

  • Charles (unregistered) in reply to Gsquared
    Gsquared:

    MS-SQL actually runs both of these statements in exactly the same way.  Same performance, same CPU, same RAM, because they have the same optimization.

    I use the second format myself, but only for readability.  It makes more sense to me to have the condition of the join as part of the join statement.  Makes it clear that that part of the script is to handle the join, while the Where statement determines which subset of rows to return from either table.  Other DBAs use the other because it makes more sense to them and is more readable to them.

    Since it executes the same, it's just a consistency question.

    Well it only executes the same if there are only two tables in the query, or, when there are more than two tables involved, if the database vendor has coded the query processor to "optimize" both querys the same way. If one special case, when there are more than two tables involved and the joins are outer joins and not inner joins, it is not logically possible to do this and the two syntaxes cannot execute the same way.

  • Rich (unregistered) in reply to Gsquared
    Gsquared:

    I use the second format myself, but only for readability.  It makes more sense to me to have the condition of the join as part of the join statement.  Makes it clear that that part of the script is to handle the join, while the Where statement determines which subset of rows to return from either table.  Other DBAs use the other because it makes more sense to them and is more readable to them.

    There are things you can do in the second form (with joins) which are just not possible in the first. I can't remember the example off the top of my head, it may be if you have more than two tables...

     

    Rich 

  • Walrus (unregistered) in reply to Rich

    Anonymous:
    WTF? Have you never heard the startup folder in the start menu?

    I suppose you could fudge your Windows server with all your precious data to autologin too, maybe not the best plan though.

  • Zygo (unregistered) in reply to Luciano Mollea
    Anonymous:

    three hours to five minutes for 2000 rows...

    mmmm... been there,, done that.

    some index missing, loads of tables that strictly follo all possible DB normalizations,

    I guess that in one of that tables although there was a primary key there has beeb duplicate rows... 

     


    Sadly this sort of thing is not only not unusual, but commonplace.  I recall optimizing a loop somewhere that reduced search time from 75 hours to 6 minutes.  A small code reorganization mixed those 6 minutes into 90 minutes of data acquisition time, effectively reducing the search time to zero.  As a result the code was infinitely faster.  ;-)

Leave a comment on “Very Slow Service”

Log In or post as a guest

Replying to comment #:

« Return to Article