- Feature Articles
- CodeSOD
- Error'd
- Forums
-
Other Articles
- Random Article
- Other Series
- Alex's Soapbox
- Announcements
- Best of…
- Best of Email
- Best of the Sidebar
- Bring Your Own Code
- Coded Smorgasbord
- Mandatory Fun Day
- Off Topic
- Representative Line
- News Roundup
- Editor's Soapbox
- Software on the Rocks
- Souvenir Potpourri
- Sponsor Post
- Tales from the Interview
- The Daily WTF: Live
- Virtudyne
Admin
So the frist thing our contributor did was become a SQL performance wizard by finding & fixing idiocies.
The secnod thing they can do once that well runs dry is find all the speed-up loops in the code. Of which there are many, both deliberate and inadvertent / stupid.
Admin
Why is that bad news?
Admin
I used to be Jakard. I miss those days.
Admin
Because as soon as he runs into a non-trivial performance issues he's going to look like an imbecile since the other ones he solved fast, so clearly anything he's given can be done equally fast.
Admin
The word Wizard here is not chosen randomly. The trap of the magical rock star wizard developer is an insidious one.
What you need to understand is that when it comes to people above you in the hierarchy, or even colleagues that don't know what they are doing what us serious developers do looks like magic. They do not understand the difference between a stupid mistake like this one that can be easily fixed and a real, serious problem that can not. Indeed, they are incapable of comprehending even the fact that a difference exists.
Indeed in the case of management they quite literally should not, because that is not their job. Their job is to make sure all the workers are working right and that the best, most qualified worker for a task gets that task.
So in situations like this all anyone sees is "SQL go slow." And the limit of their understanding is that when "SQL == slow" they should call the "Wizard" who has the arcane knowledge and a seemingly magical ability to make it not slow.
And while this is good boost for ones superiority complex and ego early on while there is a lot of such low hanging fruit available sooner rather than later every wizard ends up in a situation where he has fixed all of these trivialities. At which point all that is left is the real problems that are much more difficult or some times even impossible to fix.
However, because of the fact that sufficiently skilled software development is indistinguishable from magic it is virtually impossible to explain to management that this is the case. And our "Wizard" developer is now trapped in a situation where his superiors have wildly insanely impossible expectations of him which to them and their understanding of the situation seem quite reasonable.
After all, he fixed all those slow SQLs before. Why can't he just fix this new one? His reputation says that he should. He must be lazy or spoiled or something.
And that is why you should always beware of earning a reputation as "the wizard".
Admin
Hm, when the "wizard" cant't fix the "new one", they could just say, his Mana is exhausted and they need 3 weeks PTO.
Don't know, what to do after the 3 weeks, though :P
Admin
by then they have found a new job with new low hanging fruit to wizard away...
Admin
They should teach SQL at Hogwarts.
Admin
He probably will be able to. If it's a query taking of the order of magnitude of 30 minutes, the cause will almost certainly be obvious. Anything taking a smaller order of magnitude is not worthy of the Wizard's attention and can safely be left to lesser mortals to fix*, if such queries are deemed to be an issue at all given that they have been living with 30 minute queries in the past.
*because the difficulty of solving a performance problem is proportional to the performance cost... right?
Admin
The real WTF here is "why was the original query so slow?" Why didn't the optimizer deal with it properly?
Admin
Point is that you need to manage your reputation carefully so as to ensure that while your skills and abilities are understood and respected by management they do not get such an inflated image of them that their expectations of you in the future become greatly above what you can actually manage.
If you fall into that trap you will quickly find your self in the age old situation of quotas raising to meet the skill of the miracle worker who keeps up with the raising quotas. And that's an one way road to burnout.
Admin
Optimizers tend to have assumptions about how you are going to word your query, and when you give them something totally weird, that leads to bad assumptions. It's entirely possible that for this particular DB version, the nested queries and distincts tricked it into not using an index at all. It's hard to say for sure, because for the two major enterprise DB vendors- Oracle and Microsoft- their optimizers are secret sauce and closely guarded. A patch to the DB could wildly alter the performance of this query- it's entirely possible that when written, it worked perfectly fine, they updated the database and it broke.
Admin
Yeah, no. There's no uniqueness in the keys there because you can have a [Category:'One', Name:'Category 1'] in CategoryStorageA and a Category:'One', Name:'Category 01'] in CategoryStorageB (minor difference in the Name) and you'll get both records in the final resultset. The only way to fix it is to add a constant like "select Table = 'A', Category, Name from CategoryStorageA union select Table = 'B', Category, Name from CategoryStorageB" and treat [Table, Category] as the primary key.
Addendum 2023-10-19 09:04: With the table constant you can use "union all" instead for a bit more efficiency since the result won't change.
Admin
Am I missing something here? there are 2 tables with at most 30 rows in each and in the un-optimized form it takes 30 minutes to perform a union? assuming the worst case scenario is O(n²) that's 1800 seconds to process 900 items. seriously with that amount of data I would not expect any measurable performance increase whether indexes were used or not.
Admin
When you include the distinct in the query, the server takes all the data and then sorts it so that it puts the duplicate rows next to each other and then it can grab the first unique row it sees. The server needs memory to store the data for the sort, and of course, memory to do the sort. But if the server underestimates how much space it will need (or the data is too big to fit in the memory it's willing to give to the query) then it has to start writing to disk to do the sorting. Also, the server can't stream the data back to the client until it has completely gotten all the data and sorted it to make sure it's only sending you the distinct rows. Now, technically, the union operator still sends back unique data. Union All would send back everything without worrying about uniqueness. With the rewrite as given, either the server was able to handle only sorting the data once, or the rewrite was actually a Union All and the server could just stream the data back to the client as rows were coming in from the source tables and memory usage would be minimal because it doesn't need to hold on to the data to sort before sending it back.
Admin
Optimisers aren't magic. If you try to anticipate and handle every idiotic thing that someone might come up with, you'll just make it slower, harder to maintain and more likely to have bugs - and chances are there's an even bigger idiot right around the corner who'll still manage to produce something horribly slow anyway. It's more sensible to focus on efficiently dealing with things a reasonable person would write.
Admin
The query is oddly written but there's nothing inherently terrible about it, it more just has a lot of redundancy. That is most definitely a very weird bug in the optimizer.
Admin
Even though the query was obviously badly written, it's strange that it took that long to run... The DB engine should have known that those DISTINCT were useless and just ignored them. I wonder what RDBMS it was...
Admin
"Why is that bad news?"
I think Nietzsche put it best after quitting his job as a DBA. "He who fights with databases should look to it that he himself does not become an Oracle. And if you gaze long into an abyss you become known as a subject matter expert on abysses and will have every abyssal ticket assigned directly to you."
Admin
I admit it's been over 20 years since I graduated from university (degree in computer science), but UNION doesn't guarantee unique rows; it takes the results of the second query and appends them to the results of the first query. The table structure ensures that Category will be unique in StorageA, and that Category will be unique in StorageB (making the inner DISTINCTs unneeded); however nothing enforces uniqueness between both tables.
Admin
Yeah, that's what I was thinking. I was thinking you still need the outside "distinct", just not the inner ones.
Admin
I've been that "wizard". Past the first few ones, I made sure to let as many people as possible know that this new one was a bit more complex (even if it wasn't). I also asked the powers that be to pay for additional training so I could become the actual wizard they thought I was. And I did.
Working on small and larger optimization problems is very satisfying for me, as they are usually self-contained, of limited scope, and at the end the client is thrilled that what was a 30 minutes process now takes mere seconds.
Admin
You're thinking about UNION ALL. On SQL Server, UNION does guarantee unique rows.
Admin
For some databases UNION does guarantee uniqueness, whereas UNION ALL will return all rows regardless
Admin
I will bet my balls that this happened on Oracle. Why? as a sql-monkey for 15 years I have seen some things... Horrible, horrible things... And often similar unexplained behaviour, where all kinds of impossible can happen. Just rearrange A and B, or wait for next full moon, and voila! its fixed.
Admin
People should use ANALYZE and EXPLAIN more.
Admin
Ah sub queries - a lot people forget that those can result in hefty performance issues; but honestly, it's pointless to think a lot about it. Better use a query language processor that translates to the target database instead of trying to come up with something that might be broken a server version later.
Admin
So after Jakard has long left the company, there will be sections of codebase that are forbidden to touch, for those are the Works of the Great Wizard, and are to be reverred as Holy Scripture. Any questions regarding anything about the database will be responded to with "the Great Wizard understood it. It is beyond our understanding."
Admin
As a (now-retired) long-time SQL Optimizer Wizard, for a long time MS SQL Server had issues with sub queries. Oracle did as well, in different ways. Both got better at looking through the sub queries, but to the optimisers, they were largely opaque black boxes. Current versions of MS SQL Server should see through the subqueries pretty quickly and deliver a good plan in most cases.
Both major RDBMSs and most minor RDBMSs should have been able to optimise out the DISTINCTs in the subqueries if the Category in each query was defined as the PK. But the distinct * in the outer query is still a problem, even after the union. Not a 30-minute problem, but a problem, nonetheless. It still requires the optimiser to look at each row in its entirety and sort and build a key over each row for the distinct test, not just the Category.
This stinks of the original developer (query writer) not understanding the data, or the expansion of *, or both.
And Remy, I've got to take issue with your comment about the vendors optimisers being secret sauce and closely guarded - that's not really true; I've paid more attention to MS than Oracle, but both tell you about how their optimisers work, what they look at to make decisions, how they look through queries to choose plans, and provide amazingly detailed explanations pre-, during- and post-query execution of the plan, the execution and the result. Sure, they're not open source, but the optimisers are incredibly well documented and publicised.
Admin
I'd love to know what RDBMS they're using so I can avoid it like the plague. Either this WTF was from at least 20 years ago or they must be using something pretty awful. I'm not even convinced that the "distinct *" should be a problem - apparently there are 30 records in total, there are no duplicates in there, and the * only expands to 2 fields, one of which is the PK in each table. What kind of execution plan could drag that out for half an hour?! You could literally type the results by hand quicker than that. I think there's probably a lot going on which isn't mentioned in this article. - is Category really the pk or are there really millions of duplicate rows in each table? Is this query being used to write to another table which is getting blocked? Are those "tables" actually views with some complex query behind them? I'm much more interested in why the original was broken than in just tidying up the query and saying it's fixed without ever finding out why.