Admin
ARGHGHG. My eyes! They burn! Please shoot this idiot! He does not deserve to live!
Admin
For us nubs who haven't had to use overly much SQL yet, could you post the single SELECT statement?
(Or better yet, a link to a guide to good and speedy ways to use SQL? Or a link to bad and slow ways, purely for entertainment's sake? Like the UI halls of shame...)
Admin
This beast of a WTF stored procedure also included several temp tables, a few cursors, 20 inserts, 38 updates, 15 deletes, and 66 selects. It was 2,000 lines long and took 30 seconds to run. It was painful just to look at, much less rewrite.
Admin
To do it within the database layer, see the posts from OneFactor and Maurits. Alternatively, you could dynamically build a SELECT statement in the middle tier, containing only the relevant clauses:
SELECT sProjectID, sProjectName, ...
FROM tbl_Projects
WHERE 1 = 1 /* always true, so each criterion below can be appended with AND */
AND sProjectID = ### /* omitted if sProjectID is not specified */
/* 4 more criteria */
AND sProjectName = 'xxx' /* omitted if sProjectName is not specified */
Also, if you want to send e-mail from the database layer, I suggest just writing the relevant data to an EmailToBeSent-style table, and having a separate process periodically scan that table, send whatever it says to send, and purge or archive each entry after it's successfully processed.
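Something like this, roughly (the table name and columns here are made up for illustration):
CREATE TABLE dbo.EmailToBeSent (
    EmailID   INT IDENTITY(1,1) PRIMARY KEY,
    Recipient NVARCHAR(256)  NOT NULL,
    Subject   NVARCHAR(256)  NOT NULL,
    Body      NVARCHAR(4000) NOT NULL,
    QueuedAt  DATETIME NOT NULL DEFAULT GETDATE(),
    SentAt    DATETIME NULL  -- NULL means not yet sent
);
-- The external sender process loops over unsent rows...
SELECT EmailID, Recipient, Subject, Body
FROM dbo.EmailToBeSent
WHERE SentAt IS NULL;
-- ...hands each one to a mail server, then marks it (or DELETEs it to purge):
UPDATE dbo.EmailToBeSent
SET SentAt = GETDATE()
WHERE EmailID = 42; -- the ID of the row just sent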
Admin
There is a problem with all the proposed alternatives: SQL Server compiles a stored procedure the first time it is run, and that compilation may not use the optimal execution plan for the way the procedure is called later.
Here is an alternative procedure design (based on the Northwind data) which gets compiled separately for each combination of parameters actually used. It is a sort of dynamic SQL wrapped inside the procedure, using the parameter support of the system procedure sp_executesql:
ALTER PROC dbo.usp_GetOrders
@OrderID AS INT = NULL,
@CustomerID AS NCHAR(5) = NULL,
@EmployeeID AS INT = NULL,
@OrderDate AS DATETIME = NULL
AS
DECLARE @sql AS NVARCHAR(4000);
SET @sql =
N'SELECT OrderID, CustomerID, EmployeeID, OrderDate, filler'
+ N' FROM dbo.Orders'
+ N' WHERE 1 = 1'
+ CASE WHEN @OrderID IS NOT NULL THEN
N' AND OrderID = @oid' ELSE N'' END
+ CASE WHEN @CustomerID IS NOT NULL THEN
N' AND CustomerID = @cid' ELSE N'' END
+ CASE WHEN @EmployeeID IS NOT NULL THEN
N' AND EmployeeID = @eid' ELSE N'' END
+ CASE WHEN @OrderDate IS NOT NULL THEN
N' AND OrderDate = @dt' ELSE N'' END;
EXEC sp_executesql
@sql,
N'@oid AS INT, @cid AS NCHAR(5), @eid AS INT, @dt AS DATETIME',
@oid = @OrderID,
@cid = @CustomerID,
@eid = @EmployeeID,
@dt = @OrderDate;
GO
Thanks to Itzik Ben-Gan of Solid Quality Learning for this!
With regard to sending emails from the database, I would not use it for a "real" application, but I use it all the time for alerts on SQL Server job or DTS package failures and errors.
Admin
You are joking, right? This completes in at least O(N * M) time, where M is the number of criteria, when it COULD have completed in O(N) time with a normal WHERE clause.
Admin
omg. this is the best wtf yet.
Maybe... just maybe this poor programmer was not aware of using ISNULL for optional fields in the WHERE clause... then his ingenuity got the better of him.
Admin
I'm no SQL guru, but for optional-parameter searches I always use
"SELECT * FROM Table WHERE Col1 LIKE '%" + parm1 + "%' AND Col2 LIKE '%" + parm2 + "%'" etc....
It makes it one statement... is there something wrong with this? Is it a serious performance hit, or does it prevent the use of indexes or something? Please let me know.
Admin
I would also prefer 2 over 1, since I have a feeling that there is a better chance the query optimizer can optimize the statement based on the constant values. Anyway, as long as you don't have indexes on the searchable columns, all versions should perform equally fast (that means: full table scan). If you have, e.g., 2 of 6 columns indexed, you could make a mix of 3 and 2 to get the best possible performance (3 for the indexed columns, 2 for the rest). Dynamic SQL is obviously best in terms of performance, but more difficult to maintain.
Admin
To make sure it works all the time, no matter which eMail program the client has installed.
Admin
T-SQL might be a bitch (must admit I don't know it), but Oracle's PL/SQL is a nice language to write business logic. We regularly put almost everything into the database, except a thin presentation layer and some programs communicating with other systems using "strange" APIs. Of course, this means that replacing the database with anything other than Oracle is currently not an option. Maybe in a few years competing products like Fyracle or EnterpriseDB will reach a level of PL/SQL compatibility that makes them a viable alternative.
Admin
Wow, this is the smartest way I've seen to run custom queries on a database that doesn't allow regular users to execute dynamic SQL, only stored procedures.[Y]
Well, it might not be the best performance, but who cares. hardware costs aren't the issue these days, right?...[H]
right guys?...
[uncomfortable silence]
Admin
wtf??? I was staying away from this thread -- until this. You mean your only alternatives to send e-mail are the server's DB service or the client's mail app? Have you ever heard of an MTA?
dZ.
Admin
Hehe, well, he stated:
"T-SQL might be a bitch (must admit I don't know it), but Oracle's PL/SQL is a nice language to write business logic. We regulary put almost everything into the database, [...]"
Yeah heck, who needs application servers if you have a DB server? Sure, this must scale like a charm. And there you also have the explanation for why he sends email from his DB. He's lacking a middle tier... what a WTF!
Admin
This is a great WTF from the SQL Server team. They couldn't be bothered with an immutable or constant variable declaration so that these optimizations can take place where they belong.
Admin
This seems to be mainstream thinking where I work [+o(]
Sometimes I think the WTFers are just trying to keep their jobs, like writing crap that only they will understand, consequently making them undisposable (is that a word?) for the company, and it seems to work.
I prefer to live dangerously [:D]
Admin
Re-inventing the WHERE clause... using WHERE clauses...
Somewhere along the line of coding this, you'd expect the quarter to drop and go cha-ching, and he'd think: wait... what am I doing?
Admin
Almost ;-) but I get the point, I think you mean something like this:
public class ITGuy : Employee, IUndisposable {
    public void Undispose() {
        WriteCrap();
    }
}
programming == universal language ;-)
Admin
At the very least, he should have checked whether "LEN(@sProjectID) > 0" and "LEN(@sProjectName) > 0"
before creating the search table.
Admin
I am also fairly inexperienced in SQL, but whenever I've had a situation where I needed a dynamic query with optional parameters I also use something similar to LIKE "%" & param1 & "%". Does running a query like this come with any sort of performance hit over doing it some other way?
Admin
Of course the stored procedure connects to a MTA to send the mails. If I had a middleware, I would let it send the mails.
Admin
It doesn't have to scale like a charm. The application (warehouse management) has - depending on the client - about 50 to 200 users. A single server with 2-4 processors running Oracle can easily handle that.
Admin
I'm fairly inexperienced as well, but from what I have read, using LIKE can potentially slow things down, especially if the pattern starts with a "%", as in "SELECT * FROM Table WHERE Column LIKE '%F%'". http://www.sql-server-performance.com/transact_sql.asp says that if you can rewrite that as "SELECT * FROM Table WHERE Column LIKE 'F%'" then "the Query Optimizer has the ability to potentially use an index to perform the query, speeding performance and reducing the load on SQL Server." I don't know if the same rules apply to other databases.
On a somewhat unrelated note, I thought that dynamic SQL was unspeakably bad and to be avoided at all costs. Am I missing something that makes the examples you all are talking about ok?
Admin
The same rule is true for Oracle and probably for every database that has a "like" operator and a query optimizer.
Dynamic SQL is not unspeakably bad, but it should not be used without a good reason. One particular problem with dynamic SQL and Oracle (at least in version 8i, I'm not sure about the more current versions) is that whenever the literals in the query change, Oracle will not reuse a previously parsed statement; for example, "select * from x where y=1" and "select * from x where y=2" are two different statements, and Oracle will parse the second one again although the only difference from the first is the literal value. Using bind variables instead of literals is better for Oracle.
Admin
If you cannot visualize why
LIKE '%def%'
might be inefficient, regardless of the RDBMS being used, think about it logically:
If I give you a large book with an index and ask you to give me a list of page numbers with all key words that begin with the letters 'def' (LIKE 'def%'), you can jump right to the index at the end, jump to the d's, quickly scan the index alphabetically, and there they all are.
But if I ask to give me a list of all words containing the letters 'def' (LIKE '%def%'), then you need to read the entire index (or the book itself) and search all of the letters of every possible word to determine which pages to return.
Admin
While Oracle has gotten better about handling dynamic SQL with 9i, and shows more improvement with 10g, I still feel that bind variables are a better way of doing things overall, due to all the things you mentioned.
On a different note, regarding having the database send emails: there are a lot of systems out there which are written to do exactly that, and all for the reasons that Maurits mentioned earlier. Oracle's HR, PeopleSoft, and Rational, just to name a few.
Admin
Forget performance, you've got a bigger problem:
What if I put in my search criteria as "a'; drop table xyz; --"?
Admin
In most cases, it simply doesn't work. At least not if you use dynamic SQL inside PL/SQL in Oracle, since these procedures expect exactly one DML statement.
That aside, it's bad enough if the program crashes when you try to search for "John O'Connor".
Admin
Yes, but M is a constant having nothing to do with the number of rows. Since constant factors don't matter in big-O terms, this is still O(N).
Admin
I agree with you for the most part if it's a very small project. However, most of the projects I work on are large or become large after a very short time. So I almost always make 3 logical layers for my apps, even if they're relatively small to start with. I try to keep business logic, other than RI, keys, cascade deletes and such, out of the database. On most new projects, I'm building the database from scratch and there are few if any other programs accessing it.
The object table system (objectTypes, objectAttributes) sounds truly horrific. Most of my 3-layered solutions have a class which matches a table in the database, more or less, with the fields in the table matching the primitive fields in the class. Not everything matches up that neatly, and it's a huge amount of work, even with CodeSmith, so I've been scouring the web looking at several OOP databases and O/R mappers. I wish OOP databases would become more mainstream, but I don't really know if they'd work.
I'm sure Yukon WILL have some truly horrific WTFs. When I heard they were going to have the .NET runtime in it, I was initially dismayed because I foresaw developers using C# or VB to do all sorts of things that SQL is better suited for. Slowly I've become more comfortable with the idea, and I think it's one step closer to SQL Server being an Object-Relational DBMS. In the past, I didn't realize that the primary reason I (and other developers) didn't like putting a lot of business logic in the database was the limitations of T-SQL. If you take away that limitation, by allowing .NET languages and/or true objects in the database, then there's less reason to have a middle tier, IMHO. Also, in most cases, you want business logic to apply to the data regardless of what client program is accessing it. With a middle tier, you have to copy DLLs around to multiple client programs, or use remoting, or (uggg!!!) copy & paste code so the same logic applies in all places. If all this was in the database, you wouldn't have to jump through these hoops. Objects are a great way to encapsulate business logic and, to a lesser extent, data. So it would make sense that if our databases become more robust, more OOP, then the middle tier could go ahead and shrink.
But I digress. The way things are now, it's an absolute ass whip to try to debug email-generating code, text parsing, complicated formulas, and other business tasks when they are written in T-SQL. I'm actually very comfortable with T-SQL, but it just isn't very well suited to that kind of work. It was created to manage sets of data. My boss at my last job had this huge, complicated stored procedure that looped over a cursor, appended HTML together, and sent out an email. It would have been so simple to do the same thing in VB.NET, but everyone seemed to love putting that crap in stored procedures. He was also more of a Microsoft Access developer than a .NET/SQL Server developer, so he had little concept of a 3-layered/tiered architecture. When I got here, a contractor/project manager our company uses tried to get me to do the same thing. I told him no way.
Admin
Ok, here's where I'm a little fuzzy on how SQL injection attacks work. I'm going to give two scenarios, one where that SQL injection attack will work, and one where it will not. If I'm wrong about either, please let me know:
1. In this situation, I take your input, "a'; drop table xyz; --" from, say, a web form, and set a string named param1 equal to it. Then I set a string named sql equal to "SELECT * FROM Table WHERE Blah LIKE '%" & param1 & "%'", so sql becomes "SELECT * FROM Table WHERE Blah LIKE '%a'; drop table xyz; -- %'". Then I execute sql and rue the day.
2. In this situation, I take your input, "a'; drop table xyz; --" from, say, a web form, and send it to a stored procedure named GetEverythingByBlah. The meat of this sp is "SELECT * FROM Table WHERE Blah LIKE '%' + @param1 + '%'". When the stored procedure is run it selects everything from Table where the value of column Blah is like "a'; drop table xyz; --". So, if for some weird reason there happened to be a row in Table where column Blah had a value of "rgwrgwerga'; drop table xyz; --aeherghaseth", then that row would be returned. Otherwise, no rows would be returned. Regardless, the SQL injection attack fails.
So, am I right?
Admin
All: When comparing to null you cannot use =, which is what throws a lot of folks off initially. SELECT @varExample = null ... IF @varExample = null fails. You have to use IS to test against null. IF @varExample IS NULL. The function ISNULL(@varExample, valueToUseWhenNull) takes a variable or statement and if the variable or statement resolves to a null value, substitutes the valueToUseWhenNull. (I think this is a poorly named SQL function as I initially expected it to return a boolean value as to whether something is null or not when I first started working with SQL Server.) This test against null for equality is the first stumbling block for many.
The single SQL statement, without using dynamic SQL subject to SQL injection attacks, that I would use would be along the lines of: SELECT Column1, Column2, Column3, Column4 FROM SomeTable WHERE (Column1 LIKE @Column1Parameter OR @Column1Parameter IS NULL) AND (Column2 LIKE @Column2Parameter OR @Column2Parameter IS NULL) AND (Column3 LIKE @Column3Parameter OR @Column3Parameter IS NULL). You can expand this at will. In the stored procedure or query definition, simply specify default values of null for all the parameters you might be searching on.
Some folks choose to rewrite that using ISNULL, along the lines of: SELECT Column1, Column2, Column3, Column4 FROM SomeTable WHERE Column1 LIKE ISNULL(@Column1Parameter, Column1) AND Column2 LIKE ISNULL(@Column2Parameter, Column2) AND Column3 LIKE ISNULL(@Column3Parameter, Column3)
The reason the above is more difficult for many to grasp is the set nature of SQL Server. Each statement is evaluated for each row of the table, but the fact that the references to Column1, Column2, and Column3 in the above statement are translated to the values of the row under consideration escapes many.
I prefer the first statement because it seems clearer to me what the logic is, and I prefer to keep SQL functions out of a query if I can. (Why? Gut instinct, that I should probably check with the execution plans that SQL Server produces.)
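To make the first form concrete, here's a rough sketch as a complete procedure, reusing the tbl_Projects columns mentioned earlier in the thread (the types and exact names are my guesses, purely illustrative):
CREATE PROC dbo.usp_SearchProjects
    @sProjectID   VARCHAR(20)  = NULL,  -- NULL means "don't filter on this"
    @sProjectName VARCHAR(100) = NULL
AS
SELECT sProjectID, sProjectName
FROM dbo.tbl_Projects
WHERE (sProjectID   =    @sProjectID   OR @sProjectID   IS NULL)
  AND (sProjectName LIKE @sProjectName OR @sProjectName IS NULL);
GO
-- e.g. EXEC dbo.usp_SearchProjects @sProjectName = 'Payroll%'  -- search by name only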
Admin
hmmm.. this one's got a WTF in the text provided by the submitter...
"... all the while ignoring the WHILE clause." (???)
Admin
I believe 2 performs better than 1 in the presence of indexes because of the possibility of applying short-circuit boolean logic. The indexes go all funny in 1 because the isnull function gets passed an argument which changes from row to row. I seem to recall that indexes work best when functions depend only on passed in parameters and work worst when functions depend on both passed in parameters and row values and that "functional indexes" can help out with functions which depend only on row values.
Thanks to all for reminding me of solution 4, which is to build the SQL dynamically (either in the middle tier or in the sproc). Does it actually perform better than clever WHERE clauses, even though the dynamic SQL gets compiled on each request?
Admin
From what I know, you have it correct. The worst thing you can do is concatenate the values you have received from the user into the query string you are about to execute.
This is where I can get hazy, due to not knowing all of the specifics on SQL Server. In general, the db user that an application is using should only have read/write permissions (unless you're a coding god and just use sa). So, having just those permissions, can this db user actually drop a table? I'm not necessarily saying that a knowledgeable person couldn't do damage with SQL injection through a limited db application account, but table dropping is generally out of the question, right?
Admin
If you don't need to search on the optional parameters being null and you want an AND search, just set the default value for those parameters to null and have something like the following where clause:
where
(firstField = case when @firstParam is null then firstField else @firstParam end)
and
(secondField = case when @secondParam is null then secondField else @secondParam end)
...
Admin
Dang, can make that simpler. Change every comparison to a coalesce: (firstField = coalesce(@firstParam, firstField))
Admin
Actually, one way around the performance of the (@var is null OR field = @var) is this: (@var is null OR (@var is not null AND field = @var))
This prevents the DB from doing an exhaustive search as a result of the right side of the OR expression, since the first test fails. Do this and you'll almost never need to use dynamic SQL again.
Admin
Funny thing about databases, memory usage means almost nothing. Database optimizations are all about reducing disk read/writes. I'll take O(n*n*n) in main memory over O(3n) in disk reads (and writes!) any day.
Admin
I usually use column = COALESCE(@column, column) for such situations. It tends to perform better in my experience.
Besides, coalesce is a cool word.
Admin
Then you can test whether there's an idiot sysadmin at work who allows the DB Users used by scripts to have access to everything.
Admin
The actual SQL doesn't matter. If the account running is not sa, I could just put in " a' or 1=1" or something similar.
Admin
O(3n) is the same as O(n). There is definitely a value of n that makes you regret your decision (e.g. when the swapping starts).
Admin
The stored procedure will execute with the permissions granted to the stored procedure's owner. So if dbo created the stored procedure, then dbo is who it executes under. Stored procedures are a great way to minimize security risks by restricting direct access to tables and providing access through views/stored procedures, but if you don't take care it can blow up in your face.
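For example, something along these lines (the principal name is hypothetical, and usp_GetOrders is the procedure from earlier in the thread):
-- Deny the web app's login direct table access...
REVOKE SELECT, INSERT, UPDATE, DELETE ON dbo.Orders FROM WebAppUser;
-- ...and let it in only through the procedure, which runs as its owner (dbo).
GRANT EXECUTE ON dbo.usp_GetOrders TO WebAppUser;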
Admin
Ok, so an sp executes with the permissions of the user who created it. Presumably, you wouldn't give JoeWebUser access to an sp that did anything horrible. But let's say you give him access to an sp that creates a sql statement dynamically, like @DynamicSqlString = "SELECT * FROM Table WHERE Blah = '" + @param1 + "'", and then executes it, like Exec(@DynamicSqlString). If JoeWebUser enters something SQL-Injection-irific for @param1, then you're probably screwed because the sp will execute with the full set of permissions held by the developer that created the sp. However, if you were dynamically to create this same SQL statement with the same SQL-Injection-irific input for @param1 and then execute it directly from your application, then, since you'd be logged into the database as JoeWebUser with limited permissions, the SQL injection attack probably wouldn't work on account of JoeWebUser not being allowed to drop tables, delete rows, etc.
Is that right?
Admin
Yes, if you simply concatenate the parameter values onto your dynamic SQL string you're just as screwed as you were before you did anything. It should be noted that this isn't the recommended method of doing this though.
Parameters are translated... although I'm not sure that's the proper term. They are not treated as literal replacements. The SQL Server engine ensures the value inside the parameter is suitable for use as a value by treating it differently than a text query.
JoeUser enters
' OR 0 = 0 --
into your username box. You use parameters to encapsulate this value which is passed to SQL Server to a stored procedure there.
Instead of using a dynamic query, you use a parameter:
SELECT Username, Password FROM Accounts WHERE Username = @Param1 AND Password = @Param2
SQL Server stuffs the value of @Param1 straight into the variables that are being compared. It doesn't first do a text replacement of the values and then execute the query (that would be no better than a dynamic query). What gets "executed" is effectively
SELECT Username, Password FROM Accounts WHERE Username = ''' OR 0 = 0 --' AND Password = @Param2
(Note: That's not what is actually executed. Can't stress that enough. :))
If you had used dynamic SQL though and concatenated the string values together you would've had someone with access to your system.
Parameters can be used outside of stored procedures. They can be used with straight SQL queries as well. Well, EXEC() can't take parameters but sp_executesql can. See MSDN at http://msdn.microsoft.com/library/default.asp?url=/library/en-us/tsqlref/ts_sp_ea-ez_2h7w.asp.
Where a dynamic query is beneficial is where parameter placement is limited. You cannot, for instance, use a parameter as the column name or table name to select from. You cannot use it to indicate which columns to select from the table. That requires dynamic SQL.
Using parameters also allows execution plan reuse, whereas constants and dynamic SQL garner no such benefit.
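A quick sketch of sp_executesql with a plain query, using the Accounts example from above and JoeUser's payload (lengths and types are just illustrative):
DECLARE @sql NVARCHAR(4000);
SET @sql = N'SELECT Username FROM Accounts WHERE Username = @uname';
-- The payload is passed as a value; it is never spliced into the SQL text.
EXEC sp_executesql @sql,
     N'@uname NVARCHAR(50)',
     @uname = N''' OR 0 = 0 --';
-- Returns no rows unless some account is literally named "' OR 0 = 0 --".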
Admin
In Oracle, you can use bind variables in dynamic SQL. You can cobble up a nice SQL string using ':1', ':2', etc, for the placeholders, then call it with the appropriate values. So theoretically it should be able to reuse the query. Also you avoid the whole problem with reserved characters in the literal string then.
Unfortunately on every other level, Oracle makes baby Jesus cry [:'(]
I haven't checked if this is possible with SQL Server though - I must investigate...
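In PL/SQL it looks something like this (Northwind-ish table and column names, purely for illustration):
DECLARE
  v_sql   VARCHAR2(200);
  v_count NUMBER;
BEGIN
  -- :1 and :2 are bind placeholders; the parsed statement can be
  -- reused across calls with different values.
  v_sql := 'SELECT COUNT(*) FROM orders WHERE customer_id = :1 AND employee_id = :2';
  EXECUTE IMMEDIATE v_sql INTO v_count USING 'ALFKI', 5;
END;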
Admin
I write Oracle batch code like that all the time. Well, not quite, since I don't like to update in place. More like this:
create table first_pass as
  select vars_i_care_about
  from ...
  where whatever ...;
Then I index what I know I'll need. Then I might join another table, do another create, etc. This is typically for marketing data warehousing work. Table sizes in the 10s to 100s of millions.
I used to construct very complex SQL queries, extracts, joins, etc., but they were next to impossible to truly troubleshoot and prove. And the total process could run for hours, which could fail and need to be redone. If I kept it piecemeal, I could fix and restart at a known point.
Cutting the work into these kinds of chunks allowed me to control the size and time of each step, and made sure that a change in the optimizer didn't burn me a year from now, since my joins were pretty simple.
Also, I know people who are managers who spent years in the XyWrite editor doing this type of multiple-pass transform of data, massaging it until it was what they wanted. A real program would do it in one pass, but they only had a crappy macro language to work with. Might be based on that.
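In practice it looks roughly like this (all table and column names made up for illustration):
-- Pass 1: cut the raw data down to just what's needed, then index it.
create table first_pass as
  select customer_id, order_total
  from   raw_orders
  where  order_date >= date '2005-01-01';
create index first_pass_cust_idx on first_pass (customer_id);
-- Pass 2: a simple join against the indexed intermediate table.
-- Each step is cheap to verify, and a failure only restarts this step.
create table second_pass as
  select f.customer_id, f.order_total, c.region
  from   first_pass f, customers c
  where  f.customer_id = c.customer_id;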
Admin
There's a good article about dynamic search conditions here:
http://www.sommarskog.se/index.html