- Feature Articles
- CodeSOD
- Error'd
- Forums
-
Other Articles
- Random Article
- Other Series
- Alex's Soapbox
- Announcements
- Best of…
- Best of Email
- Best of the Sidebar
- Bring Your Own Code
- Coded Smorgasbord
- Mandatory Fun Day
- Off Topic
- Representative Line
- News Roundup
- Editor's Soapbox
- Software on the Rocks
- Souvenir Potpourri
- Sponsor Post
- Tales from the Interview
- The Daily WTF: Live
- Virtudyne
Admin
sol said:
So I have to wonder where crazy things like NEVER USE ODER BY use a web server sort fall in at oO Like for instance I have a screen that display (omfg) rows and columns of data.. and I cache it and instead of round tripping back to the database to filter or sort I use what I have in memory already... backing up the original for filters and of course simply sorting the active set of data for sorts...
I kid you not people have called me stupid for saying why use ORDER BY or why round trip to the database...
"Never use ORDER BY" is too drastic. You're generally correct: why roundtrip every time versus do some of the things in place.. BUT, this only works if:
a) the data sample you work with is small, otherwise if you have thousands or more rows, and you have to show the top 100 rows sorted by X, then downloading the entire database and emulating functionality in your app will result in abysmal performance.
b) if the emulated set of functionality is simple to replicate. sorting by one columns is easy enough, but many applications will offer much more complex filters and sorting, with boolean logic and so on. SQL has this, this is what it exists for, reimplementing it on the client side means you have no idea what you're doing.
Admin
My company's 2 main systems that we've developed for a huge multinational company use TEXT fields almost exclusively for fields. This decision pre-dates my joining the company, so I'm not to blame... new tables for features I've developed use more efficient fields :)
We're running into optimisation issues now where certain reports do indeed take a long time to run... in some cases they even fail to execute because the users have entered truly stupid values into their query fields - ProductNo LIKE '%%' was one I saw earlier today... this in a database with 2,000,000 products.
Oh well, maybe one day we'll fix it ;)
Admin
ITYM Keep a copy at home.
Admin
And of course be sure to place the hard copy on a wooden table and photograph it before emailing it as proof.
Admin
In case you hadn't heard, roughly 90% of a programmer's time is spent doing maintenance. The difference between a childish hacker and a professional software engineer is that the latter will always write the cleanest, simplest, most straightforward code possible, knowing that it will cut down on future maintenance time. (Favorite graphical representation of this is here.)
If, for instance, you were to write thirty source files using your own special "enhanced" (read: optimized) string class, you'd just be making considerable extra work for your co-workers, who will have the hapless burden of having to read and understand your code at a later date, and the gain in performance would be measurable only in microseconds, if there is a gain at all.
I believe we all agree on this, yes.Optimization is something that can be done at any time without causing shear to the system. This, on the other hand, was a fundamental design catastrophe, and fixing it will take at least as long as it took to originally write the thing.
Admin
don't remind me of the evil DOS as/400 days again
Admin
yep for large set I would be working with paged subsets and round tripping a lot
Admin
and not to double post, but reflection can handle all those funny complex filters, but I sure don't want to be the one to write the reflection code
Admin
That should read -
IBM: What feature from Informix would you like us to wedge into DB2 for you?
Admin
Do please tell the folks at SAP that. At least some modules, AFAIK all modules, store all dates as character strings. All of them.
Admin
Isn't this "We'll optimize later" just the DBA version of "Demo Hell." For those unfamiliar Demo Hell is the condition caused by "Just make it look like it works" demo which later is mandated as the fremwork for the product.
The only virtue of writing code with punch cards was you always had to re-engineer. Sometimes it only seems like we have enabled ourselves to be dumber faster to more people than ever.
I guess this IS job security.
Admin
I blame advances in hardware. It is possible that Chris's PHB started out back in the day, (about 1975 or so), when computers were small and slow. For people who started out back then, today's computers have infinitely large memories, infinitely large disk drives, run infinitely fast, and cost nothing. Therefore, why bother optimizing at all?
There actually is a fairly common management belief, that it is easier and cheaper to fix slow software, with faster and larger hardware.
Admin
That's insane and probably illegal.
There's a cleanup here every X days. It doesn't get deleted, it gets moved to an "old" folder and left there.
All work email gets filed under the correct permanent folder on the server.
I did notice, however, that email in the "Done" directory (created by me) did not get moved.
You should BCC yourself a copy of Important Emails to your home address. You can bet your ass that the bosses will blame you for anything illegal (including the removal of the emails), and if you can provide emails when they can't, then you get out of jail free.
For bonus points, report them and see if you get on Time, or get a cheque or something.
Admin
Or were you trying to refer to somebody elses post?
Admin
Oh! I know! How about the one that caused the Web server (which ran the web app and the Informix db) at my last job to go into a tail spin when you hit a query that used an index/foreign key that was out of sync, resulting in query costs of 2 billion? Oh, and did I mention all it took was the user hitting refresh on a query that wasn't returning to completely tie up both CPUs on the machine? Awesome . . .
Admin
Just looked at that... someone with too much time should update the article since it contradicts itself:
"Other names are [..] bozo sort" <-> "Bozo sort is another sorting algorithm" "It terminates for the same reason that the infinite monkey theorem holds" <-> "it may never terminate"
Admin
MAN O MAN thats a lot of Worse Than Failures...
Admin
Yeah, stupid users not having memorized every single product # in their system. How on earth do you put up with that?
Admin
Yeah, stupid users asking for everything with no concept of how big that is.
Admin
But it's also a bug. Sure, it's a non-fatal bug, and sure, removing it causes the program to run more efficiently, but using VARCHAR(100) is, actually, the WTF here, and rightly so. ;)
I'd say that "proper database schema" belongs with the "design" phase, not the "implementation" (or even optimization) phase. Of course, this project sounds like there was only a single phase - "developmestuction".
Admin
I'm guessing he had one of those jobs where you get the responsibility but not the actually umm responsibility that comes with it, only the blame part when it blows up.
Admin
Aobe post was in "reply" to. sigh
Admin
I agree, perhaps there are some DBM's that will catch that and simply strip off the LIKE, but just to be safe i wouldn't let that happen in my code. The fact that it has 2 milion products is cute, but i'm going to take a wild guess that such a field is going to be referenced a lot and thus will be indexed quite well. So 2 milion should be ok. (unless your using mysql, since i've read it doesn't handle tables with more then a milion rows very well. But that's form the top of my head, so don't hold it against me)
Then there's the datatransfer/databuildup, which can be sped up by using a LIMIT, to ummm. limit the amount of rows and thus data. Not always offcourse, but that's why god invented 'There where to many items, please limit your search' errors.
Also concering the discussion about using SQL or program logic to solve the sorting and grouping of data. When needed i mostly make a hybrid. Everything that can be done i'll do in the application, if it can't be done because the main output needs to be altered then i'll hand it to the database.
It's not really that much more code. (depending on the feature list) But will help in keeping it fast yet easy to implement fancy stuff. And besides, the extra work when abstracted right is only once or twice, because you can basically use it for all the following projects.
Admin
Why couldn't Chris just run an ALTER TABLE and change the column types to something more sensible? Column types are an optimization, and you should be able to change them later, and it should make no difference to your software. If Chris couldn't even change the VARCHAR(100) to VARCHAR(8) then perhaps Chris or SQL server is the real WTF?
Admin
Some excellent comments here on "optimise later". I've always phrased it as "you have to design for performance from day 1" rather than "optimise from day 1" to avoid misconceptions.
The bit that really makes me sad (or mad, I'm not sure which) is "Despite his best efforts, Chris took the blame for not tuning it correctly.". It seems like the DBAs get the blame for all of these situations. Hope all you DBAs are storing all your CYA documentation somewhere safe...
B
Admin
There are a few comments that need to be made here...
First of all, in the context of databases "premature optimization" typically means de-normalizing. When building an OLTP system, one should never, ever, under any circumstances build a system that is not 3NF until after the system is built and benchmarked. Even then, it is a bad idea. Unless one is very experienced in successfully building OLAP systems, they should be normalized as well in their initial iteration.
Second, this tweaks one of my pet peeves: fundamental failure to understand the strengths and weaknesses of a particular platform. The Sybase lineage platforms are historically extremely bad at handling VARCHAR types. This system may have run acceptably, not that I would recommend this, on Oracle since Oracle has long handled VARCHARs as a "first class" data type. Sybase lineage systems until relatively recently stored VARCHARs on different disk pages than the rest of the row. Aside from their poor locking schemes, these systems need multiple disk seeks to retrieve VARCHAR data types while Oracle stores then in the same page and so can retrieve them in one seek.
While their is no excuse for not using date and numeric fields appropriately, using overly large text fields is not a big deal provided that your platform can deal with them efficiently. I'm not up on current Sybase/SQL Server products, but for older products one should stay away from them as much as possible, especially for fields that might be searched. Maximum use of CHAR data types will do wonders for those platforms.
Admin
Admin
Admin
Admin
That's why you BCC: to your other (Hotmail, Gmail, ??) e-mail account...
Admin
Funny I had the same problem at my old company. I was the developer/designer of a Palm OS Application with an DB2 database. Some day ony of my bosses came and asked me if I could change all the fields to VARCHAR, because he mostly didn't understand why all those other types where neccessary. I told him we would get serious performancy issues if we followed his idea and I would take no responsibility for this. I think I don't have to mention that performance is a real problem on a m515 Palm.
Admin
In this case, the reason is a custom report-building system integrated into the rest of it. As we don't know at design-time what the users will ask for, we allow them to search on any field with any parameters.
It's not a case of users needing to memorise 2 million product numbers - they don't, because the system can handle that for them, providing they don't break the functionality by making stupid reports (and besides, the product numbers they are using are required by company policy to be written down and kept on paper). In this particular report the user had specified a product number correctly, for their first search term. Unfortunately in the other 5 possible parameters (which we allow them to either AND or OR), they had ProductNum LIKE %%.
It's not my decision to keep it like this, I was brought on-board to do maintenance and develop a few new sections, not entirely rewrite major parts of the database schema :D
Admin
My brain was ready to crash, and then i realised '3x36G 15K SCSI RAID5' was not l33t sp34k...
Admin
they should've bought a new server. our programming "guru" gets a new server when his programs run slow...
this article reminds me of his database design skills.
he even has in-table tables large blob fields with strange data stuffed in them.
this works like an own table. he has a field "invoicedata" and a function get_value()
when he calls get_value(invoices,invoicedata,customer)
the function returns the value for "customer" coded in the blob field "invoicedata" in table "invoices"...
he even uses this function in selects.
edit: i'm glad, he isn't allowed to touch our iSeries systems. he lives in his own world of windows, delphi and interbase
Admin
echo "Chris new that VARCHAR(100) was here to stay" | sed s/new/knew/
Admin
In my opinion it is the technical lead's responsibility to advice management management on technology decicions. If the CEO is talking rubbish and suggesting solutions that are not feasible from technical experts viewpoint, then the technical lead should put his/her foot down and stand his/hers ground, instead of sheepishly crawling back to his cubicle on the first counter argument.
The real WTF here is not that CEO made stupid decicion to continue with the current development model, but that a person who cannot communicate a clear case of "not going to work" to management had been appointed to lead technical role.
Admin
it's no optimization matter, it's a conception thing..
Admin
My post and the one it replied to reference the generic/general misuse of the "premature optimization" idea, not necessarily the exact code in the original post.
If anything, it is more related to the book mentioned and not the actual code.
Admin
No one's even mentioned running a DB server on RAID5 yet :-)
Admin
Pffft. Everybody knows all you would ever need is one table with two columns:
CREATE TABLE THE_DATABASE ( Ident BIGINT, Stuff XML )
Admin
Admin
Admin
I agree. As tech lead it was Chris's job to drop the hammer on this kind of sh*t.
He should have been able to explain that what would take a week(?) now would take months(?) in the near future.
If it was a public company then his responsibility would be to go to the board, etc. If it was a private company the the CEO owned it, he should gracefully get the hell out of there while he could get references.
If a CEO doesn't trust the tech lead in something as simple as this, in a tech company, then why the hell would you want to work there?
violentEd
Admin
Start from something halfway decent and optimising is really tweaks round the edges; start from something like this example and it's basically a rewrite.
Admin
Do you have a particular example of this not working? I'm sure they'd be happy to hear about it.
Admin
OMG, that sounds just like my old company. EVERYTHING was an nvarchar(50). There was even a guy named Chris. Chris? Is that you? Come be unemployed with me! It's better than working there . . .
Admin
Is that db in zeroth normal form, or negative normal form, or abnormal form?
I leave it as an excercise to the reader to make the funniest case for one of the above...
Admin
If your title/role isn't "analyst/programmer/developer", then there are plenty of programmers who will assume you must be a glorified system operator (someone who unjams printers, reboots machines, and resets passwords).
It's a mindset that stems from the ancient seniority structure: operator->programmer->designer->analyst
What's missing here? The people with most high-end skills were not even on staff - they were contracted from the hardware vendor. Nowdays they are on staff, and have to deal with much more complex systems, but are still seen as the bottom of the chain.
Sadly, some network admins / database admins etc deserve to be treated this way, and unfortunately its the kind of role that incompetent people get shovelled into. This site has a few examples.
Admin
Admin
humm yeah, i wonder why would one use the already hashed/indexed/cached sorting indexes of the database when you can just try to it by hand for 4 million records ... no .. really why would you ? :p
if databases would suck at sorting/caching/indexing, why would we use them in the first place ? we could just mount a nfs driver amongts our clustered machine and ignorantly live on the back of a textfile :p
speaking of which ...