• (cs)

    Stupid Pratik .. you start partying once the customer is happy .. not when you think that you are done.

  • faoileag (unregistered) in reply to OzPeter
    OzPeter:
    Stupid Pratik .. you start partying once the customer is happy .. not when you think that you are done.
    Exactly. Especially if they have your phone number ;-)
  • KSG (unregistered)

    A system that has to do queries across 200 million rows, and all the testing only bothered with 200. WTF! Very lucky that this was solvable at all, let alone without time being spent adding indexes.

  • Geoff (unregistered) in reply to OzPeter
    OzPeter:
    Stupid Pratik .. you start partying once the customer is happy .. not when you think that you are done.

    Nice thought, and most of us would love to do it that way, but most of us have management that is far too wrapped around the 8-to-5, MTWThF axle to make that work.

    Chances are good Pratik is not getting any time off during the week to let loose, say, on Wednesday after the go-live has been solid for a couple of days, unless he takes vacation. So what is he to do, sit quietly by the phone all weekend just in case he gets a call?

    I sympathize because I have been there. If you are not getting flex-time you just can't let your job trample your own time, because you end up with no time that is really yours.

  • csrster (unregistered)

    The real WTF is putting anything into production on a Friday.

  • Geoff (unregistered) in reply to KSG

    Yes, that really is amazing. Not totally hard to understand, though. It always amazes me how many shops simply don't have a representative test environment. They will stand up shelf after shelf of fast SAS disks for a production database server, and tens of front-end web servers, and then insist you do all your testing on some old PC with 2 gigs of memory and one IDE hard disk whose total capacity isn't even large enough to hold the complete production database, let alone free space and log space.

    They then get all pissy that your testing was not adequate to discover certain performance and scaling issues before going to production.

  • Foo Bar (unregistered) in reply to csrster

    Truth. Never go full retard, and never go live with new code on Friday. That's TRWTF here.

    The secondary WTF is "The test DB had 200 records. The pre production db has 200 million records."

  • (cs) in reply to csrster
    csrster:
    The real WTF is putting anything into production on a Friday.

    Indeed, it is policy at my place to never deploy anything large on a Friday. We even avoid Thursday if we can help it.

  • Peter (unregistered)

    I stopped reading when Pratik started to celebrate before the post-launch post-mortem. Pratik, what r you doin? Pratik, stahp.

  • Anonymous Paranoiac (unregistered)

    Yeah, I always insist that the test data set is at most five orders of magnitude smaller than the production data; six orders is just too much...

    I have to admit I'm curious - how exactly did they 'stress' test with 200 records?

    CAPTCHA: sino - sino weevil hairno weevil

  • n/a (unregistered)

    And that's why I went on a week-long vacation just before our project went live. After half a year of chasing deadline after deadline, it was time to say "Screw you guys, I'm going home; the rest of you will manage, I'm sure."

  • faoileag (unregistered)

    But apart from going on a binge before you absolutely know that the system is running smoothly under full load, there are other WTFs as well:

    1. If production has 200 million records, testing on 200 is not testing. It's cheating. For god's sake, chances are that those measly 200 records do not even represent real data, but something thrown together in a hurry.
    2. Starting 500 threads in parallel "so that as much information is processed in parallel as possible". It will not be processed in parallel because the end-user application will most probably not run on a machine sporting 500 cores. Just throw the word "cost of task-switching" at those who decided on the 500 threads and, if you can see the big question mark in their face, a book on parallel computing. (See the sketch below.)
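
    A minimal sketch of the saner alternative, assuming the job runs on the JVM (the article doesn't actually say); the batch count and the process() helper are made up for illustration:

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class WorkerPool {
        public static void main(String[] args) {
            // For CPU-bound work, a pool roughly the size of the machine
            // beats 500 threads fighting over a handful of cores.
            int cores = Runtime.getRuntime().availableProcessors();
            ExecutorService pool = Executors.newFixedThreadPool(cores);

            for (int batch = 0; batch < 200; batch++) {
                final int b = batch;
                pool.submit(() -> process(b)); // extra work waits in the queue
            }
            pool.shutdown();
        }

        static void process(int batch) {
            // hypothetical placeholder for the per-batch work
        }
    }
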
  • Mike (unregistered) in reply to Anonymous Paranoiac
    Anonymous Paranoiac:
    I have to admit I'm curious - how exactly did they 'stress' test with 200 records?

    Clearly they didn't, which raises the next question: just how good were any of the tests?

  • radarbob (unregistered) in reply to Geoff
    Geoff:
    . . . It always amazing me how many shops simply don't have a representative test environment. . . .

    Ditto here. _sigh_ It will be years, if ever, before we reverse all the junk that tested great with a data set a sliver of a fraction of reality. In one success, though, a certain process was taking anywhere from 3 hours to forever; we threw away all that krap kode and now it takes seconds.

  • Billy Bob (unregistered)

    The WTF is not using connection pooling. That's basic stuff for enterprise systems like this. Not using realistic DB sizes/loads in the test environment is just the icing on the cake.
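
    For what it's worth, a minimal sketch of that "basic stuff", assuming a Java app and the HikariCP pool library (neither is named in the article; the JDBC URL, credentials, and the records table are made up):

    import com.zaxxer.hikari.HikariConfig;
    import com.zaxxer.hikari.HikariDataSource;
    import java.sql.Connection;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class PooledLookup {
        public static void main(String[] args) throws Exception {
            HikariConfig config = new HikariConfig();
            config.setJdbcUrl("jdbc:postgresql://db-host/app"); // hypothetical URL
            config.setUsername("app_user");
            config.setPassword("changeit");
            config.setMaximumPoolSize(20); // bounded, reused connections

            try (HikariDataSource ds = new HikariDataSource(config);
                 Connection conn = ds.getConnection();
                 Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery("SELECT COUNT(*) FROM records")) {
                rs.next();
                System.out.println(rs.getLong(1)); // connection returns to the pool on close()
            }
        }
    }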

  • Ana (unregistered)

    Mandatory related comic: [image]

  • AL (unregistered) in reply to Zemm
    Zemm:
    Indeed, it is policy at my place to never deploy anything large on a Friday. We even avoid Thursday if we can help it.

    This doesn't always make sense. If your production system is important enough to end users that it must be available full-time Monday through Friday, promoting to production on Friday afternoon is the most sensible time, as it gives you the most time to solve issues that come up in production.

    Granted, it sucks to be the one on tap for promotion.

  • Andrew (unregistered) in reply to faoileag
    faoileag:
    2) Starting 500 Threads in parallel "so that as much information is processed in parallel as possible". It will not be processed in parallel because the end-user application will most probably not run on a machine sporting 500 cores.
    Given the life cycles of enterprise apps I've worked on, it eventually will be run on a 500 core machine. Maybe not for 20+ years, but it will.
  • CleanCode (unregistered) in reply to csrster

    No, it was PRE-production. Production was Monday.

  • faoileag (unregistered) in reply to Andrew
    Andrew:
    faoileag:
    It will not be processed in parallel because the end-user application will not run on a machine sporting 500 cores.
    Given the life cycles of enterprise apps I've worked on, it eventually will be run on a 500 core machine. Maybe not for 20+ years, but it will.
    Like! :-)
  • (cs) in reply to Anonymous Paranoiac
    Anonymous Paranoiac:
    ... I have to admit I'm curious - how exactly did they 'stress' test with 200 records? ...

    Someone played the role of drill instructor and shouted "Faster! Faster!" at the test team while a comically over-sized digital clock on the wall counted down with hundredths-of-a-second precision. That was plenty stressful. Adding more database records on top of that would have just been cruel.

  • (cs)

    Drinking tequila in an MMORPG is not considered "partying", you nerd.

  • Anonymous Paranoiac (unregistered) in reply to Mike
    Mike:
    Anonymous Paranoiac:
    I have to admit I'm curious - how exactly did they 'stress' test with 200 records?

    Clearly they didn't, which raises the next question: just how good were any of the tests?

    My guess is they were all equally "valid".

    This also reminds me of the php/mysql listserver application we use at work. It's a disturbingly popular open-source application that was written by a guy who seems to think that about two hundred emails is a 'large list', and the code base is absolutely stuffed with WTFs. Things like doing ereg_replaces (yes, ereg*, not preg*) on constant values when str_replace or str_ireplace are more than sufficient, and doing them on values that will never be needed. Also things like if and while blocks that run several hundred lines, dead code wrapped in if (0) tests, and testing to see if a value is numeric like this:

    if ($val == sprintf('%d', $val))
    

    Oh, and injection attacks galore. It would crash repeatedly on email lists larger than 10-20k. A co-worker and I have at least refactored, if not almost completely rewritten, every major component of the application, yielding orders-of-magnitude performance increases (and better security).

  • Al (unregistered)

    Wow

    200 records probably allows the DB to load the whole table into memory, resulting in fast query response times. With 200 million, disk I/O becomes a significant factor.

    Why would anyone think 500 threads was OK?

    Next you will tell me that the server belonged to Mitch

  • Brompot (unregistered) in reply to faoileag
    faoileag:
    2) Starting 500 Threads in parallel "so that as much information is processed in parallel as possible". It will not be processed in parallel because the end-user application will most probably not run on a machine sporting 500 cores. Just throw the word "cost of task-switching" at those who decided on the 500 threads and, if you can see the big question mark in their face, a book on parallel computing.

    That really depends on the application at hand. Although 500 is a bit much, not all threads will be running all the time. They block on resources that don't respond immediately, like networks and disks. I worked on a multithreaded application that needed 30-50 threads (depending on the actual data put in) to keep 8 cores fully occupied.
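
    Putting a rough number on that intuition (this is just the textbook rule of thumb, threads ≈ cores × (1 + wait time / compute time), not anything from the article; the timings below are invented):

    public class PoolSizing {
        // Classic sizing rule of thumb: threads ≈ cores * (1 + wait / compute).
        static int estimatePoolSize(int cores, double waitMillis, double computeMillis) {
            return (int) Math.ceil(cores * (1 + waitMillis / computeMillis));
        }

        public static void main(String[] args) {
            // 8 cores, tasks blocked roughly 4x as long as they compute
            // -> 8 * (1 + 4) = 40 threads, squarely in the 30-50 range above.
            System.out.println(estimatePoolSize(8, 400, 100));
        }
    }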

    captcha nulla - nulla pointer exception

  • Your Name (unregistered)

    Wait, why is "number of threads" a user-configurable parameter? Have the software figure out an optimal (or at least a workable) number of threads based on the circumstances, so the user doesn't just go "ooh, I'm gonna put in a big numbar so it gose fastar!"

  • faoileag (unregistered) in reply to Brompot
    Brompot:
    faoileag:
    500 Threads ... will not be processed in parallel because the end-user application will most probably not run on a machine sporting 500 cores.

    That really depends on the application at hand. Although 500 is a bit much, not all threads will be running all the time. They block on resources that don't respond immediately, like networks and disks. I worked on a multithreaded application that needed 30-50 threads (depending on the actual data put in) to keep 8 cores fully occupied.

    Correct me if I'm wrong, but even a blocked thread will be switched to, if only to see that the resource is still not available. I acknowledge that the number of threads needed to get 100% CPU load will be larger than the number of cores, but not by two orders of magnitude. Back in the good ole days (tm) of the 8086 machine-coding lab in college we were forced to count the cycles per instruction to determine the time a loop would take to complete, and basically something like this would apply here as well: determine the amount of time task-switching takes, find a good ratio of switching time to running time per thread, throw in the number of cores, and you should get a reasonable and, most of all, researched number of threads for an end-user app. Something that should actually have been demanded of Pratik before he "bowed out and went back to sleep" :-)

  • (cs) in reply to Your Name
    Your Name:
    Wait, why is "number of threads" a user-configurable parameter? Have the software figure out an optimal (or at least a workable) number of threads based on the circumstances, so the user doesn't just go "ooh, I'm gonna put in a big numbar so it gose fastar!"
    Because that's a difficult number to get right in general. Really. (It depends on what else is happening on the machine in question, which means that the app really doesn't know enough to figure it out for itself.) Enterprise apps, especially those that aren't on desktops, leave this as a tunable parameter for good reason.

    But then some idiots turn the value up way too high. Ho hum.

  • (cs) in reply to faoileag
    faoileag:
    Correct me if I'm wrong, but even a blocked thread will be switched to, if only to see that the resource is still not available.
    Normally, no, you're wrong. The OS knows that the thread is blocked in a system call (e.g., read from a socket) and won't give any time back to it until there is actually something to do. All the thread does in the meantime is occupy memory (while possibly getting paged out).

    TRWTF is using threads to handle IO-bound problems, but many programmers have been making that mistake for a long time.
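
    For the record, the non-thread way of handling an IO-bound fan-out might look something like this, assuming Java 11+ and its built-in HttpClient (the URLs are made up, and the article's job was database work rather than HTTP, so this only shows the shape of the idea):

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.util.List;
    import java.util.concurrent.CompletableFuture;
    import java.util.stream.Collectors;

    public class AsyncFanOut {
        public static void main(String[] args) {
            HttpClient client = HttpClient.newHttpClient();
            List<URI> uris = List.of(
                    URI.create("https://example.com/batch/1"),
                    URI.create("https://example.com/batch/2"));

            // Hundreds of requests can be in flight without hundreds of blocked threads.
            List<CompletableFuture<String>> pending = uris.stream()
                    .map(uri -> client.sendAsync(
                                    HttpRequest.newBuilder(uri).build(),
                                    HttpResponse.BodyHandlers.ofString())
                            .thenApply(HttpResponse::body))
                    .collect(Collectors.toList());

            pending.forEach(f -> System.out.println(f.join().length()));
        }
    }
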

  • faoileag (unregistered) in reply to Your Name
    Your Name:
    Wait, why is "number of threads" a user-configurable parameter?
    So that, if you and your colleagues all compile on the same development server, you can get an advantage over them by knowing about that parameter :-)

    Or as the saying goes: the first to run "make" gets one core, the first to run "make -j 10" gets all the cores :-)

  • faoileag (unregistered) in reply to dkf
    dkf:
    faoileag:
    Correct me if I'm wrong, but even a blocked thread will be switched to, if only to see that the resource is still not available.
    Normally, no, you're wrong. The OS knows that the thread is blocked in a system call (e.g., read from a socket) and won't give any time back to it until there is actually something to do.
    Thanks, didn't know that. So the (sensible) number of threads does indeed solely depend on resource availability, meaning you have to make some clever guesses about how long your average thread will be blocking :-)
  • Mark (unregistered)

    Some people always ask "what's the right setting for this config?"

    If there were one right setting, it wouldn't be a config, now would it?

    Similarly, if code could arrive at the right value every time, you wouldn't need a config.

    This is closely related to:

    Luke: How do I do this?

    Matthew: 1. Understand the tool. 2. Understand the problem. 3. Figure it out.

    Luke: That sounds too much like thinking. Just give me the answer.

    Matthew: Nobody gave me the answer.

  • (cs) in reply to OzPeter
    OzPeter:
    Stupid Pratik .. you start partying once the customer is happy .. not when you think that you are done.

    Agree completely. One of our in-house memos states this: do not drink and party immediately after release.

  • RASG (unregistered) in reply to Zemm

    Well, since I work at a bank, Friday is the only day we CAN do this.

    I don't know how it works in your country, but here (Brazil) banks must pay a fine for each minute off-line.

    [http://www.bcb.gov.br/?spb]

    Can you imagine millions of clients unable to do financial transactions on a business day?

  • MadtM (unregistered) in reply to csrster
    csrster:
    The real WTF is putting anything into production on a Friday.

    Just don't pay the employees on a Friday. Then no party.

  • Anonymous Coward (unregistered) in reply to Your Name
    Your Name:
    Wait, why is "number of threads" a user-configurable parameter? Have the software figure out an optimal (or at least a workable) number of threads based on the circumstances, so the user doesn't just go "ooh, I'm gonna put in a big numbar so it gose fastar!"
    It's perfectly rational when your "user" in this case is an architect, DBA, or other person who gets paid to figure this out.

    Or that was the theoretical justification at my last job. In practice it was a management mandate that everything uses 7 threads.

    (Also our "multithreading" wasn't actual multithreading and was in fact retarded, but that's a different topic.)

  • Steve (unregistered) in reply to AL
    AL:
    This doesn't always make sense. If your production system is important enough to end users that it must be available full-time Monday through Friday, promoting to production on Friday afternoon is the most sensible time, as it gives you the most time to solve issues that come up in production.

    There is no ideal time.

    If you promote on Friday, the reality is that production issues don't show up until Monday morning, because no matter how much you've tested, there's always that guy who does something you really didn't expect. Well, you've lost the weekend as an opportunity to fix things without impacting users.

    So instead you're into Monday, and you may lose Tuesday as well... Twice I observed deployments which took the Oracle database offline for a day... something about a cascading trigger storm consuming all resources. So it took two days to roll it all back.

    Now imagine you work for a company that has to be available 24x7... and think about all the work that needs to go into making everything resilient.

  • mag (unregistered)

    I see this has become ::puts glasses on:: a THREAD war!

    YEAHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH

  • mag (unregistered) in reply to mag

    I think that would've been a cooler ending

  • (cs)

    If your threads are doing number-crunching, then one thread per CPU core is about the right order of magnitude. If your threads are going to spend their time waiting for I/O then one thread per I/O channel is the right order of magnitude. In this case, one thread per database connection sounds about right.
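
    One way to make that "one thread per database connection" ceiling explicit, sketched in Java with a made-up pool size (nothing from the article):

    import java.util.concurrent.Semaphore;

    public class BoundedQueries {
        // One permit per database connection: callers beyond that wait here
        // instead of piling blocked queries onto the server.
        private static final int DB_CONNECTIONS = 20; // hypothetical pool size
        private static final Semaphore permits = new Semaphore(DB_CONNECTIONS);

        static void runQuery(Runnable query) throws InterruptedException {
            permits.acquire();
            try {
                query.run();
            } finally {
                permits.release();
            }
        }
    }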

  • Valued Service (unregistered) in reply to faoileag
    faoileag:
    2) Starting 500 Threads in parallel "so that as much information is processed in parallel as possible". It will not be processed in parallel because the end-user application will most probably not run on a machine sporting 500 cores. Just throw the word "cost of task-switching" at those who decided on the 500 threads and, if you can see the big question mark in their face, a book on parallel computing.

    Not true at all. You're assuming all processing is done locally on those 500 threads.

    Let's say you have 500 external resources that each take 10 minutes to complete an operation. Starting up 500 threads makes sense, as you'll have all those threads waiting. Of course the alternative is asynchronous communication, but what if you're working with inputs that can't be asynchronous?

    500 can't possibly be reasonable though, but I can see having more threads than cores and still getting benefit out of it.

  • Sly (unregistered)

    For me the WTF is calling someone on a Sunday morning for a pre-production bug.

    Those can usually wait for Monday. And if the window for pre-production was only the weekend, then that is the WTF: a pre-production cycle takes way longer than 2 days!

  • Valued Service (unregistered) in reply to Nagesh
    Nagesh:
    OzPeter:
    Stupid Pratik .. you start partying once the customer is happy .. not when you think that you are done.

    Agree completely. One of our in-house memos states this: do not drink and party immediately after release.

    To which I replied: pay me overtime.

    Business should learn to stop butting into my personal life unless they at least acknowledge that they are doing so.

    If the owner thinks that it's ok because he's 24/7, then maybe he should consider handing out part-ownership.

    You can't expect to take up my entire life without offering anything more than a salary. At least a 24/7 daycare!!!

  • (cs) in reply to Steve
    Steve:
    If you promote on Friday, the reality is that production issues don't show up until Monday morning [...] Now imagine you work for a company that has to be available 24x7
    If your company has to be available 24/7, why would the bug not be found till Monday? After all, 24/7 means there are people using it on Saturday and Sunday too :)

    When we deploy updates, it's usually on Friday as well. Surprisingly, we get the fewest users on Saturday, and most of our customers don't open on Monday until the afternoon, which means we have two days to fix problems (or simply roll back the update and try again next week, if a problem is too big - our systems can generally do a rollback in minutes if need be).

    Anything overlooked during the weekend is generally found on Monday morning before most of our customers are online, so over the past few years we've never had any serious problems due to updates. Of course, that often means working weekends, but since my hours are flexible anyway that's not really an issue for me.

  • (cs) in reply to faoileag
    faoileag:
    Back in the good ole days (tm) of the 8086 machine-coding lab in college we were forced to count the cycles per instruction to determine the time a loop would take to complete, and basically something like this would apply here as well:
    8086 or 8088?

    It makes a difference. Apart from oddities like MUL and DIV, most instructions on an 8088 take longer to fetch from memory than they take to execute - a typical Reg->Reg arithmetic instruction will take two bytes = 2 memory cycles = 8 clocks to fetch, and three or four clocks to execute. While it is executing, the next instruction is being fetched, so an 8088's memory subsystem is active almost 100% of the time, except when some lunatic executes MUL or DIV, which are monstrously slow, and the execution subsystem normally doesn't manage more than 50% utilisation.

    The 8086 can load two bytes at a time, if they are at an even address and the next higher odd address, so the balance is less uneven than on an 8088, but it is easy to get an odd program counter, at which point you are back in the 8088 situation...

    Counting the strict execution clock cycles is a waste of time on these chips, as it is on anything after the 486, although for different reasons.

  • Franky (unregistered)
    The business users had kicked off what they thought would be a 15 minute job early Saturday afternoon. When they came back several hours later, all the applications were hung. The users couldn't access the database. After everything had been so thoroughly tested, how could such a catastrophic failure have happened?
    Well, _maybe_ the problem could be that they kicked off the update and no one even bothered to monitor it? ;)
  • Paul Neumann (unregistered) in reply to Anonymous Paranoiac
    Anonymous Paranoiac:
    My guess is they were all equally "valid".

    This also reminds me of the php/mysql listserver application we use at work. It's a disturbingly popular open-source application that was written by a guy who seems to think that about two hundred emails is a 'large list', and the code base is absolutely stuffed with WTFs. Things like doing ereg_replaces (yes, ereg*, not preg*) on constant values when str_replace or str_ireplace are more than sufficient, and doing them on values that will never be needed. Also things like if and while blocks that run several hundred lines, dead code wrapped in if (0) tests, and testing to see if a value is numeric like this:

    if ($val == sprintf('%d', $val))
    Oh, and injection attacks galore. It would crash repeatedly on email lists larger than 10-20k. A co-worker and I have at least refactored, if not almost completely rewritten, every major component of the application, yielding orders-of-magnitude performance increases (and better security).
    And where are your commits to the original project of these improvements? If you care to take from F/OSS and care not to return on the time investment others have extended for you, FOAD.

    Seriously, if you can't be bothered to upstream improvements, just fucking die. Write your own shit, from scratch, in ASM. If your project or employer won't allow for upstream patching, go with a commercial solution. You don't get to complain that you refactored, if not almost completely rewrote, something you got for FREE.

  • Mike Dimmick (unregistered) in reply to faoileag
    faoileag:
    dkf:
    faoileag:
    Correct me if I'm wrong, but even a blocked thread will be switched to, if only to see that the resource is still not available.
    Normally, no, you're wrong. The OS knows that the thread is blocked in a system call (e.g., read from a socket) and won't give any time back to it until there is actually something to do.
    Thanks, didn't know that. So the (sensible) number of threads does indeed solely depend on resource availability, meaning you have to make some clever guesses about how long your average thread will be blocking :-)

    Windows has a kernel object called an I/O Completion Port, to help limit context switching overhead. You tell it how many concurrent threads you want it to run. When threads ask the completion port for work to do (GetQueuedCompletionStatus), they block if the number of runnable threads associated with the port is already greater than the concurrency limit you specified. A thread is released if there is work queued up, and a thread associated with the port blocks. It means that the number of threads running is usually slightly greater than the concurrency limit, but not much so. It is up to you to spin up enough worker threads to handle the expected workload, though.

    To queue work you either call PostQueuedCompletionStatus, or you can associate other handles with the completion port: when an asynchronous I/O completes on a handle associated with the port, that queues the result on the port (and possibly wakes a thread if the conditions are right). The threads are woken in Last-In-First-Out order, to take advantage of any cache locality.

    Windows' other thread pooling infrastructure is based on completion ports, as is .NET's ThreadPool class.

    Threads still have some overhead - they have user-mode and kernel-mode stacks that must be maintained, so they consume quite a bit of address space. They can be swapped out, but that leads to pretty poor performance! So it's a good idea to limit it to a few tens of threads rather than hundreds. Depending on how many processor cores you have available, of course.

  • Paul Neumann (unregistered) in reply to Mark
    Mark:
    ... This is closely related to:

    Luke: How do I do this? Matthew: 1. Understand the tool. 2. Understand the problem. 3. Figure it out. Luke: That sounds too much like thinking. Just give me the answer. Matthew: Nobody gave me the answer.

    Jesus Christ! Now we are quoting the bible here?

  • Will A (unregistered)

    Adding more threads doesn't necessarily == "parallel".
