• Nigel Tufnel (unregistered) in reply to BradC
    BradC:
    Ok, the DBA was a moran. But that doesn't mean that ALL DBAs who claim to know something about physical partitioning are idiots.

    Even with a SAN, the physical configuration is certainly relevant to throughput.

    RAID-10 on dedicated LUNs = happy users, happy DBAs
    RAID-5 on shared LUNs = unhappy users, unhappy DBAs

    We get even better performance. Our system is a custom-built RAID-11. You see, it goes to 11. Most systems just go to 10. Ours is one better, RAID-11.

  • Noah (unregistered)

    This may have simply been a very old DBA.

    I have a "database optimization" textbook from the 70s or early 80s. It goes into great detail on how to organize block access from disk. I'm sure this was important before caching, modern RAID design, modern file io, and modern RDMSes.

    It's funny, but a predictable result of old knowledge not translating well.

  • teapot7 (unregistered)

    Actually, the Morans are a well-known criminal family in Australia. As far as I'm aware, they're not known for their DB administration skills.

    But then, perhaps that was the previous poster's point.

  • (cs) in reply to md5sum
    md5sum:
    No, but when the software you design suffers in performance due to management being too cheap to buy the required hardware, then suddenly your value to the company becomes less because your software contribution is reduced to a sub-par level. Thus, you become expendable, since "a product can be bought off-the-shelf that will run as good as your software".
    If you're insisting on developing something in-house when it is bettered by an external product and isn't an organizational core competence, who exactly is wasting money? Better to get the external thing in to deliver the service and to use the cash to do something that's actually valuable. (If it's a core competence that's being obsoleted by the outside world, you're in trouble and nobody else is going to care...)
  • (cs) in reply to Todd C
    Todd C:
    True, a point on the outside of the disc travels at a greater *linear* speed than a point nearer the center, but *both* points travel at the same *angular* velocity, 7,200 or 10,000 RPM.
    This is why the disk manufacturers put more blocks per track on the outside than on the inside (a rough sketch of how much that buys is at the end of this comment).
    Todd C:
    Hey Mr. DBA, when you were 12 and listened to vinyl records, (remember those things with one loooooong spiral groove?) did you only listen to the first song because the others were considered of 'impure quality'?
    Because the recorded wavelength gets shorter toward the inner tracks, the songs on the inside sound noticeably worse, especially at the peaks of high-frequency content.
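
    A rough sketch of how much that zoning buys, with purely illustrative numbers (the radii, bit density and spindle speed below are assumptions, not any particular drive's specs): at constant linear density, sectors per track and sequential throughput both scale with the track's circumference, so the outermost zone ends up roughly twice as fast as the innermost one.

        import math

        # Illustrative assumptions for a 3.5" platter; not any specific drive.
        RPM = 7_200                 # spindle speed (constant angular velocity)
        INNER_RADIUS_M = 0.022      # innermost usable track, metres
        OUTER_RADIUS_M = 0.046      # outermost usable track, metres
        BITS_PER_METRE = 4e7        # assumed (roughly constant) linear bit density
        BYTES_PER_SECTOR = 512

        for zone, radius in (("inner", INNER_RADIUS_M), ("outer", OUTER_RADIUS_M)):
            track_bytes = 2 * math.pi * radius * BITS_PER_METRE / 8
            sectors_per_track = track_bytes / BYTES_PER_SECTOR
            throughput_mb_s = track_bytes * RPM / 60 / 1e6
            print(f"{zone}: ~{sectors_per_track:,.0f} sectors/track, "
                  f"~{throughput_mb_s:.0f} MB/s sequential")

    The exact figures don't matter; the ratio is set by the radii, which is roughly 2:1 on a 3.5" platter.
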
  • commenter (unregistered) in reply to Aaron
    Aaron:
    Uh... ok, the 10 partitions per drive bit was a little screwy, but there's an obvious throughput advantage to staggering frequently-used areas of the database across different physical disks. A single RAID 1 array doesn't cut it for a 200 gig DB.

    A SAN isn't some magic panacea that will fully-optimize the performance of any database. For very large databases you should still be spreading the data/indexes across different LUNs which correspond to entirely different disks.

    Where did the tech even find a 7200 RPM SCSI disk? Was this 10 years ago? Or was the server using SATA disks (the real WTF)?

    Clearly the above comment can be added to the 'feck off' list. I manage and use 500GB databases daily, and a well-chosen index and sensible query approaches mean we can keep a 100-user team happy day in, day out, and have done for the last 5 years. Maybe next year (if we start to see performance problems) we can hire a smart-arse to screw it up for us, or we can disseminate systems across an SOA.

    Too many times have I worked with arrogant pricks who (truly) think I am an idiot; THEY are the idiots (for example, one said both Java and C# were crap and PowerBuilder was the future... I mean, ffs... ARG!!! squared).

    If you want to disappear up your own crack I guess 'tuning' is the way forward. The rest of us understand weaknesses on a given platform and hence play to the platform's strengths.

    Now, feck off you know-it-all numb-nut useless gits, you couldn't program your way out of an Excel representation of a clock.

    GrrRrrrrrrrrrrRrrrrrrRRrrrrrrrrrrrrrr

  • Tom (unregistered) in reply to Nigel Tufnel
    Nigel Tufnel:

    We get even better performance. Our system is a custom-built RAID-11. You see, it goes to 11. Most systems just go to 10. Ours is one better, RAID-11.

    Why don't you just make the RAID-10 louder?

  • (cs)

    With almost 2 million profiles, ____ S e e k I n t e r r a c i a l [DOT] c o m ___, the Interracial dating site, is the best source of dating profiles for Interracial Singles. JOIn for free now!!

  • (cs) in reply to dkf

    [quote user="dkf"][quote user="BradC"]RAID-10 on dedicated LUNs = happy users, happy DBAs RAID-5 on shared LUNs = unhappy users, unhappy DBAs[/quote]I thought that DBAs were only happy when they made users and developers miserable?[/quote

    That's what we want you to think. Win!

  • (cs) in reply to Steve H
    Steve H:
    BradC:
    Ok, the DBA was a moran.

    Which Moran? Dylan? Caitlin? Surely not Kevin...

    Erin.

  • Spudd86 (unregistered) in reply to Matt
    Matt:
    The concept of the outer tracks of the disk giving better throughput than the inner tracks is perfectly valid - Pillar Data do exactly this on their storage hardware as part of the QoS system in the SAN. Even a quick and dirty test with a 7.2k RPM SATA drive shows >100 MB/s on the outside tracks and closer to 50 MB/s on the inner tracks.

    Of course, once you start reading and writing from all over the disk at random you'll see a penalty, but the idea of only using the outer tracks of 10k RPM disks and adding more spindles should result in some pretty astounding performance.

    The problem is that these days you don't have a clue which logical sectors correspond to the inside or outside tracks... most modern file systems no longer try to place data on specific tracks; they just try to keep things contiguous while minimizing seeks, and I imagine databases do the same.
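
    For what it's worth, that kind of quick-and-dirty test is easy to reproduce yourself. A minimal sketch, assuming Linux, root access, and a hypothetical whole-disk node /dev/sdX (it doesn't bypass the page cache, so treat the numbers as rough); on most drives the lowest LBAs sit on the outermost tracks:

        import os
        import time

        DEV = "/dev/sdX"              # hypothetical device name; change before use
        CHUNK = 8 * 1024 * 1024       # 8 MiB per read()
        TOTAL = 512 * 1024 * 1024     # read 512 MiB per region

        def region_mb_per_s(offset):
            """Sequential-read bandwidth starting at the given byte offset."""
            fd = os.open(DEV, os.O_RDONLY)
            try:
                os.lseek(fd, offset, os.SEEK_SET)
                start, remaining = time.perf_counter(), TOTAL
                while remaining > 0:
                    remaining -= len(os.read(fd, CHUNK))
                return TOTAL / (time.perf_counter() - start) / 1e6
            finally:
                os.close(fd)

        fd = os.open(DEV, os.O_RDONLY)
        disk_size = os.lseek(fd, 0, os.SEEK_END)   # size of the block device
        os.close(fd)

        print(f"low LBAs  (usually outermost): {region_mb_per_s(0):6.1f} MB/s")
        print(f"high LBAs (usually innermost): {region_mb_per_s(disk_size - TOTAL):6.1f} MB/s")

    Run it twice and the second pass will mostly hit the page cache, so only trust a cold first pass (or drop the caches in between).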

  • Spudd86 (unregistered) in reply to NewbiusMaximus
    NewbiusMaximus:
    Anon:
    So TRWTF is that Paul allowed this total nonsense article to go out in a trade magazine... This highlights the real lack of scholarship in computer science circles.
    Please don't conflate an unnamed DBA trade magazine with computer science scholarship. I know CS academics sometimes make bad programmers and sysadmins, but there's no way an article about this kind of bullshit would pass the laugh test at any reputable CS journal.

    Academics FREQUENTLY make terrible, awful, horrible programmers, because they don't program... it's not that they are incapable; they lack experience, which is what you REALLY need to write good code.

  • (cs) in reply to Just Kidding
    Just Kidding:
    I never had this problem with Access

    Y'all can quit commenting now. This guy (gal?) wins.

  • Spudd86 (unregistered) in reply to Anon
    Anon:
    NewbiusMaximus:
    Anon:
    So TRWTF is that Paul allowed this total nonsense article to go out in a trade magazine... This highlights the real lack of scholarship in computer science circles.
    Please don't conflate an unnamed DBA trade magazine with computer science scholarship. I know CS academics sometimes make bad programmers and sysadmins, but there's no way an article about this kind of bullshit would pass the laugh test at any reputable CS journal.

    Sorry, but I've seen stuff submitted to CS journals and been to CS conferences, and most of the bullshit I see there wouldn't pass the laugh test at even a low end hard science journal (physics, chemistry, biology, etc).

    Are you talking about computer science or software engineering (or whatever)? They are not the same thing... computer science is where you make up an algorithm and prove that it's correct, prove bounds on its runtime or space, or prove that it's optimal under some assumptions, and I mean prove, as in mathematical truth. That's Computer Science. (Sometimes they will also look at whether the big-O bound really buys you anything at real-life workload sizes; for example, a simple binary heap usually beats a binomial heap in practice, because you need a REALLY big heap before the asymptotic behavior dominates: a binary heap is just so simple and so friendly to hardware that the "past a certain point" and "within a constant factor" caveats become extremely relevant.)
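
    A minimal sketch of that binary-heap point, only as far as Python's heapq goes (this doesn't implement a binomial heap for comparison, and the sizes are arbitrary): even when the heap grows by a factor of 100, the measured per-operation cost grows only modestly, which is why the constant factor usually decides the race at real-life sizes.

        import heapq
        import random
        import time

        # Push everything onto a binary heap (heapq), then pop it all back off,
        # and report the average cost per push/pop at a few sizes.
        for n in (10_000, 100_000, 1_000_000):
            data = [random.random() for _ in range(n)]
            start = time.perf_counter()
            heap = []
            for x in data:
                heapq.heappush(heap, x)
            while heap:
                heapq.heappop(heap)
            elapsed = time.perf_counter() - start
            print(f"n={n:>9,}: ~{elapsed / (2 * n) * 1e9:4.0f} ns per operation")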

    There is also lots of CS research that has no real-world application because the assumptions the proof starts from do not hold in real life. For example, some atomic primitives exist in hardware and some don't; there are lots of academic articles about wait-free algorithms that depend on atomic primitives that don't exist in the real world, so the algorithm isn't actually wait-free in real life and is probably not actually useful.

    Software Engineering is an area where you can get away with snake oil and bull (but people who actually want to write good software and know what they are doing will hate you).

  • Darrel the Tigershark (unregistered) in reply to Kensey
    Kensey:
    I know no sysadmins who (have admitted to me that they) are using company assets for personal use under false pretenses.

    Of course not, when you put it like that.

  • Spudd86 (unregistered) in reply to Todd C
    Todd C:
    Am I the only one to question the 'Certifiable DBA's' claim that the "outer ring" was the only decent usable space on the drive because it is furthest distance from the center?

    True, a point on the outside of the disc travels at a greater linear speed than a point nearer the center, but both points travel at the same angular velocity, 7,200 or 10,000 RPM.

    Not only does this DBA know more (or so he thinks) than the sysadmin, he also knows more about platter hardware manufacturing than Maxtor and Seagate.

    Hey Mr. DBA, when you were 12 and listened to vinyl records, (remember those things with one loooooong spiral groove?) did you only listen to the first song because the others were considered of 'impure quality'?

    And linear velocity is what matters for throughput; however, your operating system probably doesn't actually know much about the drive geometry these days...

  • Spudd86 (unregistered) in reply to md5sum
    md5sum:
    EmperorOfCanada:
    If you need an onsite DBA then whoever set up the system sucks. A well-designed system should run until a severe hardware failure forces some admin to restore the latest backup. To me an onsite DBA would be like driving a car with a mechanic tinkering under the hood while you drive down the highway. Personally I have well-used Oracle databases so old they are running on 233 MHz machines (new when installed). A DBA was never involved with these. The machines are not remotely connected to the internet and are running in a very secure location, so even the fact that they have been running an average of 1,500 days without a reboot (or any more upgrades) is not an issue.
    I pity you and anyone who ever comes in contact with you.

    Why?

    They aren't broken, they perform adequately, and they are not at risk of break-in, so why fix what ain't broke?

  • Gary S (unregistered)

    I understand that you don't want to ridicule this DBA by name. But his article was published in a trade magazine. People may be reading it and trying to apply his foolishness. Maybe you owe it to your readers to identify this guy?

    "Names or it didn't happen."

  • jfm3 (unregistered)

    And then the single drive failed and the sysadmin was fired.

    The end.

  • Mike D. (unregistered) in reply to jfm3
    jfm3:
    And then the single drive failed and the sysadmin was fired.

    The end.

    Huh?

    While the Certified DBA was on vacation, Paul took a pair of the 10,000 RPM drives, mirrored them
    A drive fails and only the sysadmin notices, because it's mirrored. Heck, performance might even *improve* a bit, because the OS now has half the drives to manage. So he just has to be careful to keep the rebuild at a dull roar or schedule it for when the DB isn't heavily used.

    P.S. Alex, the captcha I got is easy to OCR. Nice contrast ratio and almost no distortion. Needs dirt, or at least some minimum distortion settings.

  • theme (unregistered)

    You know, this isn't common on embedded systems.

  • Anders (unregistered) in reply to BradC
    BradC:
    Ok, the DBA was a moran. But that doesn't mean that ALL DBAs who claim to know something about physical partitioning are idiots.

    Even with a SAN, the physical configuration is certainly relevant to throughput.

    RAID-10 on dedicated LUNs = happy users, happy DBAs
    RAID-5 on shared LUNs = unhappy users, unhappy DBAs

    Can't expect too much of a Maasai warrior, can you?

  • Adam (unregistered) in reply to BradC

    And RAID-5 on shared LUNs = happy financial officers

  • Garote (unregistered)

    Wow, that is f(#@)$ screwed up.

    You know how I increase MY database access time? I upgrade my server from 2GB of RAM to 32GB, build the app 64-bit, and let the row cache pull every damn table into RAM except for the largest two.

    And if that's not enough, you know what I do then? I RAID two freakin' 128GB SSD drives and serve the data off that.

    What with the cost of SSD drives, all the fancy partition schemes in the world can go right into the toilet, beginning THIS YEAR.

  • Jenkins (unregistered) in reply to BradC

    It's 'moron', not 'moran', you fucking retard.

  • Garote (unregistered) in reply to Jenkins

    Cut down on your coffee, man. You're missing the fact that it's an in-joke.

    (Sheesh, what a moran)

  • David Lewis (unregistered)

    I had the pleasure of working with a certified DBA, who only had experience with MSSQL. Upon encountering our PostgreSQL db, he was shocked to find that we built our queries by actually understanding serial query language, and not using a GUI. The lead programmer tasked him with writing a simple query; it took him a week and he did it wrong. The DBA in question still works for that company, although he does a simple data-capturing job now, for the same salary. Lesson learned: don't work for Americans, or NGOs.

  • Cujo (unregistered) in reply to Anonymous
    Anonymous:
    highphilosopher:
    Raid 1 is better than it used to be??? It hasn't changed. RAID 1 is mirrored.

    Maybe he was referring to RAID systems that in the past didn't allow parallel reads?

    Maybe you should read yours and some before you post.

    There's a subtle difference between RAID 1+0 and RAID 0+1 which has to do with cache. Years ago, the cache for 0+1 wasn't quite up to what RAID 1+0 has now. It's much better now. I didn't say I'd prefer 0+1 over 1+0. Go back about 15 years or so and check out how it worked (badly) with Oracle; 0+1 works much better in the case of disk failures than it did then. Resilvering is much less of a killer. HTH!
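
    For completeness, the other classic difference between the two layouts is what a second failure does, independent of cache or resilvering speed. A quick combinatorial sketch, assuming idealized arrays (n mirrored pairs striped together for 1+0, versus two n-disk stripes mirrored for 0+1) and a second failure striking a random surviving disk:

        from fractions import Fraction

        # One disk has already died. What is the chance that a second, randomly
        # chosen failure takes down the whole array?
        #   RAID 1+0: only the dead disk's mirror partner is fatal.
        #   RAID 0+1: the first failure already killed one whole stripe, so any
        #             disk on the surviving side is fatal.
        for n in (2, 3, 6, 12):
            survivors = 2 * n - 1
            raid10_kill = Fraction(1, survivors)
            raid01_kill = Fraction(n, survivors)
            print(f"{2 * n:2d} disks: second-failure kill chance  "
                  f"RAID 1+0 = {float(raid10_kill):.2f}   "
                  f"RAID 0+1 = {float(raid01_kill):.2f}")

    Which is the usual argument for preferring 1+0 when you get to choose.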

  • Anonymous (unregistered) in reply to David Lewis
    David Lewis:
    I had the pleasure of working with a certified DBA, who only had experience with MSSQL. Upon encountering our PostgreSQL db, he was shocked to find that we built our queries by actually understanding serial query language
    The shock may have been something to do with the fact that you don't even know what 'SQL' means.
  • Paul (unregistered)

    I'm not a "certified DBA" but in my experience:

    • use partitions for limiting space used by a specific thing, not for performance - they cause lots of unnecessary seeking. Also possibly slightly useful for limiting the effects of fragmentation.

    • use striping for performance, not concatenation

    • use mirroring (or RAID 5/6) for safety

    • use different physical drives for different things (if possible on separate controllers, but that isn't always financially feasible)

    So, given 6 drives I'd have done:

    • drives 1+2 - mirrored (RAID 1) for system & logs
    • drives 3+4/5+6 - striped & mirrored (RAID 1+0/10) for database (or maybe try 3+4 for database, 5+6 for logs)

    No special partitioning on drives 3/4/5/6. If 1+2 are used for system + logs, possibly have two partitions to separate those roles.

    Also, have lots of RAM (don't want any paging) and a nice big BBWC.

  • Martin (unregistered) in reply to Aaron
    Aaron:
    Was this 10 years ago? Or was the server using SATA disks (the real WTF)?

    Google has thousands of computers and petabytes of data, and all they use is desktop-grade hardware with SATA disks. No SAN, no SCSI, no relational databases.

    It really puts all the DB guru (and sysadmin guru!!) stuff in a different light...

  • Tim (unregistered)

    Ignoring for the moment that it's been 15 years since you could actually get a drive to write to a specific physical part of the hard drive, the fastest read time is not from the outer rings. The fastest read time will be from wherever the read head spends most of its time. Given that the drive controller tries to have the head scan the requested sectors from inside to outside and then back, the read head will spend about 50% more time in the center of the platter than towards either the inside or the outside. So if you could control where the physical data is located (which you can't) you would want it towards the center.

  • (cs) in reply to Jenkins
    Jenkins:
    It's 'moron', not 'moran', you fucking retard.

    No, it's moran, returd.

  • Scott M (unregistered) in reply to Nigel Tufnel
    Nigel Tufnel:
    BradC:
    Ok, the DBA was a moran. But that doesn't mean that ALL DBAs who claim to know something about physical partitioning are idiots.

    Even with a SAN, the physical configuration is certainly relevant to throughput.

    RAID-10 on dedicated LUNs = happy users, happy DBAs RAID-5 on shared LUNs = unhappy users, unhappy DBAs

    We get even better performance. Our system is a custom-built RAID-11. You see, it goes to 11. Most systems just go to 10. Ours is one better, RAID-11.

    So RAID-10 is twice as good as RAID-5 then?

  • TheAnonCoward (unregistered)

    Well, none of this would be an issue if they just used an embedded system without a file system...

    ...I'm just sayin'.

  • (cs) in reply to RadarBob
    RadarBob:
    The first moronity, foreshadowing all the subsequent moronity, was declaring that "the outer edge of the disk is fastest." Well, linear velocity, yes; but it's the same rotational velocity. I/O is the same rate regardless of which cylinder is being read.
    Are you a troll?

    If not, then you must have been out of touch with how hard drives are actually implemented. As in two decades or more out of touch.

    Modern hard drives maintain an almost constant linear recording density over the medium. Thus the data rate from the outermost cylinder will be ~2x higher than from the innermost cylinder. It would only be constant if the drive varied its speed of rotation with head position, and that would be a killer to implement, and quite pointless too! Exercise for the reader: assuming a 3.5" drive that does 10k RPM at the outermost track and is continuously seeking between the innermost and outermost tracks with a 10 ms seek time, calculate the power required to keep spinning the platter up and down as needed ;)
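
    (A rough pass at that exercise, with purely illustrative numbers; the platter mass, radii and seek cadence below are assumptions, not any real drive's specs:)

        import math

        # Rough sketch of the exercise above. All numbers are illustrative
        # assumptions, not real drive specs.
        RPM_OUTER = 10_000          # spindle speed with the head at the outer track
        R_OUTER_M = 0.046           # outermost usable radius
        R_INNER_M = 0.022           # innermost usable radius
        PLATTER_MASS_KG = 0.025     # one 3.5" platter, roughly
        SEEK_TIME_S = 0.010         # full-stroke seek every 10 ms

        # To keep linear velocity constant, the inner track must spin faster.
        omega_outer = RPM_OUTER * 2 * math.pi / 60
        omega_inner = omega_outer * R_OUTER_M / R_INNER_M

        # Kinetic energy of a spinning disk: E = 1/2 * I * w^2, with I = 1/2 * m * r^2.
        inertia = 0.5 * PLATTER_MASS_KG * R_OUTER_M ** 2
        delta_e = 0.5 * inertia * (omega_inner ** 2 - omega_outer ** 2)

        rpm_inner = omega_inner * 60 / (2 * math.pi)
        print(f"speed swing per seek : {RPM_OUTER:,.0f} -> {rpm_inner:,.0f} RPM")
        print(f"energy swing per seek: {delta_e:.0f} J")
        print(f"power at one seek per {SEEK_TIME_S * 1000:.0f} ms: {delta_e / SEEK_TIME_S / 1000:.1f} kW")

    Even with a single platter and no losses anywhere, that works out to several kilowatts of mechanical power just to chase the head around, which is the point.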

    Only the common optical drives (CD, DVD) maintain constant linear velocity of the medium (up to a point) and thus have a fixed data rate on the outer half of the drive or so. The innermost part of the spiral has such a short equivalent circumference that the medium would be mechanically overloaded if it was spun up fast enough to saturate the data bandwidth of the read channel. Thus CD/DVD drives relatively suck when accessing data close to the spindle -- they have to maintain linear velocity lower than on the outer tracks of the disc.

    Thus, when you are mastering a CD/DVD and the disc is not full, it actually helps to put your data into a subdirectory, add a padding file to the root directory, and make sure the image generator orders the root directory's data before all other files.

  • (cs) in reply to Slicerwizard
    Slicerwizard:
    Herby:
    This means that on the OUTER regions of the disk there is MORE data, and in the inner regions of the disk there is LESS data. Given this fact, the problem in an outer region of the disk is LATENCY (it takes longer to get to a particular chunk of data)
    Right, because unlike the inner tracks, a given sector is somehow going to be, on average, more than half a revolution away? WTF?
    On the inner tracks, that half a revolution may be, say, a megabyte of data. On the outer tracks, that half a revolution may be two megabytes of data.

    So if you represent latency in terms of lost data bandwidth -- a reasonable way to look at it -- then yes, the latency does go up as you go farther out on the drive.

    Latency really robs you of data bandwidth, so knowing that a certain seek costs you the lost opportunity to transfer X megabytes makes it easier to reason about.
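
    A quick sketch with round numbers of my own (a 7,200 RPM drive streaming roughly 100 MB/s on the inner zone and 200 MB/s on the outer zone; these are assumptions, not measurements):

        # Each random access costs (seek time + on average half a revolution)
        # of transfer you could otherwise have been doing.
        RPM = 7_200
        SEEK_S = 0.008                      # assumed average seek time
        half_rev_s = 0.5 * 60 / RPM         # ~4.2 ms at 7,200 RPM

        for zone, stream_mb_s in (("inner", 100), ("outer", 200)):
            lost_mb = stream_mb_s * (SEEK_S + half_rev_s)
            print(f"{zone} zone: each random access forfeits ~{lost_mb:.1f} MB of transfer")

    Which lines up with the megabyte-or-two figures above once you fold the seek in.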

    Cheers!

  • (cs) in reply to Tim
    Tim:
    Ignoring for the moment that it's been 15 years since you could actually get a drive to write to a specific physical part of the hard drive, the fastest read time is not from the outer rings. The fastest read time will be from wherever the read head spends most of its time. Given that the drive controller tries to have the head scan the requested sectors from inside to outside and then back, the read head will spend about 50% more time in the center of the platter than towards either the inside or the outside. So if you could control where the physical data is located (which you can't) you would want it towards the center.
    1. On most drives, the logical block address increases as you go inwards (or outwards) on the platters. You can't control exactly what goes where, but it's a fair assumption that a seek from the first to the last LBA will take the heads across the platter.
    2. As for the scatter-gather you refer to in the second part of your post, you're of course right in terms of time, but not bandwidth. You get the highest bandwidth by doing a streaming read starting at the outermost track. Typically you'd be requesting blocks of a certain size, say 64 KB, adjacent to the farthest-out LBA, and keep going. On some drives the farthest-out may be LBA=0, on some drives it may be LBA=max. If you do a streaming read from LBA=max/2, on a 3.5" drive you'll typically see about a 25% reduction in bandwidth compared to the outermost LBA.
  • random S A (unregistered) in reply to BradC

    spoken like a true DBA.

  • (cs) in reply to Kuba
    Kuba:

    Only the common optical drives (CD, DVD) maintain constant linear velocity of the medium (up to a point) and thus have a fixed data rate on the outer half of the drive or so. The innermost part of the spiral has such a short equivalent circumference that the medium would be mechanically overloaded if it was spun up fast enough to saturate the data bandwidth of the read channel. Thus CD/DVD drives relatively suck when accessing data close to the spindle -- they have to maintain linear velocity lower than on the outer tracks of the disc.

    Modern optical drives do quasi-CAV (constant angular velocity). They don't have to spin up/down when seeking across the disc. Only when you do a streaming read at a particular radius will they spin at some preferred speed for that position.
  • Todd Lewis (unregistered) in reply to Anonymous
    Anonymous:
    So why can't Paul show us the magazine page?

    Because he has no wooden table on which to photograph it.

  • Dave (unregistered)

    I do hope someone takes the opportunity to get as many people as possible to write to said trade magazine explaining how bad an idea the arrangement was. You shouldn't let something like that lie if you can (correctly) belittle it anonymously...

  • lesle (unregistered)

    A Government Agency I once worked at required your email name to be the first letter of your first name, followed by your last name.

    Miguel Orona was

    morona@____.gov

  • Anon (unregistered) in reply to Spudd86
    Spudd86:
    Anon:
    NewbiusMaximus:
    Anon:
    So TRWTF is that Paul allowed this total nonsense article to go out in a trade magazine... This highlights the real lack of scholarship in computer science circles.
    Please don't conflate an unnamed DBA trade magazine with computer science scholarship. I know CS academics sometimes make bad programmers and sysadmins, but there's no way an article about this kind of bullshit would pass the laugh test at any reputable CS journal.

    Sorry, but I've seen stuff submitted to CS journals and been to CS conferences, and most of the bullshit I see there wouldn't pass the laugh test at even a low end hard science journal (physics, chemistry, biology, etc).

    Are you talking about computer science or software engineering (or whatever)?

    Computer Science. Maybe it's just because it's a younger field, but CS people can't seem to write scientific papers for shit. I'm guessing most CS graduate programs don't include courses on how to actually write a scientific paper. We worked on a collaboration with a CS researcher at a major university. When we got the first draft of their paper, our VP initially wanted all references to us removed because it was so embarrassingly badly written. We eventually whipped it into an almost passable state, but I'm still slightly embarrassed that my name ended up on it.

  • Chris H (unregistered) in reply to BradC

    Thanks to SAN virtualization, that's not even so much the case anymore.

  • Thuktun (unregistered) in reply to 3rd Ferguson
    3rd Ferguson:
    /CAPTCHA: Odio, part of the flying monkeys' marching song
    Those aren't flying monkeys, and that's not what they say. http://message.snopes.com/showthread.php?t=29275

    I think it's selecting parts of words ("odious", in your case), since I was given "appellatio".

  • (cs) in reply to Garote
    Garote:
    Wow, that is f(#@)$ screwed up.

    You know how I increase MY database access time? I upgrade my server from 2GB of RAM to 32GB, build the app 64-bit, and let the row cache pull every damn table into RAM except for the largest two.

    And if that's not enough, you know what I do then? I RAID two freakin' 128GB SSD drives and serve the data off that.

    What with the cost of SSD drives, all the fancy partition schemes in the world can go right into the toilet, beginning THIS YEAR.

    Maybe in your world, but in my world I need petabytes of storage and disks are still the way to go (currently we have about a petabyte installed, going to 6 petabytes before the end of the year). Normal data production is about 150 terabytes per 24 hours, and this will triple by the end of the year.

  • Dmitriy (unregistered)

    TRWTF is that there was no test environment in which the DBA's partitioning theories could be tested before the production servers were changed.

  • Outtascope (unregistered) in reply to PRMan
    PRMan:
    Mikkel:
    Also, Paul should be fired on the spot for doing this, he is messing around with something he clearly doesn't understand (not that the DBA was any wiser, it is however the DBAs responsability), having tools intentionally report faulty information will make debugging extremely problematic.
    Yes, fire the guy that made it work and keep the guy who absolutely blows at his "responsability". Are you a manager?

    The original DBA was wrong and arrogant. But putting redundant log files and control files on the same physical array isn't what I would consider "making it work". If this is a real time transaction system then he has just put his company at risk for real dollars, and he doesn't have a clue. And neither will anyone else because he was dishonest about it.

    Absolutely fire Paul. If he wasn't so spineless they could have argued it out until all of the requirements were properly considered and addressed.

  • Alin (unregistered) in reply to susan2012
    susan2012:
    With almost 2 million profiles, ____ S e e k I n t e r r a c i a l [DOT] c o m ___, the Interracial dating site, is the best source of dating profiles for Interracial Singles. JOIn for free now!!

    This must be the DBA that decided to change careers...
