The Daily WTF: Curious Perversions in Information Technology

2010-03-23 Reply Admin

If you need an onsite DBA then whoever set up the system sucks. A well designed system should run until a severe hardware failure forces some admin to restore the latest backup. To me an onsite DBA would be like driving a car with a mechanic tinkering under the hood while you drive down the highway. Personally I have well-used Oracle databases so old they are running on 233 MHz machines (new when installed). A DBA was never involved with these. The machines are not remotely connected to the internet and are running in a very secure location, so even the fact that they have been running an average of 1500 days without a reboot (or any more upgrades) is not an issue.

md5sum · 2010-03-23 Reply Admin

PG:
Started as a sysadmin many, many morans ago...

FTFY ;-P

Jaime · 2010-03-23 Reply Admin

Todd C:
Am I the only one to question the 'Certifiable DBA's' claim that the "outer ring" was the only decent usable space on the drive because it is furthest distance from the center?
True, a point on the outside of the disc travels at a greater linear speed than a point nearer the center, but both points travel at the same angular velocity 7,200 or 10K rpm.

Not only does this DBA know more (or so he thinks) than the sysadmin, he also knows more about platter hardware manufacturing than Maxtor and Seagate.

Hey Mr. DBA, when you were 12 and listened to vinyl records, (remember those things with one loooooong spiral groove?) did you only listen to the first song because the others were considered of 'impure quality'?

BTW, hard drives pack data at a higher angular density on the outer tracks to give a consistent linear density.

Therefore, the outer tracks really are faster. Unfortunately, the difference between the slow tracks and the fast tracks is only about 50%. You can't fix a really bad system with this technique, you can only squeeze a bit more performance out of a system like this.
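A rough sketch of the arithmetic behind that claim, in Python. It assumes constant linear bit density and constant rpm, so sequential throughput scales with track radius. The radii and the density figure are made-up illustrative values, not specs for any real drive.

```python
import math

def track_throughput(radius_mm, rpm, bits_per_mm):
    """Bytes per second read from one track, assuming constant linear density."""
    circumference_mm = 2 * math.pi * radius_mm
    bits_per_rev = circumference_mm * bits_per_mm   # more bits fit on a longer track
    revs_per_sec = rpm / 60.0                       # same for every track
    return bits_per_rev * revs_per_sec / 8

# Hypothetical geometry, chosen only for illustration
INNER_MM = 30.0
OUTER_MM = 45.0
RPM = 7200
DENSITY = 40_000  # bits per mm of track, hypothetical

inner = track_throughput(INNER_MM, RPM, DENSITY)
outer = track_throughput(OUTER_MM, RPM, DENSITY)
speedup = outer / inner
print(f"outer/inner throughput ratio: {speedup:.2f}")
```

With these assumed radii the ratio is simply 45/30, which lines up with the "about 50%" figure above. Real drives step through a limited number of zones rather than varying continuously.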

md5sum · 2010-03-23 Reply Admin

EmperorOfCanada:
If you need an onsite DBA then whoever set up the system sucks. A well designed system should run until a severe hardware failure forces some admin to restore the latest backup. To me an onsite DBA would be like driving a car with a mechanic tinkering under the hood while you drive down the highway. Personally I have well-used Oracle databases so old they are running on 233 MHz machines (new when installed). A DBA was never involved with these. The machines are not remotely connected to the internet and are running in a very secure location, so even the fact that they have been running an average of 1500 days without a reboot (or any more upgrades) is not an issue.

I pity you and anyone who ever comes in contact with you.

2010-03-23 Reply Admin

Because we all know that for modern drives logical block numbers allow you to determine the physical location on the disk </sarcasm>

2010-03-23 Reply Admin

EmperorOfCanada:
If you need an onsite DBA then whoever set up the system sucks. A well designed system should run until a severe hardware failure forces some admin to restore the latest backup.
........

Personally I have well-used Oracle databases so old they are running on 233 MHz machines (new when installed).

But you are assuming that the application never changes, has no bugs, the data requirements never change, you never have to put on a patch or upgrade for mandatory security issue, etc...

As far as clock speed.....

Damn young kids get off my lawn.

Try Oracle on a VAX 11/785 with 16Meg of memory and RA-81 disk drives. Ran like a champ.

2010-03-23 Reply Admin

How would coalescing multiple mount-points into a single mirrored mount-point have anything to do with (never mind reducing) the volume of I/O reads? That indicates that the behavior of the database changed dramatically, and how/why would that be? I'd be willing to believe that the average service time of each I/O read improved (somehow), but a change in the volume of I/O reads has nothing to do with directories and mount-points.

It's a little like asserting that washing my car over the weekend has made my commute shorter this week, while overlooking that it is spring break and many families are on vacation, so there is less traffic congestion at rush hour.

2010-03-23 Reply Admin

Todd C:
Am I the only one to question the 'Certifiable DBA's' claim that the "outer ring" was the only decent usable space on the drive because it is furthest distance from the center?
True, a point on the outside of the disc travels at a greater linear speed than a point nearer the center, but both points travel at the same angular velocity 7,200 or 10K rpm.

Not only does this DBA know more (or so he thinks) than the sysadmin, he also knows more about platter hardware manufacturing than Maxtor and Seagate.

Hey Mr. DBA, when you were 12 and listened to vinyl records, (remember those things with one loooooong spiral groove?) did you only listen to the first song because the others were considered of 'impure quality'?

The difference isn't there these days; most newer drives use the extra space on the outer rings as a replacement area for bad sectors. But back in the olden days, having data on the outer part of the drive would mean faster throughput.

Kensey · 2010-03-23 Reply Admin

md5sum:
Every sysadmin I've ever known has had a company provided server at home, purportedly to "test installations of new server packages in their free time". Most of them have hosted websites on them for a personal profit.

Every sysadmin I've known who has company-owned kit at home has it for one of two reasons:

It's actually getting used for employer-related purposes (and not for personal gain)
It was going to be tossed outright, so they basically pulled it out of the dumpster but skipped the dumpster step.

I know no sysadmins who (have admitted to me that they) are using company assets for personal use under false pretenses.

2010-03-23 Reply Admin

Todd C:
True, a point on the outside of the disc travels at a greater *linear* speed than a point nearer the center, but *both* points travel at the same *angular* velocity 7,200 or 10K rpm.
Hey Mr. DBA, when you were 12 and listened to vinyl records, (remember those things with one loooooong spiral groove?) did you only listen to the first song because the others were considered of 'impure quality'?

These two media work in different ways:

On a HDD, data is at an equal density throughout. Therefore, it is the linear speed that matters, because all sectors take up the same amount of linear space. The inner edge therefore contains few sectors, while the outer edge contains far more. Both of these edges are read in the same amount of time, thanks to a constant angular velocity, so the outer edge has faster data access.

A record, on the other hand, actually does rely on storing a given amount of music in a given angle of rotation (about 180 degrees for each second of sound stored), so it really does cram the music more tightly at the inner track.

However, the real problem here, as others have stated, is not data density but forcing the disk to thrash. Partition a disk as many times as you want, but your heads can only be in one position at a time!

2010-03-23 Reply Admin

what's a "moran" moron

2010-03-23 Reply Admin

Burleson?

2010-03-23 Reply Admin

Mikkel:
This smells fishy, Paul claims he could achieve orders of magnitude more performance by putting up a mirrored harddrive compared to many disks with dedicated partitions, and it was an I/O bottleneck? Even if the data was spanned and the DBA used some very old way of doing stuff, the fact is there are more drives available and should have had higher I/O throughput.
Also, Paul should be fired on the spot for doing this, he is messing around with something he clearly doesn't understand (not that the DBA was any wiser, it is however the DBAs responsability), having tools intentionally report faulty information will make debugging extremely problematic.

Yes, fire the guy that made it work and keep the guy who absolutely blows at his "responsability". Are you a manager?

2010-03-23 Reply Admin

I would imagine with the first configuration, the "DBA" was shooting for something like Oracle does (probably on Oracle is my guess... but I'm just a lowly developer). Oracle can take many hard drives and balance out the data across all of them. If one particular table or tablespace is getting lots of hits and it's being held up by being on a single drive, Oracle will spread out the table or tablespace across many drives so that the IO can be more parallelized.

I may have butchered a term or two in there... but I think that's at least part of what he was shooting for.

2010-03-23 Reply Admin

Did this story hit a little close to home? ;p

2010-03-23 Reply Admin

The thing about faster I/O at the rim of a drive is true. We set up our systems to use only the one outermost track. It's unbelievably fast! Plus the entire partition can fit in the drive controller's cache.

2010-03-23 Reply Admin

I never had this problem with Access

2010-03-23 Reply Admin

Third

2010-03-23 Reply Admin

Mikkel:
having tools intentionally report faulty information will make debugging extremely problematic.

Yeah, that wasn't the right approach.

He should have just reverted back to the DBA's configuration the day the DBA returned. When users complain that it got slow again, tell them you just implemented the DBA's latest wizardry, and you are on standby to implement his next improvement too. Let him flounder a couple weeks while never achieving anywhere near the performance you did while he was gone. Then you go talk to the boss.

2010-03-23 Reply Admin

Da*n, not even third!

2010-03-23 Reply Admin

This link:

http://www.coker.com.au/bonnie++/zcav/results.html

has some graphs showing how transfer rate varies between the outer and inner edges of a drive.

As for seek times, there are two components to consider. There's the time needed to move the head to the desired track, and the time needed for the desired sector to rotate around to the head's position. If you only use the outer part of the disk you will reduce the average head-moving time a bit, but the rotational latency will not be affected.
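To put a number on the rotational part, here is a minimal Python sketch. It assumes the desired sector is, on average, half a revolution away from the head when the seek completes; the function name is just for illustration.

```python
def avg_rotational_latency_ms(rpm):
    """Average wait for a sector to come around: half of one revolution, in ms."""
    ms_per_rev = 60_000.0 / rpm
    return ms_per_rev / 2

for rpm in (7200, 10_000):
    print(f"{rpm} rpm: {avg_rotational_latency_ms(rpm):.2f} ms")
```

This depends only on spindle speed, not on which track is being read, which is why confining data to the outer part of the disk leaves it unchanged.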

2010-03-23 Reply Admin

I was going to post my comment one letter per post, so it would all

l i n e

u p

along the left edge. But I decided to spare you. Plus I'm lazy.

2010-03-23 Reply Admin

Just Kidding:
I never had this problem with Access

If you use Access, your brain is already warped to the point where you can't perceive problems if (when) they occur!

2010-03-23 Reply Admin

DotNot:
The thing about faster I/O at the rim of a drive is true. We set up our systems to use only the one outermost track. It's unbelievably fast! Plus the entire partition can fit in the drive controller's cache.

I'm ashamed to admit that I believed you until that last sentence.

2010-03-23 Reply Admin

Lowly Developer:
I would imagine with the first configuration, the "DBA" was shooting for something like Oracle does (probably on Oracle is my guess... but I'm just a lowly developer). Oracle can take many hard drives and balance out the data across all of them. If one particular table or tablespace is getting lots of hits and it's being held up by being on a single drive, Oracle will spread out the table or tablespace across many drives so that the IO can be more parallelized.

I think you are talking about ASM in Oracle. And yes it will spread out an object across what it "thinks" are different disks. I say thinks because it has no clue that those raw volumes may be on the same set of spindles. However it doesn't do the spreading based on IO load, it does it when an object is created... it's just striping the object. It will re-stripe if you add or remove disks from that ASM disk group. Nothing prevents a few "hot blocks" from different objects being on the same disk, and nothing will move them around. (Maybe they have added something in newer versions of ASM, and I haven't heard of it yet.)

Now what makes things even more fun is that by default ASM wants to do its own RAID type management of data. You have to tell it not to if you have a real hardware RAID setup.

2010-03-23 Reply Admin

Cujo:
It's not completely clear from the article but it was probably set up as RAID 5 originally. Oracle (for one) runs poorly on it. Raid 10, which is close to what the admin did, is the right way.
I had a customer who set this up for all their systems, whether they had Oracle or not, and once I pulled the DBs over to Raid 10 it improved performance and throughput considerably. (Raid 1 is much better than it was years back but I recommend RAID 10).

BAARF on.

If you do mostly reading, a large RAID 5 is generally faster (and it's always cheaper). If you do mostly lots of small writes, a large RAID 10 is going to be much faster. Hardware parity calculation speed is useless if you still have to read each block on each disk to rewrite it. Consider your needs.
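The small-write point can be sketched with the usual back-of-envelope write-penalty model, here in Python. It assumes a small random write costs 4 disk I/Os on RAID 5 (read data, read parity, write data, write parity) and 2 on RAID 10 (one per mirror side). The disk count and per-disk IOPS are hypothetical, and the model ignores caches, full-stripe writes, and the capacity/cost difference, so it says nothing about the read-side claim.

```python
# Disk I/Os needed per small random write (common rule of thumb)
WRITE_PENALTY = {"raid5": 4, "raid10": 2}

def effective_iops(level, disks, iops_per_disk, write_fraction):
    """Host-visible IOPS for a mix of small random reads and writes."""
    raw = disks * iops_per_disk
    penalty = WRITE_PENALTY[level]
    # Each read costs 1 disk I/O, each write costs `penalty` disk I/Os.
    cost_per_op = (1 - write_fraction) + write_fraction * penalty
    return raw / cost_per_op

# Hypothetical array: 8 disks at 150 IOPS each
for write_fraction in (0.1, 0.9):
    r5 = effective_iops("raid5", 8, 150, write_fraction)
    r10 = effective_iops("raid10", 8, 150, write_fraction)
    print(f"{write_fraction:.0%} writes: RAID 5 ~{r5:.0f} IOPS, RAID 10 ~{r10:.0f} IOPS")
```

Under these assumptions the two layouts are close for read-heavy workloads and diverge sharply as the write fraction grows.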

2010-03-23 Reply Admin

Anonymous:
DotNot:
The thing about faster I/O at the rim of a drive is true. We set up our systems to use only the one outermost track. It's unbelievably fast! Plus the entire partition can fit in the drive controller's cache.

I'm ashamed to admit that I believed you until that last sentence.

Dude! And I even tried to leave you a clue!

2010-03-23 Reply Admin

It seems to me that there are two sides to this story, and that possibly both sides are incompetent.

Changing the storage has no effect on how many IOPs are done by the database, only the length of time the IO takes.

A change in the database and/or usage of the database will change IOs - it doesn't matter how many disks there are or how the disk was configured.

2010-03-23 Reply Admin

As the "Paul" concerned, I naturally cannot give too much extra detail or clarifications without fingers being able to be pointed.

The disk scenario as presented through the Alex filter (and much Kudos to the man for doing so) is somewhat simplified to the extent that it can be explained in a sentence rather than several sides of paper. It really was a nightmare.

However I will add the following which do not I believe compromise anyone involved:

Firstly the "Certified DBA" was certainly certified, but not as a DBA - instead they were the only member of staff at the organisation who had taken the operating system training course and so were declared the local expert. This is despite the version of the OS they were certified on being roughly 6 years older than the current one being used. The old version did not support RAID in any shape or form....

I didn't actually care about the partitions; the point was that forcing concatenation across the disks rather than striping meant that the first disk filled, then the second, and so on, so the first disks always had a higher load, and nothing was done to use the other disks to share the load, which slowed things down.

The solution put in place and accepted was a bit more complicated, a group of disks were setup as a single striped volume and then mirrored. This allowed load spreading across the disks and let the RAID management software do the work that it was supposed to.
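The difference between concatenation and striping can be sketched as a block-mapping exercise in Python. This is an illustration only, not the volume manager actually used; the disk count, disk size, and stripe unit are hypothetical.

```python
from collections import Counter

def concat_locate(block, blocks_per_disk):
    """Concatenation: fill disk 0 completely, then disk 1, and so on."""
    return block // blocks_per_disk, block % blocks_per_disk

def stripe_locate(block, num_disks, stripe_blocks):
    """Striping: rotate across the disks one stripe unit at a time."""
    stripe_index = block // stripe_blocks
    disk = stripe_index % num_disks
    offset = (stripe_index // num_disks) * stripe_blocks + block % stripe_blocks
    return disk, offset

# Hypothetical layout
NUM_DISKS = 4
BLOCKS_PER_DISK = 1000
STRIPE_BLOCKS = 8

# A volume that is 20% full: logical blocks 0..799 are in use
used = range(800)
concat_load = Counter(concat_locate(b, BLOCKS_PER_DISK)[0] for b in used)
stripe_load = Counter(stripe_locate(b, NUM_DISKS, STRIPE_BLOCKS)[0] for b in used)

print("concatenated:", dict(concat_load))
print("striped:     ", dict(stripe_load))
```

With concatenation every used block lands on the first disk, so that disk takes all the I/O while the others sit idle; with striping the same blocks are spread evenly over all four.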

Finally the situation was somewhat political, my manager was kept in the loop but since the DBA had direct access to even higher levels of management and was very good at making their point we rolled with it.

2010-03-23 Reply Admin

This could be fixed with "drive balancing:"

http://www.gbrockman.com/drivebalance/

:P

2010-03-23 Reply Admin

Do you always ask questions like that when you're beating your life?

2010-03-23 Reply Admin

The real WTF is the sysadmin going out of his way to hide his efforts and letting the DBA think that his own work had paid off. If he's upset about the situation he's only got himself to blame.

2010-03-23 Reply Admin

highphilosopher:
Cujo:
It's not completely clear from the article but it was probably set up as RAID 5 originally. Oracle (for one) runs poorly on it. Raid 10, which is close to what the admin did, is the right way.
I had a customer who set this up for all their systems, whether they had Oracle or not, and once I pulled the DBs over to Raid 10 it improved performance and throughput considerably. (Raid 1 is much better than it was years back but I recommend RAID 10).

BAARF on.

Raid 1 is better than it used to be??? It hasn't changed. RAID 1 is mirrored.

http://en.wikipedia.org/wiki/Nested_RAID_levels

http://en.wikipedia.org/wiki/Standard_RAID_levels

Read some before you post. There ARE purposes to each RAID configuration.

The definition hasn't changed, but the implementations sure as hell have; the difference between 1990 and 2010 RAID performance and reliability at all levels is pretty incredible.

mfah · 2010-03-23 Reply Admin

highphilosopher:
I disagree. As a lowly developer when I say, "we need these server specs to run this new app" management doesn't question it.

Instead of fixing your slow and bloated code I assume...?

2010-03-23 Reply Admin

You must be a DBA.

mfah · 2010-03-23 Reply Admin

EmperorOfCanada:
If you need an onsite DBA then whoever set up the system sucks. A well designed system should run until a severe hardware failure forces some admin to restore the latest backup. To me an onsite DBA would be like driving a car with a mechanic tinkering under the hood while you drive down the highway. Personally I have well-used Oracle databases so old they are running on 233 MHz machines (new when installed). A DBA was never involved with these. The machines are not remotely connected to the internet and are running in a very secure location, so even the fact that they have been running an average of 1500 days without a reboot (or any more upgrades) is not an issue.

You obviously haven't seen recent versions of Oracle then? Imagine a terminally ill patient that needs a small army of doctors just to keep basic vital signs going. That's recent versions of Oracle. Of course the marketing propaganda and the high entry cost makes it more attractive to management types ("it MUST be good...!")

For my money TRWTF here is that an RDBMS even requires a DBA to get down to this level of nitty gritty. Hardware should be almost a black box so far as a DBA - whose primary responsibility is the RDBMS software - is concerned. What I'm saying is that a DBA's input should start and stop at something like "I need guaranteed XX MB throughput and the transaction logs are going on the fastest disk(s)", and leave specifics to the hardware guys.

2010-03-23 Reply Admin

When people talk about seek times that differ between inner and outer regions of a disk drive they might have a bit of truth. The problem is that it is only a BIT. Nowadays disk drives with internal controllers use variable bit rate recording methods. This means that on the OUTER regions of the disk there is MORE data, and the inner regions of the disk there is LESS data. Given this fact, the problem in an outer region of the disk is LATENCY (it takes longer to get to a particular chunk of data) and in the inner regions the problem is seek time. If one confines their data to a particular area of the disk (last Gbyte, or first Gbyte) these problems will balance out for the most part. If you have a drive of any size it will take some time to get the data, involving BOTH seek time and latency time.

In days of old, there were drives that homed the heads to track zero (yes, I worked on one) then did a seek to the cylinder in question. It made perfect sense to have well used data at the first cylinders. Later direct seeks were done, and this didn't make much sense, since ALL cylinders had the same number of sectors (like a PC's floppy disk does now). In that case you could put the "well used" chunk anywhere (inner or outer) and it made little difference. In an effort to add to the capacity, disks now use zone recording where the outer regions record LOTS of sectors, and the inner regions fewer. A CD-ROM is an example of this (it records from the inner region to the outer!). The original Macs actually recorded their floppies this way, and got a whopping 10% more data (720k vs 800k). Modern multi-TByte drives just pack them in, and with LARGE buffers most of the positioning of partitions is useless.

You can buy 1TB drives for less than $100 now. Use them wisely!

2010-03-23 Reply Admin

Paul (Another Paul):
If, as a sysadmin, you'd like to ship servers to your home, you should consider shipping a real life to your home instead.

I often interview candidates for UNIX sysadmin positions here at {some-fortune-global-500}.

One of the standard interview questions I like to ask candidates during technical interviews is "what do you run at home?"

Those who mention that they have their own servers at home doing X, Y, and Z, and get excited about it are more likely to get the job. They have passion for it.

2010-03-23 Reply Admin

'Tis spelled 'moron.'

2010-03-23 Reply Admin

He may be a moran, but does him being a "moran" make you a moron?

2010-03-23 Reply Admin

Mikkel:
This smells fishy, Paul claims he could achieve orders of magnitude more performance by putting up a mirrored harddrive compared to many disks with dedicated partitions, and it was an I/O bottleneck? Even if the data was spanned and the DBA used some very old way of doing stuff, the fact is there are more drives available and should have had higher I/O throughput.

Seen it, done it, proven DBAs wrong with plain common sense and a bit of true understanding of hard drives and the logic in between the software and the byte on the platter.

It IS entirely possible to have a couple of orders of magnitude increased speed by just doing it RIGHT. The DBA did not take into account the head movements created by the stupid partitioning scheme, and this alone can make a disk 100x slower compared to being a bit clever about it.

Sequential reading/writing with virtually no head movement is very fast and can easily achieve something like 60-70 Mbyte/sec on consumer-grade disks of today.

Average random search times for a disk is something like 10-14 ms. That average search is from middle of disk to any other location, not from end to end, which can actually easily hit 50ms.

Split a disk into 10 partitions, and you will have a nightmare scenario swapping between the partitions, and you can easily reach something like a situation where the disk spends 80-90% of available time doing head movements, and only 10-20% reading/writing, resulting in 6-14Mbyte/sec, at best.
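The 6-14 Mbyte/sec figure follows directly from the numbers above; a tiny Python sketch of the arithmetic, assuming the disk only transfers data during the fraction of time it is not moving its heads:

```python
def effective_throughput(sequential_mb_s, seek_fraction):
    """Throughput left over when `seek_fraction` of the time is spent seeking."""
    return sequential_mb_s * (1 - seek_fraction)

# Figures taken from the comment: 60-70 MB/s sequential, 80-90% of time seeking
low = effective_throughput(60, 0.90)
high = effective_throughput(70, 0.80)
print(f"effective throughput: {low:.0f}-{high:.0f} MB/s")
```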

Add latencies and other nasty stuff to the concoction, and you can have a far worse degradation.

His solution to mirror the disks, and split them in two groups, is the obvious and correct solution, and if needed, add additional mirrored pairs, something that will increase read performance drastically and keep writes at an acceptable level.

Also, enable the drives write optimization, and you get better performance.

Mikkel:
Also, Paul should be fired on the spot for doing this, he is messing around with something he clearly doesn't understand (not that the DBA was any wiser, it is however the DBAs responsability), having tools intentionally report faulty information will make debugging extremely problematic.

I call BS on this. The DBA should be kicked out for misuse of company funds, and for not doing the proper analysis that he was paid to do, as well as for not listening to other people who actually DO understand the hardware.

The "Not Invented Here" syndrome has caused never-ending amounts of problems, not just in the IT industry, but predominantly so.

It was wrong to fake the DF command, yes.

As for messing about with something "he clearly did not understand" - well, I truly call BS and WTF on this one, as it was the DBA that truly did not have all his ducks in a row, and the SA had a true point, as he actually did understand the hardware he was working with.

I would consider the DBA's actions borderline, if not crossing into gross misconduct, not only for work practice, but for his treatment of other co-workers, if he was within my organisation, as well as for wasting company funds in direct violation of commonly known "best practice" and for not listening to others who do have substantial knowledge within their field of expertise.

Just because you have a cert, that doesn't mean you can or should treat others like crap, or that you are always right.

Anyone who listens to others, and with logical and factual arguments can show when they are wrong, whilst still maintaining good manners, but also take in corrections or ideas from others, and not play "god", is a winner.

ContraCorners · 2010-03-23 Reply Admin

snoofle:
You give someone a certificate for showing up to a training class and they think they're a God.

So *that's* how He got the job!

2010-03-23 Reply Admin

moran?!

2010-03-23 Reply Admin

"Paul":
As the "Paul" concerned, I naturally cannot give too much extra detail or clarifications without fingers being able to be pointed.
...the DBA had direct access to even higher levels of management...

Sorry, you just gave away who you are to anyone who knows you. If the DBA was male you would have said "making his point". So it is a female "certified DBA" who isn't really a DBA but has 10-year-outdated skills, and is sleeping with the boss's boss.

Putting in a fake "df" command? I think you didn't go BOFH enough on her ass.

2010-03-23 Reply Admin

Drat! I edited out "and was very good at making their point".

2010-03-23 Reply Admin

Herby:
This means that on the OUTER regions of the disk there is MORE data, and the inner regions of the disk there is LESS data. Given this fact, the problem in an outer region of the disk is LATENCY (it takes longer to get to a particular chunk of data)

Right, because unlike the inner tracks, a given sector is somehow going to be, on average, more than half a revolution away? WTF?

2010-03-23 Reply Admin

Man, I love it when someone calls someone else a moran.

2010-03-23 Reply Admin

Strangely enough, the DBA was right. In theory. You do get significantly better IO on the outer edge.

It's just something seems to have been going wrong in the more complex real world.

2010-03-23 Reply Admin

Sure - you do get better IO on the outer edge, but not enough to make up for having to move the disk's head about.

Even just 25% more average movement will kill any advantage gained by this, and a significant additional movement of the heads through the partitioning, will literally trash the performance and hammer the disk so hard, that it will simply fail to perform.

As for latency - toss a coin. 50% chance on average, that the sector to be read/written, will be either in approximately the "right" position, and 50% that it will certainly be in the "wrong" position relative to the head.

Only large buffers on the disks will help solve this, together with elevator-sorted I/O ordering performed by the disk itself, as only the disk itself knows where it currently is.

2010-03-23 Reply Admin

I hate it when people like that don't get their comeuppance.

The Certified DBA
