Admin
If you need an onsite DBA then whoever set up the system sucks. A well-designed system should run until a severe hardware failure forces some admin to restore the latest backup. To me an onsite DBA would be like driving a car with a mechanic tinkering under the hood while you drive down the highway. Personally I have well-used Oracle databases so old they are running on 233 MHz machines (new when installed). A DBA was never involved with these. The machines are not remotely connected to the internet and are running in a very secure location, so even the fact that they have been running an average of 1500 days without a reboot (or any more upgrades) is not an issue.
Admin
Therefore, the outer tracks really are faster. Unfortunately, the difference between the slow tracks and the fast tracks is only about 50%. You can't fix a really bad system with this technique; you can only squeeze a bit more performance out of it.
Admin
Because we all know that for modern drives logical block numbers allow you to determine the physical location on the disk </sarcasm>
Admin
But you are assuming that the application never changes, has no bugs, the data requirements never change, you never have to apply a patch or upgrade for a mandatory security issue, etc...
As far as clock speed.....
Damn young kids get off my lawn.
Try Oracle on a VAX 11/785 with 16Meg of memory and RA-81 disk drives. Ran like a champ.
Admin
How would coalescing multiple mount-points into a single mirrored mount-point have anything to do with (never mind reducing) the volume of I/O reads? That indicates that the behavior of the database changed dramatically, and how/why would that be? I'd be willing to believe that the average service time of each I/O read improved (somehow), but a change in the volume of I/O reads has nothing to do with directories and mount-points.
It's a little like asserting that washing my car over the weekend has made my commute shorter this week, while overlooking that it is spring break and many families are on vacation, so there is less traffic congestion at rush hour.
Admin
The difference isn't there these days; most newer drives use the extra space on the outer rings as a replacement area for bad sectors. But back in the olden days, having data on the outer part of the drive would mean faster throughput.
Admin
Every sysadmin I've known who has company-owned kit at home has it for one of two reasons:
I know no sysadmins who (have admitted to me that they) are using company assets for personal use under false pretenses.
Admin
On a HDD, data is at an equal density throughout. Therefore, it is the linear speed that matters, because all sectors take up the same amount of linear space: the inner edge contains few sectors, but the outer edge contains far more. Both of these edges are read in the same amount of time, thanks to a constant angular velocity, so the outer edge has faster data access.
A record, on the other hand, actually does rely on storing a given amount of music in a given angle of rotation (about 180 degrees for each second of sound stored), so it really does cram the music more tightly at the inner track.
However, the real problem here, as others have stated, is not data density, but forcing the disk to thrash. Partition a disk as many times as you want, but your heads can only be in one position at a time!
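To put rough numbers on the constant-density, constant-angular-velocity argument above, here is a minimal sketch. The radii, spindle speed, and linear density are made-up illustration values, not figures for any real drive; the only point is that the outer-to-inner throughput ratio comes out equal to the ratio of the radii.

```python
import math


def track_throughput_mb_s(radius_mm: float, rpm: float, bits_per_mm: float) -> float:
    """Sequential read rate of a single track, in megabytes per second.

    Assumes constant linear density (bits_per_mm) and constant angular
    velocity (rpm), as described in the comment above.
    """
    circumference_mm = 2 * math.pi * radius_mm
    bits_per_track = circumference_mm * bits_per_mm
    revs_per_second = rpm / 60.0
    return bits_per_track * revs_per_second / 8 / 1e6


# Hypothetical geometry, chosen only for illustration.
INNER_RADIUS_MM = 20.0
OUTER_RADIUS_MM = 45.0
RPM = 7200
BITS_PER_MM = 40_000

inner = track_throughput_mb_s(INNER_RADIUS_MM, RPM, BITS_PER_MM)
outer = track_throughput_mb_s(OUTER_RADIUS_MM, RPM, BITS_PER_MM)
speedup = outer / inner  # equals OUTER_RADIUS_MM / INNER_RADIUS_MM

print(f"inner {inner:.1f} MB/s, outer {outer:.1f} MB/s, ratio {speedup:.2f}")
```

The density and spindle speed cancel out of the ratio, so under these assumptions the outer edge can never be faster than the inner edge by more than the ratio of the two radii.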
Admin
what's a "moran" moron
Admin
Burleson?
Admin
Yes, fire the guy that made it work and keep the guy who absolutely blows at his "responsability". Are you a manager?
Admin
I would imagine with the first configuration, the "DBA" was shooting for something like Oracle does (probably on Oracle is my guess... but I'm just a lowly developer). Oracle can take many hard drives and balance out the data across all of them. If one particular table or tablespace is getting lots of hits and it's being held up by being on a single drive, Oracle will spread out the table or tablespace across many drives so that the IO can be more parallelized.
I may have butchered a term or two in there... but I think that's at least part of what he was shooting for.
Admin
Did this story hit a little close to home? ;p
Admin
The thing about faster I/O at the rim of a drive is true. We set up our systems to use only the one outermost track. It's unbelievably fast! Plus the entire partition can fit in the drive controller's cache.
Admin
I never had this problem with Access
Admin
Third
Admin
He should have just reverted back to the DBA's configuration the day the DBA returned. When users complain that it got slow again, tell them you just implemented the DBA's latest wizardry, and you are on standby to implement his next improvement too. Let him flounder a couple weeks while never achieving anywhere near the performance you did while he was gone. Then you go talk to the boss.
Admin
Da*n, not even third!
Admin
This link:
http://www.coker.com.au/bonnie++/zcav/results.html
has some graphs showing how transfer rate varies between the outer and inner edges of a drive.
As for seek times, there are two components to consider. There's the time needed to move the head to the desired track, and the time needed for the desired sector to rotate around to the head's position. If you only use the outer part of the disk you will reduce the average head-moving time a bit, but the rotational latency will not be affected.
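The two components can be sketched with a little arithmetic. This assumes the usual rule of thumb that the average rotational latency is half a revolution; the seek times and spindle speeds below are illustrative inputs, not measurements.

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    """Average wait for the target sector: half of one revolution, in ms."""
    ms_per_revolution = 60_000.0 / rpm
    return ms_per_revolution / 2


def avg_access_time_ms(avg_seek_ms: float, rpm: float) -> float:
    """Average head-movement time plus average rotational latency."""
    return avg_seek_ms + avg_rotational_latency_ms(rpm)


# Illustrative values only: halving the seek time (by using less of the
# disk) leaves the rotational part of the access time untouched.
for rpm in (5400, 7200, 15000):
    full_disk = avg_access_time_ms(8.0, rpm)
    outer_only = avg_access_time_ms(4.0, rpm)
    print(f"{rpm} rpm: latency {avg_rotational_latency_ms(rpm):.2f} ms, "
          f"access {full_disk:.2f} ms vs {outer_only:.2f} ms")
```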
Admin
I was going to post my comment one letter per post, so it would all
l i n e
u p
along the left edge. But I decided to spare you. Plus I'm lazy.
Admin
I'm ashamed to admit that I believed you until that last sentence.
Admin
I think you are talking about ASM in Oracle. And yes, it will spread out an object across what it "thinks" are different disks. I say thinks because it has no clue that those raw volumes may be on the same set of spindles. However, it doesn't do the spreading based on IO load; it does it when an object is created... it's just striping the object. It will re-stripe if you add or remove disks from that ASM disk group. Nothing prevents a few "hot blocks" from different objects being on the same disk, and nothing will move them around. (Maybe they have added something in newer versions of ASM, and I haven't heard of it yet.)
Now what makes things even more fun is that by default ASM wants to do its own RAID-type management of data. You have to tell it not to if you have a real hardware RAID setup.
Admin
It seems to me that there are two sides to this story, and that possibly both sides are incompetent.
Changing the storage has no effect on how many IOs are done by the database, only on the length of time each IO takes.
A change in the database and/or usage of the database will change IOs - it doesn't matter how many disks there are or how the disk was configured.
Admin
As the "Paul" concerned, I naturally cannot give too much extra detail or clarifications without fingers being able to be pointed.
The disk scenario as presented through the Alex filter (and much Kudos to the man for doing so) is somewhat simplified to the extent that it can be explained in a sentence rather than several sides of paper. It really was a nightmare.
However, I will add the following, which I do not believe compromises anyone involved:
Firstly, the "Certified DBA" was certainly certified but not in any DBA - instead they were the only member of staff at the organisation who had taken the operating system training course and so were declared the local expert. This is despite the fact that the version of the OS they were certified on was roughly 6 years older than the current one being used. The old version did not support RAID in any shape or form....
I didn't actually care about the partitions. The point was that forcing concatenation across the disks rather than striping meant that the first disks filled, then the second, etc. etc., so the first disks always had a higher load and nothing was done to use the other disks to share that load, which slowed things down.
The solution put in place and accepted was a bit more complicated: a group of disks was set up as a single striped volume and then mirrored. This allowed load spreading across the disks and let the RAID management software do the work that it was supposed to.
Finally the situation was somewhat political, my manager was kept in the loop but since the DBA had direct access to even higher levels of management and was very good at making their point we rolled with it.
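For anyone who wants to see the concatenation-versus-striping difference described above in numbers, here is a minimal sketch. The disk count, disk size, and stripe width are hypothetical and are not the values from the actual system; mirroring is left out because it does not change which disk of a set a block lands on.

```python
from collections import Counter


def concat_disk(block: int, blocks_per_disk: int) -> int:
    """Concatenation: fill disk 0 completely, then disk 1, and so on."""
    return block // blocks_per_disk


def stripe_disk(block: int, n_disks: int, stripe_blocks: int) -> int:
    """Striping: rotate across the disks every stripe_blocks blocks."""
    return (block // stripe_blocks) % n_disks


# Hypothetical layout, for illustration only.
N_DISKS = 4
BLOCKS_PER_DISK = 1024
STRIPE_BLOCKS = 8
USED_BLOCKS = 1024  # the volume is only a quarter full

# Count how many of the in-use blocks land on each disk.
concat_load = Counter(concat_disk(b, BLOCKS_PER_DISK) for b in range(USED_BLOCKS))
stripe_load = Counter(stripe_disk(b, N_DISKS, STRIPE_BLOCKS) for b in range(USED_BLOCKS))

print("concatenated:", dict(concat_load))
print("striped:     ", dict(stripe_load))
```

With the volume a quarter full, the concatenated layout puts every in-use block on the first disk, while the striped layout spreads the same blocks evenly over all four.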
Admin
This could be fixed with "drive balancing:"
http://www.gbrockman.com/drivebalance/
:P
Admin
Do you always ask questions like that when you're beating your life?
Admin
The real WTF is the sysadmin going out of his way to hide his efforts and letting the DBA think that his own work had paid off. If he's upset about the situation he's only got himself to blame.
Admin
You must be a DBA.
Admin
For my money TRWTF here is that an RDBMS even requires a DBA to get down to this level of nitty gritty. Hardware should be almost a black box so far as a DBA - whose primary responsibility is the RDBMS software - is concerned. What I'm saying is that a DBA's input should start and stop at something like "I need guaranteed XX MB throughput and the transaction logs are going on the fastest disk(s)", and leave specifics to the hardware guys.
Admin
When people talk about seek times that differ between inner and outer regions of a disk drive they might have a bit of truth. The problem is that it is only a BIT. Nowadays disk drives with internal controllers use variable bit rate recording methods. This means that on the OUTER regions of the disk there is MORE data, and on the inner regions of the disk there is LESS data. Given this fact, the problem in an outer region of the disk is LATENCY (it takes longer to get to a particular chunk of data) and in the inner regions the problem is seek time. If one confines their data to a particular area of the disk (last Gbyte, or first Gbyte) these problems will balance out for the most part. If you have a drive of any size it will take some time to get the data, involving BOTH seek time and latency time.
In days of old, there were drives that homed the heads to track zero (yes, I worked on one) then did a seek to the cylinder in question. It made perfect sense to have well-used data on the first cylinders. Later, direct seeks were done, and this didn't make much sense, since ALL cylinders had the same number of sectors (like a PC's floppy disk does now). In that case you could put the "well used" chunk anywhere (inner or outer) and it made little difference. In an effort to add to the capacity, disks now use zone recording, where the outer regions record LOTS of sectors and the inner regions fewer. A CD-ROM is an example of this (it records from the inner region to the outer!). The original Macs actually recorded their floppies this way, and got a whopping 10% more data (800k vs 720k). Modern multi-TByte drives just pack them in, and with LARGE buffers most of the positioning of partitions is useless.
You can buy 1TB drives for less than $100 now. Use them wisely!
Admin
I often interview candidates for UNIX sysadmin positions here at {some-fortune-global-500}.
One of the standard interview questions I like to ask candidates during technical interviews is "what do you run at home?"
Those who mention that they have their own servers at home doing X, Y, and Z, and get excited about it are more likely to get the job. They have passion for it.
Admin
'Tis spelled 'moron.'
Admin
He may be a moran, but does him being a "moran" make you a moron?
Admin
Seen it, done it, proven DBAs wrong with plain common sense and a bit of true understanding of hard drives and the logic in between the software and the byte on the platter.
It IS entirely possible to have a couple of orders of magnitude increased speed by just doing it RIGHT. The DBA did not take into account the head movements created by the stupid partitioning scheme, and this alone can make a disk 100x slower compared to being a bit clever about it.
Sequential reading/writing with virtually no head movement is very fast and can easily achieve something like 60-70 Mbyte/sec on consumer-grade disks of today.
Average random seek times for a disk are something like 10-14 ms. That average seek is from the middle of the disk to any other location, not from end to end, which can actually easily hit 50 ms.
Split a disk into 10 partitions and you will have a nightmare scenario swapping between the partitions. You can easily reach a situation where the disk spends 80-90% of available time doing head movements, and only 10-20% reading/writing, resulting in 6-14 Mbyte/sec, at best.
Add latencies and other nasty stuff to the concoction, and you can have a far worse degradation.
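The 6-14 Mbyte/sec figure above is just the sequential rate scaled by the share of time the heads are actually transferring data. A minimal sketch of that arithmetic, using the numbers quoted in this comment (the function name is mine, not from any tool):

```python
def effective_throughput_mb_s(sequential_mb_s: float, seek_fraction: float) -> float:
    """Throughput left over when seek_fraction of the time goes to head movement."""
    if not 0.0 <= seek_fraction <= 1.0:
        raise ValueError("seek_fraction must be between 0 and 1")
    return sequential_mb_s * (1.0 - seek_fraction)


# Worst and best cases from the figures quoted above.
worst = effective_throughput_mb_s(60.0, 0.90)  # slow disk, 90% of time seeking
best = effective_throughput_mb_s(70.0, 0.80)   # fast disk, 80% of time seeking

print(f"{worst:.0f}-{best:.0f} Mbyte/sec")
```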
His solution, to mirror the disks and split them into two groups, is the obvious and correct one. If needed, add additional mirrored pairs, something that will increase read performance drastically and keep writes at an acceptable level.
Also, enable the drives' write optimization, and you get better performance.
I call BS on this. The DBA should be kicked out for misuse of company funds, and for not doing the proper analysis that he was paid to do, as well as for not listening to other people who actually DO understand the hardware.
The "Not Invented Here" syndrome has caused never-ending problems, not just in the IT industry, but predominantly so.
It was wrong to fake the DF command, yes.
As for messing about with something "he clearly did not understand" - well, I truly call BS and WTF on this one, as it was the DBA who did not have all his ducks in a row, while the SA had a true point, as he actually did understand the hardware he was working with.
If he were within my organisation, I would consider the DBA's actions borderline gross misconduct, if not crossing into it: not only for his work practice, but for his treatment of co-workers, for wasting company funds in direct violation of commonly known "best practice", and for not listening to others who do have substantial knowledge within their field of expertise.
Just because you have a cert, that doesn't mean you can or should treat others like crap, or that you are always right.
Anyone who listens to others, who can show with logical and factual arguments when they are wrong while still maintaining good manners, and who takes in corrections or ideas from others instead of playing "god", is a winner.
Admin
moran?!
Admin
Putting in a fake "df" command? I think you didn't go BOFH enough on her ass.
Admin
Drat! I edited out "and was very good at making their point".
Admin
Man, I love it when someone calls someone else a moran.
Admin
Strangely enough, the DBA was right. In theory. You do get significantly better IO on the outer edge.
It's just that something seems to have gone wrong in the more complex real world.
Admin
Sure - you do get better IO on the outer edge, but not enough to make up for having to move the disk's head about.
Even just 25% more average movement will kill any advantage gained by this, and significant additional movement of the heads because of the partitioning will literally trash the performance and hammer the disk so hard that it will simply fail to perform.
As for latency - toss a coin. On average there is a 50% chance that the sector to be read/written will be in approximately the "right" position, and a 50% chance that it will be in the "wrong" position relative to the head.
Only large buffers on the disks will help solve this, together with elevator sorting of the I/O operations performed by the disk itself, as only the disk itself knows where its head currently is.
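To illustrate what elevator sorting buys you, here is a minimal sketch of a LOOK-style elevator: serve everything at or beyond the head in one sweep, then sweep back for the rest. The track numbers are a made-up queue, and a real drive's firmware also accounts for rotational position, which this ignores.

```python
def elevator_order(head: int, requests: list[int]) -> list[int]:
    """Serve requests at or above the head going up, then the rest going down."""
    upward = sorted(r for r in requests if r >= head)
    downward = sorted((r for r in requests if r < head), reverse=True)
    return upward + downward


def head_travel(head: int, order: list[int]) -> int:
    """Total number of tracks the head crosses serving requests in this order."""
    total = 0
    for track in order:
        total += abs(track - head)
        head = track
    return total


# Hypothetical starting position and request queue, in arrival order.
HEAD = 53
QUEUE = [98, 183, 37, 122, 14, 124, 65, 67]

fifo_travel = head_travel(HEAD, QUEUE)
elevator_travel = head_travel(HEAD, elevator_order(HEAD, QUEUE))

print(f"arrival order: {fifo_travel} tracks, elevator order: {elevator_travel} tracks")
```

For this queue the head crosses 640 tracks in arrival order and 299 in elevator order, which is the kind of saving the drive can only make if it is allowed to reorder requests itself.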
Admin
I hate it when people like that don't get their comeuppance.