Admin
A crazy viral living in Lost Angles?
Admin
I've been hearing a lot about these so called "Relation databases" and I tell you what, I just don't buy the hype.
Admin
I don't know what "hexidecimal" is myself ;-)
Admin
Well... some databases - Sybase for one - store their data on raw partitions. That way none of the buffering and such that filesystems offer gets in the way of an action going to disk.
Admin
I do not like relational databases. I wish filesystems would evolve so that software could use a SQL query to scan them for stuff, and would gain the ability to do real transactions. It is a horrible WTF that you have two completely different sets of tools to query what data you have lying around.
But, I would likely still not have done this. Filesystems don't do those things yet, and relational databases would still have done them much better. But, I can sympathize.
Wanting my filesystem to be more like a database is a big reason I like reiserfs and wish more work were done on it for Linux.
Admin
Someone will likely get pissed off when I say this, but the way files were copied locally reminds me of how FileMaker used to work.
To the freaks ( yea, I said it ) who don't like relational databases for whatever reason... first, grow up. Second, yea, this could have been done with flat files, but if you're going to do something like that... use flat files... this uses directory structures more than it uses flat files.
There were really two WTFs here, first, not using a database of some kind ( note, it could have been a pretty weak database, freekin' Access might have done the job, MySQL would have kicked this problem's ass ).
Second, relying on the performance of a filesystem with thousands of entries per directory is, of course, the major boner here. The lack of knowledge about operating systems required to make that design mistake truly floors me. Really, WTF... why not just have one file in transactions, one file in each doors directory, one file in balances with a "<id></id><balance></balance>" XML or "000001|0" ( since students are broke, natch ) format ? Because it might actually require thinking ?!?
Oh, and I guess there's the XML->text->binary return communication path as a third WTF, but that wouldn't even be noticed except for use of the directory structure as a per-ID database.
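For what it's worth, the one-file-for-balances idea above can be sketched in a few lines of Python. This is only an illustration, assuming the `id|balance` line format from the post with the balance stored as a whole number; the function names and the file layout are made up here and are not from the original system.

```python
import os
import tempfile


def load_balances(path):
    """Read 'id|balance' lines into a dict keyed by the ID string."""
    balances = {}
    with open(path, encoding="ascii") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            student_id, balance = line.split("|", 1)
            balances[student_id] = int(balance)
    return balances


def save_balances(path, balances):
    """Write all records to a temp file, then rename it over the original."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="ascii") as fh:
        for student_id in sorted(balances):
            fh.write(f"{student_id}|{balances[student_id]}\n")
    # Replacing the file in one step avoids readers seeing a half-written file.
    os.replace(tmp_path, path)
```

Writing to a temporary file and renaming it is a cheap stand-in for a transaction; it still does nothing for concurrent writers, which is one of the things a real database would handle.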
Admin
Wait... You don't like relational databases, but you want your file system to be one?
Admin
I'd rather be Microsoft.
Admin
Good point.
Perhaps some kind of code review process would be in order. Something where a kind of authority figure (say, a teacher) would review the student's work and assign some kind of performance metric to it. That would be a nice system, but if Universities will ever be bold enough to implement anything like that... only time will tell, I suppose.
Admin
Would you look at that, a quote button...
Admin
Just use reiserfs.
Admin
Even NTFS uses some variation of B-trees and handles large directories (~100000 files) at 0.03 seconds per access to a random file. I admit it's not terribly fast, but would it choke the application? This WTF system must have had other problems, apart from the disk-based database 'solution'.
Admin
The reason I don't like relational databases is that they create a namespace for stuff that's completely disjoint from the other namespace for stuff supported by the OS. If relational databases could be mounted as filesystems, I'd be much happier, but the transactional semantics of databases don't have a parallel for filesystems, so I doubt they could be mounted read/write that way.
My biggest complaint with relational databases is trying to figure out what tables and databases there are. It's stupid that I can't just 'ls' or 'dir' and find out. Of course if you have some specialized tool like 'toad' it's a lot easier, but why should I need some stupid tool like that to see what I have? Especially since those tools often don't work between different SQL databases.
Admin
I'm still trying to wrap my mind around the logic that led the original developers to conclude that the data structure should be represented in the directory structure with one record per file. If you're going to roll your own, then the textbook solution is to have one record per server. Data structure == subnets. Primary key == IP address. Think of the marketing: Distributed database! Massively parallel processing! Oh, aren't I one of the "latter"?
--Rank
Admin
You've just described a distributed hash table. A data structure that's all the rage in P2P circles nowadays. DHTs are actually pretty nifty. But I'm tired of them being equated with P2P in the academic community. Hey, we get to play with math and mathematical models for stuff and look respectable instead of doing anything to tackle real problems like reputation management!
Admin
WTF?! All relational databases I've worked with allow querying the schema as part of their API or, in the case of PostgreSQL, by looking at system tables. If you want a unified API, just stick to ODBC/JDBC. Why is that a problem?
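As a rough illustration of "querying the schema through the API", here is a small Python sketch. It uses SQLite only because it ships with Python; SQLite is a stand-in chosen for this example, not one of the databases discussed above, and its catalog (`sqlite_master`, `PRAGMA table_info`) is SQLite-specific. The table names are invented.

```python
import sqlite3


def list_tables(conn):
    """Return the names of user tables, read from SQLite's schema catalog."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in rows]


def list_columns(conn, table):
    """Return column names for one table via PRAGMA table_info."""
    # PRAGMA arguments cannot be bound as parameters, so quote the identifier.
    quoted = '"' + table.replace('"', '""') + '"'
    return [row[1] for row in conn.execute(f"PRAGMA table_info({quoted})")]


# Hypothetical tables, created in memory just to have something to list.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE balances (student_id TEXT PRIMARY KEY, cents INTEGER)")
conn.execute("CREATE TABLE doors (door_id TEXT, student_id TEXT)")
print(list_tables(conn))
print(list_columns(conn, "balances"))
```

It is about as close to `ls` as a database gets: one query for the tables, one per table for its columns.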
Admin
You might want to take a look at Aqua Data Studio; this is a Toad-like tool that works between different SQL databases. Not as powerful as Toad, but useful anyway, and much cheaper.
Admin
The title reminded me of a certain bulletin-board software that was once very prevalent. Having been a member on one that used that system, I was a bit shocked to find the true reason why it took so long (>10 seconds on occasion) to search or post on it.
The tagline: Database? Who needs a database?
Admin
Even magnetic stripes are so last week. Our university has used some RFID-like technology for the last five years, and the local bus company for at least ten years.
Admin
Can't believe some people are trying to justify this rubbish design! It has to be the worst piece of 'software' and 'middleware' I have ever heard of!
By the way, there are plenty of libraries for db access and xml messaging (ADO, MSXML for example) so how is it that some muppets believe the crap solution would be more profitable? My betting is it would take a lot longer to develop, is fundamentally crap and about as scalable as a tin can - string telephone.
Some people! I thought professional programmers commented on this site?
Admin
It helps - I had to do some maint. on a hospital db. The response was so slow on a remote PC that I ended up in the main server room. And when I say slow, I mean really, really slow.
When I pointed out that this would cause problems for the users, the network guys just shrugged and said "we only have to make it work, speed is not a requirement" or words to that effect.
Admin
rdbms are not god
if you need a specialized system because your file system can't find 1 file in 10000 in under a second then perhaps it is time to talk to the vendor of your file system.
I put my money where my mouth is: I just created a text file containing 1..10000, each on a new line. For fairness I used the same shell on each platform, rc.
Create the 10,000 text files :
% time for(n in `{cat 10k}) echo $n > $n
plan9 : 1.81u 13.07s 62.02r
openBSD : 233.34 real 2.10 user 22.21 sys
FreeBSD(with softupdates) : 0.065u 33.091s 0:34.00 97.5% 102+809k 80+0io 0pf+0w
ok, a bit slow to make them all in one go, esp OpenBSD
and then pick one out
% time cat 765
765
plan9 0.00u 0.00s 0.04r
OpenBSD 0.03 real 0.00 user 0.00 sys
FreeBSD 0.000u 0.001s 0:00.00 0.0% 0.0k 0.0io 0pf+0w
I won't bore you with loads of other values for the filename, they are all the same kind of times.
One would imagine that the worst case scenario was :
time cat notfound
plan9 :
cat : can't open notfound: 'notfound' file does not exist
0.00u 0.00s 0.02r
OpenBSD :
cat: notfound: No such file or directory
0.03 real 0.00 user 0.00 sys
FreeBSD :
cat: notfound: No such file or directory
FreeBSD 0.000u 0.002s 0:00.00 0.0% 0.0k 0.0io 0pf+0w
Ok, now let's see how that compares to PostgreSQL
No Indexes
sql = a file containing :
create table test10k (val varchar);
insert into test10k (val) values('1'); ... insert into test10k (val) values('10000');
time psql < sql > /dev/null
OpenBSD
121.24 real 1.10 user 0.44 sys
FreeBSD
0.802u 6.234s 0:47.62 171.276k 0+0io 0pf0w
OpenBSD :
explain analyze select val from test10k where val='1';
QUERY PLAN
------------------------------------------------------------------------------------------------------
Seq Scan on test10k (cost=0.00..139.56 rows=34 width=32) (actual time=0.096..25.487 rows=2 loops=1)
Filter: ((val)::text = '1'::text)
Total runtime: 25.676 ms
(3 rows)
FreeBSD
explain analyze select val from test10k where val='1';
QUERY PLAN
------------------------------------------------------------------------------------------------------
Seq Scan on test10k (cost=0.00..22.50 rows=5 width=32) (actual time=0.295..29.141 rows=1 loops=1)
Filter: ((val)::text = '1'::text)
Total runtime: 29.611 ms
(3 rows)
With Indexes
Same thing with this index added prior to insertion
create index test10k_val on test10k(val);
time psql < sql > /dev/null
OpenBSD
102.53 real 1.03 user 0.35 sys
FreeBSD
0.953u 5.305s 0:46.17 173.279k 0+0io 0pf0w
OpenBSD
explain analyze select val from test10k where val='1';
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Index Scan using test10k_val on test10k (cost=0.00..87.80 rows=28 width=32) (actual time=0.172..0.181 rows=1 loops=1)
Index Cond: ((val)::text = '1'::text)
Total runtime: 6.386 ms
(3 rows)
FreeBSD
explain analyze select val from test10k where val='1';
QUERY PLAN
-----------------------------------------------------------------------------------------
Index Scan using test10k_val on test10k (cost=0.00..17.07 rows=5 width=32) (actual time=1.976..1.996 rows=1 loops=1)
Index Cond: ((val)::text = '1'::text)
Total runtime: 2.468 ms
(3 rows)
Conclusion :
For the simple task of storing 10,000 values, the standard file systems are no slower at returning a single value. Without indexing they are two orders of magnitude faster than using an RDBMS, and with indexing they are still evenly matched, with OpenBSD returning files faster via its own file system than through PostgreSQL.
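For anyone who wants to repeat the filesystem half of this test without rc, here is a minimal Python sketch. It is not the script used above: the function names are made up, and it times only the open and read inside one process, so the numbers will not be directly comparable to `time cat`, which also pays for starting a process.

```python
import os
import tempfile
import time


def create_files(directory, count):
    """Create files named 1..count, each containing its own number."""
    for n in range(1, count + 1):
        with open(os.path.join(directory, str(n)), "w") as fh:
            fh.write(f"{n}\n")


def timed_read(directory, name):
    """Return (contents or None, elapsed seconds) for one lookup by name."""
    start = time.perf_counter()
    try:
        with open(os.path.join(directory, name)) as fh:
            data = fh.read()
    except FileNotFoundError:
        data = None  # the "notfound" case from the post
    return data, time.perf_counter() - start


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        create_files(d, 10_000)
        for name in ("765", "notfound"):
            data, elapsed = timed_read(d, name)
            print(f"{name}: {data!r} in {elapsed * 1000:.3f} ms")
```

A single warm lookup mostly measures the OS cache, so treat any result as a rough indication, not a benchmark.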
Admin
You people who would rather get it right need to get out more often.
IBM, SAIC, CSC, Accenture, EDS... the list just goes on and on and on. A reputation for screwing things up is no impediment to getting another job. In fact, a reputation for screwing up big things helps, because then the person awarding the contract knows you have experience doing big things.
This is not just cynicism. Ask anyone from SAIC if their well-publicized fiasco at the FBI helped or hurt revenues. Or, how about CSC and their IRS project? As far as I know, Accenture has never successfully implemented anything in the 40 years since Andersen formed the division, but I keep hearing about them.
Admin
My biggest complaint with filesystems is trying to figure out what files and directories there are. It's stupid that I can't just look for them and find out. Of course, if you have some specialized tool like "dir" or "ls" it's a lot easier, but why should I need some stupid tool like that to see what I have? Especially since those tools often don't work between different filesystems.
-dZ.
Admin
Interestingly enough, there is a filesystem that's been designed to be optimal for just that very sort of setup (filesystem is the database). It's called reiserfs, and unlike ext2/ext3 (which is what their system probably used) it does NOT bog down like crazy if you put 100,000 files in a single directory.
Admin
"Just don't expect them to know what "hexidecimal" is..."
As opposed to not knowing how to spell it. Dolt. If you are going to rip on somebody,
make sure you are actually better than those you claim are inferior.
Unless of course, you are an ignorant bigot.
Admin
I was wondering the same thing!
Admin
It doesn't matter. The client will never understand a) what a failure the original project was or b) what a heroic job the second company did for a third the price. They'll probably use the second company again, but they'll expect to pay less for every future job than they did with the first company.
Furthermore, in a year we'll see a post about what a terrible job this second company did with their database design, and what a great job consultant X did for a fifth the price of the last iteration.
Admin
The one that got it right. What's money got to do with it?
How many people would you kick out of their houses if every one raised the highest score you'd ever gotten in pacman by 20,000 points?
Money is a means to an end, not an end, and if money trumps ethics, there's just no point; why even bother breathing in such a world? It's one thing for such people to exist, but being one? Eww. I'll pass.
Admin
After 7 years doing quality assurance engineering and 10 years of on-and-off IT experience, I'm finally deciding it's time to give engineering a go.
This site, and stories like this one, were a great incentive. (Well, that and QA'ing other folks' code.)
Why? Because, while I'm deathly afraid of "doing it wrong", I see that there are LOTS AND LOTS of folks out there who, quite obviously, don't have a frickin clue and are paid gobs of money to do it. Wow. I know I could have done better. Now. With only my rudimentary C, C++, perl and relational db skills...
(of course, I also already "believe" in source control, automated builds, unit testing, etc etc etc...so I suppose that gives me an unfair leg up..)
Admin
Were the original "programmers" former students?
No, they were teachers
Admin
WTF???
Admin
How can an obsolete system be written in Java? It has only been available to the public since 1995. These idiots wrote a brand new obsolete system. That is the WTF.
Admin
There are bigger problems than transactions. Filesystems don't have any good semantics for non-hierarchical structures. You could have a "directory" for "tables", "views", "stored procedures", "user-defined functions" and so on and so forth, and then just dump everything into each of those, but by failing to expose relationships between the data you'd lose all of the good stuff about relational databases. Sure, you could just have a bunch of flat tables without any internal linking, but it doesn't seem very useful. There's just no good filesystem analogue to a foreign key, for example.
This is not an easy problem to solve. I am interested in how WinFS will deal with it when MS get it finished. This will be a database mounted as a filesystem, with a number of other features on top (to synchronize between the filesystem "world" and the database "world", for example, so that changes made in one "view" propagate to the other). Even prior to WinFS, the next NTFS version (in Vista/Longhorn Server) will have a transactional system so that changes to the filesystem and registry can be transacted. I haven't played with it yet; I wonder how well it works and how fancy-pants it is (whether it supports nested transactions for example).
This is probably why ANSI have devised the INFORMATION_SCHEMA object. It provides a standardized way of querying a database to find out about its internal structure.
Granted, you still need to use a DB-specific sql front-end (though you could feasibly write one using ODBC or JDBC or OLE DB or whatever other DB-agnostic API you prefer), but once you're submitting queries this stuff is /quite/ standard (though needless to say no vendor is 100% compliant with ANSI, unfortunately).
Admin
Well, that's either misleading or wrong.
Not all DBs store their data on the OS's filesystem. Some directly manage their own disk space. Some let you choose between the two. (For example, DB2 refers to the former as 'system managed' and the latter as 'database managed'.) You could call a database managed partition a specialized filesystem -- but even so, it's specialized for a reason.
Even when databases are using the filesystem, they use it in far cleverer ways than what this story describes. Most notably, they don't try to leverage the directory structure as their indexing scheme (which is the big WTF here).
Admin
You would do better to wrap this in [<moron>] tags.
Admin
I can just imagine that someone who would create such a terrible system would probably do some other stupid things along the way, like setting the account balance to be unsigned or something - now THERE would be some fun:
"Hey man, can you print me out a copy of this?"
"Ok, but I think I'm running low on credit..."
<takes account to below 0>
Account Balance: $4,294,967,295.90
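The arithmetic behind that joke is easy to check. A small Python sketch, assuming (purely for illustration) that the balance is held in a 32-bit unsigned integer; the function names are invented:

```python
UINT32_MASK = 0xFFFFFFFF  # assumption: the balance is a 32-bit unsigned integer


def debit_unsigned(balance, cost):
    """Subtract cost from balance with 32-bit unsigned wraparound."""
    return (balance - cost) & UINT32_MASK


def debit_checked(balance, cost):
    """Refuse the debit instead of wrapping when credit is insufficient."""
    if cost > balance:
        raise ValueError("insufficient credit")
    return balance - cost


print(debit_unsigned(0, 1))  # 4294967295, i.e. 2**32 - 1
```

Taking one unit from an empty account wraps around to the largest 32-bit value, which is where the 4,294,967,295 comes from; the checked version is what you would actually want.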
Admin
The real WTF is ignoring the obvious solution, which has been included with every Windows release since 95.
Why create all these little files when you can just store everything in the Registry?
Admin
Yes, and they would NEVER THINK to accidentally credit their own account... Or their friends. Or lock someone out of the building as a joke. Or... [:D]
Admin
What's "hexidecimal"? Seriously, I've heard of "hexadecimal", but this must be sumthing new...
Admin
I was merely referring to a previous post
I'm sure you'll love it ;)
Pardon me for not spelling HEXADECIMAL correctly
Admin
That's twisted and evil... and what's more, it's probably already been done by someone.
Admin
That would greatly simplify their need to copy all the files around to the node servers. Instead, they could pass around a .reg file to import on each and every node server to update for changes...
Admin
man join
and you may be interested that there is a file system based OS, it is called plan9 : http://plan9.bell-labs.com/plan9
Built by the originators of Unix with one of the premises being :
"We have persistent objects, they're called files." - Ken Thompson
Admin
I don't see anything particularly wrong with this approach. There are actually several advantages, which are roughly the same as the advantages of using the Maildir format for mailboxes. The problem is that there are too many files in a directory, so you get crappy performance, but you can get around that easily enough (at least on Linux) by using a filesystem that is designed to handle large directories, such as XFS or Reiserfs.
Admin
Have you ever tried directory services? LDAP servers are a lot like hierarchical filesystems, but they support querying, transactions, listeners, etc. Their performance is up to par with relational databases since most use a relational database as a backend.