Originally posted by "jjeff1"...
I’m exposed to a certain application twice a year. It’s used for a fundraising drive: fifty volunteers man the phones, people call in, and the volunteers take a poll and then enter the data into the VB application on their workstations. These fundraising events are tied to schedules beyond our control, and there are absolutely no do-overs. That means the application needs to be rock solid.
The front end is pretty decent. Users can easily tab between fields in the proper order, the help text is reasonable, and the error messages make some sense. But the back end...
Year 1 – The system wasn’t working correctly, and the developers didn’t know why. They would fix it, but the next day the same problems recurred. It turned out the custom data files they used were stored on a network drive accessed by all of the clients, and the application somehow relied on the Archive attribute to mark those files as its own. Every night the backups would run, clear the attribute, and “corrupt” their data.
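For anyone who hasn’t been bitten by it: the Archive attribute is a single Windows file-system bit that gets set whenever a file is modified, and most backup software clears it after copying the file, which makes it about the worst possible thing to hang application logic on. A minimal Python sketch of the failure mode (the share path and file name are invented; the original app was VB):

```python
import os
import stat

# Hypothetical data file on the shared network drive (path is made up).
DATA_FILE = r"\\fileserver\fundraiser\pledges.dat"

def archive_bit_set(path: str) -> bool:
    """Return True if the Windows Archive attribute is set on the file.

    Windows sets this bit whenever the file is modified; most backup
    software clears it after copying the file, so it cannot be trusted
    as an application-level "this file needs processing" marker.
    """
    attrs = os.stat(path).st_file_attributes  # Windows-only field
    return bool(attrs & stat.FILE_ATTRIBUTE_ARCHIVE)

# The app effectively did something like:
#   if archive_bit_set(DATA_FILE): treat the file as active
# Every night the backup job copied the file and cleared the bit,
# and the next morning the application no longer recognized its own data.
```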
Year 2 – They switched to a real database: Microsoft Access. Their development testing went great, but somehow, when those fifty users all logged in for the test, there were problems. We helped them transfer the data into our SQL database and modify the application to use it.
Year 3 – The system kept crashing whenever a volunteer logged in. It worked fine for the developer, though. We discovered that the system stored a unique value for each user: the PC name concatenated with the username. The database field for that value was only 15 characters. Our PC names were 12 characters. The developer’s name was Bob.
Year 4 – Since the application data is used only once a year, it got deleted by cleanup scripts. Did they have a backup? No. Did we have a backup from a year ago? Luckily, yes.
Year 5 – A total rewrite. A new database. The new database contained dozens of stored procedures for no apparent reason, and during testing they found deadlocks occurring constantly. After blaming the database server (the same one they’d been using for years), they modified or removed most of the stored procedures, after which the system started operating mostly normally.
As for the “rock solid” part, they’ve never actually had a significant failure during normal operation. But just in case, they have all sorts of high-end failover code built in. For example, there are hidden keystrokes to invoke panic mode.
In panic mode, the application stops using the database and sends all output to text files on the shared network drive. If that fails – i.e., if the network dies – the application enters double panic mode, where it writes the files to the local disk. I guess the plan was to use sneaker-net to get the files to the reporting server. I’m not sure what happens in triple panic mode, but it may involve the bottle of scotch in the developer’s briefcase.
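I’ve only seen this from the outside, but the tiered fallback presumably amounts to something like the sketch below: try the database, then the network share, then the local disk. Every name and path here is invented for illustration.

```python
import os

# Invented paths; the real application used its own locations.
NETWORK_SPOOL = r"\\fileserver\fundraiser\panic"
LOCAL_SPOOL = r"C:\fundraiser\panic"

def save_record(record: str, db_write, panic: int = 0) -> None:
    """Write a record to the database, falling back to text files.

    panic level 0: normal operation, write to the database.
    panic level 1: "panic mode" - append to a file on the network share.
    panic level 2: "double panic mode" - append to a file on local disk.
    """
    if panic == 0:
        try:
            db_write(record)          # normal path: the real database
            return
        except Exception:
            panic = 1                 # database unreachable -> panic mode

    target = NETWORK_SPOOL if panic == 1 else LOCAL_SPOOL
    try:
        os.makedirs(target, exist_ok=True)
        with open(os.path.join(target, "records.txt"), "a") as f:
            f.write(record + "\n")
    except OSError:
        if panic == 1:
            # Network share gone too -> double panic mode, local disk.
            save_record(record, db_write, panic=2)
        else:
            raise  # triple panic mode is left to the scotch
```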
As for the other back end, the reporting server ...
They had a web-based reporting module. A special VB app ran on a special PC (complete with a special DO NOT TOUCH sticky note) and was responsible for pulling data out of the database and writing it to text files. A separate web server ran a number of Perl scripts that grabbed those text files and massaged them into a variety of bizarre formats, some of which were also, yes, text files. From there, corporate would download the text files and feed them into some display system for the monitors in their office lobbies.
The first year, the developer asked if the web server had enough horsepower to run the scripts. It turned out that yes, the dual-CPU box could handle 900 hits over a six-hour period just fine (that works out to roughly one hit every 24 seconds).
Sadly, the developers behind this system are really nice and, to be honest, have been successful (as far as corporate is concerned) every year. Technically, though, they’re a bit in over their heads, and corporate doesn’t know or care that they’re flirting with disaster.