Admin
meh meh
Admin
I would be first. But after the confirmation page my comment wasn't there.
Admin
(6 minutes later) First!
Admin
Time to work out, OP, time to work out!
Admin
Lisa had already fixed the problem - INCORRECTLY
the correct approach would have been to set a flag preventing deletion until the confirmation had been generated and viewed.
Man with one watch always knows the time; man with two watches is never quite sure.
Admin
Lisa has the best fixes.
(Better fix: log the transaction, read the log to generate the confirmation page)
Admin
Yep, if your "solution" to a race condition is just jamming in a delay somewhere, all you're doing is kicking the can down the road, as Lisa found out. Eventually your application will be a horrible tangle of conflicting waits that are impossible to debug when the timebomb inevitably explodes. As IP Guru says, the only real way to solve a concurrency problem is to build in positive confirmation that whatever was supposed to happen actually happened.
Admin
Oh really? And what if the user never ends up requesting that confirmation page? That's a much harder issue to control than making sure the clocks are in sync (or making sure your timer relies on just one clock, or runs on a change interval).
Your 'solution' is just as flawed, if not more so, since the data would stay there forever and the transaction would never be executed at all. In that case I prefer a buggy confirmation page.
In a greenfield situation and recognizing this issue upfront you would most likely model the flow differently. For instance caching the confirmation text for a limited time could have been an option.
Admin
Several here have given options (Log, flag, etc). Another method is to have the DB tell the backend which records to move (Alter the SELECT statement to filter out anything not ready to move). With proper coding this happens in a Stored Procedure... wait, which website is this again?
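To make the "let the DB decide" idea concrete, here is a minimal sketch using SQLite from Python. The `applications` table, its columns (`paid`, `confirmation_viewed`, `created_at`) and the five-minute fallback are all assumptions for illustration, not the system from the article. The point is only that the age check uses the database's own clock, so no other machine's time enters the comparison.

```python
import sqlite3


def ready_to_move(conn, min_age_seconds=300):
    """Return ids of paid applications that are safe to move.

    Hypothetical schema: applications(id, paid, confirmation_viewed,
    created_at), where created_at is epoch seconds written by the database.
    A record qualifies once its confirmation page has been viewed, or once
    it is older than min_age_seconds according to the database clock.
    """
    rows = conn.execute(
        """
        SELECT id FROM applications
        WHERE paid = 1
          AND (confirmation_viewed = 1
               OR created_at <= CAST(strftime('%s', 'now') AS INTEGER) - ?)
        ORDER BY id
        """,
        (min_age_seconds,),
    ).fetchall()
    return [row[0] for row in rows]
```

The age fallback is there for the users who never load the confirmation page; without it those records would never be picked up.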
Admin
That's absolutely a situation that happens. Some users are using a cell phone, some have a bad connection, etc. It's entirely possible (and it does happen) for a connection to drop between paying for the document and seeing the confirmation page. Setting a flag would result in missing applications for these users.
Admin
So... a flag?
Admin
Lots of backseat driving here when TRWTF is having any server without NTP sync.
Admin
Another port and service to run on government-owned machines? Machines of unknown quantity and purpose with this level of legacy-ness? No thanks!
Admin
In fact, TRWTF is comparing timestamps created by different machines. The database is the single source of truth, and database time should be the single source of time!
Admin
I'd say there is more than one issue here, the biggest one being: why is the taxpayer paying someone who isn't capable of setting up NTP sync on a government owned server, especially since there are so many time sensitive components on the system, certificates being one, services etc.
The other issue is that this was implemented in a way that was more convenient in terms of development effort vs using proper development principles that would be independent of time (state machines come to mind).
Admin
Another solution would be to COPY the data from the front-end DB to the back-end DB, then erase from the front-end DB only the PII (or obfuscate, anonymise, etc.), leaving generic data such as transaction number, amount, etc.
Then the record would always be there to display.
Admin
Then the frontend would fill up with records and the application would get slow and laggy and somebody would have to push the 'Turbo' button on the computer they are using for a server.
Admin
Takes more than a ntp server when it's a VM, you have to get something in there like VMware tools to avoid clock drift.
Admin
Dear Jane, in some countries pets (or their owners) don't need any licenses. Still, they all live. Mrs. Benz didn't need a license to safely drive a car. And let's not forget about the ancient Egyptians, who built the pyramids without any certification AFAICT. So no, they don't "have to". You're forced to "have to". And to think this is the way it has to be.
HTH Michał
Admin
This is not a political forum.
HTH.
Admin
What happens when the backend moves to a new timezone.
Admin
Wait, what? When did that happen?
Admin
+1 for the old-school "turbo button" reference
Admin
In that case, you (the server) rely upon a timeout.
Say, a five minute timeout.
A five minute timeout that applies on your own machine, the server, not a random timeout that relies upon some spavined "universal clock" out there.
You see where I'm going with this? (Probably not.)
Admin
As to the TNP, there was in fact one (well, several in each layer of the network). But policy (not unreasonably), is to have everything locked down by default, e.g., no incoming/outgoing connections blocked. Each one is opened up as needed. What was missed was opening up the port to the TNP.
As to stubbing the data, yes, we actually do that for a portion of the app, but even that is short-lived due to a very stringent data protection policy (again, not unreasonable given it's people's data and a government system), so this too is a limited solution in this case.
Admin
**correction: that should read "ALL incoming/outgoing connections blocked"
Admin
Yeah, we had a customer who complained that the time stamps in our database were different depending on which time zone their users were in. We did an analysis and some of the time we'd get the time from the OS on the user's workstation and some of the time it was retrieved from the DB server. Obviously the DB server is correct but about 75% of the calls were to the OS. So we told the customer, sorry, this is how it works. Management determined that it wasn't worth the time to fix it.
Admin
He's the kind of guy that when someone says "but the program has to support incoming faxes", he goes off on how they didn't even have faxes in the old days and people got along just fine, and in fact we don't need computer programs for anything. Man has lived for eons without computers or electricity or any of the creature comforts we have today.
Admin
We have an API that uses an HMAC for verification. One of the fields that we ask to be passed is the date that the system generated the request; we then invalidate requests we deem to be too old.
This was great until we expanded to a different time zone and no one could figure out why all of the requests were being bounced!
Admin
I usually sit on my backend when I'm passing into another timezone
Admin
UTC is your friend. We do the same, using UTC without issue (I think there were some problems early on in development, but they were resolved quickly).
The only issue we had was when testing on an iPad that wasn't set to auto-sync. The time eventually drifted more than two minutes out, so we had authentication failures.
Why wasn't it set to auto-sync? To test the timeout of course!
Admin
NTP is great - except in one case it went bananas (some odd interaction with the virtual machine whatsits) and my server ended up several hundred years in the past.
Clearing up the database was ... Fun?
Admin
Several hundred years? Wow!
Admin
NTP should be one of these exclusion cases. There's no reason for all the servers not to be using NTP, if only to permit proper debugging.
Admin
I don't understand why you think not deleting the flag to say "show a confirmation page" that's been set after the transaction is processed would somehow stop the transaction being processed.
Admin
I can think of a couple of reasons for a government server not to use NTP by default: https://www.cvedetails.com/vulnerability-list/vendor_id-2153/NTP.html
Admin
The confirmation page requires the data. The transaction removes the data. The transaction cannot proceed until the confirmation page has unset the flag.
Admin
You are a moron if you think either that NTP is not essential, or that those CVEs are remotely relevant for ntpd running as an NTP client.
As per the OP, the server was using NTP anyway (which, for some reason, she is calling "TNP"; I'm guessing she means NTP and not picric acid). They had misconfigured the firewall to block responses. So yeah, typical government IT.
Admin
https://en.wikipedia.org/wiki/Etiquette_in_technology#Netiquette
Admin
Pre-generate the confirmation page, then delete the data. When the user requests the confirmation page they'll see the pre-generated page even though the data's probably already gone. Have a timeout to delete the pre-generated page in case the user never requests it.
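Something like the following Python sketch, with the caveat that the class name, the in-memory dict and the five-minute default are all made up for illustration; a real system would presumably keep the rendered pages somewhere shared and durable. The timeout runs on a single local monotonic clock, so it does not depend on any other machine's time.

```python
import time


class ConfirmationCache:
    """Holds pre-rendered confirmation pages for a limited time (hypothetical)."""

    def __init__(self, ttl_seconds=300, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so the timeout can be tested
        self._pages = {}     # transaction id -> (expires_at, rendered page)

    def store(self, txn_id, page):
        """Save the rendered page before the source data is deleted."""
        self._pages[txn_id] = (self._clock() + self._ttl, page)

    def fetch(self, txn_id):
        """Return the page, or None if it was never stored or has expired."""
        entry = self._pages.get(txn_id)
        if entry is None:
            return None
        expires_at, page = entry
        if self._clock() >= expires_at:
            del self._pages[txn_id]
            return None
        return page

    def purge_expired(self):
        """Drop pages nobody requested in time; returns how many were removed."""
        now = self._clock()
        expired = [k for k, (exp, _) in self._pages.items() if now >= exp]
        for key in expired:
            del self._pages[key]
        return len(expired)
```

The order matters: call `store` first, then delete the data, so there is no window in which neither exists.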
Admin
The real WTF is not implementing some sort of lock to ensure that the data isn't deleted from the frontend database until it's no longer needed.
Also just because a database is "temporary" doesn't provide an excuse to make it insecure.
Admin
Sure the frontend DB would grow, but you could easily create a maintenance task to control that. For example, set a TTL on complete records so after they are pushed to the backend DB they are pruned from the frontend DB in 48 hours (or whatever turns out to be a suitable lifecycle).
Your database would be larger than it is currently but the TTL would keep it at a fairly stable size (assuming consistent traffic). I have a few PII collections which use TTLs to ensure we don't fall foul of compliance by expiring data before it could become an issue if it isn't cleaned before then.
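As a rough illustration of that maintenance task, here is a Python sketch against SQLite. The `records` table and `pushed_at` column are hypothetical names, and a document store with native TTL indexes would do this without any code; this just shows the shape of the job for a plain SQL frontend.

```python
import sqlite3


def prune_pushed(conn, ttl_seconds=48 * 3600):
    """Delete records that were pushed to the backend over ttl_seconds ago.

    Hypothetical schema: records(id, pushed_at), where pushed_at is NULL
    until the record has been copied to the backend and is then set to
    epoch seconds by the database. Records not yet pushed are never touched.
    """
    cursor = conn.execute(
        """
        DELETE FROM records
        WHERE pushed_at IS NOT NULL
          AND pushed_at <= CAST(strftime('%s', 'now') AS INTEGER) - ?
        """,
        (ttl_seconds,),
    )
    conn.commit()
    return cursor.rowcount  # number of rows pruned
```

Run on a schedule, this keeps the table at roughly 48 hours of traffic.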
Admin
No no no.
What you missed is that any form of brute-force timeout is completely the wrong solution for a state-based workflow like this. What you needed was a proper message-based protocol.
It's depressing that useless dingbat hacks like this are used anywhere at all in a server-based IT system these days. I suppose it's no surprise that they are used in GovIT, and the resultant cretinism is actually defended by the ignorant fools in charge.
Admin
Absolutely, trying to "Fix" a race condition by shoving lead boots on one of the participants is just storing up trouble for later, and making it all the more confusing for the poor sod who has to deal with it then.
I've seen it in the private sector too, but then I guess firms that routinely tolerate hacky crap like this gradually drift out of business as their IT fails to deliver, and I've seen that too.
More poor to crappy govt IT, been feeding my family fixing* that shite for years now.
Admin
https://www.youtube.com/watch?v=pnq96W9jtuw
Admin
If the server was an IBM mainframe it might be impossible to change the time without rebooting.... or buying an incredibly expensive piece of add on equipment.
And the initial time might be set manually by the operators doing the reboot.
There's probably a lcd display blinking 00:00.
(There's a tiny bit of justification for the design, since the hardware guarantees that no two timestamps can be the same - the least significant bits of the timestamp are uniquifiers - but when you're trying to align logs, write code to parse the binary times, with all the leap seconds, can't figure out WTF is going on, then realize that it's a surreal time clock, there's a tiny bit of justification for breaking the designers on the flywheel).
Admin
As engineers, we (all of us) have to have a certain amount of pragmatism when applying any kind of solution. There are always situations where non-optimal solutions are put in place as a stopgap to keep things running while a more appropriate solution is created. The story above is one such situation...and it worked for the time it was implemented. We did eventually roll out the stubbing approach, but the "kick the can down the road" solution worked well enough in the interim (zero timed-out applications - it demonstrably did SOMETHING right) to buy time to address the issue more appropriately.
As to the message based protocol - it IS a message based protocol. However, "message end" is determined to be when payment is taken (for reasons already outlined by others here), and the confirmation page was left a little in limbo. Hence the need for a timeout/stub to display the info to the user while the "message" was being moved and processed.
Admin
Yeah, that should work. Let's see, how long to make the timeout? Five minutes seems pretty reasonable to me...
Admin
Lisa's fix was stupid! She should connect the process of moving the data and generating the confirmation page so that the data will be moved because the confirmation page is done being generated instead of because some arbitrary time lapse has passed! Her way of "fixing" this problem leads to a fragile system!
Time for Lisa to stop playing around and go back to the kitchen!
Admin
"Stupid" would perhaps in this case be defined as;