Comment On Scratch One Inevitability

Before Curtis even got to sit down at his desk, he was accosted by a frenzied, sweating junior developer. "OhmygodCurtis," he began. Curtis extended his hand in a "calm the hell down" gesture and allowed him to continue. "A whole bunch of our stores had no data posted last night and I'm not sure why orwhat to doabout it or whoIshouldtalktoand-" Curtis gestured again, to which the developer handed him a thin stack of papers. After a deep breath, the developer continued. "It's a list of the stores that didn't post last night."

Re: Scratch One Inevitability

2009-01-15 11:05 • by KattMan
OK I'm confused.
As the code is written it does not go against a server that never existed, it goes against an apparently configurable variable that holds a server address. Is there more to this that we are not getting? Where is the WTF?

Re: Scratch One Inevitability

2009-01-15 11:13 • by Downfall (unregistered)
"Of course, management decided that this was the system admins' fault, for deleting the stale DNS entry. They should've known better!"

Yeah, pretty much. If something is wrong but sitting there and not hurting anything, leave it be. How many WTFs spring from trying to 'optimize' things already working fine?

If they insist on 'fixing' it, though, then perhaps they should have done some tests first. On the other hand, that wouldn't have the adrenaline rush of deploying untested changes in a production environment.

Re: Scratch One Inevitability

2009-01-15 11:15 • by RHuckster
The real WTF is 75% of the locations still use dial up.

Re: Scratch One Inevitability

2009-01-15 11:15 • by clickey McClicker (unregistered)
At least it wasn't a driver issue :-)

Re: Scratch One Inevitability

2009-01-15 11:17 • by DeLos
Doesn't the fact that they then used a "Request timed out" indicate that they knew the server didn't exist? Doesn't that make this implementation very very silly? Why not just ping a stable known server? Why not check for time out and successful ping?
WTF

Re: Scratch One Inevitability

2009-01-15 11:18 • by Joe the Programmer (unregistered)
239299 in reply to 239295
The real WTF is 75% of the locations still use dial up.


Many of these locations are little kiosks (more like mini cubicles) in the aisles of shopping malls for only a few weeks/months out of the year. A cheap, seldom-used dial-up connection would be simpler and cheaper than putting a broadband connection into a booth that's not going to be used very many times during the day and for only a few weeks/months of the year...

Re: Scratch One Inevitability

2009-01-15 11:18 • by Hallvard (unregistered)
239301 in reply to 239291
KattMan:
OK I'm confused.
As the code is written it does not go against a server that never existed, it goes against an apparently configurable variable that holds a server address. Is there more to this that we are not getting? Where is the WTF?


Presumably if the nonexistent server was in DNS, the ping would try to use it and time out, and the function would return broadband. Removing the server from DNS would cause the ping to fail immediately, and the function would throw an exception.

Re: Scratch One Inevitability

2009-01-15 11:19 • by campkev
239303 in reply to 239291
KattMan:
OK I'm confused.
As the code is written it does not go against a server that never existed, it goes against an apparently configurable variable that holds a server address. Is there more to this that we are not getting? Where is the WTF?


The configurable variable was set to the name of a server that did not exist, but that was put in the DNS. If they tried to ping it and it couldn't find the host, then obviously they were on dialup and needed to activate the modem. If they were able to find it, but the ping timed out, obviously they could connect to the dns to get the data, so they didn't need to activate the modem. Brillant!!!
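A minimal Python sketch of the logic as described in this comment, not the article's actual code. The names `ConnectionType` and `classify_ping_output` are made up for illustration, and the sketch assumes the check works by matching the two Windows ping messages quoted in this thread. It shows why deleting the dummy record flips an always-on site into the dial-up branch.

```python
from enum import Enum


class ConnectionType(Enum):
    BROADBAND = "broadband"
    DIALUP = "dialup"


def classify_ping_output(output: str) -> ConnectionType:
    """Guess the connection state from the text printed by Windows ping."""
    if "could not find host" in output:
        # Name lookup failed: assume no link is up, so the modem must dial.
        return ConnectionType.DIALUP
    if "Request timed out" in output:
        # DNS answered but the dummy address never replies: a link is up.
        return ConnectionType.BROADBAND
    raise RuntimeError("unexpected ping output: " + output.strip())


# With the dummy record still in DNS, an always-on site sees a timeout.
before = classify_ping_output("Request timed out.")

# Once the record is deleted, the very same site gets a lookup failure
# and is misclassified as needing to dial.
after = classify_ping_output(
    "Ping request could not find host temphost.mydomain.com. "
    "Please check the name and try again."
)
```

Nothing about the site changed between `before` and `after`; only the DNS zone did, which is the whole failure in a nutshell.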

Re: Scratch One Inevitability

2009-01-15 11:22 • by kastein
Removing stale DNS entries is a good thing - however, they should have enabled logging on bind and searched through the log after a few days for anything requesting those systems before removing the records. Not like I've ever done this, no one does, but it'd be a good idea. Seriously, can you expect them to somehow know that removing a DNS record that no longer points at a server will break things? Most good DNS/network management tools remove all A/PTR/CNAME records from the zone file automatically when you delete the machine from the database anyways - at least, NetReg does.

EDIT: also, the Real WTF (shoot me for using that phrase) is that they were using ping to test the link, instead of say, opening a TCP connection. A /lot/ of firewalls, routers, etc block ICMP, there is no real guarantee that it will go anywhere.

Re: Scratch One Inevitability

2009-01-15 11:29 • by Steenbergh (unregistered)
239305 in reply to 239294
Downfall:
"Of course, management decided that this was the system admins' fault, for deleting the stale DNS entry. They should've known better!"

Yeah, pretty much. If something is wrong but sitting there and not hurting anything, leave it be. How many WTFs spring from trying to 'optimize' things already working fine?

If they insist on 'fixing' it, though, then perhaps they should have done some tests first. On the other hand, that wouldn't have the adrenaline rush of deploying untested changes in a production environment.


Oh yes, the rush!
I do all my development on live environments!

CAPTCHA consequat: Yes, that's what I'm talking about!

Re: Scratch One Inevitability

2009-01-15 11:30 • by biziclop (unregistered)
239306 in reply to 239304
kastein:
Removing stale DNS entries is a good thing - however, they should have enabled logging on bind and searched through the log after a few days for anything requesting those systems before removing the records. Not like I've ever done this, no one does, but it'd be a good idea. Seriously, can you expect them to somehow know that removing a DNS record that no longer points at a server will break things? Most good DNS/network management tools remove all A/PTR/CNAME records from the zone file automatically when you delete the machine from the database anyways - at least, NetReg does.

EDIT: also, the Real WTF (shoot me for using that phrase) is that they were using ping to test the link, instead of say, opening a TCP connection. A /lot/ of firewalls, routers, etc block ICMP, there is no real guarantee that it will go anywhere.


It doesn't have to though, does it? They don't really open a connection to anything but their DNS server. Which is what they should've done in the first place.

Re: Scratch One Inevitability

2009-01-15 11:32 • by Capt. Obvious
239307 in reply to 239298
DeLos:
Doesn't the fact that they then used a "Request timed out" indicate that they knew the server didn't exist? Doesn't that make this implementation very very silly? Why not just ping a stable known server? Why not check for time out and successful ping?


The purpose of the function is to distinguish which connection type you have, not its speed. Why that's important, I do not know.

The code, as written, implies that the error messages are different when pinging a non-existent server. That is, the dialup would return "Ping request could not find host..." and the broadband would return "Request timed out".

If true, this would be more robust than pinging a known stable server and guessing based on ping time... especially if 2000 locations want to do it at the same time.

That said, there certainly has to be a better way. If you really care, surely an OS call could determine if it was using the ethernet or a modem port? Or "not caring" seems like a good way too.

But, the most WTFy is failing when and only when the connection doesn't.

Re: Scratch One Inevitability

2009-01-15 11:40 • by Anon (unregistered)
239309 in reply to 239307
Capt. Obvious:

The purpose of the function is to distinguish which connection type you have, not its speed. Why that's important, I do not know.


Actually the function returns what connection type you have, but, as a side effect activates the modem if you're on dial-up.
Really, it's a case of a badly named function because what if you're on dial-up and you call the function twice? Presumably it'd come back on the second try and tell you that you're on broadband (assuming the modem connection wasn't shut down in between calls).

Re: Scratch One Inevitability

2009-01-15 11:49 • by Maurits
239311 in reply to 239304
kastein:
A /lot/ of firewalls, routers, etc block ICMP, there is no real guarantee that it will go anywhere.


This strategy works just fine even if ICMP is blocked.

Re: Scratch One Inevitability

2009-01-15 11:51 • by puzzled (unregistered)
Hang on a minute, I'm a bit confused here (it's not hard to confuse me!!)

A listener program runs on the server - why would that care if the sender (store\outlet) used broadband or modem?

the sender opens the connection, sends the data (by whatever means) the listener listens & acts upon the data.

It's the client software that needs to know if it needs to dial-up or is already connected by broadband.

So is this util function that they found on the server or on the client? If it's on the client then why not just use a config setting for whether it's broadband or not.

If it is on the client they updated all 2,000 stores the day before????

Re: Scratch One Inevitability

2009-01-15 11:55 • by Anonymous (unregistered)
***A /lot/ of firewalls, routers, etc block ICMP, there is no real guarantee that it will go anywhere.***

Sort of off topic, but unless the universe has changed recently, blocking ICMP is incompatible with Path MTU Discovery (RFC 1191). Google "Blackhole Routing" to find out why it is not a good idea to mess with Path MTU Discovery. Bottom line: Unless you like debugging really weird message size related transmission problems, donna block the ICMP.

Re: Scratch One Inevitability

2009-01-15 12:00 • by configurator (unregistered)
This function finds out whether the internet connection is broadband or dial-up. I guess because before needing to connect, you have to start the dial-up connection. Also, when the dial-up is already connected, this will in fact return Broadband because the non-existent server can technically still be reached (i.e. there is a gateway for its IP, the default gateway)

Re: Scratch One Inevitability

2009-01-15 12:16 • by Andy Goth
239317 in reply to 239315
configurator:
This function finds out whether the internet connection is broadband or dial-up.
Why is this autodetected and not simply a configuration variable in the same way that HQ_SERVER_NAME is?

Re: Scratch One Inevitability

2009-01-15 12:16 • by mouse (unregistered)
239318 in reply to 239313
Anonymous:
***A /lot/ of firewalls, routers, etc block ICMP, there is no real guarantee that it will go anywhere.***

Sort of off topic, but unless the universe has changed recently, blocking ICMP is incompatible with Path MTU Discovery (RFC 1191). Google "Blackhole Routing" to find out why it is not a good idea to mess with Path MTU Discovery. Bottom line: Unless you like debugging really weird message size related transmission problems, donna block the ICMP.

I can't speak for "most" routers, but my router only drops ICMP ping requests.

Re: Scratch One Inevitability

2009-01-15 12:25 • by Jezebel (unregistered)
239320 in reply to 239317
Andy Goth:
configurator:
This function finds out whether the internet connection is broadband or dial-up.
Why is this autodetected and not simply a configuration variable in the same way that HQ_SERVER_NAME is?


Because the type of connection doesn't matter. It just indicates that there is no connection established at the time it is called. Meaning you either have to fire up the modem (or start beeping until the operator plugs the damn cable back in)

Re: Scratch One Inevitability

2009-01-15 12:31 • by Asiago Chow (unregistered)
The function name is wrong.

The code checks to see if there is at least enough internet connection to resolve a DNS entry. If there is, you do not need to run the dialer to establish a connection.

The test condition would be met if someone had manually accessed the 'net (e.g. to surf) before the software tried, if the system was on a LAN with an always-on dialup or ISDN type connection, etc.. Use of broadband is just the most likely explanation.

There are better ways to do this test (resolv instead of ping), there are more reliable test subjects (google.com instead of some dummy DNS entry), and there are better tests (you can check the interfaces in windows) but the method they used is not insane.

IT shouldn't have cleared the DNS entry. They should've passed around a list of DNS entries that are slated for deletion.
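A rough Python sketch of the "resolve instead of ping" idea, under the assumption that a successful DNS lookup is an acceptable stand-in for "a link is up". The function name `can_resolve` and its `resolver` parameter are hypothetical; the parameter exists only so the check can be exercised without a network.

```python
import socket


def can_resolve(hostname: str, resolver=socket.getaddrinfo) -> bool:
    """Return True if the name resolves; no ICMP traffic is sent."""
    try:
        resolver(hostname, None)
    except socket.gaierror:
        # The lookup failed: either no link is up or the name is gone.
        return False
    return True
```

Note that this has the same weak spot as the ping version: if the record being looked up is deleted, every site looks offline. Resolving a name that has to exist anyway, such as the upload server's, avoids depending on a record nobody remembers.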

Re: Scratch One Inevitability

2009-01-15 12:39 • by Downfall (unregistered)
At first, I thought my confusion at this WTF was just me. Upon reading the thread, though, I'm convinced that it just doesn't make sense. Is this a case of the anonymization ruining the story?

Re: Scratch One Inevitability

2009-01-15 12:42 • by Charles400
All your base are belong to us.

Re: Scratch One Inevitability

2009-01-15 13:03 • by Zygo (unregistered)
239334 in reply to 239304
kastein:
also, the Real WTF (shoot me for using that phrase) is that they were using ping to test the link, instead of say, opening a TCP connection. A /lot/ of firewalls, routers, etc block ICMP, there is no real guarantee that it will go anywhere.


If by "opening a TCP connection" you mean "verify the server's SSL certificate against a CA root key owned by your IT department" then I agree...on the other hand, that's really only necessary if you are 1) in a kiosk or mobile environment, 2) connecting to the nearest unsecured wireless AP, and 3) next to a Starbucks or similar bait-and-switch style hotspot.

Re: Scratch One Inevitability

2009-01-15 13:04 • by campkev
239335 in reply to 239317
Andy Goth:
Why is this autodetected and not simply a configuration variable in the same way that HQ_SERVER_NAME is?

Because if it was, it wouldn't be on this site.

Re: Scratch One Inevitability

2009-01-15 13:07 • by Zygo (unregistered)
239336 in reply to 239304
kastein:
Removing stale DNS entries is a good thing - however, they should have enabled logging on bind and searched through the log after a few days for anything requesting those systems before removing the records. Not like I've ever done this, no one does, but it'd be a good idea.


I actually do do this--both logging of DNS queries and also logging TCP connections to ports and IP addresses I think are not in use, using OS-level firewall tools. It works great on intranets, LANs, and VPNs.

The trouble is, with public DNS records you'll get several dozen queries a second (OK, I'm exaggerating...slightly) from spammer botnets all over the world. I doubt I would have been able to spot it.

Re: Scratch One Inevitability

2009-01-15 13:10 • by Crash Magnet (unregistered)
239337 in reply to 239301
Is this a case of:

TRUE
FALSE
FILE_NOT_FOUND

Re: Scratch One Inevitability

2009-01-15 13:18 • by cconroy
Third simplest possible explanation... I'm in my own personal hell. So far the best theory Curtis had.


Occam's Razor #3, right as usual.

Re: Scratch One Inevitability

2009-01-15 13:19 • by cdosrun
The stores that had posted data seemed to have nothing differentiating them from the stores that hadn't posted data – regardless of how long the location had been there, how close it was to another location that had posted data successfully – there was nothing.


Except method of connecting to the internet. Not his fault, no one would think of checking that when troubleshooting this sort of problem.

Re: Scratch One Inevitability

2009-01-15 13:24 • by JamesQMurphy
239346 in reply to 239323
Asiago Chow:
The function name is wrong.

Agreed. A better name for what this function does is ConnectIfOffline(). And as someone pointed out, if you call the function a second time after a successful connection, it would return ConnectionTypes.Broadband, regardless of the actual connection type.

Asiago Chow:
There are better ways to do this test (resolv instead of ping), there are more reliable test subjects (google.com instead of some dummy DNS entry), and there are better tests (you can check the interfaces in windows) but the method they used is not insane.

Right, like the InetIsOffline API.

Asiago Chow:
IT shouldn't have cleared the DNS entry. They should've passed around a list of DNS entries that are slated for deletion.

True, but I bet dollars-to-donuts that nobody would have remembered that this function referenced one of the DNS entries. At least IT would have had their asses covered.

Re: Scratch One Inevitability

2009-01-15 13:42 • by Downfall (unregistered)
239350 in reply to 239344
cdosrun:
The stores that had posted data seemed to have nothing differentiating them from the stores that hadn't posted data – regardless of how long the location had been there, how close it was to another location that had posted data successfully – there was nothing.


Except method of connecting to the internet. Not his fault, no one would think of checking that when troubleshooting this sort of problem.


I laughed, and almost spit coffee on my keyboard. Excellent point.

Re: Scratch One Inevitability

2009-01-15 13:46 • by Les (unregistered)
Isn't the real problem that they didn't name the invalid DNS entry "DoNotDelete.company.tla"?

Re: Scratch One Inevitability

2009-01-15 14:39 • by brouski (unregistered)
239359 in reply to 239342
It's always the last one you try, isn't it...

Re: Scratch One Inevitability

2009-01-15 14:48 • by Asiago Chow (unregistered)
239361 in reply to 239346
JamesQMurphy:
Asiago Chow:
There are better ways to do this test (resolv instead of ping), there are more reliable test subjects (google.com instead of some dummy DNS entry), and there are better tests (you can check the interfaces in windows) but the method they used is not insane.

Right, like the InetIsOffline API.


I've never needed to use that API, but... googling around shows it returns false negatives. Looks like it only checks the status of designated internet connections instead of checking for internet connectivity. Is that true? If so they may have started there and switched to DNS lookup testing because it better fits their problem. They don't care whether an "internet connection" is "connected" per the OS, they care whether they are able to talk to the 'net.

I'm not patting the coder on the back here but it doesn't seem totally crazy.

Re: Scratch One Inevitability

2009-01-15 15:38 • by commonsense (unregistered)
I can't believe no one has said this yet, but if you're going to use this method to determine if you have a live internet connection, why not ping the server you're trying to send the data to, instead of some random address that doesn't exist?

If I can't get where I'm trying to go, hmmm, maybe I need to turn on the modem.
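One way to sketch that suggestion in Python, using a TCP connection rather than ping as proposed earlier in the thread. Everything here is an assumption for illustration: `can_reach`, `connect_if_offline`, the `dial` callback, and the host and port are all hypothetical, and the `connect` parameter is injectable only so the logic can be tried without a real network.

```python
import socket


def can_reach(host, port, timeout=5.0, connect=socket.create_connection):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        conn = connect((host, port), timeout)
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
    conn.close()
    return True


def connect_if_offline(host, port, dial, connect=socket.create_connection):
    """Dial only when the upload server is unreachable.

    Returns True if the modem was dialed, False if a link was already up.
    """
    if can_reach(host, port, connect=connect):
        return False
    dial()
    if not can_reach(host, port, connect=connect):
        raise ConnectionError(f"{host}:{port} unreachable even after dialing")
    return True
```

A caller might use it as `connect_if_offline("hq.example.com", 443, dial=start_modem)`, where `start_modem` is whatever routine brings up the dial-up link. Because the test target is the server the data is going to, there is no dummy DNS record left for anyone to clean up.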

Re: Scratch One Inevitability

2009-01-15 15:40 • by Derp (unregistered)
239365 in reply to 239335
campkev:
Andy Goth:
Why is this autodetected and not simply a configuration variable in the same way that HQ_SERVER_NAME is?

Because if it was, it wouldn't be on this site.


No, because HQ_SERVER_NAME is a CONSTANT that doesn't change from installation to installation, but the need to fire up a modem does.

Re: Scratch One Inevitability

2009-01-15 16:03 • by hatterson
If I'm reading it correctly, wouldn't this just cause all sites to act as if they're on dial-up?

If I ping temphost.mydomain.com and there's a DNS entry pointing to nothing I get "Request timed out"

If I then delete the DNS entry and ping temphost.mydomain.com again I should receive "Ping request could not find host..."

I also receive "Ping request could not find host..." when I can't hit the DNS server.

Re: Scratch One Inevitability

2009-01-15 16:06 • by hatterson
239371 in reply to 239365
Derp:
campkev:
Andy Goth:
Why is this autodetected and not simply a configuration variable in the same way that HQ_SERVER_NAME is?

Because if it was, it wouldn't be on this site.


No, because HQ_SERVER_NAME is a CONSTANT that doesn't change from installation to installation, but the need to fire up a modem does.


Uhhh, what?

Why would having SITE_CONNECTION_TYPE as a configuration *variable* not be a good thing for something that changes with each site?
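For what it's worth, a per-site setting could be as small as the following Python sketch. The `SITE_CONNECTION_TYPE` key comes from this comment; the `ConnectionType` enum, the `load_connection_type` helper, and the idea of reading settings from a plain dict are assumptions made for the example.

```python
from enum import Enum


class ConnectionType(Enum):
    BROADBAND = "broadband"
    DIALUP = "dialup"


def load_connection_type(settings: dict) -> ConnectionType:
    """Read the per-site connection type from a settings mapping."""
    raw = settings.get("SITE_CONNECTION_TYPE", "")
    try:
        return ConnectionType(raw.strip().lower())
    except ValueError:
        # Fail loudly at startup instead of guessing at upload time.
        raise ValueError(
            f"SITE_CONNECTION_TYPE must be 'broadband' or 'dialup', got {raw!r}"
        ) from None
```

The trade-off is the one raised elsewhere in the thread: this is one more value to get right at each install, but it cannot be broken by a change to the DNS zone.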

Re: Scratch One Inevitability

2009-01-15 16:08 • by real-modo (unregistered)
239373 in reply to 239352
Les:
Isn't the real problem that they didn't name the invalid DNS entry "DoNotDelete.company.tla"?

There seem to be three root causes (of any WTF):


1. Believing your problem to be unique, and reinventing the wheel rather than using proven code, data structures or algorithms.

2. Poor naming - in this case especially the DNS entry (but also the function name).

3. Multiple process failures.

Prevent any one of these and your WTFs don't reach production. However, they like travelling as a group - the three horsemen of the WTF - and they bring us so much schadenfreude.

Re: Scratch One Inevitability

2009-01-15 16:09 • by Patrick (unregistered)
239374 in reply to 239295
RHuckster:
The real WTF is 75% of the locations still use dial up.


I don't see why. Dial-up is cheaper, and if all they're doing is uploading some data once a night, is broadband really necessary? Some of these stores are set up as temporary booths in other retail establishments, where broadband may not be convenient to install.

Re: Scratch One Inevitability

2009-01-15 16:09 • by Anonymous Coward (unregistered)
239375 in reply to 239305
Steenbergh:

Oh yes, the rush!
I do all my development on live environments!


You laugh, but I do most of my development in a production environment. Yeah, it sucks when things go wrong. But I sit right there with the people. So they're quick to crucify me should anything go wrong. Good motivation.

As for testing, you check your status monitoring to make sure no one's currently in the system. Then you make the change and test as fast as you manually can. If anything goes wrong, hope to high hell you can fix and deploy before anyone uses the system.

BTW, it pays way too much for me to complain.

Re: Scratch One Inevitability

2009-01-15 16:23 • by hatterson
239379 in reply to 239375
Anonymous Coward:
Steenbergh:

Oh yes, the rush!
I do all my development on live environments!


You laugh, but I do most of my development in a production environment. Yeah, it sucks when things go wrong. But I sit right there with the people. So they're quick to crucify me should anything go wrong. Good motivation.

As for testing, you check your status monitoring to make sure no one's currently in the system. Then you make the change and test as fast as you manually can. If anything goes wrong, hope to high hell you can fix and deploy before anyone uses the system.

BTW, it pays way too much for me to complain.


If it pays you too much to complain then the company must have some spare money sitting aside to rig up a simple test environment. Sure maybe you won't get a clone of the production environment but even a $500 crappy eMachines desktop is better than developing in production.

Re: Scratch One Inevitability

2009-01-15 16:38 • by Madt M. (unregistered)
239380 in reply to 239299
I have a restaurant that uses dial-up:
- At 2400 baud credit card transactions take only seconds
- Minor security issues
- Keeps manager-slackers off the internet

If I need speed there's a shared wireless network nextdoor.

Re: Scratch One Inevitability

2009-01-15 17:33 • by Daniel (unregistered)
Management was correct, of course. A DNS server binds a name to an IP address. What that IP address will be used for is up to the application. :-)

Re: Scratch One Inevitability

2009-01-15 17:50 • by mp (unregistered)
239386 in reply to 239303
campkev:


The configurable variable was set to the name of a server that did not exist, but that was put in the DNS. If they tried to ping it and it couldn't find the host, then obviously they were on dialup and needed to activate the modem. If they were able to find it, but the ping timed out, obviously they could connect to the dns to get the data, so they didn't need to activate the modem. Brillant!!!


I agree. They probably tried other, more straightforward methods first that didn't work in certain situations. The main failure here was not calling the DNS entry DONOTDELETE or some such, as somebody mentioned above.

mp

Re: Scratch One Inevitability

2009-01-15 17:51 • by mp (unregistered)
239387 in reply to 239317
Andy Goth:
configurator:
This function finds out whether the internet connection is broadband or dial-up.
Why is this autodetected and not simply a configuration variable in the same way that HQ_SERVER_NAME is?


Most of the time, if you can have something work automatically rather than manually, this is a good thing.

mp

Re: Scratch One Inevitability

2009-01-15 17:57 • by mp (unregistered)
239388 in reply to 239371
hatterson:
Derp:
campkev:
Andy Goth:
Why is this autodetected and not simply a configuration variable in the same way that HQ_SERVER_NAME is?

Because if it was, it wouldn't be on this site.


No, because HQ_SERVER_NAME is a CONSTANT that doesn't change from installation to installation, but the need to fire up a modem does.


Uhhh, what?

Why would having SITE_CONNECTION_TYPE as a configuration *variable* not be a good thing for something that changes with each site?


Because it's one less thing to have to do for an install. You know, plug & play. I'm assuming the method has been rock-solid up to this point since the OP wasn't real familiar with it.

mp

Re: Scratch One Inevitability

2009-01-15 18:51 • by 5|i(3_x (unregistered)
At the very least, the hostname should've documented itself. E.g., dummyEntryForClientSideTest.domain.com

Re: Scratch One Inevitability

2009-01-15 18:58 • by robbak
239396 in reply to 239368
hatterson:
If I'm reading it correctly, wouldn't this just cause all sites to act as if they're on dial-up?

If I ping temphost.mydomain.com and there's a DNS entry pointing to nothing I get "Request timed out"

If I then delete the DNS entry and ping temphost.mydomain.com again I should receive "Ping request could not find host..."

I also receive "Ping request could not find host..." when I can't hit the DNS server.

The problem is that this function's main purpose is to trigger the dial-up process if required (badly named, as others have noted). This means that sites with an always-on connection would attempt to dial up, which would fail (no dial tone, or no modem!), and the exception thrown by the dial-up function would stop the upload from occurring.

Re: Scratch One Inevitability

2009-01-15 19:32 • by campkev
The number of people having a hard time understanding this function has shaken my belief in my profession. Oh, wait, I never had any. As you were.