The Daily WTF: Curious Perversions in Information Technology

KattMan · 2009-01-15

OK I'm confused. As the code is written it does not go against a server that never existed, it goes against an apparently configurable variable that holds a server address. Is there more to this that we are not getting? Where is the WTF?

2009-01-15

"Of course, management decided that this was the system admins' fault, for deleting the stale DNS entry. They should've known better!"

Yeah, pretty much. If something is wrong but sitting there and not hurting anything, leave it be. How many WTFs spring from trying to 'optimize' things already working fine?

If they insist on 'fixing' it, though, then perhaps they should have done some tests first. On the other hand, that wouldn't have the adrenaline rush of deploying untested changes in a production environment.

RHuckster · 2009-01-15

The real WTF is 75% of the locations still use dial up.

2009-01-15

At least it wasn't a driver issue :-)

DeLos · 2009-01-15

Doesn't the fact that they then checked for "Request timed out" indicate that they knew the server didn't exist? Doesn't that make this implementation very, very silly? Why not just ping a stable, known server? Why not check for a timeout and a successful ping? WTF

2009-01-15

The real WTF is 75% of the locations still use dial up.

Many of these locations are little kiosks (more like mini cubicles) in the aisles of shopping malls for only a few weeks/months out of the year. A cheap, seldom-used dial-up connection would be simpler and cheaper than putting a broadband connection into a booth that's not going to be used very many times during the day and for only a few weeks/months of the year...

2009-01-15

KattMan:
OK I'm confused. As the code is written it does not go against a server that never existed, it goes against an apparently configurable variable that holds a server address. Is there more to this that we are not getting? Where is the WTF?

Presumably if the nonexistent server was in DNS, the ping would try to use it and time out, and the function would return broadband. Removing the server from DNS would cause the ping to fail immediately, and the function would throw an exception.

campkev · 2009-01-15

KattMan:
OK I'm confused. As the code is written it does not go against a server that never existed, it goes against an apparently configurable variable that holds a server address. Is there more to this that we are not getting? Where is the WTF?

The configurable variable was set to the name of a server that did not exist, but that was put in the DNS. If they tried to ping it and it couldn't find the host, then obviously they were on dialup and needed to activate the modem. If they were able to find it, but the ping timed out, obviously they could connect to the DNS to get the data, so they didn't need to activate the modem. Brillant!!!
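For anyone still puzzled, here is a minimal Python sketch of the logic as described above. The original function isn't reproduced in this thread, so this is a guess: it assumes the code inspects Windows `ping` output for the two messages quoted in the comments, and `classify_ping_output` and `hqdummy.example.com` are made-up names.

```python
# Hypothetical reconstruction of the logic described in this thread.
# Assumes Windows-style ping output; the real function is not shown here.

NOT_FOUND = "could not find host"   # the name did not resolve
TIMED_OUT = "Request timed out"     # the name resolved, but nothing answered


def classify_ping_output(output: str) -> str:
    """Map the text of a ping to the dummy host onto a connection type."""
    if NOT_FOUND in output:
        # No DNS answer: assume we are offline and must dial.
        return "dial-up"
    if TIMED_OUT in output:
        # DNS answered, so some link is already up.
        return "broadband"
    # Anything else (e.g. an actual reply) was never expected.
    raise ValueError("unexpected ping output")


# While the stale record existed, a broadband site saw a timeout:
print(classify_ping_output("Request timed out."))  # broadband
# After the record was deleted, the same site saw the "not found" message:
print(classify_ping_output(
    "Ping request could not find host hqdummy.example.com. "
    "Please check the name and try again."))  # dial-up
```

The failure mode falls out directly: once the record is gone, every site gets the "not found" message, so every site is treated as dial-up.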

kastein · 2009-01-15

Removing stale DNS entries is a good thing - however, they should have enabled logging on bind and searched through the log after a few days for anything requesting those systems before removing the records. Not like I've ever done this, no one does, but it'd be a good idea. Seriously, can you expect them to somehow know that removing a DNS record that no longer points at a server will break things? Most good DNS/network management tools remove all A/PTR/CNAME records from the zone file automatically when you delete the machine from the database anyways - at least, NetReg does.

EDIT: also, the Real WTF (shoot me for using that phrase) is that they were using ping to test the link, instead of, say, opening a TCP connection. A /lot/ of firewalls, routers, etc block ICMP, there is no real guarantee that it will go anywhere.
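A minimal sketch of the TCP alternative in Python, using only the standard `socket` module. The idea is that a completed handshake to the server you actually need tells you more than an ICMP echo does; the function name is made up for illustration.

```python
import socket


def can_open_tcp(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP handshake to host:port completes within timeout."""
    try:
        # create_connection resolves the name and tries each address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures (socket.gaierror), refusals, and timeouts.
        return False
```

Usage, with a placeholder name and port: `can_open_tcp("hq.example.com", 443)`. This still works when ICMP is filtered, as long as the port you need is open.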

2009-01-15

Downfall:
"Of course, management decided that this was the system admins' fault, for deleting the stale DNS entry. They should've known better!"
Yeah, pretty much. If something is wrong but sitting there and not hurting anything, leave it be. How many WTFs spring from trying to 'optimize' things already working fine?

If they insist on 'fixing' it, though, then perhaps they should have done some tests first. On the other hand, that wouldn't have the adrenaline rush of deploying untested changes in a production environment.

Oh yes, the rush! I do all my development on live environments!

CAPTCHA consequat: Yes, that's what I'm talking about!

2009-01-15

kastein:
Removing stale DNS entries is a good thing - however, they should have enabled logging on bind and searched through the log after a few days for anything requesting those systems before removing the records. Not like I've ever done this, no one does, but it'd be a good idea. Seriously, can you expect them to somehow know that removing a DNS record that no longer points at a server will break things? Most good DNS/network management tools remove all A/PTR/CNAME records from the zone file automatically when you delete the machine from the database anyways - at least, NetReg does.
EDIT: also, the Real WTF (shoot me for using that phrase) is that they were using ping to test the link, instead of, say, opening a TCP connection. A /lot/ of firewalls, routers, etc block ICMP, there is no real guarantee that it will go anywhere.

It doesn't have to though, does it? They don't really open a connection to anything but their DNS server. Which is what they should've done in the first place.

Capt. Obvious · 2009-01-15

DeLos:
Doesn't the fact that they then checked for "Request timed out" indicate that they knew the server didn't exist? Doesn't that make this implementation very, very silly? Why not just ping a stable, known server? Why not check for a timeout and a successful ping?

The purpose of the function is to distinguish which connection type you have, not its speed. Why that's important, I do not know.

The code, as written, implies that the error messages are different when pinging a non-existent server. That is, the dialup would return "Ping request could not find host..." and the broadband would return "Request timed out".

If true, this would be more robust than pinging a known stable server and guessing based on ping time... especially if 2000 locations want to do it at the same time.

That said, there certainly has to be a better way. If you really care, surely an OS call could determine if it was using the ethernet or a modem port? Or "not caring" seems like a good way too.

But the most WTF-y part is that it fails when, and only when, the connection doesn't.

2009-01-15

Capt. Obvious:
The purpose of the function is to distinguish which connection type you have, not its speed. Why that's important, I do not know.

Actually, the function returns what connection type you have but, as a side effect, activates the modem if you're on dial-up. Really, it's a case of a badly named function, because what if you're on dial-up and you call the function twice? Presumably it'd come back on the second try and tell you that you're on broadband (assuming the modem connection wasn't shut down in between calls).
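A toy model of that side effect, assuming the function treats "offline" as "dial-up" and brings the modem up before returning. `Site` and `get_connection_type` are hypothetical names, not the original code.

```python
class Site:
    """Toy model of a store PC; all names here are hypothetical."""

    def __init__(self, has_broadband: bool):
        self.has_broadband = has_broadband
        self.modem_up = False

    def online(self) -> bool:
        return self.has_broadband or self.modem_up

    def get_connection_type(self) -> str:
        # "Not online" is read as "dial-up", and the modem is
        # brought up as a side effect of asking the question.
        if self.online():
            return "broadband"
        self.modem_up = True
        return "dial-up"


kiosk = Site(has_broadband=False)
print(kiosk.get_connection_type())  # dial-up (the modem is now connected)
print(kiosk.get_connection_type())  # broadband, although the kiosk is on dial-up
```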

Maurits · 2009-01-15

kastein:
A /lot/ of firewalls, routers, etc block ICMP, there is no real guarantee that it will go anywhere.

This strategy works just fine even if ICMP is blocked.

2009-01-15

Hang on a minute, I'm a bit confused here (it's not hard to confuse me!!)

A listener program runs on the server; why would that care whether the sender (store/outlet) used broadband or a modem?

The sender opens the connection and sends the data (by whatever means); the listener listens and acts upon the data.

It's the client software that needs to know if it needs to dial-up or is already connected by broadband.

So is this util function that they found on the server or on the client? If it's on the client, then why not just use a config setting for whether it's broadband or not?

If it is on the client, then they updated all 2,000 stores the day before?

2009-01-15

A /lot/ of firewalls, routers, etc block ICMP, there is no real guarantee that it will go anywhere.

Sort of off topic, but unless the universe has changed recently, blocking ICMP is incompatible with Path MTU Discovery (RFC 1191), which relies on ICMP "Fragmentation Needed" messages. Google "black hole router" to find out why it is not a good idea to mess with Path MTU Discovery. Bottom line: unless you like debugging really weird message-size-related transmission problems, don't block ICMP.
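To see why dropped ICMP turns into a black hole, here is a toy Python simulation of RFC 1191-style discovery. It is deliberately simplified (no retransmission timers and none of the fallback heuristics some real stacks add), and the link MTUs are made-up numbers.

```python
from typing import List, Optional


def send_with_pmtud(link_mtus: List[int], start_mtu: int,
                    icmp_allowed: bool, max_tries: int = 5) -> Optional[int]:
    """Toy Path MTU Discovery over a chain of links, in path order.

    Every packet has Don't Fragment set. The first link too small for the
    packet drops it; if ICMP gets through, the sender learns that link's
    MTU from the "Fragmentation Needed" message and retries at that size.
    Returns the size that got through, or None if nothing ever did.
    """
    size = start_mtu
    for _ in range(max_tries):
        too_small = [mtu for mtu in link_mtus if mtu < size]
        if not too_small:
            return size          # every link can carry this size
        if not icmp_allowed:
            continue             # dropped silently: the sender learns nothing
        size = too_small[0]      # next-hop MTU reported by the first bottleneck
    return None


print(send_with_pmtud([1500, 1492, 1400], 1500, icmp_allowed=True))   # 1400
print(send_with_pmtud([1500, 1492, 1400], 1500, icmp_allowed=False))  # None
```

Small packets still get through in the second case, which is why the symptom looks like "weird message-size-related problems" rather than a dead link.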

2009-01-15

This function finds out whether the internet connection is broadband or dial-up. I guess that's because you have to start the dial-up connection before you can connect. Also, when the dial-up is already connected, this will in fact return Broadband, because the non-existent server can technically still be reached (i.e. there is a gateway for its IP, the default gateway).

Andy Goth · 2009-01-15

configurator:
This function finds out whether the internet connection is broadband or dial-up.

Why is this autodetected and not simply a configuration variable in the same way that HQ_SERVER_NAME is?

2009-01-15

Anonymous:
***A /lot/ of firewalls, routers, etc block ICMP, there is no real guarantee that it will go anywhere.***
Sort of off topic, but unless the universe has changed recently, blocking ICMP is incompatible with Path MTU Discovery (RFC 1191). Google "Blackhole Routing" to find out why it is not a good idea to mess with Path MTU Discovery. Bottom line: Unless you like debugging really weird message size related transmission problems, donna block the ICMP.

I can't speak for "most" routers, but my router only drops ICMP ping requests.

2009-01-15

Andy Goth:
configurator:
This function finds out whether the internet connection is broadband or dial-up.
Why is this autodetected and not simply a configuration variable in the same way that HQ_SERVER_NAME is?

Because the type of connection doesn't matter. It just indicates that no connection is established at the time it is called, meaning you either have to fire up the modem (or start beeping until the operator plugs the damn cable back in).

2009-01-15

The function name is wrong.

The code checks to see if there is at least enough internet connection to resolve a DNS entry. If there is, you do not need to run the dialer to establish a connection.

The test condition would be met if someone had manually accessed the 'net (e.g. to surf) before the software tried, if the system was on a LAN with an always-on dialup or ISDN type connection, etc. Use of broadband is just the most likely explanation.

There are better ways to do this test (resolv instead of ping), there are more reliable test subjects (google.com instead of some dummy DNS entry), and there are better tests (you can check the interfaces in windows) but the method they used is not insane.

IT shouldn't have cleared the DNS entry. They should've passed around a list of DNS entries that are slated for deletion.
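A minimal sketch of the "resolve instead of ping" variant, using the system resolver through Python's `socket.getaddrinfo`. Two caveats: a cached answer (hosts file, local resolver cache) can succeed with no link up, and the name tested should be one that is supposed to exist, such as the HQ server itself, rather than a dummy record someone may later delete. The function name is made up.

```python
import socket


def name_resolves(hostname: str) -> bool:
    """Return True if the system resolver can turn hostname into an address."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except (socket.gaierror, UnicodeError):
        # gaierror: lookup failed; UnicodeError: name is not even valid.
        return False


print(name_resolves("localhost"))
```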

2009-01-15

At first, I thought my confusion at this WTF was just me. Upon reading the thread, though, I'm convinced that it just doesn't make sense. Is this a case of the anonymization ruining the story?

Charles400 · 2009-01-15

All your base are belong to us.

2009-01-15

kastein:
also, the Real WTF (shoot me for using that phrase) is that they were using ping to test the link, instead of, say, opening a TCP connection. A /lot/ of firewalls, routers, etc block ICMP, there is no real guarantee that it will go anywhere.

If by "opening a TCP connection" you mean "verify the server's SSL certificate against a CA root key owned by your IT department" then I agree...on the other hand, that's really only necessary if you are 1) in a kiosk or mobile environment, 2) connecting to the nearest unsecured wireless AP, and 3) next to a Starbucks or similar bait-and-switch style hotspot.

campkev · 2009-01-15

Andy Goth:
Why is this autodetected and not simply a configuration variable in the same way that HQ_SERVER_NAME is?

Because if it was, it wouldn't be on this site.

2009-01-15

kastein:
Removing stale DNS entries is a good thing - however, they should have enabled logging on bind and searched through the log after a few days for anything requesting those systems before removing the records. Not like I've ever done this, no one does, but it'd be a good idea.

I actually do do this--both logging of DNS queries and also logging TCP connections to ports and IP addresses I think are not in use using OS-level firewall tools. It works great on intranets, LANs, and VPNs.

The trouble is, with public DNS records you'll get several dozen queries a second (OK, I'm exaggerating...slightly) from spammer botnets all over the world. I doubt I would have been able to spot it.
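For the intranet case, a rough Python sketch of the "search the log before deleting" step. It assumes BIND query logging is already enabled and that lines look roughly like the samples below; the exact format varies between BIND versions, and the sample lines, addresses, and hostnames are invented for illustration.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed line shape; adjust the pattern to match your own BIND version.
QUERY_RE = re.compile(
    r"client (?:@\S+ )?(?P<ip>[0-9a-fA-F.:]+)#\d+.*?"
    r"query: (?P<name>\S+) IN (?P<type>\S+)"
)


def clients_asking_for(lines: Iterable[str], name: str) -> Counter:
    """Count queries per client IP for one hostname (case-insensitive)."""
    wanted = name.rstrip(".").lower()
    hits: Counter = Counter()
    for line in lines:
        m = QUERY_RE.search(line)
        if m and m.group("name").rstrip(".").lower() == wanted:
            hits[m.group("ip")] += 1
    return hits


# Invented sample lines in two plausible formats.
SAMPLE_LOG = [
    "15-Jan-2009 02:00:01.123 client 192.0.2.10#53211: "
    "query: hqdummy.example.com IN A + (10.0.0.1)",
    "15-Jan-2009 02:00:04.456 client 192.0.2.10#53212: "
    "query: hqdummy.example.com IN A + (10.0.0.1)",
    "15-Jan-2009 02:01:09.789 client @0x7f00aa 198.51.100.7#40000 "
    "(www.example.com): query: www.example.com IN A +E(0)K (10.0.0.1)",
    "15-Jan-2009 02:03:00.000 client @0x7f00bb 203.0.113.5#41000 "
    "(HQDummy.example.com): query: HQDummy.example.com IN A +E(0)K (10.0.0.1)",
]

print(dict(clients_asking_for(SAMPLE_LOG, "hqdummy.example.com")))
# {'192.0.2.10': 2, '203.0.113.5': 1}
```

As noted above, on public zones botnet noise can bury the one client that matters, so this is most useful on intranets and VPNs.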

2009-01-15

Is this a case of:

TRUE FALSE FILE_NOT_FOUND

cconroy · 2009-01-15

Third simplest possible explanation... I'm in my own personal hell. So far the best theory Curtis had.

Occam's Razor #3, right as usual.

cdosrun · 2009-01-15

The stores that had posted data seemed to have nothing differentiating them from the stores that hadn't posted data – regardless of how long the location had been there, how close it was to another location that had posted data successfully – there was nothing.

Except method of connecting to the internet. Not his fault, no one would think of checking that when troubleshooting this sort of problem.

JamesQMurphy · 2009-01-15

Asiago Chow:
The function name is wrong.

Agreed. A better name for what this function does is ConnectIfOffline(). And as someone pointed out, if you call the function a second time after a successful dial-up connection, it would return ConnectionTypes.Broadband regardless of the actual connection type.

Asiago Chow:
There are better ways to do this test (resolv instead of ping), there are more reliable test subjects (google.com instead of some dummy DNS entry), and there are better tests (you can check the interfaces in windows) but the method they used is not insane.

Right, like the InetIsOffline API.

Asiago Chow:
IT shouldn't have cleared the DNS entry. They should've passed around a list of DNS entries that are slated for deletion.

True, but I bet dollars-to-donuts that nobody would have remembered that this function referenced one of the DNS entries. At least IT would have had their asses covered.

2009-01-15

cdosrun:
The stores that had posted data seemed to have nothing differentiating them from the stores that hadn't posted data – regardless of how long the location had been there, how close it was to another location that had posted data successfully – there was nothing.

Except method of connecting to the internet. Not his fault, no one would think of checking that when troubleshooting this sort of problem.

I laughed, and almost spit coffee on my keyboard. Excellent point.

2009-01-15

Isn't the real problem that they didn't name the invalid DNS entry "DoNotDelete.company.tla"?

2009-01-15

It's always the last one you try, isn't it...

2009-01-15

JamesQMurphy:
Asiago Chow:
There are better ways to do this test (resolv instead of ping), there are more reliable test subjects (google.com instead of some dummy DNS entry), and there are better tests (you can check the interfaces in windows) but the method they used is not insane.
Right, like the InetIsOffline API.

I've never needed to use that API, but... googling around shows it returns false negatives. Looks like it only checks the status of designated internet connections instead of checking for internet connectivity. Is that true? If so they may have started there and switched to DNS lookup testing because it better fits their problem. They don't care whether an "internet connection" is "connected" per the OS, they care whether they are able to talk to the 'net.

I'm not patting the coder on the back here but it doesn't seem totally crazy.

2009-01-15

I can't believe no one has said this yet, but if you're going to use this method to determine if you have a live internet connection, why not ping the server you're trying to send the data to, instead of some random address that doesn't exist?

If I can't get where I'm trying to go, hmmm, maybe I need to turn on the modem.
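A sketch of that approach: probe the destination itself and dial only if it's unreachable. `is_reachable` and `dial` are hypothetical callables (a real `is_reachable` might attempt a TCP connection to the HQ server), and the function reports what it did rather than guessing a connection type, which also sidesteps the naming problem raised earlier.

```python
from typing import Callable


def ensure_hq_reachable(is_reachable: Callable[[], bool],
                        dial: Callable[[], None]) -> str:
    """Probe the real destination; dial only if it cannot be reached."""
    if is_reachable():
        return "already-online"
    dial()
    if is_reachable():
        return "dialed"
    raise ConnectionError("HQ unreachable even after dialing")


# Example wiring with stand-ins (hypothetical): a real site would pass a
# check against the HQ server and a function that starts its dialer.
modem = {"up": False}
print(ensure_hq_reachable(lambda: modem["up"], lambda: modem.update(up=True)))  # dialed
```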

2009-01-15

campkev:
Andy Goth:
Why is this autodetected and not simply a configuration variable in the same way that HQ_SERVER_NAME is?
Because if it was, it wouldn't be on this site.

No, because HQ_SERVER_NAME is a CONSTANT that doesn't change from installation to installation, but the need to fire up a modem does.

hatterson · 2009-01-15

If I'm reading it correctly, wouldn't this just cause all sites to act as if they're on dial-up?

If I ping temphost.mydomain.com and there's a DNS entry pointing to nothing, I get "Request timed out".

If I then delete the DNS entry and ping temphost.mydomain.com again, I should receive "Ping request could not find host..."

I also receive "Ping request could not find host..." when I can't hit the DNS server.

hatterson · 2009-01-15

Derp:
campkev:
Andy Goth:
Why is this autodetected and not simply a configuration variable in the same way that HQ_SERVER_NAME is?
Because if it was, it wouldn't be on this site.

No, because HQ_SERVER_NAME is a CONSTANT that doesn't change from installation to installation, but the need to fire up a modem does.

Uhhh, what?

Why would having SITE_CONNECTION_TYPE as a configuration variable not be a good thing for something that changes with each site?

2009-01-15

Les:
Isn't the real problem that they didn't name the invalid DNS entry "DoNotDelete.company.tla"?

There seem to be three root causes (of any WTF):

Believing your problem to be unique, and reinventing the wheel rather than using proven code, data structures or algorithms.
Poor naming - in this case especially the DNS entry (but also the function name).
Multiple process failures.

Prevent any one of these and your WTFs don't reach production. However, they like travelling as a group - the three horsemen of the WTF - and they bring us so much schadenfreude.

2009-01-15

RHuckster:
The real WTF is 75% of the locations still use dial up.

I don't see why. It's cheaper, and if all they're doing is uploading some data once a night, is broadband really necessary? Some of these stores are set up as temporary booths in other retail establishments, where a broadband line may not be convenient to install.

2009-01-15

Steenbergh:
Oh yes, the rush! I do all my development on live environments!

You laugh, but I do most of my development in a production environment. Yeah, it sucks when things go wrong. But I sit right there with the people. So they're quick to crucify me should anything go wrong. Good motivation.

As for testing, you check your status monitoring to make sure no one's currently in the system. Then you make the change and test as fast as you manually can. If anything goes wrong, hope to high hell you can fix and deploy before anyone uses the system.

BTW, it pays way too much for me to complain.

hatterson · 2009-01-15

Anonymous Coward:
Steenbergh:
Oh yes, the rush! I do all my development on live environments!

You laugh, but I do most of my development in a production environment. Yeah, it sucks when things go wrong. But I sit right there with the people. So they're quick to crucify me should anything go wrong. Good motivation.

As for testing, you check your status monitoring to make sure no one's currently in the system. Then you make the change and test as fast as you manually can. If anything goes wrong, hope to high hell you can fix and deploy before anyone uses the system.

BTW, it pays way too much for me to complain.

If it pays you too much to complain, then the company must have some spare money set aside to rig up a simple test environment. Sure, maybe you won't get a clone of the production environment, but even a crappy $500 eMachines desktop is better than developing in production.

2009-01-15

I have a restaurant that uses dial-up:

At 2400 baud, credit card transactions take only seconds
Minor security issues
Keeps manager-slackers off the internet

If I need speed there's a shared wireless network nextdoor.
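A back-of-the-envelope check on "only seconds", reading "2400 baud" as 2400 bit/s, assuming 8N1 framing (10 bits on the wire per byte), and assuming a hypothetical 1 KiB exchange per transaction; dialing and modem negotiation time are not included.

```latex
\[
\frac{2400\ \text{bit/s}}{10\ \text{bit/byte}} = 240\ \text{byte/s},
\qquad
\frac{1024\ \text{byte}}{240\ \text{byte/s}} \approx 4.3\ \text{s}
\]
```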

2009-01-15

Management was correct, of course. A DNS server binds a name to an IP address. What that IP address will be used for is up to the application. :-)

2009-01-15

campkev:

The configurable variable was set to the name of a server that did not exist, but that was put in the DNS. If they tried to ping it and it couldn't find the host, then obviously they were on dialup and needed to activate the modem. If they were able to find it, but the ping timed out, obviously they could connect to the DNS to get the data, so they didn't need to activate the modem. Brillant!!!

I agree. They probably tried other, more straightforward methods first, which didn't work in certain situations. The main failure here was not calling the DNS entry DONOTDELETE or some such, as somebody mentioned above.

mp

2009-01-15

Andy Goth:
configurator:
This function finds out whether the internet connection is broadband or dial-up.
Why is this autodetected and not simply a configuration variable in the same way that HQ_SERVER_NAME is?

Most of the time, if you can have something work automatically rather than manually, this is a good thing.

mp

2009-01-15

hatterson:
Derp:
campkev:
Andy Goth:
Why is this autodetected and not simply a configuration variable in the same way that HQ_SERVER_NAME is?
Because if it was, it wouldn't be on this site.

No, because HQ_SERVER_NAME is a CONSTANT that doesn't change from installation to installation, but the need to fire up a modem does.

Uhhh, what?

Why would having SITE_CONNECTION_TYPE as a configuration variable not be a good thing for something that changes with each site?

Because it's one less thing to have to do for an install. You know, plug & play. I'm assuming the method has been rock-solid up to this point, since the OP wasn't real familiar with it.

mp

2009-01-15

At the very least, the hostname should've documented itself. E.g., dummyEntryForClientSideTest.domain.com

robbak · 2009-01-15

hatterson:
If I'm reading it correctly, wouldn't this just cause all sites to act as if they're on dial-up?
If I ping temphost.mydomain.com and there's a DNS entry pointing to nothing, I get "Request timed out".

If I then delete the DNS entry and ping temphost.mydomain.com again, I should receive "Ping request could not find host..."

I also receive "Ping request could not find host..." when I can't hit the DNS server.

The problem is that this function's main purpose is to trigger the dial-up process if required (badly named, as others have noted). Once the DNS entry was deleted, sites on an always-on connection would attempt to dial up, which would fail (no dial tone, or no modem!), and the exception thrown by the dial-up function would stop the upload from occurring.

campkev · 2009-01-15

The number of people having a hard time understanding this function has shaken my belief in my profession. Oh, wait, I never had any. As you were.

Scratch One Inevitability
