• NULLPTR (unregistered)


  • Anon (unregistered)

    I had the misfortune of picking up an ancient product where, at the time I joined, one of the original programmers was still there. The code written by this programmer was littered with lines like:

    i = 0
    i = SomeFunction(blah)

    In fact, every time a function was called and the result placed into a variable, there would be a preceding line that pointlessly set that variable to 0 or empty.

    When I asked the original programmer why on earth he added all those lines, the answer was "in case the function fails".


  • Anon (unregistered) in reply to Anon

    Lost the formatting of the lines of code of course in my previous comment.

  • bvs23bkv33 (unregistered)

    warning: unreachable code

  • Lothar (unregistered)

    If you rm a file, all file handles that are held by applications still write into the file that no longer exists in the file system. So no new file with new log entries will pop up until you restart the application. Doing the echo-trick avoids that, the file simply "shrinks" back to 0 and new log entries of currently running applications show up without the need of restarting them.
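    A minimal sketch of that behaviour (hypothetical file name app.log; the shell's own fd 3 stands in for the application's open handle):

```shell
# An "application" holding the log open for append on fd 3:
exec 3>>app.log
echo "before rm" >&3
rm app.log                   # the name is gone from the file system...
echo "after rm" >&3          # ...but writes through fd 3 still succeed
[ -e app.log ]; rm_gone=$?   # 1: no new app.log has popped up
exec 3>&-                    # only now is the space actually freed
```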

  • (nodebb)

    If you rm a file, all file handles that are held by applications still write into the file that no longer exists in the file system.

    The file still exists, but it has no names. It's a subtle distinction that doesn't exist for some filesystem types, notably the various flavours of FAT.

  • Foo AKA Fooo (unregistered) in reply to Lothar

    Exactly, and if the application opened the file with the "append" option (which it should do for log files), it works just right.

    Likewise, log-watchers (if only a simple "tail -f") will recognize file truncation easily, whereas recognizing deletion-and-recreation takes some extra effort.

    The only slightly strange thing here is catting /dev/null. I'd do ": > json.log" (where ":" is the null command) -- or, you know, just use logrotate which exists for this purpose.

  • Foo AKA Fooo (unregistered) in reply to Anon

    It would actually be correct if the function does not exit regularly, but by throwing an exception (or given that it's ancient, perhaps by doing a longjmp), and the catching code still relies on the value of "i". But to me those would be rather strange circumstances, not something to be done on every function call.

  • -to- (unregistered) in reply to Foo AKA Fooo

    In other words, yet another useless use of cat !

  • 'Nony'Mouse (unregistered)

    Of course, if you wanted to express this more clearly, you could use 'truncate', but is it available? The 'cat' idiom has been working in various shells across a bunch of different uni'ces. But I'm just showing my age, I guess...
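    The idioms mentioned in this thread, side by side (hypothetical file name json.log; the `truncate` availability concern above is why it's guarded here):

```shell
echo "lots of old log data" > json.log

cat /dev/null > json.log     # the idiom from the article
: > json.log                 # null command plus redirection
> json.log                   # bare redirection; fine in POSIX sh, not in csh
command -v truncate >/dev/null && truncate -s 0 json.log  # GNU/BSD only

wc -c < json.log             # the file is now 0 bytes
```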

  • Schroedinger's Dog (unregistered)

    cat /dev/null > somehugefile? That takes me back to my Sysadmin days (Solaris servers, running mostly Oracle databases).

    If a disk volume filled up with log messages to the point where processes were blocked by their inability to write anything to the volume in question, this trick let us win back space and get things going again without having to kill any of the processes involved, even while they were holding on to the log files.

    It meant the difference between restarting a productive DB and dealing with the fallout, and gaining enough time to analyse the problem. Ideally, copy away some of the log file somewhere else first for analysis.

  • (nodebb) in reply to Foo AKA Fooo

    But that won't work if the declaration/first assignment is immediately before the second assignment. Unless the declaration is before a try and the second assignment is in the try, the variable falls out of scope wherever the exception is handled.

  • Anon (unregistered) in reply to Foo AKA Fooo

    This was in a basic language that didn't (at the time) even support exceptions. So there were no thrown exceptions, no try blocks, no longjmps. (If something did go wrong with a run-time error, the program would simply crash.)

  • (nodebb) in reply to Steve_The_Cynic

    The file still exists, but it has no names. It's a subtle distinction that doesn't exist for some filesystem types, notably the various flavours of FAT.

    No, the file no longer exists. But the handle does.

    Addendum 2020-02-06 09:53: The "correct" way to do this would be:

    echo "" > json.log

  • America (the band) (unregistered) in reply to Steve_The_Cynic

    I wrote to a log in a file with no name it felt good to write out the null in the log, you can't remember too much 'cause there ain't no one for to give you logrotate

  • 🤷 (unregistered) in reply to Anon

    He should've set i = 0 AFTER the function call. Because, if the function call fails, then i will have a definite value...

    PS: Since it's sometimes a bit difficult to tell whether one is joking or not: I am not serious.

  • James (unregistered)

    Regardless, I’ve learned a new way to empty a file.

    Catting the null device is normally for file creation after all.

  • ooOOooGa (unregistered)

    cat is probably the most abused utility program in all of *nix.

    Its stated purpose is to combine multiple text files together. Hence the name 'concatenate' - shortened to 'cat'.

    But its most common practical use is to print files to the terminal output. It actually isn't the only program that can do that. But it may be one of the easiest and most reliable.

    Some other options for printing out the contents of 'data.txt':

    sed '' < data.txt

    grep '' data.txt

    php data.txt # as long as data.txt doesn't contain any '<?php' or '<?=' character strings.

  • (nodebb)

    @Bananafish That will write a newline to the file.

  • (nodebb) in reply to Foo AKA Fooo

    The minimalist way to truncate a file is "> filename". A null command is implied, just like it's implied when you just hit Enter on a blank line.

    Whether (on UNIX derivatives) a file exists when it's been deleted and is still open is a philosophical matter. Practically, it still occupies disk space and can be read and written normally, but has no names. It can be accessed only by processes that have it open.

  • (nodebb) in reply to Bananafish

    No, the file no longer exists. But the handle does.

    The file (an inode and some data potentially occupying space in the filesystem) still exists, but there are no "name-to-inode" links (better known as "directory entries") that allow anyone except the existing open file descriptor(s) to find it. As I said, it's a subtle distinction, but the correctness of what I said is revealed if the kernel panics after the last name is deleted but before the last file descriptor is closed: look in lost+found/ of a suitable file system afterwards. (Effsick might create a filename in there for the still-open file, although I haven't seen that on an ext3 or ext4 filesystem. ext2 did it a lot, probably because of the lack of journalling.)

  • guest (unregistered)

    cat /dev/null >somefile.log is not a reliable way to truncate a file. After all, can you be sure /dev/null is the device file you think it is? I mean, there could be aliens in your system turning it into a regular file...

    (Yep, happened to me. After the second time I set up a watch on /dev/null to catch whatever was doing it. Of course it turned out to be a heisenbug.)

  • I dunno LOL ¯\(°_o)/¯ (unregistered)

    Regarding what happens if you use 'rm' under *nix: open file handles will stay open, and the file will continue to be written to and to occupy disk space until it is closed.

    Once many years ago (15+ years) I had a Linux box that got pwned. The kiddies had bothered to rm the log files, but left the logger running. So I just shut down the machine hard (big red switch), pulled the hard drives, and looked for the logs on another computer. Gave all the info I had, including the /tmp files I found, to some computer security guys I worked with, found out it was a new kind of sploit kit.

  • (nodebb)

    There are two hard problems in computer science: naming things, cache expiration, and off-by-one-errors.

    And scope creep.

  • Officer Johnny Holzkopf (unregistered) in reply to Steve_The_Cynic

    It's worth noting that the unlink command (partially equivalent to rm) and the underlying unlink() system call do exactly what they say: they destroy the connection between a file name and an inode number and its properties (location, size, access permissions, other properties), thereby unlinking them. This leaves existing access through a file handle aside - as long as a file is still open, references to it will still work, no matter whether there is still a connection to a valid file name. Only when this file handle is dropped does the file really cease to exist (even though its data will probably still be on the disk, if the place of storage is a typical hard disk). I won't go into detail about what it means to delete files in the non-PC regions of computing (for example some mainframe OS - file - data set - storage management - uncatalog - scratch - you name it, it might be complicated).

  • Foo AKA Fooo (unregistered) in reply to I dunno LOL ¯\(°_o)/¯

    If the logger is still running and has the log files opened, you can access them via /proc/$PID/fd/$FD. So I'd copy them from there first (preferably to an external medium or over the network), even before a panic shutdown. (Yes, I'll send this message to ca. 2000, you're welcome.)
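    A sketch of that recovery path (Linux-specific /proc; hypothetical file name, with the current shell standing in for the still-running logger):

```shell
exec 3>>pwned.log               # the "logger" holds the file open on fd 3
echo "evidence line" >&3
rm pwned.log                    # attacker deletes the log...
recovered=$(cat /proc/$$/fd/3)  # ...but it is still readable via /proc
exec 3>&-
```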

  • tlhonmey (unregistered)

    Note also that on some filesystems you actually need a tiny bit of free space to delete a file, since they create the new version of the metadata on disk before removing the old. The > trick (commonly called "zapping" a file among the people I know) evades this, since it doesn't change the directory entry.

    Put that together with the disk space not actually being freed until all handles to the file are closed and it's actually a pretty handy trick since otherwise you'd have to restart the services or something to actually free the space. But proper log rotation is generally better since that way you don't get rid of all of the logs at once.

  • Erwin (unregistered)

    A file system has filled up. You find a huge log file and apply the cat /dev/null trick. The log file is now 0 bytes and the file system has ample free space again.

    But then the logger process uses seek(end_of_previous_message) before writing the next message to the log file... You now have a huge file of mostly 0x00 characters and are out of space again.

  • Foo AKA Fooo (unregistered) in reply to Erwin

    Normally, this would be a sparse file, i.e. the blocks of zeros won't be allocated and won't use space. (Of course, that's if the FS supports those; FAT probably doesn't.)

    Sure, such files are a bit awkward to handle; that's why it's a good idea for the logging process to use the append option, which causes writes to always go to the current end of file, even after truncation.
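    A sketch of the difference (hypothetical file name app.log): a file opened with ">>" gets O_APPEND, so a write after truncation lands at the new end of file instead of leaving a zero-filled gap:

```shell
exec 3>>app.log           # ">>" opens with O_APPEND
echo "entry 1" >&3
: > app.log               # someone truncates the log in place
echo "entry 2" >&3        # append mode: goes to offset 0, the current EOF
size=$(wc -c < app.log)   # 8 bytes ("entry 2" plus newline), no sparse hole
exec 3>&-
```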

  • WTFguy (unregistered)

    @Erwin. Yup. There's only so much an admin can do to defeat WTF coding like that.

    Any logger called from multi-threaded code that used the snippet you suggest would also need to interlock on the EOF pointer. So sorta unlikely that code would be written in the first place or written correctly if attempted.

    OTOH, in the typical world of log-to-file apps and the later use of the logs, how often would dropped messages be noticed? Most testing won't drive the logging rate high enough for long enough to trigger the race condition with non-interlocked updates often enough to be noticed.

  • the blob (unregistered)

    So TRWTF is that we (including Remy) actually learnt something from a TDWTF article. Huh.

  • (nodebb) in reply to Anon

    i = 0
    i = SomeFunction(blah)

    In fact, every time a function was called and the result placed into a variable, there would be a preceding line that pointlessly set that variable to 0 or empty.

    When I asked the original programmer why on earth he added all those lines, the answer was "in case the function fails".

    I sometimes do something like that, except with a reason of "in case the function containing these two lines fails and bails out before it reaches the SomeFunction call", presumably due to new code that later gets added between those two lines. In particular, if the value of an uninitialized variable could match a legit return value of SomeFunction, then you need to pick a different variable to indicate "things unexpectedly went south and this failed to get changed".

    Granted, the original example and explanation does reek of "in case the core language drops the ball in some absurdly drastic and improbable way".

    Addendum 2020-02-07 11:55: "pick a different value", rather

  • medievalist (unregistered) in reply to ooOOooGa

    A minor correction: "cat" is short for catenate, not for CONcatenate. But the two words are synonyms so the distinction is not very important.

    The cat command takes sequential data that is stored in some random number of blocks and catenates it to standard out.

    Or at least, that's how the Bell Labs boys explained it to me, back when dinosaurs roamed the earth.

  • Foo AKA Fooo (unregistered) in reply to WTFguy

    Don't know what you mean by "EOF pointer", maybe file position or something. Anyway, with the append option set, that's not necessary -- the OS will do each write atomically at the end. You just need to write each logging item at once, not in several parts.

  • Worf (unregistered)

    Actually the second problem is not "cache expiration" it's "cache coherency". Expiration is merely one way to ensure coherency. In a multiprocessor, the caches may choose to expire lines in the other cache, or some may choose to send the updated cache line to the other cache. Others simply demote the cache line to a lower level cache that may be shared (e.g., L2 to L3 cache), which works especially well for exclusive caches (where each level does not contain the same cache line as another level, so L1 will never have anything also in L2 or L3, etc).

    Outside of a multiprocessor, as the need for a cache goes up, the difficulty in maintaining coherency goes up as well. Think of two nodes connected by a bad network. For performance reasons you cache requests between the nodes because it can take an arbitrarily long time to fulfill any request. But the local cache needs to be maintained in coherency with the remote resource so what do you do? If you interrogate the remote resource, you slow down access since now every cache request is gated by the remote resource, but you do speed up in the case of no changes to the remote resource where the cached copy can be returned.

  • David (unregistered) in reply to Foo AKA Fooo

    I'd do ": > json.log" (where ":" is the null command)

    In bash you don't even need the null command. Just ">json.log" is enough. Yeah, one should be careful here, so as not to inadvertently lose a file to a typo.

  • guilty 1 (unregistered)

    In addition to all that has already been said about the cat /dev/null > hugefile.log thing, it has an additional advantage over rm-ing the file: It keeps the original file owner and permissions. Not that this should matter in normal circumstances, but at least it's one less source of triggering WTFs.

  • (nodebb) in reply to Foo AKA Fooo

    Even though I now learned that : is just the null command, I'm a bit scared to use it. This may be related to this little shell nugget I read a long time ago:

    :(){ :&:;};:

    (Don't run this, and don't say I didn't warn you… and if you do feel foolishly adventurous, at least precede it with a ulimit -u <some_sensible_number>.)
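    For the curious, here is the same structure with a readable name instead of ":" (defanged: defining the function is harmless, and the call that sets it off is deliberately left out):

```shell
bomb() {
    bomb &    # start a background copy of ourselves...
    bomb      # ...and a foreground one; each copy repeats this
}
# The original one-liner ends by invoking the function; this sketch does not.
```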

  • Hans Liss (unregistered)

    "cat /dev/null > json.log" is unneccesarily complicated. The canonical way to empty json.log "in place" in *nix is


Leave a comment on “On the Hard Problems”
