• n@ (unregistered)

    tFsir

  • (nodebb)

    Random comment held for moderation.

  • bobcat (unregistered)

    Frist! (but sorted randomly)

  • (nodebb)

    The problematic place... he needed to spend time doing something so he could spend time "speeding up the application" later. It was also a means of staying out of the problematic place of unemployment: thanks to the strange bug of random processing order in this application, he can spend days "tracking it down", then get it fixed, and his fix rate improves dramatically. It also keeps him on top of the bugs-fixed leaderboard, which was the problematic place he started from, because his code never had bugs to begin with, so his fix count was always too low.

    Companies, be careful what you wish for, for you will get it.

  • el_drafto (unregistered)

    They want to process directories in a different order each time. If processing in sequence takes a very long time you might want to do that in order not to have the result for directory B blocked by the processing of directory A over and over again. That way at least sometimes directory B is processed before work on directory A is started.

  • RLB (unregistered)

    Dunno. I can think of situations in which accessing the same directories in the same order every time, or displaying the same list in the same order every time to every user, could cause trouble. For starters, it could introduce biases. (Google Gullibility, anyone?) So I don't want to say this was a silly solution without seeing more.

    That comment, though... that definitely is silly.

  • Dan Bugglin (google)

    Funny thing, I actually have a need to process a directory of files and have been considering making it randomly ordered. I would probably do it slightly better than this, if only just.

    I have a set of 360k files and almost as many folders on a slow network share. My app processing it can remember where it left off if it crashes or otherwise has a problem, but it still has to process all the folders over again to check for files that were added, updated, or removed. Since it is currently 3/4 of the way finished, if this happens it takes forever to get back to the bulk of the files it doesn't know about yet. I have been considering making the order random, if only to make it look like it's operating faster (in theory it would be no faster or slower in the end).

  • Bill Baggins (unregistered)

    Quicksort is O(N log N) under ordinary circumstances, but O(N^2) if you hand a naive implementation (one that always pivots on the first or last element) a list that's already sorted or reverse-sorted. Maybe he's trying to avoid that?
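    A minimal sketch of that failure mode, using a deliberately naive first-element pivot (the helpers and counts here are illustrative, not from the article's code):

```python
import random

def quicksort(items, stats):
    # Naive quicksort with a first-element pivot: degrades to O(n^2)
    # comparisons on already-sorted input.
    if len(items) <= 1:
        return items
    pivot, rest = items[0], items[1:]
    stats["comparisons"] += len(rest)
    left = [x for x in rest if x < pivot]
    right = [x for x in rest if x >= pivot]
    return quicksort(left, stats) + [pivot] + quicksort(right, stats)

sorted_input = list(range(200))
shuffled_input = sorted_input[:]
random.shuffle(shuffled_input)      # a pre-shuffle defeats the worst case

worst = {"comparisons": 0}
quicksort(sorted_input, worst)      # 199 + 198 + ... + 1 = 19900 comparisons

typical = {"comparisons": 0}
quicksort(shuffled_input, typical)  # roughly n log n comparisons
```

    Shuffling the input first is one (crude) way to dodge the pathological case; picking a random pivot inside the sort is the usual fix.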

  • (nodebb)

    You'd be surprised. There are perfectly good reasons for...

    SELECT RANDOM() AS SEQ, ...
    FROM ATABLE
    ORDER BY SEQ
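    The same idea can be sketched against SQLite, whose spelling is RANDOM(); other engines use RAND() or NEWID(). The table and column names here are hypothetical:

```python
import sqlite3

# Build a throwaway in-memory table and read it back in random order.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE atable (name TEXT)")
conn.executemany("INSERT INTO atable VALUES (?)",
                 [("alice",), ("bob",), ("carol",), ("dave",)])

# Same rows every run, but in an unpredictable order.
rows = [r[0] for r in conn.execute("SELECT name FROM atable ORDER BY RANDOM()")]
```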
    

    For example, we were once assigning a new employee ID to all the employees in the organization. Because it was an ID-type column, the numbers were assigned sequentially, and we didn't want to do that for the employees from A to Z, because someone might be able to guess which numbers went with witch names.

    So for the initial conversion, we processed the employees in random order.

    Wikipedia has a "show random article" link. To do that, they assign a random number to every article and store it in the table. When you click the link, the site picks a random number, and then finds the first article having a number at or above it.

    I am sure there are other use cases.

    Addendum 2018-04-19 08:41: I don't know about dictation. It goes back and changes words sometimes a whole paragraph behind where you're dictating. Sigh. That must be another use case for random.

  • Useful shuffle (unregistered)

    The shuffle is not that dumb. Let's say you have 100 files on a DVD and the end of the DVD is corrupted, only half of it at the end could be read. For example file 93, 95, 96,97, 99 are corrupted. Your copy-program fails at file 93 after some hours. You delete the file and restart your copy program. Your copy program again reads all files for some hours to crash at file 95. You delete the file and restart and so on until all were read. With the shuffling algorithm provided, it is very likely that the files with the highest indexes are shuffled to the beginning, so your copy program would not crash after hours, but very quickly. You would save a lot of time!

  • El Dorko (unregistered) in reply to Dan Bugglin

    Jesus effin christus, how about fixing the crashing problem instead of writing tons of "clever" (as in stupid) code to circumvent it? Argh. I really, really hope you are joking...

  • Jay Kreibich (unregistered)

    It is common for databases to have a flag you can turn on to reverse or randomize row order in queries that do not explicitly define an ORDER BY. If you turn it on during testing, you can catch a LOT of bugs, because people assume return order is stable. While this seems a bit odd for production, I can still think of reasons why you might want to do this.

  • Nope (unregistered) in reply to CoyneTheDup

    "When you click the link, the site picks a random number, and then finds the first article having a number at or above it."

    That is definitely not uniformly random. If this is the case, the algorithm would show a bias towards articles that sit just after large gaps in the assigned numbers.
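    A quick simulation bears this out (the stored keys here are hypothetical): an article's chance of being picked is proportional to the gap between its key and its predecessor's, not uniform.

```python
import bisect
import random

# "Pick a random number, take the first article at or above it" scheme.
article_keys = [0.1, 0.2, 0.9]     # hypothetical stored random keys, sorted
counts = {k: 0 for k in article_keys}
rng = random.Random(42)

for _ in range(10_000):
    r = rng.random()
    i = bisect.bisect_left(article_keys, r)
    if i == len(article_keys):     # past the last key: wrap to the first
        i = 0
    counts[article_keys[i]] += 1

# The article at 0.9 sits after a 0.7-wide gap, so it wins far more often
# than the articles at 0.1 and 0.2.
```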

  • ZZartin (unregistered)

    /shrug maybe they just didn't want the folder list sorted alphabetically which was probably the default or sorted by some standard metric.

  • Phlip (unregistered)

    If I had to guess... I suspect that the "problematic places" are "." and ".." which would, in many languages, be right at the top of the list. If this function, say, recurses into subfolders, and is searching for something (and will bail out early when it finds it) then this will at least usually ensure it finds it before it gets stuck in an infinite descent...

  • Donald Knuth (unregistered)

    You pesky kids, appropriating my algorithms for nefarious purposes. Now get off my lawn!
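    The algorithm in question is presumably the Fisher-Yates shuffle, which Knuth popularized; a minimal sketch, for reference (not the article's actual code):

```python
import random

def fisher_yates(items, rng=random):
    # In-place Fisher-Yates: every one of the n! permutations is equally
    # likely, and every element appears exactly once.
    for i in range(len(items) - 1, 0, -1):
        j = rng.randrange(i + 1)   # pick from the not-yet-fixed prefix
        items[i], items[j] = items[j], items[i]
    return items

fisher_yates(["a", "b", "c", "d"])
```

    Unlike the ad-hoc loops discussed above, this never duplicates or drops an entry.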

  • Wallace Owen (unregistered)

    He was initializing a binary tree and didn't want the performance hit of constantly rebalancing the tree because the data is in-order, which is worst-case for that.

  • Grey no beard (unregistered)

    If you are partitioning data for parallel processing, you need a partition key. Sometimes you use a date so you append data to the last partition (or similar). But for scan type effects, random works better because it scatters the work across the partitions.

  • TheRealWTF(tm) (unregistered)

    @RLB I agree, there should be no bias!

    http://thedailywtf.com/articles/Happy_(Belated)_Jed_Day!

  • (nodebb)

    One of my clients is a wholesaler whose web site has a "show me nearby brick-and-mortar retailers selling this stuff" feature. I implemented it with a random tiebreaker (but seeded based on the current date, so it stays constant for a day at a time), so that equally distant retailers have an equal chance of being higher on the list on any given day.
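    The date-seeded tiebreaker can be sketched like this (names are hypothetical, not the client's actual code):

```python
import random
from datetime import date

def daily_order(items, today=None):
    # Seed the RNG with the date so ties break randomly, but the order
    # stays constant for the whole day.
    seed = (today or date.today()).toordinal()
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    return shuffled

retailers = ["Store A", "Store B", "Store C"]   # equally distant, hypothetical
daily_order(retailers)   # same result for every call made on the same day
```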

  • Sole Purpose of Visit (unregistered) in reply to RLB

    The only minor problem with your diagnosis is that, in fact, there is no seed to the random number generator. Which, as far as I know in .NET, means that you will get exactly the same "shuffled" sequence every time.

    (Somebody in the redacted comments may already have pointed this out.)

    I mean, even seeding it with a time-stamp might make it fairly random.

    As to why you would want to do that? Who knows? The delicious irony is that you are not guaranteed to index every file (well, obviously), and given the lovely accidental purity of the loop, that means that you might (a) get duplicates and (b) miss the "problematic" file.

    Of course, deleting the "problematic" file might be a better solution ... but that is an exercise best left to either the reader or to maintenance programmers at 2am.

    We live in the best of all possible worlds.
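    The duplicates-and-omissions point is easy to demonstrate. This is a hypothetical reconstruction of the suspect pattern (we don't see the original loop): drawing an independent random index for every slot instead of swapping, so some entries come out twice and others never appear.

```python
import random

rng = random.Random(1)
files = [f"file{i}" for i in range(100)]

# One independent random draw per slot -- NOT a shuffle.
picked = [files[rng.randrange(len(files))] for _ in files]

duplicates = len(picked) - len(set(picked))   # entries that repeat
missed = set(files) - set(picked)             # entries that never appear
# With n draws from n items, roughly a third of the items are never drawn.
```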

  • Sole Purpose of Visit (unregistered) in reply to emurphy

    Or you could just have used alphabetical order, possibly hashed against the date.

    Don't abuse random functions.

    Never abuse random functions.

    Apart from anything else, writing tests for the result is sheer bloody hell.

  • Sole Purpose of Visit (unregistered) in reply to ZZartin

    /shrug duplicates and omissions.

    Try a little harder.

  • foo (unregistered)

    Maybe there is a cycle in the directory structure, with "foo/bar/baz/" having a link to "foo/". Ordering randomly doesn't exactly solve the problem, but if you are doing a search then eventually you'll find what you're looking for when "foo/bar/baz/target/" gets ordered before "foo/bar/baz/foo/".

  • Zenith (unregistered)

    I think it's a fair bet that they ran into some part of the file system they didn't have access to that threw an exception.

    The Real WTF is that the Win32 API doesn't seem to have some sort of flag to ignore directories you can't get into. I understand that Windows was developed as a single-user system with this permissions stuff bolted on when NT was developed. That really should've been worked around when they put the .NET framework IO interface over top of the core libraries.

  • Anon (unregistered) in reply to CoyneTheDup

    Witch names?

    You mean like Hazel, Greta, Helga, Jadis, Glinda ... ?

    Why is it more important that somebody doesn't guess the employee ID of witches?

  • Gumpy Gus (unregistered)

    My guess is that they had file access conflicts if this code was running in several places at the same time. So how to (somewhat minimize) those conflicts? Pick a random file. It's most likely the other processes will pick another one, most of the time, if there are plenty of them. Sheesh.

  • (nodebb)

    Probably not in this case, but some sort algorithms work better on random data, rather than partially sorted data. You never know when things like this crop up.

    Then again, this is most likely some other case entirely, which should have been explained thoroughly in the comments.

  • Olivier (unregistered)

    So weird kind of load balancing maybe, so that different processes using a couple of subdirectories will not always use the two top ones.

  • FlipBurgerAdmin (unregistered) in reply to CoyneTheDup

    I don't know the Perfectly Good Reasons (TM) at your place. But a Perfectly Good Reason (TM) at my place is: "What is your employee number?" is a "Security Question (TM)" asked by admins when people forget their passwords and email a request, from some external email address, for the admins to reset it. It's a very secure method, just like those depending on you knowing "your" SSN, or birth date, or any of those hard-to-remember, easy-to-find-out, hard-to-change facts.

  • (nodebb) in reply to FlipBurgerAdmin

    Security is exactly the reason we wanted the new employee ids randomly assigned. We didn't want anyone to be able to guess, based on one employee, what an alphabetically adjacent employee's number might be.

    After the initial conversion, we figured the hire order would be random enough to prevent such guesses later on.

  • Chris (unregistered)

    Given what I've seen on this site, I wouldn't be surprised if the GetDirectories() method can return nulls or something else invalid that caused crashes when trying to access it. However, when accessed in the shuffle method, there is some operator overloading happening that replaces bad data with something else which is handled better, without anyone knowing about it. I would then not be surprised to learn that the original author had no idea about any of this (because, I mean, isn't it obvious?), but they did work out that re-ordering the directories seemed to make the crashes go away.

  • FlipBurgerAdmin (unregistered) in reply to CoyneTheDup

    I propose a mandatory Security 101 course, where the students repeat "identification numbers are not passwords" till they are enlightened.

  • Abe Z. (unregistered)

    Probably some forgotten test whether they can process some predefined files in whatever order. Or, maybe a lesson to teach other colleagues who thought files always come sorted alphabetically. Or maybe just defensive programming, something like when this guy intentionally randomly reordered fields in Json to teach API consumers a lesson: https://twitter.com/tartley/status/966287396286418945

  • doubting_poster (unregistered)

    Another reason to randomise that shit is to avoid depending on accidental orderings. We had a project that through organic growth ended up having a dependency on how maps were sorted in Java, which was fine right up till the point where Java changed its ordering. Ker-derp. Now it's randomized.

  • No Fun (unregistered) in reply to CoyneTheDup

    I can't imagine the business requirements such that it's genuinely problematic if an employee can reasonably guess another employee's ID based on last names, but it's not a problem if an employee can reasonably guess another employee's ID based on hire date.

    My general rule is that if someone who sees an ID shouldn't be able to guess a different ID, the IDs shouldn't be sequential, whatever the order.

  • (nodebb) in reply to No Fun

    Which is why I'm in favor of composite IDs, with a guaranteed-unique sequential part and a random part. Guessing the sequential part won't help you guess the random part, unless someone screwed up the latter's generation.
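    A sketch of that composite scheme: a sequential part guarantees uniqueness, a random part defeats neighbor-guessing. The ID format and widths here are hypothetical.

```python
import itertools
import secrets

_counter = itertools.count(1)   # stands in for a database sequence

def new_employee_id():
    seq = next(_counter)              # guaranteed-unique, but guessable
    rand = secrets.randbelow(10**6)   # cryptographically random, unguessable
    return f"{seq:06d}-{rand:06d}"

ids = [new_employee_id() for _ in range(3)]
```

    Using `secrets` rather than `random` matters here: knowing one ID's random half must not help predict another's.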

  • Developer Dude (google) in reply to Jay Kreibich

    That was my guess; that it was for testing purposes to test code that might make assumptions about order.

  • Appalled (unregistered)

    How about a BOM/Parts "database" that for unknowable reasons is stored in an archaic Folder/File structure. (Perhaps drawing files that they didn't wish to store as BLOB's or an archaic App that expected them that way). Now a request arrives from the QA department for a random sample of drawings to Audit once a month.

  • Worf (unregistered) in reply to Phlip

    Funny that, given that anyone who's actually traversed a directory either experiences or finds out that they do get "." and ".." in the results. (In UNIX systems, those are hard links and thus real directory entries. In DOS/Windows, they're virtual entries, but I do believe the OS returns them just the same.)

    Heck, I remember one of my computer science assignments was to load a file from disk. Mere minutes after it was assigned, a post on the class forum stated "Remember to filter out . and ..!" from someone who had tried to recurse into the directory tree.

    This method just seems like an odd way to avoid having to use strcmp() and equivalents.
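    The straightforward filter looks like this (a generic sketch over name lists; Python's own os.listdir already omits the two entries, but C's readdir and several Win32 APIs do not):

```python
def real_entries(names):
    # Drop the self/parent pseudo-entries before recursing, so a
    # directory walk can't loop back into itself.
    return [n for n in names if n not in (".", "..")]

real_entries([".", "..", "src", "docs"])   # -> ["src", "docs"]
```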

  • Warr (unregistered)

    As a general rule, if operations don't have to happen in a particular set order, I actually prefer using randomization rather than an arbitrary unspecified order. Some reasons:

    • If you're running a job periodically, and the job fails at a particular place, any work to be processed after that point will be starved for processing. Randomizing the order of the jobs will ensure that everywhere except for the specific problem area gets processed eventually, assuming that you can "catch up" in processing. Not all problems can be anticipated and prevented, or even necessarily fixed, so this makes the fallback behavior a little more robust.

    • Randomizing tends to shake out dependency-order type bugs, and prevent people from developing future dependence on an observed, but not necessarily guaranteed, order.

    • Being reliant on a particular order also implies that you are reliant on there being some order at all, i.e. not doing things in parallel.
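    The first bullet, starvation behind a stuck job, can be sketched with a toy batch runner (all names hypothetical):

```python
import random

def run_pass(jobs, done, failing, rng):
    # Process jobs in random order; abort the pass at the first failure,
    # as a fragile batch runner would.
    order = list(jobs)
    rng.shuffle(order)
    for job in order:
        if job in failing:
            return            # crash: the rest of this pass is lost
        done.add(job)

jobs = list(range(20))
failing = {3}
rng = random.Random(0)

# Fixed order: every run dies at job 3, so jobs 4..19 starve forever.
fixed_done = set(range(3))

# Random order: across repeated runs, almost everything else completes.
random_done = set()
for _ in range(10):
    run_pass(jobs, random_done, failing, rng)
```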

  • ray10k (unregistered) in reply to TheRealWTF(tm)

    Thanks for reminding me about that article. While I understand that "Result" is apparently Delphi-speak for "return", I still don't have the faintest idea how that code would ever run without getting in an infinite loop.

  • (nodebb) in reply to FlipBurgerAdmin

    I totally agree with your Security 101 course. However, it should be followed by Reality 101, "they'll use it as a password anyway," with social security number and driver's license number being the canonical examples; and Reality 102, "because identity numbers are better than name and address ... and birthdate."

  • (nodebb) in reply to No Fun

    Yes, the hire date problem did exist. But see my answer just above. We knew that, despite all common sense, the new number would be used as a password anyway (and of course it is). Given the level of security, the fact that I might guess the number of the person hired just before or after me doesn't necessarily mean I know who that is. Which matters because the number isn't used alone; it's used in combination with the name.

    If you want something more problematic to think about, think about the fact the number appears on my badge, along with my name.

  • FlipBurgerAdmin (unregistered) in reply to CoyneTheDup

    Sigh... yes, you are right, they will use it as a password anyway.

    And even if we succeed in enforcing passwords separate from ids, we still have a hunter2 society.

Leave a comment on “A Problematic Place”
