Admin
int timediff = (NULLPTR)->getDiff((void)0);
Admin
The submitter's solution scans each dataset twice: once to get the minimum and once to get the maximum. The ex-boss' solution at least does it in one pass, so, even though it adds complexity by concatenating and unmixing the datasets, it would be faster for large datasets.
Admin
Not TRWTF, but if the lists are extraordinarily large, this O(3N) task could take a lot longer than a more optimized (but less expressive) function that avoided the use of repetitive collection enumeration.
Admin
Time, it needs time
To win back your love again
I will be there
I will be there
Admin
The ex-boss' solution sorts the list, which is O(n*log(n)) if the input isn't mostly sorted already. So for large, unsorted data sets, the submitter's solution should be faster.
That's not to say the solution couldn't be optimized if the effort is required. One could find both the max and min of each series in one pass.
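For what it's worth, a minimal sketch of that one-pass min/max (assuming plain comparable timestamps and a non-empty series; the names are mine, not the article's):

    def min_max_one_pass(series):
        """Return (minimum, maximum) of a non-empty series in a single pass."""
        it = iter(series)
        lo = hi = next(it)
        for t in it:
            if t < lo:
                lo = t
            elif t > hi:
                hi = t
        return lo, hi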
Admin
Also, nothing was said about how often this will be done. If you do it once a day, the time doesn't matter; if you do it every 10 ms with new data, then it might. In the former case I prefer the version that's simpler to understand; in the latter you might want something more performant, even if it's harder to read.
Admin
Yes, it's an easy trap to fall into, building a pass-based system. While doing everything in a single pass, or in as few passes as possible, may sometimes be necessary (remember, folks: walking memory is expensive, typically more so than the computations proper), here it is clearly pointless given the mixing and sorting passes already in the boss's solution…
Meanwhile, with a pass-based system you need to maintain non-obvious state, which you have to evolve in even weirder ways for even minor requirement changes, and you run a high risk of forgetting edge cases, mishandling sentinel values, and introducing yet other performance pathologies by giving up on well-tested library functions like min() and max(). Start with the expressive version; any deviation from it should be self-evidently justified.
Admin
The real meaning of the story: bad programming leads to alcoholism, opioid addiction, extramarital sex, disease, and early death.
Admin
The EB's code is 3-pass: merge, sort, process.
And sorting is O(n) best case (to verify it's already sorted).
And each step in the processing loop is more expensive than in a min/max function.
Definitely much slower in total!
Admin
The old code surely needs time. But now the wind of change blows straight into the face of time!
Oh well, I need a holiday. -- Whoa, is reCAPTCHA reading my comment before posting? This time, it wanted me to identify palm trees, which I'd never had before.
Admin
TRWTF is the article's description of the requirement. If it weren't for the following paragraph describing the intended solution, I'd have thought "how much overlap" meant the size of the intersection of the two sets.
Admin
That was my thought as well - "how much overlap" is rather vague. Overlap between the overall time interval described by each set, overlap between discrete time intervals within each set, intersection of individual data points... there are numerous ways to interpret the problem as stated, so it's not surprising that you'd get so many different answers.
Of course, that leads to another great interview exercise: is the candidate smart enough to ask for clarification of the requirements before setting off to design a solution?
Admin
This reminds me of a self-CodeSOD moment at my job. I was working on a template and needed a way to display something only under certain conditions. What I needed (and what's in the code now) was a simple OR of two variables, but what I came up with first was some crazy series of NOTs and ANDs.
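I don't remember the exact shape of the original, but the classic form of that trap is De Morgan's law in disguise; roughly (variable names invented for illustration):

    # the tangle of NOTs and ANDs...
    show_section = not (not is_admin and not is_owner)
    # ...which De Morgan's law reduces to a simple OR:
    show_section = is_admin or is_owner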
Admin
"given two unordered sets of timestamps, calculate how much overlap (if any) is between the two series. "
reads to me as: for i in set1.length print set1[i] - set2[i]
In other words, needs more specifications.
Admin
The problem as stated violates Special Relativity: there is no simultaneity and thus no way to compare Start and Stop times. (yes I'm joking, ffs!)
Admin
With SpectateSwamp Date Search, you sort the dates in your head and no printing required.
(I'll get me coat)
Admin
Agreed, that was exactly what I was thinking when I saw the word "overlap". Just goes to show how much detail you need to include when specifying requirements. It might seem overkill at times, but I would rather have overkill than ambiguity.
Admin
I don't think the solution is so obvious that it's right to judge someone for not immediately arriving at the simplest solution. It's like a puzzle toy - if you've never seen the solution, there's no training or ability that can prepare you to readily find it, other than having solved that particular puzzle before. Such problems are a terrible basis on which to judge a job candidate.
/why yes, I did write a bad date-overlap check function before 'discovering' the simple two-comparison method during later debugging.
Admin
... and that would be bad why?
Admin
The interesting thing here is that if you take the time to properly spec the problem the right solution just kind of falls out. 'What do you mean by overlap? How many points have the same values?' 'No, you just have two time ranges with a random number of points and you want to see if the ranges overlap in time.' Now if you take the time to just draw a simple example (always a good idea) it's immediately obvious that only the end times in a range matter at all.
This is less about testing your puzzle solving ability than it is about testing whether you take the time to spec and design at all - or whether you just dive in feet first and start coding before thinking (the usual approach).
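Sketched in code (assuming numeric timestamps such as epoch seconds; the names are mine, not the article's), the whole thing collapses to a couple of lines:

    def range_overlap(series_a, series_b):
        """How much the time ranges spanned by two series overlap."""
        latest_start = max(min(series_a), min(series_b))
        earliest_end = min(max(series_a), max(series_b))
        return max(earliest_end - latest_start, 0)  # 0 means no overlap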
Admin
Who said it would be? To each his own.
Admin
What's the answer supposed to be if one set's smallest and largest values are entirely within the other set's smallest and largest values?
Admin
Been there, done that. Twice. Or perhaps once but really long?
Admin
Still loving you?
I'd have thought Always Somewhere fitted this better?
Admin
The solution still applies, and the result is the smaller set.
Admin
It's not just how often it's run. The time it takes to process the less efficient algorithm versus the more efficient algorithm also matters.
Say we only run it once a day, but there are a bajillion data points in each set, and the slower method takes 30 hours while the faster one takes an hour. In that case, using the slower method is absolutely insane; it takes longer to run than collecting the data did.
But if there's a maximum of 5 data points in each set ever, and you're using anything at least as sophisticated as an 8086, you could run it once a second and it wouldn't matter which algorithm you use.
The real size of the set is probably somewhere between those two and much closer to the latter case than the former. But as far as I'm aware, we don't know.
Admin
Reads post, juggling around how I'd do it...
".....but Joel saw a lot of candidates overthink the problem. They would sort the lists...."
Oh sonnafa-*! In my defense, I like readability in my code, and it kinda depends on what type the timestamps are. If they're epoch seconds, just min/max the list and get it over with. If they're DateTime objects, then depending on the language you might end up having to sort them to actually get the first/last one. :S :P
The article reminds me of a fun time in college where I wrote an awesome little function in ASP 3.0 to figure out which checkboxes were checked on a page. The entire group was pretty impressed, but then I realised that I had essentially re-implemented the split function.. ^_^
Admin
All that min/max requires is a comparator, and sort requires that too. (And sort still has higher complexity than min/max, so especially if the comparator is more complicated, that favours min/max even more.) It would be a strange language that has a DateTime comparator that it uses internally for sorting, but does not expose it. Not saying that strange languages don't exist ...
And how would sorting improve readability? You'd basically replace min() with sort().first (likewise with max), so the code only gets longer. (Regardless whether you prefer a compact expression as suggested in the article, or do the same with lots of temporary variables.)
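In Python, for instance, datetime objects compare directly, so min() and max() work on them with no key= and no sorting; a quick sketch (not the article's code; the clock times are from the article's example, the date is made up):

    from datetime import datetime

    timestamps = [
        datetime(2018, 5, 4, 1, 9, 42),
        datetime(2018, 5, 4, 1, 8, 1),
        datetime(2018, 5, 4, 1, 8, 55),
    ]

    earliest = min(timestamps)  # datetime defines <, so no comparator needed
    latest = max(timestamps)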
Admin
Also, what about times crossing one or more midnights?
Admin
"really only need the first and last elements of the list." Just FYI, that advice conflicts with the condition given of "two unordered sets". If they're unordered, you can't just use the first and last elements.
Admin
How would one go about finding a source of timestamps that you cared about that didn't start sorted?
Admin
My first interpretation was „what even is the overlap between sets of timestamps?“, since it isn't clear from the specification as stated that the relevant range runs from the minimum timestamp to the maximum timestamp. I wonder if the setting misled the applicants into thinking they were expected, above all, to provide a solution quickly.
Application questions are tricky I guess. How much does it say about a candidate if they haven’t memorized the API? Probably nothing, though a Python applicant can probably be expected to know the min() and max() functions.
On the other hand, I somewhat botched an interview once partly because I come from a background in science, where people don't normally use SDKs, and thus I had never heard it pronounced (in English, anyway). Having always read abbreviations unconsciously with German pronunciation, I must have looked briefly confused at „ess dee kay“, since I only knew it as „ess deh kaa“ and misheard the abbreviation as an entirely different letter sequence. As an interviewer, how do you judge such reactions?
I facepalm, though, when companies send HR people with some „paper programming“ quiz to a recruitment event.
Admin
The following is not always correct: "subtract the later start time (01:08:01) from the earlier end time (01:09:42) to get the overlap (01:09:42 - 01:08:01 = 00:01:41)". It only works if the earlier end time is before the later end time. It does not work if one interval is completely inside the other, and as such the overlap is the smaller interval.
Admin
If the granularity isn't too fine and the time range isn't too large, how about just an array of bits? Build the array from the first list, scan it with the second, and you're done in two linear passes.
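A rough sketch of one reading of that (assuming integer timestamps at a known granularity within [lo, hi], and taking "overlap" to mean timestamps present in both series; the names and parameters are mine):

    def overlap_bitmap(series_a, series_b, lo, hi):
        """Count timestamps common to both series, with one flag per slot in [lo, hi]."""
        hit = bytearray(hi - lo + 1)   # the "array of bits" (one byte per slot here)
        for t in series_a:             # first linear pass: mark series_a
            hit[t - lo] = 1
        return sum(hit[t - lo] for t in series_b)   # second pass: scan with series_b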
Admin
I think you may be misunderstanding the meaning of "earlier" and "later". Under what circumstances do you think the earlier end time might be after the later end time?
If interval A is inside interval B, then the earlier end time is interval A's end time, and the later start time is interval A's start time, so subtracting the latter from the former indeed gives you interval A as expected.
Admin
Okay, yes, it would be more efficient to only loop through the list once. But come on, if we really cared that much about efficiency, we wouldn't be writing in Python.
Admin
O(3N) is literally O(N). They're the same thing. So is O(124315N + 7) for that matter. That's how the O notation works.
The former boss's code is garbage, and that's all there is to it. Even assuming the sort somehow always runs in O(n) rather than O(n log n) because labeled_timestamps are always guaranteed to be just that nice, and even assuming that the pass itself is the dominant cost and so should ideally be done only once, you can just track the minimum and maximum as you go and get both in a single pass, while avoiding all the other guff.