The Daily WTF: Curious Perversions in Information Technology

Maciejasjmj · 2015-05-26 Reply Admin

Creating a pointer to the last byte of a c-string and walking backwards, depositing characters from the original string as you go, may win some points for creativity, but it's definitely not code you want to see in your iPad app.

Uh... isn't that the simplest and most obvious way to reverse a string, aside from library functions? I'm no ObjC wizard, but I see no real WTF here. Maybe the guy simply thought "well, it can be solved by a library one-liner, but since that's the whole task, maybe they want me to show actual effort".

Onyx · 2015-05-26 Reply Admin

You could do swapping, I guess? Would save some memory?

Then again, if you're reversing a string long enough for it to matter, what the hell are you even doing? Simulating RTL in the most stupid way ever?

accalia · 2015-05-26 Reply Admin

Onyx:
Then again, if you're reversing a string long enough for it to matter, what the hell are you even doing? Simulating RTL in the most stupid way ever?

there's a point....

"reverse this string for me"

"ok: U+202E+inputString"

Maciejasjmj · 2015-05-26 Reply Admin

Onyx:
You could do swapping, I guess? Would save some memory?

Only if you're told to reverse in-place, which might or might not be the case.

Onyx:
Then again, if you're reversing a string long enough for it to matter, what the hell are you even doing? Simulating RTL in the most stupid way ever?

Doing a coding challenge for an interview. They don't have to make sense, you know.

Onyx · 2015-05-26 Reply Admin

Maciejasjmj:
Doing a coding challenge for an interview. They don't have to make sense, you know.

Indeed. But then again:

it's definitely not code you want to see in your iPad app.

If it's just about that, what's with the comment? As in, it's expected to be used, then?

JBert · 2015-05-26 Reply Admin

Maybe we should ask @Yamikuronue if she remembers what the original text said.

Yamikuronue · 2015-05-26 Reply Admin

JBert:
what the original text said.

It explicitly called this out as an interview question

Maciejasjmj:
isn't that the simplest and most obvious way to reverse a string

The "correct" way of doing this, according to the submitter, is to create an NSMutableString and append substrings in reverse order to it. I don't know Objective-C, but dropping to C-style pointer arithmetic is TR :wtf: here.

Maciejasjmj · 2015-05-27 Reply Admin

Yamikuronue:
The "correct" way of doing this, according to the submitter, is to create an NSMutableString and append substrings in reverse order to it. I don't know Objective-C, but dropping to C-style pointer arithmetic is TR here.

Does it handle Unicode or otherwise wide-chars? If not... does it matter?

mjmilan1 · 2015-05-27 Reply Admin

Hang on though... The candidate has demonstrated pointer arithmetic and string encoding. He/she has now passed my "Is this person worth talking to" test...

I'd be tempted to have him redo the test to reverse a list of integers, and then in the interview ask him to justify his original approach to the problem with strings... You might well learn something interesting about the candidate and their approach.

Too many people look on interviews as a chance to simply reject people who do not think as they do, and this is a major mistake. I don't expect the developers I work with to be clones of me, but I do expect them to be able to justify themselves under examination...

flabdablet · 2015-05-27 Reply Admin

Yamikuronue:
I don't know Objective-C, but dropping to C-style pointer arithmetic is TRWTF here.

Mainly because it mangles UTF-8 encoding. If the copy loop were tweaked to deal with that, I'd expect it to run rather faster than a bunch of calls to a substring slicing function.

Jerome_Viveiros · 2015-05-27 Reply Admin

TRWTF is that anybody would reject a coder because they can write a loop and understand pointer arithmetic. There any many reasons to reject a candidate; being able to write code normally isn't one of them.

aliceif · 2015-05-27 Reply Admin

TRWTF is Objective-C.

That stuff looks very ugly.

JBert · 2015-05-27 Reply Admin

mjmilan1:
I'd be tempted to have him redo the test to reverse a list of integers, and then in the interview ask him to justify his original approach to the problem with strings... You might well learn something interesting about the candidate and their approach.

Exactly, though if Objective-C has some higher-level constructs he would have *quite* some explaining to do.

flabdablet:
Mainly because it mangles UTF-8 encoding. If the copy loop were tweaked to deal with that, I'd expect it to run rather faster than a bunch of calls to a substring slicing function.

The nice thing is that it immediately gives you leverage for the "there's a bug in your code" gambit and see how they react when they are proven wrong. Should they make no obvious bug, this is far harder to pull with a straight face. Meanwhile [Joel Spolsky tries to pull it anyway](http://www.joelonsoftware.com/articles/GuerrillaInterviewing3.html):

Inevitably, you will see a bug in their function. So we come to question five from my interview plan: “Are you satisfied with that code?” You may want to ask, “OK, so where’s the bug?” The quintessential Open Ended Question From Hell. All programmers make mistakes, there’s nothing wrong with that, they just have to be able to find them. With string functions in C, most college kids forget to null-terminate the new string. With almost any function, they are likely to have off-by-one errors. They will forget semicolons sometimes. Their function won’t work correctly on 0 length strings, or it will GPF if malloc fails… Very, very rarely, you will find a candidate that doesn’t have any bugs the first time. In this case, this question is even more fun. When you say, “There’s a bug in that code,” they will review their code carefully, and then you get to see if they can be diplomatic yet firm in asserting that the code is perfect.

Los_Homerosy · 2015-05-27 Reply Admin

Maciejasjmj:

Onyx:
You could do swapping, I guess? Would save some memory?

Only if you're told to reverse in-place, which might or might not be the case.

Don't think that it is even possible to reverse in-place since NSStrings are immutable afaik.

aliceif:
TRWTF is Objective-C.
That stuff looks very ugly.

Completly agree on this. Objective-C is on of those languages, I always feel the urge to regurgitate when I see it (Perl is another one).

Dogsworth · 2015-05-27 Reply Admin

aliceif:
TRWTF is Objective-C.
That stuff looks very ugly.

I'm glad someone called out the elephant in the room.

gnasher729 · 2015-05-27 Reply Admin

A minor problem is the method name. It doesn't reverse a string, it returns a reversed string. That's why it should be called "reversedString". Also, it is a method that is sent to some object. Why would you send a message to reverse a string to another string? That's pointless. So the method should not have an "original" parameter but be just declared as

- (NSString*)reversedString;

The first malloc is completely wrong. The length method of NSString returns the length in UTF-16 units. So if there are any non-ASCII characters, length will be less than the length of the UTF-8 encoded C string and the application will crash or worse.

Without that mistake, cVersionOfReversedString ends up one char before the malloc result, which is undefined behaviour. Likely harmless, but still undefined behaviour.

The whole loop using indexes instead of pointer arithmetic would be a lot shorter, a lot more readable, would have no undefined behaviour, and probably would be more efficient.

The whole method isn't going to work if there are any non-ASCII characters, because you'd need to keep the bytes in UTF-8 code points in their original order. You should actually look for Unicode grapheme clusters and leave their byte order unchanged.

If you work with UTF-16 characters, you need to check for characters in the range 0xD800 to 0xDFFF, which are actually two parts of a Unicode character outside the basic plane. Doing everything manipulating Objective-C strings is likely expensive. I'd probably write (ignoring grapheme clusters)

- (NSString*) reversedString
{
    NSUInteger length = self.length;
    NSUInteger i, j;

    if (length <= 1)
        return [self copy];

    unichar chars [length];
    [self getCharacters:buffer range:(NSRange) { .location = 0, .length = length }];

    // Fix for characters outside basic plane      
    for (i = 0; i < length; ++i) {
        if (chars [i] >= 0xd800 && chars [i] <= 0xdfff && i + 1 < length) {
           unichar tmp = chars [i]; chars [i] = chars [i+1]; chars [i+1] = tmp;
        }
    }

    for (i  = 0, j = length - 1; i < j; ++i, --j) {
        unichar tmp = chars [i]; chars [i] = chars [j]; chars [j] = tmp;
    }
    
    return [NSString stringWithCharacters:chars length:length];
}

Yamikuronue · 2015-05-27 Reply Admin

Public service announcement: If submitted WTFs looked more like ^ that post, my job would be a lot easier and the quality of articles would improve :)

Martijn_Hoekstra · 2015-05-27 Reply Admin

I hate the reverse a string question with a passion.

TRWTF is thinking you can meaningfully reverse a string without further knowledge. Do you just want to reverse the array of code units? The grapheme clusters? It depends what you are going to do with your reversed string.

What if the requirement is for grapheme clusters but the string contains codepoint sequences that don't form a proper cluster, but do when reversed, or orphaned surrogate code units in the case of UTF-16?

The real world use-case for reversed strings tends to be nothing, so it's difficult to get an answer to what you might want to do with the result to get to understand the requirements.

And how the hell do you ask these questions on an interview with out sounding like a pedant?

Buddy · 2015-05-27 Reply Admin

Paging @antiquarian, @ben_lubar, @captain.

tarunik · 2015-05-27 Reply Admin

Maciejasjmj:
Uh... isn't that the simplest and most obvious way to reverse a string, aside from library functions? I'm no ObjC wizard, but I see no real WTF here. Maybe the guy simply thought "well, it can be solved by a library one-liner, but since that's the whole task, maybe they want me to show actual effort"

Actually, the article's code is TR :wtf: Hint: feed it a snippet from Al-Jazeera's homepage (in that news network's native language, of course).

Jerome_Viveiros:
TRWTF is that anybody would reject a coder because they can write a loop and understand pointer arithmetic. There any many reasons to reject a candidate; being able to write code normally isn't one of them.

OTOH -- I wouldn't send them packing immediately upon seeing this for a solution. I'd just show them what it did to multibyte UTF-8 chars and then let them figure out what to do from there. ;)

gnasher729:
The first malloc is completely wrong. The length method of NSString returns the length in UTF-16 units. So if there are any non-ASCII characters, length will be less than the length of the UTF-8 encoded C string and the application will crash or worse.

Talk about adding injury to insult!

gnasher729:
You should actually look for Unicode grapheme clusters and leave their byte order unchanged.

Yep.

operagost · 2015-05-27 Reply Admin

I read that entire Spolsky article. Six hours of interviews? I would refuse their offer and send them a bill for my time.

bryan986 · 2015-05-27 Reply Admin

The response to that interview question is "It depends..."

dkf · 2015-05-27 Reply Admin

tarunik:
I'd just show them what it did to multibyte UTF-8 chars and then let them figure out what to do from there.

I'd be tempted to just say “That's broken; we won't continue the interview until you can say why” to their faces and let them see if they can figure it out.

tharpa · 2015-05-27 Reply Admin

JBert:
The nice thing is that it immediately gives you leverage for the "there's a bug in your code" gambit and see how they react when they are proven wrong. Should they make no obvious bug, this is far harder to pull with a straight face.

If an interviewer lied to me, and said my code had a bug when it didn't? Unless I really needed the job, I'd get out of there. Who wants to work for a lying dirtball if you don't have to?

Khudzlin · 2015-05-27 Reply Admin

They'd better invite you for lunch as well.

anotherusername · 2015-05-27 Reply Admin

Somebody correct me if I'm wrong here... getting 1 character from a string with a fixed character width is O(1). However, UTF-8 is a variable-width encoding, so substring is always an O(n) operation because it has to scan the string to find the substring's starting index.

If you put substring in a loop, reversing a UTF-8 string is O(n²). Setting a pointer to the last byte and scanning in reverse would be O(n), and then you just have to correct the byte order of any multibyte UTF-8 characters you encounter.

dkf · 2015-05-27 Reply Admin

anotherusername:
However, UTF-8 is a variable-width encoding, so substring is always an O(n) operation because it has to scan the string to find the substring's starting index.

No, there are ways to accelerate that. Not that the code in TFA was using any of them…

anotherusername · 2015-05-27 Reply Admin

I guess you could build an index of each character's byte offset, so then you only have the overhead of scanning the string once... but does the language do that for all UTF-8 strings by default (or the first time it's needed), or can it be enabled?

If the language doesn't have that built-in, then you're basically just creating an array of pointers to characters first, then starting at the end of that and scanning it in reverse. Still the same basic approach as the applicant used.

gnasher729 · 2015-05-27 Reply Admin

UTF-8 is specially designed to avoid that problem.

Bytes of the form 0xxx xxxx are single byte characters. Bytes of the form 10xx xxxx are continuation bytes. Bytes of the form 110x xxxx are the first byte of a two byte character. Bytes of the form 1110 xxxx are the first byte of a three byte character. Bytes of the form 1111 0xxx are the first byte of a four byte character.

To get the code point, you concatenate all the x's. A UTF-8 string must never contain invalid UTF-8 characters (that's why you cannot use strncpy with UTF-8 strings because the result could end with a partial UTF-8 character. Another rule is that each codepoint must be represented in the shortest possible way. So a 7 bit character may not be written as 1100 000x 10xx xxxx.

If you scan backward, a UTF-8 character starts with the first byte that isn't of the form 10xx xxxx, so that is easy enough.

ben_lubar · 2015-05-27 Reply Admin

Combining characters make reversing a string non-trivial.

antiquarian · 2015-05-27 Reply Admin

Buddy:
Paging @antiquarian, @ben_lubar, @captain.

Standard Front Page Troll™. Nothing to see here, please move along.

dkf · 2015-05-27 Reply Admin

ben_lubar:
Combining characters make reversing a string non-trivial.

Depending on what you mean by “reversing” and whether you know the string has been transformed into NFC or not.

Any candidate who calls the interviewer out for asking them to reverse a string without specifying what happens in all those cases is probably going to be worth hiring. :D

anotherusername · 2015-05-27 Reply Admin

I'm aware of this. It doesn't change the problem any. If you want to get character n you still have to scan the string once to find its starting byte, because it's not necessarily at byte n.

Kian · 2015-05-27 Reply Admin

So, the right answer to the question is asking "First, what encoding is the string in? UTF-8, UTF-16, UTF-32, or some particular code-page? Next, do you want me to do take the brain dead approach of just reversing the bytes? That would work for plain ASCII, and it takes 10 minutes. I could do a UTF-8 aware reversal and make sure that the codepoints don't get mangled in multibyte characters. That'll take me maybe 30 minutes, after I find the documentation for the leading bytes in UTF-8. Or I can do it properly, if you get me access to a coffee machine, the UTF standards, and a week without interruptions. Which I'll bill you for."

Martijn_Hoekstra · 2015-05-27 Reply Admin

That doesn't mean that to find the n'th character, you don't have to inspect n-1 characters first.

And you still may have to keep grapheme clusters together. And not accidentally create new grapheme clusters in the reversed string if there were non in the original string (this could happen with Korean characters for example, if an isolated vowel is followed by an isolated initial; for that case I can't think of any reasonable definition of reversing at all)

anotherusername · 2015-05-27 Reply Admin

Martijn_Hoekstra:
And not accidentally create new grapheme clusters in the reversed string if there were non in the original string (this could happen with Korean characters for example, if an isolated vowel is followed by an isolated initial; for that case I can't think of any reasonable definition of reversing at all)

Can you give an example (hex codes) of how this could happen?

CodeSlave · 2015-05-27 Reply Admin

As I recall; every non-trivial program has at least one bug in it. And the candidates understanding of the requirements may differ slightly from the interviewer's understanding of the requirements. The interview could even superstitiously flip their view around (e.g., "when fiz-buzz get's to 15, shouldn't it printout buzz-fiz instead?" - there was probably ambiguity in the requirements)

In the real world this is often treated as a bug (even though to the letter of the contract the code is correct - it still needs to be fixed, it's just a question of who pays for it).

Any morality problems can be resolved by changing the question to: "Are thee any bugs in your code? Or what could you do to improve it?"

Part of it is also seeing how they developer is going to react to a challenge to their work (legitimate or not). You don't want a developer who's going to say to a client that reports a bug (that's not really bug, perhaps just an unexpected but correct emergence of a complex system). Are they gong to scream at them and call them a lying sack of $#%@# $% or are they going to work the problem with the client?

gnasher729 · 2015-05-27 Reply Admin

A trivial case would be a single umlaut or accent, followed by a vowel, like ¨a or `e. These are two characters. But reverse them and you get ä or è. Single character, even if there are two code points. Reverse again, and it stays ä or è. So reversing twice doesn't produce the original string.

gnasher729 · 2015-05-27 Reply Admin

The point is that you just don't care what character n is. You care about "the character starting at byte n".

Martijn_Hoekstra · 2015-05-27 Reply Admin

anotherusername:

Martijn_Hoekstra:
And not accidentally create new grapheme clusters in the reversed string if there were non in the original string (this could happen with Korean characters for example, if an isolated vowel is followed by an isolated initial; for that case I can't think of any reasonable definition of reversing at all)

Can you give an example (hex codes) of how this could happen?

basically, any vowel-consonant combination in U+1100 to U+11FF.

For example, u+1161 isᅡand u+1100 is ᄀ The codepoint sequence u+1161u+1100 are two grapheme clusters: ᅡᄀ But reverse them as u+1100u+1161 and they become a single grapheme cluster: 가

narbat · 2015-05-27 Reply Admin

Martijn_Hoekstra:
And how the hell do you ask these questions on an interview with out sounding like a pedant?

Actually, if I were giving this exercise in an interview I'd be doing it either because I specifically wanted the candidate to ask those questions, or because I myself was unaware of the problem. Either way asking the questions would score points with me. Maybe that shows that I'm biased towards hiring pedants...

Captain · 2015-05-27 Reply Admin

It depends on the data structure used to implement strings, but I wouldn't typically suggest a functional approach to reversing a string, even though it's easy to write. The problem, basically, is that the a functional implementation must physically rebuild the string, in reverse. But what you really want is to keep the string as is, and look at it backwards. I.e., you want to traverse the string in reverse. This may or may not be possible, depending on the data structure (for one thing, it needs to know its own length, and must not be recursive)

dkf · 2015-05-27 Reply Admin

That'd potentially be a good approach, especially as part of a general system of stacked string transformers, but it does mean you have to have some way to simplify the stack of such operations so that you're not keeping around data that is unreachable or getting a build up of bunch of inverses of each other or effective-noops. Getting that right so that it's both nice and efficient is really quite a lot harder than it appears to be at first glance (and it's important; strings often get really long in real-world code).

Captain · 2015-05-27 Reply Admin

Getting that right so that it's both nice and efficient is really quite a lot harder than it appears to be at first glance

Yes it is. It's literally the word problem, and it's unsolvable in general (though it is potentially solvable for the specific group of operations on strings)

Protoman · 2015-05-27 Reply Admin

There's more weird Unicode edge cases, in addition to multibyte chars in UTF-8, surrogates in UTF-16, grapheme clusters, and combining characters. There's also things like left-to-right marks and right-to-left marks (LRM and RLM), involved in the bidirectional algorithm. Not quite sure if the naive string reversal algorithm handles those correctly, or if it would need to add, remove, or more any LRMs/RLMs.

Then there's characters which have different initial, medial, and final forms, like in Hebrew and Arabic, or the Greek final sigma. If we reverse a string containing an initial form, should that change into a final form, and vice-versa? The answer depends on your requirements. Of course, since reversing a string makes no sense linguistically, we probably don't care about if the result makes sense, so we should probably just naively preserve the forms.

Ragnax · 2015-05-27 Reply Admin

Martijn_Hoekstra:
I hate the reverse a string question with a passion.
TRWTF is thinking you can meaningfully reverse a string without further knowledge. Do you just want to reverse the array of code units? The grapheme clusters? It depends what you are going to do with your reversed string.

So? You ask them: "Is that in Unicode or basic 1 byte ASCII? Are we reversing by code units or grapheme clusters?" And then they either admit they don't know and you get to enlighten them and explain why it makes a difference ... or they pick one and you explain "well in that case..."

Martijn_Hoekstra · 2015-05-27 Reply Admin

Ragnax:
admit they don't know and you get to enlighten them and explain

I don't think this is the goal of a job interview, and something you'd rather want to avoid

Harry_Johnston · 2015-05-27 Reply Admin

Yamikuronue:
The "correct" way of doing this, according to the submitter, is to create an NSMutableString and append substrings in reverse order to it.

Seriously? How very inefficient.

Ragnax · 2015-05-27 Reply Admin

Martijn_Hoekstra:
I don't think this is the goal of a job interview, and something you'd rather want to avoid

Depends on how you bring this to table.

If you act like a pompous dick about it, you'll get shot/shut down for sure. On the other hand, you could politely ask about remaining requirements because the question has some ambiguity to it that could affect the way you'd choose to solve the problem.

If they then burn you for that; then you have a reason to not want to work for said company. Interviews go both ways, remember?

FrostCat · 2015-05-27 Reply Admin

Maciejasjmj:
Uh... isn't that the simplest and most obvious way to reverse a string, aside from library functions?

I didn't bother to read the code, but if you're reversing bytes and you wind up dealing with variable-length characters, you're going to have a bad day. (This probably doesn't apply to iOS, but nonetheless, it's indicative, possibly, of a poor thought process.)

Reversing the String, Belaboring the Point

Leave a comment on “Reversing the String, Belaboring the Point”