- Feature Articles
- CodeSOD
- Error'd
- Forums
-
Other Articles
- Random Article
- Other Series
- Alex's Soapbox
- Announcements
- Best of…
- Best of Email
- Best of the Sidebar
- Bring Your Own Code
- Coded Smorgasbord
- Mandatory Fun Day
- Off Topic
- Representative Line
- News Roundup
- Editor's Soapbox
- Software on the Rocks
- Souvenir Potpourri
- Sponsor Post
- Tales from the Interview
- The Daily WTF: Live
- Virtudyne
Admin
Uh... isn't that the simplest and most obvious way to reverse a string, aside from library functions? I'm no ObjC wizard, but I see no real WTF here. Maybe the guy simply thought "well, it can be solved by a library one-liner, but since that's the whole task, maybe they want me to show actual effort".
Admin
You could do swapping, I guess? Would save some memory?
Then again, if you're reversing a string long enough for it to matter, what the hell are you even doing? Simulating RTL in the most stupid way ever?
Admin
there's a point....
"reverse this string for me"
"ok: U+202E+inputString"
Admin
Only if you're told to reverse in-place, which might or might not be the case.
Doing a coding challenge for an interview. They don't have to make sense, you know.
Admin
Indeed. But then again:
If it's just about that, what's with the comment? As in, it's expected to be used, then?
Admin
Maybe we should ask @Yamikuronue if she remembers what the original text said.
Admin
It explicitly called this out as an interview question
The "correct" way of doing this, according to the submitter, is to create an NSMutableString and append substrings in reverse order to it. I don't know Objective-C, but dropping to C-style pointer arithmetic is TR :wtf: here.
Admin
Does it handle Unicode or otherwise wide-chars? If not... does it matter?
Admin
Hang on though... The candidate has demonstrated pointer arithmetic and string encoding. He/she has now passed my "Is this person worth talking to" test...
I'd be tempted to have him redo the test to reverse a list of integers, and then in the interview ask him to justify his original approach to the problem with strings... You might well learn something interesting about the candidate and their approach.
Too many people look on interviews as a chance to simply reject people who do not think as they do, and this is a major mistake. I don't expect the developers I work with to be clones of me, but I do expect them to be able to justify themselves under examination...
Admin
Mainly because it mangles UTF-8 encoding. If the copy loop were tweaked to deal with that, I'd expect it to run rather faster than a bunch of calls to a substring slicing function.
Admin
TRWTF is that anybody would reject a coder because they can write a loop and understand pointer arithmetic. There any many reasons to reject a candidate; being able to write code normally isn't one of them.
Admin
TRWTF is Objective-C.
That stuff looks very ugly.
Admin
Admin
Don't think that it is even possible to reverse in-place since NSStrings are immutable afaik.
Completly agree on this. Objective-C is on of those languages, I always feel the urge to regurgitate when I see it (Perl is another one).
Admin
I'm glad someone called out the elephant in the room.
Admin
A minor problem is the method name. It doesn't reverse a string, it returns a reversed string. That's why it should be called "reversedString". Also, it is a method that is sent to some object. Why would you send a message to reverse a string to another string? That's pointless. So the method should not have an "original" parameter but be just declared as
The first malloc is completely wrong. The length method of NSString returns the length in UTF-16 units. So if there are any non-ASCII characters, length will be less than the length of the UTF-8 encoded C string and the application will crash or worse.
Without that mistake, cVersionOfReversedString ends up one char before the malloc result, which is undefined behaviour. Likely harmless, but still undefined behaviour.
The whole loop using indexes instead of pointer arithmetic would be a lot shorter, a lot more readable, would have no undefined behaviour, and probably would be more efficient.
The whole method isn't going to work if there are any non-ASCII characters, because you'd need to keep the bytes in UTF-8 code points in their original order. You should actually look for Unicode grapheme clusters and leave their byte order unchanged.
If you work with UTF-16 characters, you need to check for characters in the range 0xD800 to 0xDFFF, which are actually two parts of a Unicode character outside the basic plane. Doing everything manipulating Objective-C strings is likely expensive. I'd probably write (ignoring grapheme clusters)
Admin
Public service announcement: If submitted WTFs looked more like ^ that post, my job would be a lot easier and the quality of articles would improve :)
Admin
I hate the reverse a string question with a passion.
TRWTF is thinking you can meaningfully reverse a string without further knowledge. Do you just want to reverse the array of code units? The grapheme clusters? It depends what you are going to do with your reversed string.
What if the requirement is for grapheme clusters but the string contains codepoint sequences that don't form a proper cluster, but do when reversed, or orphaned surrogate code units in the case of UTF-16?
The real world use-case for reversed strings tends to be nothing, so it's difficult to get an answer to what you might want to do with the result to get to understand the requirements.
And how the hell do you ask these questions on an interview with out sounding like a pedant?
Admin
Paging @antiquarian, @ben_lubar, @captain.
Admin
Actually, the article's code is TR :wtf: Hint: feed it a snippet from Al-Jazeera's homepage (in that news network's native language, of course).
OTOH -- I wouldn't send them packing immediately upon seeing this for a solution. I'd just show them what it did to multibyte UTF-8 chars and then let them figure out what to do from there. ;) Talk about adding injury to insult! Yep.Admin
I read that entire Spolsky article. Six hours of interviews? I would refuse their offer and send them a bill for my time.
Admin
The response to that interview question is "It depends..."
Admin
I'd be tempted to just say “That's broken; we won't continue the interview until you can say why” to their faces and let them see if they can figure it out.
Admin
Admin
They'd better invite you for lunch as well.
Admin
Somebody correct me if I'm wrong here... getting 1 character from a string with a fixed character width is O(1). However, UTF-8 is a variable-width encoding, so substring is always an O(n) operation because it has to scan the string to find the substring's starting index.
If you put substring in a loop, reversing a UTF-8 string is O(n²). Setting a pointer to the last byte and scanning in reverse would be O(n), and then you just have to correct the byte order of any multibyte UTF-8 characters you encounter.
Admin
No, there are ways to accelerate that. Not that the code in TFA was using any of them…
Admin
I guess you could build an index of each character's byte offset, so then you only have the overhead of scanning the string once... but does the language do that for all UTF-8 strings by default (or the first time it's needed), or can it be enabled?
If the language doesn't have that built-in, then you're basically just creating an array of pointers to characters first, then starting at the end of that and scanning it in reverse. Still the same basic approach as the applicant used.
Admin
UTF-8 is specially designed to avoid that problem.
Bytes of the form 0xxx xxxx are single byte characters. Bytes of the form 10xx xxxx are continuation bytes. Bytes of the form 110x xxxx are the first byte of a two byte character. Bytes of the form 1110 xxxx are the first byte of a three byte character. Bytes of the form 1111 0xxx are the first byte of a four byte character.
To get the code point, you concatenate all the x's. A UTF-8 string must never contain invalid UTF-8 characters (that's why you cannot use strncpy with UTF-8 strings because the result could end with a partial UTF-8 character. Another rule is that each codepoint must be represented in the shortest possible way. So a 7 bit character may not be written as 1100 000x 10xx xxxx.
If you scan backward, a UTF-8 character starts with the first byte that isn't of the form 10xx xxxx, so that is easy enough.
Admin
Combining characters make reversing a string non-trivial.
Admin
Standard Front Page Troll™. Nothing to see here, please move along.
Admin
Depending on what you mean by “reversing” and whether you know the string has been transformed into NFC or not.
Any candidate who calls the interviewer out for asking them to reverse a string without specifying what happens in all those cases is probably going to be worth hiring. :D
Admin
I'm aware of this. It doesn't change the problem any. If you want to get character n you still have to scan the string once to find its starting byte, because it's not necessarily at byte n.
Admin
So, the right answer to the question is asking "First, what encoding is the string in? UTF-8, UTF-16, UTF-32, or some particular code-page? Next, do you want me to do take the brain dead approach of just reversing the bytes? That would work for plain ASCII, and it takes 10 minutes. I could do a UTF-8 aware reversal and make sure that the codepoints don't get mangled in multibyte characters. That'll take me maybe 30 minutes, after I find the documentation for the leading bytes in UTF-8. Or I can do it properly, if you get me access to a coffee machine, the UTF standards, and a week without interruptions. Which I'll bill you for."
Admin
That doesn't mean that to find the n'th character, you don't have to inspect n-1 characters first.
And you still may have to keep grapheme clusters together. And not accidentally create new grapheme clusters in the reversed string if there were non in the original string (this could happen with Korean characters for example, if an isolated vowel is followed by an isolated initial; for that case I can't think of any reasonable definition of reversing at all)
Admin
Can you give an example (hex codes) of how this could happen?
Admin
In the real world this is often treated as a bug (even though to the letter of the contract the code is correct - it still needs to be fixed, it's just a question of who pays for it).
Part of it is also seeing how they developer is going to react to a challenge to their work (legitimate or not). You don't want a developer who's going to say to a client that reports a bug (that's not really bug, perhaps just an unexpected but correct emergence of a complex system). Are they gong to scream at them and call them a lying sack of $#%@# $% or are they going to work the problem with the client?
Admin
A trivial case would be a single umlaut or accent, followed by a vowel, like ¨a or `e. These are two characters. But reverse them and you get ä or è. Single character, even if there are two code points. Reverse again, and it stays ä or è. So reversing twice doesn't produce the original string.
Admin
The point is that you just don't care what character n is. You care about "the character starting at byte n".
Admin
basically, any vowel-consonant combination in U+1100 to U+11FF.
For example, u+1161 isᅡand u+1100 is ᄀ The codepoint sequence u+1161u+1100 are two grapheme clusters: ᅡᄀ But reverse them as u+1100u+1161 and they become a single grapheme cluster: 가
Admin
Actually, if I were giving this exercise in an interview I'd be doing it either because I specifically wanted the candidate to ask those questions, or because I myself was unaware of the problem. Either way asking the questions would score points with me. Maybe that shows that I'm biased towards hiring pedants...
Admin
It depends on the data structure used to implement strings, but I wouldn't typically suggest a functional approach to reversing a string, even though it's easy to write. The problem, basically, is that the a functional implementation must physically rebuild the string, in reverse. But what you really want is to keep the string as is, and look at it backwards. I.e., you want to traverse the string in reverse. This may or may not be possible, depending on the data structure (for one thing, it needs to know its own length, and must not be recursive)
Admin
That'd potentially be a good approach, especially as part of a general system of stacked string transformers, but it does mean you have to have some way to simplify the stack of such operations so that you're not keeping around data that is unreachable or getting a build up of bunch of inverses of each other or effective-noops. Getting that right so that it's both nice and efficient is really quite a lot harder than it appears to be at first glance (and it's important; strings often get really long in real-world code).
Admin
Yes it is. It's literally the word problem, and it's unsolvable in general (though it is potentially solvable for the specific group of operations on strings)
Admin
There's more weird Unicode edge cases, in addition to multibyte chars in UTF-8, surrogates in UTF-16, grapheme clusters, and combining characters. There's also things like left-to-right marks and right-to-left marks (LRM and RLM), involved in the bidirectional algorithm. Not quite sure if the naive string reversal algorithm handles those correctly, or if it would need to add, remove, or more any LRMs/RLMs.
Then there's characters which have different initial, medial, and final forms, like in Hebrew and Arabic, or the Greek final sigma. If we reverse a string containing an initial form, should that change into a final form, and vice-versa? The answer depends on your requirements. Of course, since reversing a string makes no sense linguistically, we probably don't care about if the result makes sense, so we should probably just naively preserve the forms.
Admin
So? You ask them: "Is that in Unicode or basic 1 byte ASCII? Are we reversing by code units or grapheme clusters?" And then they either admit they don't know and you get to enlighten them and explain why it makes a difference ... or they pick one and you explain "well in that case..."
Admin
I don't think this is the goal of a job interview, and something you'd rather want to avoid
Admin
Admin
Depends on how you bring this to table.
If you act like a pompous dick about it, you'll get shot/shut down for sure. On the other hand, you could politely ask about remaining requirements because the question has some ambiguity to it that could affect the way you'd choose to solve the problem.
If they then burn you for that; then you have a reason to not want to work for said company. Interviews go both ways, remember?
Admin
I didn't bother to read the code, but if you're reversing bytes and you wind up dealing with variable-length characters, you're going to have a bad day. (This probably doesn't apply to iOS, but nonetheless, it's indicative, possibly, of a poor thought process.)