Admin
This function could be useful as an inconspicuous alternative to the speedup loop. It takes an std::string with data and returns an empty string instead after taking an amount of time depending on the length of the input string.
In one word: "Brillant!"
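Several comments below say the behaviour is well defined: a default-constructed std::string is empty, so returning it untouched always yields "". The following is a hypothetical reconstruction of that pattern, based only on the comments. It is not the article's actual code, and the name broken_upper is made up.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical reconstruction of the bug described in the comments:
// the parameter (a copy) is uppercased in place, but the function
// returns 'out', which was default-constructed and never touched.
std::string broken_upper(std::string in) {
    std::string out;  // default constructor: an empty string
    std::transform(in.begin(), in.end(), in.begin(),
                   [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
    return out;       // always "", regardless of the input
}
```

Whatever the input, the caller gets an empty string back, after paying for a copy and a pass over every character.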
Admin
Assuming that you want an uppercase copy of the string, passing the argument by value would make a lot of sense here... if they returned the correct value from the function.
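If the goal is an uppercase copy, taking the parameter by value works as long as the function returns that parameter. Here is a minimal sketch. The name upper_copy is hypothetical, and the code assumes single-byte characters, so it still has the Unicode limitations discussed further down.

```cpp
#include <cctype>
#include <string>

// Take the argument by value so the function owns a copy it may modify,
// uppercase that copy in place, and return it. Returning a by-value
// parameter is an implicit move (C++11 and later), not a second copy.
// Note: this is still byte-by-byte and only handles single-byte letters.
std::string upper_copy(std::string s) {
    for (char& ch : s) {
        // Cast to unsigned char first: std::toupper's argument must be
        // representable as unsigned char (or be EOF).
        ch = static_cast<char>(std::toupper(static_cast<unsigned char>(ch)));
    }
    return s;
}
```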
Admin
"Yes of course I tested it. I even tested it with the edge case. Bet you none of your permanent staff would think of making sure their 'toupper' would work with a blank string."
Admin
In true "not a consultant" fashion, the "not a consultant" feels too threatened to realize there is no causal relationship between "consultant" and "incompetent".
Admin
I think that just means that the string uses the default constructor, so the behaviour is at least defined.
Admin
Indeed, the behavior is defined, the function will reliably always return an empty string (or throw std::bad_alloc).
Admin
Yes, it is. 'out' is guaranteed to be an empty string in this case. Still, a very nice WTF.
Admin
Where can I get these highly paid consultant jobs? I want to be paid for being an idiot, too!
Admin
You can't make subtle Easy Reader gags if they're typed in uppercase.
Admin
It does indeed mean just that, so 'out' bloody well is initialised. To an empty controlled sequence.
Admin
Some of us are stuck using compilers that predate that template, you insensitive clod! ;-)
Admin
Quite separately, uppercasing a string char-by-char is incorrect unless it is guaranteed to be ASCII (which it never is). As soon as you have 'ß', which appears in, say, Latin-1, you have a code point that should be uppercased (according to the Unicode standard) to a sequence of two code points, 'SS'.
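To see the problem concretely, the sketch below applies the same byte-by-byte approach to "straße" encoded as UTF-8. It assumes the program is running in the default "C" locale. The helper name bytewise_upper is hypothetical.

```cpp
#include <cctype>
#include <string>

// Uppercase byte by byte, the way the criticised code does.
// Under the default "C" locale std::toupper only changes 'a'..'z',
// so the two UTF-8 bytes of U+00DF (0xC3 0x9F) come back untouched,
// and a per-byte function has no way to emit the two-letter "SS".
std::string bytewise_upper(std::string s) {
    for (char& ch : s)
        ch = static_cast<char>(std::toupper(static_cast<unsigned char>(ch)));
    return s;
}

// The string literals are split ("\x9F" "e") because 'e' and 'E' are hex
// digits and would otherwise be swallowed by the \x escape.
```

The result keeps the same length as the input, which is exactly why it can never produce "STRASSE".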
Admin
You don't have to shout.
Admin
I didn't read today's easy reader gag. It was too long.
Admin
Then you should quit working for a large behemoth company that moves at the speed of government. Either that or read the newspaper while you compile ;-)
Admin
I read TDWTF while I compile. ;-)
Admin
Yes, but Doing It Wrong™ is implicit in using string or character functions that originated in C. You wouldn't want C++ to buck that tradition, would you?
Admin
@Melvar: Quite true (although in this example, Unicode 5.1 did introduce "ẞ", i.e. U+1E9E LATIN CAPITAL LETTER SHARP S; however, that's rarely used in practice, in my experience).
Another example would be the treatment of "i", "I", "ı", and "İ". Firstly, you need to take the locale into account (which the C++ STL does), because the uppercase form of "i" is "I" in most locales but is "İ" in Turkish and Azerbaijani locales. But in order to make sure that doing a toupper() + tolower() gives you back the original string as often as possible, you sometimes need to decompose it into <base character, combining character> instead of just <precomposed character>.
For example, tolower(<U+0130>) is the sequence [<U+0069>, <U+0307>], which is a lowercase "i" with a combining dot above, so that when you toupper it again, you get back [<U+0049>, <U+0307>], which is an uppercase "I" with a combining dot above. It's not the same code point sequence (it's now decomposed), but it's displayed identically to the original U+0130.
Related reading: http://www.moserware.com/2008/02/does-your-code-pass-turkey-test.html
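The i/I/ı/İ rules above can be written down as a tiny lookup. The sketch below is a hypothetical helper pair (upper_i and lower_i are made-up names) that covers only those four letters. It also ignores the context-sensitive conditions in Unicode's SpecialCasing data. Real code would use a full Unicode library such as ICU.

```cpp
#include <string>

// Hypothetical helpers covering only i, I, U+0130 (İ) and U+0131 (ı).
// 'turkic' selects the Turkish/Azerbaijani rules. Results are strings
// because one code point may map to several.
std::u32string upper_i(char32_t c, bool turkic) {
    switch (c) {
        case U'i':      return turkic ? U"\u0130" : U"I";  // i -> İ in tr/az, I elsewhere
        case U'\u0131': return U"I";                       // dotless ı -> I
        default:        return std::u32string(1, c);
    }
}

std::u32string lower_i(char32_t c, bool turkic) {
    switch (c) {
        case U'I':      return turkic ? U"\u0131" : U"i";   // I -> ı in tr/az
        case U'\u0130': return turkic ? U"i" : U"i\u0307";  // İ -> i, or i + combining dot above
        default:        return std::u32string(1, c);
    }
}
```

Outside Turkic locales, lowering U+0130 gives [U+0069, U+0307]. Uppercasing each of those then gives [U+0049, U+0307], which is the decomposed round trip described in the comment.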
Admin
Maybe this is why case insensitive user names seem to work so well. I wish they would make case insensitive passwords as well.
Admin
Or pass by const &
Admin
This function is not guaranteed to return an empty string. As soon as there is a single character that is negative and not equal to EOF, std::toupper may do whatever it wants (the behaviour is undefined). That's truly awesome work.
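The commonly recommended fix for that undefined behaviour is to convert each char to unsigned char before calling std::toupper. Here is a minimal sketch. The wrapper name safe_toupper is hypothetical, and the example assumes the default "C" locale.

```cpp
#include <cctype>

// std::toupper(int) requires its argument to be EOF or representable as
// unsigned char. On platforms where plain char is signed, bytes >= 0x80
// are negative, so passing a char directly is undefined behaviour.
// Converting to unsigned char first keeps the call well defined.
char safe_toupper(char c) {
    return static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
}
```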
Admin
Wait a second, std::transform explicitly cannot be used to transform a string to uppercase in place, if cppreference is quoting the requirements correctly: http://en.cppreference.com/w/cpp/algorithm/transform
"unary_op and binary_op must not invalidate any iterators, including the end iterators, or modify any elements of the ranges involved."
Eh, it'd probably work, but if we're getting snarky about someone not using things correctly, we should be sure that we're pedantically correct about it.
Admin
Very confused about the WTF here.
a) "I hate C++"?
b) "I hate immutable strings"?
c) "I hate all that irrelevant copying that C++ does, just like Java, except that it doesn't when you use move semantics"? (Not specified in this snippet, but still.)
d) "Oh, look, there's a trivial bug where the function just returns 'out', a value that has nothing to do with the input and is basically initialised to the default value for string."
I'm guessing it's supposed to be (d) with a whole heap of bitterness about (a) through (c).
Now then. Leaving this particular (admitted) excrescence to one side, how exactly would Remy suggest implementing a "toupper" method for a string in any (natural) language, in any (programming) language, given any (ISO compliant) character encoding? Bit of a big ask there, I would think.
Admin
Strings are often copy-on-write so that paragraph about copying them every time is incorrect, not to mention other compiler optimizations.
Admin
@SolePurposeOfVisit c) Except that Java doesn't do any of that copying because it passes pointers by ref for parameters and has a much easier time telling what's a local and what's not...
Admin
Except that the C++11 iterator constraints (section 21) make a copy-on-write implementation impractical or even impossible without violating the standard.
Admin
Pretty sure it's allowed, see the example on that very page. It seems confusing at first sight, but consider that it's not unary_op (here, toupper) that modifies the elements in the range, but rather returns the new value, and std::transform actually modifies the range. So what this means is that unary_op must not do additional modifications to the elements directly (cf. the stronger pre-C++11 restriction "must not have side effects"). That seems to allow std::transform to do caching, prefetching, delayed writing, or operate out of order or in parallel, etc.
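This is the in-place pattern that reading allows. The unary operation only returns the new character, and std::transform performs the write through the output iterator, which here is the input's own begin(). The function name upper_in_place is hypothetical, and the code assumes single-byte text.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// std::transform with the output iterator equal to the input begin.
// The lambda never modifies the range itself; it only returns a value,
// and std::transform writes that value back into the string.
void upper_in_place(std::string& s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
}
```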
Admin
"Rarely used in practice" until you run your code in Germany. The word "street" in German is "Straße", so probably two thirds of all addresses in Germany have a ß.
Admin
That isn't really possible. If it acts on a single byte (char datatype), it doesn't know if the byte is the first, second, third or fourth byte of a character, so it can't be locale-aware in deciding what to do with the byte. You HAVE to process entire strings if you want to convert each character in a locale-aware fashion.
If you're acting on wchar_t instead, and if wchar_t is wide enough so each character of your locale fits in a single wchar_t, then std::toupper can, when given a wchar_t, turn that into an upper-case version in a locale-aware fashion.
Consider one of the comments too:
Right, the result of upper-casing a single byte ß doesn't fit in a single byte even in the Western European locale. If you process char by char you'll still screw it up.
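The per-character wide approach from the first part of that comment can be sketched with the standard std::ctype<wchar_t> facet. The function name upper_wide is hypothetical. The example uses the classic "C" locale because named locales (for example a German one) may not be installed on a given system.

```cpp
#include <locale>
#include <string>

// Uppercase a wide string one wchar_t at a time through the ctype facet of a
// given locale. This only gives correct results when every character of the
// text fits in a single wchar_t and maps to a single uppercase wchar_t; it
// still cannot turn U+00DF into "SS", for example.
std::wstring upper_wide(std::wstring s, const std::locale& loc) {
    const auto& ct = std::use_facet<std::ctype<wchar_t>>(loc);
    if (!s.empty())
        ct.toupper(&s[0], &s[0] + s.size());  // converts the range in place
    return s;
}
```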
Admin
Yes he did. In this thread, he had to upper-case everything, and he did it, even when processing one byte at a time (char datatype) in a locale-aware fashion. Furthermore he even produced the correct result. He even read the Easy Reader Version before posting.
Admin
You mean long long. Someday to be maxint maxint.
Admin
Um, no. That would make two copies of the string: one for the input parameter and one for the return value.
The correct way to pass the parameter would be via a const reference.
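For comparison, here is a const-reference version. The name upper_from_ref is hypothetical. Either way, one new string has to be produced for the result. Note that since C++11, returning a by-value parameter by name is an implicit move rather than a copy, so the by-value variant does not necessarily pay for a second copy either.

```cpp
#include <cctype>
#include <string>

// Take the input by const reference (no copy on the way in) and build the
// result in a new string, reserving its final size up front.
std::string upper_from_ref(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (unsigned char c : in)  // each char converted to unsigned char
        out.push_back(static_cast<char>(std::toupper(c)));
    return out;  // this time, 'out' is actually filled in
}
```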
Admin
Looks like all the people who don't quite understand C++ have turned up to comment today.
Admin
The real WTF would be using multi-byte chars instead of wchar_t to represent a string in memory.
Multi-byte chars was an experiment that failed sometime back in the 1990s when people realized it was a genuinely stupid idea.
If you use wchar_t for "text" data in memory then the iterators (and everything else) will work perfectly, praise be to Bjarne!
Admin
You're already going to have to create multiple copies of the string to get this done (strings are immutable). Why make it make yet another one that accomplishes nothing?
Admin
And by doing so makes it so that it changes the caller's string with no way to tell it that you didn't actually want it to change what you were holding onto. Yep, have fun with that "friendly" language of yours. I'll take C++'s syntax any day.
Admin
The real WTF is to believe that wide characters help you in any way.
In reality, it is just impossible to do anything useful with single Unicode code points. Strings must be processed as a whole. There are plenty of characters that require more than a single code point. There are Emojis that take 7 or more code points. There are identical characters with more than one representation using one, two or three code points.
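A small illustration of "identical characters with more than one representation": the precomposed é (U+00E9) and 'e' followed by a combining acute accent (U+0301) look the same but have different code point counts. The sketch below assumes valid UTF-8 input, and count_code_points is a hypothetical name.

```cpp
#include <cstddef>
#include <string>

// Count Unicode code points in a UTF-8 string by skipping continuation
// bytes (those of the form 10xxxxxx). Assumes the input is valid UTF-8.
std::size_t count_code_points(const std::string& utf8) {
    std::size_t n = 0;
    for (unsigned char b : utf8)
        if ((b & 0xC0) != 0x80) ++n;
    return n;
}

// "\xC3\xA9" is U+00E9 (precomposed é): 2 bytes, 1 code point.
// "e\xCC\x81" is 'e' + U+0301 (combining acute): 3 bytes, 2 code points.
// Both render as the same "é", so neither bytes nor code points count
// what a reader would call characters.
```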
Admin
Multibyte characters (known as multichar characters in C ^_^) were made in the 1980's if not earlier. So were wide characters (wchar_t). In those days both encodings were locale specific. Most locales included copies of most of ASCII but Japanese couldn't represent some characters of French, Chinese, Korean, Swedish, etc., and vice-versa.
Then the C standard made it possible (not recommended but legal) for wchar_t to be a single byte, so one experiment that had been muddling along up to that point suddenly failed.
Microsoft hit on the idea of using Unicode for wchar_t, which requires tables to convert between each locale and wchar_t instead of simply shifting and or'ing, but it was a good idea and it worked for a while.
Then it turned out that 16 bits weren't enough and Microsoft's idea had to be turned into UTF-16 (multiwchar_t characters), so yet another failure.
This is software. EVERY experiment has failed.
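To show why 16 bits stopped being enough: a code point above U+FFFF has to be split across two UTF-16 code units, a surrogate pair. Here is a minimal sketch. The name to_surrogates is hypothetical, and the function assumes without checking that the input is in U+10000..U+10FFFF.

```cpp
#include <cstdint>
#include <utility>

// Encode a supplementary-plane code point (U+10000..U+10FFFF) as a UTF-16
// surrogate pair: subtract 0x10000, then split the remaining 20 bits into
// a high surrogate (top 10 bits) and a low surrogate (bottom 10 bits).
// No range validation is performed.
std::pair<char16_t, char16_t> to_surrogates(char32_t cp) {
    const std::uint32_t v = static_cast<std::uint32_t>(cp) - 0x10000;
    return {static_cast<char16_t>(0xD800 + (v >> 10)),
            static_cast<char16_t>(0xDC00 + (v & 0x3FF))};
}
// Example: U+1F600 -> {0xD83D, 0xDE00}
```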
Admin
"Strings are often copy-on-write so that paragraph about copying them every time is incorrect, not to mention other compiler optimizations."
Sadly not. The C++11 rules about invalidating iterators mean that this is not allowed. For example:
    std::string a = "Illegal";
    std::string b = a;           // COW possibility
    auto iter = b.begin();
    b[0] = 'X';                  // *Must not* invalidate 'iter' by rule
    std::cout << *iter << "\n";  // Should print 'X' but won't if you COW.