The Daily WTF: Curious Perversions in Information Technology

2016-09-13 Reply Admin

This function could be useful as an inconspicuous alternative to the speedup loop. It takes an std::string with data and returns an empty string instead after taking an amount of time depending on the length of the input string.

In one word: "Brillant!"

2016-09-13 Reply Admin

Assuming that you want an uppercase copy of the string, passing the argument by value would make a lot of sense here... if they returned the correct value from the function.

2016-09-13 Reply Admin

"Yes of course I tested it. I even tested it with the edge case. Bet you none of your permanent staff would think of making sure their "toupper" would work with a blank string."

bjolling · 2016-09-13 Reply Admin

In true "not a consultant" fashion, the "not a consultant" feels too threatened to realize there is no causal relation between "consultant" and "incompetent".

dkf · 2016-09-13 Reply Admin

And finally, they return out, the string variable declared at the top of the function, and never initialized.

I think that just means that the string uses the default constructor, so the behaviour is at least defined.

2016-09-13 Reply Admin

Indeed, the behavior is defined, the function will reliably always return an empty string (or throw std::bad_alloc).
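
The point about the default constructor can be checked directly. The function below is a hypothetical reconstruction of the bug under discussion; the article's actual code is not quoted in this thread, so the name `broken_upper` and the body are assumptions. It upper-cases its by-value parameter, then returns a separately declared string that was never touched.

```cpp
#include <cctype>
#include <string>

// Hypothetical reconstruction, not the article's actual code.
std::string broken_upper(std::string in)
{
    std::string out;  // default-constructed: a valid, empty string
    for (char& c : in) {
        // Go through unsigned char so std::toupper never sees a negative value.
        c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    }
    return out;       // bug: the upper-cased data in 'in' is discarded
}
```

Whatever the input, the result is the empty string, which is well defined but useless.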

2016-09-13 Reply Admin

Yes, it is. 'out' is guaranteed to be an empty string in this case. Still, a very nice WTF.

2016-09-13 Reply Admin

Where can I get these highly paid consultant jobs? I want to be paid for being an idiot, too!

2016-09-13 Reply Admin

You can't make subtle Easy Reader gags if they're typed in uppercase.

Steve_The_Cynic · 2016-09-13 Reply Admin

It does indeed mean just that, so out bloody well is initialised.

To an empty controlled sequence.

2016-09-13 Reply Admin

Some of us are stuck using compilers that predate that template, you insensitive clod! ;-)

2016-09-13 Reply Admin

Quite separately, uppercasing a string char-by-char is incorrect unless it is guaranteed to be in ASCII (which it never is). As soon as you have ‘ß’, which appears say in Latin-1, you have a codepoint that should be uppercasing (according to unicode standard) to a sequence of two codepoints, ‘SS’.
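
A minimal sketch of why a one-char-in, one-char-out mapping cannot handle this, assuming the input bytes are ISO-8859-1 (Latin-1). The function name `latin1_upper` is made up for illustration, and the mapping is written by hand rather than taken from any locale facility.

```cpp
#include <string>

// Assumes 'in' holds ISO-8859-1 (Latin-1) bytes. Illustrative only.
std::string latin1_upper(const std::string& in)
{
    std::string out;
    out.reserve(in.size());
    for (unsigned char c : in) {
        if (c == 0xDF) {
            out += "SS";                         // sharp s expands to two characters
        } else if (c >= 'a' && c <= 'z') {
            out += static_cast<char>(c - 0x20);  // ASCII lower-case letters
        } else if (c >= 0xE0 && c <= 0xFE && c != 0xF7) {
            out += static_cast<char>(c - 0x20);  // accented letters, skipping the division sign
        } else {
            out += static_cast<char>(c);         // everything else is left unchanged
        }
    }
    return out;
}
```

The output can be longer than the input, which is exactly what an in-place, per-char std::toupper loop has no way to express.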

2016-09-13 Reply Admin

‏ ‭‫⁮‪

2016-09-13 Reply Admin

You don't have to shout.

2016-09-13 Reply Admin

I didn't read today's easy reader gag. It was too long.

2016-09-13 Reply Admin

Then you should quit working for a large behemoth company that moves at the speed of government. Either that or read the newspaper while you compile ;-)

2016-09-13 Reply Admin

I read TDWTF while I compile. ;-)

2016-09-13 Reply Admin

Yes, but Doing It Wrong™ is implicit in using string or character functions that originated in C. You wouldn't want C++ to buck that tradition, would you?

2016-09-13 Reply Admin

@Melvar: Quite true (although in this example, Unicode 5.1 did introduce "ẞ", i.e. U+1E9E LATIN CAPITAL LETTER SHARP S; however, that's rarely used in practice, in my experience).

Another example would be the treatment of "i", "I", "ı", and "İ". Firstly, you need to take the locale into account (which the C++ STL does), because the uppercase form of "i" is "I" in most locales but is "İ" in Turkish and Azerbaijani locales. But in order to make sure that doing a toupper() + tolower() gives you back the original string as often as possible, you sometimes need to decompose it into <base character, combining character> instead of just <precomposed character>.

For example, tolower(<U+0130>) is the sequence [<U+0069>, <U+0307>], which is a lowercase "i" with a combining dot above, so that when you toupper it again, you get back [<U+0049>, <U+0307>], which is an uppercase "I" with a combining dot above. It's not the same code point sequence (it's now decomposed), but it's displayed identically to the original U+0130.
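
The round trip described above can be sketched with raw code points. This is a toy, non-Turkish mapping that covers only ASCII letters and U+0130; the names `toy_lower` and `toy_upper` are made up, and a real implementation would rely on the Unicode case-mapping data rather than a hand-written table.

```cpp
#include <string>

// Toy mappings covering only ASCII letters and U+0130. Illustrative only.
std::u32string toy_lower(const std::u32string& in)
{
    std::u32string out;
    for (char32_t cp : in) {
        if (cp == 0x0130) {              // LATIN CAPITAL LETTER I WITH DOT ABOVE
            out.push_back(0x0069);       // LATIN SMALL LETTER I
            out.push_back(0x0307);       // COMBINING DOT ABOVE
        } else if (cp >= U'A' && cp <= U'Z') {
            out.push_back(static_cast<char32_t>(cp + 0x20));
        } else {
            out.push_back(cp);
        }
    }
    return out;
}

std::u32string toy_upper(const std::u32string& in)
{
    std::u32string out;
    for (char32_t cp : in) {
        if (cp >= U'a' && cp <= U'z') {
            out.push_back(static_cast<char32_t>(cp - 0x20));
        } else {
            out.push_back(cp);           // combining marks pass through unchanged
        }
    }
    return out;
}
```

Lower-casing the single code point U+0130 yields two code points, and upper-casing those yields U+0049 followed by U+0307: a different sequence from the original, even though it is displayed the same way.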

Related reading: http://www.moserware.com/2008/02/does-your-code-pass-turkey-test.html

2016-09-13 Reply Admin

Maybe this is why case insensitive user names seem to work so well. I wish they would make case insensitive passwords as well.

2016-09-13 Reply Admin

Or pass by const &

2016-09-13 Reply Admin

This function is not guaranteed to return an empty string. As soon as there is a single character that is negative and does not equal EOF, std::toupper may do what it wants. That's truly awesome work.
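
The usual way around this is to convert each char to unsigned char before calling std::toupper, since its argument must be representable as unsigned char or equal EOF. A short sketch; the names `upper_char` and `upper_copy` are illustrative and not from the article.

```cpp
#include <cctype>
#include <string>

// Wrap std::toupper so that a negative char value is never passed to it.
char upper_char(char c)
{
    return static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
}

// Upper-case a copy of the string one char at a time, using the wrapper.
std::string upper_copy(std::string s)
{
    for (char& c : s) {
        c = upper_char(c);
    }
    return s;
}
```

This only removes the undefined behaviour on platforms where plain char is signed. It does nothing for the multi-byte and one-to-many cases raised elsewhere in the thread.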

2016-09-13 Reply Admin

Wait a second, std::transform explicitly cannot be used to transform a string to uppercase in place, if cppreference is quoting the requirements correctly: http://en.cppreference.com/w/cpp/algorithm/transform

"unary_op and binary_op must not invalidate any iterators, including the end iterators, or modify any elements of the ranges involved."

Eh, it'd probably work, but if we're getting snarky about someone not using things correctly, we should be sure that we're pedantically correct about it.

2016-09-13 Reply Admin

Very confused about the WTF here. a) "I hate C++"? b) "I hate immutable strings?" c) "I hate all that irrelevant copying that C++ does, just like Java, except that it doesn't when you use move semantics?" (Not specified in this snippet, but still.) d) "Oh, look, there's a trivial bug where the function just returns "out," a value that has nothing to do with the input and is basically initialised to the default value for string." I'm guessing it's supposed to be (d) with a whole heap of bitterness about (a) through (c). Now then. Leaving this particular (admitted) excrescence to one side, how exactly would Remy suggest implementing a "toupper" method for a string in any (natural) language, in any (programming) language, given any (ISO compliant) character encoding? Bit of a big ask there, I would think.

2016-09-13 Reply Admin

Strings are often copy-on-write so that paragraph about copying them every time is incorrect, not to mention other compiler optimizations.

2016-09-13 Reply Admin

@SolePurposeOfVisit c) Except that Java doesn't do any of that copying because it passes pointers by ref for parameters and has a much easier time telling what's a local and what's not...

2016-09-13 Reply Admin

Except c++11 iterator constraints (section 21) make a copy-on-write implementation impractical or even impossible without violating the standard.

2016-09-14 Reply Admin

Pretty sure it's allowed, see the example on that very page. It seems confusing at first sight, but consider that it's not unary_op (here, toupper) that modifies the elements in the range, but rather returns the new value, and std::transform actually modifies the range. So what this means is that unary_op must not do additional modifications to the elements directly (cf. the stronger pre-C++11 restriction "must not have side effects"). That seems to allow std::transform to do caching, prefetching, delayed writing, or operate out of order or in parallel, etc.
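
For reference, the in-place form being debated looks roughly like this. The function name `upper_in_place` is an assumption. The lambda only computes and returns a value; it is std::transform itself that writes through the output iterator, which here is the same as the input iterator.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

std::string upper_in_place(std::string s)
{
    // Input range and output range coincide. The lambda takes unsigned char
    // so that std::toupper is never handed a negative value.
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) {
                       return static_cast<char>(std::toupper(c));
                   });
    return s;
}
```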

2016-09-14 Reply Admin

"Rarely used in practice" until you run your code in Germany. The word "street" in German is "Straße", so probably two thirds of all addresses in Germany have a ß.

2016-09-14 Reply Admin

One useful function is std::toupper. Given a char, it will turn that char into an upper-case version, in a locale-aware fashion.

That isn't really possible. If it acts on a single byte (char datatype), it doesn't know if the byte is the first, second, third or fourth byte of a character, so it can't be locale-aware in deciding what to do with the byte. You HAVE to process entire strings if you want to convert each character in a locale-aware fashion.

If you're acting on wchar_t instead, and if wchar_t is wide enough so each character of your locale fits in a single wchar_t, then std::toupper can, when given a wchar_t, turn that into an upper-case version in a locale-aware fashion.

Consider one of the comments too:

Unicode 5.1 did introduce "ẞ", i.e. U+1E9E LATIN CAPITAL LETTER SHARP S

Right, the result of upper-casing a single byte ß doesn't fit in a single byte even in the Western European locale. If you process char by char you'll still screw it up.

2016-09-14 Reply Admin

Remy Porter (google) in reply to Ex-lurker

You don't have to shout.

Yes he did. In this thread, he had to upper-case everything, and he did it, even when processing one byte at a time (char datatype) in a locale-aware fashion. Furthermore he even produced the correct result. He even read the Easy Reader Version before posting.

2016-09-14 Reply Admin

Man, I've been out of C++ for a LOOOOONG time.

You mean long long. Someday to be maxint maxint.

2016-09-14 Reply Admin

Um, no. That would make two copies of the string. One for the input parameter and one for the return value.

The correct way to pass the parameter would be via a const reference.
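
A side-by-side sketch of the two signatures, assuming a C++11 or later compiler. The function names are made up for illustration. With move semantics available, the by-value version copies once at the call and then moves the parameter out on return, while the const-reference version makes its one copy inside the function.

```cpp
#include <cctype>
#include <string>

// By value: the argument is copied (or moved, for a temporary) into 's'.
// Returning a by-value parameter moves from it in C++11 and later.
std::string upper_by_value(std::string s)
{
    for (char& c : s) {
        c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    }
    return s;
}

// By const reference: nothing is copied at the call; one copy is made here.
std::string upper_by_cref(const std::string& s)
{
    std::string out(s);
    for (char& c : out) {
        c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    }
    return out;
}
```

Either way the caller's string is left untouched and the result is the same. The difference is where the copy happens and whether a temporary argument can be reused instead of copied.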

2016-09-14 Reply Admin

Looks like all the people who don't quite understand C++ have turned up to comment today.

2016-09-14 Reply Admin

The real WTF would be using multi-byte chars instead of wchar_t to represent a string in memory.

Multi-byte chars were an experiment that failed sometime back in the 1990s when people realized it was a genuinely stupid idea.

If you use wchar_t for "text" data in memory then the iterators (and everything else) will work perfectly, praise be to Bjarne!

2016-09-14 Reply Admin

You're already going to have to create multiple copies of the string to get this done (strings are immutable). Why make it make yet another one that accomplishes nothing?

2016-09-14 Reply Admin

And by doing so makes it so that it changes the caller's string with no way to tell it that you didn't actually want it to change what you were holding onto. Yep, have fun with that "friendly" language of yours. I'll take C++'s syntax any day.

2016-09-14 Reply Admin

The real WTF is to believe that wide characters help you in any way.

In reality, it is just impossible to do anything useful with single Unicode code points. Strings must be processed as a whole. There are plenty of characters that require more than a single code point. There are Emojis that take 7 or more code points. There are identical characters with more than one representation using one, two or three code points.
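
A small illustration of the last point, using two well-known representations of "é". The variable and function names are made up. Even with one array element per code point, a naive element-wise comparison treats the two spellings as different strings.

```cpp
#include <string>

// Two code point sequences that a reader sees as the same single character.
const std::u32string precomposed = {0x00E9};          // LATIN SMALL LETTER E WITH ACUTE
const std::u32string decomposed  = {0x0065, 0x0301};  // e + COMBINING ACUTE ACCENT

// Code-point-by-code-point comparison, with no Unicode normalization.
bool naive_equal(const std::u32string& a, const std::u32string& b)
{
    return a == b;
}
```

Comparing, counting, or case-converting "characters" correctly needs normalization and knowledge of combining sequences, which operate on the string as a whole.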

2016-09-14 Reply Admin

Multibyte characters (known as multichar characters in C ^_^) were made in the 1980's if not earlier. So were wide characters (wchar_t). In those days both encodings were locale specific. Most locales included copies of most of ASCII but Japanese couldn't represent some characters of French, Chinese, Korean, Swedish, etc., and vice-versa.

Then the C standard made it possible (not recommended but legal) for wchar_t to be a single byte, so one experiment that had been muddling along up to that point suddenly failed.

Microsoft hit on the idea of using Unicode for wchar_t, which requires tables to convert between each locale and wchar_t instead of simply shifting and or'ing, but it was a good idea and it worked for a while.

Then it turned out that 16 bits weren't enough and Microsoft's idea had to be turned into UTF-16 (multiwchar_t characters), so yet another failure.

This is software. EVERY experiment has failed.

2016-09-15 Reply Admin

Strings are often copy-on-write so that paragraph about copying them every time is incorrect, not to mention other compiler optimizations.

Sadly not. The C++11 rules about invalidating iterators mean that this is not allowed. For example:


std::string a = "Illegal";
std::string b = a;  // COW possibility
auto iter = b.begin();
b[0] = 'X';  // *Must not* invalidate 'iter' by rule
std::cout << *iter << "\n";  // Should print 'X' but won't if you COW.

unstd::toupper
