A Pair of Loops

Alexandra inherited a codebase that, if we're being kind, could be called "verbose". Individual functions routinely cross into multiple thousands of lines, with the longest single function hitting 4,000 lines of code.

Very little of this is because the problems being solved are complicated, and much more of it is because people don't understand how anything works.

For example, in this C++ code, they have a vector of strings. The goal is to create a map where the keys are the strings from the vector, and the values are more strings, derived from a function call.

Essentially, what they wanted was:

for (std::string val : invec)
{
    umap[val] = lookupValue(val);
}

This would have been the sane, obvious way to do things. That's not what they did.

unordered_map<string, string> func(vector<string> invec)
{
    unordered_map<string, string> umap;
    vector<pair<string, string*> idxvec;
    for(string name : invec)
    {
        umap[name] = "";
        idxvec.push_back(make_pair(name, &umap[name]));
    }   

    for(auto thingy : idxvec)
    {
        //actual work, including assigning the string
        thingy.get<1>() = lookupValue(thingy.get<0>()); 
    }
    return umap;
}

I won't pick on names here, as they're clearly anonymized. But let's take a look at the approach they used.

They create their map, and then create a new vector- a vector which is a pair<string, string*>- a string and a pointer to a string. Already, I'm confused by why any of this is happening, but let's press on and hope it becomes clear.

We iterate across our input vector, which this I get. Then we create a key in the map and give it an empty string as a value. Then we create a pair out of our key and our pointer to that empty string. That's how we populate our idxvec vector.

Once we've looped across all the values once, we do it again. This time, we pull out those pairs, and set the value at the pointer equal to the string returned by lookup value.

Which leads us all to our favorite letter of the alphabet: WHY?

I don't know. I also am hesitant to comment to much on the memory management and ownership issues here, as with the anonymization, there may be some reference management that got lost. But the fact that we're using bare pointers certainly makes this code more fraught than it needed to be. And, given how complex the STL data structures can be, I think we can also agree that passing around bare pointers to memory inside those structures is a recipe for disaster, even in simple cases like this.

What I really enjoy is that they create a vector of pairs, without ever seeming to understand that a list of pairs is essentially what a map is.

In conclusion: can we at least agree that, from now on, we won't iterate across the same values twice? I think about 15% of WTFs would go away if we all followed that rule.

Oh, wait, no. People who could understand rules like that aren't the ones writing this kind of code. Forget I said anything.

[Advertisement] Keep all your packages and Docker containers in one place, scan for vulnerabilities, and control who can access different feeds. ProGet installs in minutes and has a powerful free version with a lot of great features that you can upgrade when ready.Learn more.