It's a holiday in the US today, where we celebrate with a lot of explosive-induced accidents and emergencies. Instead of holding a cherry bomb until it's too late, let's instead look at some explosively parallel code. Original. --Remy
Parallel programming is hard. For all the advancements and tweaks we've made to our abstractions, for all the extra cores we've shoved into every CPU, deep down, software still carries the bias of the old uni-tasking model.
Aleksei P works on a software package that is heavily parallel. As such, when interviewing, he talks to candidates about their experience with .NET's Task
objects and the async
/await
keywords.
One candidate practically exploded with enthusiasm when asked. "I've just finshed a pretty large text processing project that reads a text file in parallel!" They whipped out a laptop, pulled up the code, and proudly showed it to Aleksei (and gave Aleksei a link to their repo for bonus points).
public async Task<IDictionary<string, int>> ParseTextAsync(string filePath, ParamSortDic dic)
{
if (string.IsNullOrEmpty(filePath))
{
throw new ArgumentNullException(nameof(filePath));
}
if (Path.GetExtension(filePath)!=".txt")
{
throw new ArgumentException("Invalid filetype");
}
Dictionary<ParamSortDic, Func<IDictionary<string, int>, IOrderedEnumerable<KeyValuePair<string, int>>>> sorting =
new Dictionary<ParamSortDic, Func<IDictionary<string, int>, IOrderedEnumerable<KeyValuePair<string, int>>>>
{
{ParamSortDic.KeyAsc,word=> word.OrderBy(ws=>ws.Key)},
{ParamSortDic.KeyDesc,word=>word.OrderByDescending(ws=>ws.Key)},
{ParamSortDic.ValueAsc,word=>word.OrderBy(ws=>ws.Value)},
{ParamSortDic.ValueDesc,word=>word.OrderByDescending(ws=>ws.Value)},
};
var wordCount = new Dictionary<string, int>();
object lockObject = new object();
using (var fileStream = File.Open(filePath, FileMode.Open, FileAccess.Read))
{
using (var streamReader = new StreamReader(fileStream))
{
string line;
while ((line = await streamReader.ReadLineAsync()) != null)
{
var lineModifyLower = Regex
.Replace(line, "[^а-яА-я \\dictionary]", "")
.ToLower();
var words = lineModifyLower
.Split(Separators, StringSplitOptions.RemoveEmptyEntries)
.Where(ws => ws.Length >= 4);
Parallel.ForEach(words, word =>
{
lock (lockObject)
{
if (wordCount.ContainsKey(word))
{
wordCount[word] = wordCount[word] + 1;
}
else
{
wordCount.Add(word, 1);
}
}
});
}
}
}
return sorting[dic](wordCount).ToDictionary(k => k.Key, k => k.Value);
}
There's so much to dislike about this code. Most of it is little stuff- it's painfully nested, I don't like methods which process file data also being the methods which manage file handles. Generics which look like this: Dictionary<ParamSortDic, Func<IDictionary<string, int>, IOrderedEnumerable<KeyValuePair<string, int>>>>
are downright offensive.
But all of that's little stuff, in the broader context here. You'll note that the file is read using ReadLineAsync
, which is asynchronous. Of course, we await
the results of that method in the same line we call it. await
is a blocking operation, so… not asynchronous at all.
Of course, that's a trend with this block of code. Note that the words on each line are processed in a Parallel.ForEach
. And the body of that ForEach
starts by creating a lock, guaranteeing that only one thread is ever going to enter that block at the same time.
Suffice to say, Aleksei didn't recommend that candidate.