• Stephen Cleary (unregistered)

    Await is not a blocking operation.

    The ReadLineAsync does block, but the reason has nothing to do with await.

  • (nodebb)

    Parallel code is always faster if you only use a single thread at a time! <facepalm/>

  • LZ79LRU (unregistered)

    You know. This level of incompetence reminds me of the time I took it upon my self to deliberately use my decades of experience to write the worst code I could as an exercise. At least he is not using exception handlers with function pointers as the error code as a messaging system so that the entire program can have no conventional method calls.

  • (nodebb)

    Nice there's a lot to dig into this one (sorry, Remy, you are going to be mention in this one too :D).

    .net has exactly three different ways to do concurrency:

    • Threading (good ol' threads, I guess there's no need to explain them)
    • Parellelism (Task objects which are queued super efficiently in a specific thread pool and can be chained together)
    • Asynchronous (Lightweight state machines that allow to wait for a result of a Task without blocking anything else).

    Now obviously the last one is tricky because there's again three ways to do it correctly, but let's focus on the async await pattern, which is pretty common everywhere right now.

    Of course, we await the results of that method in the same line we call it await is a blocking operation, so… not asynchronous at all. This is actually not the case at all. The only "thing" that is blocked is the awaited Task until it has a result or fails with at least one exception.

    Now let's check the code:

    using (var fileStream = File.Open(filePath, FileMode.Open, FileAccess.Read))
    

    Classic beginner mistake, this opens a file stream synchronous. You need to actually open a file stream with FileOptions.Asynchronous or all async Methods will actually be blocking. They did it to keep legacy behavior consistent hence I personally think the Stream class is by far the worst example of implementing concurrency (BTW it also feature methods for the two other leagcy ways to implement asynchonous behavior).

    Parallel.ForEach
    

    There is a simple rule of thumb, don't mix those three ways together. If you don't know what your are doing, it will lead to a maintenace horror, if you know what you are doing then you will do everything not to end up mixing them to avoid a maintenace horror.

    For non-.net coders, here this persons enqueues in the managed thread pool a new Task in an asynchronous state machine. It already sounds bad spelling it out.

    To be continued...

    Addendum 2024-07-04 09:37: Quote and response got mixed together, my bad.

  • (nodebb)

    Continuation of the ... eh, am I building a Task chain here, noice, thanks WTF post limit :-)

    lock (lockObject)
    

    This is now the first way to do things, threading. It basically locks the thread of the Task which was enqueued in a managed thread pool which was created in an asynchronous state machine. I mean... yeah. It sounds like nonsense, it literally is. Now we have all three ways of concurrency mixed together and in a best case, this will not instantly lock because everything ends up on the same thread by chance.

    For those not familiar with .net, you never every use the lock keyword (which uses a Monitor object for thread synchronization), but the correct way to do this is by using a SemaphoreSlim(1,1).

    Beyond the obvious concurrency nonsense, here a fun extra bit:

    Regex.Replace(line, "[^а-яА-я \\dictionary]", "").ToLower();
    

    This code generates in an iteration an instruction set for a constant expression. Look, it could be worst, it could be compiled as well for extra brownie points, but let's be frank these days you use code generated Regex via attributes and in the old days this would be a static field with a compiled Regex. However considering the rest of the horror showing off a complete lack of understanding basically anything, well, it's not that big of a deal, I guess :-)

    Addendum 2024-07-04 09:41: Correction: You don't use the lock keyword in an asynchronous context.

    BTW if you check how the SemaphoreSlim is implemented with a Monitor object, it becomes very obvious way (spoiler: deadlocks, race conditions and thread switching).

  • (nodebb)

    Yeah. Essentially our candidate played "buzzword bingo" with each of the gizmos .NET provides for parallelism.

    And left off the one thing that might have actually provided a tiny speed-up by opening the stream synchronously. In bingo terms that might be like leaving the "Free" square in the middle blank and wondering why you never win. ;)

  • Tim (unregistered)

    in my experience there's two approaches to multi-threading.

    If it's a simple "parallel foreach" where from the simplest code inspection, the threads are self-evidently independent, you can pretty much trust the code to work , although though there are a number of environmental reasons why it might not multi-thread so it's still work putting in a bit of logging just to make sure

    Anything else, test the hell out of it using sleep statements and injecting exceptions to force every possible race condition or failure mode, otherwise you'll never find the bugs until it's too late.

  • (nodebb) in reply to WTFGuy

    True, I didn't want to go into performance overall because especially when you implement the text parser correctly it's not a performance relevant part of the whole code - except perhaps if we are talking a gigabyte text file. It would make way more sense to to actually run the whole methods in an Task.WhenAll() context and paralize there.

    Side note: I just noticed there is a significant lack of cancellation tokens in this code, which means the code is not Kubernetes ready. But maybe this was long before virtualization was a thing. Or responsiveness in general :-)

  • (nodebb)

    The important thing here is that reading lines from a file is by definition a sequential operation - there is no way to "parallelize" it.

    If processing these lines is costly, (it's unclear from the code snippet if that's actually the case, but let's go with this premise) then the processing loop can be parallelized somehow.

    To me, this shows TRWTF: lack of understanding of fundamental computer concepts.

  • (nodebb)

    Okay, last one for this article (not a promise, more like a NY resolution), let's talk about this:

        if (string.IsNullOrEmpty(filePath))
        {
            throw new ArgumentNullException(nameof(filePath));
        }
    
        if (Path.GetExtension(filePath)!=".txt")
        {
            throw new ArgumentException("Invalid filetype");
        }
    

    As already mentioned, async methods are actually state machines and this means the code will only be called when executed out of the thread pool. In other words any validation methods may be also executed long after the Task object was created. Generally you should avoid validations like this in asynchronous methods, but in case it's a public contract there's a trick how to do this properly and it looks like this:

    public Task<IDictionary<string, int>> ParseTextAsync(string filePath, ParamSortDic dic)
    {
        if (string.IsNullOrEmpty(filePath))
        {
            throw new ArgumentNullException(nameof(filePath));
        }
    
        if (Path.GetExtension(filePath)!=".txt")
        {
            throw new ArgumentException("Invalid filetype");
        }
    
        return ParseTextAsync(filePath, dic);
    
        static async Task<IDictionary<string, int>> ParseTextAsync(string filePath, ParamSortDic dic)
        {
            // Rest of the code here
        }
    }
    

    This code really is the gift that keeps on giving.

  • (nodebb) in reply to MaxiTB

    Did you really mean to have both methods with the same name?

  • Aleksei (unregistered)

    MaxiTB, thank you for such a high evaluation, I am really flattered to see my old submission becoming "a classic WTF"!

    Did you not miss another gem regarding the regex? It was supposed to remove all non-cyrillic characters from each line (I guess?), and perhaps, should have left the word "\dictionary" untouched (no idea, why). In reality, it would convert "unpredictable canary" into "nricta canary", and they would even bypass the filter .Where(ws => ws.Length >= 4). The bottomless pit of WTF.

  • Marcel (unregistered)

    I like the “if file path is null or empty then make sure to put the path in the exception”

  • (nodebb) in reply to Steve_The_Cynic

    Yeah, it's an inner method. I generally name those in this situation the same. It's fully qualified name ends after all in ParseTextAsync.ParseTextAsync which I personally find very descriptive in this case. Plus I pretty much copied how the netcore did it ;-)

  • (nodebb) in reply to MaxiTB

    OK. TIL. (C# is not my thing. I've dabbled a little, but no more than that.) Although "inner method" brought back memories (and a creepy-crawly feeling up my spine) from my days doing stuff in Turbo Pascal, although they neutered some of the worst excesses of "nested procedures" compared to standard Pascal.

    Anyway, thanks.

  • sokobahn (unregistered)

    Regexp "[^а-яА-я \dictionary]" should be "[^а-яА-Я \dictionary]" . Looking at https://en.wikipedia.org/wiki/Cyrillic_(Unicode_block) it happens that А-я covers both lower and uppercase (unlike a-zA-Z) but serves only Russian and Bulgarian alphabets (other Cyrillic have characters outside those 0x410-42F parts.)

  • see sharp (unregistered)

    Candidate demonstrates awareness of the relevant keywords and primitives, but shows no evidence of understanding when, where, and how to apply them to a given problem.

  • markm (unregistered)

    Why would you "read a text file in parallel"? If the code actually did that, it would be reading a device by skipping back and forth through the file, and reassembling the results in memory. For most devices, wouldn't this be much slower than just reading the file linearly?

Leave a comment on “Classic WTF: For Each Parallel”

Log In or post as a guest

Replying to comment #:

« Return to Article