• LCrawford (unregistered)

    Unless I'm missing something, "await Frist.ReadAsync()" is asynchronous - it frees a thread although indeed mainline execution is blocked.

  • Bradley (unregistered)

    I am curious as to what the author would replace "Dictionary<ParamSortDic, Func<IDictionary<string, int>, IOrderedEnumerable<KeyValuePair<string, int>>>>" with, if he finds it so offensive.

  • William F (unregistered)

    Did this get shared on the C# Reddit or something? What's with the hater brigade? Do you really think Jon Skeet would write code like this?

    The point of the function is to do parallel processing. "Awaiting" each line blocks the function. Sure, other stuff can happen in other code, but not any "parallel" processing in this function. From the point of view of the function itself, await is blocking. It's not like yield, it's "yield and don't wake up unless I have my value."

    The four-deep generic type is a code smell. The code should be rewritten to do processing in steps and convert between simpler intermediate data structures. Or at least that's what I would do, and I am pretty sure it would be a lot more readable than this hot mess (and maybe actually parallel!)

  • Supersonic Tumbleweed (unregistered)

    For the love of cores, please benchmark your code. It would be evident then.

  • Tom (unregistered)

    await is a blocking operation, so… not asynchronous at all.

    Uh, what? await isn't blocking; it doesn't block the thread, which is released to perform some other work. The execution will resume at the point of the await when the asynchronous operation completes. The way the code's author reads the file is asynchronous. It's not parallel, though, despite the use of Parallel.ForEach (as you correctly noted).

  • Hasseman (unregistered)

    Sorry for someone asking stupid questions and never worked with parallel code. What would the purpose be to read a text file in parallel? If each line an independent unit maybe, but even then?

  • Pewe (unregistered) in reply to Hasseman

    Reading it asynchronous do have some uses, reading it in parallel could also have som uses depending on what you do with the files content but in my mind there's seldom any use for either...

  • King (unregistered)

    Maybe in a parallel universe this is considered good code?

  • (nodebb) in reply to Tom

    Exactly what I was going to say.

  • my name is missing (unregistered)

    Asynchronous and well left who to people is code best parallel it understand

  • LCrawford (unregistered)

    | . What would the purpose be to read a text file in parallel? If each line an independent unit maybe, but even then?

    Good question - the only time this makes sense is if the time required to process each line is much longer than the disk access time when trying to do a parallel read, and the parts of the text file do not depend on each other or cause excessive resource conflicts when processing each line.

  • Dude (unregistered) in reply to Hasseman

    Interestingly enough, it's not reading the file in parallel, it's processing the words of each line in parallel, with the end result being a count of words in the document. It's still only reading the file one line at a time (albeit async, so other operations can theoretically still occur).

    Even if it didn't have a lock inside the Parallel.ForEach(hello ConcurrentDictionary), I don't know how much that would actually benefit the total execution, as Parallel.ForEach breaks the list into appropriately-sized subdivisions, each of which uses a thread to process. Since manipulating the dictionary it likely one of the faster operations in this function, and there probably aren't a massive number of words on a given line, I doubt this really does any significant improvements.

    Also, Dictionary<ParamSortDic, Func<IDictionary<string, int>, IOrderedEnumerable<KeyValuePair<string, int>>>> appears to just be an over-engineered generic way of sorting the final collection. I personally would have gone with a more traditional switch block, but something like that could be reasonable (if consolidated into its own object) if you have a lot of potential options.

  • Kanitatlan (unregistered)

    Use of threading can be for a variety of different purposes and isn't always about improving performance of the code being executed. This sample code despite everything actually meets one of the other criteria which is to have code that doesn't block the UI. It doesn't parallelise itself in any useful but it should be non-blocking to other behaviour. In the old days when I learnt multi-threading we only had one core and the only reason you did it was to be non-blocking.

  • testing (unregistered) in reply to LCrawford

    Exactly! It is asynchronous, but not parallel...they are different things. That line will give control back to the method that called this method (while the async call is happening) so it won't block this thread.

    However, you don't get multiple threads (parallel).

  • Sole Purpose Of Visit (unregistered) in reply to Hasseman

    Good question.. (And there are good answers abòve.)

    Remy has (with the best of intentions) unfortunately nit-picked this snippet to death. So, I'll give it a try -- because anyone who is proud of code like this should not be employed.

    1. For a "task" like this, if you want paraleĺism, you want map-reduce. There's a map, but no reduce. This is actaĺy fundamental. (There's also an extremely weird filter, but let that para.

    2. You put a lock in the deepest bowels of indentitur, you are clearly incompetent.

    3. I don't actually care about the type signature per se (I love me my linqy vars), but since it's a concrete type, used later on? Smells of doofus.

    4. Weird arc and desc sorting on both key and value. Clearly an eejit who for example has never heard of "Reverse." Or,

    Apparently, concurrent dictionaries.

    In short, computer says no.

  • Sole Purpose Of Visit (unregistered) in reply to Hasseman

    Good question.. (And there are good answers abòve.)

    Remy has (with the best of intentions) unfortunately nit-picked this snippet to death. So, I'll give it a try -- because anyone who is proud of code like this should not be employed.

    1. For a "task" like this, if you want paraleĺism, you want map-reduce. There's a map, but no reduce. This is actaĺy fundamental. (There's also an extremely weird filter, but let that para.

    2. You put a lock in the deepest bowels of indentitur, you are clearly incompetent.

    3. I don't actually care about the type signature per se (I love me my linqy vars), but since it's a concrete type, used later on? Smells of doofus.

    4. Weird arc and desc sorting on both key and value. Clearly an eejit who for example has never heard of "Reverse." Or,

    Apparently, concurrent dictionaries.

    In short, computer says no.

  • (nodebb)
     if (Path.GetExtension(filePath)!=".txt")
        {
            throw new ArgumentException("Invalid filetype");
        }
    

    Great, a text editor that you can’t easily open things like HTML files in.

  • sizer99 (google) in reply to Bradley

    'curious what the author would replace "Dictionary<ParamSortDic, Func<IDictionary<string, int>, IOrderedEnumerable<KeyValuePair<string, int>>>>" with'

    You decompose it. A good idea whenever you see more than two < or >, like that >>>> on the end. If you've got a Dictionary of Lists of KeyValuePairs of KeyValuePairs it's usually time to replace it with a class or struct - makes it incredibly more readable and maintainable when you're not referring to item[3].Key.Value

    Now in this particular case all the complexity is in the Func, so just declare a delegate for that and you can replace it with 'Dictionary<ParamSortDic, BrilantFunc>' and it's totally readable.

    I would cut the guy who wrote this a little slack since it appears he's a c# noob who is just learning all this stuff and wants to throw everything in at once - he's looked up the syntax and even got everything functioning (a cut above most people on StackExchange) but doesn't quite get the implications. Might not hire him, or maybe that enthusiasm is usable with a little mentoring.

  • Barf4Eva (unregistered) in reply to King

    No sir, said parallel universe failed to materialize...

  • Geoff (unregistered) in reply to Hasseman

    It could be for some kind of ETL process. I had to create an application once that loaded lots of "not quite CSV" data into a database that was output into flat files; I would pickup via FTP from the mainframe that produced them. It was not as simple as just inserting records. I first had to do a little normalization. That required calculating a hash over some of the fields to figure out if this was a new item level record or a detail record for an existing item or a strait up duplicate. I would first query the DB to see if the hash matched any existing item level record (the hashes were the primary key in the item level table). The vast majority of records we either duplicate or detail so most of the time no insert or update was needed. It was much faster to split the file into regions and read each region on its own thread than to try and crank thru it linearly.

    The other model of course would be read the file and dispatch each line in round robin into queues each processed by some kind of thread pool. I tried it that way first as its simpler to implement but I found out the performance was much better the other way. A lot fewer copies in memory of some pretty long record lines.

  • (nodebb) in reply to Gurth

    Why go that far? How about README.TXT file?

  • (nodebb)

    I guess you could parallelise the next read with the insertion of the words into the dictionary, but that's annoying to express using just async and await. Something like this?

    string line = await streamReader.ReadLineAsync();
    while (line) {
        Task<string> task = streamReader.ReadLineAsync();
        /* insert words into dictionary */
        line = await task;
    }
    
  • nerdile (unregistered) in reply to LCrawford

    You're not missing anything, that is the proper usage of await and that part is not actually a wtf.

    await gets broken down into a continuation, so it really is asynchronous.

  • Allan Mills (google)

    I've got to say I've tried turning some of my applications into multi-threaded versions and some of them ended up being slower than the single-threaded version. The last few worked well though.

Leave a comment on “For Each Parallel”

Log In or post as a guest

Replying to comment #:

« Return to Article