• RLB (unregistered)

    That's a great way to get bus errors.

  • LZ79LRU (unregistered)

    It's obvious this guy was indeed a beginner. Allow me to list some of his mistakes so that other juniors can learn from them:

    1. Never show your superiors or even peers that you are smarter than they are. Especially so if you are. You will NOT get praised or rewarded for it. That is not how organizations work. It is not how people work, period. Show someone you are better than them at anything and you will be disliked or, worse, seen as competition or a threat. At best you will make yourself look like a smartass. And nobody likes those.

    2. Never violate coding rules and policies until it becomes essential or opportune. Doing so only gives the people who dislike you ammo to use against you. And for good reason.

    Remember, rules exist for a reason. That reason might be something as petty as the whims of your predecessors or superiors but it might well not be. Something that might look stupid, wrong or pessimal to you might be there because someone tried all the "right" ways and found them to cause problems. I have seen this in the wild. I have done this in the wild and left comments about it.

    And until you have a lot of experience both as an engineer in general and with the code base in question in particular you are not qualified to make that distinction.

    3. Never go above and beyond to fix what ain't broken. This isn't being lazy or obstructive either. You are paid to do a job. And if there is no ticket to make things happen (such as making things run faster), that is because the powers that be decided that you have other, more important things to be doing. Things they gave you tickets for.

    Doing anything outside of that scope, especially as a beginner or trainee, is a good way to tell everyone you care more about what looks fun to you than about what you are paid to do.

    Instead, keep these "fixes" in your stash for a rainy day when the ticket does come and it's 16:30 on a Friday.

  • LZ79LRU (unregistered)

    Also, goddamn, did the formatting break there. Those paragraphs are supposed to go together, not like this. Whatever this is.

  • dpm (unregistered)

    I hate side effects, I hate methods which do two things. I'd much rather break it out into two steps

    I beg to differ. You're just putting the onus on the caller to always make two calls, not to mention your suggestion still has the same side effect. Why not instead simply have each API function take a simple pointer and return the resulting address, instead of taking a handle and returning void?
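
    Something along these lines, roughly - just a sketch with made-up names, not the library's real API:

    #include <string.h>

    /* Copy 'value' into the buffer at 'pos' and return the first byte past it. */
    char* Serialize_Int(char* pos, int value)
    {
        memcpy(pos, &value, sizeof value);   /* byte-wise copy, no alignment trap */
        return pos + sizeof value;
    }

    The caller then threads the position through each call: pos = Serialize_Int(pos, 42); pos = Serialize_Int(pos, 7);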

  • (nodebb) in reply to RLB

    That's a great way to get bus errors.

    Indeed. When I saw the *(int *) and so on, my first thought was "let's see you run that on early-model ARM chips without good unaligned access handling..." (Or, frankly, x86 from the 486 onward, if you set the AC == Alignment Check flag...)
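
    Roughly the difference between these two (a minimal sketch; buf, p and value are only for illustration):

    #include <string.h>

    void unaligned_example(void)
    {
        char buf[16];
        char *p = buf + 1;                 /* deliberately misaligned */
        int value = 42;

        /* *(int *)p = value; */           /* may bus-error (or silently misbehave) on strict-alignment CPUs */
        memcpy(p, &value, sizeof value);   /* byte-wise contract, safe at any alignment */
    }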

  • Kythyria (unregistered) in reply to LZ79LRU

    It's amazing how wtf generators find competence threatening. If there are all these good reasons for requiring things be done the stupid way, it'd be a very good idea to, say, document them. Rather than issue directives like "memcpy is too clever".

    The article sounds a lot like code which should be generated by machine, but someone who thinks memcpy is too "clever" probably won't think of that, and would probably insist it be hand-typed, because generating formulaic code is "clever".

  • Kythyria (unregistered)

    Also, addendum: Remy is proposing a WTF here; splitting the function into separate "write" and "increment" would allow the possibility of incrementing by the wrong amount, likely one of the footguns this is intended to prevent (it would also prevent automagically inserting padding).

  • Jonathan (unregistered) in reply to Kythyria

    Agree. Breaking the write and increment into separate functions severely breaks encapsulation.

    And maybe it's because I was once an old C programmer, but when a function asks you to pass in something with an "extra" level of indirection, that's a strong indicator that the function intends to modify the value of it.

    Heck, I've written a buffer class before. Mine was wrapped up in a nice C++ class, but if you're going to use C this is a nice, sensible, uniform interface.

    Refusing to consider an optimization seems silly. And ignoring potential alignment issues is also a warning sign. But having a standard API for buffer packing is eminently sensible. The first paragraph sounds like they think everybody should just "do it by hand, again" each time.

  • Barry Margolin (github)

    If you don't like side effects, have the function return the new pointer. Then the caller can do

    bufptr = func(bufptr, data);
    

    But probably you should just get over it. This kind of thing is idiomatic in C (C++ added reference arguments to make the code a little simpler).

  • (author) in reply to Barry Margolin

    I mean, this I'd have no complaints about. Or storing the new result in an output pointer. I just get hives when a parameter is both an input and an output.

  • dpm (unregistered) in reply to Remy Porter

    I just get hives when a parameter is both an input and an output.

    I am bewildered by this attitude. "Handles" are classic C that never bothered anyone I ever coded with, not to mention a legitimate use of parameters in C# via the "ref" keyword. Is your dislike just an emotional issue, or can you give a rational basis?

  • (author) in reply to dpm

    At the end of the day, I'd call it preference- which I also called it in the article. Even if I'm passing by ref, I really want to be clear about what's an input and what's an output, and to store the output on top of the input just makes it harder to debug the program. It makes my tests more complicated. Maybe I've just inherited some functional programmer brain along the way.

    I recognize that C idioms are very much "it's all just memory anyway".

  • StackV (unregistered) in reply to Remy Porter

    Speaking of C idioms, this code also contains a classic C programmer sin: writing numeric types to binary without using a defined byte order.
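
    For example, a sketch of one way to avoid it (put_u32_le is a made-up helper, not anything from the article's library): pick a wire order, say little-endian, and assemble it byte by byte so the result is identical on any host.

    #include <stdint.h>

    /* Write a 32-bit value as little-endian bytes, regardless of host endianness. */
    void put_u32_le(unsigned char *out, uint32_t v)
    {
        out[0] = (unsigned char)(v & 0xFF);
        out[1] = (unsigned char)((v >> 8) & 0xFF);
        out[2] = (unsigned char)((v >> 16) & 0xFF);
        out[3] = (unsigned char)((v >> 24) & 0xFF);
    }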

  • Charles (unregistered)

    In addition to the bugs already mentioned, the code also is bad because it produces different results depending on the endianness of the CPU, which is a special case of relying on the specific representation of data types. Upgrade your compiler from 32-bit to 64-bit and all those serialised values break.

    However, Remy's desire to have two steps is definitely wrong. It's a violation of the DRY (don't repeat yourself) principle. While I write functions like this in the way advocated by Barry Margolin, if you can't handle pointer-to-pointer and read/write parameters, you shouldn't be writing C, C++ or similar code.

    The principle of using a library rather than having multiple programmers invent slightly different variants is a good principle, but as always, having a standard way of doing things amplifies both the good and bad features of the one way, so a standard should always be open to improvements - exactly the opposite of the attitude of Stefano's colleagues.

  • (nodebb) in reply to LZ79LRU

    Ha, I know the feeling when it comes to formatting - I'm so used to markup preview in Visual Studio, I am helpless writing it in the blind.

  • (nodebb)

    For those wondering how memcpy is generally optimized:

    Depending on the platform, they try to hit cache boundaries and use the biggest available type.

    So instead of copying byte by byte, for example, they copy a 64-bit word at a time. That alone can give you something like a 10x boost, sometimes more.

    The cache boundaries are trickier; basically you want to hit physical addresses where a specific modulus is zero. So, for example, if your cache is organized around 16-byte lines, then you always want to start at x mod 16 == 0.

    Obviously both heavily depend on the platform, so it's best to use the latest C library from the vendor and never implement it yourself: even if you get it right today, tomorrow might be different, because memory management is usually done by the OS on most platforms.
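
    A very rough sketch of the "widest available type" idea (real implementations also handle the misaligned head and tail, overlap, SIMD and so on; copy_words is purely an illustration):

    #include <stdint.h>
    #include <string.h>

    /* Naive illustration only: copy 8 bytes at a time, then the remainder. */
    void copy_words(unsigned char *dst, const unsigned char *src, size_t n)
    {
        size_t i = 0;
        for (; i + sizeof(uint64_t) <= n; i += sizeof(uint64_t)) {
            uint64_t tmp;
            memcpy(&tmp, src + i, sizeof tmp);   /* one wide, alignment-safe load... */
            memcpy(dst + i, &tmp, sizeof tmp);   /* ...and one wide store */
        }
        for (; i < n; i++)
            dst[i] = src[i];                     /* trailing bytes */
    }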

  • (nodebb)

    memory management is usually done by the OS on most platforms

    Sort of. Memory management by standard C (malloc and free and friends) is certainly not done by the OS on the major platforms (Windows, Linux, ...). man malloc on my Ubuntu 20.04 work machine tells me that the page lives in section 3, which is for user-mode libraries, and FreeBSD (11.3, my build-VM) says the same. In both of them, and in Windows, the user-mode library behind malloc will call into the relevant OS calls to grab big blocks, and then cut them up to satisfy requests from the application.
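
    As a toy illustration of that split (nothing like a real allocator, just the shape of it): one mmap call grabs a big block from the OS, and everything after that is pure user-mode bookkeeping.

    #include <stddef.h>
    #include <sys/mman.h>

    static unsigned char *pool;
    static size_t pool_used;
    static const size_t pool_size = 1 << 20;

    /* Toy bump allocator: the OS only ever sees the initial mmap. */
    void *toy_alloc(size_t n)
    {
        if (pool == NULL) {
            pool = mmap(NULL, pool_size, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
            if (pool == MAP_FAILED)
                return NULL;
        }
        n = (n + 15) & ~(size_t)15;          /* keep returned blocks 16-byte aligned */
        if (pool_size - pool_used < n)
            return NULL;
        void *p = pool + pool_used;
        pool_used += n;
        return p;
    }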

  • Yoh (unregistered) in reply to LZ79LRU

    Yeah, hard no. I honestly can't imagine how you might enjoy your life as a human being, let alone as a developer, in such a place.

    Your organisations (or maybe the culture around them? I don't know where you work, but it sounds East European or Toxic American) are obviously broken if they work like this. Sure, there's a balance to be struck between old and new, and everyone needs to learn before they can teach, and nobody likes an obnoxious smartass (whether newbie or not), ... but this kind of advice is just taking it to a surreal and absurd extreme.

    If there's any fledgling developer reading this and you want to know how on earth codebases get into such a shocking state when everyone has reasonable technical chops, this is how it happens. Starts with poor culture, ends with dead company.

  • (nodebb) in reply to LZ79LRU

    @LZ79LRU, I'd love to tell you that in these 20 years I've learned my lesson. Well, PARTLY I have... but sometimes I'm still that young, naïve coder. Anyway, even if THOSE specific seniors were the way they were... not everybody in the company was like that.

    @Steve_The_Cynic: don't worry about that, it was (and it still is, I suppose) a pure x86 application.

  • Fizzlecist (unregistered)

    This may be a minor quibble, but the API interface defines a string function named TheLibraryName_Client_SerializeString, but the implementation of the string function was named TheLibraryName_Client_SerializeMemory. Unless I'm missing something?

  • (nodebb) in reply to Yoh

    Agree wholeheartedly. I have had the misfortune of working at dysfunctional places where LZ79LRU's advice was actually valid due to the prevailing company culture (the "Toxic American" subtype) especially as perpetrated by the pointy-hairs. These were extremely depressing and demotivating places in which to be.

    A place with a decent culture - which values competence, transparency, decency, and a good work ethic - does not need such stifling rules, except perhaps the first sentence of LZ79LRU's rule 2 (and bits of the following paragraph).

  • jp3 (unregistered) in reply to The Beast in Black

    Stefano didn't use their library. He continued to use his memcpy implementation. Performance crept up as he made changes, and no one complained. That program is still in production somewhere, all these years later.

    In a future TDWTF: So, I was tasked to update this ancient serialization code which would have been an easy enough job, if not for the junior developer before me who had decided to be clever and roll their own solution instead of the API that was used everywhere else...

  • jp3 (unregistered)

    Aaand the real wtf is of course this comment system. Didn't mean to reply to the post above mine.

  • Randal L. Schwartz (github)

    Some sort of memory copy or memcpy function.

    You stomped on your own punchline. I'd have written that as:

    Some sort of "memory copy" function. And to make sure people know how to find it, let's call it, I dunno, "memcpy" perhaps?

  • Erik (unregistered)

    Firstly, this pattern (a pointer to the "cursor" (itself a pointer) into what you're manipulating, and doing the extra deref each time) is not necessarily bad: if the "cursor" exists outside of this function and is passed to others later, then it needs to live in some sort of structure that is passed to them all and gets updated anyway. The WTF is probably that it should be declared as a single member of such a structure - which should incur no extra runtime overhead - so that instead of **arg, it would be spelled *arg->cursor.

    Secondly - "I'd much rather break it out into two steps: writeData(dest, data); incrementAddress(dest, sizeof(data))." Yep, so if 'data' is an array or literal, you're probably fine - but remember that sizeof("foobar") includes the nul terminator - is that what you wanted? If it's a pointer to something with a length other than a "sizeof(void *)", you've got a bug which the compiler will not tell you about.
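
    To make that trap concrete (a tiny made-up example; the values are in the comments):

    #include <stddef.h>

    void sizeof_trap(void)
    {
        const char *s = "foobar";
        size_t lit = sizeof("foobar");   /* 7: the terminating '\0' is counted     */
        size_t ptr = sizeof(s);          /* 4 or 8: the size of the pointer, not 6 */
        (void)lit; (void)ptr;
    }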

  • Worf (unregistered)

    I don't get how that code could work - alignment issues everywhere. memcpy() is the only safe way to do it.

    Otherwise it has a lot of issues. Architectures like ARM can't do unaligned accesses (the compiler has to handle it if you access packed structures). Other architectures ASSUME alignment - so if you have a pointer that's unaligned, then attempt a 32-bit write, it will assume the lower two address bits are 0 - so you can overwrite data. (Heck, some x86 accesses will do that - they "wrap" but don't actually increment the upper bits)

    I'm going to say there were probably weird and obscure bugs caused by that library not doing it properly that no one could figure out.

  • Foo AKA Fooo (unregistered) in reply to Remy Porter

    Just to echo what others have said, Remy, I think you're taking the SRP too far. A function having a single responsibility doesn't mean it must only do one thing. If several things logically belong together, it should do them all. Separating them is not only less convenient to the caller, it's also more bug prone, both to the implementer of the library and the user, and possibly less efficient (say you want to serialize a C string, you'd have to count its length in two places).

    The FP analogy doesn't really work either because the real side effect is not the modification of the pointer, but the memory. IOW, in FP you could fork a buffer like this:

    f1 = func(f0, a);
    f2 = func(f1, b);
    f3 = func(f1, c);
    

    to get f2 containing a and b, and f3 containing a and c. Of course, this won't work here.

    C is no FP language. It's not OOP either, but closer to it. You can always do hand-made OOP. For a start, just wrap the buffer pointer in a struct (no runtime overhead). That already removes one "*" from the function signature, and makes it clearer that *this or whatever you call it is modifiable, as is usual in OOP.

    Further on, it will allow you to make the struct opaque for better encapsulation or add a buffer end pointer (at no extra cost as far as parameter passing is concerned) to do such newfangled stuff like bounds checking, etc.

    Other than that, using memcpy is, of course, correct here, both for strings and other POD types (because of alignment, as was mentioned). So actually, if you turned the parameter from const char * to const void *, it would just work for int and double and all POD types as well. Though there is a danger of abusing it by e.g. applying it to a struct containing pointers which doesn't serialize so trivially, so that would be a change you shouldn't make on your own without consultation.
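
    A minimal sketch of that direction (invented names, not the real library's API):

    #include <stddef.h>
    #include <string.h>

    typedef struct { char *cursor; } SerializeBuffer;   /* could be made opaque later */

    /* One function covers int, double, any POD type: pass the address and the size. */
    void Buffer_SerializePod(SerializeBuffer *buf, const void *data, size_t size)
    {
        memcpy(buf->cursor, data, size);   /* alignment-safe for any type */
        buf->cursor += size;
    }

    Called as Buffer_SerializePod(&buf, &value, sizeof value), it works the same for an int, a double, or any other POD.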

  • seebs (unregistered)

    okay but actually it's an atrocious idea to separate "write data" and "increment pointer", because then you can BE INCONSISTENT.

    it is in fact just unambiguously more correct to have "write this data, incrementing pointer" be atomic.

    the much bigger WTF is that this interface is a useless serialization interface because it doesn't give you any way to detect that you're deserializing the wrong thing. this is why we use headers in our real serialization formats...
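
    e.g. a tiny sketch of a tagged field, so the reader at least has a chance to notice a mismatch (made-up names, nothing the library actually did):

    #include <string.h>

    enum { TAG_INT = 1, TAG_STRING = 2 };

    /* Prefix each field with a one-byte type tag; the deserializer checks it before reading. */
    char *write_int_tagged(char *pos, int value)
    {
        *pos++ = TAG_INT;
        memcpy(pos, &value, sizeof value);
        return pos + sizeof value;
    }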

  • Your Name (unregistered)

    I'm also on the "keep-those-two-methods-together" wagon. The functions are called serialize. Make life easier for the API users, don't burden them with some unwritten rules on how to call them.

    If you call serializeint, serializestring, serializefoo,

    I expect an int, followed by a string, followed by a foo, not a clobbered mess.

  • Nick (unregistered)

    TRWTF is the suggestion that the “copy memory” and “increment pointer” behaviours should be separated into individual functions…

  • LZ79LRU (unregistered) in reply to Yoh

    You don't. Places like this are not jobs you enjoy. They are jobs you survive.

    The only reason anyone would work at them is if they have no choice. Either because they need the money or because they have to start somewhere.

    And you see, everyone does indeed have to start somewhere. And your first job is typically chosen by a kid who is inexperienced in the real world, enticed by big companies because they pay well (for a first job), and most importantly limited by whoever is willing to hire you without the standard 2-3 years of work experience everyone seems to want.

    Jobs like the place in today's article work by hiring young people, keeping them for those 2-3 years, and then replacing them as soon as they flee. In which time they hope to drain as much value out of them as possible. And flee they will. For as a junior, these places literally just exist to give you a check mark on your CV so that you can find a better job as soon as you survive those couple of years.

    And the scary thing is it works for them. I have in my past worked at companies just shy of the Fortune 500 that operated on sheer incompetence even though software development was their primary occupation. Their solution to horribly inefficient practices is simply to hire more young people and throw them at the problem, WW1 style.

    I have literally seen projects take 3-5x the development hours they should have. But it didn't matter since they could afford to hire 5x the young people and still make a profit.

    That is why my advice is primarily aimed at juniors starting their first job. Once you get past that hurdle it is still relevant but to a far lesser extent as you have more experience to detect bullshit jobs and more freedom to pick and choose.

  • FTB (unregistered)

    I would NOT call two separate functions (write and increment), that's just asking for bugs.

    What I'd do is have the function return a value - the updated pointer:

    char* TheLibraryName_Client_SerializeInt(char* serializedBufferPosition, int value);
    

    Except I wouldn't do that at all, untyped pointers to memory should be void*, not char*.

    void* TheLibraryName_Client_SerializeInt(void* serializedBufferPosition, int value);
    
  • (nodebb) in reply to LZ79LRU

    And your first job is typically chosen by a kid who is inexperienced in the real world, enticed by big companies because they pay well (for a first job), and most importantly limited by whoever is willing to hire you without the standard 2-3 years of work experience everyone seems to want.

    Once again, I am reminded that a great deal of stuff about my career is atypical. For a while, my first post-graduation job was at a company that consisted of the owner, me, and a temp secretary who came in on Wednesday mornings to do paperwork.

  • (nodebb)

    I agree with everyone who disagrees with Remy here: Doing both the write and increment is the way to go.

    What irks me about this code is the lack of bounds checking. How big is the buffer? Who knows? (also, the lack of endianness handling in SerializeInt etc. Hopefully this code will never leave the x86 world)

  • (nodebb) in reply to Medinoc

    If I was doing this, instead of char* I would define a struct with three members:

    • the start of the buffer
    • the index of the next free location in the buffer
    • the size of the buffer

    The first parameter of the function would be a pointer to an instance of the struct which would be mutated by the function (the free location index would be incremented). The struct would be declared in the header as an incomplete type so it is opaque except in the C file in which the functions are implemented. Integer types would be stored with a defined and documented endianness.

    Using a struct in this way is fairly standard practice, but in reality the code in the WTF is a stripped down version of the same approach. The biggest WTF is definitely the lack of bounds checking.
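
    A rough sketch of that shape (invented names; the struct definition would live only in the .c file):

    /* In the header: only an incomplete (opaque) type and the functions. */
    typedef struct SerializeBuffer SerializeBuffer;
    int SerializeBuffer_PutInt(SerializeBuffer *buf, int value);

    /* In the .c file: */
    #include <stddef.h>
    #include <string.h>

    struct SerializeBuffer {
        char   *start;   /* start of the buffer             */
        size_t  next;    /* index of the next free location */
        size_t  size;    /* total size of the buffer        */
    };

    int SerializeBuffer_PutInt(SerializeBuffer *buf, int value)
    {
        if (buf->size - buf->next < sizeof value)
            return -1;                                  /* bounds check */
        memcpy(buf->start + buf->next, &value, sizeof value);
        buf->next += sizeof value;
        return 0;
    }

    (A string variant and explicit endianness handling would follow the same pattern.)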

  • (nodebb) in reply to LZ79LRU

    Remember, rules exist for a reason. That reason might be something as petty as the whims of your predecessors or superiors but it might well not be. Something that might look stupid, wrong or pessimal to you might be there because someone tried all the "right" ways and found them to cause problems

    There's a name for this. It's called "Chesterton's Fence". You drive down a road and you come across a fence across it. You want to continue down the road, but that would require dismantling the fence. You think the fence is pointless so you dismantle it. You are immediately set upon by a pack of ravenous velociraptors that were on the other side.

    You should never abolish a rule until you understand why the rule is in place and you understand the consequences of getting rid of it.

  • nunya business (unregistered)

    "Side effects" are FP-cultist nonsense. It's an abuse of a mathematical term in a non-mathematical context, chosen to evoke medical (not mathematical) imagery in the minds of the audience. A side effect is something unexpected and unwanted that your medicine does to you. (The equivalent software terminology is "bug.") Having code that is supposed to perform perform observable effects actually perform observable effects exactly as expected is correct, expected behavior.

  • Kythyria (unregistered) in reply to jeremypnet

    Except that's not what Chesterton's Fence is. Chesterton's Fence is where there are no raptors or any other reason for the fence in sight, but there is a guy who won't let you remove the fence unless you can guess why it's there, and won't just tell you why anyone would put a fence in such a strange place.

    It's basically an appeal to tradition, dressed up.

  • LZ79LRU (unregistered) in reply to Kythyria

    Bottom line is that when it comes to arcane and seemingly stupid rules in any job, the repercussions of breaking them can range anywhere from nothing, through setting the system on fire and potentially losing millions or even getting people killed, to insulting the fool who designed them, who is now your boss.

    Listed in order of severity.

    So don't break them unless you have enough knowledge, experience, and understanding to determine if and when it is appropriate to do so, and enough seniority that you can get away with it.


    That, I feel, is one thing CS graduates severely lack. School teaches you all these cool and perfect ways to solve problems. They don't teach you that in the real world sometimes you have to get dirty and do things the pessimal way to please a higher power, be that a PHB or the machine spirit of an ailing code base running on ancient hardware.

    They also don't teach you the most important lesson of all. The only thing you know is that you know nothing.

  • LZ79LRU (unregistered)

    Also. HOW THE LIVING HELL DID MY TEXT GET SO HUGE? I literally did nothing. It just happened.

    That was not the intention either. It looks hideous.

    Although ironically enough it actually worked out to emphasize my point.

  • LZ79LRU (unregistered)
    Comment held for moderation.
  • (nodebb) in reply to jeremypnet

    Yeah, I'd do pretty much the same.

  • Yoh (unregistered) in reply to LZ79LRU
    Comment held for moderation.
  • (nodebb) in reply to Worf

    Architectures like ARM can't do unaligned accesses

    It depends on whether the people putting the chip together included an MMU, as one of the things it does is hide the details of doing multiple memory accesses to read or write an unaligned word. It's commonly present on A-series cores (such as appear as the main CPUs in laptops or servers), but really rare on M-series cores (widely used in embedded systems).

  • nasch (unregistered) in reply to LZ79LRU
    Comment held for moderation.
  • (nodebb) in reply to Fizzlecist

    This may be a minor quibble, but the API interface defines a string function named TheLibraryName_Client_SerializeString, but the implementation of the string function was named TheLibraryName_Client_SerializeMemory. Unless I'm missing something?

    My mistake: while preparing the snippet to submit, I confused the function that serialized classic C NULL-terminated strings with the one that serialized generic chunks of memory (the WTF in the implementation was similar, anyway).

  • (nodebb) in reply to jp3

    @jp3

    In a future TDWTF: So, I was tasked to update this ancient serialization code which would have been an easy enough job, if not for the junior developer before me who had decided to be clever and roll their own solution instead of the API that was used everywhere else

    LOL. If it happens... well, I probably deserve it! :D Anyway, I remained in the team that worked to improve and extend that program for 10 years. Ten years is a VERY long time in our world. During that time, I cursed my younger self many times for many bad decisions, but NOT for this one.

  • (nodebb) in reply to Medinoc

    @StackV

    this code also contains a classic C programmer sin: writing numeric types to binary without using a defined byte order

    @Charles

    the code also is bad because it produces different results depending on the endianness of the CPU, which is a special case of relying on the specific representation of data types. Upgrade your compiler from 32-bit to 64-bit and all those serialised values break

    @Worf

    I don't get how that code could work - alignment issues everywhere

    @Medinoc

    the lack of endianness handling in SerializeInt etc. Hopefully this code will never leave the x86 world

    You are right, of course. A bit of context:

    My department never made software for the ARM architecture. The application was x86 only, only for a specific compiler, and for a specific OS.

    The serialized data was passed to/from other applications (also written by our teams) running in the same computer (AFAIK, never to other computers, never to human beings, never written to disk).

    As I said before, I was in the team that maintained it for 10 years, and it stayed that way. Then, I don't know what happened in the NEXT 10 years, but I can bet it did not move to x64 or anything else: it's mostly an obsolete old relic from a distant past, replaced by newer products; only very, VERY old customers still use it.

    @seebs

    the much bigger WTF is that this interface is a useless serialization interface because it doesn't give you any way to detect that you're deserializing the wrong thing

    I totally agree. And yes, that caused problems sometimes.

    The idea of those who came before me was probably: "all involved applications are written by us, and we NEVER serialize/deserialize something wrong!".

  • (nodebb) in reply to Medinoc

    @Medinoc

    What irks me about this code is the lack of bounds checking. How big is the buffer? Who knows

    @jeremypnet

    The biggest WTF is definitely the lack of bounds checking

    Again, sadly true. Buffers in that application were handled in ... er ... peculiar ways ...

    You might see an example of that in a future WTF, if Remy publishes it ;)
