• Prime Mover (unregistered)

    This could have been my posting from 1989. I went over the map with a device that output lat and long when I clicked its button; the student entered the points into his C program using exactly this hardcoding technique. I let him get on with it because it wasn't my hill to die on.

  • MIKE (unregistered)

    Having the data in compiled form can make sense. In assembler, and consequently in plain C, having a data structure that is the binary image in memory can be useful in some circumstances, especially if your target is a low-performance microcontroller.

    In C++, one could write a program to read the data, load it into memory, and then serialize the object to a file, a bit like what's possible in Java.

    That way it would be possible to update the data points simply by changing the serialized objects.
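
    A minimal sketch of the plain-C case, assuming a GpsPt layout like the article's (coordinate values invented); on a microcontroller, const data like this typically lands in flash and costs no RAM:

        #include <stddef.h>  /* size_t */

        typedef struct {
            double lat;
            double lon;
        } GpsPt;

        /* The generated table *is* the binary image in memory: no parsing,
         * no allocation, just a pointer into read-only storage. */
        static const GpsPt kPoints[] = {
            { 40.7127753, -74.0059728 },
            { 51.5073509,  -0.1277583 },
        };

        static const size_t kPointCount = sizeof(kPoints) / sizeof(kPoints[0]);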

  • Debra (unregistered)

    Too little context to say whether this is a WTF or not, sorry.

  • (nodebb) in reply to Debra

    I agree. The datasets could easily be the sort of thing that never changes, depending on what the numbers are latitudes and longitudes of.

  • Tim Ward (unregistered)

    The worry would be the same one you always have with data-driven, machine-generated high-level-language code: at what point are you going to hit some magic compiler limit and blow the whole approach out of the water?

    That would be reason enough for me not to do it.

  • guest (unregistered) in reply to MIKE

    So why not dump it as a binary blob which is actually accessible from both languages, easier to manage and doesn't require a compiler?
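
    For illustration, a rough sketch of that binary-blob route in C++; the GpsPt name is borrowed from the article, and this assumes a padding-free struct layout and matching endianness on both sides (error handling omitted):

        #include <cstdio>
        #include <vector>

        struct GpsPt { double lat, lon; };  // plain-old-data, same layout for writer and reader

        // Dump the raw bytes once, from whatever tool produced the data...
        void dump(const std::vector<GpsPt>& pts) {
            std::FILE* f = std::fopen("points.bin", "wb");
            std::fwrite(pts.data(), sizeof(GpsPt), pts.size(), f);
            std::fclose(f);
        }

        // ...and load them back anywhere, no compiler required.
        std::vector<GpsPt> load() {
            std::FILE* f = std::fopen("points.bin", "rb");
            std::fseek(f, 0, SEEK_END);
            long bytes = std::ftell(f);
            std::fseek(f, 0, SEEK_SET);
            std::vector<GpsPt> pts(bytes / sizeof(GpsPt));
            std::fread(pts.data(), sizeof(GpsPt), pts.size(), f);
            std::fclose(f);
            return pts;
        }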

  • (nodebb)

    If the data is not subject to external change, then this can be a PREFERRED method. As far as "your compilation will be much slower, encouraging developers to think more carefully about their code before they hit that compile button" goes, that is largely a fallacy with all of the modern tools out there. For example, the compiled geographic data could all be in a NuGet package, so the application developer NEVER compiles it; it is only compiled by a CI/CD pipeline in that "other" environment and then published.

    Personally, I would be very happy to NEVER see another JSON file unless it actually represented a JavaScript object that has been serialized and will be desterilized (sic - love this auto correct) back into JavaScript.

  • Yikes (unregistered)

    New definition of microservice: an app that can only serve a very small fraction of what it could if it weren't hardcoded.

  • Jonathan (unregistered)

    It's not what's happening here, but this seems to be the standard in the lower-end Arduino-level examples I see. The advantages are that you don't need a file system or libraries to parse strings into numbers, which matters on super-small systems.

  • John Melville (unregistered)

    To me, this looks suspiciously like an "I'm just going to write a one-off program" that somehow became permanent.

  • Koro (unregistered)

    This could actually be good if they had not used a std::vector, which basically generates a gigantic initializer that runs at startup and allocates a bunch of memory. It should have been a regular array.
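
    Roughly the difference being pointed out, with invented values: the first form runs a dynamic initializer at startup and allocates; the second is just bytes in the read-only data segment:

        #include <vector>

        // Generated code as described: a dynamic initializer runs before main(),
        // copying every element into heap memory owned by the vector.
        const std::vector<double> points_vec = { 40.7127753, -74.0059728 /* ... */ };

        // The alternative: a plain array is emitted directly into .rodata,
        // with no startup cost and no allocation.
        const double points_arr[] = { 40.7127753, -74.0059728 /* ... */ };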

  • Argle (unregistered)

    I'm going to make a case for this paradigm. I wrote a game for Second Life years back. Its first incarnation had no website access and I had to rely on the scripts alone. Those scripts had limited size. I needed a full Scrabble dictionary in the game. It could be put inside a single "notecard" in the application, but lookup (due to technical limitations) could take minutes per word. On the other hand, write a program (Python, C#... APL if you like) that reads the dictionary and applies a simple hash algorithm to break it into 32 parts, which become 32 scripts with near-instant lookup of a word, and you're all set. Steve_the_cynic makes a good point and it applied to what I did: the dictionary didn't change.

    "We do what we must because we can." -- GlaDOS

  • WatersOfOblivion (unregistered)

    Agree that this is not necessarily a WTF.

    If it were a statically-allocated constant array instead of a std::vector, using fixed precision (i.e., integer values), stored unpacked instead of as a GpsPt, and any loops had constant bounds derived from the data, a compiler could probably do a lot of constant propagation/folding and optimize away a lot of work at compile time.
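
    A sketch of that shape, with invented values; scaling degrees by 1e7 stores them as exact integers, and the static_assert demonstrates that the compiler can evaluate the whole loop at compile time (requires C++14):

        #include <cstdint>

        // Fixed precision: degrees scaled by 1e7, stored unpacked as parallel arrays.
        constexpr int32_t kLatE7[] = { 407127753, 515073509 };
        constexpr int32_t kLonE7[] = { -740059728, -1277583 };
        constexpr int kCount = sizeof(kLatE7) / sizeof(kLatE7[0]);

        // With constant data and constant bounds, the compiler is free to
        // fold this entire loop away.
        constexpr int64_t sumLatE7() {
            int64_t sum = 0;
            for (int i = 0; i < kCount; ++i) sum += kLatE7[i];
            return sum;
        }
        static_assert(sumLatE7() == 922201262, "computed entirely at compile time");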

  • Luweewu (unregistered)

    As others mentioned, if the data set never changes, then it makes sense to compile it in.

    Another case for compiling it in is when you're just getting a project off the ground and don't yet have the file reading/saving code in place, but you just want to see something on screen. Such was the case for me back in 1987, when I was working on a new graphics program for the company I was at. We hardcoded the points and other data needed to draw a floor plan on the screen and got the drawing code working; once that worked, we wrote the file saving and reading code and tossed out the hardcoded data. (Of course, nowadays a lot of that kind of stuff is available in libraries, so there's less need for temporary data in code.)

    It's only a WTF if the hardcoded data becomes a permanent part of the code and it's data that's supposed to be able to change.

  • Loren Pechtel (unregistered)

    Other than the use of the wrong datatype, this could make a lot of sense if the target is an embedded system and the data is rarely updated.

    I've actually done a very limited version of this once: I needed a few thousand data items for a geometric model that could be calculated, but the calculation was expensive at runtime. I didn't actually output a whole C# program, just C#-formatted data elements that I then cut and pasted into the code.

  • Tach (unregistered)

    I had one such hardcoded array that would initialize a vector (generated by some code generator, of course). MSVC was so smart that it inlined the copying of each number into the global initializer instead of putting the numbers in .rdata and doing a memcpy. That meant that for every 2-byte number I had, it would generate 13 bytes of x86 that would set the data in the vector. That blew up the size of the binary. I'm still not sure whether I should blame MSVC or the code generator.

  • (nodebb)

    I definitely remember there being some sort of binary format available in Python specifically to avoid CSV- and JSON-related problems.

    Also, to support Steve, I also saw a library that used a dataset during its compilation. Though that was indeed some sort of etalon/reference data, rarely if ever updated between releases of said library.

  • Yikes (unregistered) in reply to Loren Pechtel

    It may not be the wrong datatype if you don't know the total number of points when you start writing them out, because std::array requires knowing the total length up front, whereas std::vector doesn't.

  • Yikes (unregistered) in reply to Yikes

    Oh, I take that back; you could always use a braced initializer list with a basic array - so long as you don't mind using sizeof() instead of STL operations like .size(), .begin(), .end().
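
    Worth noting: since C++17 the free functions std::size, std::begin and std::end work on built-in arrays too, so little is lost. A small sketch with invented values:

        #include <cstddef>
        #include <iterator>  // std::size, std::begin, std::end

        const double points[] = { 40.7127753, -74.0059728 };

        // Pre-C++17 idiom:
        const std::size_t n1 = sizeof(points) / sizeof(points[0]);

        // C++17: the free function works on built-in arrays as well;
        // std::begin(points) / std::end(points) likewise feed any STL algorithm.
        const std::size_t n2 = std::size(points);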

  • Brian Boorman (google)

    I use a spreadsheet to generate monochrome LCD graphics data for an embedded system: a 128x64 matrix of cells; put an X in a cell, and conditional formatting turns X cells black and non-X cells green. Another tab turns those into 0s and 1s, which are then formatted into rows of C array values that get copy/pasted into a .h file.
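
    Hypothetically, the generated header could look something like this (file name, array name and byte values are all invented):

        /* logo.h -- generated from logo.xlsx; do not edit by hand.
         * 128x64 monochrome bitmap, one byte per 8 vertical pixels,
         * laid out the way the LCD controller expects. */
        static const unsigned char logo_bitmap[128 * 64 / 8] = {
            0x00, 0x7E, 0x42, 0x42, 0x7E, 0x00, 0xFF, 0x81,
            /* ... remaining 1016 bytes ... */
        };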

  • WTFGuy (unregistered)

    @Brian. That description reminds me of this, and not in a good way. ;)

    https://xkcd.com/763/

  • A.C. (unregistered)

    The real WTF is storing Lat/Long degrees to 10 decimal places. One degree of latitude is roughly 111 km, so 1e-10 of a degree pins down a location to the nearest 0.01 millimetres or less!

  • Ollie Jones (unregistered)

    Dang. Next step: switch to C# and use the built-in compiler-as-a-service stuff (Roslyn) to recompile your data-code when you get new data. After all, you have cycles to burn on those VMs you rent from some server-rental (cloud) company.

  • (nodebb)

    There are some tables of data that really don't change very much, and which are appropriate to burn into code. Some even get to go into hardware (that is how your CPU implements trig functions). Latitudes and longitudes are unlikely to be good candidates for that, especially if you're using doubles; someone's got to account for continental drift after all!

  • v864 (unregistered)

    First time commenter here, we (embedded developers) do shit like this ALL the time. As a C/C++ dev I will go to great lengths to generate stuff into code. Data? Straight to C tables. XML? Custom binary format, rendered to a byte array in C file, straight into the compiler. JSON schema? Looks like an IDL to me, straight to code! Among my team we consider this good form, the compiler works for us god damnit.
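
    In the spirit of the "JSON schema as IDL" point, the generated output might look something like this (a wholly invented example, not any real generator's output):

        /* device_config.h -- generated from device_config.schema.json.
         * The schema's types and defaults become plain C, checked by the
         * compiler instead of parsed at runtime. */
        typedef struct {
            unsigned baud_rate;     /* schema: integer, default 115200 */
            unsigned char channel;  /* schema: integer, 0..15 */
            int telemetry_enabled;  /* schema: boolean */
        } DeviceConfig;

        static const DeviceConfig kDefaultConfig = { 115200, 3, 1 };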

  • T. E. (unregistered)

    What I see here is a programmer taking some degree of ownership over those numbers: the numbers will be under source control; a change in the numbers can trigger the reviews or testing or whatever a programming change usually triggers; a disastrous change in the numbers has a chance of showing up during regular working hours.

    In different environments, these rationales could be well justified or nonsensical.

  • Officer Johnny Holzkopf (unregistered) in reply to Brian Boorman

    Wouldn't it be easier to use a text editor instead, maybe with a 128x64 template of spaces, so you can easily switch between overwrite (standard) and insert (if needed)? Use ' ' and '#' (because a '#' probably looks better than an 'X', though it doesn't actually matter), and then simply pipe through ... | sed 's/ /0/g; s/#/1/g' (or awk, including the declaration elements needed) to create the .h source file. It would probably be much more comfortable to "draw" with this approach, and by scaling your terminal window you could even see how it looks on smaller dot matrix displays. Embed all that in your Makefile, and every "image update" is just a case of executing "make". Maybe also use a text editor with horizontal and vertical "rulers", so creating "graphics" and custom symbols becomes quite easy, since you see your exact position (via the rulers, as well as the editor's cursor position display). Note: I'm saying all this because that's how I successfully dealt with a comparable task decades ago, for a very non-standard application where everything had to be created from scratch.

  • Brian Boorman (unregistered) in reply to WTFGuy

    If you have a better way to make an interactive drawing -> visual graphics -> C code workflow that can be coded in under an hour, I'm all ears. I'd much rather get the job done than spend 10~20x longer just to write a one-off program.

  • Brian Boorman (unregistered) in reply to Officer Johnny Holzkopf

    I don't think that would be easier. Pixels in the controller chip (that's on the LCD glass) have their bits laid out vertically, so something has to calculate the hex values. The Excel approach with conditional formatting allows a screen capture of what the graphics will actually look like on the display, which is good for management review. Like I said to WTFGuy, it was quick and served every purpose. Store the Excel file in the docs folder in the source repo for the next guy. Your approach would require additional tool dependencies (Cygwin, sed, etc.) added onto a GUI-based, Windows-only tool chain. That, and we've slightly altered the screens exactly one time in the last 15 years.

  • StackieMonster (unregistered)

    TRWTF is using a vector for data whose size is known in advance. So many optimisation opportunities wasted. Throw that data into a static const array instead.

    And for heaven's sake, have the codegen run as part of the build process rather than relying on rerunning it manually. That path has too many dragons.
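
    A bare-bones sketch of what such a build-time generator could look like, with placeholder file names; a makefile rule would run it before compilation:

        // gen_points.cpp -- reads "lat,lon" lines and emits a C++ header.
        // A build rule would run this before compiling, e.g.:
        //   gen_points < points.csv > points.gen.h
        #include <iostream>

        int main() {
            std::cout.precision(12);  // keep full coordinate precision
            std::cout << "// generated file -- do not edit\n"
                      << "static const double kPoints[][2] = {\n";
            double lat, lon;
            char comma;
            while (std::cin >> lat >> comma >> lon)
                std::cout << "    { " << lat << ", " << lon << " },\n";
            std::cout << "};\n";
            return 0;
        }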

  • CdrJameson (unregistered)

    This is a perfectly valid approach, given some tweaks: e.g. a const C array would be better than a vector, maybe with some extern declarations so the data is only ever compiled once. We used to use this system on the Game Boy; I can't even remember if it had a file system. Anyway, removing the complication of the file system for data that rarely changes is a good thing. It's a whole load of potential errors (badly formatted files, files not found, etc.) that can be completely ignored. Keeping it simple!
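
    The extern tweak in sketch form (file names and values assumed); every other translation unit includes only the header, so the data itself is compiled exactly once:

        // points.h -- every translation unit sees only these declarations.
        #include <cstddef>
        extern const double kPoints[][2];
        extern const std::size_t kPointCount;

        // points.gen.cpp -- the generated data, compiled exactly once.
        #include "points.h"  // the extern declarations give these external linkage
        const double kPoints[][2] = { { 40.7127753, -74.0059728 } /* ... */ };
        const std::size_t kPointCount = sizeof(kPoints) / sizeof(kPoints[0]);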
