Admin
FieldType: Text FieldName: T5ZA1 FieldNameAlt: T5ZA1 FieldFlags: 25165824 FieldJustification: Left FieldMinLength: 5 FieldMaxLength: 5 FieldDefault: Frist
Admin
The Boulder is conflicted over the use of a bitmask (25165824 = 0x1800000) in a PDF. Are we still in the era of saving bytes? Please reassure The Boulder that it was converted to separate columns when writing to the database . . .
Admin
I'd like to see one of those forms actually... Because if you look at, say, tax forms where I live, all the fields are numbered. And for this kind of stuff, skipping "O" particularly makes sense, as it eliminates the risk of users confusing it for zero.
Admin
Oh and I see an opportunity to be the frist one to mention growing fucks in those fields.
Admin
True . . . but then it seems obvious that "I" should be skipped as well.
Admin
The "flags" looks to come from the PDF spec. If I'm reading it right, the fields have the "Comb" (split into single-character boxes) and "DoNotScroll" (exactly what it says) flags set. Not really relevant information in this case, but it looks like Sally's code is just dumping the field definitions.
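For anyone who wants to check that reading, here is a minimal sketch of decoding the value. The bit positions are the 1-based ones from the field-flags tables in the PDF spec as I remember them, and the helper name `decode_field_flags` is made up for illustration; nothing here comes from Sally's actual code.

```python
# Field-flag bit positions (1-based) as listed in the PDF spec's
# field-flags tables; treat the table as an assumption to verify.
FIELD_FLAGS = {
    1: "ReadOnly",
    2: "Required",
    3: "NoExport",
    13: "Multiline",
    14: "Password",
    21: "FileSelect",
    23: "DoNotSpellCheck",
    24: "DoNotScroll",
    25: "Comb",
    26: "RichText",
}


def decode_field_flags(flags: int) -> list[str]:
    """Return the names of the flags set in `flags`, lowest bit first."""
    return [
        name
        for position, name in sorted(FIELD_FLAGS.items())
        if flags & (1 << (position - 1))  # position is 1-based
    ]


print(hex(25165824), decode_field_flags(25165824))
# prints: 0x1800000 ['DoNotScroll', 'Comb']
```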
Admin
This is not a WTF. We don't know the background, but I'm sure there is some documentation out there somewhere which explains exactly the meaning of each field.
And it makes sense because each field is easily identifiable. For instance, if you want to marry your grandmother's stepsister, some documentation may tell you "fill out field C3 to C5", and there will be no discussion what exact fields these could be and where to find them.
Admin
FWIW, I'm currently working in the aircraft industry. Some days it feels like there's a rule against naming anything sensibly. Everything is either numbered or has an absurd acronym. If you encountered "ARINC 424" without a context, would you have any clue that it's the file layout for a standard list of airports, runways, and approach points?
Admin
Your analysis is probably approximately correct, but it's worth noting that they could have separated the common prefix T5Z from the actual field names e.g. T5Z_C3, which would have made the end result more immediately readable. (In the long run, of course, we learn to "see" that name T5ZC3 as the two parts, but it sure does make it harder to read initially.)
Admin
That's a good point, but a PDF is one place where the author, if they so choose, can all but guarantee what the characters look like to the user. If you choose your styling so that an attentive reader can tell as many problem characters apart as possible without a side-by-side comparison (setting aside for a moment whether that's a good idea), then O is probably the hardest one to do that with. There are designs for most other potentially confusing characters that make them hard to mix up (consider Cascadia Mono or Ubuntu Mono, for example), but while a slashed or dotted 0 makes me pretty confident I'm not looking at an O, an O without any 0 near it is probably the character I'm least likely to be sure about.
Admin
When I look at it, removing the common prefix, I see Excel cell numbers. They're all Letter-Digit values. As far as Alt text, it's FieldNameAlt, so I'm guessing if no value is explicitly assigned, it just returns FieldName as the default, avoiding issues with null or empty values. The main WTF is they could have put a descriptive name in the FieldNameAlt and the cell reference in FieldName or vice versa.
Admin
Sounds like COBOL. COBOL was deliberately designed to be readable by non-technical managers, so to protect themselves against the dangerous situation where managers know what they're doing, COBOL devs take care to give their jobs, modules and variables names like DBQN-B4ZQ-SXDN or GXFJKQ or NKBLG-KDGB4
Admin
Once upon a time, IBM slashed the letter O and Teletypes slashed zeros.
Admin
I don't know why you're trying to make out efficient use of space as an outdated practice. Sure, we have gzip compression and hard drives that hold many gigabytes, but designing efficient file formats is not obsolete. The few bytes here and there stack up to what is a noticeable amount of saved space. You've clearly never seen the outworking of a laptop released in 2023 with a 128 gig soldered-in ssd (the result is that organisations buy them by the pallet load because they're cheap then get stuffed because everyone is out of storage by the time they've installed all the software they need).
Admin
Glad to see you volunteering to write the SQL query that has to select against the bitmask column.
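For what it's worth, the query itself is short. Below is a rough sketch using Python's built-in `sqlite3`, assuming a hypothetical `form_fields` table with an integer `flags` column; the table, the rows other than T5ZA1, and the helper `fields_with_flag` are all invented for illustration. The real cost is that a filter like this generally can't use an ordinary index on the column.

```python
import sqlite3

# Masks for the two flags discussed above (0-based shifts).
DO_NOT_SCROLL = 1 << 23
COMB = 1 << 24

# Hypothetical schema and sample rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE form_fields (name TEXT PRIMARY KEY, flags INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO form_fields (name, flags) VALUES (?, ?)",
    [("T5ZA1", 25165824), ("T5ZA2", DO_NOT_SCROLL), ("T5ZA3", 0)],
)


def fields_with_flag(connection: sqlite3.Connection, mask: int) -> list[str]:
    """Return the names of fields that have every bit of `mask` set."""
    rows = connection.execute(
        "SELECT name FROM form_fields WHERE (flags & ?) = ? ORDER BY name",
        (mask, mask),
    )
    return [row[0] for row in rows]


print(fields_with_flag(conn, COMB))           # ['T5ZA1']
print(fields_with_flag(conn, DO_NOT_SCROLL))  # ['T5ZA1', 'T5ZA2']
```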
Admin
@dpm You're misunderstanding. The bitmask is part of the field definition in the PDF form. It's rendering & validation instructions to the PDF app. It is not part of the user input data to be stored in the database.
Admin
Don't forget S/5
Admin
Just for the record: While rather obscure and kind of unfriendly, PDF forms actually have a standard for sending data directly (much like web forms), so there really isn't any need to scrape the PDF.
(So the described procedures are really much like: (1) user fills in web form (2) user makes screenshot (3) user submits screenshot to backend (4) OCR and scrape the screenshot, just with PDF.)
Admin
When I started at my current employer, we used data acquisition software configured by another company to my predecessor's specs. He had everything listed in an Excel sheet, so they just went ahead and named everything after the cell addresses. We had formulas like 2*R2+e^Q4 and it was a complete pain to debug this stuff.