• (nodebb)

    Since cleaning data was mentioned... Back in 92-94 I was responsible for taking the printed (OK, SGML) products of a major tech publisher (one of the big two, and not ZD) and producing an electronic subscription product. The technology was pretty straightforward (though there is a tale or two to be told there), but the enormous aspect was determining what words meant, to see if they were talking about the same thing...

  • (nodebb)

    The code is a WTF, for sure. But the underlying issue is the existence of multiple systems where the data can reside. This can be names or any other object and field. It happens especially often in financial firms. They just love buying software. They buy a trading system, and an accounting system, and who knows what else. These third-party packages never do exactly what the client needs, usually not even close; the vendor sells the client N hours of consulting to "configure" the software. Then the fun starts: an army of staff and contractors move and massage data between all the systems. The result is always a mess. The cost is triple: 1) buy a software license plus recurring renewal, 2) then customize it to work for them, 3) then continuously ETL data around. Also, eventually people request reports, so a data warehouse is built with more data integration and coding and maintenance.

    If you so much as suggest building an in-house solution, you're laughed out of the room. "Do you really think you can develop a ______ system yourself? Hah!"

    🤦‍♂️

  • Vault_Dweller (unregistered)

    That commented-out loop makes me think that the original request, or the programmer's original interpretation of the request, was to only get the first name. The code was probably something like:

    For i = LBound(parts) To UBound(parts) temp = parts(i) Exit For Next i

    It explains why the current version is as is: it closely matches the logic of the original code

  • Vault_Dweller (unregistered)

    Not sure how to add new lines there, but you should be able to make out the gist of it.

  • ooOOooGa (unregistered) in reply to TheCPUWizard

    The technology was pretty straightforward (though there is a tale or two to be told there), but the enormous aspect was determining what words meant, to see if they were talking about the same thing...

    No. No, that is not the same thing. That makes my head hurt reading it.

  • Anon (unregistered)

    Working in machine learning, this hit way too close to home... Everyone gets all excited about the latest transformers derivative with a "clever" name, but the real value will forever be data science. So, so, so many people get into this job thinking they're going to develop clever models, but sorry, your job is cleaning and transforming data, and then MAYBE some model development, IF the data is good enough. (University ML courses really need to make this point a bit clearer.)

    This is part of the reason Google et al. dominate the industry. They have the data, and the resources to clean and sort the data. Nobody else can compete.

  • Fire Mountain (unregistered)

    "so what's the longest surname you can find that's a substring of a reasonably common first name"

    It doesn't even have to be all that complicated. I have a friend whose first and last names are the same. It came about as a result of the parents naming their kid after the mother's maiden name, the parents splitting, and the child being raised by the mother. When old enough, the child decided they didn't want their father's last name, so they had it legally changed to their mother's maiden name. The result is first and last name being the same.

    I'm sure it's not the only case...

  • (nodebb) in reply to Fire Mountain

    https://en.wikipedia.org/wiki/Boutros_Boutros-Ghali offers a different but related problem.

  • gnasher729 (unregistered) in reply to Fire Mountain

    “Three Men in a Boat” was written by…

    Jerome K. Jerome.

  • JB (unregistered)

    Fun related wiki article: https://en.wikipedia.org/wiki/List_of_people_with_reduplicated_names

    which includes this completely outrageous name: Leone Sextus Denys Oswolf Fraudatifilius Tollemache-Tollemache de Orellana Plantagenet Tollemache-Tollemache

    Good luck parsing that.

  • Foo AKA Fooo (unregistered) in reply to Fire Mountain

    https://en.wikipedia.org/wiki/Lang_Lang

  • Chris (unregistered)

    Funny timing. I just had a go at a national fast food chain whose website wouldn't allow my 2-word last name because it has a space in it (or any punctuation). Not that I was upset about it, but perplexed that it's an issue in 2021 (and not in some mom-and-pop store, with a website built by their nephew who is "good with computers").

    We can't assume a last name is only one word. Can we assume a first name is one word (even if hyphenated)? Not counting middle names. I can't think of an example off the top of my head, but I doubt it. Even if you could, you just know someone somewhere has entered a hyphenated or concatenated name as two separate words, even if it shouldn't be.

    So how do you parse a string containing the whole name, which has more than 2 words, to determine what the first name is? Obviously not this. I guess in the absence of more information, the best you could do is return the first word.
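
    A minimal sketch of that "return the first word" fallback, in Python. The function name `guess_first_name` is made up for illustration, and the result is only a guess: it is wrong for anyone whose first name is more than one word.

```python
def guess_first_name(full_name: str) -> str:
    """Best-effort guess: the first whitespace-separated word.

    Returns an empty string when the input has no words at all.
    This is a fallback, not a real name parser.
    """
    words = full_name.split()  # split() with no argument also trims and collapses whitespace
    return words[0] if words else ""


print(guess_first_name("Boutros Boutros-Ghali"))  # prints "Boutros"
print(guess_first_name("Jerome K. Jerome"))       # prints "Jerome"
```

    Note that it never looks at the last name, so a surname with spaces or punctuation cannot break it.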

  • ismo (unregistered) in reply to Fire Mountain

    If the first name is the same as the surname, then the method actually works: the result is still the same (the buggy algorithm just happens to produce the correct answer, which is also kinda sad).

  • Jinks (unregistered)

    Try living in a country that insists on everybody having two surnames when you only have one.

    e.g. Spain.

  • D J Hemming (unregistered)

    We had one old system break because it assumed surnames would never be more than 30 characters and one Iberian guy had something like "De Valadares y Seguenda Villaraya Garcia Marquez"
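
    A rough Python sketch of how that assumption fails. The 30-character limit is the one described above; the helper name `fits_legacy_field` is hypothetical, and the real system's check is not known.

```python
MAX_SURNAME_LEN = 30  # the limit the old system assumed (from the comment above)


def fits_legacy_field(value: str, limit: int = MAX_SURNAME_LEN) -> bool:
    """Return True if the value fits in a fixed-width field of `limit` characters."""
    return len(value) <= limit


surname = "De Valadares y Seguenda Villaraya Garcia Marquez"
if not fits_legacy_field(surname):
    # Rejecting or silently truncating here is exactly the kind of breakage described.
    print(f"{len(surname)} characters will not fit in {MAX_SURNAME_LEN}")
```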

  • RLB (unregistered) in reply to Chris

    Can we assume a first name is one word (even if hyphenated)?

    Nope. There's a guy in my chess club who goes by two first names, unhyphenated. And not just the first one on its own, either.

  • (nodebb) in reply to Fire Mountain

    Cheating slightly because he is fictional: Catch-22 has a character who was named Major Major Major and who was accidentally promoted to major, so he is Major Major Major Major.

  • (nodebb)

    Certainly some people are named "Mary Ann" as a first name in addition to a middle name.

    I think the correct answer is that names cannot be parsed or manipulated for display. Many modern tools ask for multiple versions of a person's name for different purposes. For instance, Slack asks for a "full name" and a "display name" (which they indicate could be a first name, or whatever you wish to be called).

    Of course, entity resolution is an entirely different situation which could call for considerable manipulation... but the results probably shouldn't be shown to the user.

    Addendum 2021-08-31 10:36: (meant that Mary Ann would also have a separate middle name)
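
    One way to sketch the "ask for both, parse neither" idea in Python. The class and field names here are assumptions for illustration, not any tool's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PersonName:
    """Two user-supplied strings; neither is ever derived from the other."""

    full_name: str     # stored exactly as the person entered it
    display_name: str  # whatever the person asked to be called

    def greeting(self) -> str:
        # Display code uses only the display name and never splits full_name.
        return f"Hello, {self.display_name}!"


person = PersonName(full_name="Jerome K. Jerome", display_name="Jerome")
print(person.greeting())  # prints "Hello, Jerome!"
```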

  • Sou Eu (unregistered)

    I brought my wife to the US from Brazil on a K-1 visa. She and our kids are Brazilian citizens (our children are also US citizens by virtue of my citizenship). Brazilians generally receive one surname from each parent (mother followed by father) and this may sometimes include prefixes ("Gomes dos Santos" for example). All Brazilian documentation for my wife and kids has a single given name and two surnames, while US documentation has given names and one surname. It makes international travel fun.

  • Barf4Eva (unregistered)

    Let me just say... writing a good name parser is a bitch. A real bitch. Wish google api would come up w/ something to meet the need. Anyone have any suggestions out there? Thanks.

  • ichbinkeinroboter (unregistered) in reply to Fire Mountain

    There'd be plenty in Scotland and Ireland... e.g. Alastair McAlastair, Connor O'Connor, Donald MacDonald....

  • (nodebb)

    The longest surname I can come up with that is a proper substring of a fairly common given name is "Hall", which can be found in names like "Challis", "Dashall", "Halle", "Khallil" and "Marshall". Can anyone match or beat four characters? Presumably, non-Roman alphabets are fair play.
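
    For anyone who wants to search for a better answer, here is a small Python sketch of the check, assuming you can supply your own lists of surnames and given names. The sample lists below are tiny and purely illustrative, and the function name is made up.

```python
def longest_embedded_surname(surnames, given_names):
    """Find the longest surname that is a proper substring of some given name.

    Comparison is case-insensitive. "Proper" means the surname must be
    shorter than the given name, so exact matches do not count.
    Returns a (surname, given_name) tuple, or None if nothing matches.
    """
    best = None
    for surname in surnames:
        s = surname.lower()
        for given in given_names:
            g = given.lower()
            if s != g and s in g:
                # Keep the first match found at each new maximum length.
                if best is None or len(surname) > len(best[0]):
                    best = (surname, given)
    return best


surnames = ["Hall", "Lang", "Ali"]
given_names = ["Marshall", "Halle", "Lang", "Alice"]
print(longest_embedded_surname(surnames, given_names))  # prints ('Hall', 'Marshall')
```

    Since `in` works on any Python string, non-Roman alphabets need no special handling beyond whatever normalization you choose to apply first.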

  • (nodebb) in reply to Jinks

    I've met a guy who had NO surname. Apparently an effect of being born without a legal father in his culture.

    So yes, parsing names is always impossible.

  • (nodebb)

    Slightly off topic but related: Password rules. I recently had trouble because my password didn't fulfill the "no spaces" rule. Only... It was my existing password and the rule was being checked on the login mask, not the register mask.
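
    A hedged Python sketch of where the rule arguably belongs: enforced when a password is created, never when an existing one is checked. The function names and the iteration count are assumptions for illustration, not anyone's real login code.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative value only


def _hash(password: str, salt: bytes) -> bytes:
    """Derive a digest from the password with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


def register(password: str):
    """Apply the composition rule here, where a new password is being chosen."""
    if " " in password:
        raise ValueError("password must not contain spaces")
    salt = os.urandom(16)
    return salt, _hash(password, salt)


def login(password: str, salt: bytes, digest: bytes) -> bool:
    """Only compare against the stored digest; no composition rules at login."""
    return hmac.compare_digest(_hash(password, salt), digest)


# A password stored before the "no spaces" rule existed still logs in.
old_salt = os.urandom(16)
old_digest = _hash("my old pass phrase", old_salt)
print(login("my old pass phrase", old_salt, old_digest))  # prints True
```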

  • eric bloedow (unregistered) in reply to Fire Mountain

    somehow this made me think of "Dr. Chandrasekharamphili", the full name of Dr Chandra from "2010". (that's in the novel)

Leave a comment on “By Any Other Name”
