- Feature Articles
- CodeSOD
-
Error'd
- Most Recent Articles
- Secret Horror
- Not Impossible
- Monkeys
- Killing Time
- Hypersensitive
- Infallabella
- Doubled Daniel
- It Figures
- Forums
-
Other Articles
- Random Article
- Other Series
- Alex's Soapbox
- Announcements
- Best of…
- Best of Email
- Best of the Sidebar
- Bring Your Own Code
- Coded Smorgasbord
- Mandatory Fun Day
- Off Topic
- Representative Line
- News Roundup
- Editor's Soapbox
- Software on the Rocks
- Souvenir Potpourri
- Sponsor Post
- Tales from the Interview
- The Daily WTF: Live
- Virtudyne
Admin
[frist] comment deleted
Admin
Anyone else looking at that
and thinking: "Lovely, magic numbers"?Admin
Why does any human have credentials to directly access the database?
Admin
I'll hold my hand up and admit that I didn't even know you could do regex in SQL where clauses.
And now I know that you can I shall immediately and deliberately make myself forget it again, just in case there's ever a situation in which I might be tempted to use it
Admin
The database administrator will need direct access to the database server and the databases hosted therein, but should have no access to the middle-ware code base, which is where that level of logic belongs. Also, any time the DBA logs into the database server should be logged in a tamper-evident way.
Admin
To be fair, that is often hard to avoid in SQL without either breaking normal form or spending half your query time comparing strings.
Admin
claimnum is a string, not some autoincremented number?
Admin
To fix things when our Robotic Overlords go on break and leave Artificial Stupids in charge.
Admin
The one mercieful thing about this world is the inability of the human mind to comprehend it.
Admin
That's actually perfectly normal, if you have a domain entity which has a property of an enum type, you want to store that enum as the backing integer, not as its text representation, for space/performance/ease of constraints reasons.
However if the table is queried by reports, etc., a second, calculated column can/should be created which converts the number to a string, for ease of querying outside of the code base (eg. analysts looking at data for whatever reason).
Admin
@Nutster - "The database administrator will need direct access to the database server and the databases hosted therein, " WHY? Give me a reason why all access can not be done through services with complete audit tracking (with autogenerated, installed and maintained passwords)....
Admin
The better solution to the DB access restriction problem rather than just giving access to only a limited group of DBAs who then become a bottleneck, is to have a DB schema/SP/view/etc. modification monitoring mechanism. If somebody makes a change, it gets reported to the stakeholders, and the user making the change has to justify the change (release X, emergency bug fix Y, etc.)
Admin
When it comes to production systems, including databases, but also VMs, etc., nobody's main login (say, primary AD account) should be allowed in. Instead, a temporary access grant system should be in place, preferably accessed with a second password, not the main AD password (like jsmith-admin account in addition to jsmith). Basically, I agree with auditing, temporary auto generated passwords, etc., but often people need to just query the data to troubleshoot an issue and you just need a SQL client like SSMS.
Admin
Depends on your database. MySQL, for example, has enums precisely for this case: https://dev.mysql.com/doc/refman/8.0/en/enum.html
Admin
@Mr. TA - We partially agree. The difference is in the implementation (I have been using the "Service only" approach with major clients for close to a decade). Even queries can e done by the service with full audit - If the person deos "Select Name, Salary from Personnel" they better be able to have a good reason for it, regardless of their role!). Given a series of identical environments from "Dev"->->->"Prod" much of the diagnosis should be doable in a pre-production environment (if it is not, then that should raise questions about how to address). Having the ability to move SANITIZED data from later stages back to previous stages.
Admin
"Folks who first learned to type on typewriters tend to prefer putting two spaces after a period."
Except the metric speaking world already switched to one space in the mid-to-late 1950s. I doubt there are many people left who were taught typing before that. TBH: I never even heard of two spaces after a period, until now. And I learned to touch-type in 1984 IIRC.
Admin
"Regexes are expensive at the best of times," -- Wrong! Actual regular expressions are fast, O(n) search. (Pattern compilation can be more expensive, but for constant patterns, like here, that's a one-time cost.)
Unfortunately, in the last few decades, most software systems have replaced regular expressions with some kind of generalized string-matching functions, confusingly using a similar syntax and the same name as regular expressions. They have more features, but at the cost of worse, often much worse, performance, even when those features are not used (like here).
A good library such as RE2 should have no problems with such kinds of things. (Of course, in this case the blame would go to the database server; AFAICS SQL doesn't mandate any non-regular features, esp. back-references, which would preclude using a pure RE library.)
For those who are interested (rather than shouting "two problems"), this (long) article covers the relevant stuff: https://swtch.com/~rsc/regexp/regexp1.html
Admin
Doing the RegEx doesn't strike me as the most expensive part of the new query. I'm assuming the Claims Number field had an index, or was part of a composite index. Uppercasing a Claims Number and transforming it with RegEx means any index on that field cannot be used anymore. The original query likely checked an index for matches with the user's input whereas the new query transforms the Claims Number into something that isn't indexed so the DB must scan the full table for matches. If I had to guess, that's the reason the query went from being fast to being... not fast.
Admin
Doing some regex in the where conditions won't be what killed this query's performance.
Doing transforms (expressions, function calls) on columns defeats any defined database column indexes and results in a seq scan.
However, it's possible (at least in PostgreSQL) to define an index on an expression, like this:
CREATE INDEX idx_claim_claimnum ON claim upper(REGEXP_REPLACE(claimnum,'( ){2,}',' '))
After this, the query should perform just fine. It's also better than defining an extra "indexed column" and updating that from the application or a trigger, as the DB can likely update an index more efficiently.
Admin
Yes with very sensitive data like that, it makes sense to restrict/audit even querying data, however this is not very likely to happen. The example you cited, getting your colleagues' salaries, is usually impossible because the company the programmer works for probably has a 3rd party HR system like Workday and the programmer in question does not maintain Workday.
We tried the data sanitizing at a couple of companies and it just doesn't work well. The cost of maintaining the sanitization scripts is enormous; they run for a long time, making it slow for a programmer to troubleshoot a production issue; and very often, you need to debug a problem which concerns a particular record, which has a name - eg. Initrode account has a problem - then you restore a sanitized DB and it's ALSKDJ instead of Initrode, now good luck finding the record and what the problem is.
Admin
I guess the obvious solution would have been to fix existing database entries, then enforce the single-space rule for the data entry and the query string whenever they're done in the future (or just do the regex call on the :1 query field).
That audit feature sounds great. You'll be able to see who destroyed your entire database! :P But seriously, good for tracking down "what changed?" and maybe a stepping stone to figuring out "why?" for the little stuff.
Admin
A similar, but more reasonable, example from my work: Data is stored with a date field. Customer wants report grouped by the day of the week. Problematic solution - put an EXTRACT('dow' FROM entry_date) into the GROUP BY section. It worked fine in the test data...
Working solution: Create a day_of_week field and put EXTRACT('dow' FROM new.entry_date) into an ON INSERT trigger.
Admin
The low-tech DevOps approach to this is to use a collaboration tool like Teams or Slack, create a channel for managing a specific database server, grant admins permissions to post in the channel, and configure a bot to run any posted script and post the responses back to the channel. It easy enough to set up that people shouldn't have any excuses to not do this.
Admin
And then somebody changes the data type used to represent this enum in the application, and all of your FALSEs turn into FILE_NOT_FOUNDs.
Seen it a lot with Java, especially back when you had to write your own enum translation code. I'm sure it happens in every other language.
Disk is cheap. If you can't afford to store a dozen bytes per enum value, you shouldn't be in business.
Admin
Converting an indexed lookup to a linear table scan (by virtue of a regex in a where clause) is "horribly expensive" for any non-trivial table size.
Admin
@Mr TA -- You must be correct, and I must not have been running a successful company with hundreds of happy clients for the past few decades. My apologies.
Admin
@Jamie, Yup, that is a simple approach, and one that can come in handy for many situations.
Admin
@kdgregory - Compromise solution... Have a table with the Enum strings, and then use the integers elsewhere. Query (And join) on the String. Can almost always beat the performance of "use string everywhere" and avoid the risk of "use just the number"...
Admin
Because at some point, the database will go down, and that’s the guy that’s going to fix it.
DBAs do a ton of background work that requires direct access.
Admin
I still type 2 spaces after a period. There, I just did it again. Even though spaces aren't supposed to be used to provide padding. Isn't that still taught as proper in writing basic essays, etc.?
Admin
Not when I was at school, but I went to school when you still had to write on paper. The arguments in favour never made sense, they tend to depend on either a) fonts where the ink part of a period is really huge, or b) proportional-fonted typewriters.
It's pointless using double regular spaces in most web contexts, though. They get fused into one. You have to use one of the other space characters, and might have to protect it with an entity reference. Which some markdown implementations can't understand.
Personally I wish keyboard layouts had curled quotes, guillemets, a variety of spaces, etc.
Admin
Why would it be? Who still types up their essays in a monospaced font without any kerning? That's why the convention got started in the first place: technical limitations in typewriters.
Admin
PostgreSQL has them, too, and they're proper types, not (like MySQL booleans) just reskinned ints.
https://www.postgresql.org/docs/current/datatype-enum.html
Admin
There's still a moderate argument for two spaces instead of one. Imagine a scenario where a period is present, but doesn't end a sentence. Isn't that the same reason it was done in the first place (on typewriters)?? I still prefer two spaces, because a single space makes things look too crowded.
Admin
What do you mean "somebody changes the type"? The gremlins? Ghosts? "If you can't afford to store a dozen bytes per enum value, you shouldn't be in business." - right there you demonstrated a terrible lack of understanding how databases work. It's not like storing movie files on a home desktop hard drive. More like, "if you can't stop yourself from arbitrarily changing enum variable types in code, you shouldn't be in programming."
Simple rule: enums use language enums in code, and backing integers in database, across all code, period. No changes, no problems.
Admin
Jeez, take it easy, mister successful entrepreneur. All I said was in my experience I see somewhat different circumstances. What's more important is that I think we agree on 80% of it.
Admin
That's actually the best way to do it, especially in localized situations. You can have the main table and labels table. The only caveat is making it clear to the team which one of these lookup tables are linked to enum types and are therefore hard coded and perhaps drive business logic, and which ones are user (or admin) configurable and are merely id-name pairs. Usually if there are no database querying requirements, I skip creating these enum lookup tables, specifically to avoid confusion between them and regular lookup tables.
Admin
@Mr. TA.... Fair enough.... I was reacting to your post of "it just doesn't work well".. You didn't say, "it didn't work well for us", you made an absolute statement meaning here is no condition where data sanitization does work well...
What gets me, are statements (this one was not made by you) like "DBAs do a ton of background work that requires direct access."... And I think "really????".... EVERY DBA I have seen uses a client program of some type to access the data (e.g. SSMS, OSQL, or any of hundreds of others). Anything that these standard (often supplied with the database) client programs can do can also be replicated in a custom client, there is no magic to those specific programs.... (I admit I would be quite impressed, and terrified if I found a DBA who used http://www.winhex.com/disk-editor.html for their "Ton of background work" - but even this program could be duplicated with full auditing and access control).
Can it be a lot of work? Sure, especially initially; change is disruptive and hard. But once mindsets have adapted and investments made (including designing from the ground up for these types of paradigms), the costs plummet and the value remain.
Admin
I was still forced to use 2 spaces after the end of a sentence when I was typing up things on PC word processors in the 1990s. It's ingrained in me nowadays. Old standards die hard I guess.
Admin
Oh, do enlighten me. It's been nearly 20 years since a DBA tried to 'splain to me!
Believe it or not, the world changes and so do application requirements.
Admin
Because they are DB developers. Otherwise, cause laziness.
Admin
I agree with this. Seems simple and elegant, so I'm not sure the backlash here.
Admin
What a joy! Indeed my local Python instance shows exponential growth in time.
Admin
I'm not a DBA. The requirements specify what the application, including code and database and whatnot, has to do. They do NOT specify how the application does it. If an enum property/ column is no longer needed, no problem, remove it. If an enum property needs to be a regular lookup value, awesome, change it. Both of these changes mean refactoring the code and making database changes. Under no circumstance would a member of a functional team change the enum property to a string willy nilly.
Key word, "functional".
Addendum 2021-07-13 19:38: Regarding database, you do know about pages, indexes, seeks and scans, etc? Right? If so, do you really think VARCHAR vs INT is just "a few bytes of storage, you broke loser"? Or is there more to it than that?
Admin
Yeah that sounds fishy to me. Most maintenance stuff needs to be scheduled, eg. in MSSQL as jobs. If a DBA is constantly tinkering, it's a recipe for mistakes to happen, and that's assuming a well meaning professional DBA.
Admin
This is why you leave that task to a typesetting system such as LaTeX: It can handle "double spaces" as extended spaces at "end of sentence" position (without requiring them in the source!), as well as "half spaces" in the correct place, i. e., like here, in abbreviations containing periods like "z. B." or "z. Hd.", in titles like "Prof. Dr. sc. techn.", and so on. It also handles NBSPs, spaces joining numbers and units like "128 kg", and, if you have to, intendedly used multiple spaces. All that doesn't work correct in regular word processors, nor does on the web, without requiring lots of manual work and reconfiguration. But those things are aspects of typography, so they belong to a specific realm where appropriate tools exist to handle them in the desired way.
Admin
“ What gets me, are statements (this one was not made by you) like "DBAs do a ton of background work that requires direct access."... And I think "really????"
I may have misunderstood. Of course we use SSMS. I believe the original statement was referring to using scripted front-end software, and limited access roles - not really an option for the DBA.
We have God Mode for a reason. All other users should generally have restricted security roles. By ‘background work’, I was referring to server configuration, indexing, query optimization, and table structure manipulation.
We also generally keep a ‘back door admin’ connection available, in the event of total resource lockup.
Admin
I do use scheduled jobs for large tasks that might disrupt the users, but it’s generally better to be there to watch, in case of blocking issues. :)
Ideally, no changes would be needed after the original design/build, but in real-world companies, we often aren’t brought in until the initial ‘make it work’ structure is reaching the limits of sanity.
Tech debt is our job security.
Admin
I learnt my touch typing in Papua New Guinea in the mid 70's. Always two spaces after a full-stop. One of these days I will redo the training....
Admin
"the middle-ware code base, which is where that level of logic belongs"
No. If you use your database as a dumb object store, you use it wrong.