by dmundhra on 12/15/21, 8:07 AM with 78 comments
The common way I know is that access is given to a small number of trusted people who make the updates. Are there any other ways or best practices, so that this small set of people doesn't become a bottleneck, apart from the potential risk of abuse of trust?
by quilombodigital on 12/15/21, 1:12 PM
by aggrrrh on 12/15/21, 10:02 AM
In my opinion such queries should be subject to an established development process:
1. Commit your query to a repository
2. Send it to code review
3. After a successful review and merge, it should be deployed automatically by CI/CD
It may be necessary to run a query directly via the console in some cases, though. But such a query should be committed to the repository later anyway.
And of course you should use proper tools, as other comments suggest.
by cabraca on 12/15/21, 9:23 AM
by ungawatkt on 12/15/21, 10:23 AM
- reproduce the issue and fix locally, with prod data copied over if needed (better than guessing)
- save (back up) all data that is being changed somewhere, even if it's on your local machine temporarily (again, better than not having it)
- if your datastore has transactions, run the change and a confirmation query in a transaction/rollback several times before committing (a sketch follows this comment)
- don't hide what you're doing, tell everyone who will listen that you're mucking about in prod
- if someone will hop on a screen share with you, let them
Main theme here, when making data changes, is test as much as you can and make sure folks know it's happening. If you're making data changes, the code has already failed, so don't feel bad doing what you need to do, but mitigate loss as much as you can. Shit happens, deal with it, but don't make it worse if you can.
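To make the transaction point concrete, here's a minimal sketch of the dry-run pattern (Postgres syntax; the table and column names are made up):

BEGIN;

UPDATE orders
SET status = 'refunded'
WHERE status = 'refund_pending'
  AND created_at < '2021-12-01';

-- confirmation query: eyeball the counts before deciding
SELECT status, count(*) FROM orders GROUP BY status;

ROLLBACK; -- rehearse with ROLLBACK as many times as needed; switch to COMMIT only once satisfied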
by KptMarchewa on 12/15/21, 12:40 PM
In the other case, you write a small job called fixtool, which goes through the normal code review process, then gets deployed once, runs, and gets deleted after it fixes the situation.
by gerardnico on 12/15/21, 8:57 AM
You can also use a specialized tool. https://datacadamia.com/data/database/migration#tool
Make a branch, test your code and deploy it.
by gwbas1c on 12/15/21, 2:09 PM
Is this about processes as a small team grows into a larger company? At a certain point your day-to-day software engineers will lose access to production systems. You will need to take away "ops" from the software engineers and have a dedicated "ops" team. Some places call the "ops" team "devops" for political reasons, especially if some founder has a chip on their shoulder about not having dedicated "ops." (Software engineers not touching production systems is an industry-standard security practice.)
Is this about how to do the database migration safely? That really depends on your stack, business type, and scalability needs. Assuming you aren't running a hyper-scale product, and you're on a normal database, the easiest thing to do is to have "planned downtime" at a time when your load is lowest. Your business logic layer should return some kind of 5xx error during this period, and your clients / UI should be smart enough to retry after 1-2 minutes. If it's a minor update (plenty of good advice in this thread), the downtime should only be 1-2 minutes tops. (The only reason to plan and notify your customers is in case someone is trying to do something business critical and complains.) One thing you can do is "let it be known" that you have 5 minutes of planned downtime Monday night and Thursday night, and that's your window for anything that isn't an emergency.
Is this about the frequency of updates? This is a quality problem, and the only thing you can do is improve your testing and release process to catch these bugs sooner. This is the "growing up" that all small tech companies go through. As you grow, make sure to bring in some people who are mid-late career who've been through these changes. In short, you will need to introduce processes that catch bugs sooner, like automated testing and code coverage. You may find that your "test engineers" write nearly as much test code as your software engineers put into the shipping product.
by unklefolk on 12/15/21, 1:05 PM
by pingsl on 12/15/21, 12:07 PM
1. Please don’t issue I/U/D SQL statements directly. You’d better write some programs acting as the “revert transactions” to do the data modification. In this way, you don’t need to grant I/U/D privileges on the production tables to any user IDs. Instead, you give access to those programs. This helps remove human error as much as possible, and the behavior of the programs is more predictable and consistent.
2. You should have a fixed change window for these kinds of data changes. You should not execute those “revert transactions” whenever you want.
3. Then you give the execution access of those programs to the people who need to do the work only during the change window. That is you grant the execute access to the user ID beforehand and revoke the access afterward. Since this is grant/revoke between user ID and programs, it’s much safer. If you have to grant/revoke between user ID and tables, there might be cascade effects.
4. Before the change, capture the “before” data and get ready the fallback script.
5. Don’t forget to turn on the audit options/tools during the change window.
6. If you guys work in a physical office, you can think about binding those revert transactions to a dedicated desktop.
I know these rules are complicated and tedious, but they could protect the team and the data as well :)
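To illustrate points 1 and 3, here's a rough Postgres-flavored sketch of the grant/revoke around a change window (the function and user names are hypothetical):

-- the reviewed fix lives in a program, not in ad-hoc I/U/D statements
REVOKE EXECUTE ON FUNCTION fix_orphan_addresses() FROM PUBLIC;

-- start of the change window:
GRANT EXECUTE ON FUNCTION fix_orphan_addresses() TO oncall_user;

-- the operator only runs the pre-approved program:
SELECT fix_orphan_addresses();

-- end of the change window:
REVOKE EXECUTE ON FUNCTION fix_orphan_addresses() FROM oncall_user;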
by baash05 on 12/15/21, 8:55 AM
by jacobwilliamroy on 12/15/21, 10:08 AM
by jkingsbery on 12/15/21, 3:02 PM
I think this was a common way of doing it 10+ years ago, but what I've seen work the best is to have your CI/CD process apply the updates. Assuming the migration scripts follow the same process as all your other code (code reviewed, tested, and are runnable), automatically applying these scripts is much less error prone than having a person apply them. Humans are pretty terrible at following a script, which is one of the things computers are great at.
Some objections I've seen to relying on automatic deployments:
- "But we should have a person run them to make sure they do they right thing!" Agreed, you should, but that should happen at test time, or maybe code review time, not right before you are about to apply something into production the Shift Left [1] principle applies to database changes as well as code changes
- "What if something goes wrong?" Well, what if something goes wrong with any of your deployments? Usually, you can roll back (either literally rolling back, or rolling forward, or rolling in some other direction that leaves your system in an ok state), look at the logs to see what went wrong, and try again.
- "But data can get messed up!" Data can get messed up by code changes. You should have a backup/disaster recovery process.
by nickjj on 12/15/21, 1:01 PM
In development I create the migration which ends up being a file that alters the schema. I run the migration command and make sure things work as planned and all tests still pass.
Then I push up the feature branch and it gets code reviewed. This review could either be by yourself or a team member depending on team size.
After everything passes on CI it gets merged and deployed. At this point it depends on how you deploy things, but that could be to a staging environment where it runs first, which gives you extra confidence that it works before rolling it out to production, or you could go straight to production if you don't have a staging environment.
As for how it gets run, usually it happens after the new version of your code is available to be pulled but before your web app gets restarted. This way it can run while the old version of your app is running; then, once it completes, you can restart your web app to pick up the new code base that was part of the deploy.
If you're careful about how you migrate things you can do a lot of migrations without hard downtime, but it's usually a balancing act between how complicated you want the migration process to be (multiple steps / deploys) vs how much downtime you can live with (1 step / deploy).
Basically migrations are code that get treated like any other code deploy with a proper review. If it's an especially important migration it wouldn't hurt to get it reviewed by more than 1 person and also perform a production DB backup right before you deploy the migration.
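As a hypothetical example of the multi-step trade-off mentioned above, a column change can be split across two deploys so neither one needs downtime:

-- deploy 1: runs while the old code is still live (nullable, so nothing breaks)
ALTER TABLE users ADD COLUMN email_normalized text;
UPDATE users SET email_normalized = lower(email);  -- backfill; batch this on large tables

-- deploy 2: only after every app server runs code that writes the new column
ALTER TABLE users ALTER COLUMN email_normalized SET NOT NULL;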
by lucastech on 12/15/21, 8:27 PM
The steps I took were:
1. backup production before making any changes
2. import the production backup locally
3. test the script locally, QA the result to ensure desired outcome
4. repeat 2-3 until you feel confident that you're able to consistently get what you need from the script
5. wait until a low volume time for your platform (if possible), run a backup, import and test 1 more time locally, apply the change in production
However, after doing this, a coworker noticed something we had missed when QAing the data. As a result I had to revert the changes; I had a rollback query ready to restore the modified records.
As part of the planning for this, I added a new column to the table in question to track if/when a fix was made to the record. This allowed me to identify the records which were modified to roll them back. While this isn't always practical, it made sense long term for our situation.
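Roughly, the tracking-column idea looks like this (the table, columns, and backup table are invented for the example):

ALTER TABLE invoices ADD COLUMN fixed_at timestamptz;

-- the fix marks every row it touches
UPDATE invoices
SET total = subtotal + tax,
    fixed_at = now()
WHERE total <> subtotal + tax;

-- if the fix turns out to be wrong, the marked rows are easy to find and
-- restore from the pre-change backup
UPDATE invoices i
SET total = b.total
FROM invoices_backup b
WHERE i.id = b.id
  AND i.fixed_at IS NOT NULL;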
A secondary option I've used in the past is exporting the table to be updated and then writing a simple script with the modifications necessary to generate the SQL to update each record individually. I've then used the same script and export to build a revert update query as well just in case something unexpected comes up.
At the end of the day, make sure you're backing up before any changes, plan ahead in case of unexpected errors so that you can revert quickly to avoid downtime instead of scrambling when it goes wrong. Test it a few times locally (change + revert) to try and sort out the issues prior to applying them in production.
edit: formatting
by cable2600 on 12/15/21, 9:14 AM
by zerocount on 12/15/21, 9:57 AM
I only had 2 of these in the 5 years I worked there, but here's an example. We had an internal purchase request system used for purchasing a keyboard or whatever you needed for work. Of course the purchase request went through a chain of approvals starting with your manager and ending with the CTO. The CTO threw a fit about having to approve keyboards and other small items, so it was deemed an 'emergency' to fix it right away. I had to immediately patch the code so he wouldn't see trivial requests. The 'fire call' allowed me to submit the code directly into production without going through the change control procedures, which only happened once per week.
And you better be damn sure that your changes are correct, crap rolls down hill very quickly when it involves very senior people.
by xchaotic on 12/15/21, 9:09 AM
by lmilcin on 12/15/21, 2:13 PM
What you do is write any operations that are to be run on production as code (for example, a service that can be triggered).
You then get the code reviewed (second pair of eyes), after which it is automatically compiled and delivered to pre-production. On pre-production you have a chance to test it against the full environment to see if it breaks anything. If the test results are accepted, it can be promoted to prod.
We actually have special service that is only meant to run one-off database operations. It is a webservice where you can trigger jobs.
Jobs do not take any parameters at all -- you just give the name of the job and that's it. This allows the entire operation to be reviewed in full.
Jobs can't run in parallel -- there is a queue and you can schedule a job to run. The job can also be saving progress (and it will restore from the saved progress in case it fails or is cancelled mid-operation).
by tailspin2019 on 12/16/21, 1:37 AM
https://octopus.com/blog/database-deployment-automation-adho...
by zv on 12/15/21, 9:01 AM
Either way, you need to designate someone knowledgeable to oversee the process. Automated deployment just makes this process way faster (thus "reducing the bottleneck"), where this person only needs to review the code to check that nothing suspicious is in there and approve the deployment.
Manual deployments are prone to human error, especially under stress/time pressure. If the manual deployment "package" (a set of SQL scripts) is poorly written, there's a huge incentive to "contact directly", which again can lead to manual errors.
The biggest drawback is culture, which is the hardest thing to change ("we've done it this way for n years and we don't want to change").
by tristor on 12/15/21, 9:30 AM
In all cases though you should have a change management process where the schema change is documented, reviewed, and approved and only a small number of people have the necessary access to run these types of changes in production. Change management is your friend.
1: https://www.percona.com/doc/percona-toolkit/3.0/pt-online-sc...
2: https://www.percona.com/software/database-tools/percona-tool...
by PeterisP on 12/15/21, 11:25 AM
by brightball on 12/15/21, 1:07 PM
Migrations are a good place to store code for an update. If it’s an update that will take a while I might use the migration to enqueue a job.
by darkr on 12/15/21, 11:37 AM
I’m a big fan of sqitch[1] but many migration tools will handle data updates as well as DDL.
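Without getting into sqitch specifics, the general shape is that a data change ships as a deploy script plus a matching revert script, both plain SQL (hypothetical example):

-- deploy: normalize legacy country codes
BEGIN;
UPDATE customers SET country = 'GB' WHERE country = 'UK';
COMMIT;

-- revert: restore the previous values (assumes no rows were already 'GB' before the deploy)
BEGIN;
UPDATE customers SET country = 'UK' WHERE country = 'GB';
COMMIT;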
by winrid on 12/15/21, 2:31 PM
A former job eventually had a request system - you would request access for your user for 24 hours.
At the job before that, only managers had prod db access. Do not recommend if you can avoid it.
by JensRantil on 12/16/21, 10:44 AM
[1] https://kevin.burke.dev/kevin/reddits-database-has-two-table...
by thiht on 12/15/21, 11:05 PM
Whenever I need to patch data on a Postgres, the first thing I type is `begin;` to start my transaction. After that I just run my updates safely and check that the impacted data matches what I expected. Then I just need to commit or rollback the changes.
2 things to do before that:
- have the queries reviewed by members of the team knowledgeable about the db
- try the query on a dev environment (ideally a preprod with a replica of the data)
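For example (hypothetical table; a RETURNING clause makes the check easier):

begin;

update accounts
set plan = 'pro'
where id in (42, 43)
returning id, plan;  -- shows exactly which rows changed and their new values

-- if the returned rows (and the row count) match expectations:
commit;
-- otherwise:
-- rollback;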
by rognjen on 12/15/21, 3:14 PM
If you mean one-off changes to existing data what has served me well so far is implementing an audit log from the start coupled with making sure that records are fudgeable by a select few admins.
This means that a few people can make changes to nearly everything from the interface rather than having to log into the DB directly. At the same time there is a record of changes done and the ability to comment on and revert them.
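For the database-level variant, a minimal Postgres sketch of an audit table plus trigger could look like this (all names are made up; an application-level audit log works just as well):

CREATE TABLE audit_log (
    id          bigserial PRIMARY KEY,
    table_name  text NOT NULL,
    row_id      bigint NOT NULL,
    changed_by  text NOT NULL DEFAULT current_user,
    changed_at  timestamptz NOT NULL DEFAULT now(),
    old_row     jsonb,
    new_row     jsonb
);

CREATE FUNCTION audit_customers() RETURNS trigger AS $$
BEGIN
    INSERT INTO audit_log (table_name, row_id, old_row, new_row)
    VALUES (TG_TABLE_NAME, NEW.id, to_jsonb(OLD), to_jsonb(NEW));
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER customers_audit
AFTER UPDATE ON customers
FOR EACH ROW EXECUTE FUNCTION audit_customers();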
by fer on 12/15/21, 9:09 AM
by preetamjinka on 12/15/21, 8:11 AM
We have the same approach. A very small number of people have write access to the production databases. If things can’t wait for a schema change release (15-30 min) and can’t be done through the back office API, we do it manually. It’s very rare.
by Notanothertoo on 12/16/21, 4:23 AM
by plasma on 12/15/21, 8:27 AM
It should be a rare exception not a normal thing, so a few people is fine.
Best practice is to also have your change reviewed beforehand, and run in a transaction where you also validate expected results before committing, etc.
by Mister_Snuggles on 12/15/21, 1:36 PM
* User reports the issue to a business analyst via a ticket.
* Business analyst confirms the issue, adds any relevant details, and opens a ticket with the technical team.
* The technical team creates a script to perform the update and runs it in the 'support' database (which is refreshed nightly from Production) (a defensive sketch of such a script follows this comment)
* Business analyst validates results
* User validates the results
* Technical team logs a Change Request, it is discussed at the next Change Advisory Board meeting, then the script is run during the next change window (weekly). If it's an emergency, we can run it the same day.
* The Database Administration team runs the script in Production
It sounds like a heavyweight process, but with multiple people involved and actions logged at every step of the way it's a very safe and auditable process.
TL;DR: A small number of trusted people have the required access, there is a fair amount of business process built around using this access.
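One extra safety net for the script itself (a sketch, assuming Postgres and an invented table): have it abort unless it touches exactly the number of rows that was validated in the support database:

BEGIN;

DO $$
DECLARE
    affected bigint;
BEGIN
    UPDATE tickets SET assignee = 'support-queue' WHERE assignee IS NULL;
    GET DIAGNOSTICS affected = ROW_COUNT;
    IF affected <> 37 THEN  -- the count confirmed in the support database
        RAISE EXCEPTION 'expected 37 rows, got %', affected;
    END IF;
END $$;

COMMIT;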
by wizwit999 on 12/15/21, 9:43 PM
by natoliniak on 12/15/21, 9:01 AM
by NicoJuicy on 12/15/21, 1:18 PM
by ItsBob on 12/15/21, 11:29 AM
I've never had to work on a 24/7 critical system so I dunno how to do that.
by tbrownaw on 12/15/21, 1:34 PM
by seancoleman on 12/15/21, 2:10 PM
by sgt101 on 12/15/21, 11:34 AM
In addition I would build a validation service that checks and tests the update data (the list of changes) before they are submitted to the change service.
I would not permit any ad hoc changes to the prod database. The REST service should be the only way for a mutation to be done, short of a complete new release.
by gizmore on 12/15/21, 3:34 PM
by mamcx on 12/15/21, 3:54 PM
Other commenters say a lot about PREVENTING an issue. That is good, but you can get stuck in a situation (very common, sadly) where you are buried under a pile of "stuff" that hides a simple fact:
You CAN ruin and revert nearly everything* in an RDBMS and still come out on top.
* as long as you are inside SQL/transactions and not messing with the binary data
---
The point here is that you can model the database in ways that leave you stuck with limited options, or instead in ways that make it easier to get out of trouble.
The good thing is that this mostly means applying good design practices, and it is not that problematic.
The key is that data in an RDBMS IS values (i.e. immutable rows and tables) behind a mutable facade (INSERT, DELETE), so you can always take a snapshot of the data and apply reversible operations.
This is what most people do with "copy to SQL, CSV, do a backup, etc.", but it is in fact more built-in than that. You can totally side-step extra operational complexity if you know this fact (and the change is kinda small). I'm not saying don't make copies or backups; rather, that SQL is VERY friendly to mess with!
The major issue is when you turn your powerful RDBMS into a pathetic datastore, with a lot of MongoDB-style data/schemas, anti-patterns everywhere, and close-to-zero understanding of (advanced) SQL: usage of views, indexes, triggers, TEMP tables (note this!), administrative commands, writing your own functions, knowing what tools the db gives you for free, etc.
It sounds like a lot, but even a "complex" RDBMS like Postgres requires far less study than learning JS. It's just a matter of understanding that RDBMSs are not "dumb" datastores.
---
I forgot to make this actionable. Most RDBMSs have a comparable way to do this:
CREATE TEMP TABLE _address ON COMMIT DROP AS
SELECT customer_code, address_code, is_default FROM address;
Do the changes there. Your temp tables are "FREE" to get ruined. Then apply back with an update/insert from the temp table:
UPDATE address
SET ...
FROM _address
WHERE address.address_code = _address.address_code;
You can ALSO copy to another schema (PG) or db (SQLite) and make the mess there:
CREATE SCHEMA fix;
CREATE TABLE fix.name AS
SELECT ...;
--or
CREATE DATABASE fix WITH TEMPLATE old;
And this means you can also "rename" the old db/schema, make a copy of it, then check whether the copy was OK; if not, drop the schema and rename again. This is how I've done some of the most nerve-wracking fixes!