by sajithw on 2/6/19, 6:44 PM with 54 comments
by Daishiman on 2/6/19, 9:30 PM
There are basically no solutions today that satisfy fundamental requirements such as minimizing downtime and guaranteeing correctness.
It is _such_ a huge problem that most inexperienced developers see kicking the problem down the line with NoSQL document storage as a viable alternative (it isn't; you'll either be migrating all data forever and special-casing every old version of your documents, or writing even more convoluted migration logic).
It's also clear that even the most modern ORMs and query builders were not designed with the issues that arise in migrating data in mind.
It would be refreshing to see more research devoted to this problem. Unfortunately, migrations end up being so different from each other, with such heterogeneous requirements, that we'll probably be working on this for a really long time.
by existencebox on 2/6/19, 9:48 PM
And honestly? I hate this answer. As a solo dev it's pragmatic, but the solutions described in this article are _SO NICE_ that I'd love to leverage them.
If there's any way that those deprecate_column and rename functionalities could make their way into OSS/upstream support, I'd have a field day. (Those who know more about PG than I do and may be able to suggest another workaround, feel free; I'm very much learning this space as I go.)
If nothing else, thanks to the Benchling team for taking the time to write such a clear yet technical exposé. This really hit the sweet spot of "explanations without unnecessary verbosity, technical without being impenetrable, and giving sufficient explanation of motivations and tradeoffs/pitfalls" and will give me a north star for where I aim my own DB work.
by ThePhysicist on 2/7/19, 1:50 PM
Alembic is great for many simple use cases, but we found that for a production system it often isn't easy to maintain compatibility between two different DB systems like Postgres and SQLite anyway, as that would mean either adding a lot of feature switches and custom logic to our code or not using most of the interesting native Postgres features. So Alembic offered very little benefit over a plain SQL file in terms of functionality, and in addition made it harder to generate correct migrations in some cases, as the auto-generation process does not work very reliably in our experience and some things are buggy/inconsistent, e.g. the creation and deletion of enum types. In addition, we found that it's much easier to write complex migration logic (e.g. create a new column and populate it with data from a complex SELECT statement) directly in SQL. The last point is that we can of course execute these migrations using any programming language / tool we like (for example, we also wrote a small Go library to handle the migrations), which is a nice bonus.
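To make the "directly in SQL" point concrete, here's a minimal sketch of that kind of migration, run here via psycopg2; the table, column, and database names are made up for illustration:

    import psycopg2

    # Illustrative only: "orders" / "order_items" and the column names are invented.
    MIGRATION_SQL = """
    ALTER TABLE orders ADD COLUMN total_cents bigint;

    UPDATE orders o
    SET total_cents = sub.total
    FROM (
        SELECT order_id, SUM(quantity * unit_price_cents) AS total
        FROM order_items
        GROUP BY order_id
    ) AS sub
    WHERE o.id = sub.order_id;
    """

    with psycopg2.connect("dbname=mydb") as conn:  # connection params illustrative
        with conn.cursor() as cur:
            cur.execute(MIGRATION_SQL)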
That said we also heavily use SQLAlchemy in our Python backend code and like it a lot.
by theptip on 2/7/19, 6:29 AM
One item I've been considering: under Downtime, one reason given for flaky migrations is "long running transactions".
I've seen this too, and wonder if the correct fix is actually to forbid long-running transactions. Typically if the naive long-running transaction does something like:
    with transaction.atomic():
        for user in User.objects.all():
            user.do_expensive_thing_to_related_objects()
You can often recast that migration to something more like:

    for user in User.objects.all():
        with transaction.atomic():
            user = User.objects.get(id=user.id)  # Read the row to lock it; or do a SELECT FOR UPDATE
            user.do_expensive_thing_to_related_objects()
This example is somewhat trivial, but in most cases I've seen you can fetch your objects outside of the transaction, compute your expensive thing, and then lock your row for the individual item you're working on (with a sanity check that your calculation inputs haven't changed, e.g. check the last_modified timestamp is the same, or better that the values you're using are the same).

I've considered simply configuring the DB connection with a very short connection timeout (something like 5 seconds) to prevent anyone from writing a query that performs badly enough to interfere with other tables' locks.
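Concretely, what I have in mind is something like Postgres's statement_timeout set per connection from Django; a rough sketch, assuming psycopg2 (the database name and the 5-second value are just illustrative):

    DATABASES = {
        "default": {
            "ENGINE": "django.db.backends.postgresql",
            "NAME": "mydb",  # illustrative
            "OPTIONS": {
                # libpq option string: abort any statement running longer than 5s
                "options": "-c statement_timeout=5000",
            },
        },
    }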
Anyone tried and failed/succeeded in making this approach work?
The other subject that's woefully underdeveloped is writing tests for migrations; ideally I want to (in staging) migrate the DB forwards, run all the e2es and smoke tests with the pre-migration application code, migrate back (to test the down-migration), run the e2es again, and then really migrate forwards again. That would cover the "subtly broken deleted field" migration problem.
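A rough sketch of that staging loop using Django's migration runner; the app label, revision names, and run_e2es_and_smoke_tests are placeholders for whatever your project actually has:

    from django.core.management import call_command

    def run_e2es_and_smoke_tests():
        ...  # placeholder: run the e2e/smoke suite against the pre-migration app build

    def test_migration_roundtrip():
        call_command("migrate", "myapp", "0042_new_schema")  # migrate forwards
        run_e2es_and_smoke_tests()                           # old app code, new schema
        call_command("migrate", "myapp", "0041_old_schema")  # exercise the down-migration
        run_e2es_and_smoke_tests()
        call_command("migrate", "myapp", "0042_new_schema")  # really migrate forwards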
But how do we test that our migrations behave correctly in the face of long-running transactions? I.e. what's the failing test case for that bug?
by ngrilly on 2/7/19, 8:39 AM
by Ixiaus on 2/7/19, 2:19 AM
[1] https://tathougies.github.io/beam/schema-guide/migrations/#a...
by benmmurphy on 2/7/19, 12:42 AM
by wwweston on 2/7/19, 2:16 AM
As far as I can tell, it's a really poor fit. It sets up the expectation of moving an existing schema (and maybe data) from one host to another or one environment to another. What's usually happening instead is essentially a schema diff / mutation.
by danielbigham on 2/6/19, 9:23 PM
by bayesian_horse on 2/7/19, 6:42 AM
It's not for every project, certainly, and you sometimes need to work around limitations of the ORM. And of course some people don't like ORMs in the first place.
by seanwilson on 2/7/19, 4:06 AM