by w01fe on 9/24/14, 6:03 PM with 12 comments
by spoondan on 9/24/14, 7:51 PM
In consulting and mentoring on this topic, I've found a lot of engineers push back against how "dirty" it is to have multiple copies of the data around in different formats. It feels wrong to not have a single, authoritative data format at any given instant. If the idea is to change the column type, why not just `ALTER TABLE ... ALTER COLUMN` instead of `ALTER TABLE ... ADD`?
But if you think about it, excepting trivial cases, once you're migrating data, there are parallel realities at least for the duration of the migration and deployment. It's not a question of whether you create divergence by versioning/staging (in some fashion) your data. It's a question of whether you manage the divergence and convergence of the parallel realities that already exist as part of a migration. If you don't, you either incur downtime or risk data corruption.
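A minimal sketch of what managing that divergence could look like, using SQLite only so it runs anywhere. The `events` table and the `happened_at` / `happened_on` columns are made up for illustration: the new column is added alongside the old one, writers fill both, and a backfill converges the older rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical starting point: a timestamp column where a date would have done.
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, happened_at TEXT)")
conn.execute("INSERT INTO events (happened_at) VALUES ('2014-09-24 18:03:00')")

# Expand: add the new column next to the old one instead of altering in place.
conn.execute("ALTER TABLE events ADD COLUMN happened_on TEXT")


def record_event(conn: sqlite3.Connection, timestamp: str) -> None:
    """Dual-write: old readers keep using happened_at, new readers use happened_on."""
    conn.execute(
        "INSERT INTO events (happened_at, happened_on) VALUES (?, ?)",
        (timestamp, timestamp[:10]),
    )


def backfill(conn: sqlite3.Connection) -> int:
    """Converge: fill happened_on for rows written before the column existed."""
    cur = conn.execute(
        "UPDATE events SET happened_on = substr(happened_at, 1, 10) "
        "WHERE happened_on IS NULL"
    )
    return cur.rowcount  # number of rows that still needed converting


record_event(conn, "2014-09-25 00:41:00")
backfilled = backfill(conn)
rows = conn.execute("SELECT happened_on FROM events ORDER BY id").fetchall()
```

Once every reader has moved to the new column, the old one can be dropped in a later, separate step.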
One big win here is that, by being disciplined about your code and data changes, you can cleanly separate deployment from release. You can deploy a feature but have it disabled or only enabled for a subset of users. Releasing a feature means enabling its feature flag, not orchestrating a set of migrations, replications, and deployments.
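To make the deploy/release split concrete, here is a rough sketch of a percentage-based feature flag. The flag name, the `FLAGS` table, and `render_profile` are all hypothetical; a real system would load flags from configuration rather than a module-level dict.

```python
import hashlib

# Hypothetical flag table: flag name -> percentage of users who get the feature.
FLAGS = {"new_profile_format": 0}  # code is deployed, but released to nobody


def is_enabled(flag: str, user_id: str, flags: dict = FLAGS) -> bool:
    """Return True if `flag` is on for this user, with stable per-user bucketing."""
    percent = flags.get(flag, 0)
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # always in 0..99
    return bucket < percent


def render_profile(user_id: str, flags: dict = FLAGS) -> str:
    """Both code paths ship together; the flag decides which one a user sees."""
    if is_enabled("new_profile_format", user_id, flags):
        return f"v2 profile for {user_id}"
    return f"v1 profile for {user_id}"
```

Releasing then means raising the percentage (to 10 for a subset of users, to 100 for everyone), with no new deployment involved.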
by nostrademons on 9/24/14, 9:31 PM
by tcopeland on 9/24/14, 7:58 PM
When you introduce a new API endpoint or format for data at rest, think hard
Yup. I've added columns where I've used a datetime where a date would have sufficed and then regretted it later once tons of data was already in the table. Or added a varchar(255) and only later realized that that wasn't big enough. Sometimes the wrongness of a type only becomes clear down the road.
If you're designing an experimental server-side feature, see if you can store the data off to the side (e.g., in a different location, rather than together with currently critical data) so you can just delete it if the experiment fails rather than being saddled with this data forever without a huge migration project.
Yup, sometimes an extra join or lazy-load is well worth the isolation.
by shykes on 9/25/14, 12:41 AM
So we shipped a migration routine which ran at startup every time and gave up (gracefully and atomically) at the slightest sign of trouble. Over time, we reasoned, each install would converge towards full migration, and the huge majority of containers would be migrated within seconds of the upgrade. The rest would be much easier to deal with if anybody had any trouble.
Of course we had the luxury of a data structure which allowed this.
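For anyone curious what a "try at every startup, bail out atomically" routine might look like, here is a rough sketch. It is not the actual routine described above: the `containers` table, its `image` and `image_id` columns, and the sanity check are invented, and it assumes a SQLite connection opened with `isolation_level=None` so the transaction is controlled explicitly.

```python
import sqlite3


def migrate_on_startup(conn: sqlite3.Connection) -> bool:
    """Attempt the migration; on any sign of trouble, roll back and return False."""
    columns = [row[1] for row in conn.execute("PRAGMA table_info(containers)")]
    if "image_id" in columns:
        return True  # already migrated, so this is safe to call on every startup
    try:
        conn.execute("BEGIN")
        conn.execute("ALTER TABLE containers ADD COLUMN image_id TEXT")
        # Hypothetical sanity check: refuse to migrate data we do not understand.
        bad = conn.execute(
            "SELECT COUNT(*) FROM containers WHERE image IS NULL"
        ).fetchall()[0][0]
        if bad:
            raise ValueError("unexpected data, giving up")
        conn.execute("UPDATE containers SET image_id = image")
        conn.execute("COMMIT")
        return True
    except (sqlite3.Error, ValueError):
        if conn.in_transaction:
            conn.execute("ROLLBACK")  # undoes the ALTER TABLE as well
        return False
```

A failed attempt leaves the old layout untouched, so the install keeps running on the old format and simply tries again at the next startup.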
by jamessantiago on 9/24/14, 11:55 PM
For the client side it's usually a good idea to specify a versioning relationship between server and client. With AWS, for example, you request the API version you want to use: http://docs.aws.amazon.com/AmazonSimpleDB/latest/DeveloperGu...
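As a sketch of the server side of that idea (not AWS's implementation), the client can be made to pin a version on every request and the server dispatches on it. The `Version` parameter name, the date-style version strings, and the response shapes are assumptions for illustration.

```python
def _item_v1(item: dict) -> dict:
    """Original response shape: name only."""
    return {"name": item["name"]}


def _item_v2(item: dict) -> dict:
    """Newer response shape: adds tags, defaulting to an empty list."""
    return {"name": item["name"], "tags": item.get("tags", [])}


# Hypothetical supported versions, keyed by a date-style version string.
SUPPORTED = {"2013-06-01": _item_v1, "2014-09-24": _item_v2}


def handle_get_item(params: dict, item: dict) -> dict:
    """Dispatch on the version the client asked for; reject anything unknown."""
    version = params.get("Version")
    if version not in SUPPORTED:
        raise ValueError(f"unsupported API version: {version!r}")
    return SUPPORTED[version](item)
```

Old clients keep getting the shape they were written against, and the server can add a new version without breaking them.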
by tieTYT on 9/24/14, 8:10 PM
I think that means you can have Process V1 and Process V2 running on the same server simultaneously. If they read from the same database, won't you run into issues?
by w01fe on 9/24/14, 9:07 PM