Two Phase Data Migration
I ran into an interesting situation today while refactoring the target of a has_and_belongs_to_many association into two tables. Originally I had:
class User < ActiveRecord::Base
  has_and_belongs_to_many :idxes
end

class Idx < ActiveRecord::Base
  has_and_belongs_to_many :users
end
But now I want:
class User < ActiveRecord::Base
  has_and_belongs_to_many :associations
end

class Association < ActiveRecord::Base
  has_and_belongs_to_many :users
  has_many :idxes
end

class Idx < ActiveRecord::Base
  belongs_to :association
end
The interesting part of this is the data migration. The idxes_users table needs to be dropped, but not before it’s used to populate the associations_users table. The has_and_belongs_to_many :idxes association on User also needs to stick around until then, because the migration code is a lot cleaner if I can use the ActiveRecord methods instead of resorting to direct SQL.
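To make the data step concrete, here is a minimal sketch of how the associations_users rows can be derived from the idxes_users rows. It uses plain Ruby hashes rather than ActiveRecord so it runs on its own. The helper name derive_associations_users and the idx_association_id lookup are hypothetical, and it assumes every idx has already been assigned to an association, which the real migration would have to arrange first.

```ruby
# Hypothetical helper, not part of the app: derive the new join-table rows.
#
# idxes_users_rows   - rows of the old join table, as { idx_id:, user_id: } hashes
# idx_association_id - assumed lookup from idx id to its association id
def derive_associations_users(idxes_users_rows, idx_association_id)
  rows = idxes_users_rows.map do |row|
    # fetch raises KeyError for an idx with no association, so orphaned
    # rows fail loudly instead of being dropped silently.
    association_id = idx_association_id.fetch(row[:idx_id])
    { association_id: association_id, user_id: row[:user_id] }
  end

  # Several idxes can share one association, so the same pair can show up
  # more than once; the new join table only needs each pair once.
  rows.uniq
end

old_rows = [
  { idx_id: 1, user_id: 10 },
  { idx_id: 2, user_id: 10 },
  { idx_id: 3, user_id: 11 }
]
idx_association_id = { 1 => 100, 2 => 100, 3 => 101 }

new_rows = derive_associations_users(old_rows, idx_association_id)
# new_rows == [{ association_id: 100, user_id: 10 },
#              { association_id: 101, user_id: 11 }]
```

In the actual migration the same walk would go through the retained has_and_belongs_to_many :idxes association instead of raw rows, which is why that association has to outlive the first deploy.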
This kind of bugged me because it means I can’t complete my code and data changes in one chunk. The work has to be broken down into two code updates and two migrations. But how to organize the split? I’d basically completed the code updates before thinking about this, so my choice was to update all the code at once and then remove the leftovers (the old has_and_belongs_to_many :idxes association and the idxes_users table) in a tiny subsequent update.
In the future, however, I might try a different approach: create the minimal data migration that adds the new tables and relationship data up front, before working on any actual code changes. The new database contents can then silently coexist with the old until the full code changes are done, and I can deploy the new version of the app in one shot without any leftovers.