Darwinweb

Two Phase Data Migration

August 4, 2007     

I ran into an interesting situation today while refactoring the target of a has_and_belongs_to_many association into two tables. Originally I had:

class User < ActiveRecord::Base  
  has_and_belongs_to_many :idxes
end

class Idx < ActiveRecord::Base  
  has_and_belongs_to_many :users
end

But now I want:

class User < ActiveRecord::Base  
  has_and_belongs_to_many :associations
end

class Association < ActiveRecord::Base  
  has_and_belongs_to_many :users
  has_many :idxes
end

class Idx < ActiveRecord::Base  
  belongs_to :association
end

The interesting part is the data migration. The idxes_users join table needs to be dropped, but not before it has been used to populate the associations_users table. The has_and_belongs_to_many :idxes association on User also needs to stick around until then, because the migration code is a lot cleaner if I can use the ActiveRecord methods instead of resorting to direct SQL.
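Setting ActiveRecord aside for a moment, the data step comes down to mapping each (user, idx) pair to that idx's association and dropping the duplicates. Here is a minimal plain-Ruby sketch of that logic. It assumes every idx row already has its association_id filled in, and the sample rows and the helper name build_associations_users are made up for illustration:

```ruby
# Hypothetical sample rows standing in for the idxes and idxes_users tables.
idxes = [
  { id: 1, association_id: 10 },
  { id: 2, association_id: 10 },
  { id: 3, association_id: 20 }
]
idxes_users = [
  { user_id: 100, idx_id: 1 },
  { user_id: 100, idx_id: 2 },
  { user_id: 200, idx_id: 3 }
]

# Derive the rows for associations_users from the old join table.
# Assumes every idx row already carries its association_id.
def build_associations_users(idxes_users, idxes)
  # Lookup table: idx id => association id
  association_for = idxes.map { |idx| [idx[:id], idx[:association_id]] }.to_h

  idxes_users
    .map { |row| { user_id: row[:user_id], association_id: association_for.fetch(row[:idx_id]) } }
    .uniq # several idxes of one association collapse into a single row
end

associations_users = build_associations_users(idxes_users, idxes)
```

In this sample, user 100 is linked to two idxes of the same association, so only one associations_users row is produced for that pair. The old join table can only be dropped once these rows exist.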

This kind of bugged me, because it means I can’t complete my code and data changes in one chunk. The work needs to be broken down into two code updates and two migrations. But how to organize the split? I’d basically completed the code updates before thinking about this, so my choice was to update all the code and then remove the leftovers (the old has_and_belongs_to_many :idxes association and the idxes_users table) in a tiny subsequent update.

In the future, however, I might try a different approach: create a minimal data migration that adds the new tables and relationship data up front, before working on any actual code changes. The new database contents can then silently coexist with the old until the full code changes are done, and I can deploy the new version of the app in one shot without any leftovers.

Pratik says…
August 4, 2007 at 5:28AM

It’s quite common practice to redefine the model inside the migration class. I think that should solve your problem.

Richard Livsey says…
August 4, 2007 at 6:57PM

As Pratik says, define your models in the migration, as your migration shouldn’t care about how they are set up in your actual app.

This is also handy because you can define helper methods in the migration model to tidy up the migration without adding them to the actual model itself.

In cases where the model’s db structure changes during a migration, you can call Model.reset_column_information to reload it.

Gabe da Silveira says…
August 5, 2007 at 1:05AM

Hmm, thanks for the tips guys. That approach never crossed my mind.

Gabe da Silveira says…
August 7, 2007 at 5:35AM

To follow up on this, I found that the ideal solution in this case was to extend rather than redefine the User class. I did this by placing the following at the top of my migration file (outside the actual migration class):


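# Load the application's User model first so the class below reopens it
require 'user'
class User < ActiveRecord::Base
  # Temporarily restore the old association for use in the migration
  has_and_belongs_to_many :idxes
end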

This way I didn’t have to redefine the whole class, only the little bit I needed. The ‘require’ is necessary; otherwise, defining the class in the migration file will prevent the lazy-loading of the application model.
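The reason comes down to how Ruby treats the class keyword, and it can be reproduced without Rails. Rails’ lazy loading hooks into const_missing, which Ruby calls when an unknown constant is referenced but not when one is defined with class. In the sketch below, the App module and its toy loader are hypothetical stand-ins for the real mechanism:

```ruby
module App
  LOADED = []

  # Hypothetical stand-in for a lazy loader: "loads" a model the first
  # time its constant is referenced.
  def self.const_missing(name)
    LOADED << name
    model = Class.new do
      def source
        "application model"
      end
    end
    const_set(name, model)
  end
end

# Case 1: the class keyword on an unknown constant never calls
# const_missing, so this creates a brand-new class and the
# "application model" is never loaded.
class App::Account
  def idxes
    []
  end
end

# Case 2: referencing the constant first (the role played by
# require 'user' above) loads the model, so the class keyword reopens it.
App::User
class App::User
  def idxes
    []
  end
end

puts App::Account.new.respond_to?(:source)
puts App::User.new.respond_to?(:source)
```

Only App::User ends up with both the loaded behaviour and the extra idxes method. App::Account is an empty shell that shadows the model it was meant to extend.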