Darwinweb

Subsets of Associations in ActiveRecord

September 4, 2006     

I’ve recently bumped up against my first real issue with ActiveRecord. It’s not that a work-around is particularly ugly or inefficient, but there’s the psychological effect of losing some of ActiveRecords’s cleanliness. Consider the following scenario.

Artists have many Events.

Over time the number of events in the database grows and grows, however, 95% of the time we are only concerned with events in the future.

First Stabs

Now coming out the PHP world where I wrote a lot of SQL by hand, I’m quick to use eager-loading. So in my controller I originally had:

@artist = find(params[:id], :include => :events)

This fetches all the events in a single query, but it’s fetching an awful lot of rows we don’t need. For some types of relationships this extra data is probably unimportant, but in this case it really seems like senseless overhead.

Optimizing the Query

The first solution was to simply add conditions to the query like this:

@artist = find(params[:id], :include => :events, :conditions => "events.date > NOW()")

This works as long as there are always future events for any particular artist. But if there aren’t then no rows get returned. The SQL looks (roughly) like this:

SELECT * FROM artists LEFT OUTER JOIN events ON artist.id = events.artist_id WHERE artist.id = 1 AND (events.date > NOW())

Of course the actual ActiveRecord query explicitly selects the columns it needs and aliases them to prevent column name collisions, but the essence of query is accurate. The flaw with this query is simple: if no events match the criteria then no rows are returned at all, so we don’t even get the artist information back. As long as you have no conditions on events, the problem is automatically taken care of by the semantics of the LEFT OUTER JOIN, which is that it creates null column values for the joined table the case that nothing matches the ON clause.

Fixing the Query

The ON clause is the key. Everything works great if the query is restructured as:

SELECT * FROM artists LEFT OUTER JOIN events ON artist.id = events.artist_id AND events.date > NOW() WHERE artist.id = 1

It’s so easy in SQL, but in ActiveRecord you have to think about how to integrate it into the ORM layer. Digging into the ActiveRecord::Associations API docs the first thing that jumped out at me was adding the following to my Artist model:

has_many :events, :conditions => "events.date > NOW()"

Which has the effect of working not only with eager-loading but also just-in-time loading when Artist.events is called. In fact, it works so well that getting all the events becomes nearly impossible. It breaks things like :dependent => :delete_all (ie. deleting events when the artist is deleted). As far as I can tell, there’s no way to get around that condition other than going directly to the Event class itself which opens up a whole world of new problems. The lesson here is that conditions on an association fundamentally limit the association itself, to the point where the presence of the foreign key in that table becomes a moot point as far as ActiveRecord is concerned. I’m hard pressed to think of a situation where this would really be useful (by itself anyway! See below for the latest update!), but I digress.

Subsets of Associations

What I’m really trying to do is get a small subset of an association cleanly and efficiently. ActiveRecords association semantics are really based around the idea of the complete collection being available through the added methods. So rather than fight the paradigm, my solution is to simply define additional methods to implement subsets of a given collection as necessary. The following meets my needs fairly cleanly:

def upcoming_events
  if @upcoming_events.nil?
    @upcoming_events = events.find("date > NOW()")
  end
  @upcoming_events
end

Since I (currently) only use events on a single artist page, this just adds one query which isn’t a huge deal compared to the productivity I’m getting out of ActiveRecord. However, eager-loading becomes much more of a necessity if you are fetching collections of collections and you hit the dreaded n+1 problem. It appears this problem is far from solved… or is it?

Multiple Versions of the Same Association

As of five minutes ago this article was already over, and I had settled on the previous solution. However, I suddenly had a new insight to the problem. I wanted a subset of an association without losing the full association. In true eureka! style I feverishly entered:

has_many :events, :dependent => :delete_all
has_many :upcoming_events, :class_name => "Event", :conditions => "date > NOW()"

Sure enough, ActiveRecord engineering comes to the rescue. I tested both associations and they work as expected! I wouldn’t be surprised to run into some gotchas down the line, but on first inspection this appears very robust.

Further Reading

http://api.rubyonrails.com/classes/ActiveRecord/Associations/ClassMethods.html
(Association API docs are a must read)

http://www.infoq.com/articles/Rails-Performance
(Contains some interesting notes on eager loading and the n+1 problem among other things)

Dr Nic says…
September 4, 2006 at 10:57AM

Good summary!

Tim Lucas says…
September 4, 2006 at 6:11PM

A somewhat cleaner implementation of your upcoming_events method might be:


has_many :events, :dependent => :delete_all do
  def upcoming
    @upcoming_events ||= find(:all, conditions => "date > NOW()")
  end
end

but that still doesn’t help you with eager loading.

If only we could do something like:


has_many :events, :dependent => :delete_all do
  subset :upcoming, :conditions => 'date > NOW()'
end

@artist = Artist.find(:all, :include => 'events.upcoming')
@events = @artist.events.upcoming
Gabe da Silveira says…
September 4, 2006 at 11:23PM

Tim, great stuff.

I had forgotten my ruby idioms, and I have to admit to missing the entire section on association specialization.

I’ve continued working with this, and the worst thing so far has been column name ambiguities (eg. I set up a recent_events association at the same time) which prevents eager loading of two associations on the same table at the same time. Fortunately I can eager load one and implicitly load the other at call-time so I’m no worse off then when I started.

josh susser says…
September 5, 2006 at 10:37AM

I use that trick often. I refer to it as specialized associations. You can see several examples of it on my blog, or look at some of the ActiveRecord unit tests. It works great for a lot of things, though as you note it’s not good for restricting assignment of new objects to the collection.

Gabe da Silveira says…
September 5, 2006 at 10:23PM

Thanks for the comment Josh. I knew someone had to be doing it, but didn’t know where to look. Your blog is a great antidote to the myth that ActiveRecord sucks for relations.