Darwinweb

Using INNER JOINs to Simplify Conditions

July 6, 2007     

In my quest to squeeze every ounce of power out of ActiveRecord’s querying abilities, the INNER JOIN has really brought a lot to the table. Consider the following schema:

class Album < ActiveRecord::Base
  has_many :photos
end

class Photo < ActiveRecord::Base
  belongs_to :album
end

Now say you want to search for only the albums that have photos. You could do something like this:

Album.find(:all, :include => :photos, :conditions => '`photos`.id IS NOT NULL')

Of course, that depends on the joins from the eager-loading, so it won’t work if you drop the :include option. It also doesn’t work with hash-based conditions, which I’m a huge advocate of for merge-friendliness.

One way to tackle the first issues is by adding an extra LEFT JOIN that adds the necessary column for use by the :conditions. Like so:

Album.find(:all, :joins => "LEFT JOIN `photos` ON `photos`.property_id = `properties`.id", :conditions => '`photos`.id IS NOT NULL')

But a better solution is an INNER JOIN which automatically drops any rows not matching the ON condition:

Album.find(:all, :joins => 'INNER JOIN `photos` ON `photos`.property_id = `properties`.id')

If you also :include => :photos this will result in a name conflict, which is easily solved by aliasing the table name:

Album.find(:all, :include => :photos, :joins => 'INNER JOIN `photos` AS my_photos ON my_photos.property_id = `properties`.id')

As of Rails 1.2.3 this will not work as expected if you add a :limit clause, as explained in my previous article about advanced search. However, this issue is resolved in recent versions of edge Rails, and so it will be fully functional out of the box in Rails 2.0.

It’s also worth noting that this technique can be applied much more generally. The ON clause of the INNER JOIN can contain any other conditions you wish to apply. For instance, if you wanted to fetch the properties of the first 5 photos of the database, you could do:

Album.find(:all, :joins => 'INNER JOIN `photos` ON `photos`.property_id = `properties`.id' AND `photos`.id IN (1,2,3,4,5))

You could even include conditions that have nothing to do with the joined table, which would probably be bad form, but it’s legal SQL, and it could come in handy for certain situations where setting up your :conditions might otherwise be very difficult.

Important caveat

When you add :joins to an ActiveRecord query, it doesn’t change the :select clause automatically. It will continue to SELECT * which will cause all the JOINed columns to be returned as well. Sometimes this is handy, but usually it’s a nightmare because the joined attributes clobber the real ones. Most notably id exists in most tables, so it’s always clobbering your actual attributes. This breaks links in your results pages, and things can get really nasty if you save one of these bogus objects back to the database.

If you are eager loading anything than ActiveRecord takes care of everything nicely behind the scenes by being very explicit with it’s SELECTing. But if you are just doing raw :joins with no :include then there is potential for some real headaches.