Darwinweb

Implementing Advanced Search in Rails using Search Models

May 13, 2007     

ActiveRecord’s philosophy is to make the easy things easy, but leave access to raw SQL for the hard stuff. This is a good philosophy because SQL provides much richer access to data than any object-oriented language can hope to do. But just because SQL is more powerful from a data perspective doesn’t mean we enjoy typing out the same boring queries ad nauseam. ActiveRecord handles the most common queries with ease, but more importantly it uses type information from your database to abstract away the type issues that arise from interfacing data over HTTP. It proves that ORM can really make programming easier and less error-prone.

The tradeoff is that any complex SQL has to go through find_by_sql or ActiveRecord::Base.execute. Not a problem. But what if you don’t want to give up the sugar? What if you aren’t just doing a single complex query, but you want to provide a rich querying interface to the user? What of “Advanced Search”?

ActiveRecord::Base.find takes a set of options that makes it very easy to build a complex query dynamically. I consider eager loading via the :include option to the be the lynchpin of the whole scheme. After all, queries that include associations are probably more common than queries that don’t. You could work around that to some extent with explicit :joins, but it’s no substitute for getting full models back. I particularly like the hash-based conditions added in Rails 1.2 as it makes it possible to easily merge conditions without resorting to :with_scope hackery.

An Advanced Search Recipe

What I’m looking for is a solution with the following criteria:

  1. Make it easy to create persistent search forms.
  2. Must work with eager loading.
  3. Must work with limits and offsets for paging.
  4. Easily search on the widest set of possible criteria.
  5. Work with the simplest possible URL query strings.

The first criteria hints that the search itself should be a model. Aside from making the form setup easy, it also makes it possible to save search criteria in the database as well as everything else AR models do (eg. validations). My first attempt at this created a model that inherited directly from the model it was supposed to search. This worked, but it was only a first step; the search model didn’t have its own table and so the search fields were exactly the same as the model’s fields.

The search class must be a unique ActiveRecord model. This allows us to explicitly define the search parameters we need. Some of them will likely be literal fields from the searchable model, but you can devise any kind of special search you want and map it to the necessary search conditions as demonstrated below. All the find options will be generated by the search model.

Code speaks louder than words, so here’s a rough example:

class PostSearch < ActiveRecord::Base
  LITERAL_CRITERIA = %w{id title}
  CONTAINS_CRITERIA = %w{body_contains}
  RANGE_CRITERIA = %w{created_at_gte created_at_lte}
  ARRAY_CRITERIA = %w{ids titles}
  SPECIAL_CRITERIA = %w{new has_comments}
  NUMERIC_ARRAY_CRITERIA = %w{ids}
  
  CRITERIA = (LITERAL_CRITERIA + CONTAINS_CRITERIA + RANGE_CRITERIA + ARRAY_CRITERIA + SPECIAL_CRITERIA).uniq
  STRAIGHT_CRITERIA = LITERAL_CRITERIA + CONTAINS_CRITERIA + RANGE_CRITERIA #With AR::Extensions help.
  
  #Array getters and setters
  ARRAY_CRITERIA.each do |w| 
    serialize w.to_sym #Store the array
    
    self.class_eval <<-EOF
      def #{w}
        self[:#{w}] || Array(#{w.singularize})
      end
      
      def #{w}=(value)
        if value.is_a? String
          self[:#{w}] = value.split(',').map(&:strip)
        else
          self[:#{w}] = Array(value)
        end
      end
    EOF
  end
  
  NUMERIC_ARRAY_CRITERIA.each do |w|
    #We are redefining the setter here... order matters
    self.class_eval <<-EOF
      def #{w}=(value)
        if value.is_a? String
          self[:#{w}] = value.split(',').map(&:to_i).reject(&:zero?)
        else
          self[:#{w}] = value.map(&:to_i).reject(&:zero?)
        end
      end
    EOF
  end
  
  #Pass in an assload of attributes and use 'em if possible.
  def set(params)
    params.each do |k,v|
      begin
        self.send(k.to_s + "=",v)
      rescue NoMethodError
        #This way we can pass in params willy-nilly
      end
    end
  end
  
  #Basic find method.
  def find(first_or_all,opts={})
    Post.find(first_or_all,options.deep_merge(opts)) #deep_merge is a recursive merge method I have in lib/
  end
  
  #Basic count method
  def count(opts={})
    Post.count(options(:count).deep_merge(opts))
  end
  
  def options(count_or_select = :select)
    {:conditions => conditions, :joins => joins, :select => select(count_or_select), :order => order}
  end
  
  private
  def conditions
    @conditions = {}
    STRAIGHT_CRITERIA.each do |f|
      @conditions[f] = self.send(f) unless self.send(f).blank?
    end
    ARRAY_CRITERIA.each do |f|
      g = f.singularize
      @conditions[g] = self.send(f) unless self.send(f).blank?
    end
    SPECIAL_CRITERIA.each do |f|
      unless self.send(f).blank? || (c = special_conditions(f)).blank?
        @conditions.merge!(c)
      end 
    end
    @conditions unless @conditions.empty?
  end
  
  def joins
    SPECIAL_CRITERIA.inject("") do |s,f|
      unless self.send(f).blank? || (j = special_joins(f)).blank?
        s + j
      else
        s
      end
    end
  end
  
  def select(count_or_select = :select)
    if joins
      return 'DISTINCT posts.*' if count_or_select == :select
      return 'DISTINCT posts.id' if count_or_select == :count
    end
  end
  
  def special_conditions(field)
    case field
      when "new"
        {:created_at_gt => Time.today - 2.weeks}
    end
  end
  
  def special_joins(field)
    mysql = ActiveRecord::Base.connection
    case field
      when "has_comments"
        "INNER JOIN comments AS search_comments ON search_comments.post_id = posts.id "
    end
  end
end

This example is pretty rough around the edges, but it is extracted and simplified from a live application. Most of what you see here will eventually be made into a plugin and given a nice declarative syntax. It also requires one core ActiveRecord patch to fully realize its potential. Here are the details.

Basic find/count/options methods.

These methods actually perform the search that is stored in the object. Usually you would just use the find and count methods directly since they will merge any additional options that are passed in. But the options method is public so you can perform additional voodoo, or at least see what options you are getting.

Set method

The set method is basically the same as attributes= method except it accepts invalid fields and ignores them. The purpose of this is to meet design criterion #5—simple URLs. This way anyone can type in a URL with a string like ?new=1&title_contains=ruby, and it won’t all blow up if there are other criteria (such as a submit button). Some might consider this sloppy, but I implemented it for legacy reasons and found that it doesn’t really bother me.

STRAIGHT CRITERIA

The STRAIGHT_CRITERIA are fields which can be used directly as hash pairs in the conditions field. Thanks to ActiveRecord::Extensions by Zach Dennis this works with a number of suffixes for greater hash-based conditions. I can’t overemphasize the power this brings to the entire solution. Without hash-based conditions, merging a bunch of variable criteria is annoying—either you dynamically build an SQL conditions string (yechh!) or you hack something crazy together with with_scope. Some people encourage using with_scope for all kinds of simple problems, but there has been a fair amount of backlash against it as well because it can obfuscate things. The official word is that it will be protected in Rails 2.0 which means only your models can use it. I’m not a fan of protecting programmers from themselves (this is Ruby after all), but I agree that with_scope is overused. In any case, it’s definitely overkill for the purpose of applying multiple conditions. The bottom line is ar_extensions largely eliminates the need for condition strings at all. Something like ['created_at > ? AND created_at < ?',min,max] is gracefully replaced with {:created_at_gt => min, :created_at_lt => max} which is clearly more flexible.

ARRAY CRITERIA

Hash-based parameters automatically accept arrays for any parameter and use SQL IN syntax to match against any item in the list. But that all breaks down for a persistable search, because the array will not be saved. Declaring the parameter with serialize will persist the array, but then you lost the core data type. For instance, you can pass a hash like {:id => [1,2,3]} to the conditions clause, but that can only be serialized to some sort of text field. My solution to this is to allow two fields. id is an integer field, and ids is a serialized text field.

To add some sugar, I dynamically add setters that will split strings on commas so that an array can be passed in as a flat string (for simpler URL query strings), and impose the correct type in the case of numeric fields (the example only supports integers though). I also create a getter that builds an array artificially from the singular field if there is no array value (eg. ids returns Array(id) if ids is empty). This last bit isn’t really satisfactory to me as far as the plugin is concerned, but it serves a purpose in my application.

SPECIAL CRITERIA

SPECIAL CRITERIA currently encompasses everything tricky. You can see a criteria like new is actually a boolean in the database, but it creates a specific condition on the created_at field. This is quite a powerful technique because it makes it easy to create a checkbox for new posts in the search form with all the fringe benefits of the form helpers and built-in Rails semantics. The best part is that only the search model need know what “new” even means. The controller just passes the parameters through and gets the results it needs.

To take this whole technique a step further is the special_joins method. Look at the has_photos field. This is another boolean, but it searches over an association. If you only want posts that have comments, the easiest thing is to do an INNER JOIN to the comments table. This works even if the comments are already joined on via a LEFT JOIN, but you must alias the table to avoid a name conflict. The only tricky part is once you introduce a LIMIT.

In case you haven’t noticed, eager loading associations with a LIMIT requires two queries. The first query grabs just the ids of the objects you are fetching, and then replaces the LIMIT and OFFSET clauses in the query with a simple WHERE id IN (_list of ids_) clause. This is because LIMIT only works on rows, and eager loading will introduce additional rows any time there is more than one associated object. If there were some way to say LIMIT ON DISTINCT posts.id then the two queries would not be necessary, but MySQL doesn’t support anything like that (do any DBs?). The problem in Rails is that the eager loading query does not use your specified joins, so in a vanilla Rails 1.2.3 the pre-fetched ids will not be restricted by an INNER JOIN. When the final query is run it will be restricted by the INNER JOIN and so you will most likely get fewer than the requested LIMIT of objects.

The fix for this is a very simple patch to ActiveRecord::Associations::ClassMethods::construct_finder_sql_for_association_limiting. If the second line of that method has one condition that checks for the presence of explicit joins it will work as expected:

is_distinct = include_eager_conditions?(options) ||
                     include_eager_order?(options) ||
                     !options[:joins].blank?

I currently have a monkey patch for this set up in my applications that use this search model technique. But I haven’t submitted a patch to core because I’m not sure what assumptions they are relying on here. Although the ActiveRecord test suite fully passed, I’m not 100% that it doesn’t break something somewhere.

Going Even Further

It’s possible to extend this technique further using GROUP BY and HAVING clauses, but that introduces more complexity and exposes more edge cases of ActiveRecord. I haven’t had a compelling use for those clauses yet, but I have experienced some problems with them to work in complex cases. I think there’s a lot more that can be done with this, but I won’t find out until I really need it.

Controller and Views Implementation

There are as many ways to implement the controller and views for the advanced search as there are to implement any other model-backed form in Rails. I love this solution because it doesn’t feel like I’m compromising anything. The issues I’m currently dealing with amount to usability questions.

RESTful routes

I haven’t made search controllers using RESTful routes yet for one simple reason: GET requests can’t be coerced into POST, PUT or DELETE via the _method parameter. The implication is that I won’t be able to search with a GET request without munging things. That is problematic because I’m doing a lot with APIs, and it’s annoying to have to implement everything via POST, even if for no other reason than you can’t type a POST request into a browser’s address bar for testing.

Sessions

I’ve been storing searches in the session. This is probably bad because it means if I push an update with a change to the search object, everyone who’s online could get a session error (we’re live in beta mode). But on the other hand, it prevents passing around really long query strings (think paging), and it’s super easy. The best solution might be to save every search to the database and somehow associate it with the current session so that it will be purged along with the session. Another option would be to give the search object the ability to serialize itself as a hash that could be passed to url_for—just spit out the non-blank fields.

Plugin Development

So far I’ve been very pleased with the power of this search technique, but I’m really looking to get it into plugin form for easier re-use. Formalizing it as a plugin requires a lot of decisions though. The different criteria types are critical to making it work, but right now they are sort of ad-hoc. I’ve changed the types several times already, and they still have a ways to go. I’m putting this out there now for feedback from the community, and hopefully inspire some discussion about how other people are handling advanced search on their projects.