In Defense of ORMs
Laurie Voss just posted an article that ORM is an anti-pattern in which he calls out ActiveRecord (although nothing about ActiveRecord is specifically cited) as doing more harm than good. Some of the flaws he mentions are real issues, but he hand-wavingly dismisses the benefits without any real analysis. Let’s examine the claims:
Some ORM layers will tell you that they “eliminate the need for SQL”. This is a promise I have yet to see delivered.
Which ORM says this? This is just a strawman set up by painting ORM projects as making ridiculous claims. There are quotes around the words, but if you plug them into Google, nothing on the first page is a claim from an ORM, they’re all about eliminating [MS] SQL Server or SQL experts.
Others will more realistically claim that they reduce the need to write SQL but allow you to use it when you need it. For simple models, and early in a project, this is definitely a benefit: you will get up and running faster with ORM, no doubt about it. However, you will be running in the wrong direction.
Now he brings it back to reality and admits you can get up and running faster with ORM, but then claims you’re running in the wrong direction, full stop. There’s no reasoning why you are running just in the wrong direction, just a blind assertion.
“Code Generation” is also mentioned as a benefit, but without going into much detail. “Efficiency is ‘good enough’”, is mentioned here as well but seems out of place; it’s a known tradeoff, not something to be touted as a benefit.
Now getting to the meat of the article.
The most obvious problem with ORM as an abstraction is that it does not adequately abstract away the implementation details.
This is because there is a fundamental parity mismatch between SQL and imperative code. It’s certainly valid that ORMs are one of the leakiest abstractions imaginable, and therefore they are poor abstractions on the continuum of abstraction quality. That’s a fundamental challenge of using SQL regardless of whether you are using an ORM or not. Writing your SQL by hand does nothing to address that, you still have to convert back and forth between imperative code and data structures to SQL and flat rows of data. The challenge of doing this well and creating a nice interface is the whole reason there are so many different ORMs and why they vary so widely in form and function. If you reject ORMs you’re really just saying that you have a better way of doing it, at least for the project at hand, which may very well be true, or you may wind up digging your very own anti-pattern.
The whole point of an abstraction is that it is supposed to simplify. An abstraction of SQL that requires you to understand SQL anyway is doubling the amount you need to learn: first you need to learn what the SQL you’re trying to run is, then you have to learn the API to get your ORM to write it for you.
The premise of ORM is not that it precludes the need to understand SQL, anyone telling you otherwise is selling snake oil. You’re not learning “twice as much”, the ORM is there to provide a standard way to get your data in and out of an SQL database. It certainly is quicker to learn the primitive SQL driver functions, but all the time and more you saved by not learning the ORM will be spent figuring out how to construct SQL queries and map row data to objects. Again, you might create a more optimized solution for your project, but just as likely you will run into dead-ends and require refactorings because you don’t have the experience of thousands of developer hours with widely varying use-cases behind your homegrown solution.
A defender of ORM will say that this is not true of every project, that not everyone needs to do complicated joins, that ORM is an “80/20” solution, where 80% of users need only 20% of the features of SQL, and that ORM can handle those. All I can say is that in my fifteen years of developing database-backed web applications that has not been true for me. Only at the very beginning of a project can you get away with no joins or naive joins. After that, you need to tune and consolidate queries. Even if 80% of users need only 30% of the features of SQL, then 100% of users have to break your abstraction to get the job done.
Well, in my 27 years of programming (that’s right, I first sat down in front of an AppleSoft BASIC prompt when I was 6), I’ve found that on 34% of projects, we need 58% of the features of 66% of the RDBMS engines available.
But seriously, if you work with ActiveRecord for any amount of time you realize that assertion about the percentage of SQL feature coverage is meaningless. There is no such percentage. I’m hard pressed to think of a feature of SQL that ActiveRecord does not support. It’s true that there are some higher level methods (associations for instance) that do more with less code, but even those can take additional options that tweak the SQL in a variety of ways. Even when you drop down to the more basic building block methods (
#group, etc) you still gain the benefits of the ORM in that you can chain them with each other and higher level methods and not have to worry about munging your own SQL strings.
If your project really does not need any relational data features, then ORM will work perfectly for you, but then you have a different problem: you’re using the wrong datastore.
This is continuing with the straw man that ORMs don’t do joins, but they do. The subsequent line of thought follows directly from this fallacy. Then he comes to:
On the the other hand, if your data is relational, then your object mapping will eventually break down.
This is true in one sense. No ORM can make all possible relations be first class citizens in the library. At some point you have to drop to a lower level. Depending what it is you may need to drop down to the raw data, or maybe an intermediate level. In any case, this is not invalidating the other thousands of queries that for which your app happily makes use of the high-level ORM features. The two can very happily co-exist.
Death by a thousand queries
This leads naturally to another problem of ORM: inefficiency. When you fetch an object, which of its properties (columns in the table) do you need? ORM can’t know, so it gets all of them (or it requires you to say, breaking the abstraction).
So the solution is to force yourself to specify the columns every time without exception? Again, baby right out with the bathwater; it’s no harder to specify in an ORM than it is in raw SQL. Also you may end up shooting yourself in the foot because sometimes RDBMS software has optimizations around
SELECT *, and therefore you hurt yourself by removing columns unless you really are shaving a lot of data.
Initially this is not a problem, but when you are fetching a thousand records at a time, fetching 30 columns when you only need 3 becomes a pernicious source of inefficiency.
It’s not pernicious. You just add
.select('col1,col2,col3') after you profiled the action and realized this was the bottleneck. Or you could trace the usage of every query, optimize the
SELECT clause for each and every case, and endlessly refactor as your application and interface grows and changes. You could also write the whole thing in binary and I bet it would be pretty easy to squeeze out twice the performance in 100x the development time.
Many ORM layers are also notably bad at deducing joins, and will fall back to dozens of individual queries for related objects.
On the other hand if you are joining several relations to a base table, this is exactly what you want to avoid cartesian product explosion. Contrary to the implication, ORM developers do profile and optimize their code. You can often do a better for a specific case, but it’s far from being necessary every time.
The problem, I have discovered with experience, is that there is seldom a single “magic bullet” query that needs to be optimized: the death of database-backed applications is not the efficiency of any one query, but the number of queries. ORM’s lack of context-sensitivity means that it cannot consolidate queries, and must fall back on caching and other mechanisms to attempt to compensate.
The use of “magic bullet” here frames the idea of selective optimization in a negative light, because all programmers know there are no magic bullets. Unfortunately it’s a poor use of metaphor because a bullet refers to a solution not a problem. A more apt platitude would be that premature optimization is always bad. There is more likely more than one problem query in a large application, but there are also likely hundreds of inconsequential boilerplate queries needing no more fine-tuning than an index or two.
In any case, it would be disengenuous to say that ORMs don’t come with performance overhead, but it’s a measured tradeoff. There’s no cliff-dropping performance hit as indicated by the last vaguely damning sentence about using caching to “compensate”, whatever that means.
Solutions: Use Objects
If your data is objects, stop using a relational database. The programming world is currently awash with key-value stores that will allow you to hold elegant, self-contained data structures in huge quantities and access them at lightning speed.
Okay this is a bit offtopic for a rant like this, but I can’t let it slide. Your data “is” not either objects or relational. The choice of data structures and data stores is an abstract representation of concepts that can be interpretted many different ways. It really bothers me whenever someone offhandedly says that there are many types of data that aren’t relational. Quite frankly, I think that’s horseshit. Let me go on record that I think that there is no useful data that can not be made more useful through solid relational modeling as opposed to some object DB dump.
With a relational model you have a far more powerful querying paradigm than you have on any object data store. By its nature, a relational model is meant to slice and dice data in unforeseen ways. Of course, this flexibility comes with a price. The reason for all these new high performance data stores is to commit to a less-flexible data model that allows you to break out of the performance limitations of a strict ACID RDBMS.
The reason people start with a relational database is because it’s the most flexible way for your data to sit. You can pivot your startup to completely different purposes using the same data and all you need to do is change a few indexes around to support your new query structure. When you use some kind of document or object-oriented store, you are making a much firmer assertion about exactly how this data will be used, and what kind of queries and reports you intend to run from it.
The fact is that most applications will never outgrow the capability of a single moderately powered relational DB server. If and when you do, you will have scaling issues that don’t magically go away just because you chose the high-performance datastore dujour. You’ll have a lot of architecting and optimization to do regardless of the path you take eliminating bottlenecks, so I think it’s entirely justified to slap in an SQL database on day one with the realization that you may have to change it later.
Solutions: Use SQL in the Model
However, remember that the job of your model layer is not to represent objects but to answer questions. Provide an API that answers the questions your application has, as simply and efficiently as possible.
I would add changing state to this, but fundamentally I agree with the philosophy. Given the fact that interacting with a relational database can be messy, I think it’s very smart if you can keep the logic entirely contained in the model, and therefore I could accept that keeping ActiveRecord primitives out of your controllers and views might be a good idea. In practice I only work this way for more complex types of “questions” that I’m asking the model, and Rails sort of blurs the lines by defining a bunch of methods, which could theoretically be replaced if you ripped out ActiveRecord and replaced your backend with something completely different.
OO is itself an abstraction, a beautiful and hugely flexible one, but relational data is one of its boundaries, and pretending objects can do something they can’t is the fundamental, root problem in all ORM.
This closing statement implies a false belief about the design criteria and intention of ORM systems. It seems like a case of misattributing the beliefs of newbies who wish SQL didn’t exist to ORM developers who clearly have a deep understanding of SQL. In reality ORMs are mostly designed for experts. I haven’t used a lot of other ORMs extensively, so I can’t comment on the design criteria or optimal usage of something like SQLAlchemy, but for ActiveRecord and probably a lot of ORMs there are some clear wins:
- Automatically map database types to the equivalent programming language types and vice-versa. If you’re managing your own SQL then you’re going to waste a lot of time figuring this out. It’s one place where sensible defaults go a long way. Why debug the edge cases yourself?
- Greatly reduce error-prone SQL string munging. Assuming you have your pants on straight and don’t open up any injection holes, you still have to have complex string manipulation logic to compile conditions and joins, or else a bunch of very similar but different queries that are difficult to read and hard to refactor.
- Composability of queries. With a proper relational algebra engine, you can programmatically build up a query by chaining clause methods together. Doing so allows you to compose many complex queries and keep the code in the logical place without the design being driven by how you are organizing the SQL itself. An example would be if you have the concept of Authors who might have a “new” flag, and Posts that might have a “rant” flag. It’s clear that the definition of those flags belongs in the respective models, but what if you want to query both at once? In the case where you manually craft your SQL you have to put it somewhere, maybe in one model or the other, or somewhere else. You also have to decide if you are going to munge a bunch of conditions together in one query or split them out. Throw something like an advanced search form in the app and this becomes a minefield of niggling details.
- Simplify the simple stuff. There are tons of boilerplate queries in any sizable app. An ORM gives you a concise way to call them and get a standard default mapping of fields to attributes.
- Facilitate higher level patterns. The concepts of
has_manyin ActiveRecord are a simple encapsulation of very common patterns that come up again and again in relational databases. Through a series of simple options it allows you specify and execute a wide variety of queries for counting, instantiating, or loading multiple collections. They can be chained with the lower level functions, and always output correct and straightforward SQL. Sure they can’t do everything, but they certainly are capable of handling the majority of queries in a typical app quite elegantly.
The bottom line is that the concept of an ORM is a catchall for a range of projects that implement common patterns for using SQL in an object-oriented system. An ORM is nothing more than a codified set of patterns. Because of the nature of the relational model, there are many many different approaches one can take. To dismiss ORMs entirely is to dismiss SQL as well. You might claim that you don’t use an ORM in your code, but in effect you are just implementing your own idea of an ORM. If you are just peppering queries around your model then it might be a very lightweight ORM, but the fact is you are still Mapping Objects to Relations and vice-versa.