Join tables are first-class citizens in Ruby on Rails apps. Their job is to provide many-to-many relations between database models. Adding the correct PostgreSQL indexes on join tables is not obvious. I’ve noticed that some tutorials, Stack Overflow posts, and even Rails itself give incorrect advice on how to do it. In this tutorial, we’ll look into how compound PostgreSQL indexes work and how to use them correctly with join tables.
We will start by explaining the basics of join models and later deep dive into analyzing the output of the PostgreSQL EXPLAIN query plan with different database indexes.
What is a join table and model in Rails?
A many-to-many relation is often necessary to model the business logic of a web app. Let’s illustrate it with a flagship scenario: a user who has many favorite products.
If you already have Product and User models, you can join them using a join table by writing the following migration:
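Such a migration could look like this (a sketch; the table and column names are assumptions, and we deliberately skip indexes for now):

```ruby
class CreateUsersProducts < ActiveRecord::Migration[6.0]
  def change
    # Intentionally no indexes yet - we'll add them later.
    create_table :users_products do |t|
      t.bigint :user_id, null: false
      t.bigint :product_id, null: false

      t.timestamps
    end
  end
end
```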
Let’s look at the model files now:
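A minimal sketch of the three models could look like this (the explicit table_name is needed because a users_products table does not match the default user_products naming convention):

```ruby
class UserProduct < ApplicationRecord
  # The table name does not follow Rails conventions,
  # so it has to be set explicitly.
  self.table_name = "users_products"

  belongs_to :user
  belongs_to :product
end

class User < ApplicationRecord
  has_many :user_products
  has_many :products, through: :user_products
end

class Product < ApplicationRecord
  has_many :user_products
  has_many :users, through: :user_products
end
```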
With this structure in place, you can assign multiple users to products and vice versa:
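For example, in a Rails console session:

```ruby
user = User.first
product = Product.first

# Assigning from one side of the relation
# makes the object visible from the other side.
user.products << product
product.users.include?(user) # => true
```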
If you need more fine-tuned control, you can use the join model directly:
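For example, you can create or remove a single association explicitly:

```ruby
user = User.first
product = Product.first

# Create the join object directly...
UserProduct.create!(user: user, product: product)

# ...or remove one specific association without touching the others.
UserProduct.find_by(user: user, product: product)&.destroy!
```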
You could use the has_and_belongs_to_many relation with the create_join_table migration helper method and skip declaring the UserProduct model. I don’t recommend this approach. It might be quicker to start with, but it is much more constrained in the long run. E.g., create_join_table generates a table without a primary key, so you couldn’t easily add a customizable join model afterward.
How to add proper indexes to the join table in Rails?
You might have noticed that we did not add any indexes to the users_products table yet. For larger datasets, this setup could result in slow queries killing the performance of a web app. Let’s analyze how Postgres handles fetching objects through the join table without indexes.
Query EXPLAIN plan when indexes are missing
Unfortunately, ActiveRecord does not support EXPLAIN ANALYZE out of the box. You can use this gem if you’d like to measure the actual performance of the queries. In the examples below, we’ll use the built-in explain method, which displays the detailed PostgreSQL plan for executing the query.
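For example (hypothetical output; the exact plans depend on your data):

```ruby
UserProduct.where(user_id: 1).explain
# Seq Scan on users_products
#   Filter: (user_id = 1)

UserProduct.where(user_id: 1, product_id: 1).explain
# Seq Scan on users_products
#   Filter: ((user_id = 1) AND (product_id = 1))
```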
As you can see, in both cases Postgres performs a Seq Scan on the users_products table. A Seq Scan means that Postgres has to loop through all the rows of a table to run the query. For a larger dataset, it is usually terribly inefficient.
Let’s add indexes to see how it will change the behavior.
Query EXPLAIN plan with two compound indexes
We’ll start with the approach I often see mentioned in tutorials. Even Rails suggests adding two compound indexes when generating a migration using the create_join_table helper:
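One common version of this advice, written as a standalone migration (class and index names are assumptions):

```ruby
class AddIndexesToUsersProducts < ActiveRecord::Migration[6.0]
  # Required when using algorithm: :concurrently.
  disable_ddl_transaction!

  def change
    add_index :users_products, [:user_id, :product_id],
      unique: true, algorithm: :concurrently
    add_index :users_products, [:product_id, :user_id],
      unique: true, algorithm: :concurrently
  end
end
```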
Unless you have a non-standard use case, you should add unique indexes to validate join objects on the database level.
Using algorithm: :concurrently is a best practice that allows you to add indexes even to large tables without acquiring a full table lock. Otherwise, the migration could easily bring down your production database.
You can check out my other blog post if you want to learn more exciting tricks on how to crash your production servers.
Let’s start by replaying our previous queries:
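Hypothetical plans (index names are assumptions based on the two compound indexes):

```ruby
UserProduct.where(user_id: 1).explain
# Index Scan using index_users_products_on_user_id_and_product_id on users_products
#   Index Cond: (user_id = 1)

UserProduct.where(user_id: 1, product_id: 1).explain
# Index Scan using index_users_products_on_user_id_and_product_id on users_products
#   Index Cond: ((user_id = 1) AND (product_id = 1))
```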
As you can see, the Seq Scan has been replaced by an Index Scan. But if you analyze the Index Cond, you’ll see that only the second query uses both fields from the compound index.
Let’s see what will happen if we query the other side of the relation:
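For example (again, hypothetical plans):

```ruby
UserProduct.where(product_id: 1).explain
# Index Scan using index_users_products_on_product_id_and_user_id on users_products
#   Index Cond: (product_id = 1)

UserProduct.where(product_id: 1, user_id: 1).explain
# Index Scan using index_users_products_on_user_id_and_product_id on users_products
#   Index Cond: ((user_id = 1) AND (product_id = 1))
```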
The first query behaves as expected: its Index Cond comes from the index_users_products_on_product_id_and_user_id index. But if you look at the second output, you’ll notice that it is still using the same index as the query executed in the other direction!
It means that duplicating the compound indexes for join tables is unnecessary. Postgres is smart enough to use both keys of a compound index if they match the query conditions, regardless of the order of the columns in the query.
I’ve validated this behavior with PostgreSQL from version 9 up to 12.
Simpler SQL query
The above examples might be a bit convoluted. Here’s a more straightforward query you can use to replicate this behavior:
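Something along these lines should do (a sketch; run it in psql against a table that has both compound indexes in place):

```sql
EXPLAIN SELECT * FROM users_products
WHERE user_id = 1 AND product_id = 1;

EXPLAIN SELECT * FROM users_products
WHERE product_id = 1 AND user_id = 1;

-- Both plans reference the same compound index, e.g.:
-- Index Scan using index_users_products_on_user_id_and_product_id ...
```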
Both compound indexes were present when running those queries. One was working hard looking up the query results. The other one was sitting there, useless, doing nothing.
Usually, you can track unused indexes with tools that display the ones getting few hits, e.g., Rails PG Extras. Unfortunately, detecting a partially used compound index is not that simple with automated tooling.
Correct indexes migration for join tables
To optimize the usage of database resources, the migration adding a join table to your Rails app could look like this:
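For example (a sketch; class and index names are assumptions):

```ruby
class AddIndexesToUsersProducts < ActiveRecord::Migration[6.0]
  # Required when using algorithm: :concurrently.
  disable_ddl_transaction!

  def change
    # One unique compound index validates join objects on the
    # database level and covers queries starting from the product side.
    add_index :users_products, [:product_id, :user_id],
      unique: true, algorithm: :concurrently

    # One single-column index covers queries starting from the user side.
    add_index :users_products, :user_id, algorithm: :concurrently
  end
end
```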
With those indexes in place, accessing either side of the relation will generate the following EXPLAIN query plan output:
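Hypothetical plans with the single compound index plus the single-column index:

```ruby
UserProduct.where(user_id: 1).explain
# Index Scan using index_users_products_on_user_id on users_products
#   Index Cond: (user_id = 1)

UserProduct.where(product_id: 1).explain
# Index Scan using index_users_products_on_product_id_and_user_id on users_products
#   Index Cond: (product_id = 1)
```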
Even with only a single compound index in place, none of the queries has to resort to performing an inefficient Seq Scan operation. The single-column index index_users_products_on_user_id handles finding products assigned to users.
The product_id column from the compound index index_users_products_on_product_id_and_user_id takes care of finding users assigned to a product. The same compound index also serves queries that filter on both user_id and product_id, regardless of the direction of the query.
Keeping unnecessary, bloated, or unused indexes around not only uses up the disk space of a database but also slows down its write performance.

This might seem like a micro-optimization hardly worth the effort. Like many PostgreSQL performance tips, it only makes a real difference once the dataset starts to grow. The best approach is to follow good practices from the start when designing your database schema.