Lessons Learned Migrating my SAAS to Rails 8

 
New Rails features are represented by stones Photo by Pixabay


My side project, Abot for Slack, has been around for ~7 years, i.e., since Rails 5.1. It’s now yielding passive income while running almost entirely on autopilot. But I’m still keeping its dependencies up to date. The recent migration to Rails 8 was arguably the most impactful in the app’s lifetime. In this blog post, I’ll describe the features introduced in the newest version of the framework and how they affected my project.

Kamal vs Dokku

TL;DR initial impressions after migrating Abot from Dokku to Kamal: it’s better, apart from one major flaw.

Abot started as a weekend project deployed to Heroku free dynos. Later, I migrated it to AWS EC2 with the RDS PostgreSQL database using Dokku.

The initial Dokku setup, a long, long time ago


Dokku requires a bit more DevOps expertise than PaaS solutions, but after the initial setup, it offers a workflow almost identical to Heroku’s. I’ve used Dokku for over 4 years, and it provided a seamless deployment experience with very few glitches along the way. It’s one of the OSS projects that I regularly donate to.

But the times are changing, and Kamal is a fancy new tool that everyone is talking about.

Kamal’s great advantage is a declarative one-file config, providing an instant overview of the current state of the project. With Dokku, similar insight is scattered across several commands.

Another benefit is seamless deployment setup. Just point Kamal to a barebones VPS, and everything gets deployed via a single kamal setup command. Dokku requires manual installation on the server unless configured with a custom Ansible playbook or similar tool.

Yet another win for Kamal is support for multi-server architectures. With Dokku, you’re limited to single-server deployments, so you have to fit all the services, including databases, caches, etc., on a single VPS.

One thing that’s missing from Kamal is built-in Nginx integration. Dokku provides a customizable Nginx template allowing for fine-tuned configuration of headers, SSL, compression, etc. With Kamal, you’d need to manually configure Nginx in front of the proxy to achieve the same effect. Apparently, the thruster gem can provide similar features to Nginx, but I have not explored it yet. In the end, I reimplemented each of the Nginx config tweaks directly in Rails or Cloudflare.

But here comes the previously mentioned major flaw.

Kamal registry-based deployments

Dokku uses a git-based deployment flow, i.e., each git push triggers a new Docker image build combining the base image with the app-specific Dockerfile. The new Docker image is built directly on the target VPS and kept locally.

Kamal requires non-optional registry.server (defaulting to "hub.docker.com"), registry.username, and registry.password config values. These are credentials for one of the commercial registries (Docker Hub, GitHub, DigitalOcean) or a custom Docker registry service. It’s possible to spin up your own registry, but it’s cumbersome. Built images are pushed to the registry and later downloaded by the target host.
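For illustration, here’s roughly what the relevant part of config/deploy.yml looks like (the service and image names are placeholders; the password is read from the environment via Kamal’s secrets mechanism rather than stored in the file):

```yaml
# config/deploy.yml (sketch; names are placeholders)
service: my-app
image: my-dockerhub-user/my-app   # pushed to the registry on every deploy

registry:
  # server defaults to Docker Hub when omitted
  username: my-dockerhub-user
  password:
    - KAMAL_REGISTRY_PASSWORD     # resolved from the environment
```

If the my-dockerhub-user/my-app repository does not already exist as private, Docker Hub will create it as public on the first push.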

An issue is that the default Kamal configuration exposes the Rails application source code in a public image hosted on Docker Hub. In theory, it could leak nonpublic projects if someone followed the Kamal tutorial without analyzing each step. I wish the Kamal team were more explicit about this behavior in the docs and learning materials.

A way to prevent exposing project source code is to manually create the private image up front or to change the default visibility in the Docker Hub settings:

Docker Hub image privacy settings


A free Docker Hub account can host only a single private image. Even the $11/month plan offers only 5GB of storage, so with image sizes at ~1GB, it might not be enough. The $16/month plan with 50GB of space for private images could be necessary for many users. It might not seem expensive. But a choice between exposing source code and incurring monthly costs could be discouraging for someone just exploring the Rails stack.

Fortunately, there are rumors that the Kamal team is working on a way to deploy containers without relying on 3rd-party commercial services. In my opinion, this change would significantly enhance Kamal’s appeal.

Goodbye Redis

Since I started working with Rails around a decade ago, the Redis + Sidekiq combo has been the de facto standard for most projects. Rails 8 challenges this status quo by defaulting to the solid_queue gem for background processing. Solid Queue uses SQL database storage instead of Redis. Together with the mission_control-jobs UI, it’s a drop-in replacement for Sidekiq. It means that we can now get rid of another process and dependency to maintain.
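Switching is mostly a matter of pointing Active Job at the new adapter. A sketch based on the config that Rails 8 generates by default (the separate "queue" database name follows the standard generated database.yml):

```ruby
# config/environments/production.rb (sketch of the Rails 8 defaults)

# Route all Active Job jobs through Solid Queue instead of Sidekiq/Redis
config.active_job.queue_adapter = :solid_queue

# Solid Queue keeps its tables in a dedicated "queue" database
# defined in config/database.yml
config.solid_queue.connects_to = { database: { writing: :queue } }
```

Existing `MyJob.perform_later` call sites keep working unchanged; only the backing store moves from Redis to SQL.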

Caching was another area where I’ve seen widespread use of Redis. On a side note, reusing the same Redis instance for both Sidekiq and caching has always been a fun source of sneaky bugs when the two processes’ data interfere despite the namespaces…

(╯°□°)╯︵ ┻━┻

Past traumas aside, with the introduction of the solid_cache gem, modern Rails apps can remove Redis from their stack, simplifying the infrastructure. While conducting my Rails performance audits, I’ve seen projects where Redis was responsible for a significant portion of the monthly infra costs. Migrating to SQL-based solutions can result in a major cost reduction for many Rails projects. Also, as per the Rails 8 marketing: “you can now cache weeks instead of hours worth of data”.

Here’s a breakdown of the monthly AWS cost for ElastiCache Redis with a specified amount of RAM vs. the equivalent RDS database disk space:

Size     Redis RAM   RDS disk space
1.3GB    ~$23        ~$0.16
6.3GB    ~$110       ~$0.78
52GB     ~$900       ~$6.5
210GB    ~$3300      ~$26


Size values were chosen to match the available ElastiCache instance sizes


RDS incurs additional DB instance costs, but at a larger scale, they will be insignificant. A cache-specific database executes only simple queries over well-indexed tables, i.e., no joins or sequential scans. That’s why the caching database’s CPU/RAM can usually be much smaller than the primary’s. But workload and traffic patterns differ for each project, so please run your own benchmarks before making any production infra changes. More cost-aware projects can also squeeze both databases onto a single RDS instance.

I did the calculations for ElastiCache, so you could slightly reduce the cost by running Redis on EC2 instances yourself. But RAM will always be orders of magnitude more expensive than disk space.
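A quick back-of-the-envelope calculation using the smallest tier from the table above (the numbers are approximate AWS list prices from the table, not an authoritative quote) shows just how large the gap is:

```ruby
# Approximate monthly AWS costs from the table above (smallest tier)
redis_monthly = 23.0  # ElastiCache Redis with ~1.3GB RAM
rds_monthly   = 0.16  # 1.3GB of RDS disk space

# Both cover the same 1.3GB of cache data, so the ratio is direct
ratio = redis_monthly / rds_monthly
puts "RAM-backed cache is ~#{ratio.round}x more expensive than disk-backed"
# => RAM-backed cache is ~144x more expensive than disk-backed
```

Even accounting for the extra DB instance cost on the RDS side, the per-gigabyte difference stays enormous.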

I’ve seen comments suggesting that migrating to “non-memory” cache storage will result in a performance drop. So let’s see the numbers in action.

Measuring non-memory caching performance

I’ve run the following benchmark script against different Rails cache stores:

require "benchmark"

keys = []
json = File.read("sample.json")

Benchmark.bm do |x|
  x.report("cache") do
    2000.times do |i|
      # Write a new entry on every 5th iteration (~20% of operations)...
      if i % 5 == 0
        key = "test-#{i}"
        Rails.cache.write(key, json)
        keys << key
      end

      # ...and read a random previously written key on every iteration
      Rails.cache.read(keys.sample)
    end
  end
end

Please take this benchmark with a grain of salt. It’s a straightforward script that does not simulate how the cache store behaves under constant load or with larger datasets.

It distributes read and write operations in an 80/20 ratio. Here are the results for the different store types:

Store               Total time   Avg. time
FileStore EBS       20.0s        ~4ms
Redis/Memcache      20.1s        ~4ms
FileStore EFS       22.4s        ~4.5ms
SolidCache SQLite   24.4s        ~4.8ms
SolidCache PG       27.05s       ~5.4ms


For me, the main takeaway from the above table is that in the context of web applications, the caching layer is fast enough, regardless of the underlying store type. The amount of data that can be cached is vastly more important than sub-millisecond performance improvements.

Although the in-memory Redis/Memcache stores should be the fastest, they are on par with FileStore on EBS. The reason is probably the lack of networking overhead for local file storage.

Another interesting observation is that FileStore is consistently ~10-20% faster than SolidCache SQLite. It makes sense, as SQLite is a layer of abstraction on top of the file system, so it incurs additional overhead. I’m not 100% sure this metric will hold true for systems under production-like load. But if you want to squeeze out a few more microseconds, then the good ol’ ActiveSupport::Cache::FileStore instead of SQLite could be a better choice.
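For reference, switching to the file store is a one-liner (the cache directory shown is just the conventional default; adjust the path to taste):

```ruby
# config/environments/production.rb (sketch)
config.cache_store = :file_store, Rails.root.join("tmp", "cache")
```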

Please remember that AWS EBS volumes (even gp3) are notoriously slow. When running the benchmark locally on an MBP, I observed ~3x better performance for FileStore and SolidCache SQLite. So if you want to maximize SQLite and cache performance, choosing a VPS provider with high-speed NVMe disks is probably worth it.

While we’re on the topic of data stores, I would like to highlight the AWS EFS file system. It’s an EBS equivalent that can be shared between multiple EC2 instances. It’s ~10x more expensive than EBS, but using SQLite on EFS instead of PostgreSQL would mean saving money on database instances. It could be an interesting choice for projects using SQLite with a multi-server architecture or for leveraging a shared file-based caching layer. I’m currently working on a blog post covering more details on using EFS with Rails. You can follow or subscribe to get notified when it’s live.

Using a custom compressor for Rails cache

Another exciting feature introduced in Rails 8 is support for custom cache compressors. I’ve built the rails-brotli-cache gem around this idea, so it’s awesome to see it upstream. The feature has not made it into the Rails guides yet, but hopefully, this PR will get merged sooner or later. As of now, the compressor config is described only in the ActiveSupport::Cache::Store docs.

I tested a few compression algorithms when working on rails-brotli-cache (benchmark). Choosing “the best one” is a matter of tradeoffs between performance and compression ratio. But zstd-ruby with the level: 10 config is a quick and easy choice, offering speed and compression superior to both gzip and brotli.

lib/zstd_compressor.rb

require "zstd-ruby"

module ZSTDCompressor
  def self.deflate(payload)
    ::Zstd.compress(payload, level: 10)
  end

  def self.inflate(payload)
    # Zlib-compressed payloads start with the 0x78 magic byte, so cache
    # entries written before the switch to Zstd remain readable
    if payload.start_with?("\x78")
      Zlib.inflate(payload)
    else
      ::Zstd.decompress(payload)
    end
  end
end

config/production.rb

config.cache_store = :solid_cache_store, { compressor: ZSTDCompressor }

Applying this config should improve caching read/write performance by ~30% and the compression ratio by ~20%. If you want to use this feature, make sure to include:

config.load_defaults "7.1"

in config/application.rb. This change is backward compatible, so you can start using the new algorithm without clearing the current cache contents.
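The backward compatibility hinges on the magic-byte check in the inflate method above: payloads produced by Zlib (the previous default) start with the 0x78 byte, so old entries are routed through Zlib while new ones go through Zstd. A minimal stdlib-only illustration of that property:

```ruby
require "zlib"

# Zlib/deflate streams begin with the 0x78 magic byte; the compressor's
# inflate method relies on this to recognize pre-Zstd cache entries
legacy_payload = Zlib.deflate("cached value")

raise "unexpected header" unless legacy_payload.start_with?("\x78")
puts Zlib.inflate(legacy_payload)
# => cached value
```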

SQLite vs PostgreSQL

In Rails 8, SQLite becomes a first-class citizen. This talk by Stephen Margheim provides an in-depth look at the changes introduced and explains how SQLite’s support and performance have improved.

I briefly considered switching Abot’s primary database from PostgreSQL to SQLite. I’d love to cut the monthly RDS costs, but I’m not comfortable keeping business-critical data on the application server. “Cattle, not pets,” they said. I’m used to regularly nuking and reprovisioning EC2 instances, so the chances of accidentally wiping production data would skyrocket with SQLite.

Litestream is an awesome tool for continuously backing up SQLite. But I’d rather not bet the project’s existence on it working seamlessly.

Another disadvantage of SQLite vs. PG is the lack of advanced monitoring tools. Tools like PgHero, (shameless plug alert!) rails-pg-extras, or pg-locks-monitor provide detailed insights into what’s going on under the hood in PostgreSQL.

With SQLite going mainstream in the Rails world, hopefully, open source will soon catch up, and we’ll have more ways to analyze it. I’ve started building rails-sqlite-extras, but I don’t have much SQLite experience, so ideas on how to improve it are welcome.

For now, I’m happy to keep Abot’s Solid Queue and Solid Cache databases on SQLite and the primary database on PostgreSQL.

Summary

Rails 8 is a decent step forward. It makes things cheaper and faster, and developers more productive. There are some drawbacks, like Kamal’s mandatory registry, but I hope they will be sorted out soon.


