Heroku offers a robust backup system for its Postgres database add-on. Unfortunately, you can irreversibly lose all your data and backups just by typing a single command. It might seem improbable, but still, I would rather not bet my startup’s existence on a single faulty bash line. In this tutorial, I will describe how to set up your own redundant Heroku PostgreSQL backup system that uploads to a secure AWS S3 bucket.
I will be covering various tools, including the AWS CLI, OpenSSL, GPG, Heroku buildpacks, and Heroku Scheduler, but you don’t need to be familiar with any of them. By following this guide, you will set up a reliable, custom backup system for your Heroku PostgreSQL database even if you don’t have much DevOps experience up your sleeve.
Let’s get started!
How to lose all the Heroku Postgres data
Though the first paragraph is a bit “clickbaity”, it’s actually true. Just by running heroku apps:destroy with its confirmation flag, you can completely destroy the Heroku app, including all its add-ons, databases, and backups.
Read on if you’d like to safeguard yourself against a potential total loss of your startup’s data.
Set up an encrypted AWS S3 bucket
Our redundant backup system will periodically upload encrypted snapshots of the PostgreSQL database to a secure AWS S3 bucket.
Let’s start with adding a correctly configured S3 bucket. For a more in-depth tutorial on how to work with AWS S3 buckets, you can check out my other article.
Make sure to disable public access and enable encryption for your new S3 bucket:
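If you prefer the terminal to the AWS console, the same setup can be sketched with the AWS CLI. This is only a minimal sketch under a few assumptions: the function name is hypothetical, the bucket name and region are placeholders, and it must be run with credentials that are allowed to create buckets.

```shell
# Hypothetical helper: creates a private bucket with default encryption.
# Usage: create_backup_bucket <bucket-name> <region>
create_backup_bucket() {
  local bucket="$1" region="$2"

  # This form is for regions other than us-east-1, where the region
  # has to be passed as a LocationConstraint.
  aws s3api create-bucket \
    --bucket "$bucket" \
    --region "$region" \
    --create-bucket-configuration "LocationConstraint=$region" || return 1

  # Block every form of public access to the bucket.
  aws s3api put-public-access-block \
    --bucket "$bucket" \
    --public-access-block-configuration \
    "BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true" || return 1

  # Encrypt new objects at rest with S3-managed keys (SSE-S3).
  aws s3api put-bucket-encryption \
    --bucket "$bucket" \
    --server-side-encryption-configuration \
    '{"Rules":[{"ApplyServerSideEncryptionByDefault":{"SSEAlgorithm":"AES256"}}]}'
}
```

For example, create_backup_bucket heroku-secondary-backups eu-west-1 would create the bucket used in the restore section of this post.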
Add an IAM user
You will need Amazon AWS credentials to upload the backup dump to the S3 bucket. One common mistake is to use your primary account credentials instead of creating an IAM user with limited permissions.
Check out the official docs for info on how to add the IAM user. For the backup system to work securely, you should use a custom policy for the IAM user. Check out my other tutorial for more in-depth info about configuring correct access rights on S3. To continue with the tutorial, you can add the following inline policy granting the user permissions for the single bucket only:
Make sure to copy both the AWS Access Key ID and the AWS Secret Access Key that are generated, because we’ll need them later.
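A single-bucket policy could look roughly like the sketch below. The exact set of actions is my assumption (upload, download, and delete for the restore flow, plus listing the bucket), and the output file name is hypothetical. The bucket name is the one used in the restore section of this post.

```shell
# Bucket name taken from the restore example later in this post.
BUCKET_NAME="heroku-secondary-backups"

# Write the inline policy to a local file so it can be pasted into the
# IAM console or attached to the user with the AWS CLI.
cat > backup-user-policy.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": ["arn:aws:s3:::${BUCKET_NAME}"]
    },
    {
      "Effect": "Allow",
      "Action": ["s3:PutObject", "s3:GetObject", "s3:DeleteObject"],
      "Resource": ["arn:aws:s3:::${BUCKET_NAME}/*"]
    }
  ]
}
EOF
```

If you never plan to restore with this user, you could likely narrow the second statement down to s3:PutObject only.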
Let’s move on to writing the actual backup script.
Backup bash script
We’ll need to install and configure the AWS CLI buildpack to upload our backup file to S3 from the Heroku dyno.
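A minimal sketch of that step, assuming the community-maintained heroku-community/awscli buildpack, which reads the standard AWS environment variables. The function name and the S3_BUCKET_NAME config var are hypothetical names introduced for this example.

```shell
# Hypothetical helper: installs the AWS CLI buildpack and stores the IAM
# user's credentials and the bucket name as config vars.
# Usage: setup_aws_cli <app> <access-key-id> <secret-key> <region> <bucket>
setup_aws_cli() {
  local app="$1" key_id="$2" secret="$3" region="$4" bucket="$5"

  heroku buildpacks:add heroku-community/awscli --app "$app" || return 1

  # S3_BUCKET_NAME is an assumed variable name, not an AWS convention.
  heroku config:set --app "$app" \
    AWS_ACCESS_KEY_ID="$key_id" \
    AWS_SECRET_ACCESS_KEY="$secret" \
    AWS_DEFAULT_REGION="$region" \
    S3_BUCKET_NAME="$bucket"
}
```

Call it with the IAM user's keys, not your primary account credentials.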
Now, add another buildpack to enable Heroku CLI access from within the script:
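This could be sketched as follows, assuming the heroku-community/cli buildpack. The Heroku CLI inside the dyno authenticates through the HEROKU_API_KEY variable; the function name is hypothetical.

```shell
# Hypothetical helper: installs the Heroku CLI buildpack and gives the
# dyno an API token to authenticate with.
# Usage: setup_heroku_cli <app>
setup_heroku_cli() {
  local app="$1" token

  heroku buildpacks:add heroku-community/cli --app "$app" || return 1

  # Reuse the token of the currently logged-in user.
  token="$(heroku auth:token)" || return 1
  heroku config:set --app "$app" HEROKU_API_KEY="$token"
}
```

The token printed by heroku auth:token expires eventually, so for an unattended job you may prefer a long-lived one created with heroku authorizations:create.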
You also need to set a secure password that will be used to encrypt the database dump files before uploading them to S3. You can use OpenSSL for that:
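One way to do it, as a sketch: generate the password locally and store it as a config var. PG_BACKUP_KEY and the function name are names I am assuming here.

```shell
# 32 random bytes, base64-encoded, give a 44-character passphrase.
PG_BACKUP_KEY="$(openssl rand -base64 32)"

# Hypothetical helper: stores the passphrase as a Heroku config var.
# Usage: save_backup_key <app>
save_backup_key() {
  heroku config:set --app "$1" PG_BACKUP_KEY="$PG_BACKUP_KEY"
}
```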
Just make sure to save this password somewhere safe outside of Heroku. Otherwise, that destructive one-liner will wipe it together with the app’s config and leave you unable to decrypt the secondary backup.
You must also set your Heroku app name as a config var, because the Heroku CLI will use it to download the latest backup.
Now let’s see the actual script:
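Below is a minimal sketch of such a script, written to a file with a heredoc so it can be committed to the repo. The path bin/pg_backup_to_s3 and the config var names APP_NAME, PG_BACKUP_KEY, and S3_BUCKET_NAME are assumptions of this sketch, and the gpg flags assume GnuPG 2.1 or newer.

```shell
mkdir -p bin

# The quoted 'EOF' keeps variables unexpanded until the script runs.
cat > bin/pg_backup_to_s3 <<'EOF'
#!/bin/bash
# Fail fast on errors, unset variables, and failed pipeline stages.
set -euo pipefail

# Required config vars (the names are assumptions of this sketch).
: "${APP_NAME:?APP_NAME is not set}"
: "${PG_BACKUP_KEY:?PG_BACKUP_KEY is not set}"
: "${S3_BUCKET_NAME:?S3_BUCKET_NAME is not set}"

DUMP_FILE="/tmp/pg_backup.dump"
ENCRYPTED_FILE="${DUMP_FILE}.gz.gpg"
BACKUP_FILE_NAME="heroku-backup-$(date '+%Y-%m-%d_%H.%M').gpg"

# Download the latest Heroku backup and compress it.
heroku pg:backups:download --output="$DUMP_FILE" --app "$APP_NAME"
gzip --force "$DUMP_FILE"

# Encrypt symmetrically; the passphrase is read from stdin so it does
# not show up in the process list.
gpg --batch --yes --pinentry-mode loopback --passphrase-fd 0 \
  --symmetric --cipher-algo AES256 \
  --output "$ENCRYPTED_FILE" "${DUMP_FILE}.gz" <<< "$PG_BACKUP_KEY"
rm "${DUMP_FILE}.gz"

# Upload the encrypted file, then remove the local copy.
aws s3 cp "$ENCRYPTED_FILE" "s3://${S3_BUCKET_NAME}/${BACKUP_FILE_NAME}"
rm "$ENCRYPTED_FILE"
EOF
```

This sketch takes the region from the AWS_DEFAULT_REGION config var instead of hard-coding an endpoint such as s3-eu-west-1 in the upload command.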
Depending on your S3 bucket location, you might have to change s3-eu-west-1 to another region.
Now make the file executable with chmod +x.
Before you test it, make sure you have an up-to-date backup by running heroku pg:backups:capture.
You can try to run the script locally, as long as you set all the required environment variables.
Now commit the changes to your repo and deploy them to Heroku. You can then test your script in the Heroku environment by running it with heroku run.
After the script finishes, you should see your encrypted backup file in the S3 bucket!
The script is a bit complex, so if it fails to upload the backup file to S3, you can try the following:
- add set -eux at the top of the script for more verbose output
- leave a comment; it took me a while to make it work, so I’m happy to help here ;)
Implementation without additional Heroku buildpacks
You can avoid the Heroku CLI buildpack by running pg_dump directly against the database:
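A minimal sketch of that variant, assuming the connection string is available in DATABASE_URL, as it is on Heroku dynos with an attached Postgres add-on. The function name is hypothetical; the call would replace the download step of the backup script.

```shell
# Hypothetical helper: dumps the database behind DATABASE_URL into a
# custom-format file that pg_restore can read.
# Usage: dump_database <output-file>
dump_database() {
  local output="$1"

  # --no-owner and --no-acl keep role names out of the dump, which
  # makes it easier to restore into a different database.
  pg_dump --format=custom --no-owner --no-acl \
    --file="$output" "${DATABASE_URL:?DATABASE_URL is not set}"
}
```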
You might run into compatibility issues if your database version is different from the one supported by the pg_dump currently installed on your Heroku dynos.
To upload to the S3 bucket without installing the AWS CLI, you could generate a one-time upload URL in the backend and pass it to the script. Later you could upload the file via cURL. Details on how to implement it are beyond the scope of this tutorial. You can find tips on client-side uploads with pre-signed URLs in my previous post.
Use Heroku scheduler for automatic script execution
You can now run the backup script manually, so let’s make it automatic. Heroku Scheduler is a cron-like tool for running Heroku jobs at predefined intervals.
You can add it with heroku addons:create scheduler:standard and then configure a job in its dashboard that runs the backup script once a day.
To make sure that the newest database dump will be stored daily you can schedule Heroku backup to take place just before the scheduler script execution:
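As a sketch, the daily Heroku backup can be scheduled from the CLI. The function name is hypothetical and the time is only an example; pick one shortly before your scheduler job.

```shell
# Hypothetical helper: schedules a daily Heroku backup of the primary
# database at the given time, e.g. '01:00 UTC'.
# Usage: schedule_heroku_backup <app> <time-with-timezone>
schedule_heroku_backup() {
  heroku pg:backups:schedule DATABASE_URL --at "$2" --app "$1"
}
```

With schedule_heroku_backup my-app '01:00 UTC', a scheduler job at 01:30 UTC would pick up a backup that is half an hour old. You can verify the schedule with heroku pg:backups:schedules.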
That’s it. Now your Heroku PostgreSQL will be backed up daily to your own secure S3 bucket. You might also want to consider adding a bucket lifecycle rule to remove the older files and optimize storage costs.
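Such a lifecycle rule could be sketched like this. The 30-day retention, the file name, and the function name are assumptions; the prefix matches backup files named heroku-backup-<date>.gpg, as in the restore example in this post.

```shell
# Expire backup objects 30 days after they were uploaded.
cat > backup-lifecycle.json <<'EOF'
{
  "Rules": [
    {
      "ID": "expire-old-backups",
      "Status": "Enabled",
      "Filter": { "Prefix": "heroku-backup-" },
      "Expiration": { "Days": 30 }
    }
  ]
}
EOF

# Hypothetical helper: applies the rule to the given bucket.
# Usage: apply_backup_lifecycle <bucket-name>
apply_backup_lifecycle() {
  aws s3api put-bucket-lifecycle-configuration \
    --bucket "$1" \
    --lifecycle-configuration file://backup-lifecycle.json
}
```

Note that the limited IAM user from earlier would not be allowed to change lifecycle settings, so run this with an admin profile.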
How to restore Heroku S3 PostgreSQL backup
It is a good practice to double check that your backups actually work and can be restored in case they are needed. Let me show you how to do it.
AWS CLI configuration
AWS CLI will be needed to restore the backup. You can install it locally by following this tutorial.
Now authenticate the AWS CLI by running aws configure and entering your IAM user’s AWS Access Key ID and AWS Secret Access Key. You can just press ENTER when asked to provide the Default region name and Default output format.
Once it’s up and running, you can generate a short-lived download URL for your encrypted backup file. Let’s assume that its S3 path is s3://heroku-secondary-backups/heroku-backup-2019-06-25_01.30.gpg. You can download it with the following command:
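A minimal sketch of the download, assuming curl is installed locally. The function name and the 5-minute expiry are my choices for this example.

```shell
# Hypothetical helper: downloads an S3 object through a pre-signed URL.
# Usage: download_backup <s3-path> <local-file>
download_backup() {
  local s3_path="$1" local_file="$2" url

  # The pre-signed URL stays valid for 5 minutes.
  url="$(aws s3 presign "$s3_path" --expires-in 300)" || return 1
  curl --fail --silent --show-error --output "$local_file" "$url"
}
```

For example: download_backup s3://heroku-secondary-backups/heroku-backup-2019-06-25_01.30.gpg backup.gpg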
Once you have it on your local disk, you can decrypt it by running:
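The decryption could be sketched as below. It assumes that the dump was gzipped before it was encrypted, that the passphrase is exported as PG_BACKUP_KEY in your shell, and that you are using GnuPG 2.1 or newer; the function name is hypothetical.

```shell
# Hypothetical helper: decrypts and decompresses a backup file.
# Usage: decrypt_backup <encrypted-file> <output-dump>
decrypt_backup() {
  local encrypted="$1" output="$2"

  # Read the passphrase from stdin to keep it out of the process list.
  gpg --batch --yes --pinentry-mode loopback --passphrase-fd 0 \
    --output "${output}.gz" --decrypt "$encrypted" \
    <<< "${PG_BACKUP_KEY:?PG_BACKUP_KEY is not set}" || return 1

  # Leaves the plain dump at <output-dump>.
  gunzip --force "${output}.gz"
}
```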
Now you have to upload the decrypted version of the backup back to the S3 bucket, use it to restore the Heroku database, and remove it from the bucket right after it has been used. We will start by testing it out on a newly provisioned database add-on:
You can now check whether the content of your database looks correct by logging in to it with heroku pg:psql and running some queries.
If everything looks OK, you can restore the backup file to your production database by passing the backup URL and DATABASE_URL to heroku pg:backups:restore.
Alternatively, you could promote the new database add-on to be your new primary database with heroku pg:promote.
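A minimal sketch of those three steps follows. The function name and the temporary object key are hypothetical, and the target is whatever config var Heroku printed when you provisioned the test database with heroku addons:create (something like HEROKU_POSTGRESQL_<COLOR>_URL).

```shell
# Hypothetical helper: restores a local dump into a Heroku database via
# a temporary, pre-signed S3 object.
# Usage: restore_from_s3 <app> <local-dump> <bucket> <target-database>
restore_from_s3() {
  local app="$1" dump="$2" bucket="$3" target="$4"
  local s3_path="s3://${bucket}/tmp-restore.dump" url status=0

  aws s3 cp "$dump" "$s3_path" || return 1

  # Heroku needs a URL it can fetch the dump from; 10 minutes here.
  url="$(aws s3 presign "$s3_path" --expires-in 600)" || status=1
  if [ "$status" -eq 0 ]; then
    heroku pg:backups:restore "$url" "$target" \
      --app "$app" --confirm "$app" || status=1
  fi

  # Always remove the plaintext dump from the bucket afterwards.
  aws s3 rm "$s3_path" || status=1
  return "$status"
}
```

The restore overwrites the target database, which is why it is worth pointing it at a fresh add-on first.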
I hope this blog post will help you secure your Heroku app data from random incidents. A secondary backup in your own secure S3 bucket is a best practice that every startup and non-trivial side project should follow.
I am not too much into DevOps, so tips on how this tutorial could be improved are welcome.