Heroku offers a robust backup system for its Postgres database plugin. Unfortunately, you can irreversibly lose all your data and backups just by typing a single command. It might seem improbable, but still, I would rather not bet my startup's existence on a single faulty bash line. In this tutorial, I will describe how to set up your own redundant Heroku PostgreSQL backup system on top of a secure AWS S3 bucket.
I will be covering various tools including the AWS CLI, OpenSSL, GPG, Heroku buildpacks, and Heroku Scheduler, but you don't need to be familiar with any of them. By following this guide, you will set up a reliable, custom backup system for a Heroku PostgreSQL database even if you don't have much DevOps experience up your sleeve.
Let’s get started!
How to lose all the Heroku Postgres data
Though the first paragraph is a bit "clickbaity", it's actually true: a single command can completely destroy a Heroku app, including all its add-ons, databases, and backup data.
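Here's a sketch of such a command (`your-app-name` is a placeholder used throughout this tutorial); the `--confirm` flag skips the interactive confirmation prompt, which is exactly what makes this one-liner so dangerous:

```bash
heroku apps:destroy --app your-app-name --confirm your-app-name
```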
Read on if you’d like to safeguard yourself against a potential total loss of your startup’s data.
Set up an encrypted AWS S3 bucket
Our redundant backup system will periodically upload encrypted snapshots of the PostgreSQL database to a secure AWS S3 bucket.
Let's start by adding a correctly configured S3 bucket. For a more in-depth tutorial on how to work with AWS S3 buckets, you can check out my other article.
Make sure to disable public access and enable encryption for your new S3 bucket:
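If you prefer the command line over the web console, here's a minimal sketch using the AWS CLI (the bucket name `heroku-secondary-backups` and region `eu-west-1` are assumptions, adjust them to your setup):

```bash
# Create the bucket (skip this step if you created it in the web console).
aws s3api create-bucket --bucket heroku-secondary-backups \
  --region eu-west-1 --create-bucket-configuration LocationConstraint=eu-west-1

# Block all public access to the bucket.
aws s3api put-public-access-block --bucket heroku-secondary-backups \
  --public-access-block-configuration \
  BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true

# Enable default server-side encryption (AES-256).
aws s3api put-bucket-encryption --bucket heroku-secondary-backups \
  --server-side-encryption-configuration \
  '{"Rules":[{"ApplyServerSideEncryptionByDefault":{"SSEAlgorithm":"AES256"}}]}'
```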
Add an IAM user
You will need Amazon AWS credentials to upload the backup dump to the S3 bucket. One common mistake is to use your primary account credentials instead of creating an IAM user with limited permissions.
Check out the official docs for info on how to add an IAM user with the correct permission policies. For the backup system to work, the IAM user needs the `AmazonS3FullAccess` policy.
Make sure to copy the generated `AWS Access Key ID` and `AWS Secret Access Key` because we'll need them later.
Let’s move on to writing the actual backup script:
Backup bash script
To avoid installing unnecessarily heavy dependencies on Heroku dynos, we sign the S3 upload request manually. GPG is used to encrypt the database dump after downloading it with the Heroku CLI.
First, add a buildpack to enable Heroku CLI access from within the script:
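A sketch using the Heroku CLI buildpack from the `heroku` GitHub organization:

```bash
heroku buildpacks:add https://github.com/heroku/heroku-buildpack-cli -a your-app-name
```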
For security reasons, you might want to consider forking the CLI buildpack repo and using the fork instead.
Now you need to set the S3 credentials you generated when adding the IAM user, together with your bucket name:
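A sketch; the config var names below are assumptions and just have to match whatever names the backup script reads:

```bash
heroku config:set AWS_ACCESS_KEY_ID=your-access-key-id -a your-app-name
heroku config:set AWS_SECRET_ACCESS_KEY=your-secret-access-key -a your-app-name
heroku config:set S3_BUCKET_NAME=heroku-secondary-backups -a your-app-name
```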
You also need to set a secure password that will be used to encrypt the database dump files before uploading them to S3. You can use OpenSSL for that:
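For example (the `BACKUP_PASSWORD` config var name is an assumption):

```bash
# Generate a random password...
openssl rand -base64 32
# ...and store it as a Heroku config var.
heroku config:set BACKUP_PASSWORD=generated-password-here -a your-app-name
```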
Just make sure to save this password somewhere safe, outside of the Heroku config. Otherwise, that destructive one-liner would wipe the password together with the app and prevent you from decrypting the secondary backup.
You must also set your Heroku app name because it will be used by the Heroku CLI to download the latest backup:
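A sketch (`APP_NAME` is again an assumed config var name):

```bash
heroku config:set APP_NAME=your-app-name -a your-app-name
```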
Now let's see the actual script.
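Below is a minimal sketch of what such a script can look like, assuming the config vars set above (`APP_NAME`, `BACKUP_PASSWORD`, `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `S3_BUCKET_NAME`), a bucket in `eu-west-1`, and a `bin/backup.sh` location. It downloads the latest backup with the Heroku CLI, encrypts it with GPG, and uploads it to S3 with a manually signed PUT request (AWS Signature Version 2):

```bash
#!/bin/bash
# bin/backup.sh - download the latest Heroku backup, encrypt it, upload it to S3.
set -eu

# Download the latest backup dump using the Heroku CLI buildpack.
curl -o backup.dump "$(heroku pg:backups:url -a "$APP_NAME")"

# Encrypt the dump with a symmetric GPG passphrase
# (--pinentry-mode loopback is needed for batch mode on GnuPG 2.1+).
gpg --batch --yes --pinentry-mode loopback --passphrase "$BACKUP_PASSWORD" \
  --symmetric --cipher-algo AES256 -o backup.dump.gpg backup.dump

# Sign the S3 PUT request manually (AWS Signature Version 2), so the dyno
# doesn't need the AWS CLI or other heavy dependencies.
FILE_NAME="heroku-backup-$(date -u +"%Y-%m-%d_%H.%M").gpg"
CONTENT_TYPE="application/octet-stream"
REQUEST_DATE="$(date -u -R)"
STRING_TO_SIGN="PUT\n\n${CONTENT_TYPE}\n${REQUEST_DATE}\n/${S3_BUCKET_NAME}/${FILE_NAME}"
SIGNATURE="$(printf "${STRING_TO_SIGN}" \
  | openssl sha1 -hmac "${AWS_SECRET_ACCESS_KEY}" -binary | base64)"

# Upload the encrypted dump to the S3 bucket.
curl -f -X PUT -T backup.dump.gpg \
  -H "Host: ${S3_BUCKET_NAME}.s3-eu-west-1.amazonaws.com" \
  -H "Date: ${REQUEST_DATE}" \
  -H "Content-Type: ${CONTENT_TYPE}" \
  -H "Authorization: AWS ${AWS_ACCESS_KEY_ID}:${SIGNATURE}" \
  "https://${S3_BUCKET_NAME}.s3-eu-west-1.amazonaws.com/${FILE_NAME}"
```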
Depending on your S3 bucket location, you might have to change `s3-eu-west-1` to a different region's endpoint.
Now make the file executable by typing:
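```bash
# Assuming the script lives at bin/backup.sh.
chmod +x bin/backup.sh
```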
Before you test it, make sure you have an up-to-date backup by running:
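```bash
heroku pg:backups:capture -a your-app-name
```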
You can try to run the script locally if you set all the correct shell variables. Just be careful if you are using `zsh` as your shell. I've had issues with generating the correct signature when working in `zsh` and had to switch to `bash` to make it work.
Now commit the changes to your repo and deploy them to Heroku. You can then test your script in the Heroku environment by typing:
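```bash
heroku run bin/backup.sh -a your-app-name
```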
After the script execution, you should see your encrypted backup file in the S3 bucket!
The script is a bit complex, so if you fail to upload the backup file to S3 you can try the following:
- use `set -eux` for more verbose script output
- if the S3 upload request returns a `SignatureDoesNotMatch` error, make sure that the signature string you generate is the same as the one expected by AWS (the error response body includes the `StringToSign` AWS expected)
- make sure that the date you are sending to S3 is in the UTC timezone
- leave a comment; it took me a while to make it work, happy to help here ;)
Use Heroku Scheduler for automatic script execution
You can now run the backup script manually; let's make it automatic. Heroku Scheduler is a cron-like tool for running Heroku jobs at predefined time intervals.
You can add it by typing:
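```bash
heroku addons:create scheduler:standard -a your-app-name
```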
and configure it so that the backup script runs once a day. The Scheduler is configured through a web dashboard, which you can open from the CLI:
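```bash
heroku addons:open scheduler -a your-app-name
```

In the dashboard, add a new job with the command `bin/backup.sh` and a daily frequency (e.g. at 02:00 UTC).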
To make sure that the newest database dump is stored daily, you can schedule the Heroku backup to take place just before the scheduler script execution:
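For example, if the scheduler job runs at 02:00 UTC, capturing an hour earlier leaves enough slack (the times are assumptions):

```bash
heroku pg:backups:schedule DATABASE_URL --at "01:00 UTC" -a your-app-name
```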
That's it. Now your Heroku PostgreSQL database will be backed up daily to your own secure S3 bucket. You might also want to consider adding a bucket lifecycle rule to remove older files and optimize storage costs.
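A sketch of such a rule using the AWS CLI (the 30-day expiry is an arbitrary assumption):

```bash
aws s3api put-bucket-lifecycle-configuration --bucket heroku-secondary-backups \
  --lifecycle-configuration '{"Rules":[{"ID":"expire-old-backups","Status":"Enabled",
  "Filter":{"Prefix":""},"Expiration":{"Days":30}}]}'
```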
How to restore Heroku S3 PostgreSQL backup
It is a good practice to double-check that your backups actually work and can be restored in case they are needed. Let me show you how to do it.
AWS CLI configuration
AWS CLI will be needed to restore the backup. You can install it locally by following this tutorial.
Now authenticate the AWS CLI by running:
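```bash
aws configure
```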
and inputting your IAM user's `AWS Access Key ID` and `AWS Secret Access Key`. You can just press ENTER when asked to provide the `Default region name` and `Default output format`.
When it's up and running, you can generate a short-lived download URL for your encrypted backup file. Let's assume that its S3 path is `s3://heroku-secondary-backups/heroku-backup-2019-06-25_01.30.gpg`. You can download it with the following command:
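```bash
# A sketch: presign a short-lived URL and download the file with curl.
curl -o backup.gpg "$(aws s3 presign \
  s3://heroku-secondary-backups/heroku-backup-2019-06-25_01.30.gpg --expires-in 600)"
```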
Once you have it on your local disk, you can decrypt it by running:
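```bash
# GPG will prompt for the backup password you generated earlier.
gpg -o backup.dump --decrypt backup.gpg
```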
Now you have to upload the decrypted version of the backup back to the S3 bucket, use it to restore the Heroku database, and remove it from the bucket right after it's been used. We will start by testing it out on a newly provisioned database add-on:
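A sketch; the attachment name `HEROKU_POSTGRESQL_PINK_URL` is an assumption, use whatever color name Heroku prints when the add-on is created:

```bash
# Upload the decrypted dump and presign a short-lived URL for Heroku to fetch it.
aws s3 cp backup.dump s3://heroku-secondary-backups/backup.dump
RESTORE_URL="$(aws s3 presign s3://heroku-secondary-backups/backup.dump --expires-in 600)"

# Provision a fresh database add-on and wait until it's ready.
heroku addons:create heroku-postgresql:hobby-dev -a your-app-name
heroku pg:wait -a your-app-name

# Restore the dump into the new add-on (not into DATABASE_URL).
heroku pg:backups:restore "$RESTORE_URL" HEROKU_POSTGRESQL_PINK_URL \
  -a your-app-name --confirm your-app-name

# Remove the decrypted dump from the bucket right after it's been used.
aws s3 rm s3://heroku-secondary-backups/backup.dump
```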
You can now check if the content of your database looks correct by logging into it and running some queries:
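```bash
# Log into the newly provisioned database (attachment name is an assumption).
heroku pg:psql HEROKU_POSTGRESQL_PINK -a your-app-name
# Then, inside psql, run e.g.:
#   select count(*) from users;
```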
If everything looks OK you can now restore the backup file to your production database:
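```bash
# Reuse (or regenerate, if it has expired) the presigned URL pointing at the dump.
heroku pg:backups:restore "$RESTORE_URL" DATABASE_URL -a your-app-name --confirm your-app-name
```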
Alternatively, you could promote the new database add-on as your new primary database:
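```bash
# Substitute the attachment name Heroku assigned to the new add-on.
heroku pg:promote HEROKU_POSTGRESQL_PINK_URL -a your-app-name
```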
I hope this blog post will help you secure your Heroku app data from random incidents. A secondary backup in your own secure S3 bucket is a best practice that every startup and non-trivial side project should follow.
I am not too deep into DevOps, so tips on how this tutorial could be improved are welcome.