Many web apps use AWS S3 buckets for storing static assets like PDFs and images. Unfortunately, the default configuration makes it far too easy to irreversibly lose all of that data. For many projects, losing it would mean the whole team can pack their bags and go home. Read on if you want to find out how you can accidentally or maliciously obliterate S3 production data. We’ll also learn how to safeguard your project from these tragic scenarios using replication backups to a secondary AWS bastion account.
How to lose your S3 production data
Let’s start by discussing the fragility of an AWS S3 bucket existence.
The web console UI provides decent protection from accidental bucket removal: you have to type the bucket’s name to confirm deletion.
But, the AWS CLI makes it possible to run a single bash command which COMPLETELY and IRREVERSIBLY removes the bucket with all of its contents.
Please dig through the docs if you want to find it. I don’t want my blog to be a source of a fat-fingered copy-paste disaster. Accidentally or maliciously executing this command would cause all your data to disappear. It might seem improbable, but still, I would rather not bet my startup’s existence on a single faulty bash line. S3 is a self-managed service, so I doubt that AWS support would be able to restore it.
Please let that sink in. EVERYONE in your company with AWS admin access can IRREVERSIBLY remove all your production assets. So every developer, support agent, or external consultant with admin access is a single point of failure that can destroy your startup. Or an attacker who gained access to your account could remove the data and only restore it after a hefty Bitcoin donation.
“Why won’t you JUST use correct IAM security policies?”
I’ve covered this topic before in the context of redundant PostgreSQL database backups, and I’ve received comments like:
“It’s not a problem unless you grant untrustworthy actors admin access.”
“It is your fault for not following the principle of least privilege for IAM policies.”
But even with more experience, it’s not straightforward to fine-tune IAM policies for every single developer and application process, especially when you’re working on the more obscure parts of the AWS stack. Unless you’re hiring a full-time AWS security specialist, granting admin access to every developer is an understandable, if risky, default.
You can check my other blog post for more details about configuring safe IAM policies for S3 buckets. But even the best IAM policies will not change the fact that a single production bucket is a single point of failure for your project.
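To give a rough idea of what least privilege can look like in practice, here is a sketch of a scoped policy for an application process that only needs to read and upload objects. The bucket and policy names are placeholders, not values from any real setup:

```shell
# Hypothetical least-privilege policy for an app process.
# The process can read and write objects, but cannot delete
# objects or touch the bucket itself.
cat > app-s3-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject"],
      "Resource": "arn:aws:s3:::my-production-assets/*"
    }
  ]
}
EOF

aws iam create-policy \
  --policy-name AppS3Access \
  --policy-document file://app-s3-policy.json
```

Attach a policy like this to the application’s IAM user or role instead of `AdministratorAccess`, and a compromised app server can no longer wipe the bucket.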
Let’s now discuss how to add cross-account replication to significantly improve the security of your data.
Replicating S3 data to your AWS bastion account
By a “bastion” AWS account, I mean a separate account used only for secondary backups and accessible only to the users who strictly need it. It holds just a few S3 buckets and is not used for development work. Its configuration rarely changes, so the chance of introducing a security loophole is minimal.
We’ll now create a new S3 bucket in the AWS bastion account. This bucket will keep all the data from your primary account production bucket. While in your new AWS account console, go to S3 > Create bucket.
In Bucket settings for Block Public Access, leave the default option Block all public access. Next, enable the Bucket versioning and Default encryption with the Amazon S3 key (SSE-S3). Now you can press Create bucket.
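If you prefer scripting the setup over clicking through the console, roughly equivalent AWS CLI calls look like this. The bucket name and region are placeholders:

```shell
# Create the backup bucket in the bastion account
aws s3api create-bucket --bucket my-backup-bucket \
  --region eu-west-1 \
  --create-bucket-configuration LocationConstraint=eu-west-1

# Block all public access
aws s3api put-public-access-block --bucket my-backup-bucket \
  --public-access-block-configuration \
  BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true

# Enable versioning (required for replication)
aws s3api put-bucket-versioning --bucket my-backup-bucket \
  --versioning-configuration Status=Enabled

# Default encryption with the Amazon S3 managed key (SSE-S3)
aws s3api put-bucket-encryption --bucket my-backup-bucket \
  --server-side-encryption-configuration \
  '{"Rules":[{"ApplyServerSideEncryptionByDefault":{"SSEAlgorithm":"AES256"}}]}'
```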
We’ll also want to add a so-called lifecycle rule for our backups bucket. Objects marked as deleted in the origin bucket should be hard-deleted in the backup bucket after a configured delay. This setup prevents the size of your data from growing indefinitely and allows app users to delete their assets permanently. It might be critical if you care about conforming to compliance regulations like the GDPR.
In the Management tab of our backup bucket, select Create lifecycle rule. Give the rule any meaningful name. Next, tick and confirm the Apply to all objects in the bucket checkbox. Finally, in the Lifecycle rule actions, select Permanently delete previous versions of objects after a number of days that works for your use case.
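The same lifecycle rule can be sketched with the AWS CLI. The bucket name, rule ID, and the 30-day window are placeholders you should adjust:

```shell
# Permanently delete noncurrent object versions after 30 days,
# and clean up delete markers that no longer shadow any version
aws s3api put-bucket-lifecycle-configuration --bucket my-backup-bucket \
  --lifecycle-configuration '{
    "Rules": [{
      "ID": "expire-noncurrent-versions",
      "Status": "Enabled",
      "Filter": {},
      "NoncurrentVersionExpiration": { "NoncurrentDays": 30 },
      "Expiration": { "ExpiredObjectDeleteMarker": true }
    }]
  }'
```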
Configuring cross-account replication
Now in your PRIMARY AWS account, go to S3 > your-production-bucket > Management. In Replication Rules, press Create replication rule. Give it any name, select Create new role for IAM role and choose Enabled status.
In the Source bucket panel, choose Applies to all objects in the bucket for a Rule scope.
For Destination, select Bucket in another account. Input account ID and bucket name from your SECONDARY AWS account.
Tick the Change object ownership to destination bucket owner checkbox. Also, check the Delete marker replication. Thanks to this setting, objects removed from the source will eventually be hard-deleted in the backups because of the lifecycle policy that we’ve configured earlier.
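For reference, the replication configuration that these console steps produce is roughly equivalent to the following CLI call. The account IDs, role name, and bucket names are placeholders:

```shell
# Replication rule on the PRIMARY account's production bucket.
# 111111111111 = primary account, 222222222222 = bastion account.
aws s3api put-bucket-replication --bucket my-production-assets \
  --replication-configuration '{
    "Role": "arn:aws:iam::111111111111:role/s3-replication-role",
    "Rules": [{
      "ID": "backup-to-bastion",
      "Status": "Enabled",
      "Priority": 1,
      "Filter": {},
      "DeleteMarkerReplication": { "Status": "Enabled" },
      "Destination": {
        "Bucket": "arn:aws:s3:::my-backup-bucket",
        "Account": "222222222222",
        "AccessControlTranslation": { "Owner": "Destination" }
      }
    }]
  }'
```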
Now in your SECONDARY AWS account go to S3 > replication-target-bucket > Management. In Replication Rules, press Actions > Receive replicated objects. Input the account number of your PRIMARY AWS account and click Generate policies. Expand View bucket policy and tick the Include permission to change object ownership to destination bucket owner checkbox. Now press Apply settings. You don’t have to apply KMS related policy.
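The policy that the console generates in this step grants the primary account’s replication role write access to the backup bucket. A sketch of what it looks like, with placeholder ARNs and IDs:

```shell
# Bucket policy on the bastion account's backup bucket, allowing the
# replication role from the primary account (111111111111) to write
# replicas and take over object ownership.
aws s3api put-bucket-policy --bucket my-backup-bucket --policy '{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "AllowReplicationFromPrimary",
    "Effect": "Allow",
    "Principal": {
      "AWS": "arn:aws:iam::111111111111:role/s3-replication-role"
    },
    "Action": [
      "s3:ReplicateObject",
      "s3:ReplicateDelete",
      "s3:ReplicateTags",
      "s3:ObjectOwnerOverrideToBucketOwner"
    ],
    "Resource": "arn:aws:s3:::my-backup-bucket/*"
  }]
}'
```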
That’s it. All objects uploaded to your primary S3 bucket will be automatically replicated to a bucket in your secondary account.
Synchronizing data between cross-account S3 buckets
The described approach works for buckets that hold up to ~5 million objects or ~1 TB of data. If your bucket has more objects, you should avoid the AWS CLI `sync` command and check out S3 Batch Operations instead.
With the replication in place, all new data will be copied over to the secondary bucket. However, to complete our setup, we still have to synchronize the data that was added to our source bucket before we enabled replication. The AWS CLI `sync` command is a perfect tool for that.
AWS provides excellent docs on copying objects cross-accounts, so I won’t be duplicating the same content here.
Just instead of the `aws s3 cp` command from the docs, you have to use `aws s3 sync`.
The AWS CLI is smart enough to copy only the objects that are missing from the destination bucket.
A quick overview of the whole process goes as follows:
- In the source AWS account, provision an EC2 instance from an Amazon Linux AMI with the AWS CLI preinstalled
- Still in the source account, create an IAM role with the necessary permissions and assign it to the EC2 instance
- In the bastion account, attach a bucket policy to the destination bucket that allows write-only access for the external IAM role
- Run the AWS CLI `sync` command from the EC2 instance
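The final step above boils down to a single command run from the EC2 instance. Bucket names and regions here are placeholders:

```shell
# One-off backfill of pre-replication objects; only missing
# objects are copied, so the command is safe to re-run.
aws s3 sync s3://my-production-assets s3://my-backup-bucket \
  --source-region us-east-1 \
  --region eu-west-1
```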
If you have a lot of data, you can use the `screen` command to background the process and leave it running overnight. BTW, such migrations are a perfect occasion to accidentally corrupt or obliterate data. Always make sure to use IAM roles with minimal permissions. If you’re running a data migration with an `AdministratorAccess` policy, you’re a single fat-fingered typo away from disaster.
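One way to background the sync with `screen` looks like this (bucket names are placeholders):

```shell
# Start the sync in a detached screen session so it survives
# an SSH disconnect
screen -dmS s3-sync \
  aws s3 sync s3://my-production-assets s3://my-backup-bucket

# Reattach later to check on progress
screen -r s3-sync
```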
Once the synchronization is over, you can run the following commands to compare object count in source and target buckets:
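One way to do this comparison, with placeholder bucket names, is to let `aws s3 ls` summarize each bucket:

```shell
# Prints "Total Objects" and "Total Size" for each bucket;
# beware: this lists every object, so it can be slow and incur
# API costs on huge buckets
aws s3 ls s3://my-production-assets --recursive --summarize | tail -2
aws s3 ls s3://my-backup-bucket --recursive --summarize | tail -2
```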
Just watch out for the potential API call costs of running these commands if your bucket holds millions of objects! Alternatively, you can wait a day for the web console Metrics tab to display the new object count and size data.
And that’s it! Your production bucket is now safely backed up and replicating to a bastion account. So even if an attacker or intern messes up your primary data storage, you’ll still be able to restore your project.
I hope this blog post will help you secure your S3 data from random incidents. Secondary “panic” backups are a simple best practice that can save your company from disaster.