Easy to Miss Way to Irreversibly Lose AWS S3 Data

Two folders representing an AWS S3 bucket and its backup. Photo by Pixabay from Pexels.

Many web apps use AWS S3 buckets for storing static assets like PDFs and images. Unfortunately, the default config makes it just too easy to irreversibly lose all the data. For many projects, that would probably mean the whole team can pack their bags and go home. Read on if you want to find out how S3 production data can be accidentally or maliciously obliterated. We’ll also learn how to safeguard your project from these tragic scenarios using replication backups to a secondary AWS bastion account.

How to lose your S3 production data

Let’s start by discussing how fragile the existence of an AWS S3 bucket is.

The web console UI provides decent protection from accidental removal of a bucket:

S3 bucket removal confirmation UI

But the AWS CLI makes it possible to run a single bash command that COMPLETELY and IRREVERSIBLY removes the bucket with all of its contents.

You can lose all the production data without the redundant backup on S3

Please dig through the docs if you want to find this command. I don’t want my blog to be a source of a fat-fingered copy-paste disaster. Accidentally or maliciously executing it would cause all your data to disappear. It might seem improbable, but still, I would rather not bet my startup’s existence on a single faulty bash line. You manage your S3 data yourself, so I doubt that AWS support would be able to restore it.

Please let that sink in. EVERYONE in your company with AWS admin access can IRREVERSIBLY remove all your production assets. So every developer, support agent, or external consultant with admin access is a single point of failure that can destroy your startup. Or an attacker who gained access to your account could remove the data and only restore it after a hefty Bitcoin donation.

“Why won’t you JUST use correct IAM security policies?”

I’ve covered this topic before in the context of redundant PostgreSQL database backups, and I’ve received comments like:

“It’s not a problem unless you grant untrustworthy actors admin access.”

“It is your fault for not following the principle of least privilege for IAM policies.”

But even with more experience, it’s not straightforward to fine-tune IAM policies for every single developer and application process, especially when you’re working on more obscure parts of the AWS stack. Unless you’re hiring a full-time AWS security specialist, it’s understandable that granting admin access to every developer is a sane default.

You can check my other blog post for more details about configuring safe IAM policies for S3 buckets. But even the best IAM policies will not change the fact that a single production bucket is a single point of failure for your project.

Let’s now discuss how to add cross-account replication to significantly improve the security of your data.

Replicating S3 data to your AWS bastion account

By “bastion” AWS account, I mean a separate account that’s used only for secondary backups and that only the necessary users can access. It only holds a few S3 buckets and is not used for development work. Its configuration rarely changes, so the chances of introducing a security loophole are minimal.

After creating a new AWS account, make sure to configure 2FA and remove the root user’s access keys.
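If you prefer to double-check this from a terminal, the sketch below reads the account summary and warns when the root user lacks MFA or still has access keys. It assumes the AWS CLI is configured with credentials for the bastion account; the function names are hypothetical helpers, not part of the AWS CLI.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Interpret two values taken from the IAM account summary:
#   $1 = AccountMFAEnabled        (1 when the root user has an MFA device)
#   $2 = AccountAccessKeysPresent (1 when the root user has access keys)
check_root_hardening() {
  local mfa_enabled="$1" root_keys_present="$2" status=0
  if [ "$mfa_enabled" != "1" ]; then
    echo "WARNING: root user has no MFA device"
    status=1
  fi
  if [ "$root_keys_present" != "0" ]; then
    echo "WARNING: root user still has access keys"
    status=1
  fi
  if [ "$status" -eq 0 ]; then
    echo "OK: root MFA enabled, no root access keys"
  fi
  return "$status"
}

# Fetch the summary of the account that the current credentials belong to.
audit_bastion_root() {
  local summary mfa_enabled root_keys_present
  summary="$(aws iam get-account-summary \
    --query 'SummaryMap.[AccountMFAEnabled, AccountAccessKeysPresent]' \
    --output text)"
  # The text output holds the two numbers separated by whitespace.
  read -r mfa_enabled root_keys_present <<< "$summary"
  check_root_hardening "$mfa_enabled" "$root_keys_present"
}
```

Calling audit_bastion_root prints either a single OK line or one warning per problem found.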

We’ll now create a new S3 bucket in the AWS bastion account. This bucket will keep all the data from your primary account production bucket. While in your new AWS account console, go to S3 > Create bucket.

In Bucket settings for Block Public Access, leave the default option Block all public access. Next, enable Bucket versioning and Default encryption with the Amazon S3 key (SSE-S3). Now you can press Create bucket.
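For reference, the same bucket setup could be scripted with the AWS CLI roughly as follows. This is a sketch that assumes credentials for the bastion account; create_backup_bucket is a hypothetical helper, and the bucket name and region in the usage note are placeholders.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Create a private, versioned, SSE-S3 encrypted bucket.
#   $1 = bucket name, $2 = AWS region
create_backup_bucket() {
  local bucket="$1" region="$2"

  # Regions other than us-east-1 need an explicit LocationConstraint.
  if [ "$region" = "us-east-1" ]; then
    aws s3api create-bucket --bucket "$bucket" --region "$region"
  else
    aws s3api create-bucket --bucket "$bucket" --region "$region" \
      --create-bucket-configuration "LocationConstraint=$region"
  fi

  # Equivalent of the "Block all public access" console option.
  aws s3api put-public-access-block --bucket "$bucket" \
    --public-access-block-configuration \
    "BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true"

  # Versioning is required on a replication destination.
  aws s3api put-bucket-versioning --bucket "$bucket" \
    --versioning-configuration Status=Enabled

  # SSE-S3 corresponds to the AES256 algorithm in the API.
  aws s3api put-bucket-encryption --bucket "$bucket" \
    --server-side-encryption-configuration \
    '{"Rules":[{"ApplyServerSideEncryptionByDefault":{"SSEAlgorithm":"AES256"}}]}'
}
```

A call such as create_backup_bucket my-app-backups eu-west-1 would then run all four steps in order.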

We’ll also want to add a so-called lifecycle rule to our backup bucket. Objects marked as deleted in the origin bucket should be hard-deleted in the backup bucket after a configured delay. This setup prevents the size of your data from growing indefinitely and allows app users to delete their assets permanently. It might be critical if you care about conforming to compliance rules like GDPR.

In the Management tab of our backup bucket, select Create lifecycle rule. Give the rule any meaningful name. Next, tick and confirm the Apply to all objects in the bucket checkbox. Finally, in the Lifecycle rule actions, select Permanently delete previous versions of objects and enter the number of days that works for your use case.

S3 lifecycle rule configuration

S3 lifecycle rule final confirmation

S3 lifecycle configuration
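The lifecycle rule above could also be expressed as a JSON document and applied with the AWS CLI. The sketch below is one possible shape of that configuration, not an export of what the console generates; the helper names, the rule ID, and the file name are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Write a lifecycle configuration that permanently deletes previous
# (noncurrent) object versions after a given number of days.
#   $1 = days to keep previous versions, $2 = output file
write_lifecycle_config() {
  local days="$1" out_file="$2"
  case "$days" in
    ''|*[!0-9]*) echo "days must be a positive integer" >&2; return 1 ;;
  esac
  cat > "$out_file" <<EOF
{
  "Rules": [
    {
      "ID": "expire-previous-versions",
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "NoncurrentVersionExpiration": { "NoncurrentDays": $days }
    }
  ]
}
EOF
}

# Apply the configuration to the backup bucket.
#   $1 = bucket name, $2 = configuration file
apply_lifecycle_config() {
  local bucket="$1" config_file="$2"
  aws s3api put-bucket-lifecycle-configuration --bucket "$bucket" \
    --lifecycle-configuration "file://$config_file"
}
```

For example, write_lifecycle_config 30 lifecycle.json followed by apply_lifecycle_config my-app-backups lifecycle.json would keep previous versions for 30 days. The empty prefix makes the rule apply to all objects in the bucket.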

Configuring cross-account replication

Now in your PRIMARY AWS account, go to S3 > your-production-bucket > Management. In Replication Rules, press Create replication rule. Give it any name, select Create new role for IAM role and choose Enabled status.

In the Source bucket panel, choose Applies to all objects in the bucket for a Rule scope.

For Destination, select Bucket in another account. Input account ID and bucket name from your SECONDARY AWS account.

Tick the Change object ownership to destination bucket owner checkbox. Also, enable Delete marker replication. Thanks to this setting, deleting an object in the source bucket also places a delete marker in the backup bucket. The backed-up copy then becomes a previous version, and the lifecycle rule that we’ve configured before will eventually hard-delete it.

S3 replication configuration
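As a rough CLI counterpart of the console steps above, the replication rule could be written as JSON and applied with put-bucket-replication. This sketch makes a few assumptions: unlike the console, the CLI does not create the IAM role for you, so the role ARN must point to an existing replication role, and the source bucket must already have versioning enabled. All helper names, ARNs, and account IDs are placeholders.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Write a replication configuration for a cross-account destination.
#   $1 = ARN of an existing replication role in the PRIMARY account
#   $2 = destination bucket name, $3 = SECONDARY account ID, $4 = output file
write_replication_config() {
  local role_arn="$1" dest_bucket="$2" dest_account_id="$3" out_file="$4"
  cat > "$out_file" <<EOF
{
  "Role": "$role_arn",
  "Rules": [
    {
      "ID": "backup-to-bastion",
      "Status": "Enabled",
      "Priority": 1,
      "Filter": { "Prefix": "" },
      "DeleteMarkerReplication": { "Status": "Enabled" },
      "Destination": {
        "Bucket": "arn:aws:s3:::$dest_bucket",
        "Account": "$dest_account_id",
        "AccessControlTranslation": { "Owner": "Destination" }
      }
    }
  ]
}
EOF
}

# Apply the configuration to the production bucket (PRIMARY account).
#   $1 = source bucket name, $2 = configuration file
apply_replication_config() {
  local source_bucket="$1" config_file="$2"
  aws s3api put-bucket-replication --bucket "$source_bucket" \
    --replication-configuration "file://$config_file"
}
```

The AccessControlTranslation block corresponds to the ownership checkbox, and DeleteMarkerReplication to the delete marker setting described above.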

Now, in your SECONDARY AWS account, go to S3 > replication-target-bucket > Management. In Replication Rules, press Actions > Receive replicated objects. Input the account number of your PRIMARY AWS account and click Generate policies. Expand View bucket policy and tick the Include permission to change object ownership to destination bucket owner checkbox. Now press Apply settings. You don’t have to apply the KMS-related policy.
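For orientation, a bucket policy granting these replication permissions might look roughly like the sketch below. It is an approximation and not a copy of the policy that the console generates. The principal can be the replication role or the root ARN of the primary account; the helper names and ARNs are placeholders.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Write a bucket policy that lets a principal from the PRIMARY account
# replicate objects into the backup bucket and hand over their ownership.
#   $1 = principal ARN, $2 = destination bucket name, $3 = output file
write_destination_policy() {
  local source_principal_arn="$1" dest_bucket="$2" out_file="$3"
  cat > "$out_file" <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowReplicatedObjects",
      "Effect": "Allow",
      "Principal": { "AWS": "$source_principal_arn" },
      "Action": [
        "s3:ReplicateObject",
        "s3:ReplicateDelete",
        "s3:ObjectOwnerOverrideToBucketOwner"
      ],
      "Resource": "arn:aws:s3:::$dest_bucket/*"
    },
    {
      "Sid": "AllowBucketVersioningChecks",
      "Effect": "Allow",
      "Principal": { "AWS": "$source_principal_arn" },
      "Action": ["s3:List*", "s3:GetBucketVersioning", "s3:PutBucketVersioning"],
      "Resource": "arn:aws:s3:::$dest_bucket"
    }
  ]
}
EOF
}

# Attach the policy (SECONDARY account credentials).
# Note: put-bucket-policy replaces any policy already on the bucket.
apply_destination_policy() {
  local dest_bucket="$1" policy_file="$2"
  aws s3api put-bucket-policy --bucket "$dest_bucket" \
    --policy "file://$policy_file"
}
```

Because put-bucket-policy overwrites the existing policy, any statements you add later have to be merged into the same document.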

That’s it. From now on, all objects uploaded to your primary S3 bucket will be automatically replicated to a bucket in your secondary account.

Synchronizing data between cross-account S3 buckets

The approach described below works for buckets that hold up to ~5 million objects or ~1 TB of data. If your bucket is larger, you should avoid using AWS CLI sync and check out S3 Batch Operations instead.

With the replication in place, all the new data will get copied over to the secondary bucket. However, to complete our setup, we still have to synchronize the data that was added to our source bucket before we enabled replication. The AWS CLI sync command is a perfect tool for that.

AWS provides excellent docs on copying objects across accounts, so I won’t be duplicating the same content here.

Just instead of the aws s3 cp command from the docs:

aws s3 cp s3://source-bucket/object.txt s3://destination-bucket/object.txt --acl bucket-owner-full-control

you have to use sync:

aws s3 sync s3://source-bucket s3://destination-bucket --acl bucket-owner-full-control

The AWS CLI is smart enough to only copy the objects that are missing from the destination bucket or that differ from the copy already there.

A quick overview of the whole process goes as follows:

In the source AWS account, provision an EC2 instance from an Amazon Linux AMI with the AWS CLI preinstalled
Still in the source account, create an IAM role with the necessary permissions and assign it to the EC2 instance
In the bastion account, assign a correct Bucket policy to the destination bucket to allow write-only access for an external IAM role
Run an AWS CLI sync command from the EC2 instance
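To make the second step more concrete, here is a sketch of a minimal inline policy for the IAM role used by the EC2 instance. It only allows listing and reading the source bucket, and listing and writing to the destination bucket, with no delete permissions. The helper names, the role name, and the policy name are hypothetical, and your setup might need a few more read actions, for example for tagged objects.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Write a least-privilege policy for the one-off sync.
#   $1 = source bucket, $2 = destination bucket, $3 = output file
write_sync_role_policy() {
  local source_bucket="$1" dest_bucket="$2" out_file="$3"
  cat > "$out_file" <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ListBothBuckets",
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::$source_bucket",
        "arn:aws:s3:::$dest_bucket"
      ]
    },
    {
      "Sid": "ReadSourceObjects",
      "Effect": "Allow",
      "Action": ["s3:GetObject"],
      "Resource": "arn:aws:s3:::$source_bucket/*"
    },
    {
      "Sid": "WriteDestinationObjects",
      "Effect": "Allow",
      "Action": ["s3:PutObject", "s3:PutObjectAcl"],
      "Resource": "arn:aws:s3:::$dest_bucket/*"
    }
  ]
}
EOF
}

# Attach the policy inline to an existing role in the source account.
#   $1 = role name, $2 = policy file
attach_sync_role_policy() {
  local role_name="$1" policy_file="$2"
  aws iam put-role-policy --role-name "$role_name" \
    --policy-name s3-backup-sync \
    --policy-document "file://$policy_file"
}
```

The s3:PutObjectAcl action is included because the sync command sets the bucket-owner-full-control ACL on every object it writes.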

If you have a lot of data, you can use the screen command to keep the process running in the background and leave it overnight. BTW, such migrations are a perfect occasion to accidentally corrupt or obliterate the data. Always make sure to use IAM roles with minimal permissions. If you’re running a data migration using AmazonS3FullAccess or AdministratorAccess policies, you’re a single fat-fingered typo away from disaster.
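One way to lower the risk of a typo is to preview the sync with the --dryrun flag before starting the real run in a detached screen session. The sketch below assumes screen is installed on the EC2 instance; the function names, the session name, and the log file name are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print what sync would copy without changing anything.
#   $1 = source bucket, $2 = destination bucket
preview_sync() {
  aws s3 sync "s3://$1" "s3://$2" --acl bucket-owner-full-control --dryrun
}

# Start the real sync in a detached screen session named "s3-sync"
# and keep a copy of its output in a log file.
#   $1 = source bucket, $2 = destination bucket, $3 = log file (optional)
start_sync_in_screen() {
  local source_bucket="$1" dest_bucket="$2" log_file="${3:-s3-sync.log}"
  screen -dmS s3-sync bash -c \
    "aws s3 sync 's3://$source_bucket' 's3://$dest_bucket' --acl bucket-owner-full-control 2>&1 | tee '$log_file'"
}
```

After reviewing the output of preview_sync, start_sync_in_screen launches the copy, and screen -r s3-sync reattaches to the session to check on progress.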

Once the synchronization is over, you can run the following commands to compare object count in source and target buckets:

aws s3 ls s3://source-bucket --recursive | wc -l
aws s3 ls s3://destination-bucket --recursive | wc -l

Just watch out for the potential cost of the API calls these commands make if your bucket holds millions of objects! Alternatively, you can wait one day for the web console Metrics to display the updated object count and total size.
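The count comparison can also be wrapped in a small script that reads the totals printed by the --summarize flag and reports a mismatch. This is a sketch with hypothetical function names; it makes the same listing API calls as the commands above, so the cost warning still applies.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print the number of objects in a bucket, taken from the
# "Total Objects:" line that --summarize appends to the listing.
#   $1 = bucket name
count_objects() {
  aws s3 ls "s3://$1" --recursive --summarize \
    | awk '/^ *Total Objects:/ {print $3}'
}

# Compare object counts and fail when they differ.
#   $1 = source bucket, $2 = destination bucket
compare_object_counts() {
  local source_count dest_count
  source_count="$(count_objects "$1")"
  dest_count="$(count_objects "$2")"
  if [ "$source_count" = "$dest_count" ]; then
    echo "OK: both buckets hold $source_count objects"
  else
    echo "MISMATCH: source=$source_count destination=$dest_count"
    return 1
  fi
}
```

Running compare_object_counts source-bucket destination-bucket exits with a non-zero status on a mismatch, which makes it easy to use in a larger migration script.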

And that’s it! Your production bucket is now safely backed up and replicating to a bastion account. So even if an attacker or intern messes up your primary data storage, you’ll still be able to restore your project.

Summary

I hope this blog post will help you secure your S3 data from random incidents. Secondary “panic” backups are a simple best practice that can save your company from disaster.