How to use Amazon S3 to serve your assets (and set policies for teams)

Written by Christophe Limpalair on 08/29/2015

Now that we've created an EC2 instance, let's set up S3 for file storage. These can be static or dynamic files.

With something like AWS S3, we can store terabytes of files in one central location for all of our applications without worrying about latency, security, or reliability. Not only does this drastically reduce complexity, but it also removes the need to worry about server failure and data loss. AWS takes care of user credentials, roles, and permissions, and you don't have to worry about creating redundancy or many other intricacies involved in rolling out your own backup infrastructure.

This is far from the only scenario S3 can handle. It can host application files (software upgrades, for example), serve video or audio content, and be used to deploy applications to your servers.

This guide will use the online interface to create a bucket, set policies and permissions for multiple contributors, explain Versioning and Lifecycle, and walk you through uploading a file and setting caching metadata.

This article is aimed more at people who haven't used S3 yet or who want to learn how to set metadata on their files (caching, for example).

Buckets

AWS Main Dashboard

Let's first create a bucket for our applications. Select S3 in your main dashboard and click on Create Bucket.

AWS Create Bucket Screen

A popup will ask you to choose a bucket name and Region. Be aware that bucket names must be unique across all existing buckets in Amazon S3. This name will also be used in the URL that points to your data (referred to as Objects), so it needs to be DNS compliant. Read more about naming restrictions here. This is quite important because bucket names cannot be changed.

I'm going to name mine scaleyourcode.

When choosing a region, it usually makes sense to pick the one closest to your customer base to reduce latency and data transfer costs (and perhaps for regulatory compliance). If you're using a free 1-year account, you shouldn't get charged for any of this unless you get a lot of traffic. As of right now, the free tier includes up to 5 GB of Standard Storage, 20,000 GET requests, and 2,000 PUT requests.
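If you'd rather script this step than click through the console, here is a minimal sketch using the boto3 Python SDK. It assumes your AWS credentials are already configured; the bucket name is the one from this article, and the region is just an example.

import boto3

# Create the bucket in a specific region. For US Standard (us-east-1),
# omit the CreateBucketConfiguration argument entirely.
s3 = boto3.client("s3", region_name="eu-west-1")
s3.create_bucket(
    Bucket="scaleyourcode",
    CreateBucketConfiguration={"LocationConstraint": "eu-west-1"},
)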

Keep in mind that you could also use CloudFront or CloudFlare as a CDN to deliver S3 content if you end up having customers in different locations.

Another option lets us set up logging. More on that here if you are interested.

AWS Bucket Options

Now that we have our bucket, you'll see a list of options on the right side. These options can be quite useful depending on your needs. The first one I'm going to cover is Permissions.


Permissions and Policies
By default, all resources are private and only accessible by the AWS account that created them. Obviously sometimes we need more people to have access to our resources, and we can finely tune who can do what.

If you read my previous post on setting up an EC2 machine, then you're already familiar with IAM users and roles. We're going to use a similar approach for S3, so if you'd like to set up another account to access S3, please follow the IAM directions in that article.

To better explain the concept of allowing others who work with you to manage resources on S3, let me sidestep just for a minute and explain the different types of resources.

When looking at S3, there are two different resource types: Objects and Buckets. Objects belong to Buckets.

I think of Buckets in terms of applications. For example, I have a Bucket named scaleyourcode, and inside it I have multiple directories with different permissions. You could separate this in various ways. In my case, I have backups, deploy, and public directories. Inside the public directory you can have images, videos, files, and so on. Simple, right?

The reason for this sidestep is that I want to show you the logic behind figuring out permissions. If you think of Buckets in terms of applications, then you may have different people in charge of different applications (or Buckets). You may also have someone who can access backups, someone else who is in charge of deploying, and so on.

Say you have one app and someone else helping you manage it. We can grant them access with Bucket policies, which use JSON to define rules.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "11",
      "Effect": "Allow",
      "Principal": {"AWS": "50"},
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::scaleyourcode/*"
    }
  ]
}



The Version date is not your revision date. It is an Amazon-defined policy language version, and 2012-10-17 is the newest one.

Sid is a unique statement ID, and Principal identifies the account or user the statement applies to (in a real policy this would be an AWS account ID or IAM user ARN; "50" is just a placeholder).
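You can attach this policy from the console, or from code. Here is a rough sketch with boto3; the account ID and user name in the Principal ARN are made up for illustration.

import json
import boto3

s3 = boto3.client("s3")

policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "11",
        "Effect": "Allow",
        # Hypothetical contributor; use a real account ID or IAM user ARN here.
        "Principal": {"AWS": "arn:aws:iam::111122223333:user/contributor"},
        "Action": "s3:PutObject",
        "Resource": "arn:aws:s3:::scaleyourcode/*",
    }],
}

# Attach the policy to the bucket.
s3.put_bucket_policy(Bucket="scaleyourcode", Policy=json.dumps(policy))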

What if you want to allow others to upload Objects, but you want your account to be the owner of those Objects?
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "11",
      "Effect": "Allow",
      "Principal": {"AWS": "50"},
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::scaleyourcode/*"
    },
    {
      "Sid": "12",
      "Effect": "Deny",
      "Principal": {"AWS": "50"},
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::scaleyourcode/*",
      "Condition": {
        "StringNotEquals": {"s3:x-amz-grant-full-control": ["emailAddress=chris@scaleyourcode.com"]}
      }
    }
  ]
}



In this policy, we allow the user with ID 50 to upload Objects to our scaleyourcode Bucket, but only on the condition that the user grants my account full control of those Objects.
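On the contributor's side, satisfying that condition means sending the x-amz-grant-full-control header with the upload. Roughly, with boto3 (the key and file names are made up, and the email-address grantee form is just one of the grantee formats S3 accepts):

import boto3

s3 = boto3.client("s3")

# Upload as the contributor, granting full control of the new Object
# to the bucket owner so the Deny statement above doesn't apply.
with open("report.pdf", "rb") as data:
    s3.put_object(
        Bucket="scaleyourcode",
        Key="public/report.pdf",
        Body=data,
        GrantFullControl="emailAddress=chris@scaleyourcode.com",
    )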

What about your backups directory? You'll want this to be very secure and only accessible by a select few. Let's set up Multi-factor authentication.

{
  "Version": "2012-10-17",
  "Id": "123",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:*",
      "Resource": "arn:aws:s3:::scaleyourcode/backups/*",
      "Condition": {"Null": {"aws:MultiFactorAuthAge": true}}
    },
    {
      "Sid": "",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:*",
      "Resource": "arn:aws:s3:::scaleyourcode/backups/*",
      "Condition": {"NumericGreaterThan": {"aws:MultiFactorAuthAge": 3600}}
    },
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": "*",
      "Action": ["s3:GetObject"],
      "Resource": "arn:aws:s3:::scaleyourcode/*"
    }
  ]
}



This policy requires Multi-factor authentication for the backups directory of the scaleyourcode Bucket. When S3 receives a request with MFA authentication, the aws:MultiFactorAuthAge key provides a numeric value indicating how long ago (in seconds) the temporary credentials were created. If the credentials were not created with an MFA device, this key is null (absent). In other words, the Null condition in the first statement evaluates to true, and access is denied, when the temporary security credentials in the request were created without an MFA device.

The second statement denies access once the MFA-authenticated session is older than 3600 seconds (one hour). I threw this in here to show how easy it is to add multiple statements to a policy, even for the same resources.

The third and last statement allows anyone to GET Objects in our scaleyourcode Bucket. This lets you grant broad permissions and then, as we just saw, fine-tune specific directories.
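For completeness, here is roughly what an MFA-authenticated request looks like from code: you trade the current code from your MFA device for temporary credentials via STS, then use those credentials to talk to S3. This is a sketch; the MFA device ARN, token code, and backup key name are placeholders.

import boto3

sts = boto3.client("sts")

# Exchange the current code from the MFA device for temporary credentials.
creds = sts.get_session_token(
    SerialNumber="arn:aws:iam::111122223333:mfa/chris",  # placeholder device ARN
    TokenCode="123456",                                   # code shown on the device
)["Credentials"]

# Requests made with these credentials carry an aws:MultiFactorAuthAge value,
# so the Deny statements above no longer block access to backups/.
s3 = boto3.client(
    "s3",
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
)
s3.get_object(Bucket="scaleyourcode", Key="backups/db-backup.tar.gz")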

For a full explanation of the different elements, visit here.

You can also add specific IP, HTTP referral, and other restrictions. See more here. In addition, there are a host of other ways of securing resources and managing access. See more here.


Versioning and Lifecycle
Versioning isn't a new concept to you if you use Git or SVN for your application code. It makes sense to keep different versions of code in case you screw up or lose something. Versioning Objects is the same thing.

Even if you overwrite an Object, Versioning in S3 will keep a copy of the overwritten Object in case you need it.

This is not enabled by default.

Another feature that works nicely with Versioning is Lifecycle. Instead of going through your Buckets every year to delete old and unused files, you can have Lifecycle automatically move them to something called Glacier.

A great example of this is logs. Do you want 3+ years' worth of logs? Maybe for some things, but probably not. Enable Lifecycle to move log files that are more than 3 years old into Glacier.

Wait, what's Glacier? Glacier is cheap storage for files that are very rarely used. It basically takes the stress away from doing your own backups. Glacier pricing.
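Both features can also be switched on from code. Here is a sketch with boto3 that enables Versioning and adds a Lifecycle rule moving anything under a hypothetical logs/ prefix to Glacier after roughly three years.

import boto3

s3 = boto3.client("s3")

# Turn on Versioning for the bucket (it is off by default).
s3.put_bucket_versioning(
    Bucket="scaleyourcode",
    VersioningConfiguration={"Status": "Enabled"},
)

# Move Objects under logs/ to Glacier once they are about three years old.
s3.put_bucket_lifecycle_configuration(
    Bucket="scaleyourcode",
    LifecycleConfiguration={
        "Rules": [{
            "ID": "archive-old-logs",
            "Filter": {"Prefix": "logs/"},
            "Status": "Enabled",
            "Transitions": [{"Days": 1095, "StorageClass": "GLACIER"}],
        }]
    },
)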


What can you do with Objects?
Now that we've covered how Bucket permissions work, let's dive into Objects.

First, create a folder. Let's name it public. Click on this folder and expand its Properties.

Object Encryption and storage properties

For Storage Class, you've got two options: Standard and Reduced Redundancy. The default is Standard, and it provides 99.999999999% durability and 99.99% availability over a given year. This is designed to sustain the concurrent loss of data in two facilities. Not bad.

Reduced Redundancy is a cheaper alternative for noncritical data. The odds of losing data are still low, with 99.99% durability and availability, which works out to an expected loss of 0.01% of objects in a given year. That's a really low percentage if you don't have a lot of data, but make sure you have backups somewhere else.

For Server Side Encryption, you also have two options: None and AES-256. Server Side Encryption encrypts your data at rest: S3 encrypts the Object before writing it to disk and decrypts it when requested. It encrypts only the Object itself, not its metadata (which I'm about to cover).
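Both of these properties can also be set at upload time instead of afterwards in the console. A rough boto3 sketch, where the file and key names are placeholders:

import boto3

s3 = boto3.client("s3")

with open("db-backup.tar.gz", "rb") as data:
    s3.put_object(
        Bucket="scaleyourcode",
        Key="backups/db-backup.tar.gz",
        Body=data,
        StorageClass="REDUCED_REDUNDANCY",  # or "STANDARD", the default
        ServerSideEncryption="AES256",      # SSE with S3-managed keys
    )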

Remember the policies we just looked at? Here's one to force Server Side Encryption:
{
  "Version": "2012-10-17",
  "Id": "PutObjPolicy",
  "Statement": [
    {
      "Sid": "DenyUnEncryptedObjectUploads",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::scaleyourcode/*",
      "Condition": {
        "StringNotEquals": {
          "s3:x-amz-server-side-encryption": "AES256"
        }
      }
    }
  ]
}


Let's create a new folder called images and upload an image. As I said earlier, this image will, by default, be accessible only by the owner. If we want this image to go on our website and be accessible by everyone, we need to change this.

You can either change this in the upload popup window, or you can hit "Start Upload" and change it afterwards. Once the upload is complete, go ahead and select it. Make sure you expand the Properties, and then click on Permissions. Select the Add more permissions button, and select "Everyone" from the dropdown. Since we want the general public to have access to this image, tick the "Open/Download" option. That's it; save these changes. You'll see a "Link", and you'll be able to share this link with anyone so they can view your image. You can also, obviously, use this URL in your web application.
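The same permission change can be made from code. A minimal sketch with boto3, assuming the image was uploaded under public/images/ (the key name is just an example):

import boto3

s3 = boto3.client("s3")

# Make an already-uploaded Object readable by everyone (Open/Download).
s3.put_object_acl(
    Bucket="scaleyourcode",
    Key="public/images/logo.png",
    ACL="public-read",
)

# The "Link" shown in the console then looks something like:
# https://s3.amazonaws.com/scaleyourcode/public/images/logo.png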

There's more we can and should do with this image. For example, we need to set metadata such as Cache-Control to allow browsers to cache our image. This will not only drastically speed up load times, but also save bandwidth and reduce your bill.

Below Permissions, expand the Metadata tab. In this tab, we can set different headers that give different results.

Object metadata information

The ones I always set for images are Content-Type and Cache-Control. Keep in mind that the Cache-Control max-age value is in seconds.

Cache-Control    max-age=2592000, public


This will cache your image for 30 days (2,592,000 seconds). I've seen many people recommend very long (maximum) cache times, but the benefit is not worth it to me, especially with a rapidly changing application.

You can also set an Expires header, of course.

On top of setting the Cache-Control and Expires headers, you can also set your own metadata. Just be sure to prefix these keys with "x-amz-meta-".
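If you upload from code, you can set all of this in the same put_object call used earlier. A sketch (the custom metadata key is invented for illustration; boto3 adds the x-amz-meta- prefix for you):

import boto3

s3 = boto3.client("s3")

with open("logo.png", "rb") as data:
    s3.put_object(
        Bucket="scaleyourcode",
        Key="public/images/logo.png",
        Body=data,
        ACL="public-read",
        ContentType="image/png",
        CacheControl="max-age=2592000, public",  # cache for 30 days
        Metadata={"project": "scaleyourcode"},   # stored as x-amz-meta-project
    )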

When not to use S3?
I hope this article has helped you get started with Amazon S3. It's a great way to decouple your assets from your servers: you can slim down your servers' disk usage and lower their bandwidth usage.

Keep in mind that S3 does charge for bandwidth, so the more content you serve, the more expensive it gets. That's why some companies prefer services with a fixed price instead of the "pay for what you use" model. This surprised me one month when I opened my bill and saw a much bigger charge than usual: I had served hundreds of GB of data because my interviews were becoming more popular. This is one of the reasons I am moving videos over to another service, but I still highly recommend S3 for images and the like. With aggressive caching, you can really lower your bills and enjoy the benefits.

Conclusion
I do plan on going deeper into these topics and providing more use cases in the future, but I want to lay the foundations first and get people who have never used the service up to speed.

Stay tuned for more and thanks for reading!