Lessons Learned in Shipping a Static Site to S3

A few months ago, the web development team at Articulate shipped a small microsite about our company culture, Life at Articulate. Given that it was just a few static pages, we decided to build it with good old-fashioned HTML and CSS and host it on AWS S3. “It will be easy,” we thought.

Minutes into development, we remembered the pitfalls of classic static development: CSS without Sass is horrible.

There are a ton of static site generators available for this kind of project. We considered the options listed on StaticGen, but they ultimately seemed like overkill. Instead, we landed on a small collection of gulp scripts to support our development and deployment process.

The collection included scripts for:

  • Compiling Sass and compressing JavaScript into a distribution folder
  • Configuring a simple local web server
  • Publishing to AWS S3

Once we’d chosen our tools, we buckled down and got started.

Working with a CloudFront Distribution

In addition to uploading the Life at Articulate content to an S3 bucket as a static site, we decided to use a CloudFront distribution (CF distro) to do two things:

  1. Handle the redirection of traffic from HTTP to HTTPS
  2. Manage our certificate for the site

Working with a CF distro also had other benefits—such as caching to improve the site’s performance.
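Both of those jobs map to a couple of settings on the distribution itself. In CloudFormation terms, a sketch looks something like this (the certificate ARN is a placeholder, and surrounding cache behavior properties are omitted):

```json
"ViewerCertificate" : {
  "AcmCertificateArn" : "arn:aws:acm:us-east-1:111111111111:certificate/EXAMPLE",
  "SslSupportMethod" : "sni-only"
},
"DefaultCacheBehavior" : {
  "ViewerProtocolPolicy" : "redirect-to-https"
}
```

One detail worth knowing: certificates used by CloudFront have to live in us-east-1, regardless of where the rest of your stack runs.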

We ran into a handful of issues, a few of which we’ll dig into here. At a high level, we had some trouble with the CF distro’s cache updating itself with freshly deployed content right away. We also ran into issues with default index page behavior within subdirectories through CF. Finally, we wanted to simplify the deployment process of the site for our developers as much as possible.

In this post, I’ll show you how our team worked through each of these points. Hopefully, you’ll learn something that you can use to build your own static site projects more quickly and easily.

Tweaking AWS Settings to Update Cached Content Faster

During the initial creation of the CF distro, we noticed that the cache wasn’t updating after subsequent deploys to either the production or staging bucket. We could see the new content when we visited the S3 bucket endpoint, life.articulate.com.s3-website-us-east-1.amazonaws.com, which told us the problem was with the cache itself.

A quick check of the CF distro showed that request headers weren’t being forwarded to the origin on each request; viewers were simply served whatever was already cached. Not forwarding headers to the origin improves caching and overall performance, but it isn’t desirable if the content is subject to frequent change, such as on, say, a regularly updated company blog.

We changed the settings to forward headers on each request. This helped get new content out to the cache on each request after we’d updated the S3 bucket. Here’s what these settings look like on the AWS console:

screenshot of the AWS console settings

Forwarding the headers back to the origin on each request allowed our CF distro to check the origin for updated content and adjust the cache as needed. This let us keep using a CF distro without waiting for each object’s “time to live” (TTL) to expire before newly deployed content was displayed.
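For anyone managing the distro in CloudFormation rather than the console, the same setting lives in the cache behavior’s ForwardedValues block. A sketch, assuming a distribution shaped like ours (the origin ID is illustrative, and "*" forwards every request header):

```json
"DefaultCacheBehavior" : {
  "TargetOriginId" : "Custom-life.articulate.com",
  "ForwardedValues" : {
    "QueryString" : "false",
    "Headers" : [ "*" ]
  }
}
```

The trade-off is real: the more headers you forward, the more distinct cache entries CloudFront keeps, so only reach for this when freshness matters more than hit rate.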

CloudFront Origins: An Unexpected “Gotcha”

Defaulting to an index.html page within subdirectories through the CloudFront distribution seems simple enough, right? It ended up being a “gotcha” for us. There were a couple of issues we came across when trying to set this up.

The first and simplest problem to fix was directing the S3 bucket to use index.html as the default index document in each of its directories. In the AWS console, this is the index document field under the bucket’s static website hosting settings.

S3 respects that setting not only in its root directory but also throughout its subdirectories, which is what we wanted. Like I said, a pretty quick fix.

Accessing the site through the CF distro, however, was a bit trickier. We noticed that only the index.html from the root directory was displayed without specifying it directly in the URL. So, both life.articulate.com and life.articulate.com/index.html would display index.html site content.

Pages on the site that had subdirectory index.html pages weren’t displaying them, though. For instance, life.articulate.com/careers/ would display an error message. But when we visited the S3 bucket’s website endpoint directly, the /careers/ page loaded just fine. In other words, S3 could find the index.html page in subdirectories by default, but CloudFront couldn’t.

It turns out that the type of origin we’d selected for the CloudFront distribution was tripping us up. We had originally selected the S3 bucket as the origin because, well, that’s where the site was being served from. Makes sense, right?

Unfortunately, with the S3 bucket as our origin, the subdirectory index.html default document behavior wasn’t respected. To fix this, we had to create a custom origin that hit the S3 static website endpoint instead of hitting the S3 bucket directly. The origin configuration in our CloudFormation repo changed from:

S3 Origin –

"Origins" : [ {
  "Id" : "S3-life.articulate.com",
  "DomainName" : "life-bucket.s3.amazonaws.com",
  "S3OriginConfig" : {
    "OriginAccessIdentity" : "origin-access-identity/cloudfront/XXXXXXXXXXXXX"
  }
} ],


Custom Origin –

"Origins" : [ {
  "Id" : "Custom-life.articulate.com",
  "DomainName" : "life.articulate.com.s3-website-us-east-1.amazonaws.com",
  "CustomOriginConfig" : {
    "OriginProtocolPolicy" : "http-only"
  }
} ],

The combination of setting index.html through the S3 bucket and using the custom origin through CloudFront resolved the issue for us.
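The difference comes down to who resolves directory-style requests. The S3 website endpoint rewrites a trailing-slash request to the configured index document before looking up the object, while the plain bucket (REST) endpoint does no rewriting at all. As an illustration, here’s a sketch of that rewrite rule (not AWS’s actual implementation):

```javascript
// Sketch of the index-document rewrite an S3 website endpoint applies.
// The plain bucket endpoint skips this step, which is why CloudFront
// returned errors for paths like /careers/ until we switched origins.
function resolveIndexDocument(path, indexDocument) {
  // A directory-style request maps to the index document under that prefix.
  if (path.endsWith('/')) {
    return path + indexDocument;
  }
  // Anything else is looked up as-is.
  return path;
}

resolveIndexDocument('/careers/', 'index.html');   // → '/careers/index.html'
resolveIndexDocument('/index.html', 'index.html'); // → '/index.html'
```

Once the custom origin points at the website endpoint, CloudFront just passes the path through and the endpoint does this resolution for it.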

Deploying to S3 with Jenkins Jobs and Hubot

During the early stages of development, we deployed to S3 locally via the gulp command. Eventually, we decided to move to a solution that was more secure and caused less friction for developers. We created a Jenkins job and a Hubot (we call ours “Botzo” at Articulate) command to do this.

The Jenkins job we set up clones the GitHub repo down to the build agent (based on whatever git ref was passed to it), runs an npm install to pull down all the gulp modules, and then works through each of the gulp steps to get the code ready for publishing to the S3 bucket.

Our job configuration has the following Execute Shell script:

echo S3_BUCKET=life.articulate.com > .env
npm install gulp
npm install

#concatenate and minify JS
${WORKSPACE}/node_modules/gulp/bin/gulp.js js

#compile compressed CSS from Sass
${WORKSPACE}/node_modules/gulp/bin/gulp.js styles

#copy and compile application to dist folder
${WORKSPACE}/node_modules/gulp/bin/gulp.js package

#push application to S3
${WORKSPACE}/node_modules/gulp/bin/gulp.js publish

We built several commands that we use to trigger the Jenkins deploy job in Slack. Here are some examples:

  • botzo deploy life to production
  • botzo deploy life to stage from dev_branch
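Under the hood, a Hubot script just pattern-matches the message and hands the pieces to Jenkins as build parameters. A minimal sketch of that matching logic (the command grammar is ours, but the function name and the master default are hypothetical):

```javascript
// Hypothetical sketch of the parsing a Hubot script might do before
// triggering the Jenkins job with site, environment, and git ref as
// build parameters.
const DEPLOY_PATTERN = /^deploy (\w+) to (\w+)(?: from ([\w.\/-]+))?$/;

function parseDeployCommand(text) {
  const match = DEPLOY_PATTERN.exec(text);
  if (!match) return null; // not a deploy command; ignore it
  return {
    site: match[1],            // e.g. "life"
    environment: match[2],     // e.g. "production" or "stage"
    ref: match[3] || 'master', // assumed default when "from" is omitted
  };
}

parseDeployCommand('deploy life to stage from dev_branch');
// → { site: 'life', environment: 'stage', ref: 'dev_branch' }
```

Keeping the grammar this rigid means a typo falls through to null instead of triggering a deploy with garbage parameters.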

The build agents that handle the Life at Articulate deployment job needed to be attached to an IAM role with enough permissions to delete and upload files in the S3 bucket, plus a few other permissions the gulp module required. The minimum S3 Action permissions needed for the gulp publish job were:

"Action": [










Once that was set, gulp publish was able to do its thing.
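The exact minimum varies with the publishing plugin, but for a publish step that lists, uploads, sets ACLs on, and deletes objects, a policy typically includes actions along these lines (an illustrative sketch, not our original policy):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetObject",
        "s3:PutObject",
        "s3:PutObjectAcl",
        "s3:DeleteObject"
      ],
      "Resource": [
        "arn:aws:s3:::life.articulate.com",
        "arn:aws:s3:::life.articulate.com/*"
      ]
    }
  ]
}
```

Scoping Resource to both ARNs matters: s3:ListBucket applies to the bucket itself, while the object actions apply to keys under it.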

Our simple setup had a few gotchas, but in the end it’s turned out great. We’ve already decided to use this setup on other small projects.

What We Learned

So, what’d we learn? What can you take away from this blog post and apply to the production of your own static site?

In summary, there were four main issues that we had to overcome to get the Life at Articulate site up and running. Here’s how we fixed them:

  1. Problem: Working with modern tools without a bloated framework. Solution: We implemented gulp scripts.
  2. Problem: Getting the cached content to update as soon as we deployed new content. Solution: We made sure headers were forwarded to the origin, which allowed the default behavior set in the S3 bucket’s settings to carry through to the CloudFront distro.
  3. Problem: Getting index.html to render at the root and in subdirectories. Solution: We set a custom origin against the static site’s endpoint (an S3 change and a CloudFront change).
  4. Problem: Making sure we had automated builds. Solution: We created a Jenkins job that’s triggered by our Slack bot, Botzo.


We hope the lessons we learned throughout this process are helpful. We really believe in giving developers the tools they need to get the job done themselves as much as possible, and this project let us explore and build some new ones. It’s a great example of how Articulate’s focus on empowerment and autonomy helps developers get work done efficiently.