Alternate title: How to be master of your domain.
This post demonstrates how CloudFront can be utilized as a serverless reverse proxy, allowing you to host all of your application’s content and services from a single domain. This minimizes a project’s subdomain footprint while providing organization and performance benefits along the way.
Why
Within large organizations, bureaucracy can make it a challenge to obtain a subdomain for a project. This means that utilizing multiple service-specific subdomains (e.g. `api.my-project.big-institution.gov` or `thumbnails.my-project.big-institution.gov`) is an arduous process. To avoid this in a recent project, we settled on a pattern where CloudFront proxies all of our domain’s incoming requests to the appropriate service.
How it works
CloudFront has the ability to support multiple origin configurations (i.e. multiple sources of content). We can utilize the Path Pattern setting to direct web requests by URL path to their appropriate service. CloudFront behaves like a typical routing library: it routes traffic to the first path pattern matching the incoming request, and requests that match no pattern fall through to a default route. For example, our current infrastructure looks like this:
```
my-project.big-institution.gov/
├── api/* <- Application Load Balancer (ALB) that distributes traffic to order
│           management API service running on Elastic Container Service (ECS).
│
├── stac/* <- ALB that distributes traffic to STAC API service running on ECS.
│
├── storage/* <- Private S3 bucket storing private data. Only URLs that have been
│                signed with our CloudFront keypair will be successful.
│
├── thumbnails/* <- Public S3 bucket storing thumbnail imagery.
│
└── * <- Public S3 website bucket storing our single page application frontend.
```
Single Page Applications
An S3 bucket configured for website hosting acts as the origin for our default route. If an incoming request’s path does not match routes specified elsewhere within the CloudFront distribution, it is routed to the single page application. To allow the single page application to handle any request path (i.e. not just paths of files that exist within the bucket, such as `index.html` or `app.js`), the bucket should be configured with a custom error page that responds to `404` errors with the application’s HTML entrypoint (`index.html`).
Requirements
To enable the usage of a custom error page, the S3 bucket’s website endpoint (i.e. `<bucket-name>.s3-website-<region>.amazonaws.com`, not `<bucket-name>.s3.<region>.amazonaws.com`) must be configured as a custom origin for the distribution. Additionally, the bucket must be configured for public access. More information: Using Amazon S3 Buckets Configured as Website Endpoints for Your Origin. Because the S3 website endpoint does not support SSL, the custom origin’s Protocol Policy should be set to HTTP Only.
My bucket is private. Can CloudFront serve a website from this bucket?
If your bucket is private, the website endpoint will not work (source). You could configure CloudFront to send traffic to the bucket’s REST API endpoint; however, this prevents you from utilizing S3’s custom error document feature, which may be essential for hosting single page applications on S3. Tools like Next.js and Gatsby.js support rendering HTML documents for all routes, which can avoid the need for custom error pages; however, care must be taken that any dynamic portion of a page’s route (e.g. `/docs/3`, where `3` is the ID of a record to be fetched from an API) is specified as either a query parameter (e.g. `/docs?3`) or a hash (e.g. `/docs#3`).
CloudFront itself has support for custom error pages. Why can’t I use that to enable hosting private S3 buckets as websites?
While it is true that CloudFront can route error responses to custom pages (e.g. serving all `404` responses the contents of `s3://my-website-bucket/index.html`), these custom error pages apply to the entirety of your CloudFront distribution. This is likely undesirable for any API services hosted by your CloudFront distribution. For example, if a user accesses a RESTful API at `http://my-website.com/api/notes/12345` and the API server responds with a `404` of `{"details": "Record not found"}`, the response body will be rewritten to contain the contents of `s3://my-website-bucket/index.html`. At the time of writing, I am unaware of any capability for applying custom error pages to only certain content types; such a feature might make distribution-wide custom error pages a viable solution.
APIs
APIs are served as custom origins, with their Domain Name settings pointing to an ALB’s DNS name.
Does this work with APIs run with Lambda or EC2?
Assuming that the service has a DNS name, it can be set up as an origin for CloudFront. This means that for an endpoint handled by a Lambda function, you would need to have it served behind an API Gateway or an ALB.
Recommended configuration
- Disable caching by setting the default, minimum, and maximum TTL to `0` seconds.
- Set AllowedMethods to forward all requests (i.e. `GET`, `HEAD`, `OPTIONS`, `PUT`, `PATCH`, `POST`, and `DELETE`).
- Set ForwardedValues so that the querystring and the following headers are forwarded: `referer`, `authorization`, `origin`, `accept`, `host`.
- Set the Origin Protocol Policy to HTTP Only.
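For reference, the settings above can be expressed as the legacy cache-behavior fragment accepted by CloudFront’s distribution-config API (e.g. via boto3’s `create_distribution`). This is a sketch, not our exact configuration; the `api-alb` origin ID is hypothetical:

```python
# Recommended API cache-behavior settings as a CloudFront CacheBehavior
# fragment (legacy ForwardedValues-style settings, matching the list above).
api_behavior = {
    "PathPattern": "api/*",
    "TargetOriginId": "api-alb",  # hypothetical origin ID
    "ViewerProtocolPolicy": "redirect-to-https",
    # Disable caching entirely.
    "MinTTL": 0,
    "DefaultTTL": 0,
    "MaxTTL": 0,
    # Forward all HTTP methods.
    "AllowedMethods": {
        "Quantity": 7,
        "Items": ["GET", "HEAD", "OPTIONS", "PUT", "PATCH", "POST", "DELETE"],
    },
    # Forward the querystring and the headers the API needs.
    "ForwardedValues": {
        "QueryString": True,
        "Cookies": {"Forward": "all"},
        "Headers": {
            "Quantity": 5,
            "Items": ["referer", "authorization", "origin", "accept", "host"],
        },
    },
}
```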
Data from S3 Buckets
Data from a standard S3 bucket can be configured by pointing to the bucket’s REST endpoint (e.g. `<bucket-name>.s3.<region>.amazonaws.com`). More information: Using Amazon S3 Buckets for Your Origin.
This can be a public bucket, in which case it would benefit from the CDN and caching provided by CloudFront.
When using a private bucket, CloudFront can additionally serve as a “trusted signer”, enabling an application with access to the CloudFront security keys to create signed URLs/cookies that grant temporary access to particular private content. For CloudFront to access content within a private bucket, its Origin Access Identity must be given read privileges within the bucket’s policy. More information: Restricting Access to Amazon S3 Content by Using an Origin Access Identity
Caveats
The most substantial issue with this technique is that CloudFront does not have the capability to remove portions of a path from a request’s URL. For example, if an API is configured as an origin at `https://d1234abcde.cloudfront.net/api`, it should be configured to respond to URLs starting with `/api`. This is often a non-issue, as many server frameworks have built-in support for being hosted at a non-root path.
Configuring FastAPI to be served under a non-root path
Furthermore, if you have an S3 bucket serving content from `https://d1234abcde.cloudfront.net/bucket`, only keys with a prefix of `bucket/` will be available to that origin. In the event that keys are not prefixed with a path matching the origin’s configured path pattern, there are two options:
- Move all of the files, likely utilizing something like S3 Batch (see #253 for more details)
- Use a Lambda@Edge function to rewrite the path of any incoming request for a non-cached resource to conform to the key structure of the S3 bucket’s objects.
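The second option can be sketched as a small origin-request handler. This assumes the hypothetical `/bucket` path pattern from above, with objects stored at the bucket root:

```python
# Lambda@Edge "origin request" handler that strips the /bucket path prefix so
# that forwarded URIs line up with the keys actually stored in the S3 bucket.
PREFIX = "/bucket"


def handler(event, context):
    request = event["Records"][0]["cf"]["request"]
    uri = request["uri"]
    if uri.startswith(PREFIX):
        # e.g. "/bucket/data/image.png" -> "/data/image.png"
        request["uri"] = uri[len(PREFIX):] or "/"
    return request
```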
Summary
After learning this technique, it feels kind of obvious. I’m honestly not sure whether this is an AWS 101-level technique or something that is rarely done; however, I never knew of it before this project and therefore felt it was worth sharing.
A quick summary of some of the advantages that come with using CloudFront for all application endpoints:
- It feels generally tidier to have all your endpoints placed behind a single domain. No more dealing with ugly ALB, API Gateway, or S3 URLs. This additionally pays off when you are dealing with multiple stages (e.g. `prod` and `dev`) of the same service 🧹.
- SSL is managed and terminated at CloudFront. Everything after that is port 80 non-SSL traffic, simplifying the management of certificates 🔒.
- All non-SSL traffic can be set to auto-redirect to SSL endpoints ↩️.
- Out of the box, AWS Shield Standard is applied to CloudFront to provide protection against DDoS attacks 🏰.
- Static content is regionally cached and served from Edge Locations closer to the viewer 🌏.
- Dynamic content is also served from Edge Locations, which connect to the origin server via AWS’ global private network. This is faster than connecting to an origin server over the public internet 🚀.
- Externally, all data is served from the same domain origin. Goodbye CORS errors 👋!
- Data egress costs are lower through CloudFront than through other services. This can be ensured by selecting only Price Class 100; other price classes can be chosen if a global CDN is worth the higher egress costs 💴.
Example
An example of a reverse-proxy CloudFront Distribution written with CDK in Python