a collection of notes

thoughts recorded while working as a software developer

Hosting websites from private S3 buckets

At work, we were alerted to an outage of an S3-backed frontend. The frontend was returning 403 responses. This left us scratching our head, as no deployment had occurred recently. After doing some digging, we found that AWS account administrators had applied a new policy to make all S3 buckets private (this is an account-wide setting, overriding bucket-level settings). 🆒 🆒 🆒 So how can we configure Cloudfront to access private S3 buckets?...

.env to Github Environment Variables

A script for uploading dotenv files to Github environments: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 #!/bin/bash PAGER="" # Avoid pager when using zsh # Check if the correct number of arguments are passed if [ "$#" -ne 2 ]; then echo "Usage: $0 <org/repo> <environment> < ....

An ECS -> RDS Security Group Script

Below is a simple script to allow a user to alter RDS databases security groups to allow access from an ECS Service. Useful when we have an observability tool runing in ECS that wants to add RDS data connections. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 from typing import List, Dict import boto3 from botocore....

Normalizing heterogeneous decimal Ion data in Athena

Recently, we exported data from a DynamoDB table to S3 in AWS Ion format. However, due to the fact that the DynamoDB table had varied formats for some numeric properties, the export serialized these numeric data columns in a few different formats: as a decimal (1234.), as an Ion decimal type (1234d0), and as a string ("1234"). However, we want to be able to treat these values as a bigint within our Athena queries....

Notes on Cookies 🍪

These are some notes that I have needed to write down about using cookies in web applications. Admittedly, I don’t know a lot about cookies and should probably not be considered a source of authority on this topic. Why cookies? Making authenticated anchor tags Can’t specify headers with <a> tags. Could supply token as query parameter, but that’s a security concern due to potential of token being cached with URL....

Auto-assume an IAM role before running a command

A convenience function to assume a IAM Role via STS before running a command. Add the following to your ~/.zshrc (or equivalent) file: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 function with-role { readonly role_arn=${1:?"The role_arn must be specified."} env -S $( aws sts assume-role \ --role-arn ${role_arn} \ --role-session-name ${USER} \ | \ jq -r '.Credentials | " AWS_ACCESS_KEY_ID=\(.AccessKeyId) AWS_SECRET_ACCESS_KEY=\(.SecretAccessKey) AWS_SESSION_TOKEN=\(.SessionToken) "' ) ${@:2} } This assumes that you have both the AWS CLI and jq installed....

Type-based message processing with Pydantic

When building systems to process messages, it’s not unlikely to find yourself in a situation where you need to process a number of inputted heterogeneous messages (i.e. messages of varying shapes/types). For example, consider a situation where you are processing messages from an SQS queue via a Lambda function. This post attempts to highlight how this can be achieved in a clean and elegant manner by utilizing Pydantic, Python’s typing system, and some helpers from the Python standard library....

Securing FastAPI with JWKS (AWS Cognito, Auth0)

This post is a quick capture of how to easily secure your FastAPI with any auth provider that provides JWKS. Background: RS256 RS256 is a signing algorithm used to generate and validate JSON Web Tokens (JWTs). Unlike the common HS256 algorithm that uses the same secret string to both generate and validate JWTs, RS256 uses a private key to generate JWTs and a separate public key for validating JWTs: RS256 generates an asymmetric signature, which means a private key must be used to sign the JWT and a different public key must be used to verify the signature....

Security-conscious cloud deployments from Github Actions via OpenID Connect

Goals This ticket is focused on how we can securely deploy to a major cloud provider environment (e.g. AWS, Azure, GCP) from within our Github Actions workflows. Why is this challenging? A naive solution to this problem is to generate some cloud provider credentials (e.g. AWS Access Keys) and to store them as a Github Secret. Our Github Actions can then utilize these credentials in its workflows. However, this technique contains a number of concerns:...

Roll your own PR preview CI pipeline

Goal We want a CI pipeline that will build and deploy an instance of our frontend application for every PR created in our frontend repo. Additionally, we want to be able to easily spin up applications with overridden configuration to allow developers to test the frontend against experimental backends. Finally, we want a reporting mechanism to inform developers when and where these deployed environments are available. Other Options Before you jump into this, consider that there are out-of-the-box solutions to solve this problem mentioned in the followup at the bottom of this page....

Putting animated SVGs of Terminal Output into Github READMEs

Have you just written a new ✨fancy CLI✨ and want to demo it in your Github Readme? Recording your terminal output is a nice way to demonstrate the experience. Here’s an example of what we’re going to make: Steps Install Dependencies asciinema: brew install asciinema svg-term-cli: npm install -g svg-term-cli Setup your terminal Some tips: Font/screen size matters. The asciinema output will look just as it does in your terminal. You’ll probably want to bump up the font-size and shrink down the terminal so that the text is legible in your README....

SSH tunnels in Python

At times, a developer may need to access infrastructure not available on the public internet. A common example of this is accessing a database located in a private subnet, as described in the VPC Scenario docs: Instances in the private subnet are back-end servers that don’t need to accept incoming traffic from the internet and therefore do not have public IP addresses; however, they can send requests to the internet using the NAT gateway....

Getting area of WGS-84 geometries in SqKm

Getting area of geometries in WGS-84/EPSG:4326 in square kilometers: 1 2 3 4 SELECT ST_Area(geometry, false) / 10^6 sq_km FROM my_table

Concurrent Python Example Script

Below is a very simple example of a script that I write and re-write more often than I would like to admit. It reads input data from a CSV and processes each row concurrently. A progress bar provides updates. Honestly, it’s pretty much just the concurrent.futures ThreadPoolExecutor example plus a progress bar.

An ECR Deployment Script

Below is a simple script to deploy a Docker image to ECR… 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 set -e log () { local bold=$(tput bold) local normal=$(tput sgr0) echo "${bold}${1}${normal}" 1>&2; } if [ -z "${AWS_ACCOUNT}" ]; then log "Missing a valid AWS_ACCOUNT env variable"; exit 1; else log "Using AWS_ACCOUNT '${AWS_ACCOUNT}'"; fi AWS_REGION=${AWS_REGION:-us-east-1} REPO_NAME=${REPO_NAME:-my/repo} log "🔑 Authenticating....

Using CloudFront as a Reverse Proxy

Alternate title: How to be master of your domain. The basic idea of this post is to demonstrate how CloudFront can be utilized as a serverless reverse-proxy, allowing you to host all of your application’s content and services from a single domain. This minimizes a project’s TLD footprint while providing project organization and performance along the way. Why Within large organizations, bureaucracy can make it a challenge to obtain a subdomain for a project....

How to generate a database URI from an AWS Secret

A quick note about how to generate a database URI (or any other derived string) from an AWS SecretsManager SecretTargetAttachment (such as what’s provided via a RDS DatabaseInstance’s secret property). 1 2 3 4 5 6 7 8 9 10 11 db = rds.DatabaseInstance( # ... ) db_val = lambda field: db.secret.secret_value_from_json(field).to_string() task_definition.add_container( environment=dict( # ... PGRST_DB_URI=f"postgres://{db_val('username')}:{db_val('password')}@{db_val('host')}:{db_val('port')}/", ), # ... )

Tips for working with a large number of files in S3

I would argue that S3 is basically AWS’ best service. It’s super cheap, it’s basically infinitely scalable, and it never goes down (except for when it does). Part of its beauty is its simplicity. You give it a file and a key to identify that file, you can have faith that it will store it without issue. You give it a key, you can have faith that it will return the file represented by that key, assuming there is one....

Boilerplate for S3 Batch Operation Lambda

S3 Batch Operation provide a simple way to process a number of files stored in an S3 bucket with a Lambda function. However, the Lambda function must return particular Response Codes. Below is an example of a Lambda function written in Python that works with AWS S3 Batch Operations.

Parsing S3 Inventory CSV output in Python

S3 Inventory is a great way to access a large number of keys in an S3 Bucket. Its output is easily parsed by AWS Athena, enabling queries across the key names (e.g. find all keys ending with .png) However, sometimes you just need to list all of the keys mentioned in the S3 Inventory output (e.g. populating an SQS queue with every keyname mentioned in an inventory output). The following code is an example of doing such task in Python:

A PIL-friendly class for S3 objects

Here’s a quick example of creating an file-like object in Python that represents an object on S3 and plays nicely with PIL. This ended up being overkill for my needs but I figured somebody might get some use out of it.

Using CloudFormation's Fn::Sub with Bash parameter substitution

Let’s say that you need to inject a large bash script into a CloudFormation AWS::EC2::Instance Resource’s UserData property. CloudFormation makes this easy with the Fn::Base64 intrinsic function: 1 2 3 4 5 6 7 8 9 10 11 12 AWSTemplateFormatVersion: '2010-09-09' Resources: VPNServerInstance: Type: AWS::EC2::Instance Properties: ImageId: ami-efd0428f InstanceType: m3.medium UserData: Fn::Base64: | #!/bin/sh echo "Hello world" In your bash script, you may even want to reference a parameter created elsewhere in the CloudFormation template....

Serve an Esri Web AppBuilder web app from HTTP

When an Esri Web AppBuilder web app is configured with a portalUrl value served from HTTPS, the web app automatically redirects users to HTTPS when visited via HTTP. While this is best-practice in production, it can be a burden in development when you want to quickly run a local version of the web app. Below is a quick script written with Python standard libraries to serve a web app over HTTP....

Hosting Jupyter at a subdomain via Cloudflare

Full Disclosure: I am NOT an expert at Jupyter or Anaconda (which I am using in this project), there may be some bad habits below… Below is a quick scratchpad of the steps I took to serve Jupyter from a subdomain. Jupyter is running behind NGINX on an OpenStack Ubuntu instance and the domain’s DNS is set up to use Cloudflare to provides convenient SSL support. I was suprised by the lack of documentation for this process, prompting me to document my steps taken here....

Django Admin Fu, part 2

Continuing with the Django Admin Fu post part 1. Action with Intermediate Page Sometimes you may need an admin action that, when submitted, takes the user to a form where they provides some additional detail. The docs mention a bit about providing intermediate pages, but not a lot. It states: Generally, something like [writing a intermediate page through the admin] isn’t considered a great idea. Most of the time, the best practice will be to return an HttpResponseRedirect and redirect the user to a view you’ve written, passing the list of selected objects in the GET query string....