Running Spot Instances Effectively With Amazon EKS

Since we started working on HEY, one of the things that I’ve been a big proponent of was keeping as much of the app-side compute infrastructure on spot instances as possible (front-end and async job processing; excluding the database, Redis, and Elasticsearch). Coming out of our first two weeks running the app with a real production traffic load, we’re sitting at ~90% of our compute running on spot instances.

A good rundown of how HEY are running ~90% of their compute workload on spot instances. It’s interesting to see where spot has and hasn’t worked for them, as well as the gotchas they have hit along the way.