Scaling to 25M Monthly API Calls on AWS for $150/Month: A Deep Dive
As developers of a popular geolocation API, one of our main challenges has been building an infrastructure that can handle rapid growth in usage without costs spiraling out of control. After a major outage on Black Friday 2021 that impacted many of our customers, it became clear that our existing server-based architecture was not going to cut it going forward.
We set out to design a new infrastructure that could meet the following goals:
- Handle spiky, unpredictable traffic patterns seamlessly
- Maintain low latency for users worldwide
- Minimize costs
After extensive research and testing, we landed on a solution leveraging AWS API Gateway, Lambda, and DynamoDB that has allowed us to achieve all of these goals. We're now reliably serving over 25 million API calls per month from 10 global endpoints with median response times under 30 ms, all for around $150 per month.
In this post, I'll take a deep dive into the details of this setup, sharing key lessons learned, benchmarks, cost breakdowns, and areas for further optimization. Whether you're running an API that's just starting to gain traction or one that's already at scale, I hope this provides some insights you can apply to your own infrastructure.
The Limitations of Our Server-Based Stack
Our API started out running on a fairly standard server-based stack:
- Python (using the Japronto framework)
- Redis for caching
- AWS EC2 instances behind Elastic Load Balancers (ELBs)
- Route53 for DNS with latency-based routing
This setup served us well in the early days, but as our traffic grew, a few key limitations became apparent:
- Scaling was manual and slow. Whenever traffic spiked, we'd have to manually spin up new EC2 instances, which could take several minutes. This often meant we were struggling to keep up with demand.
- Costs were unpredictable and high. EC2 and ELB costs scaled more or less linearly with traffic. While not terrible at low-to-moderate scale, these costs started to balloon as our usage grew, and estimating our monthly spend became difficult.
- Outages in one region impacted all users. With all requests routing through one primary region, any issue there (like the Black Friday outage) meant downtime for everyone. We needed better isolation between regions.
- Maintaining the EC2 instances was tedious. While we automated most of the deployment and configuration with tools like Ansible, managing a fleet of EC2 instances still required significant manual work.
We knew we needed an architecture that could scale instantly and automatically with demand, provide predictable and lower costs, deliver a great user experience globally, and abstract away server management.
Why API Gateway + Lambda + DynamoDB?
In evaluating alternatives, a few key things drew us to the combination of AWS API Gateway, Lambda, and DynamoDB:
- Automatic scaling to any request volume. API Gateway and Lambda scale instantly to handle any level of traffic with no manual intervention; you're only limited by your account-level service quotas.
- Granular, usage-based pricing. With API Gateway and Lambda you pay only for what you use ($3.50 per million API calls and $0.20 per million Lambda requests at the time of writing). This makes costs predictable and directly proportional to usage.
- Multi-region deployment. API Gateway makes it easy to deploy your API to any number of AWS regions for improved latency and redundancy, so each request can be served from the region closest to the user.
- Serverless operations. There are no servers to manage; AWS handles all the underlying infrastructure, freeing us up to focus purely on our application code.
- Scalable NoSQL database. DynamoDB provides a highly scalable, low-latency NoSQL datastore for persisting and serving our API usage data with minimal operational overhead.
However, while the basic building blocks were clear, designing the full multi-region architecture to meet our needs around usage tracking, rate limiting, and authentication posed some interesting challenges.
Architecture Overview
Here's a high-level view of the architecture we landed on:
[Architecture Diagram]
Key components:
- API Gateway for the API frontend, deployed to 10 global regions
- Lambda function in each region to handle the actual API logic and respond to requests
- DynamoDB table in each region to store usage data (API keys, rate limits, usage counts)
- CloudWatch in each region for Lambda logs
- Kinesis stream to collect log data from all regions
- Separate "aggregator" Lambda to process the Kinesis stream and update the "master" DynamoDB table
- DynamoDB cross-region replication to sync the master table to all regional tables
The basic flow for an API request is:
- Request comes in to the closest API Gateway endpoint
- API Gateway routes the request to the Lambda function in the same region
- Lambda function checks the regional DynamoDB table for API key validation and rate limiting
- If validated and not rate-limited, Lambda does its processing and returns a response
- Lambda logs the request details to CloudWatch
- CloudWatch subscription filter pushes the log to the Kinesis stream
- Aggregator Lambda processes the Kinesis events and updates the master DynamoDB table
- DynamoDB streams replicate the updates to all regional tables
Let's dive into some of the key pieces in more detail.
Real-Time Usage Tracking and Rate Limiting
One of the most critical parts of the system is the real-time tracking of API usage for each API key. We need to know, at any given moment, how many requests each API key has made in each region so that we can enforce rate limits accurately.
We achieve this by having each Lambda log the details of each request it processes (timestamp, API key, origin IP, etc.) to CloudWatch Logs. A CloudWatch Subscription Filter then forwards these logs in real-time to a Kinesis Data Stream.
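For reference, wiring one region's Lambda log group to the stream is a one-time setup step. Here's a minimal sketch of how that could look with boto3; the log group name, stream ARN, and IAM role are illustrative placeholders, and any cross-region delivery plumbing is glossed over:

```python
import boto3

logs = boto3.client("logs", region_name="eu-west-1")

# Forward every log event from this region's API Lambda log group to the
# Kinesis stream that feeds the aggregator. Names and ARNs are placeholders.
logs.put_subscription_filter(
    logGroupName="/aws/lambda/geo-api-handler",
    filterName="usage-to-kinesis",
    filterPattern="",  # an empty pattern forwards all log events
    destinationArn="arn:aws:kinesis:eu-west-1:123456789012:stream/api-usage-events",
    roleArn="arn:aws:iam::123456789012:role/cwlogs-to-kinesis",  # role CloudWatch Logs assumes to write to the stream
)
```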
A separate "aggregator" Lambda function reads from this Kinesis stream, processes each log event, and updates the usage counters for the corresponding API key in a "master" DynamoDB table.
To make this usage data available in all regions with low latency, we use DynamoDB Global Tables to replicate the master table to a "local" table in each region. So when a Lambda function in any region needs to validate an API key or check its current usage, it can query the local DynamoDB table and get a response in single-digit milliseconds.
Here's a more detailed look at this flow:
[Diagram of usage tracking flow]
A few key points:
- By using a separate Kinesis stream and aggregator Lambda, we decouple API processing from usage tracking. This ensures that any issues or delays in the usage tracking pipeline don't impact the responsiveness of the API itself.
- Kinesis provides ordering guarantees and at-least-once delivery, ensuring we don't miss any usage events. Because events can occasionally be delivered more than once, the aggregator Lambda is designed to be idempotent (see the sketch below).
- Within a region, DynamoDB's strongly consistent reads give each Lambda an up-to-date view of the usage counts. Cross-region replication via Global Tables is eventually consistent, so counts seen in other regions can lag by a second or two, which is acceptable slack for our rate limiting.
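To make the idempotency point concrete, here's a rough sketch of what the aggregator Lambda could look like. The api_usage and processed_requests table names are hypothetical, and we assume each log event carries the structured JSON shown later in this post:

```python
import base64
import gzip
import json

import boto3

dynamodb = boto3.resource("dynamodb")
usage_table = dynamodb.Table("api_usage")           # hypothetical master usage table
dedup_table = dynamodb.Table("processed_requests")  # hypothetical dedup table for idempotency


def handler(event, context):
    """Triggered by the Kinesis stream that receives the CloudWatch Logs events."""
    for record in event["Records"]:
        # CloudWatch Logs delivers gzip-compressed, base64-encoded payloads to Kinesis
        payload = gzip.decompress(base64.b64decode(record["kinesis"]["data"]))
        data = json.loads(payload)
        if data.get("messageType") != "DATA_MESSAGE":
            continue  # skip control messages
        for log_event in data["logEvents"]:
            usage = json.loads(log_event["message"])  # the structured request log
            try:
                # Record the request ID once; the condition fails on replayed duplicates
                dedup_table.put_item(
                    Item={"requestId": usage["requestId"]},
                    ConditionExpression="attribute_not_exists(requestId)",
                )
            except dynamodb.meta.client.exceptions.ConditionalCheckFailedException:
                continue  # already counted, so skip to keep the aggregation idempotent
            # Atomically increment the usage counter for this API key
            usage_table.update_item(
                Key={"apiKey": usage["apiKey"]},
                UpdateExpression="ADD usageCount :one",
                ExpressionAttributeValues={":one": 1},
            )
```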
API Key Validation
In addition to tracking usage, we also need to validate API keys on each request. To do this without adding latency, we replicate all API keys to the DynamoDB table in each region whenever a new key is created.
So the full flow for API key validation is:
- User includes their API key in the X-API-KEY header of the request
- Lambda function extracts the key from the header
- Lambda does a GetItem on the local DynamoDB table to check that the key exists and is valid
- If the key is valid, Lambda checks the key's usage against its rate limit
- If under the rate limit, the request is allowed to proceed
By storing the keys in DynamoDB tables replicated to each region, we can perform this validation with very low latency, typically 1-2 ms.
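Putting this together, here's a simplified sketch of the request-handling Lambda, assuming a Lambda proxy integration. The table and attribute names (api_keys, active, usageCount, rateLimit) are illustrative, and the actual geolocation logic is stubbed out:

```python
import json
import os
import time

import boto3

dynamodb = boto3.resource("dynamodb")
# Hypothetical per-region replica table holding keys, rate limits, and usage counts
keys_table = dynamodb.Table(os.environ.get("KEYS_TABLE", "api_keys"))


def handler(event, context):
    headers = {k.lower(): v for k, v in (event.get("headers") or {}).items()}
    api_key = headers.get("x-api-key")
    if not api_key:
        return _response(401, {"error": "missing API key"})

    # Single-digit-millisecond lookup against the local regional table
    item = keys_table.get_item(Key={"apiKey": api_key}).get("Item")
    if not item or not item.get("active", False):
        return _response(403, {"error": "invalid API key"})

    if item.get("usageCount", 0) >= item.get("rateLimit", 0):
        return _response(429, {"error": "rate limit exceeded"})

    started = time.time()
    body = do_geolocation_lookup(event)  # the actual API logic, elided here

    # Structured usage log; the CloudWatch subscription filter ships this to Kinesis
    print(json.dumps({
        "requestId": context.aws_request_id,
        "apiKey": api_key,
        "origin": event.get("requestContext", {}).get("identity", {}).get("sourceIp"),
        "path": event.get("path"),
        "responseStatus": 200,
        "responseTime": int((time.time() - started) * 1000),
    }))
    return _response(200, body)


def do_geolocation_lookup(event):
    # Placeholder for the real geolocation lookup
    return {"location": "unknown"}


def _response(status, body):
    return {"statusCode": status, "body": json.dumps(body)}
```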
Performance Optimization
With the core architecture in place, we spent a lot of time tuning and optimizing for performance. Some key optimizations:
- Increasing Lambda memory size. Lambda allocates CPU in proportion to memory. By increasing memory from the default 128 MB to 1 GB, we saw a 3-4x improvement in Lambda execution time. The tradeoff is a higher per-request cost, but for us it was worth it to ensure a consistently fast experience.
- Enabling HTTP keep-alive. By reusing connections from our Lambda functions to downstream services like DynamoDB rather than opening a new one on every request, we shaved roughly 50 ms off each request (see the sketch after this list).
- Careful capacity sizing in each region based on its traffic. More traffic in a region means more Lambda concurrency and more DynamoDB throughput, so we tune reserved concurrency and provisioned capacity per region.
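On the keep-alive point, the main trick on the Lambda side is to create the boto3 client once per execution environment, outside the handler, so its connection pool survives across invocations; recent botocore versions also expose a tcp_keepalive option. A minimal sketch with illustrative names:

```python
import boto3
from botocore.config import Config

# Created once per container, outside the handler, so connections to DynamoDB
# are reused across invocations instead of being re-established each time.
dynamodb = boto3.resource(
    "dynamodb",
    config=Config(
        tcp_keepalive=True,        # keep idle connections open (recent botocore versions)
        max_pool_connections=50,   # illustrative; size to expected concurrency
        retries={"max_attempts": 3, "mode": "standard"},
    ),
)
table = dynamodb.Table("api_keys")  # hypothetical table name


def handler(event, context):
    # Each invocation reuses the warm client and its pooled connections
    return table.get_item(Key={"apiKey": "example"}).get("Item")
```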
After these optimizations, we're seeing median response times around 30 ms globally, with p99 latencies typically under 200 ms. Here's a snapshot from a typical day:
[Screenshot of latency metrics from CloudWatch]
Cost Breakdown
So what does it cost to run this all-in on AWS? Here's a breakdown of our typical monthly bill:
- API Gateway: ~$85 ($3.50 per million requests * 25 million)
- Lambda: ~$50 (request charges are only ~$5 at $0.20 per million; the rest is duration, billed per GB-second at our 1 GB memory setting)
- DynamoDB: ~$10 (very low read/write throughput)
- CloudWatch, Kinesis, misc.: ~$5
Total: ~$150/month
The biggest cost drivers by far are API Gateway and Lambda requests, which scale linearly with our usage. DynamoDB, CloudWatch, and Kinesis are almost negligible at our scale – the bulk of our DynamoDB usage falls under the free tier.
This works out to roughly $0.000006 in infrastructure cost per API call (six ten-thousandths of a cent): a tiny fraction of what we used to pay for EC2 instances, with a cost structure that scales linearly with usage.
And of course, these cost components are all independent and can be optimized separately. If our DynamoDB costs start to rise, we can tweak the provisioned capacity or consider moving to on-demand mode. If API Gateway costs get out of hand, we could consider moving some traffic to AWS's new HTTP APIs, which have a lower per-request cost.
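As a sanity check on those numbers, here's a back-of-the-envelope cost model; the average billed duration is an assumption chosen to land near our ~$50 Lambda line, not a measured figure:

```python
# Rough monthly cost model for 25M requests; prices are AWS list prices
# (REST API requests, Lambda requests, Lambda GB-seconds) at the time of writing.
requests = 25_000_000
apigw_per_million = 3.50
lambda_req_per_million = 0.20
gb_second_price = 0.0000166667
memory_gb = 1.0          # we run our functions at 1 GB
avg_duration_s = 0.100   # assumed average billed duration of ~100 ms

api_gateway = requests / 1e6 * apigw_per_million                           # ~$87.50
lambda_requests = requests / 1e6 * lambda_req_per_million                  # ~$5.00
lambda_duration = requests * avg_duration_s * memory_gb * gb_second_price  # ~$41.67

print(f"API Gateway: ${api_gateway:.2f}")
print(f"Lambda:      ${lambda_requests + lambda_duration:.2f}")
```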
Monitoring and Logging
With a distributed system like this, good monitoring and logging are critical. We rely heavily on CloudWatch for both.
For monitoring, we track key metrics like Lambda invocations, API Gateway latency and errors, and DynamoDB consumption. CloudWatch alarms alert us if any of these metrics go out of expected ranges.
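As one concrete example, an alarm on p99 API Gateway latency could be created like this; the API name, stage, threshold, and SNS topic are illustrative placeholders:

```python
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

# Alert if p99 latency on the (hypothetical) "geo-api" prod stage stays above 500 ms
cloudwatch.put_metric_alarm(
    AlarmName="geo-api-p99-latency-high",
    Namespace="AWS/ApiGateway",
    MetricName="Latency",
    Dimensions=[
        {"Name": "ApiName", "Value": "geo-api"},
        {"Name": "Stage", "Value": "prod"},
    ],
    ExtendedStatistic="p99",
    Period=300,               # evaluate over 5-minute windows
    EvaluationPeriods=3,      # require 3 consecutive breaching periods
    Threshold=500,
    ComparisonOperator="GreaterThanThreshold",
    TreatMissingData="notBreaching",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:ops-alerts"],  # hypothetical SNS topic
)
```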
For logging, we make extensive use of structured logs from our Lambdas, which we then analyze and query using CloudWatch Logs Insights. This allows us to quickly investigate issues, track down specific requests, and get visibility into how the system is behaving.
Here's a snippet of a typical structured log from one of our Lambdas:
{
"requestId": "abc123",
"apiKey": "xyz789",
"origin": "1.2.3.4",
"path": "/v1/geolocation",
"responseStatus": 200,
"responseTime": 25,
"usageCount": 15000
}
Being able to easily query and analyze these logs has been invaluable in understanding our system's behavior and quickly debugging issues.
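For example, a Logs Insights query over these structured fields can surface p95 response times per path; the log group name below is an illustrative placeholder:

```python
import time

import boto3

logs = boto3.client("logs")

# p95 response time per path over the last hour, computed from the JSON fields
# that Logs Insights auto-discovers in structured log lines like the one above
query = """
fields @timestamp, path, responseTime
| filter ispresent(responseTime)
| stats pct(responseTime, 95) as p95 by path
| sort p95 desc
"""

start = logs.start_query(
    logGroupName="/aws/lambda/geo-api-handler",
    startTime=int(time.time()) - 3600,
    endTime=int(time.time()),
    queryString=query,
)

results = {"status": "Running"}
while results["status"] in ("Scheduled", "Running"):
    time.sleep(1)
    results = logs.get_query_results(queryId=start["queryId"])

for row in results["results"]:
    print({field["field"]: field["value"] for field in row})
```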
Future Optimizations and Improvements
While we're quite happy with this setup, there are always areas for further optimization and improvement. A few things on our radar:
- HTTP API migration. AWS's newer HTTP API offering for API Gateway promises lower latency and costs compared to the REST API we're currently using. We're planning to test this out and potentially migrate some of our traffic.
- DynamoDB DAX. If our DynamoDB read latencies start to creep up, we could add DAX (DynamoDB Accelerator) to the mix. This in-memory cache can serve reads in microseconds.
- Regional API keys. Currently, an API key is global and works across all regions. We could potentially move to a model where keys are region-specific, which would simplify our cross-region replication logic.
- More granular usage tracking. With a bit of additional engineering, we could track usage not just by API key but by specific endpoint, HTTP method, response code, and so on. This would give us (and potentially our users) much more detailed insight into usage patterns.
- HTTP caching. For endpoints that serve cacheable responses, we could leverage API Gateway's built-in caching to significantly reduce the load on our Lambdas.
As with any system, there's no such thing as "done". We'll continue to iterate, measure, and optimize as our scale and needs evolve.
Conclusion
Building a highly scalable, performant, and cost-effective API in the cloud is no small feat. There are a lot of moving pieces to consider, and the "right" architecture will depend heavily on your specific needs and constraints.
For us, the combination of API Gateway, Lambda, and DynamoDB has proven to be a powerful one, allowing us to achieve our goals of near-infinite scalability, consistently low latency, and a very predictable and manageable cost structure.
But the real key has been continual measurement, iteration, and optimization. By instrumenting our system well and consistently analyzing our metrics and logs, we've been able to steadily improve performance while keeping costs under control.
If you're embarking on a similar journey, I hope this deep dive has provided some valuable food for thought. While the specifics of your solution may look different, the general principles of decoupling, replication, careful resource sizing, and continuous optimization are widely applicable.
Wishing you all the best in your own API scaling adventures!