AWS Cost Levers That Actually Moved the Needle

Cutting ~35% off a multi-region AWS footprint with no capability loss — the levers in the order they paid back, best first.

May 5, 2026 · 6 min read · 679 words

Cut ~35% off a multi-region AWS footprint over two quarters, no capability loss, no "right-size your instances" nonsense. Here are the levers in the order they paid back, best first.

This is intentionally opinionated. If your workload is different, the order will shift. But the categories are almost always the same.

1. NAT Gateway traffic (the quiet killer)

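Every VPC with private subnets and a NAT gateway pays $0.045/GB (us-east-1 list price) on every byte the gateway processes, in either direction, on top of the gateway's hourly charge and standard data transfer. Most orgs discover their NAT gateway bill is 2–4x their EC2 bill and assume it's fine because "networking."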

Three moves:

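  • VPC endpoints for S3, DynamoDB, ECR, Secrets Manager, SSM. Gateway endpoints (S3 and DynamoDB only) are free; interface endpoints (everything else) are cheap, with a small hourly fee per AZ and a per-GB charge well below NAT's. Most orgs have one or two. Add the rest.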
  • VPC endpoints for CloudWatch Logs. This is the one people miss. Log shipping through NAT is brutal at scale.
  • ECR pull-through cache + interface endpoint. Container image pulls on deploy are a massive NAT line item on a busy cluster.

Pay-back: usually first billing cycle.
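
A quick way to find the gaps is to compare the endpoints a VPC already has against the services named above. What follows is a minimal sketch of that check, not a finished audit tool: the function names are mine, and `existing_endpoint_services` assumes boto3 is installed and the caller has read access to EC2.

```python
# Service-name suffixes for the endpoints discussed above. ECR needs two
# interface endpoints (API and Docker registry), and image layers are served
# from S3, so the S3 gateway endpoint matters for image pulls too.
GATEWAY_SERVICES = ["s3", "dynamodb"]
INTERFACE_SERVICES = ["ecr.api", "ecr.dkr", "secretsmanager", "ssm", "logs"]


def wanted_endpoints(region):
    """Full endpoint service names for one region, mapped to endpoint type."""
    wanted = {f"com.amazonaws.{region}.{s}": "Gateway" for s in GATEWAY_SERVICES}
    wanted.update(
        {f"com.amazonaws.{region}.{s}": "Interface" for s in INTERFACE_SERVICES}
    )
    return wanted


def missing_endpoints(region, existing_service_names):
    """Return {service_name: type} for wanted endpoints the VPC lacks."""
    existing = set(existing_service_names)
    return {
        name: kind
        for name, kind in wanted_endpoints(region).items()
        if name not in existing
    }


def existing_endpoint_services(vpc_id, region):
    """Fetch endpoint service names for a VPC (assumes boto3 + credentials)."""
    import boto3  # imported lazily so the pure functions work without it

    ec2 = boto3.client("ec2", region_name=region)
    paginator = ec2.get_paginator("describe_vpc_endpoints")
    names = []
    for page in paginator.paginate(
        Filters=[{"Name": "vpc-id", "Values": [vpc_id]}]
    ):
        names.extend(ep["ServiceName"] for ep in page["VpcEndpoints"])
    return names
```

Run `missing_endpoints(region, existing_endpoint_services(vpc_id, region))` per VPC and sort the result by how much traffic each service sees; the list of services here is only the set this post mentions, so extend it for your own workload.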

2. CloudWatch Logs retention & ingest

CloudWatch Logs is the AWS bill people ignore until it's the biggest line item. Ingest is $0.50/GB. Storage is $0.03/GB/month. Retention defaults to "never expire."

  • Set retention on every log group. Most don't need more than 30 days.
  • Compliance-required retention: ship to S3 Glacier / S3 IA, not CloudWatch. Your auditor doesn't care where it lives, only that it exists and is immutable.
  • Sample verbose application logs before ingest. A 90% sample on DEBUG/INFO is usually fine.
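
Two of these are easy to automate. The sketch below finds log groups with no retention policy and shows one way to sample low-severity records before they leave the process. It is illustrative only: the helper and class names are mine, `apply_default_retention` assumes boto3 and write access to CloudWatch Logs, and `keep_rate` is whatever fraction of DEBUG/INFO records you decide to keep.

```python
import logging
import random

DEFAULT_RETENTION_DAYS = 30  # the 30-day figure from the list above


def groups_without_retention(log_groups):
    """Names of log groups that never expire.

    `log_groups` has the shape boto3's describe_log_groups returns: dicts
    with "logGroupName" and, only when a policy is set, "retentionInDays".
    """
    return [g["logGroupName"] for g in log_groups if "retentionInDays" not in g]


def apply_default_retention(days=DEFAULT_RETENTION_DAYS):
    """Set a retention policy on every group lacking one (assumes boto3)."""
    import boto3  # imported lazily so the rest works without it

    logs = boto3.client("logs")
    for page in logs.get_paginator("describe_log_groups").paginate():
        for name in groups_without_retention(page["logGroups"]):
            logs.put_retention_policy(logGroupName=name, retentionInDays=days)


class SamplingFilter(logging.Filter):
    """Keep a random fraction of records below WARNING; always keep the rest."""

    def __init__(self, keep_rate, rng=None):
        super().__init__()
        self.keep_rate = keep_rate
        self.rng = rng or random.Random()

    def filter(self, record):
        if record.levelno >= logging.WARNING:
            return True
        return self.rng.random() < self.keep_rate
```

Attach the filter to the handler that ships to CloudWatch, for example `handler.addFilter(SamplingFilter(keep_rate=0.1))`, so local handlers can still see everything.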

3. Forgotten RDS snapshots, EBS snapshots, unused AMIs

These don't show up on anyone's dashboard because they're not running. They're just billed, forever.

  • AWS Backup lifecycle rules for anything with a backup policy.
  • Delete AMIs older than N versions — most teams keep 50+ and use 2.
  • Audit manual RDS snapshots annually. You'll find snapshots from employees who left two years ago.
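
The audit itself is mostly sorting and filtering. Here is a rough sketch under two assumptions of mine: the AMI list passed in belongs to a single image family, and the inputs have the shape boto3 returns from `describe_images` and `describe_db_snapshots`. The function names and default thresholds are illustrative, not recommendations.

```python
from datetime import datetime, timedelta, timezone


def amis_to_deregister(images, keep=2):
    """Return the images beyond the newest `keep`, newest first.

    Each image is a dict with "ImageId" and an ISO-8601 "CreationDate"
    string, which sorts correctly as plain text.
    """
    newest_first = sorted(images, key=lambda i: i["CreationDate"], reverse=True)
    return newest_first[keep:]


def stale_manual_snapshots(snapshots, max_age_days=365, now=None):
    """Manual RDS snapshots older than `max_age_days`.

    Each snapshot is a dict with "DBSnapshotIdentifier", "SnapshotType" and
    a timezone-aware "SnapshotCreateTime" datetime.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    return [
        s
        for s in snapshots
        if s["SnapshotType"] == "manual" and s["SnapshotCreateTime"] < cutoff
    ]
```

Keep in mind that deregistering an AMI does not, by default, delete the EBS snapshots behind it, so the cleanup has to remove those as well or the storage charge stays.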

4. Reserved Instances / Savings Plans — but honestly

Everyone recommends RIs/SPs first. I put them fourth because the previous three are bigger levers and don't lock you into a commit.

If you buy SPs, buy compute SPs, not EC2-specific ones. The flexibility is almost always worth the couple-percent lower discount. And don't buy 3-year commits on anything that might move to Graviton, Fargate, or serverless in the next year.
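
The risk in a long commit is easy to put numbers on. A commitment is billed in full whether or not you use it, so it only beats on-demand while the share of the commitment you actually consume stays above one minus the discount. The sketch below works that out; the 30% discount in the usage note is a made-up figure for illustration, not an AWS rate.

```python
def savings_vs_on_demand(discount, utilization):
    """Fractional saving from a commitment versus paying on-demand.

    discount:    commitment discount off on-demand, e.g. 0.30 for 30%.
    utilization: share of the committed usage actually consumed, in (0, 1].
    A negative result means the commitment costs more than on-demand would.
    """
    committed_cost = 1.0 - discount  # paid regardless of usage
    on_demand_cost = utilization     # what the consumed usage would cost uncommitted
    return (on_demand_cost - committed_cost) / on_demand_cost


def break_even_utilization(discount):
    """Utilization below which the commitment loses money."""
    return 1.0 - discount
```

At a hypothetical 30% discount, `savings_vs_on_demand(0.30, 1.0)` is a 30% saving, but if half the covered workload moves to something the plan does not cover, `savings_vs_on_demand(0.30, 0.5)` comes out at a 40% loss. That asymmetry is the argument for flexible plans and short terms.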

5. Dev/staging shutdown schedules

Surprisingly manual but surprisingly effective. Most non-prod environments run 168 hours/week and are used during ~50.

  • Instance Scheduler or a Lambda + EventBridge for EC2/RDS/ECS.
  • The hard part is political, not technical. Someone always has a reason their dev env needs 24/7. Make them write the reason down.
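
The arithmetic behind the 168-versus-50 figure, plus the decision a scheduler Lambda would make on each EventBridge tick, fits in a few lines. This is a sketch with a hypothetical weekday window of 08:00 to 18:00 local time, which happens to come to 50 hours; adjust it to when your teams actually work.

```python
from datetime import datetime

HOURS_PER_WEEK = 168
WORK_DAYS = range(0, 5)         # Monday=0 .. Friday=4
START_HOUR, STOP_HOUR = 8, 18   # hypothetical window: 08:00-18:00 local


def should_be_running(now):
    """True when `now` falls inside the working window."""
    return now.weekday() in WORK_DAYS and START_HOUR <= now.hour < STOP_HOUR


def weekly_hours_on():
    """Hours per week the environment stays up under this schedule."""
    return len(WORK_DAYS) * (STOP_HOUR - START_HOUR)


def compute_hours_saved_fraction(hours_on=None):
    """Share of weekly compute hours no longer billed."""
    hours_on = weekly_hours_on() if hours_on is None else hours_on
    return 1 - hours_on / HOURS_PER_WEEK
```

With 50 hours on, roughly 70% of the compute hours disappear. Only compute stops billing, though: EBS volumes and RDS storage are still charged while instances are stopped, and a stopped RDS instance is restarted automatically after seven days, so the scheduler has to stop it again.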

6. S3 lifecycle + Intelligent-Tiering

S3 Intelligent-Tiering is a no-brainer for any bucket where access patterns are unknown. Lifecycle rules are still better when you do know the access pattern.

  • Application logs → Glacier after 30d, delete after 1y.
  • Old builds and artifacts → lifecycle to delete.
  • Data lake raw zone → Intelligent-Tiering.
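
Expressed as an S3 lifecycle configuration, the three rules look roughly like this. The prefixes (`logs/`, `builds/`, `raw/`), the rule IDs, and the 90-day artifact expiry are placeholders of mine; in practice these often live in separate buckets, each with its own configuration.

```python
def lifecycle_configuration(artifact_days=90):
    """Build a lifecycle configuration dict in the shape S3 expects."""
    return {
        "Rules": [
            {
                # Application logs: Glacier after 30 days, delete after 1 year.
                "ID": "app-logs-glacier-30d-expire-1y",
                "Filter": {"Prefix": "logs/"},
                "Status": "Enabled",
                "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}],
                "Expiration": {"Days": 365},
            },
            {
                # Old builds and artifacts: delete after `artifact_days`.
                "ID": "old-builds-expire",
                "Filter": {"Prefix": "builds/"},
                "Status": "Enabled",
                "Expiration": {"Days": artifact_days},
            },
            {
                # Raw zone: hand objects to Intelligent-Tiering immediately.
                "ID": "raw-zone-intelligent-tiering",
                "Filter": {"Prefix": "raw/"},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": 0, "StorageClass": "INTELLIGENT_TIERING"}
                ],
            },
        ]
    }
```

With boto3 this would be applied via `s3.put_bucket_lifecycle_configuration(Bucket=bucket_name, LifecycleConfiguration=lifecycle_configuration())`. That call replaces the bucket's whole lifecycle configuration, so merge in any existing rules first.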

7. Stuff that looks like a lever but mostly isn't

  • Right-sizing EC2. Useful at the margin but usually a single-digit percentage unless you have a specific over-provisioning problem.
  • Spot instances. High variance, operationally expensive unless you've already built for it. Good for batch. Dangerous for anything with state.
  • Graviton migration. Real savings — 20% is the number I've seen on right-sized workloads, and more is possible on the right ones. But it requires per-workload validation. Worth planning for on a 12-month horizon, not a 30-day one. If you already have a mature build/test matrix across architectures, move it up.

The meta-lever

Per-team cost visibility. Until an engineering manager can see their team's AWS bill weekly and owns that number, nothing you do centrally will stick. The biggest cost reduction of all isn't any single item in the list above; it's the one that happens automatically when the team that spends the money is the team that sees the invoice.

Tag aggressively. For anything serious, query the Cost & Usage Report through Athena rather than clicking around the Cost Explorer UI. Send per-team weekly digests. Make the bill a first-class KPI for the people who can actually move it.
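
As a starting point for the weekly digest, here is a sketch of the Athena query and the roll-up. It assumes a `team` cost allocation tag, which appears in the CUR Athena table as `resource_tags_user_team`; the table name `cur_database.cur_table` and the function names are placeholders.

```python
from collections import defaultdict


def weekly_team_cost_query(
    table="cur_database.cur_table",        # placeholder table name
    tag_column="resource_tags_user_team",  # assumes a `team` tag
):
    """Athena SQL for unblended cost per week, team, and service."""
    return f"""
SELECT
  date_trunc('week', line_item_usage_start_date) AS week,
  COALESCE(NULLIF({tag_column}, ''), 'untagged') AS team,
  line_item_product_code AS service,
  SUM(line_item_unblended_cost) AS cost
FROM {table}
GROUP BY 1, 2, 3
ORDER BY week, cost DESC
""".strip()


def team_digest(rows):
    """Total cost per team, largest first.

    `rows` is an iterable of (team, service, cost) tuples for one week,
    for example the query results above filtered to a single week.
    """
    totals = defaultdict(float)
    for team, _service, cost in rows:
        totals[team] += cost
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
```

The `untagged` bucket is worth keeping visible in the digest: its size tells you how much of the bill nobody owns yet.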

That's the real lever.

Tags: AWS, Cloud, FinOps, DevOps