Cutting AWS Costs Without Cutting Performance

Category: Cloud Published: March 28, 2026 Read time: 8 min

Every AWS audit we have run in the last two years has produced at least 30% in savings without any performance impact. The savings are not exotic. They come from applying a short list of boring optimizations consistently. Here they are.

Before you touch a single instance, get your visibility right. AWS Cost Explorer (free) and the Cost and Usage Report (CUR) loaded into Athena or QuickSight are the two tools we live in during the first week of any engagement. They tell you where the money actually goes — and the answer is rarely where the team thinks it is. In a typical mid-size account we find that three or four line items (EC2, RDS, NAT Gateway data processing, and S3 plus egress) account for roughly 80% of the bill. Optimize those first and the rest is noise. This discipline of measuring, attributing, then cutting is the core of FinOps, and it is exactly how our cloud and DevOps team approaches every account we inherit.
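Finding those three or four line items takes only a short script over Cost Explorer data. The sketch below assumes you have already called the Cost Explorer `get_cost_and_usage` API through boto3, grouped by the `SERVICE` dimension. The helper names are our own, and the 80% cut-off simply mirrors the rule of thumb in the paragraph above.

```python
from decimal import Decimal

# Fetching the input (requires boto3 and AWS credentials; not executed here):
#   ce = boto3.client("ce")
#   response = ce.get_cost_and_usage(
#       TimePeriod={"Start": "2026-02-01", "End": "2026-03-01"},
#       Granularity="MONTHLY",
#       Metrics=["UnblendedCost"],
#       GroupBy=[{"Type": "DIMENSION", "Key": "SERVICE"}],
#   )
# Large accounts may need to follow NextPageToken; this sketch ignores paging.


def costs_by_service(response: dict) -> dict:
    """Sum UnblendedCost per service across every period in the response."""
    totals = {}
    for period in response["ResultsByTime"]:
        for group in period.get("Groups", []):
            service = group["Keys"][0]
            amount = Decimal(group["Metrics"]["UnblendedCost"]["Amount"])
            totals[service] = totals.get(service, Decimal(0)) + amount
    return totals


def top_line_items(totals: dict, share: float = 0.8) -> list:
    """Smallest set of services, largest first, whose combined cost
    reaches `share` of the total bill."""
    target = sum(totals.values(), Decimal(0)) * Decimal(str(share))
    picked, running = [], Decimal(0)
    for service, cost in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
        if running >= target:
            break
        picked.append(service)
        running += cost
    return picked
```

Grouping by `SERVICE` folds NAT Gateway and EBS charges into "EC2 - Other". To split that bucket further, regroup by the `USAGE_TYPE` dimension.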

1. Right-size EC2 and RDS

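The majority of EC2 and RDS instances we audit are running at under 15% CPU. Move them down one size and watch the bill drop: within a family, each step down roughly halves the on-demand price. AWS Compute Optimizer, which analyzes CloudWatch metrics, generates the recommendations at no additional charge.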

The practical workflow looks like this:

Enable AWS Compute Optimizer and let it gather at least 14 days of metrics so recommendations reflect real peaks, not a quiet weekend.
For each over-provisioned instance, drop one size at a time (for example, m5.2xlarge to m5.xlarge) rather than two. A single step already halves vCPU and RAM. Jumping two sizes quarters them in one move, and that is how you cause an incident.
For RDS, check the buffer cache hit ratio and provisioned IOPS before shrinking, because a memory-bound database needs RAM more than vCPU. Separately, moving storage from io1 to gp3 can cut storage cost by around 20% at the same IOPS and throughput, as long as the workload fits within gp3's limits.
Schedule the change during a low-traffic window and watch p99 latency for 48 hours before declaring victory.
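The one-size-at-a-time rule is easy to encode as a guard in your own tooling. This is a minimal sketch that makes three assumptions:

- You already have a peak CPU figure, for example the CloudWatch `Maximum` of `CPUUtilization` over the 14-day window.
- Halving an instance roughly doubles its CPU utilization. This is a simplification.
- The 70% ceiling and the size ladder are illustrative choices of ours, not AWS guidance.

```python
from typing import Optional

# Hypothetical ladder: sizes where each step doubles vCPU and memory.
# Non-doubling sizes (12xlarge, 24xlarge, metal) are deliberately unsupported.
SIZE_LADDER = ["large", "xlarge", "2xlarge", "4xlarge", "8xlarge", "16xlarge"]


def one_size_down(instance_type: str, peak_cpu_pct: float,
                  max_projected_pct: float = 70.0) -> Optional[str]:
    """Suggest the next smaller size, or None if unsafe or unknown.

    Assumes halving the size doubles CPU utilization. Memory, network,
    and burst behavior still have to be checked separately.
    """
    family, _, size = instance_type.partition(".")
    if size not in SIZE_LADDER:
        return None  # size this sketch does not know how to step down
    index = SIZE_LADDER.index(size)
    if index == 0:
        return None  # already at the smallest size on the ladder
    if peak_cpu_pct * 2 > max_projected_pct:
        return None  # projected peak after halving would be too high
    return f"{family}.{SIZE_LADDER[index - 1]}"
```

For example, `one_size_down("m5.2xlarge", 20.0)` suggests `m5.xlarge`, while a 40% peak returns `None`, because halving would project an 80% peak.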

2. Move to Graviton where possible

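Graviton (arm64) instances are typically around 20% cheaper than comparable x86 instances and often faster for common workloads. Most modern Node.js, Python, and Java services run on Graviton without code changes, provided their native dependencies and container images are built for arm64.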
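One practical pre-migration check is whether each container image publishes a linux/arm64 variant. The sketch below reads the JSON that `docker manifest inspect <image>` prints for a multi-architecture image (a manifest list or OCI image index). The function name is our own. A single-architecture manifest has no `manifests` list, so it returns `False` and needs a manual check.

```python
import json


def supports_arm64(manifest_json: str) -> bool:
    """True if a manifest list / OCI image index includes a linux/arm64 image."""
    doc = json.loads(manifest_json)
    for entry in doc.get("manifests", []):
        platform = entry.get("platform", {})
        if platform.get("os") == "linux" and platform.get("architecture") == "arm64":
            return True
    return False  # single-arch manifest, or no arm64 entry in the index
```

Running this across every image in your registry before the move surfaces the services that still need a multi-arch build.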

3. Savings Plans and Reserved Instances

If you have any steady baseline usage, a 1-year Compute Savings Plan at no upfront saves 25–30% immediately. This is the single highest-leverage action in any audit.

The mental model that keeps teams out of trouble: treat your usage as a layer cake.

- The bottom layer is the steady baseline that runs 24/7 every month. Commit that to Savings Plans or Reserved Instances.
- The middle layer is predictable daytime or business-hours load. Cover part of it, but stay conservative.
- The top layer is spiky, seasonal, or experimental capacity. Keep that on On-Demand or Spot.

A useful rule of thumb is to commit to roughly 70–80% of your trailing baseline, never 100%. That way a workload migration or a deprecated service never leaves you paying for a commitment you no longer use.
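The 70–80% rule can be turned into a back-of-the-envelope number. This sketch makes three assumptions:

- Hourly on-demand-equivalent compute spend is exported from Cost Explorer or the CUR.
- The baseline is a low percentile of that spend, which ignores a few anomalous hours.
- `sp_discount` is a hypothetical average Savings Plan discount.

The discount matters because Savings Plan commitments are expressed in dollars per hour at Savings Plan rates, not at on-demand rates. Cost Explorer also produces its own Savings Plans purchase recommendations, so treat this as a sanity check.

```python
import math
from typing import Sequence


def baseline_hourly_spend(hourly_on_demand: Sequence[float], percentile: float = 10.0) -> float:
    """Nearest-rank percentile of hourly spend: the floor that runs 24/7,
    ignoring a handful of outage or deploy-gap hours."""
    if not hourly_on_demand:
        raise ValueError("need at least one hour of data")
    ordered = sorted(hourly_on_demand)
    rank = max(1, math.ceil(percentile * len(ordered) / 100))
    return ordered[rank - 1]


def recommended_commitment(hourly_on_demand: Sequence[float], coverage: float = 0.75,
                           sp_discount: float = 0.25) -> float:
    """Hourly commitment (at Savings Plan rates) covering `coverage` of the baseline.

    coverage: 0.70-0.80 per the rule of thumb; never 1.0.
    sp_discount: assumed average discount versus on-demand (hypothetical default).
    """
    if not 0 < coverage <= 1:
        raise ValueError("coverage must be in (0, 1]")
    if not 0 <= sp_discount < 1:
        raise ValueError("sp_discount must be in [0, 1)")
    baseline = baseline_hourly_spend(hourly_on_demand)
    # Convert the on-demand baseline into what that usage costs at SP rates.
    return baseline * coverage * (1 - sp_discount)
```

For example, with a $10/hour on-demand floor, 80% coverage, and an assumed 25% discount, the sketch suggests committing about $6/hour.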

Compute Savings Plans: flexible across EC2, Fargate, and Lambda, and across instance family and Region. This is our default for almost everyone.
EC2 Instance Savings Plans: a slightly deeper discount in exchange for locking to an instance family in a single Region.
Reserved Instances: still the right tool for RDS, ElastiCache, OpenSearch, and Redshift, which Compute and EC2 Instance Savings Plans do not cover.

Only commit to what you are confident will run for 12+ months. Leave burst capacity on on-demand pricing. A 3-year all-upfront plan looks tempting on the spreadsheet, but in a fast-moving startup the architecture you commit to today is rarely the one you run in two years.

4. Kill idle resources

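Start with the usual suspects:

Unattached EBS volumes (state "available") left behind when their instances were terminated
Old RDS snapshots and EBS snapshots past retention
Elastic IPs not attached to a running resource
NAT Gateways in dev/staging that sit idle overnight (they cannot be stopped, only deleted and recreated, so automate both steps)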
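Most of this list can be found by filtering the responses of the EC2 `describe_volumes`, `describe_addresses`, and `describe_snapshots` APIs. The sketch below works on the dictionaries boto3 returns for those calls, so it runs without AWS access. The helper names and the retention window are our assumptions. Review the output before deleting anything: deletion is irreversible, and a snapshot that backs a registered AMI cannot be deleted until the AMI is deregistered.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Inputs would come from (requires boto3; not executed here):
#   ec2 = boto3.client("ec2")
#   ec2.describe_volumes(), ec2.describe_addresses(),
#   ec2.describe_snapshots(OwnerIds=["self"])


def unattached_volumes(describe_volumes_response: dict) -> list:
    """EBS volumes in the 'available' state are not attached to any instance."""
    return [v["VolumeId"] for v in describe_volumes_response["Volumes"]
            if v["State"] == "available"]


def unassociated_eips(describe_addresses_response: dict) -> list:
    """Elastic IPs with no AssociationId are not attached to anything."""
    return [a["PublicIp"] for a in describe_addresses_response["Addresses"]
            if "AssociationId" not in a]


def expired_snapshots(describe_snapshots_response: dict, retention_days: int,
                      now: Optional[datetime] = None) -> list:
    """Snapshots whose StartTime (timezone-aware in boto3) is older than retention."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=retention_days)
    return [s["SnapshotId"] for s in describe_snapshots_response["Snapshots"]
            if s["StartTime"] < cutoff]
```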

5. S3 lifecycle policies

Move data older than 30 days to Standard-IA, older than 90 to Glacier Instant Retrieval, and truly archival to Glacier Deep Archive. Set it once in Terraform and forget.
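We set this rule in Terraform (the `aws_s3_bucket_lifecycle_configuration` resource). The sketch below expresses the same tiering in Python, as the `LifecycleConfiguration` structure that boto3's `put_bucket_lifecycle_configuration` accepts. The 365-day Deep Archive threshold and the rule ID are illustrative assumptions. S3 rejects transitions to Standard-IA earlier than 30 days after object creation, so the function checks for that.

```python
def tiering_lifecycle(prefix: str = "", ia_days: int = 30, glacier_ir_days: int = 90,
                      deep_archive_days: int = 365) -> dict:
    """Build an age-based tiering rule for S3 PutBucketLifecycleConfiguration.

    deep_archive_days=365 is a hypothetical cut-off for "truly archival" data.
    """
    if ia_days < 30:
        raise ValueError("S3 does not transition to STANDARD_IA before 30 days")
    if not ia_days < glacier_ir_days < deep_archive_days:
        raise ValueError("transition days must strictly increase")
    return {
        "Rules": [
            {
                "ID": "tier-by-age",          # hypothetical rule name
                "Filter": {"Prefix": prefix},  # empty prefix = whole bucket
                "Status": "Enabled",
                "Transitions": [
                    {"Days": ia_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_ir_days, "StorageClass": "GLACIER_IR"},
                    {"Days": deep_archive_days, "StorageClass": "DEEP_ARCHIVE"},
                ],
            }
        ]
    }

# Applying it (requires boto3; bucket name is hypothetical):
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="example-bucket", LifecycleConfiguration=tiering_lifecycle())
```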

6. CloudWatch log retention

The default retention for CloudWatch Logs is "Never Expire." On a busy system this gets expensive fast. Set retention to 30–90 days for app logs and ship long-term logs to S3.
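A log group set to "Never Expire" simply has no `retentionInDays` field in the CloudWatch Logs `describe_log_groups` response. That makes the offenders easy to list. The sketch below works on that response shape. The helper name and the 30-day default are our choices. Retention must be one of the fixed periods the `PutRetentionPolicy` API accepts.

```python
# Retention periods accepted by CloudWatch Logs PutRetentionPolicy.
VALID_RETENTION_DAYS = {1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545,
                        731, 1096, 1827, 2192, 2557, 2922, 3288, 3653}


def retention_fixes(describe_log_groups_response: dict, days: int = 30) -> list:
    """Return (logGroupName, days) for every group still set to Never Expire."""
    if days not in VALID_RETENTION_DAYS:
        raise ValueError(f"{days} is not a retention period CloudWatch Logs accepts")
    return [(g["logGroupName"], days)
            for g in describe_log_groups_response["logGroups"]
            if "retentionInDays" not in g]  # absent key means Never Expire

# Applying the fixes (requires boto3; use the describe_log_groups paginator
# in real accounts):
#   logs = boto3.client("logs")
#   for name, d in retention_fixes(response):
#       logs.put_retention_policy(logGroupName=name, retentionInDays=d)
```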

7. NAT Gateway data transfer

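NAT Gateway per-GB processing charges are the silent killer of AWS bills. Use gateway VPC endpoints for S3 and DynamoDB, which carry no hourly or per-GB charge, and interface endpoints for ECR. Then audit which services genuinely need internet egress through NAT. Also make sure each private subnet routes to a NAT Gateway in its own Availability Zone, so traffic does not pay cross-AZ transfer on top of NAT processing.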
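The arithmetic is worth running before and after the change. The sketch below uses prices that are our assumption: roughly us-east-1 list prices of $0.045 per NAT Gateway hour and $0.045 per GB processed, with the usual 730-hour month. Verify both against the current VPC pricing page for your Region. Traffic that moves to a gateway endpoint drops the per-GB NAT charge entirely.

```python
HOURS_PER_MONTH = 730  # AWS pricing convention for a month


def nat_monthly_cost(gateways: int, gb_processed: float,
                     hourly: float = 0.045, per_gb: float = 0.045) -> float:
    """Hourly charge for each gateway plus per-GB processing (assumed prices)."""
    return gateways * HOURS_PER_MONTH * hourly + gb_processed * per_gb


def gateway_endpoint_savings(gb_to_s3_and_dynamodb: float, per_gb: float = 0.045) -> float:
    """S3/DynamoDB traffic moved to gateway endpoints stops paying NAT processing."""
    return gb_to_s3_and_dynamodb * per_gb
```

For example, three gateways processing 10,000 GB a month come to about $548.55. If 6,000 GB of that is S3 or DynamoDB traffic, gateway endpoints remove roughly $270. ECR stores image layers in S3, so an S3 gateway endpoint is needed alongside the ECR interface endpoints for image pulls to bypass NAT.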

8. Spot for non-critical workloads

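CI runners, batch jobs, and stateless background workers should run on Spot. Savings run 60–90% versus on-demand, and interruption rates are low for many instance families. Check the Spot Instance Advisor, and make sure the workload can tolerate the two-minute interruption notice.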
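Workers that run on Spot should watch for the interruption notice and stop taking new work when it appears. EC2 exposes the notice through instance metadata at `spot/instance-action`. That path returns 404 when nothing is scheduled, and a JSON document with `action` and `time` otherwise. The sketch below uses IMDSv2 and only the Python standard library. The function names and the 60-second token TTL are our choices.

```python
import json
import urllib.error
import urllib.request
from datetime import datetime, timezone
from typing import Optional

IMDS = "http://169.254.169.254/latest"


def fetch_instance_action(timeout: float = 2.0) -> Optional[str]:
    """Raw spot/instance-action document, or None if no interruption is scheduled."""
    token_req = urllib.request.Request(
        f"{IMDS}/api/token", method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"})
    token = urllib.request.urlopen(token_req, timeout=timeout).read().decode()
    req = urllib.request.Request(
        f"{IMDS}/meta-data/spot/instance-action",
        headers={"X-aws-ec2-metadata-token": token})
    try:
        return urllib.request.urlopen(req, timeout=timeout).read().decode()
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None  # no interruption scheduled
        raise


def seconds_until_interruption(document: str, now: Optional[datetime] = None) -> float:
    """Parse e.g. {"action": "terminate", "time": "2026-03-28T08:22:00Z"}."""
    action = json.loads(document)
    when = datetime.strptime(action["time"], "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return (when - now).total_seconds()
```

A worker loop might poll `fetch_instance_action()` every few seconds. When it returns a document, the worker finishes or checkpoints the current job and stops pulling from the queue.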

9. CloudFront + caching

Pushing static assets and cacheable API responses through CloudFront reduces both compute load and egress bandwidth cost. Measure your cache hit ratio — under 80% usually means the cache policy needs tuning.
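CloudFront's console reports cache statistics. You can also compute the ratio yourself from standard access logs, which record an `x-edge-result-type` per request. The sketch below reads field names from the `#Fields:` header instead of hard-coding column positions. It counts `Hit` and `RefreshHit` as hits and `Miss` as a miss, and ignores other result types such as `Error`. That definition is our assumption, and other reasonable definitions exist.

```python
from typing import Iterable


def cache_hit_ratio(log_lines: Iterable[str]) -> float:
    """Hit ratio from CloudFront standard log lines (tab-separated data rows)."""
    fields = None
    hits = misses = 0
    for line in log_lines:
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()
            continue
        if line.startswith("#") or not line.strip():
            continue  # #Version header or blank line
        if fields is None:
            raise ValueError("data row before #Fields header")
        row = dict(zip(fields, line.rstrip("\n").split("\t")))
        result = row.get("x-edge-result-type")
        if result in ("Hit", "RefreshHit"):
            hits += 1
        elif result == "Miss":
            misses += 1
    total = hits + misses
    return hits / total if total else 0.0
```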

10. Tag everything, then track it

Cost allocation tags + Cost Explorer let you see which service, environment, or team is spending what. You cannot optimize what you cannot attribute. Enforce tags via SCPs or Terraform policy.

At minimum, standardize on a small set of mandatory keys (Environment, Team, Service, and CostCenter) and reject any resource that ships without them. Activate the tags as cost allocation tags in the Billing console early, because by default activation does not apply to past usage. Then build an AWS Budgets alert per team so a runaway cost surfaces in Slack within hours instead of on next month's invoice. Once attribution is in place, monthly cost review becomes a ten-minute conversation instead of an archaeology project.
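Enforcement belongs in SCPs or Terraform policy. A small check is still handy in CI or in an audit script. The sketch below accepts tags either in the `[{"Key": ..., "Value": ...}]` list shape most AWS APIs return or as a plain dictionary. The required keys mirror the list above, and the helper name is our own. Empty values count as missing.

```python
REQUIRED_TAG_KEYS = frozenset({"Environment", "Team", "Service", "CostCenter"})


def missing_tags(tags) -> set:
    """Required tag keys that are absent or have an empty value."""
    if isinstance(tags, dict):
        pairs = tags.items()
    else:
        pairs = ((t["Key"], t.get("Value", "")) for t in tags)
    present = {key for key, value in pairs if value and str(value).strip()}
    return set(REQUIRED_TAG_KEYS) - present
```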

Don't forget the AI and data line items

The fastest-growing surprise on modern AWS bills is machine learning. SageMaker notebook instances and real-time inference endpoints left running overnight, idle GPU capacity, and unbatched model calls quietly add up. If you run training or inference at scale, batch your jobs, use Spot for training, scale endpoints to zero when idle, and right-size GPU instances the same way you would any other compute. Our AI and machine learning team treats inference cost as a first-class design constraint, not an afterthought — the cheapest token is the one you never had to compute twice.
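Batching and never computing the same request twice can be combined in a small wrapper. This is a sketch that assumes a hypothetical `infer_batch` function, which sends a list of inputs to your model in one call and returns the outputs in the same order. Caching is only safe when inference is deterministic for a given input.

```python
from typing import Callable, Dict, List, Optional, Sequence


def run_batched(requests: Sequence[str],
                infer_batch: Callable[[List[str]], List[str]],  # hypothetical model call
                batch_size: int = 32,
                cache: Optional[Dict[str, str]] = None) -> List[str]:
    """Deduplicate inputs, skip cached ones, and call the model in batches."""
    cache = {} if cache is None else cache
    # dict.fromkeys keeps first-seen order while dropping duplicates.
    pending = [r for r in dict.fromkeys(requests) if r not in cache]
    for start in range(0, len(pending), batch_size):
        batch = pending[start:start + batch_size]
        outputs = infer_batch(batch)
        if len(outputs) != len(batch):
            raise ValueError("infer_batch must return one output per input")
        cache.update(zip(batch, outputs))
    return [cache[r] for r in requests]
```

Passing the same `cache` dictionary across calls (or swapping it for a shared store) is what keeps repeated requests from reaching the GPU at all.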

Putting it all together

Do not try to do all ten at once. Run Cost Explorer for a week, pick the three biggest line items, and apply the relevant optimizations. Savings compound — every audit we run finds another 10% after the last one. The order we usually recommend:

Quick wins (week 1): delete idle resources, fix CloudWatch log retention, set S3 lifecycle rules. Zero risk, immediate savings.
Structural wins (weeks 2–4): right-size compute, add VPC endpoints, move eligible workloads to Graviton and Spot.
Commitment wins (month 2): once usage is stable and right-sized, buy Savings Plans against the new, lower baseline — never against the bloated old one.

The biggest mistake we see is buying a three-year commitment before right-sizing, which locks in the waste. Optimize the architecture first, then commit to what is left. If you want a second pair of eyes on your bill, our team is happy to talk — we will tell you the truth even when the answer is “there is nothing left to cut.”

Frequently Asked Questions

How much can AWS cost optimization typically save?

In our experience, a first audit of an account that has never been optimized usually finds 30 to 50 percent in savings without any performance impact. The biggest levers are right-sizing over-provisioned compute, buying Savings Plans against a stable baseline, and cutting NAT Gateway and data transfer waste. Accounts that are already well managed see smaller but still meaningful gains, often around 10 to 15 percent.

Will reducing AWS costs hurt application performance?

Done correctly, no. The optimizations that drive most of the savings are zero-risk: deleting idle resources, setting log retention, adding S3 lifecycle rules, and committing to Savings Plans. Right-sizing carries a little more risk, so we change one instance size at a time and watch p99 latency for 48 hours before continuing. The goal is to remove waste, not capacity you actually use.

What is the difference between Savings Plans and Reserved Instances?

Savings Plans give you a discount in exchange for committing to a steady dollar-per-hour spend, and Compute Savings Plans stay flexible across EC2, Fargate, and Lambda, plus instance family and Region. Reserved Instances lock you to a specific configuration but are still the right tool for services that Compute Savings Plans do not cover, such as RDS, ElastiCache, OpenSearch, and Redshift. For most teams a 1-year no-upfront Compute Savings Plan is the best starting point.

How long does an AWS cost audit take?

We collect at least two weeks of Cost Explorer and CloudWatch data so recommendations reflect real traffic peaks rather than quiet periods. The quick wins can be applied in the first week, structural changes like right-sizing and VPC endpoints over the following two to four weeks, and commitment purchases such as Savings Plans in month two once usage has stabilized at its new lower baseline.
