The Caching Strategy That Cut Our Client's AWS Bill by 60%

January 1, 2026 · ScaledByDesign
Tags: caching, aws, performance, infrastructure

$28K/Month on AWS for a Mid-Stage Startup

Our client was processing 2M requests per day with a straightforward stack: Next.js frontend, Node.js API, PostgreSQL database, S3 for assets. Their AWS bill had crept from $4K to $28K over 18 months as traffic grew. The reflexive answer was "optimize the code." The actual answer was caching.

Where the Money Was Going

Monthly AWS breakdown (before):
  RDS (PostgreSQL):    $8,200  (29%)  ← database was the bottleneck
  EC2/ECS (compute):   $7,400  (26%)
  CloudFront + S3:     $4,100  (15%)
  ElastiCache:         $0      (0%)   ← no caching at all
  Data transfer:       $3,800  (14%)
  Other (monitoring):  $4,500  (16%)
  Total:               $28,000

The database was handling 12,000 queries per second. Most of them were identical reads being repeated thousands of times per hour.

The Caching Pyramid

Layer 1: Browser Cache (free, immediate)
  ├── Static assets: Cache for 1 year (immutable hashes)
  ├── API responses: Cache for 60-300 seconds (stale-while-revalidate)
  └── Impact: ~30% of requests never hit your servers

Layer 2: CDN Cache (CloudFront/Vercel Edge)
  ├── HTML pages: Cache for 60 seconds + stale-while-revalidate
  ├── API responses: Cache for 30-300 seconds by route
  └── Impact: ~50% of remaining requests never hit origin

Layer 3: Application Cache (Redis/ElastiCache)
  ├── Database query results: Cache for 60-3600 seconds
  ├── Computed values: Cache for hours/days
  └── Impact: ~80% of database queries eliminated

Layer 4: Database Query Optimization
  ├── Only queries that MUST hit the database reach it
  └── Impact: Remaining queries are fast and efficient
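
The layers compound, because each one only sees the traffic the layer above let through. A minimal sketch of that arithmetic, using the approximate hit rates quoted in the pyramid (the function name `fractionReachingOrigin` is illustrative and not taken from the client's codebase):

```typescript
// Fraction of traffic that survives a stack of cache layers.
// Each entry is the share of *incoming* requests that the layer absorbs.
function fractionReachingOrigin(layerHitRates: number[]): number {
  return layerHitRates.reduce(
    (remaining, hitRate) => remaining * (1 - hitRate),
    1
  );
}

// Browser cache absorbs ~30%; the CDN absorbs ~50% of what is left.
const reachingOrigin = fractionReachingOrigin([0.3, 0.5]);
console.log(`${(reachingOrigin * 100).toFixed(0)}% of requests reach the origin`);
```

With these figures roughly 35% of requests still reach the origin, and the Redis layer then removes most of the database work those requests would have caused.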

Layer 1: Browser Cache Headers

// Set proper cache headers for different content types
 
// Static assets (JS, CSS, images with hashed filenames)
// Cache forever — the hash changes when content changes
res.setHeader("Cache-Control", "public, max-age=31536000, immutable");
 
// API responses that change infrequently (product catalog)
res.setHeader(
  "Cache-Control",
  "public, max-age=60, stale-while-revalidate=300"
);
// Fresh for 60s; for the next 300s a stale copy is served
// while the cache revalidates in the background
 
// User-specific data (cart, account)
res.setHeader("Cache-Control", "private, no-store");
// Never stored by browser or CDN; always fresh
// (no-cache would still allow storage, with revalidation on every use)
 
// HTML pages
res.setHeader(
  "Cache-Control",
  "public, max-age=0, s-maxage=60, stale-while-revalidate=300"
);
// Browser always checks, CDN caches for 60s

Impact: about 30% of requests eliminated. The browser serves them from its local cache with no network round trip.
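
One way to keep these headers consistent is to derive them from a single lookup table instead of setting them ad hoc in each handler. The sketch below assumes the route prefixes used elsewhere in this article; `ContentClass`, `classify`, and `cacheControlFor` are illustrative names, not part of any framework. It uses `no-store` for user data, since that directive forbids storing the response at all, and it treats any unrecognized `/api/` route as user-specific so that new endpoints stay uncacheable until someone opts them in.

```typescript
// Hypothetical content classes; the names are illustrative.
type ContentClass = "static-asset" | "catalog-api" | "user-data" | "html";

const CACHE_CONTROL: Record<ContentClass, string> = {
  "static-asset": "public, max-age=31536000, immutable",
  "catalog-api": "public, max-age=60, stale-while-revalidate=300",
  "user-data": "private, no-store",
  "html": "public, max-age=0, s-maxage=60, stale-while-revalidate=300",
};

// Classify a request path by prefix (assumed route layout).
function classify(path: string): ContentClass {
  if (path.startsWith("/_next/static/") || path.startsWith("/images/")) {
    return "static-asset";
  }
  if (path.startsWith("/api/products/") || path.startsWith("/api/categories/")) {
    return "catalog-api";
  }
  // Safe default: any other API route is treated as user-specific.
  if (path.startsWith("/api/")) return "user-data";
  return "html";
}

function cacheControlFor(path: string): string {
  return CACHE_CONTROL[classify(path)];
}
```

A handler would then call `res.setHeader("Cache-Control", cacheControlFor(req.url))`, which keeps the policy in one reviewable place.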

Layer 2: CDN Edge Caching

CloudFront behaviors configured per path:

/api/products/*      → Cache 5 min, vary by query string
/api/categories/*    → Cache 1 hour, vary by nothing
/api/search*         → Cache 2 min, vary by full query string
/api/cart/*          → No cache (user-specific)
/api/user/*          → No cache (user-specific)
/_next/static/*      → Cache 1 year (immutable)
/images/*            → Cache 1 year (immutable, transformed)
/*.html              → Cache 60s, stale-while-revalidate 5 min

Impact: 50% of remaining requests served from the CDN edge, so the origin sees about half of the traffic that gets past the browser cache.
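
"Vary by" decides what goes into the cache key, and that drives the hit ratio: the fewer distinct keys a route produces, the more often the edge already has the answer. The sketch below is a model of that idea, not a description of how CloudFront works internally. `EdgeRule` and `edgeCacheKey` are hypothetical names, and the rules are transcribed from the table above. It sorts query parameters explicitly; whether your CDN normalizes parameter order on its own is worth checking rather than assuming.

```typescript
type Vary = "none" | "query";

interface EdgeRule {
  prefix: string;     // path prefix the rule applies to
  ttlSeconds: number; // 0 means "do not cache"
  vary: Vary;         // whether the query string is part of the cache key
}

// First matching rule wins.
const RULES: EdgeRule[] = [
  { prefix: "/api/products/", ttlSeconds: 300, vary: "query" },
  { prefix: "/api/categories/", ttlSeconds: 3600, vary: "none" },
  { prefix: "/api/search", ttlSeconds: 120, vary: "query" },
  { prefix: "/api/cart/", ttlSeconds: 0, vary: "none" },
  { prefix: "/api/user/", ttlSeconds: 0, vary: "none" },
];

// Returns the cache key for a URL, or null when no rule matches
// or the route must not be cached.
function edgeCacheKey(rawUrl: string): string | null {
  const url = new URL(rawUrl, "https://example.com");
  const rule = RULES.find((r) => url.pathname.startsWith(r.prefix));
  if (!rule || rule.ttlSeconds === 0) return null;
  if (rule.vary === "none") return url.pathname;

  // Sort so ?q=shoes&page=2 and ?page=2&q=shoes share one cache entry.
  url.searchParams.sort();
  const qs = url.searchParams.toString();
  return qs ? `${url.pathname}?${qs}` : url.pathname;
}
```

Under this model a tracking parameter appended to `/api/categories/all` does not create a new cache entry, while two search URLs that differ only in parameter order collapse into one.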

Layer 3: Application Cache (Redis)

This is where the biggest savings happen:

// Generic cache-aside wrapper; entries expire via TTL
// (`redis` is assumed to be an ioredis-style client, `db` a pg-style pool)
async function cached<T>(
  key: string,
  ttlSeconds: number,
  fetcher: () => Promise<T>
): Promise<T> {
  // Check Redis first; a miss comes back as null
  const cachedValue = await redis.get(key);
  if (cachedValue !== null) return JSON.parse(cachedValue) as T;
 
  // Cache miss — fetch from database
  const value = await fetcher();
 
  // Store in Redis with TTL
  await redis.setex(key, ttlSeconds, JSON.stringify(value));
 
  return value;
}
 
// Usage: Product catalog (changes rarely)
async function getProduct(id: string) {
  return cached(`product:${id}`, 3600, async () => {
    return db.query("SELECT * FROM products WHERE id = $1", [id]);
  });
}
 
// Usage: Category listing (changes daily)
async function getCategories() {
  return cached("categories:all", 1800, async () => {
    return db.query("SELECT * FROM categories ORDER BY sort_order");
  });
}
 
// Usage: Search results (changes frequently but can be stale for 60s)
async function searchProducts(query: string, page: number) {
  const cacheKey = `search:${query}:${page}`;
  return cached(cacheKey, 60, async () => {
    return db.query("SELECT * FROM products WHERE ...", [query]);
  });
}
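
Cache keys built from raw user input tend to fragment the cache: "Shoes", " shoes " and "shoes" would each get their own entry and their own database query. A small sketch of normalizing the key first; `searchCacheKey` is a hypothetical helper, and lowercasing assumes the underlying search is case-insensitive.

```typescript
// Build a cache key for a search request.
// Normalizing first means "Shoes", " shoes " and "shoes" share one entry.
function searchCacheKey(query: string, page: number): string {
  const normalized = query.trim().toLowerCase().replace(/\s+/g, " ");
  // encodeURIComponent escapes ":" and other separators in user input,
  // so they cannot collide with the key's own ":" delimiters.
  return `search:${encodeURIComponent(normalized)}:${page}`;
}
```

Passing the result as the `cacheKey` in a function like `searchProducts` keeps the rest of the wrapper unchanged.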

Cache Invalidation

// When data changes, invalidate affected cache keys
async function updateProduct(id: string, data: ProductUpdate) {
  await db.query("UPDATE products SET ... WHERE id = $1", [id, ...]);
 
  // Invalidate specific product cache
  await redis.del(`product:${id}`);
 
  // Invalidate category listing (product might affect it)
  await redis.del("categories:all");
 
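  // Invalidate search cache (pattern delete).
  // SCAN walks the keyspace in batches; KEYS would block Redis
  // while it examines every key.
  let cursor = "0";
  do {
    const [next, keys] = await redis.scan(cursor, "MATCH", "search:*", "COUNT", 500);
    if (keys.length > 0) await redis.del(...keys);
    cursor = next;
  } while (cursor !== "0");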
}
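
Pattern deletes get more expensive as the number of search keys grows. One common alternative, which is not what the code above does, is to embed a version number in every search key and bump it on writes. The sketch uses a tiny in-memory stand-in (`MemoryStore`, hypothetical) so it runs without a server; with a real client, `get` and `incr` would map to the Redis GET and INCR commands.

```typescript
// In-memory stand-in for Redis so the sketch is self-contained.
class MemoryStore {
  private data = new Map<string, string>();

  async get(key: string): Promise<string | null> {
    return this.data.get(key) ?? null;
  }

  async incr(key: string): Promise<number> {
    const next = Number(this.data.get(key) ?? "0") + 1;
    this.data.set(key, String(next));
    return next;
  }
}

const store = new MemoryStore();

// Every search key embeds the current namespace version.
async function versionedSearchKey(query: string, page: number): Promise<string> {
  const version = (await store.get("search:version")) ?? "0";
  return `search:v${version}:${query}:${page}`;
}

// Bumping the version orphans every existing search key in one operation.
// Orphaned entries are never read again and age out via their TTL.
async function invalidateAllSearches(): Promise<void> {
  await store.incr("search:version");
}
```

The trade-off is one extra read per lookup to fetch the version, in exchange for invalidation that costs the same regardless of how many search entries exist.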

Impact: Database queries dropped from 12,000/sec to 2,400/sec. 80% reduction.

The Results

Monthly AWS breakdown (after):
  RDS (PostgreSQL):    $3,200  (-61%)  ← downsized instance
  EC2/ECS (compute):   $3,100  (-58%)  ← fewer instances needed
  CloudFront + S3:     $2,400  (-41%)  ← better cache hit ratio
  ElastiCache (Redis): $1,200  (new)   ← small Redis instance
  Data transfer:       $800    (-79%)  ← CDN serves most traffic
  Other (monitoring):  $300    (-93%)
  Total:               $11,000 (-61%)

  Monthly savings: $17,000
  Annual savings: $204,000
  Implementation cost: ~$15,000 (2 weeks of engineering)
  ROI: 13.6x in year one
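
The ROI figure follows directly from the two totals and the implementation cost. Spelled out as a short calculation (the variable names are illustrative, and the payback period is derived here from the same numbers):

```typescript
const monthlyBefore = 28000;
const monthlyAfter = 11000;
const implementationCost = 15000;

const monthlySavings = monthlyBefore - monthlyAfter;       // 17,000
const annualSavings = monthlySavings * 12;                 // 204,000
const firstYearRoi = annualSavings / implementationCost;   // 13.6
const paybackMonths = implementationCost / monthlySavings; // under one month

console.log({ monthlySavings, annualSavings, firstYearRoi, paybackMonths });
```

On these numbers the work pays for itself in less than a month of reduced AWS spend.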

Performance Improvement (Bonus)

API response time:
  Before: 340ms (p50), 1,200ms (p99)
  After:  12ms (p50, cache hit), 180ms (p99, cache miss)

Page load time:
  Before: 2.8s
  After:  0.9s

Database CPU utilization:
  Before: 78% average (spikes to 95%)
  After:  22% average (spikes to 45%)

Common Caching Mistakes

❌ Caching everything with the same TTL
   → Different data needs different freshness guarantees

❌ No cache invalidation strategy
   → Stale data is worse than slow data

❌ Caching user-specific data in shared cache
   → User A sees User B's cart (security incident)

❌ Not monitoring cache hit rates
   → You don't know if caching is actually working

❌ Cache stampede on expiration
   → 1,000 requests hit the DB simultaneously when cache expires
   → Fix: Use stale-while-revalidate or cache locking
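
The "cache locking" fix for a stampede can be sketched as request coalescing: concurrent callers asking for the same key share one in-flight fetch instead of each hitting the database. `singleFlight` is a hypothetical helper, and this version only deduplicates within a single Node.js process; a fleet of servers would additionally need a shared lock, for example in Redis.

```typescript
// Fetches currently in progress, keyed by cache key.
const inFlight = new Map<string, Promise<unknown>>();

// Coalesce concurrent loads of the same key into a single fetch.
async function singleFlight<T>(
  key: string,
  fetcher: () => Promise<T>
): Promise<T> {
  const pending = inFlight.get(key);
  if (pending) return pending as Promise<T>;

  // Remove the entry once the fetch settles, whether it succeeds or fails,
  // so the next caller after that triggers a fresh load.
  const promise = fetcher().finally(() => inFlight.delete(key));
  inFlight.set(key, promise);
  return promise;
}
```

Wrapping the `fetcher` passed to a cache-aside helper in `singleFlight` means an expired hot key causes one database query per process rather than one per waiting request.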

Implementation Priority

Week 1: Browser cache headers (free, immediate impact)
Week 2: CDN configuration (CloudFront/Vercel behaviors)
Week 3: Redis for top 10 most-hit database queries
Week 4: Cache invalidation and monitoring dashboard

Total effort: 4 weeks of focused engineering
Expected savings: 40-60% of current infrastructure costs
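
The Week 4 monitoring work can start very small. A minimal sketch of hit-rate tracking, assuming you can call it from whatever cache wrapper you use; `CacheStats` is a hypothetical class, and a production setup would export these counters to a metrics system rather than keep them in memory.

```typescript
class CacheStats {
  private hits = 0;
  private misses = 0;

  // Call with true on a cache hit, false on a miss.
  record(hit: boolean): void {
    if (hit) this.hits += 1;
    else this.misses += 1;
  }

  // Hit rate in [0, 1]; 0 when nothing has been recorded yet.
  hitRate(): number {
    const total = this.hits + this.misses;
    return total === 0 ? 0 : this.hits / total;
  }
}

const stats = new CacheStats();
```

Tracking the rate per key prefix (products, categories, search) rather than globally makes it easier to see which TTLs are too short to be useful.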

Caching isn't glamorous. But it's the highest-ROI infrastructure investment most startups can make. Before you scale up your database, add more servers, or rewrite your application — add a caching layer. The math almost always works in your favor.
