

The Strangler Fig Migration That Saved a 10-Year-Old Monolith

February 11, 2026 · ScaledByDesign
migration · monolith · microservices · legacy

The Rewrite That Almost Killed the Company

A Series B e-commerce company came to us with a familiar story: their 10-year-old PHP monolith was "unmaintainable." The previous CTO had started a ground-up rewrite in Node.js. Eighteen months and $2M later, the rewrite was 40% complete, had zero production traffic, and the original monolith was still getting worse.

They were about to double down on the rewrite. We talked them out of it.

If you've ever inherited a legacy codebase and thought "we should just rewrite this from scratch," you're not alone. It's one of the most tempting — and most dangerous — decisions in software engineering. Let us show you what we did instead.

Why Big-Bang Rewrites Fail

Here's the uncomfortable truth: most ground-up rewrites of production systems fail or dramatically exceed their timeline. And the reasons aren't what you'd expect — they're structural, not technical:

  • The moving target problem: The old system keeps getting new features while you rewrite. Feature parity is a finish line that keeps moving.
  • Hidden complexity: That "ugly" legacy code handles thousands of edge cases discovered over a decade. Clean-room rewrites rediscover each one painfully.
  • Team fatigue: Rewrite projects feel exciting in month 1 and soul-crushing by month 12. No new features, no user impact, just endless catch-up.
  • The 80/20 trap: You can rewrite 80% of functionality in 20% of the time. The last 20% — the weird edge cases, integrations, and business rules — takes forever.

The Strangler Fig Pattern

So what's the alternative? It's called the strangler fig pattern, named after the tropical fig that grows around a host tree and eventually replaces it. (Nature's been doing incremental migrations for millions of years — we might as well learn from it.)

The idea is simple: you put a routing layer in front of the legacy system, then redirect traffic to new implementations one endpoint at a time. The old system keeps handling everything you haven't migrated yet.

The key insight — and this is what makes it psychologically different from a rewrite — is that you never turn off the old system in one step. There's no big-bang cutover. No "go/no-go" meeting at 2am on a Saturday.

Step 1: Put a Router in Front

Before writing a single line of new application code, we put an nginx reverse proxy in front of the PHP monolith:

upstream legacy {
  server php-monolith:80;
}
 
upstream new_api {
  server node-api:3000;
}
 
server {
  # Default: everything to legacy
  location / {
    proxy_pass http://legacy;
  }
 
  # Migrated endpoints go to new service
  # (empty for now — we add these incrementally)
}

At this point, nothing changed about how the system worked. The proxy was completely transparent. But it gave us something powerful: the ability to redirect any endpoint to a new service with a single config change. That's the foundation everything else builds on.
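To make that concrete, here's what a switch looks like once an endpoint has been migrated: one `location` block added to the same config. This is a sketch (the `/api/products` path reflects the catalog endpoint we migrated first):

```nginx
server {
  # Migrated: product catalog now served by the new Node service.
  # Among prefix locations, nginx picks the longest match, so this
  # block wins over the "/" default below.
  location /api/products {
    proxy_pass http://new_api;
  }

  # Default: everything else still goes to legacy
  location / {
    proxy_pass http://legacy;
  }
}
```

Rolling back is the same move in reverse: delete the block, reload nginx, and traffic returns to the monolith.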

Step 2: Identify the Migration Order

Here's where teams often go wrong — they start migrating the easiest endpoints, not the most valuable ones. We scored every endpoint on three dimensions:

Factor              Question                                      Weight
Business value      How critical is this endpoint to revenue?    40%
Change frequency    How often does this code change?             35%
Complexity          How tangled is this with other code?         25%

Priority matrix:

  • High value + High change + Low complexity = Migrate FIRST
  • High value + Low change + Any complexity = Migrate LATER
  • Low value + Any change + High complexity = Migrate LAST (or never)

The product catalog API was our first target: high traffic, changed weekly, and relatively self-contained.
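The scoring above is simple enough to mechanize. Here's a hypothetical sketch, not our actual tooling: the 1-to-5 ratings and endpoint names are invented, and complexity is inverted so that simpler endpoints score higher:

```javascript
// Hypothetical weighted scoring, mirroring the table above:
// business value 40%, change frequency 35%, complexity 25%.
// Each factor is rated 1-5 by the team; complexity is inverted
// because LOW complexity should push an endpoint toward "migrate first".
function migrationScore({ businessValue, changeFrequency, complexity }) {
  return (
    0.40 * businessValue +
    0.35 * changeFrequency +
    0.25 * (6 - complexity)
  );
}

// Illustrative endpoints; ratings are made up for the example
const endpoints = [
  { name: "/api/products", businessValue: 5, changeFrequency: 5, complexity: 2 },
  { name: "/api/admin", businessValue: 2, changeFrequency: 1, complexity: 5 },
];

// Highest score migrates first
endpoints.sort((a, b) => migrationScore(b) - migrationScore(a));
// endpoints[0].name is now "/api/products"
```

The point isn't the arithmetic; it's that writing the weights down forces the team to argue about priorities once, up front, instead of re-litigating them for every endpoint.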

Step 3: Build, Shadow, Switch

This is where it gets interesting. For each endpoint, we didn't just build and ship. We moved through three distinct phases — and the shadow phase is what saved us from countless production issues:

// Phase A: Build the new implementation
// New service reads from same database as legacy
app.get("/api/products/:id", async (req, res) => {
  const product = await productService.getById(req.params.id);
  res.json(product);
});
 
// Phase B: Shadow traffic — both handle requests, compare results
app.get("/api/products/:id", async (req, res) => {
  const [newResult, legacyResult] = await Promise.all([
    productService.getById(req.params.id),
    legacyProxy.get(`/api/products/${req.params.id}`),
  ]);
 
  if (!deepEqual(newResult, legacyResult)) {
    logger.warn("Response mismatch", {
      endpoint: `/api/products/${req.params.id}`,
      diff: generateDiff(legacyResult, newResult),
    });
  }
 
  // Still serve legacy response during shadow phase
  res.json(legacyResult);
});
 
// Phase C: Switch traffic to new service via nginx config
// One config change. Instant. Reversible. Zero downtime.

The shadow phase was the secret weapon. We ran it for 2 weeks per endpoint, comparing every response between old and new. And here's the fun part — most of the mismatches we caught? They were bugs in the legacy system that nobody knew about. The migration actually improved data quality as a side effect.
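One practical wrinkle: naive deep equality flags every timestamp and request ID as a mismatch and drowns you in noise. A minimal sketch of the kind of comparison helper we mean, with hypothetical volatile field names (`updatedAt`, `requestId` are illustrative):

```javascript
// Fields that legitimately differ between the two systems on every
// request (names here are hypothetical examples).
const VOLATILE_FIELDS = new Set(["updatedAt", "requestId"]);

// Recursively strip volatile fields and sort keys so that key
// ordering doesn't cause false mismatches.
function normalize(value) {
  if (Array.isArray(value)) return value.map(normalize);
  if (value && typeof value === "object") {
    const out = {};
    for (const key of Object.keys(value).sort()) {
      if (!VOLATILE_FIELDS.has(key)) out[key] = normalize(value[key]);
    }
    return out;
  }
  return value;
}

// Compare the two responses after normalization
function responsesMatch(legacyResult, newResult) {
  return JSON.stringify(normalize(legacyResult)) === JSON.stringify(normalize(newResult));
}
```

Every mismatch that survives normalization is worth a human look — it's either a bug in the new code or, as we found, a bug in the old.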

Step 4: Strangle Incrementally

With the pattern established, we settled into a rhythm. Over 8 months, we migrated endpoints in priority order:

Month 1: Product catalog API (read-only, high traffic)
Month 2: Search API (complex but self-contained)
Month 3: User authentication (needed modern JWT)
Month 4: Cart and checkout (highest business value)
Month 5: Order management (complex, many edge cases)
Month 6: Inventory sync (integration-heavy)
Month 7: Reporting and analytics (new data models)
Month 8: Admin tools (lowest priority, highest complexity)

Each migration followed build → shadow → switch. Each was independently deployable and reversible.

The Database Problem

Now, everything we've talked about so far is the easy part. Seriously. The hardest part of any migration isn't the code — it's the data. During the transition period, both systems need to read and write the same data without stepping on each other.

We used CDC (Change Data Capture) to keep databases in sync during the transition:

const cdcStream = createCDCStream({
  source: "legacy_postgres",
  tables: ["products", "categories", "inventory"],
  target: "new_postgres",
  transform: (record) => schemaMapper.transform(record),
  onConflict: "source_wins", // Legacy is truth until cutover
});

The hybrid approach added complexity, but it let us migrate databases one domain at a time — just like the code.
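The transform step in that stream is where schema drift gets absorbed. A hypothetical sketch of what a `schemaMapper.transform` might look like — all the column names here (`product_name`, `price_cents`) are invented for illustration, not from the real system:

```javascript
// Hypothetical mapper from the legacy row shape to the new schema.
const schemaMapper = {
  transform(record) {
    return {
      id: record.id,
      // Legacy used "product_name"; new schema uses "name"
      name: record.product_name,
      // Legacy stored integer cents; new schema stores a decimal string
      price: (record.price_cents / 100).toFixed(2),
    };
  },
};
```

Keeping the mapping in one pure function pays off twice: it's trivially unit-testable, and when the cutover comes, it documents exactly how the two schemas relate.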

The Results

So, did it work? After 8 months of incremental migration:

Metric                 Before           After
Deploy frequency       Every 2 weeks    Multiple times/day
Deploy time            45 minutes       3 minutes
Incidents/month        3-4              ~1
Developer onboarding   6 weeks          2 weeks
Rollbacks needed       N/A              3 (resolved in <5 min each)

Total cost: ~$400K over 8 months. The failed rewrite had already burned $2M with nothing to show for it.

Zero downtime during the entire migration. Not a single customer-facing outage caused by the migration itself.

When the Strangler Fig Doesn't Work

We're not going to pretend this is a silver bullet. It's not always the right choice, and we'd be doing you a disservice if we didn't say so:

Use the strangler fig when:

  • The legacy system is in production with real traffic
  • The business can't afford downtime or feature freezes
  • The system is large enough that a rewrite would take 6+ months
  • You can put a routing layer in front of the legacy system

Consider a rewrite when:

  • The system is small (under 10K lines of code)
  • No production traffic depends on it
  • The technology is truly dead (no security patches available)
  • You're changing the fundamental architecture, not just the language

What This Means for Your Legacy System

Here's the uncomfortable truth that nobody talks about: every system becomes legacy eventually. The PHP monolith we strangled was someone's clean, modern architecture 10 years ago. The Node.js services we built will be someone else's legacy in 2036. Probably sooner.

The goal isn't to build a system that never needs replacing — that's a fantasy. It's to build systems that can be replaced incrementally when the time comes. Clear boundaries. Well-defined APIs. Services that are independent enough to swap out one at a time.

The strangler fig doesn't just replace old code — it teaches you how to build systems that are replaceable by design. And honestly? That's a more valuable lesson than any technology choice you'll ever make.

