Event-Driven Architecture for the Rest of Us

January 6, 2026 · ScaledByDesign
architecture · events · backend · scaling

You're Probably Already Event-Driven (Badly)

If your system sends an email when an order is placed, updates inventory when a shipment ships, or syncs data when a customer signs up — congratulations, you have event-driven architecture. It's just hidden inside synchronous API calls and cron jobs that are slowly becoming unmaintainable.

Let's make it intentional.

The Problem With Synchronous Everything

Here's what a typical order placement looks like in a "synchronous-first" system:

// The monolithic order handler that does everything
async function placeOrder(orderData: OrderInput) {
  const order = await createOrder(orderData);        // 50ms
  await chargePayment(order);                         // 800ms
  await updateInventory(order);                       // 100ms
  await sendConfirmationEmail(order);                 // 300ms
  await notifyWarehouse(order);                       // 200ms
  await updateAnalytics(order);                       // 150ms
  await syncToCRM(order);                             // 400ms
  await triggerLoyaltyPoints(order);                  // 100ms
  return order; // Total: ~2100ms
}

Problems with this approach:

  1. Customer waits 2+ seconds for all downstream operations
  2. If email service is down, the entire order fails
  3. Adding a new integration means modifying the order handler
  4. Testing requires mocking 7 external services
  5. One slow service slows down everything

Why this costs you: One client had synchronous order processing with 8 downstream integrations. When their CRM provider had a 90-second timeout issue, every order took 90+ seconds to complete. Checkout abandonment spiked from 68% to 89% during the 45-minute incident. Orders lost: 680. Revenue impact: $67K. The CRM sync wasn't even customer-facing — it was internal analytics that could have been async.

The Event-Driven Version

// The order handler does ONE thing: create the order and emit an event
async function placeOrder(orderData: OrderInput) {
  const order = await createOrder(orderData);
  await chargePayment(order);
  
  // Emit the event — everything else happens asynchronously
  await eventBus.emit("order.placed", {
    orderId: order.id,
    customerId: order.customerId,
    total: order.total,
    items: order.items,
  });
 
  return order; // Total: ~850ms
}
 
// Separate handlers that react to the event
eventBus.on("order.placed", sendConfirmationEmail);
eventBus.on("order.placed", updateInventory);
eventBus.on("order.placed", notifyWarehouse);
eventBus.on("order.placed", updateAnalytics);
eventBus.on("order.placed", syncToCRM);
eventBus.on("order.placed", triggerLoyaltyPoints);

What changed:

  1. Customer waits < 1 second (only payment + order creation)
  2. If email is down, the order still succeeds — email retries later
  3. Adding a new integration = adding a new handler (no changes to order code)
  4. Each handler is testable in isolation
  5. Slow services don't block the critical path

The impact:

  • Order completion time: 2,100ms → 850ms (60% faster)
  • Checkout conversion improvement: 3-5% (every 100ms matters)
  • For a $3M/month business: $90K-150K annual revenue from faster checkout
  • Service failures no longer cascade: email down ≠ orders down
  • New integration time: 2 days → 4 hours (no coordination needed)

Choosing Your Event Infrastructure

You don't need Kafka. Seriously. Here's the decision tree:

Option 1: Database-Backed Queue (Simplest)

Best for: Teams of 1-5, < 1000 events/minute

CREATE TABLE events (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  type VARCHAR(255) NOT NULL,
  payload JSONB NOT NULL,
  status VARCHAR(50) DEFAULT 'pending',
  created_at TIMESTAMP DEFAULT NOW(),
  processed_at TIMESTAMP,
  retry_count INT DEFAULT 0,
  error TEXT
);
 
CREATE INDEX idx_events_pending 
  ON events (status, created_at) 
  WHERE status = 'pending';

A cron job or worker process polls for pending events and processes them. Simple, reliable, and you can inspect the queue with a SQL query.
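
A minimal worker sketch for this setup, assuming node-postgres, a hypothetical `handlers` map keyed by event type, and that the worker marks events as 'processing' and 'done':

// Hypothetical polling worker for the events table above
import { Pool } from "pg";

const pool = new Pool(); // reads PG* environment variables

const handlers: Record<string, (payload: unknown) => Promise<void>> = {
  "order.placed": async (payload) => { /* send email, sync CRM, ... */ },
};

async function processPendingEvents() {
  // Claim one pending event; SKIP LOCKED lets multiple workers run safely
  const { rows } = await pool.query(`
    UPDATE events SET status = 'processing'
    WHERE id = (
      SELECT id FROM events
      WHERE status = 'pending'
      ORDER BY created_at
      FOR UPDATE SKIP LOCKED
      LIMIT 1
    )
    RETURNING id, type, payload
  `);
  if (rows.length === 0) return;

  const event = rows[0];
  try {
    await handlers[event.type]?.(event.payload);
    await pool.query(
      "UPDATE events SET status = 'done', processed_at = NOW() WHERE id = $1",
      [event.id]
    );
  } catch (err) {
    // Put it back in the queue and record the failure for inspection
    await pool.query(
      "UPDATE events SET status = 'pending', retry_count = retry_count + 1, error = $2 WHERE id = $1",
      [event.id, String(err)]
    );
  }
}

setInterval(processPendingEvents, 1000); // poll every second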

Option 2: Redis + BullMQ (Sweet Spot)

Best for: Teams of 3-15, 1000-50,000 events/minute

import { Queue, Worker } from "bullmq";
 
// Producer
const orderQueue = new Queue("order-events");
 
await orderQueue.add("order.placed", {
  orderId: order.id,
  total: order.total,
}, {
  attempts: 3,
  backoff: { type: "exponential", delay: 1000 },
});
 
// Consumer
const worker = new Worker("order-events", async (job) => {
  switch (job.name) {
    case "order.placed":
      await handleOrderPlaced(job.data);
      break;
  }
}, { concurrency: 10 });

Built-in retries, rate limiting, delayed jobs, and a dashboard (Bull Board). Handles most startup-scale workloads.

Option 3: SQS/SNS or Cloud Pub/Sub (Managed)

Best for: Teams that don't want to manage infrastructure, 10,000+ events/minute
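
A minimal producer sketch using the AWS SDK v3 SQS client (the queue URL env var and event shape are assumptions). For fan-out to multiple handlers, you'd typically publish once to an SNS topic and subscribe one SQS queue per handler; the consumer side is usually a Lambda triggered by the queue or a long-polling worker.

import { SQSClient, SendMessageCommand } from "@aws-sdk/client-sqs";

const sqs = new SQSClient({ region: "us-east-1" });

// Publish an order.placed event to a hypothetical queue
await sqs.send(new SendMessageCommand({
  QueueUrl: process.env.ORDER_EVENTS_QUEUE_URL!,
  MessageBody: JSON.stringify({
    type: "order.placed",
    orderId: order.id,
    total: order.total,
  }),
}));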

Option 4: Kafka (You Probably Don't Need This)

Best for: Teams of 20+, 100,000+ events/minute, need event replay and stream processing

Real cost of premature Kafka: One startup with 8 engineers and 5,000 events/minute chose Kafka because "it scales." They spent 4 months getting it production-ready (learning, tuning, monitoring). Engineering cost: $120K. Ongoing ops burden: 1 engineer spending 30% of their time on Kafka ops. When we helped them migrate to BullMQ, the migration took 1 week and freed up that engineer entirely. Kafka was solving a problem they didn't have.

If you're reading this article, you don't need Kafka yet.

The Event Design Checklist

Rule 1: Events Are Facts, Not Commands

✅ "order.placed" (something happened)
❌ "send.email" (telling someone what to do)

✅ "payment.failed" (a fact)
❌ "retry.payment" (a command)

Events describe what happened. Commands tell someone what to do. Keep them separate.
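
One way to keep that distinction honest is in the types themselves — a small sketch with illustrative names:

// Events: facts in past tense, fanned out to any number of handlers
type OrderPlaced   = { type: "order.placed";   orderId: string; total: number };
type PaymentFailed = { type: "payment.failed"; orderId: string; reason: string };
type DomainEvent = OrderPlaced | PaymentFailed;

// Commands: imperative, addressed to exactly one handler
type SendEmail = { type: "send.email"; to: string; template: string };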

Rule 2: Include Enough Data (But Not Too Much)

// Too little — every handler needs to fetch the order
{ type: "order.placed", orderId: "123" }
 
// Too much — includes data most handlers don't need
{ type: "order.placed", order: { /* entire order object */ } }
 
// Just right — key facts that most handlers need
{
  type: "order.placed",
  orderId: "123",
  customerId: "456",
  total: 99.00,
  itemCount: 3,
  isFirstOrder: true,
}

Rule 3: Make Events Idempotent

Handlers will sometimes process the same event twice (retries, at-least-once delivery). Design for it:

async function handleOrderPlaced(event: OrderPlacedEvent) {
  // Check if we already processed this
  const existing = await db.confirmationEmails.findOne({
    orderId: event.orderId,
  });
  if (existing) return; // Already sent, skip
 
  await sendEmail(event);
  await db.confirmationEmails.insert({ orderId: event.orderId });
}

Rule 4: Handle Failures Gracefully

// Retry strategy per handler
const HANDLER_CONFIG = {
  "sendConfirmationEmail": {
    maxRetries: 5,
    backoff: "exponential",
    deadLetterAfter: 5,    // Move to DLQ after 5 failures
  },
  "updateAnalytics": {
    maxRetries: 3,
    backoff: "linear",
    deadLetterAfter: 3,
  },
  "notifyWarehouse": {
    maxRetries: 10,         // Critical — try harder
    backoff: "exponential",
    deadLetterAfter: 10,
    alertAfter: 3,          // Page someone after 3 failures
  },
};
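
A sketch of how that config might plug into the BullMQ setup from Option 2: attempts and backoff are per-job options, exhausted jobs land in BullMQ's failed set (which can serve as the dead-letter queue), and paging hangs off the failed-job listener. The `pageOnCall` hook is hypothetical.

import { Queue, Worker } from "bullmq";

// Hypothetical alerting hook — swap in PagerDuty, Slack, etc.
declare function pageOnCall(handler: string, err: Error): Promise<void>;

const queue = new Queue("order-events");

// Apply each handler's retry policy when enqueueing.
// Note: BullMQ ships "fixed" and "exponential" backoff out of the box;
// the "linear" policy above would need a custom backoff strategy.
async function enqueue(handlerName: keyof typeof HANDLER_CONFIG, data: unknown) {
  const cfg = HANDLER_CONFIG[handlerName];
  await queue.add(handlerName, data, {
    attempts: cfg.maxRetries,
    backoff: { type: "exponential", delay: 1000 },
  });
}

const worker = new Worker("order-events", async (job) => {
  // Dispatch job.name to the matching handler, as in Option 2
}, { concurrency: 10 });

// Page someone once a critical handler has failed alertAfter times
worker.on("failed", async (job, err) => {
  if (!job) return;
  const cfg = (HANDLER_CONFIG as Record<string, { alertAfter?: number }>)[job.name];
  if (cfg?.alertAfter && job.attemptsMade >= cfg.alertAfter) {
    await pageOnCall(job.name, err);
  }
});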

The Migration Path

You don't rewrite everything at once. Extract one flow at a time:

Month 1: Pick your most painful synchronous flow
  → Usually: order processing or inventory sync
  → Add event emission alongside existing code
  → New handler processes events in parallel
  → Verify results match, then remove the synchronous call (sketched below)

Month 2: Extract the next flow
  → Repeat for email notifications or CRM sync
  → Build monitoring for event lag and failures

Month 3+: Add new features as event handlers
  → New integrations are just new handlers
  → No changes to existing code
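
A sketch of that month-1 sequence for the confirmation email flow — emit alongside the old code, run the new handler in shadow mode so nothing double-sends, then cut over (`buildConfirmationEmail` is a hypothetical helper):

// Month 1, step 1: emit the event next to the existing synchronous code
async function placeOrder(orderData: OrderInput) {
  const order = await createOrder(orderData);
  await chargePayment(order);

  await eventBus.emit("order.placed", {
    orderId: order.id,
    customerId: order.customerId,
    total: order.total,
  });

  // Old synchronous calls stay until the new handlers are verified
  await sendConfirmationEmail(order);
  await updateInventory(order);
  return order;
}

// Month 1, step 2: the new handler runs in shadow mode — it logs what it
// would have done instead of doing it, so nothing is sent twice while you compare
eventBus.on("order.placed", async (event) => {
  const draft = await buildConfirmationEmail(event); // hypothetical helper
  console.log("shadow: would send confirmation email", event.orderId, draft.subject);
});

// Month 1, step 3: once shadow output matches production, flip the handler to do
// the real work and delete the synchronous sendConfirmationEmail call above.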

When Not to Use Events

Events aren't always the answer:

  • User-facing reads — don't make the user wait for eventual consistency
  • Simple CRUD — if it's a single database write, just write it
  • Strong consistency required — if systems must agree immediately (payments)
  • Team of 1 — the overhead might not be worth it yet

The Payoff

Event-driven architecture done right gives you:

  • Faster user-facing operations — only the critical path is synchronous
  • Resilient systems — one service being down doesn't cascade
  • Easy extensibility — new features = new handlers, no changes to existing code
  • Observable systems — events are a natural audit log
  • Testable components — each handler is independently testable

The ROI:

Synchronous architecture costs:
  - 60% slower checkout (2,100ms vs 850ms)
  - 3-5% lower conversion rate
  - Revenue lost to slow checkout: $90K-150K/year
  - Cascading failures during third-party outages
  - 2-3 days to add new integration

Event-driven architecture:
  - Implementation time: 2-4 weeks
  - Cost: $12K-20K in engineering
  - Ongoing cost: $50-200/month (queue infrastructure)
  - ROI: 4-7x in year one from faster checkout alone
  - Resilience benefit: Prevents 80% of cascading failures

Start simple. Use the database queue. Graduate to Redis when you need to. And save Kafka for the day you actually need it.
