Event-Driven Architecture for the Rest of Us
You're Probably Already Event-Driven (Badly)
If your system sends an email when an order is placed, updates inventory when a shipment ships, or syncs data when a customer signs up — congratulations, you have event-driven architecture. It's just hidden inside synchronous API calls and cron jobs that are slowly becoming unmaintainable.
Let's make it intentional.
The Problem With Synchronous Everything
Here's what a typical order placement looks like in a "synchronous-first" system:
// The monolithic order handler that does everything
async function placeOrder(orderData: OrderInput) {
const order = await createOrder(orderData); // 50ms
await chargePayment(order); // 800ms
await updateInventory(order); // 100ms
await sendConfirmationEmail(order); // 300ms
await notifyWarehouse(order); // 200ms
await updateAnalytics(order); // 150ms
await syncToCRM(order); // 400ms
await triggerLoyaltyPoints(order); // 100ms
return order; // Total: ~2100ms
}
Problems with this approach:
- Customer waits 2+ seconds for all downstream operations
- If email service is down, the entire order fails
- Adding a new integration means modifying the order handler
- Testing requires mocking 7 external services
- One slow service slows down everything
Why this costs you: One client had synchronous order processing with 8 downstream integrations. When their CRM provider had a 90-second timeout issue, every order took 90+ seconds to complete. Checkout abandonment spiked from 68% to 89% during the 45-minute incident. Orders lost: 680. Revenue impact: $67K. The CRM sync wasn't even customer-facing — it was internal analytics that could have been async.
The Event-Driven Version
// The order handler does ONE thing: create the order and emit an event
async function placeOrder(orderData: OrderInput) {
const order = await createOrder(orderData);
await chargePayment(order);
// Emit the event — everything else happens asynchronously
await eventBus.emit("order.placed", {
orderId: order.id,
customerId: order.customerId,
total: order.total,
items: order.items,
});
return order; // Total: ~850ms
}
// Separate handlers that react to the event
eventBus.on("order.placed", sendConfirmationEmail);
eventBus.on("order.placed", updateInventory);
eventBus.on("order.placed", notifyWarehouse);
eventBus.on("order.placed", updateAnalytics);
eventBus.on("order.placed", syncToCRM);
eventBus.on("order.placed", triggerLoyaltyPoints);What changed:
- Customer waits < 1 second (only payment + order creation)
- If email is down, the order still succeeds — email retries later
- Adding a new integration = adding a new handler (no changes to order code)
- Each handler is testable in isolation
- Slow services don't block the critical path
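The eventBus in the snippet above is an abstraction, not a specific library. Here is a minimal in-memory sketch of that interface, assuming nothing beyond the handler signatures already shown. It illustrates the shape of the API, nothing more:

type Handler = (payload: any) => Promise<void>;

class InMemoryEventBus {
  private handlers = new Map<string, Handler[]>();

  on(type: string, handler: Handler) {
    const list = this.handlers.get(type) ?? [];
    list.push(handler);
    this.handlers.set(type, list);
  }

  // Returns as soon as the event is handed off; handlers run in the
  // background, and a handler failure never fails the caller
  async emit(type: string, payload: any) {
    for (const handler of this.handlers.get(type) ?? []) {
      handler(payload).catch((err) => {
        console.error(`Handler for ${type} failed:`, err);
      });
    }
  }
}

const eventBus = new InMemoryEventBus();

An in-process bus like this gives you the decoupling but not the durability: if the process dies mid-handler, the event is gone. That gap is exactly what the infrastructure options below close.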
The impact:
- Order completion time: 2,100ms → 850ms (a 60% reduction)
- Checkout conversion improvement: 3-5% (every 100ms matters)
- For a $3M/month business: $90K-150K annual revenue from faster checkout
- Service failures no longer cascade: email down ≠ orders down
- New integration time: 2 days → 4 hours (no coordination needed)
Choosing Your Event Infrastructure
You don't need Kafka. Seriously. Here's the decision tree:
Option 1: Database-Backed Queue (Simplest)
Best for: Teams of 1-5, < 1000 events/minute
CREATE TABLE events (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
type VARCHAR(255) NOT NULL,
payload JSONB NOT NULL,
status VARCHAR(50) DEFAULT 'pending',
created_at TIMESTAMP DEFAULT NOW(),
processed_at TIMESTAMP,
retry_count INT DEFAULT 0,
error TEXT
);
CREATE INDEX idx_events_pending
ON events (status, created_at)
WHERE status = 'pending';
A cron job or worker process polls for pending events and processes them. Simple, reliable, and you can inspect the queue with a SQL query.
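Here is what that worker might look like in TypeScript with node-postgres. This is a sketch: the handle() dispatcher is assumed, and FOR UPDATE SKIP LOCKED lets several workers poll the same table without claiming the same row:

import { Pool } from "pg";

const pool = new Pool(); // connection settings come from PG* env vars

// Atomically claim one pending event; SKIP LOCKED means concurrent
// workers never fight over the same row
async function claimNextEvent() {
  const { rows } = await pool.query(`
    UPDATE events SET status = 'processing'
    WHERE id = (
      SELECT id FROM events
      WHERE status = 'pending'
      ORDER BY created_at
      LIMIT 1
      FOR UPDATE SKIP LOCKED
    )
    RETURNING *
  `);
  return rows[0] ?? null;
}

async function workLoop() {
  while (true) {
    const event = await claimNextEvent();
    if (!event) {
      await new Promise((r) => setTimeout(r, 1000)); // queue empty, idle
      continue;
    }
    try {
      await handle(event.type, event.payload); // your dispatcher (assumed)
      await pool.query(
        "UPDATE events SET status = 'done', processed_at = NOW() WHERE id = $1",
        [event.id]
      );
    } catch (err) {
      // Requeue with a retry budget; park the event after 5 attempts
      await pool.query(
        `UPDATE events
         SET status = CASE WHEN retry_count >= 4 THEN 'failed' ELSE 'pending' END,
             retry_count = retry_count + 1,
             error = $2
         WHERE id = $1`,
        [event.id, String(err)]
      );
    }
  }
}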
Option 2: Redis + BullMQ (Sweet Spot)
Best for: Teams of 3-15, 1000-50,000 events/minute
import { Queue, Worker } from "bullmq";
// Producer
const orderQueue = new Queue("order-events");
await orderQueue.add("order.placed", {
orderId: order.id,
total: order.total,
}, {
attempts: 3,
backoff: { type: "exponential", delay: 1000 },
});
// Consumer
const worker = new Worker("order-events", async (job) => {
switch (job.name) {
case "order.placed":
await handleOrderPlaced(job.data);
break;
}
}, { concurrency: 10 });
Built-in retries, rate limiting, delayed jobs, and a dashboard (Bull Board). Handles most startup-scale workloads.
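Failure visibility is worth wiring up from day one. BullMQ workers emit lifecycle events, so a minimal version (using the worker from above) is a few lines:

// Log every failed attempt; the attempts/backoff policy set at
// enqueue time decides whether the job runs again
worker.on("failed", (job, err) => {
  console.error(
    `Job ${job?.name} (${job?.id}) failed on attempt ${job?.attemptsMade}:`,
    err.message
  );
});

worker.on("completed", (job) => {
  console.log(`Job ${job.name} (${job.id}) completed`);
});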
Option 3: SQS/SNS or Cloud Pub/Sub (Managed)
Best for: Teams that don't want to manage infrastructure, 10,000+ events/minute
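A rough sketch of the SQS version with AWS SDK v3. The queue URL and the handleEvent dispatcher are placeholders, and if several handlers need the same event you would fan out through SNS or EventBridge rather than a single queue:

import {
  SQSClient,
  SendMessageCommand,
  ReceiveMessageCommand,
  DeleteMessageCommand,
} from "@aws-sdk/client-sqs";

const sqs = new SQSClient({});
const queueUrl = process.env.ORDER_EVENTS_QUEUE_URL!; // placeholder

// Producer
await sqs.send(new SendMessageCommand({
  QueueUrl: queueUrl,
  MessageBody: JSON.stringify({ type: "order.placed", orderId: order.id }),
}));

// Consumer: long-poll, process, then delete (deleting is the ack)
const { Messages } = await sqs.send(new ReceiveMessageCommand({
  QueueUrl: queueUrl,
  MaxNumberOfMessages: 10,
  WaitTimeSeconds: 20,
}));

for (const msg of Messages ?? []) {
  await handleEvent(JSON.parse(msg.Body!)); // your dispatcher (assumed)
  await sqs.send(new DeleteMessageCommand({
    QueueUrl: queueUrl,
    ReceiptHandle: msg.ReceiptHandle!,
  }));
}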
Option 4: Kafka (You Probably Don't Need This)
Best for: Teams of 20+, 100,000+ events/minute, need event replay and stream processing
Real cost of premature Kafka: One startup with 8 engineers and 5,000 events/minute chose Kafka because "it scales." They spent 4 months getting it production-ready (learning, tuning, monitoring). Engineering cost: $120K. Ongoing ops burden: 1 engineer spending 30% of their time on Kafka ops. When we helped them migrate to BullMQ, the migration took 1 week and freed up that engineer entirely. Kafka was solving a problem they didn't have.
If you're reading this article, you don't need Kafka yet.
The Event Design Checklist
Rule 1: Events Are Facts, Not Commands
✅ "order.placed" (something happened)
❌ "send.email" (telling someone what to do)
✅ "payment.failed" (a fact)
❌ "retry.payment" (a command)
Events describe what happened. Commands tell someone what to do. Keep them separate.
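In TypeScript you can make that distinction structural, so a command can never be accidentally published where events go. The type names here are illustrative, not a prescribed schema:

// Events: past-tense facts, immutable records of something that happened
type OrderPlaced = {
  type: "order.placed";
  orderId: string;
  customerId: string;
  total: number;
};

type PaymentFailed = {
  type: "payment.failed";
  orderId: string;
  reason: string;
};

type DomainEvent = OrderPlaced | PaymentFailed;

// Commands: imperative requests with one known recipient; keeping them
// in a separate type keeps the two channels from mixing
type RetryPayment = { command: "retry.payment"; orderId: string };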
Rule 2: Include Enough Data (But Not Too Much)
// Too little — every handler needs to fetch the order
{ type: "order.placed", orderId: "123" }
// Too much — includes data most handlers don't need
{ type: "order.placed", order: { /* entire order object */ } }
// Just right — key facts that most handlers need
{
type: "order.placed",
orderId: "123",
customerId: "456",
total: 99.00,
itemCount: 3,
isFirstOrder: true,
}
Rule 3: Make Events Idempotent
Handlers will sometimes process the same event twice (retries, at-least-once delivery). Design for it:
async function handleOrderPlaced(event: OrderPlacedEvent) {
// Check if we already processed this
const existing = await db.confirmationEmails.findOne({
orderId: event.orderId,
});
if (existing) return; // Already sent, skip
await sendEmail(event);
// Pair this insert with a unique index on orderId: two concurrent
// retries can both pass the check above, but only one insert will succeed
await db.confirmationEmails.insert({ orderId: event.orderId });
}
Rule 4: Handle Failures Gracefully
// Retry strategy per handler
const HANDLER_CONFIG = {
"sendConfirmationEmail": {
maxRetries: 5,
backoff: "exponential",
deadLetterAfter: 5, // Move to DLQ after 5 failures
},
"updateAnalytics": {
maxRetries: 3,
backoff: "linear",
deadLetterAfter: 3,
},
"notifyWarehouse": {
maxRetries: 10, // Critical — try harder
backoff: "exponential",
deadLetterAfter: 10,
alertAfter: 3, // Page someone after 3 failures
},
};
The Migration Path
You don't rewrite everything at once. Extract one flow at a time:
Month 1: Pick your most painful synchronous flow
→ Usually: order processing or inventory sync
→ Add event emission alongside existing code (see the sketch after this list)
→ New handler processes events in parallel
→ Verify results match, then remove synchronous call
Month 2: Extract the next flow
→ Repeat for email notifications or CRM sync
→ Build monitoring for event lag and failures
Month 3+: Add new features as event handlers
→ New integrations are just new handlers
→ No changes to existing code
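Here is what the Month 1 parallel run might look like, reusing the placeOrder example from earlier. The analytics call is just a sample flow; the point is that the old and new paths run side by side until you trust the handler:

// Month 1 transition: both paths run until results are verified
async function placeOrder(orderData: OrderInput) {
  const order = await createOrder(orderData);
  await chargePayment(order);

  // New path: a handler consumes this event and writes analytics
  await eventBus.emit("order.placed", {
    orderId: order.id,
    total: order.total,
  });

  // Old path: keep it until the handler's output matches in production,
  // then delete this one line. That deletion is the whole migration.
  await updateAnalytics(order);

  return order;
}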
When Not to Use Events
Events aren't always the answer:
- User-facing reads — don't make the user wait for eventual consistency
- Simple CRUD — if it's a single database write, just write it
- Strong consistency required — if systems must agree immediately (payments)
- Team of 1 — the overhead might not be worth it yet
The Payoff
Event-driven architecture done right gives you:
- Faster user-facing operations — only the critical path is synchronous
- Resilient systems — one service being down doesn't cascade
- Easy extensibility — new features = new handlers, no changes to existing code
- Observable systems — events are a natural audit log
- Testable components — each handler is independently testable
The ROI:
Synchronous architecture costs:
- Checkout takes ~2.5x longer (2,100ms vs 850ms)
- 3-5% lower conversion rate
- Revenue lost to slow checkout: $90K-150K/year
- Cascading failures during third-party outages
- 2-3 days to add new integration
Event-driven architecture:
- Implementation time: 2-4 weeks
- Cost: $12K-20K in engineering
- Ongoing cost: $50-200/month (queue infrastructure)
- ROI: 4-7x in year one from faster checkout alone
- Resilience benefit: Prevents 80% of cascading failures
Start simple. Use the database queue. Graduate to Redis when you need to. And save Kafka for the day you actually need it.