Event-Driven Architecture for the Rest of Us
You're Probably Already Event-Driven (Badly)
If your system sends an email when an order is placed, updates inventory when a shipment ships, or syncs data when a customer signs up — congratulations, you have event-driven architecture. It's just hidden inside synchronous API calls and cron jobs that are slowly becoming unmaintainable.
Let's make it intentional.
The Problem With Synchronous Everything
Here's what a typical order placement looks like in a "synchronous-first" system:
// The monolithic order handler that does everything
async function placeOrder(orderData: OrderInput) {
const order = await createOrder(orderData); // 50ms
await chargePayment(order); // 800ms
await updateInventory(order); // 100ms
await sendConfirmationEmail(order); // 300ms
await notifyWarehouse(order); // 200ms
await updateAnalytics(order); // 150ms
await syncToCRM(order); // 400ms
await triggerLoyaltyPoints(order); // 100ms
return order; // Total: ~2100ms
}
Problems with this approach:
- Customer waits 2+ seconds for all downstream operations
- If email service is down, the entire order fails
- Adding a new integration means modifying the order handler
- Testing requires mocking 7 external services
- One slow service slows down everything
Why this costs you: One client had synchronous order processing with 8 downstream integrations. When their CRM provider had a 90-second timeout issue, every order took 90+ seconds to complete. Checkout abandonment spiked from 68% to 89% during the 45-minute incident. Orders lost: 680. Revenue impact: $67K. The CRM sync wasn't even customer-facing — it was internal analytics that could have been async.
The Event-Driven Version
// The order handler does ONE thing: create the order and emit an event
async function placeOrder(orderData: OrderInput) {
const order = await createOrder(orderData);
await chargePayment(order);
// Emit the event — everything else happens asynchronously
await eventBus.emit("order.placed", {
orderId: order.id,
customerId: order.customerId,
total: order.total,
items: order.items,
});
return order; // Total: ~850ms
}
// Separate handlers that react to the event
eventBus.on("order.placed", sendConfirmationEmail);
eventBus.on("order.placed", updateInventory);
eventBus.on("order.placed", notifyWarehouse);
eventBus.on("order.placed", updateAnalytics);
eventBus.on("order.placed", syncToCRM);
eventBus.on("order.placed", triggerLoyaltyPoints);What changed:
- Customer waits < 1 second (only payment + order creation)
- If email is down, the order still succeeds — email retries later
- Adding a new integration = adding a new handler (no changes to order code)
- Each handler is testable in isolation
- Slow services don't block the critical path
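The eventBus in the snippet above is an abstraction, not a specific library. Here is a minimal in-memory sketch of that interface, assuming nothing beyond the handler signatures already shown. It illustrates the shape of the API, nothing more:

type Handler = (payload: any) => Promise<void>;

class InMemoryEventBus {
  private handlers = new Map<string, Handler[]>();

  on(type: string, handler: Handler) {
    const list = this.handlers.get(type) ?? [];
    list.push(handler);
    this.handlers.set(type, list);
  }

  // Returns as soon as the event is handed off; handlers run in the
  // background, and a handler failure never fails the caller
  async emit(type: string, payload: any) {
    for (const handler of this.handlers.get(type) ?? []) {
      handler(payload).catch((err) => {
        console.error(`Handler for ${type} failed:`, err);
      });
    }
  }
}

const eventBus = new InMemoryEventBus();

An in-process bus like this gives you the decoupling but not the durability: if the process dies mid-handler, the event is gone. That gap is exactly what the infrastructure options below close.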
The impact:
- Order completion time: 2,100ms → 850ms (a 60% reduction)
- Checkout conversion improvement: 3-5% (every 100ms matters)
- For a $3M/month business: $90K-150K annual revenue from faster checkout
- Service failures no longer cascade: email down ≠ orders down
- New integration time: 2 days → 4 hours (no coordination needed)
Choosing Your Event Infrastructure
You don't need Kafka. Seriously. Here's the decision tree:
Option 1: Database-Backed Queue (Simplest)
Best for: Teams of 1-5, < 1000 events/minute
CREATE TABLE events (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
type VARCHAR(255) NOT NULL,
payload JSONB NOT NULL,
status VARCHAR(50) DEFAULT 'pending',
created_at TIMESTAMP DEFAULT NOW(),
processed_at TIMESTAMP,
retry_count INT DEFAULT 0,
error TEXT
);
CREATE INDEX idx_events_pending
ON events (status, created_at)
WHERE status = 'pending';
A cron job or worker process polls for pending events and processes them. Simple, reliable, and you can inspect the queue with a SQL query.
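Here is what that worker might look like in TypeScript with node-postgres. This is a sketch: the handle() dispatcher is assumed, and FOR UPDATE SKIP LOCKED lets several workers poll the same table without claiming the same row:

import { Pool } from "pg";

const pool = new Pool(); // connection settings come from PG* env vars

// Atomically claim one pending event; SKIP LOCKED means concurrent
// workers never fight over the same row
async function claimNextEvent() {
  const { rows } = await pool.query(`
    UPDATE events SET status = 'processing'
    WHERE id = (
      SELECT id FROM events
      WHERE status = 'pending'
      ORDER BY created_at
      LIMIT 1
      FOR UPDATE SKIP LOCKED
    )
    RETURNING *
  `);
  return rows[0] ?? null;
}

async function workLoop() {
  while (true) {
    const event = await claimNextEvent();
    if (!event) {
      await new Promise((r) => setTimeout(r, 1000)); // queue empty, idle
      continue;
    }
    try {
      await handle(event.type, event.payload); // your dispatcher (assumed)
      await pool.query(
        "UPDATE events SET status = 'done', processed_at = NOW() WHERE id = $1",
        [event.id]
      );
    } catch (err) {
      // Requeue with a retry budget; park the event after 5 attempts
      await pool.query(
        `UPDATE events
         SET status = CASE WHEN retry_count >= 4 THEN 'failed' ELSE 'pending' END,
             retry_count = retry_count + 1,
             error = $2
         WHERE id = $1`,
        [event.id, String(err)]
      );
    }
  }
}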
Option 2: Redis + BullMQ (Sweet Spot)
Best for: Teams of 3-15, 1000-50,000 events/minute
import { Queue, Worker } from "bullmq";
// Producer
const orderQueue = new Queue("order-events");
await orderQueue.add("order.placed", {
orderId: order.id,
total: order.total,
}, {
attempts: 3,
backoff: { type: "exponential", delay: 1000 },
});
// Consumer
const worker = new Worker("order-events", async (job) => {
switch (job.name) {
case "order.placed":
await handleOrderPlaced(job.data);
break;
}
}, { concurrency: 10 });
Built-in retries, rate limiting, delayed jobs, and a dashboard (Bull Board). Handles most startup-scale workloads.
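Failure visibility is worth wiring up from day one. BullMQ workers emit lifecycle events, so a minimal version (using the worker from above) is a few lines:

// Log every failed attempt; the attempts/backoff policy set at
// enqueue time decides whether the job runs again
worker.on("failed", (job, err) => {
  console.error(
    `Job ${job?.name} (${job?.id}) failed on attempt ${job?.attemptsMade}:`,
    err.message
  );
});

worker.on("completed", (job) => {
  console.log(`Job ${job.name} (${job.id}) completed`);
});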
Option 3: SQS/SNS or Cloud Pub/Sub (Managed)
Best for: Teams that don't want to manage infrastructure, 10,000+ events/minute
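A rough sketch of the SQS version with AWS SDK v3. The queue URL and the handleEvent dispatcher are placeholders, and if several handlers need the same event you would fan out through SNS or EventBridge rather than a single queue:

import {
  SQSClient,
  SendMessageCommand,
  ReceiveMessageCommand,
  DeleteMessageCommand,
} from "@aws-sdk/client-sqs";

const sqs = new SQSClient({});
const queueUrl = process.env.ORDER_EVENTS_QUEUE_URL!; // placeholder

// Producer
await sqs.send(new SendMessageCommand({
  QueueUrl: queueUrl,
  MessageBody: JSON.stringify({ type: "order.placed", orderId: order.id }),
}));

// Consumer: long-poll, process, then delete (deleting is the ack)
const { Messages } = await sqs.send(new ReceiveMessageCommand({
  QueueUrl: queueUrl,
  MaxNumberOfMessages: 10,
  WaitTimeSeconds: 20,
}));

for (const msg of Messages ?? []) {
  await handleEvent(JSON.parse(msg.Body!)); // your dispatcher (assumed)
  await sqs.send(new DeleteMessageCommand({
    QueueUrl: queueUrl,
    ReceiptHandle: msg.ReceiptHandle!,
  }));
}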
Option 4: Kafka (You Probably Don't Need This)
Best for: Teams of 20+, 100,000+ events/minute, need event replay and stream processing
Real cost of premature Kafka: One startup with 8 engineers and 5,000 events/minute chose Kafka because "it scales." They spent 4 months getting it production-ready (learning, tuning, monitoring). Engineering cost: $120K. Ongoing ops burden: 1 engineer spending 30% of their time on Kafka ops. When we helped them migrate to BullMQ, the migration took 1 week and freed up that engineer entirely. Kafka was solving a problem they didn't have.
If you're reading this article, you don't need Kafka yet.
The Event Design Checklist
Rule 1: Events Are Facts, Not Commands
✅ "order.placed" (something happened)
❌ "send.email" (telling someone what to do)
✅ "payment.failed" (a fact)
❌ "retry.payment" (a command)
Events describe what happened. Commands tell someone what to do. Keep them separate.
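In TypeScript you can make that distinction structural, so a command can never be accidentally published where events go. The type names here are illustrative, not a prescribed schema:

// Events: past-tense facts, immutable records of something that happened
type OrderPlaced = {
  type: "order.placed";
  orderId: string;
  customerId: string;
  total: number;
};

type PaymentFailed = {
  type: "payment.failed";
  orderId: string;
  reason: string;
};

type DomainEvent = OrderPlaced | PaymentFailed;

// Commands: imperative requests with one known recipient; keeping them
// in a separate type keeps the two channels from mixing
type RetryPayment = { command: "retry.payment"; orderId: string };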
Rule 2: Include Enough Data (But Not Too Much)
// Too little — every handler needs to fetch the order
{ type: "order.placed", orderId: "123" }
// Too much — includes data most handlers don't need
{ type: "order.placed", order: { /* entire order object */ } }
// Just right — key facts that most handlers need
{
type: "order.placed",
orderId: "123",
customerId: "456",
total: 99.00,
itemCount: 3,
isFirstOrder: true,
}
Rule 3: Make Events Idempotent
Handlers will sometimes process the same event twice (retries, at-least-once delivery). Design for it:
async function handleOrderPlaced(event: OrderPlacedEvent) {
// Check if we already processed this
const existing = await db.confirmationEmails.findOne({
orderId: event.orderId,
});
if (existing) return; // Already sent, skip
await sendEmail(event);
// Pair this insert with a unique index on orderId: two concurrent
// retries can both pass the check above, but only one insert will succeed
await db.confirmationEmails.insert({ orderId: event.orderId });
}
Rule 4: Handle Failures Gracefully
// Retry strategy per handler
const HANDLER_CONFIG = {
"sendConfirmationEmail": {
maxRetries: 5,
backoff: "exponential",
deadLetterAfter: 5, // Move to DLQ after 5 failures
},
"updateAnalytics": {
maxRetries: 3,
backoff: "linear",
deadLetterAfter: 3,
},
"notifyWarehouse": {
maxRetries: 10, // Critical — try harder
backoff: "exponential",
deadLetterAfter: 10,
alertAfter: 3, // Page someone after 3 failures
},
};
The Migration Path
You don't rewrite everything at once. Extract one flow at a time:
Month 1: Pick your most painful synchronous flow
→ Usually: order processing or inventory sync
→ Add event emission alongside existing code (see the sketch after this list)
→ New handler processes events in parallel
→ Verify results match, then remove synchronous call
Month 2: Extract the next flow
→ Repeat for email notifications or CRM sync
→ Build monitoring for event lag and failures
Month 3+: Add new features as event handlers
→ New integrations are just new handlers
→ No changes to existing code
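Here is what the Month 1 parallel run might look like, reusing the placeOrder example from earlier. The analytics call is just a sample flow; the point is that the old and new paths run side by side until you trust the handler:

// Month 1 transition: both paths run until results are verified
async function placeOrder(orderData: OrderInput) {
  const order = await createOrder(orderData);
  await chargePayment(order);

  // New path: a handler consumes this event and writes analytics
  await eventBus.emit("order.placed", {
    orderId: order.id,
    total: order.total,
  });

  // Old path: keep it until the handler's output matches in production,
  // then delete this one line. That deletion is the whole migration.
  await updateAnalytics(order);

  return order;
}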
When Not to Use Events
Events aren't always the answer:
- User-facing reads — don't make the user wait for eventual consistency
- Simple CRUD — if it's a single database write, just write it
- Strong consistency required — if systems must agree immediately (payments)
- Team of 1 — the overhead might not be worth it yet
The Payoff
Event-driven architecture done right gives you:
- Faster user-facing operations — only the critical path is synchronous
- Resilient systems — one service being down doesn't cascade
- Easy extensibility — new features = new handlers, no changes to existing code
- Observable systems — events are a natural audit log
- Testable components — each handler is independently testable
The ROI:
Synchronous architecture costs:
- Checkout takes ~2.5x longer (2,100ms vs 850ms)
- 3-5% lower conversion rate
- Revenue lost to slow checkout: $90K-150K/year
- Cascading failures during third-party outages
- 2-3 days to add new integration
Event-driven architecture:
- Implementation time: 2-4 weeks
- Cost: $12K-20K in engineering
- Ongoing cost: $50-200/month (queue infrastructure)
- ROI: 4-7x in year one from faster checkout alone
- Resilience benefit: Prevents 80% of cascading failures
Start simple. Use the database queue. Graduate to Redis when you need to. And save Kafka for the day you actually need it.