ScaledByDesign/Insights
ServicesPricingAboutContact
Book a Call
Scaled By Design

Fractional CTO + execution partner for revenue-critical systems.

Company

  • About
  • Services
  • Contact

Resources

  • Insights
  • Pricing
  • FAQ

Legal

  • Privacy Policy
  • Terms of Service

© 2026 ScaledByDesign. All rights reserved.

contact@scaledbydesign.com

On This Page

The Notebook-to-Production GapThe Pipeline ArchitectureStep 1: Idempotent IngestionStep 2: Schema ValidationStep 3: Batched AI InferenceStep 4: Output ValidationMonitoring and Alerting
  1. Insights
  2. AI & Automation
  3. Building AI Pipelines That Don't Break at Scale

Building AI Pipelines That Don't Break at Scale

June 17, 2026·ScaledByDesign·
aidata-pipelinemlopsproductionengineering

The Notebook-to-Production Gap

A data scientist built an AI pipeline in a Jupyter notebook. It ingested customer data, generated embeddings, clustered users into segments, and produced recommendations. It worked beautifully on 10,000 records. In production with 2 million records, it ran out of memory. When an API call failed mid-pipeline, there was no retry logic. When the model produced bad outputs, nobody knew for hours.

The gap between "works in a notebook" and "works in production" is where most AI projects die.

The Pipeline Architecture

┌─────────────┐    ┌──────────────┐    ┌───────────────┐
│  Data        │───▶│  Transform   │───▶│  AI Model     │
│  Ingestion   │    │  & Validate  │    │  Inference     │
└─────────────┘    └──────────────┘    └───────────────┘
       │                   │                    │
       ▼                   ▼                    ▼
  ┌─────────┐       ┌──────────┐        ┌───────────┐
  │  Dead   │       │  Schema  │        │  Output   │
  │  Letter │       │  Errors  │        │  Validation│
  │  Queue  │       │  Log     │        │  & Store  │
  └─────────┘       └──────────┘        └───────────┘

Step 1: Idempotent Ingestion

// Every pipeline step should be safely re-runnable
async function ingestBatch(batchId: string, source: DataSource) {
  // Check if this batch was already processed
  const existing = await db.pipelineRuns.findUnique({
    where: { batchId_step: { batchId, step: "ingest" } },
  });
  
  if (existing?.status === "completed") {
    console.log(`Batch ${batchId} already ingested, skipping`);
    return existing.outputPath;
  }
 
  // Mark as in-progress
  await db.pipelineRuns.upsert({
    where: { batchId_step: { batchId, step: "ingest" } },
    create: { batchId, step: "ingest", status: "in_progress", startedAt: new Date() },
    update: { status: "in_progress", startedAt: new Date() },
  });
 
  try {
    const data = await source.fetchBatch(batchId);
    const outputPath = await storage.write(`ingested/${batchId}.parquet`, data);
    
    await db.pipelineRuns.update({
      where: { batchId_step: { batchId, step: "ingest" } },
      data: { status: "completed", outputPath, completedAt: new Date() },
    });
    
    return outputPath;
  } catch (error) {
    await db.pipelineRuns.update({
      where: { batchId_step: { batchId, step: "ingest" } },
      data: { status: "failed", error: error.message },
    });
    throw error;
  }
}

Step 2: Schema Validation

import { z } from "zod";
 
const CustomerRecordSchema = z.object({
  id: z.string().uuid(),
  email: z.string().email(),
  purchaseHistory: z.array(z.object({
    productId: z.string(),
    amount: z.number().positive(),
    date: z.string().datetime(),
  })),
  createdAt: z.string().datetime(),
});
 
async function validateBatch(records: unknown[]): Promise<ValidationResult> {
  const valid: CustomerRecord[] = [];
  const invalid: { record: unknown; errors: string[] }[] = [];
 
  for (const record of records) {
    const result = CustomerRecordSchema.safeParse(record);
    if (result.success) {
      valid.push(result.data);
    } else {
      invalid.push({
        record,
        errors: result.error.errors.map(e => `${e.path.join(".")}: ${e.message}`),
      });
    }
  }
 
  // Alert if error rate exceeds threshold
  const errorRate = invalid.length / records.length;
  if (errorRate > 0.05) {
    await alerting.send({
      severity: "warning",
      message: `Validation error rate ${(errorRate * 100).toFixed(1)}% exceeds 5% threshold`,
      details: { totalRecords: records.length, invalidCount: invalid.length },
    });
  }
 
  return { valid, invalid, errorRate };
}

Step 3: Batched AI Inference

// Process in chunks to avoid OOM and enable progress tracking
async function runInference(
  records: CustomerRecord[],
  batchSize: number = 100
): Promise<InferenceResult[]> {
  const results: InferenceResult[] = [];
  const totalBatches = Math.ceil(records.length / batchSize);
 
  for (let i = 0; i < records.length; i += batchSize) {
    const batch = records.slice(i, i + batchSize);
    const batchNum = Math.floor(i / batchSize) + 1;
 
    try {
      const batchResults = await withRetry(
        () => aiModel.predict(batch),
        { maxRetries: 3, backoff: "exponential", initialDelay: 1000 }
      );
 
      // Validate model outputs
      for (const result of batchResults) {
        if (!isValidOutput(result)) {
          await deadLetterQueue.push({ record: batch[result.index], reason: "invalid_output" });
          continue;
        }
        results.push(result);
      }
 
      // Progress tracking
      await updateProgress(batchNum, totalBatches);
    } catch (error) {
      // Log failed batch, continue with remaining
      await deadLetterQueue.pushBatch(
        batch.map(r => ({ record: r, reason: "inference_failed", error: error.message }))
      );
    }
  }
 
  return results;
}

Step 4: Output Validation

// Don't blindly trust model outputs
function validateRecommendations(
  recommendations: Recommendation[],
  constraints: BusinessRules
): ValidatedRecommendation[] {
  return recommendations.filter(rec => {
    // Business rule: don't recommend out-of-stock products
    if (!constraints.inStockProducts.has(rec.productId)) return false;
    
    // Business rule: confidence threshold
    if (rec.confidence < constraints.minConfidence) return false;
    
    // Business rule: don't recommend products customer already bought
    if (constraints.purchasedProducts.has(rec.productId)) return false;
    
    // Sanity check: price should be within reasonable range
    if (rec.predictedValue < 0 || rec.predictedValue > 10000) {
      logger.warn("Suspicious predicted value", { rec });
      return false;
    }
    
    return true;
  });
}

Monitoring and Alerting

What to monitor in AI pipelines:

Input quality:
  → Schema validation error rate (alert if > 5%)
  → Data freshness (alert if source data is stale)
  → Record count anomalies (alert if ±30% from expected)

Model performance:
  → Inference latency (p50, p95, p99)
  → Prediction confidence distribution
  → Output distribution shift (are outputs changing unexpectedly?)
  → Error rate per batch

Pipeline health:
  → End-to-end runtime
  → Step-level success/failure rates
  → Dead letter queue size (growing = problem)
  → Memory and CPU utilization per step

AI pipelines aren't special — they're data pipelines with a model in the middle. Apply the same engineering discipline you'd use for any production system: idempotency, validation, retries, monitoring, and graceful failure handling. The model is the easy part. The pipeline around it is what makes it production-ready.

Previous
Technical Interviews That Actually Predict Job Performance
Insights
Building AI Pipelines That Don't Break at ScaleA Practical Guide to AI Embeddings — Beyond the HypeFine-Tuning vs. RAG — The Decision Framework for Production AIAI Agent Tool Calling Patterns That Actually Work in ProductionRAG Pipeline Optimization — From 8s to 400msLLM Cost Optimization — How We Cut a Client's AI Bill by 73%AI Hallucination Detection in Production — What Actually WorksWe Built an AI Code Review Bot — Here's What It Actually Catches (And What It Misses)Prompt Engineering Is Dead — Context Engineering Is What MattersYour AI Agent Isn't Working Because You Skipped the GuardrailsRAG vs Fine-Tuning: When to Use What in ProductionHow to Cut Your LLM Costs by 70% Without Losing QualityThe AI Implementation Playbook for Non-Technical FoundersWhy Most AI Chatbots Fail (And What Production-Grade Looks Like)Building AI Agents That Know When to Hand Off to HumansVibe Coding Is Destroying Your CodebaseAI Won't Fix Your Broken Data Pipeline

Ready to Ship?

Let's talk about your engineering challenges and how we can help.

Book a Call