Experimentation Platforms — Build vs Buy Decision Framework

June 3, 2026 · ScaledByDesign
Tags: experimentation, ab-testing, feature-flags, build-vs-buy, platform

The Temptation to Build

Every engineering team that gets serious about experimentation eventually asks: "Why are we paying $50K/year for a platform when we could build this ourselves?" The answer is usually more nuanced than either camp admits.

A Series B startup built its own experimentation platform. It took 3 engineers 4 months, and it worked for simple A/B tests. Then the team needed multivariate tests, statistical significance calculations, mutual exclusion groups, and integration with the data warehouse. Eighteen months later, 2 full-time engineers were maintaining the platform instead of building product.

When to Buy

Buy an experimentation platform when:
  ✓ You run fewer than 50 experiments per quarter
  ✓ Your engineering team is under 30 people
  ✓ You need results quickly (weeks, not quarters, to get started)
  ✓ You don't have a dedicated data science team
  ✓ Your experiments are primarily UI/UX changes
  ✓ You need compliance features (audit logs, approval workflows)

Platform comparison:
  LaunchDarkly:  Best for feature flags first, experiments second
  Optimizely:    Best for marketing/content teams running experiments
  Statsig:       Best for product-led growth with statistical rigor
  Eppo:          Best for warehouse-native experimentation
  GrowthBook:    Best open-source option (self-hosted)

When to Build

Build your own when:
  ✓ You run 100+ experiments per quarter
  ✓ You have a dedicated experimentation or data science team
  ✓ You need deep integration with proprietary data systems
  ✓ You need sub-millisecond assignment latency
  ✓ Your experiments involve backend algorithms, not just UI
  ✓ You're spending > $200K/year on a vendor platform

The Hybrid Approach

Most mature teams land on a hybrid — buy the SDK, own the analysis:

// Use a vendor SDK for assignment and bucketing
import { GrowthBook } from "@growthbook/growthbook";
 
const gb = new GrowthBook({
  apiHost: "https://cdn.growthbook.io",
  clientKey: process.env.GROWTHBOOK_KEY,
  trackingCallback: (experiment, result) => {
    // Send to YOUR data warehouse, not the vendor's
    analytics.track("experiment_viewed", {
      experimentId: experiment.key,
      variationId: result.variationId,
      userId: getCurrentUserId(),
      timestamp: Date.now(),
      sessionId: getSessionId(),
      // Add your own context
      cartValue: getCartTotal(),
      userSegment: getUserSegment(),
    });
  },
});

Own the Analysis Layer

-- Run your own analysis in your data warehouse
-- This gives you full control over metrics and methodology
 
WITH experiment_users AS (
  SELECT
    user_id,
    variation_id,
    MIN(timestamp) AS first_exposure
  FROM experiment_events
  WHERE experiment_id = 'checkout_redesign_v2'
  GROUP BY user_id, variation_id
),
conversions AS (
  SELECT
    eu.user_id,
    eu.variation_id,
    COALESCE(SUM(o.revenue), 0) AS revenue,
    COUNT(DISTINCT o.order_id) AS orders
  FROM experiment_users eu
  LEFT JOIN orders o ON eu.user_id = o.user_id
    AND o.created_at >= eu.first_exposure
    AND o.created_at <= eu.first_exposure + INTERVAL '14 days'
  GROUP BY eu.user_id, eu.variation_id
)
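SELECT
  variation_id,
  COUNT(*) AS users,
  AVG(revenue) AS avg_revenue_per_user,
  -- Share of users with at least one order (a user with several orders counts once)
  AVG(CASE WHEN orders > 0 THEN 1.0 ELSE 0.0 END) AS conversion_rate,
  SUM(orders)::float / COUNT(*) AS orders_per_user,
  STDDEV(revenue) AS revenue_stddev
FROM conversions
GROUP BY variation_id;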
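
Once the warehouse returns one summary row per variation, the comparison itself is a few lines of arithmetic. The sketch below is one way to do it, not the only valid methodology: it runs an unpooled two-sample z-test on revenue per user using the `users`, `avg_revenue_per_user`, and `revenue_stddev` columns from the query above. The names `VariationStats`, `normalCdf`, and `compareMeans` are hypothetical, and the normal approximation assumes each arm has thousands of users.

```typescript
// One summary row per variation, as returned by a per-variation aggregate query.
interface VariationStats {
  variationId: string;
  users: number;  // COUNT(*)
  mean: number;   // AVG(revenue)
  stddev: number; // STDDEV(revenue)
}

interface Comparison {
  lift: number;   // relative difference in means, treatment vs control
  zScore: number;
  pValue: number; // two-sided, normal approximation
}

// Standard normal CDF using the Abramowitz-Stegun erf approximation (formula 7.1.26).
function normalCdf(z: number): number {
  const x = Math.abs(z) / Math.SQRT2;
  const t = 1 / (1 + 0.3275911 * x);
  const poly =
    t * (0.254829592 + t * (-0.284496736 + t * (1.421413741 + t * (-1.453152027 + t * 1.061405429))));
  const erf = 1 - poly * Math.exp(-x * x);
  return z >= 0 ? 0.5 * (1 + erf) : 0.5 * (1 - erf);
}

// Unpooled (Welch-style) standard error, so the arms may differ in size and variance.
function compareMeans(control: VariationStats, treatment: VariationStats): Comparison {
  const diff = treatment.mean - control.mean;
  const standardError = Math.sqrt(
    (control.stddev * control.stddev) / control.users +
      (treatment.stddev * treatment.stddev) / treatment.users
  );
  const zScore = diff / standardError;
  const pValue = 2 * (1 - normalCdf(Math.abs(zScore)));
  return { lift: diff / control.mean, zScore, pValue };
}

// Illustrative numbers only, not real experiment data.
const control: VariationStats = { variationId: "control", users: 10000, mean: 10, stddev: 30 };
const treatment: VariationStats = { variationId: "redesign", users: 10000, mean: 11, stddev: 30 };
console.log(compareMeans(control, treatment));
```

Revenue per user is usually heavy-tailed, so a real pipeline would likely add outlier capping or a bootstrap on top of this. Owning the analysis layer means that choice is yours rather than the vendor's.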

The Assignment Architecture

If you do build, the assignment layer is the critical piece:

// Deterministic assignment using hashing
// Same user always gets the same variation
function assignVariation(
  userId: string,
  experimentId: string,
  variations: string[],
  trafficAllocation: number = 1.0
): string | null {
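  // murmurhash3 is assumed to return an unsigned 32-bit integer
  const hash = murmurhash3(`${experimentId}:${userId}`);
  const normalized = hash / 0x100000000; // [0, 1), so the top bucket is never overshot
 
  // Traffic allocation: only include a percentage of users
  // (>= so that an allocation of 0 excludes everyone)
  if (normalized >= trafficAllocation) return null;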
 
  // Assign to variation
  const bucket = Math.floor((normalized / trafficAllocation) * variations.length);
  return variations[Math.min(bucket, variations.length - 1)];
}
 
// Mutual exclusion: prevent users from being in conflicting experiments
function checkMutualExclusion(
  userId: string,
  experimentId: string,
  exclusionGroups: Map<string, string[]>
): boolean {
  for (const [groupId, experiments] of exclusionGroups) {
    if (!experiments.includes(experimentId)) continue;
    
    // User is assigned to one experiment per exclusion group
    const groupHash = murmurhash3(`exclusion:${groupId}:${userId}`);
    const assignedExperiment = experiments[groupHash % experiments.length];
    
    if (assignedExperiment !== experimentId) return false;
  }
  return true;
}
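
To try the assignment logic without pulling in a hashing library, here is a self-contained sketch. It swaps `murmurhash3` for a stand-in, `hash32`, which is FNV-1a followed by the MurmurHash3 32-bit finalizer. That substitution is an assumption made so the example runs on its own, not a recommendation over a real MurmurHash3 implementation. The names `hash32` and `assign` and the user ID format are hypothetical.

```typescript
// Stand-in 32-bit hash: FNV-1a over UTF-16 code units, then the MurmurHash3
// finalizer (fmix32) to spread entropy into the high bits.
function hash32(input: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    h ^= input.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  h ^= h >>> 16;
  h = Math.imul(h, 0x85ebca6b);
  h ^= h >>> 13;
  h = Math.imul(h, 0xc2b2ae35);
  h ^= h >>> 16;
  return h >>> 0; // force an unsigned 32-bit result
}

// Deterministic assignment: the same (experimentId, userId) pair always maps
// to the same variation, or to null when the user is outside the traffic allocation.
function assign(
  userId: string,
  experimentId: string,
  variations: string[],
  trafficAllocation: number = 1.0
): string | null {
  const unit = hash32(`${experimentId}:${userId}`) / 0x100000000; // [0, 1)
  if (unit >= trafficAllocation) return null;
  const bucket = Math.floor((unit / trafficAllocation) * variations.length);
  return variations[Math.min(bucket, variations.length - 1)];
}

// Count how 10,000 synthetic users split across two variations.
const counts: Record<string, number> = { control: 0, treatment: 0 };
for (let i = 0; i < 10000; i++) {
  const variation = assign(`user-${i}`, "checkout_redesign_v2", ["control", "treatment"]);
  if (variation !== null) counts[variation] += 1;
}
console.log(counts);
```

Because the experiment ID is part of the hash key, a user's bucket in one experiment tells you nothing about their bucket in another. That independence is what keeps concurrent experiments from being correlated.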

Statistical Rigor Checklist

Whether you build or buy, ensure your platform handles:

  ✓ Sample size calculation (how long to run the test)
  ✓ Sequential testing (peeking without inflating false positives)
  ✓ Multiple comparison correction (testing many metrics)
  ✓ Novelty/primacy effects (results change over time)
  ✓ Sample Ratio Mismatch detection (uneven assignment = bug)
  ✓ Minimum Detectable Effect size (what's worth detecting?)

  Common mistakes:
  ✗ Stopping tests early when results "look significant"
  ✗ Running tests on segments after seeing overall results
  ✗ Not accounting for day-of-week effects
  ✗ Using overall conversion rate instead of per-user metrics
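
Two of the checklist items are easy to make concrete. The sketch below estimates the per-arm sample size for a conversion-rate test using the standard two-proportion formula, and flags Sample Ratio Mismatch with a chi-square goodness-of-fit test for a two-arm experiment. The function names are hypothetical, and the defaults (two-sided alpha of 0.05, 80% power, an SRM alarm at p < 0.001) are common conventions assumed here, not requirements.

```typescript
// Standard normal quantiles for a two-sided alpha of 0.05 and 80% power.
const Z_ALPHA_TWO_SIDED_05 = 1.959964;
const Z_POWER_80 = 0.841621;

// Users needed per arm to detect a relative lift of `relativeMde`
// on a baseline conversion rate of `baselineRate`.
function sampleSizePerArm(baselineRate: number, relativeMde: number): number {
  const p1 = baselineRate;
  const p2 = baselineRate * (1 + relativeMde);
  const variance = p1 * (1 - p1) + p2 * (1 - p2);
  const z = Z_ALPHA_TWO_SIDED_05 + Z_POWER_80;
  return Math.ceil((z * z * variance) / ((p2 - p1) * (p2 - p1)));
}

// Chi-square critical value for 1 degree of freedom at p = 0.001.
const CHI2_CRITICAL_DF1_P001 = 10.828;

// Returns true when the observed split between two arms deviates from the
// expected split by more than chance would plausibly explain.
function hasSampleRatioMismatch(
  observedA: number,
  observedB: number,
  expectedShareA: number = 0.5
): boolean {
  const total = observedA + observedB;
  const expectedA = total * expectedShareA;
  const expectedB = total - expectedA;
  const chi2 =
    ((observedA - expectedA) * (observedA - expectedA)) / expectedA +
    ((observedB - expectedB) * (observedB - expectedB)) / expectedB;
  return chi2 > CHI2_CRITICAL_DF1_P001;
}

// A 5% baseline with a 10% relative MDE needs roughly 31,000 users per arm.
console.log(sampleSizePerArm(0.05, 0.1));
// A 10,000 vs 10,500 split on a 50/50 experiment is flagged as a mismatch.
console.log(hasSampleRatioMismatch(10000, 10500));
```

Dividing the per-arm sample size by daily eligible traffic gives the minimum run time. Rounding that up to whole weeks also covers the day-of-week effects listed above.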

The Real Cost of Building

Year 1 costs of building an experimentation platform:
  3 engineers × 4 months = $200K in salary
  Infrastructure (feature flag service, analytics): $20K
  Opportunity cost (features not built): $300K+
  Total: ~$520K

Year 2+ ongoing costs:
  1-2 engineers maintaining and improving: $150-300K/year
  Infrastructure: $30K/year

Vendor platform costs:
  Year 1: $30-100K (depending on scale)
  Year 2+: $50-150K/year (grows with usage)

Break-even: Usually 3-4 years, and only at scale (100+ experiments/quarter),
  where vendor spend runs well above the ranges listed here (e.g. > $200K/year)
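
The break-even arithmetic is simple enough to rerun with your own numbers. The sketch below compares cumulative build and buy costs year by year. The build figures are taken from the list above ($520K in year 1, then the low end of ongoing costs at $150K + $30K). The $300K/year vendor bill is a hypothetical figure chosen to sit above the $200K/year threshold from "When to Build", not a quoted price. `CostModel`, `cumulativeCost`, and `breakEvenYear` are illustrative names.

```typescript
interface CostModel {
  yearOne: number;        // total cost in the first year
  ongoingPerYear: number; // cost in each later year
}

// Total spend after `years` years under a given model.
function cumulativeCost(model: CostModel, years: number): number {
  if (years <= 0) return 0;
  return model.yearOne + model.ongoingPerYear * (years - 1);
}

// First year in which building has cost no more than buying, or null if
// that never happens within the horizon.
function breakEvenYear(build: CostModel, buy: CostModel, horizonYears: number = 10): number | null {
  for (let year = 1; year <= horizonYears; year++) {
    if (cumulativeCost(build, year) <= cumulativeCost(buy, year)) return year;
  }
  return null;
}

// Build: figures from the cost list above, low end of ongoing costs.
const build: CostModel = { yearOne: 520000, ongoingPerYear: 180000 };
// Vendor at the top of the listed ranges.
const vendorListed: CostModel = { yearOne: 100000, ongoingPerYear: 150000 };
// Hypothetical vendor bill at scale (assumption, not a quote).
const vendorAtScale: CostModel = { yearOne: 300000, ongoingPerYear: 300000 };

console.log(breakEvenYear(build, vendorListed));  // null: building never catches up
console.log(breakEvenYear(build, vendorAtScale)); // 3
```

Under these assumptions, building only pays off once the vendor bill is much larger than the listed ranges. That fits the idea that break-even is a question for teams at scale.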

The right answer for most teams: buy a platform, own your analysis, and invest engineering time in building product features that need testing — not the testing infrastructure itself. Build only when experimentation is a core competency that differentiates your business.
