Experimentation Platforms — Build vs Buy Decision Framework
The Temptation to Build
Every engineering team that gets serious about experimentation eventually asks: "Why are we paying $50K/year for a platform when we could build this ourselves?" The answer is usually more nuanced than either camp admits.
A Series B startup built their own experimentation platform. It took 3 engineers 4 months, and it worked for simple A/B tests. Then they needed multivariate tests, statistical significance calculations, mutual exclusion groups, and integration with their data warehouse. Eighteen months later, they had 2 full-time engineers maintaining the platform instead of building product.
When to Buy
Buy an experimentation platform when:
✓ You run fewer than 50 experiments per quarter
✓ Your engineering team is under 30 people
✓ You need results quickly (getting started in weeks, not quarters)
✓ You don't have a dedicated data science team
✓ Your experiments are primarily UI/UX changes
✓ You need compliance features (audit logs, approval workflows)
Platform comparison:
LaunchDarkly: Best for feature flags first, experiments second
Optimizely: Best for marketing/content teams running experiments
Statsig: Best for product-led growth with statistical rigor
Eppo: Best for warehouse-native experimentation
GrowthBook: Best open-source option (can be self-hosted)
When to Build
Build your own when:
✓ You run 100+ experiments per quarter
✓ You have a dedicated experimentation or data science team
✓ You need deep integration with proprietary data systems
✓ You need sub-millisecond assignment latency
✓ Your experiments involve backend algorithms, not just UI
✓ You're spending > $200K/year on a vendor platform
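The two checklists can be read as a rough scoring exercise. The sketch below is only an illustration: the `TeamProfile` field names and the "4 of 6 signals" rule are assumptions made for this example, and "primarily UI/UX changes" is approximated as "does not test backend algorithms". Only the thresholds (50 and 100 experiments per quarter, 30 engineers, $200K/year) come from the lists above.

```typescript
// Hypothetical team description; field names are illustrative only.
interface TeamProfile {
  experimentsPerQuarter: number;
  engineeringHeadcount: number;
  needsToStartWithinWeeks: boolean;
  hasDedicatedDataScience: boolean;
  testsBackendAlgorithms: boolean;
  needsComplianceFeatures: boolean;
  needsProprietaryDataIntegration: boolean;
  needsSubMillisecondAssignment: boolean;
  annualVendorSpendUsd: number;
}

type Recommendation = "buy" | "build" | "hybrid";

function countTrue(signals: boolean[]): number {
  return signals.filter(Boolean).length;
}

function recommend(team: TeamProfile): Recommendation {
  // One entry per line of the "When to Buy" checklist.
  const buySignals = countTrue([
    team.experimentsPerQuarter < 50,
    team.engineeringHeadcount < 30,
    team.needsToStartWithinWeeks,
    !team.hasDedicatedDataScience,
    !team.testsBackendAlgorithms,
    team.needsComplianceFeatures,
  ]);
  // One entry per line of the "When to Build" checklist.
  const buildSignals = countTrue([
    team.experimentsPerQuarter >= 100,
    team.hasDedicatedDataScience,
    team.needsProprietaryDataIntegration,
    team.needsSubMillisecondAssignment,
    team.testsBackendAlgorithms,
    team.annualVendorSpendUsd > 200_000,
  ]);
  // Assumed rule: a clear majority (4 or more of 6) on one side wins;
  // anything less decisive is treated as "hybrid".
  if (buildSignals >= 4 && buildSignals > buySignals) return "build";
  if (buySignals >= 4 && buySignals > buildSignals) return "buy";
  return "hybrid";
}
```

A score like this is a conversation starter, not a verdict; teams that land in the middle are the ones for whom the trade-offs deserve the closest look.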
The Hybrid Approach
Most mature teams land on a hybrid — buy the SDK, own the analysis:
// Use a vendor SDK for assignment and bucketing
import { GrowthBook } from "@growthbook/growthbook";
const gb = new GrowthBook({
  apiHost: "https://cdn.growthbook.io",
  clientKey: process.env.GROWTHBOOK_KEY,
  trackingCallback: (experiment, result) => {
    // Send to YOUR data warehouse, not the vendor's
    analytics.track("experiment_viewed", {
      experimentId: experiment.key,
      variationId: result.variationId,
      userId: getCurrentUserId(),
      timestamp: Date.now(),
      sessionId: getSessionId(),
      // Add your own context
      cartValue: getCartTotal(),
      userSegment: getUserSegment(),
    });
  },
});
Own the Analysis Layer
-- Run your own analysis in your data warehouse
-- This gives you full control over metrics and methodology
WITH experiment_users AS (
  SELECT
    user_id,
    variation_id,
    MIN(timestamp) AS first_exposure
  FROM experiment_events
  WHERE experiment_id = 'checkout_redesign_v2'
  GROUP BY user_id, variation_id
),
conversions AS (
  SELECT
    eu.user_id,
    eu.variation_id,
    COALESCE(SUM(o.revenue), 0) AS revenue,
    COUNT(DISTINCT o.order_id) AS orders
  FROM experiment_users eu
  LEFT JOIN orders o ON eu.user_id = o.user_id
    AND o.created_at >= eu.first_exposure
    AND o.created_at <= eu.first_exposure + INTERVAL '14 days'
  GROUP BY eu.user_id, eu.variation_id
)
SELECT
  variation_id,
  COUNT(*) AS users,
  AVG(revenue) AS avg_revenue_per_user,
  -- Share of users with at least one order in the 14-day window
  AVG(CASE WHEN orders > 0 THEN 1.0 ELSE 0.0 END) AS conversion_rate,
  -- Orders per user can exceed 1, so it is reported separately
  SUM(orders)::float / COUNT(*) AS orders_per_user,
  STDDEV(revenue) AS revenue_stddev
FROM conversions
GROUP BY variation_id;
The Assignment Architecture
If you do build, the assignment layer is the critical piece:
// Deterministic assignment using hashing
// Same user always gets the same variation
// Assumes murmurhash3(key) returns a 32-bit integer (supplied by a MurmurHash3 library)
function assignVariation(
  userId: string,
  experimentId: string,
  variations: string[],
  trafficAllocation: number = 1.0
): string | null {
  if (variations.length === 0) return null;
  // >>> 0 coerces to unsigned; dividing by 2^32 yields a value in [0, 1)
  const hash = murmurhash3(`${experimentId}:${userId}`) >>> 0;
  const normalized = hash / 0x100000000;
  // Traffic allocation: only include a percentage of users
  // (>= so that an allocation of 0 excludes everyone)
  if (normalized >= trafficAllocation) return null;
  // Assign to variation
  const bucket = Math.floor((normalized / trafficAllocation) * variations.length);
  // Math.min guards against floating-point rounding at the upper edge
  return variations[Math.min(bucket, variations.length - 1)];
}
// Mutual exclusion: prevent users from being in conflicting experiments
function checkMutualExclusion(
  userId: string,
  experimentId: string,
  exclusionGroups: Map<string, string[]>
): boolean {
  for (const [groupId, experiments] of exclusionGroups) {
    if (!experiments.includes(experimentId)) continue;
    // User is assigned to one experiment per exclusion group
    const groupHash = murmurhash3(`exclusion:${groupId}:${userId}`) >>> 0;
    const assignedExperiment = experiments[groupHash % experiments.length];
    if (assignedExperiment !== experimentId) return false;
  }
  return true;
}
Statistical Rigor Checklist
Whether you build or buy, ensure your platform handles:
✓ Sample size calculation (how long to run the test)
✓ Sequential testing (peeking without inflating false positives)
✓ Multiple comparison correction (testing many metrics)
✓ Novelty/primacy effects (results change over time)
✓ Sample Ratio Mismatch detection (uneven assignment = bug)
✓ Minimum Detectable Effect size (what's worth detecting?)
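Two of these checks reduce to short formulas. The sketch below is a minimal illustration rather than a full statistics engine: it assumes a conversion-rate metric, the conventional two-sided alpha of 0.05 and 80% power, a fixed-horizon test, and a two-variation 50/50 split for the mismatch check. All function and type names are made up for this example.

```typescript
// z-scores for the assumed defaults: two-sided alpha = 0.05, power = 0.80.
const Z_ALPHA = 1.96;
const Z_POWER = 0.8416;

// Users needed per variation to detect a move from baselineRate to
// baselineRate + absoluteMde (normal approximation for two proportions).
function sampleSizePerVariation(baselineRate: number, absoluteMde: number): number {
  const p1 = baselineRate;
  const p2 = baselineRate + absoluteMde;
  const varianceSum = p1 * (1 - p1) + p2 * (1 - p2);
  const z = Z_ALPHA + Z_POWER;
  return Math.ceil((z * z * varianceSum) / (absoluteMde * absoluteMde));
}

interface Arm {
  users: number;  // observed users in this variation
  weight: number; // intended share of traffic (any positive scale)
}

// Chi-square goodness-of-fit statistic: observed split vs. intended split.
function srmChiSquare(arms: Arm[]): number {
  const totalUsers = arms.reduce((sum, arm) => sum + arm.users, 0);
  const totalWeight = arms.reduce((sum, arm) => sum + arm.weight, 0);
  return arms.reduce((stat, arm) => {
    const expected = (totalUsers * arm.weight) / totalWeight;
    return stat + (arm.users - expected) ** 2 / expected;
  }, 0);
}

// Chi-square critical value for 1 degree of freedom at p = 0.001.
// A strict threshold is assumed here so that alerts are rare and meaningful.
const CHI_SQUARE_DF1_P001 = 10.828;

function hasSampleRatioMismatch(controlUsers: number, treatmentUsers: number): boolean {
  const stat = srmChiSquare([
    { users: controlUsers, weight: 1 },
    { users: treatmentUsers, weight: 1 },
  ]);
  return stat > CHI_SQUARE_DF1_P001;
}
```

For a 10% baseline and a 2-point absolute Minimum Detectable Effect, `sampleSizePerVariation(0.10, 0.02)` comes out a little over 3,800 users per variation. Dividing that by daily eligible traffic gives a first estimate of how long the test must run. A 5,000 vs 5,400 split on an intended 50/50 allocation is flagged by `hasSampleRatioMismatch`, which usually points to an assignment or logging bug rather than a real effect.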
Common mistakes:
✗ Stopping tests early when results "look significant"
✗ Running tests on segments after seeing overall results
✗ Not accounting for day-of-week effects
✗ Using overall conversion rate instead of per-user metrics
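Working from per-user metrics means each variation is summarized by a user count, a per-user mean, and a per-user standard deviation, which is the shape of output the warehouse query earlier in this article produces. As a rough sketch of what a comparison on those numbers looks like, the function below computes the difference in means with an unpooled standard error and a 95% confidence interval. It assumes a fixed-horizon test evaluated once at the planned end date and samples large enough for the normal approximation; the type and function names are illustrative.

```typescript
// Per-variation summary, e.g. users, avg_revenue_per_user, revenue_stddev.
interface VariationSummary {
  users: number;
  mean: number;
  stddev: number;
}

interface Comparison {
  difference: number;     // treatment mean minus control mean
  standardError: number;  // unpooled (Welch-style) standard error
  zScore: number;
  ci95: [number, number]; // normal-approximation 95% confidence interval
}

function compareVariations(
  control: VariationSummary,
  treatment: VariationSummary
): Comparison {
  const difference = treatment.mean - control.mean;
  const standardError = Math.sqrt(
    control.stddev ** 2 / control.users + treatment.stddev ** 2 / treatment.users
  );
  const margin = 1.96 * standardError;
  return {
    difference,
    standardError,
    zScore: difference / standardError,
    ci95: [difference - margin, difference + margin],
  };
}
```

An interval that excludes zero at the planned end of the test is evidence of a difference. Checking the same interval every day and stopping the first time it excludes zero is the early-stopping mistake listed above, and this sketch does nothing to correct for it.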
The Real Cost of Building
Year 1 costs of building an experimentation platform:
3 engineers × 4 months = $200K in salary
Infrastructure (feature flag service, analytics): $20K
Opportunity cost (features not built): $300K+
Total: ~$520K
Year 2+ ongoing costs:
1-2 engineers maintaining and improving: $150-300K/year
Infrastructure: $30K/year
Vendor platform costs:
Year 1: $30-100K (depending on scale)
Year 2+: $50-150K/year (grows with usage)
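Break-even: Usually 3-4 years, and only at a scale (100+ experiments/quarter) where the vendor bill climbs past the ranges above (the > $200K/year signal under When to Build). At the listed vendor prices, ongoing build costs alone exceed the vendor bill.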
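The break-even arithmetic is easy to rerun with your own numbers. The sketch below accumulates both cost streams year by year; the `CostModel` shape and the `breakEvenYear` name are made up for this example, and the $300K/year vendor bill in the second scenario is a hypothetical figure chosen to represent a team past the $200K/year threshold, not a quoted price.

```typescript
interface CostModel {
  buildYearOneUsd: number;
  buildOngoingUsdPerYear: number;
  vendorYearOneUsd: number;
  vendorOngoingUsdPerYear: number;
}

// Returns the first year in which cumulative build cost is at or below
// cumulative vendor cost, or null if that never happens within maxYears.
function breakEvenYear(model: CostModel, maxYears: number = 10): number | null {
  let buildTotal = 0;
  let vendorTotal = 0;
  for (let year = 1; year <= maxYears; year++) {
    buildTotal += year === 1 ? model.buildYearOneUsd : model.buildOngoingUsdPerYear;
    vendorTotal += year === 1 ? model.vendorYearOneUsd : model.vendorOngoingUsdPerYear;
    if (buildTotal <= vendorTotal) return year;
  }
  return null;
}

// Scenario 1: figures from the lists above (cheapest build, priciest vendor).
const listedFigures: CostModel = {
  buildYearOneUsd: 520_000,
  buildOngoingUsdPerYear: 180_000, // $150K engineer + $30K infrastructure
  vendorYearOneUsd: 100_000,
  vendorOngoingUsdPerYear: 150_000,
};

// Scenario 2: hypothetical high-volume team paying $300K/year to a vendor.
const highVolume: CostModel = {
  buildYearOneUsd: 520_000,
  buildOngoingUsdPerYear: 180_000,
  vendorYearOneUsd: 300_000,
  vendorOngoingUsdPerYear: 300_000,
};

console.log(breakEvenYear(listedFigures)); // null
console.log(breakEvenYear(highVolume));    // 3
```

In the first scenario the build never catches up, because maintaining the platform ($180K/year) costs more than the vendor ($150K/year). In the second, cumulative build cost ($880K) drops below cumulative vendor cost ($900K) in year 3, which is where the 3-4 year estimate becomes plausible.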
The right answer for most teams: buy a platform, own your analysis, and invest engineering time in building product features that need testing — not the testing infrastructure itself. Build only when experimentation is a core competency that differentiates your business.