Mar 4, 2026
Episode 35

Beyond Online Experimentation: Generative Software That Optimizes Itself

Martin Tingley, Head of Windows Experimentation at Microsoft and former experimentation leader at Netflix, discusses why humans are the bottleneck in experimentation and how a five-level maturity framework points the way toward self-optimizing software.

Guest

Martin Tingley

Head of Windows Experimentation at Microsoft

Key Takeaways

Experimentation as Commodity.

Experimentation capability is no longer a competitive edge. With vendor solutions proliferating, true advantage comes from climbing a five-level experimentation maturity ladder toward automated generative optimization.

The Success Paradox.

Organizations stuck at level two—shipping high-investment features based on hypotheses—mistake complacency for excellence. Most of the world is at level two.

Parameter Space Optimization.

Shift from testing discrete variants to "hill-climbing" by adding optionality at decision points and iteratively optimizing across parameter spaces.
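The hill-climbing idea above can be sketched as a simple iterative search: perturb one parameter at a time and keep any variant that improves the metric. This is an illustrative toy, not the method discussed in the episode—the `engagement` surface and its parameters (`row_count`, `art_size`) are hypothetical stand-ins for metrics that would, in practice, come from live experiment readouts.

```python
import random

def engagement(params):
    # Hypothetical metric surface with a single peak at (6, 3).
    # In a real program this value comes from an experiment, not a formula.
    row_count, art_size = params
    return -(row_count - 6) ** 2 - (art_size - 3) ** 2

def hill_climb(metric, start, steps=200, seed=0):
    """Perturb one parameter at a time, keeping only improvements."""
    rng = random.Random(seed)
    best = list(start)
    best_score = metric(best)
    for _ in range(steps):
        candidate = list(best)
        i = rng.randrange(len(candidate))
        candidate[i] += rng.choice([-1, 1])  # one small step in parameter space
        score = metric(candidate)
        if score > best_score:  # keep the variant only if the metric improves
            best, best_score = candidate, score
    return best, best_score

best, score = hill_climb(engagement, start=[1, 1])
```

The point of the sketch is the contrast with discrete A/B variants: instead of comparing a fixed handful of designs, the search adds optionality at each decision point and walks the parameter space toward better configurations.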

Human Bottleneck.

Humans are too expensive for micro-optimization. Contextual bandits let machines handle high-frequency, low-stakes decisions like artwork or subject lines.
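A minimal sketch of the bandit idea, assuming a per-context epsilon-greedy policy (a deliberately simple stand-in for the contextual bandits mentioned for artwork and subject-line selection). The contexts (`mobile`, `desktop`), arms, and click-through rates are all invented for illustration.

```python
import random
from collections import defaultdict

class EpsilonGreedyBandit:
    """Per-context epsilon-greedy: explore occasionally, else exploit the
    arm with the best observed reward for this context."""
    def __init__(self, arms, epsilon=0.1, seed=0):
        self.arms = arms
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = defaultdict(lambda: defaultdict(int))
        self.values = defaultdict(lambda: defaultdict(float))

    def choose(self, context):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)            # explore
        vals = self.values[context]
        return max(self.arms, key=lambda a: vals[a])     # exploit

    def update(self, context, arm, reward):
        self.counts[context][arm] += 1
        n = self.counts[context][arm]
        # Incremental running mean of observed reward for (context, arm).
        self.values[context][arm] += (reward - self.values[context][arm]) / n

# Simulated click-through rates per (context, arm); purely illustrative.
true_ctr = {("mobile", "A"): 0.02, ("mobile", "B"): 0.08,
            ("desktop", "A"): 0.07, ("desktop", "B"): 0.03}

bandit = EpsilonGreedyBandit(arms=["A", "B"], epsilon=0.1, seed=1)
rng = random.Random(2)
for _ in range(20000):
    ctx = rng.choice(["mobile", "desktop"])
    arm = bandit.choose(ctx)
    reward = 1.0 if rng.random() < true_ctr[(ctx, arm)] else 0.0
    bandit.update(ctx, arm, reward)
```

Because each decision is cheap and frequent, the machine can afford thousands of these micro-optimizations per context—exactly the territory where a human analyst is too slow and too expensive.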

Generative AI Frontier.

Level 5 creates closed loops: GenAI generates variants, experimentation platforms evaluate, results feed back for improvement. Coframe demonstrates this with Fortune 500 e-commerce landing pages.

Experimentation Programs Framework.

Plot treatment effect distributions across product areas to inform capital allocation based on experiment patterns and customer sensitivity.

Heterogeneous Treatment Effects.

Looking at the mean is not enough. Examine segment-specific results across power users, regions, and customer tiers.
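To make the point concrete, here is a toy segment breakdown (all numbers invented) where the pooled result looks like a modest win but the segments move in opposite directions:

```python
# Hypothetical experiment readout: (segment, arm) -> (conversions, users)
results = {
    ("power_user", "control"):   (300, 1000),
    ("power_user", "treatment"): (270, 1000),
    ("casual",     "control"):   (100, 1000),
    ("casual",     "treatment"): (150, 1000),
}

def segment_lifts(results):
    """Per-segment treatment effect: difference in conversion rate."""
    segments = {seg for seg, _ in results}
    lifts = {}
    for seg in segments:
        c_conv, c_n = results[(seg, "control")]
        t_conv, t_n = results[(seg, "treatment")]
        lifts[seg] = t_conv / t_n - c_conv / c_n
    return lifts

lifts = segment_lifts(results)
# Pooled, the treatment gains 1 percentage point (0.21 vs 0.20) --
# but power users regress by 3pp while casual users improve by 5pp.
```

Shipping on the pooled mean alone would quietly degrade the experience for the most engaged segment, which is precisely why the episode urges examining segment-specific results.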

Failed Experiment Mining.

Non-working experiments reveal segment successes, user confusion patterns, and unmet customer needs—opportunities hidden in apparent failures.

Product Permission Constraints.

Different products tolerate change differently. Windows users prioritize task completion; Netflix users embrace UI novelty. Experimentation velocity must align with product utility.

Incentive Alignment.

Reward throughput and learning over perfect wins. High-volume experimentation builds institutional capacity surfacing non-obvious, high-impact opportunities.

You can read the full transcript here.

Ready to unleash your data?

Discover how Delphina can transform your data science.