The 100-Year Lead: What Baseball Teaches Us About the Future of AI
Chris Fonnesbeck
Hugo Bowne-Anderson
Duncan Gilchrist
Chris Fonnesbeck, a veteran analyst for the Yankees and Mets and the creator of the open-source Bayesian modeling library PyMC, joins us to unpack why baseball has been a leading indicator for data science and analytics for over a century, and why builders and AI leaders need to pay attention now. The reason it has led is simple: huge incentives and a culture that treats decisions as quantifiable. With each win worth roughly eight to ten million dollars and front offices built around probabilistic reasoning, baseball has had every reason to push the methods further and faster than the rest of industry. The skills and culture that built this lead are what AI teams now need more of: probabilistic thinking, hierarchical models, integrating expert judgment, reasoning carefully under uncertainty, and, increasingly, causal inference.
The conversation traces the throughline from those early statistical innovations to the decisions driving multimillion-dollar contracts today, with concrete patterns AI builders can take back to their own work: how to handle small samples and high stakes, why outcomes are the wrong thing to measure, what changes when you push uncertainty all the way through your model, and why robust causal inference needs to be the next frontier.
Guest
Chris Fonnesbeck
Principal Data Scientist at PyMC Labs
Key Takeaways
Baseball leads data and AI by a decade. Pay attention now.
Baseball has been ahead of industry on data work for more than 100 years, from F.C. Lane's linear models in the early 1900s to Bill James to Moneyball to today. The techniques Fonnesbeck talks about in this conversation (process-based metrics, Bayesian hierarchical models, causal inference, integrating expert knowledge into probabilistic models) are what baseball front offices are running in 2026. If the pattern holds, this is where data and AI work in industry is heading next. If you want to stay relevant, learn these now.
Measure decision quality, not outcomes.
Modern baseball cares less about what happened (outcome) and more about what should have happened given the inputs (process). With granular sensor data you can evaluate the quality of every pitch and swing independently of whether it produced a hit. AI teams should be making the same move in eval: judge whether the system made the right call at each step, not whether the final answer happened to land.
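To make the process-versus-outcome distinction concrete, here is a toy simulation. All numbers are hypothetical: each batter has a true hit probability, each batted ball gets a noisy "expected hit probability" (a stand-in for a model built from inputs such as exit velocity and launch angle), and the outcome is a coin flip at that probability. The sketch measures how far each metric lands from the batter's true skill.

```python
import random
import statistics


def compare_metrics(n_batters=200, balls_per_batter=50, seed=7):
    """Toy simulation (hypothetical numbers): process metric vs. outcome metric.

    Returns the mean absolute error, against each batter's true skill, of
    (a) the average expected hit probability of his batted balls (process) and
    (b) his observed hit rate (outcome).
    """
    rng = random.Random(seed)
    process_err, outcome_err = [], []
    for _ in range(n_batters):
        true_skill = rng.uniform(0.20, 0.35)  # assumed true hit probability
        expected, hits = [], 0
        for _ in range(balls_per_batter):
            # Quality of this batted ball, summarized as an expected hit
            # probability and clipped to [0, 1].
            xp = min(max(rng.gauss(true_skill, 0.10), 0.0), 1.0)
            expected.append(xp)
            # The outcome: whether the ball actually fell for a hit.
            if rng.random() < xp:
                hits += 1
        process_err.append(abs(statistics.fmean(expected) - true_skill))
        outcome_err.append(abs(hits / balls_per_batter - true_skill))
    return statistics.fmean(process_err), statistics.fmean(outcome_err)


process_mae, outcome_mae = compare_metrics()
print(f"process metric error: {process_mae:.3f}")
print(f"outcome metric error: {outcome_mae:.3f}")
```

In this setup the process metric tracks true skill several times more closely, because the outcome layers luck on top of contact quality. An AI eval analogue would score each step of a trace against what a correct decision looked like at that step, rather than only checking the final answer.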
Share strength across a population to handle small samples.
High-stakes decisions often rely on limited data, like a rookie's first few appearances. Bayesian hierarchical models let you combine a strong prior (historical data on similar players) with a small sample (current performance) to get reliable predictions without overfitting to a hot streak. The same pattern applies anywhere you need a prediction for a new user, customer, or product with little individual data.
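A full hierarchical model would normally be fit in a probabilistic programming library such as PyMC. The sketch below uses a simpler stand-in, an empirical-Bayes Beta-Binomial shrinkage estimate in plain Python, to show the same partial-pooling idea: a Beta prior is fit to hypothetical historical rates of comparable players by the method of moments, and a small sample is pulled toward it. The function names and data are illustrative assumptions.

```python
import statistics


def fit_beta_prior(historical_rates):
    """Method-of-moments Beta(alpha, beta) fit to historical per-player rates.

    Simplification: the observed rates include sampling noise, so this
    somewhat overstates the spread of true talent.
    """
    m = statistics.fmean(historical_rates)
    v = statistics.pvariance(historical_rates)
    k = m * (1 - m) / v - 1  # implied prior "sample size" (alpha + beta)
    return m * k, (1 - m) * k


def shrunk_rate(successes, trials, alpha, beta):
    """Posterior mean under a Beta(alpha, beta) prior and Binomial likelihood."""
    return (successes + alpha) / (trials + alpha + beta)


# Hypothetical batting averages of comparable players in past seasons.
history = [0.240, 0.255, 0.262, 0.271, 0.248, 0.290, 0.235, 0.259]
alpha, beta = fit_beta_prior(history)

# A rookie who starts 6-for-12 (.500) is pulled most of the way back toward
# the population mean, because 12 at-bats carry little information.
rookie = shrunk_rate(6, 12, alpha, beta)
# A veteran hitting .300 over 1,000 at-bats keeps more of his own signal.
veteran = shrunk_rate(300, 1000, alpha, beta)
print(f"rookie: {rookie:.3f}, veteran: {veteran:.3f}")
```

The same prior moves the two players by very different amounts: the more individual data there is, the less the population matters. That sliding balance is what a hierarchical model provides without hand-tuning.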
Causal modeling is what you need when you are going to act on the model.
A pitcher throwing a specific pitch might correlate with strikeouts, but that doesn't mean the pitch causes them. Fonnesbeck argues the next leap in analytics is asking the counterfactual: would a randomly chosen pitcher who started throwing this pitch get more strikeouts? The same question applies to any intervention you might run, including adding a new product feature or marketing channel. Correlation is enough when you only want to predict. The moment you want to change something, you need causal inference.
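A small simulation shows why the correlational answer can mislead. In this hypothetical setup, better pitchers are more likely to adopt a new pitch and also strike out more batters regardless, so skill confounds the comparison. The true effect of the pitch is fixed at one percentage point; the naive comparison inflates it, while a backdoor adjustment that compares adopters and non-adopters within each skill group recovers it. The adjustment works here only because the sketch assumes skill is fully observed, which real observational data rarely guarantees.

```python
import random
import statistics

TRUE_EFFECT = 0.01  # assumed causal effect of the new pitch on strikeout rate


def simulate_pitchers(n=20_000, seed=3):
    """Hypothetical pitchers: (skilled, adopted_pitch, strikeout_rate)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        skilled = rng.random() < 0.5
        # Confounding: skilled pitchers are far more likely to adopt the pitch.
        adopted = rng.random() < (0.8 if skilled else 0.2)
        k_rate = (0.18 + 0.06 * skilled + TRUE_EFFECT * adopted
                  + rng.gauss(0.0, 0.02))
        rows.append((skilled, adopted, k_rate))
    return rows


def mean_k(rows, adopted, skilled=None):
    """Mean strikeout rate for a group, optionally within one skill stratum."""
    return statistics.fmean(k for s, a, k in rows
                            if a == adopted and (skilled is None or s == skilled))


rows = simulate_pitchers()

# Correlational answer: compare adopters with non-adopters directly.
naive = mean_k(rows, True) - mean_k(rows, False)

# Backdoor adjustment: compare within each skill stratum, then weight the
# stratum-specific differences by how common each stratum is.
adjusted = 0.0
for skilled in (False, True):
    weight = sum(1 for s, _, _ in rows if s == skilled) / len(rows)
    adjusted += weight * (mean_k(rows, True, skilled) - mean_k(rows, False, skilled))

print(f"naive: {naive:.3f}, adjusted: {adjusted:.3f}, true: {TRUE_EFFECT:.3f}")
```

With an unmeasured confounder, no amount of stratifying on the observed columns would fix the naive estimate, which is why the counterfactual question has to be designed into the analysis rather than bolted on.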
At enough scale, the bottleneck shifts from modeling to engineering.
Baseball's move from radar (TrackMan) to high-frequency optical cameras (Hawk-Eye) took the data from discrete events to six or seven terabytes per game of skeletal pose points and pitch trajectories. At that scale the hard problem stops being "what model do I fit" and becomes "how do I store, process, and serve this data." The same shift hits any AI team the moment it starts logging full system traces and tool calls instead of summaries.
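A back-of-the-envelope calculation shows the scale. Taking the six-to-seven-terabyte figure at face value and assuming a full regular season of 2,430 games (30 teams × 162 games, two teams per game, postseason excluded), one season of tracking data lands in the petabyte range:

```python
TB_PER_GAME = (6, 7)  # per-game range quoted in the conversation
TEAMS, GAMES_PER_TEAM = 30, 162
games = TEAMS * GAMES_PER_TEAM // 2  # each game involves two teams -> 2,430

for tb in TB_PER_GAME:
    season_tb = tb * games
    # Decimal units: 1 PB = 1,000 TB.
    print(f"{tb} TB/game -> {season_tb:,} TB = {season_tb / 1000:.1f} PB per season")
```

Roughly 15 to 17 petabytes per regular season is a storage and serving problem before it is a modeling problem.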
Integrate uncertainty through the model, then collapse it at the decision.
The GM wants a list of players to draft, so in the end you still hand over a point decision. The right way to get there is to push a full distribution through the model and collapse it only at the last step. AI engineers shipping product recommendations face the same problem.
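A minimal sketch of the difference, assuming two hypothetical draft prospects whose projected WAR is normally distributed and a hypothetical convex value function (a player only becomes valuable above 2 WAR, and value grows quickly beyond that). Collapsing each projection to its mean before applying the value function favors the safe prospect; pushing samples through the value function and collapsing only at the end favors the high-variance one.

```python
import random
import statistics


def star_value(war):
    """Hypothetical convex payoff: nothing below 2 WAR, quadratic growth above."""
    return max(0.0, war - 2.0) ** 2


def evaluate(prospects, n_samples=20_000, seed=11):
    """Compare collapsing to a point estimate first vs. collapsing last."""
    rng = random.Random(seed)
    results = {}
    for name, (mean_war, sd_war) in prospects.items():
        # Draws from the (assumed normal) projection distribution.
        draws = [rng.gauss(mean_war, sd_war) for _ in range(n_samples)]
        results[name] = {
            # Collapse first: plug the mean projection into the value function.
            "value_at_mean": star_value(mean_war),
            # Collapse last: push every draw through, then average.
            "expected_value": statistics.fmean([star_value(w) for w in draws]),
        }
    return results


# Hypothetical projections: (mean WAR, standard deviation).
prospects = {"safe_prospect": (2.5, 0.3), "raw_prospect": (2.0, 1.5)}
results = evaluate(prospects)
point_pick = max(results, key=lambda p: results[p]["value_at_mean"])
ev_pick = max(results, key=lambda p: results[p]["expected_value"])
print(f"collapse first picks {point_pick}; collapse last picks {ev_pick}")
```

The gap comes from the nonlinearity: for a convex value function, the value of the average outcome understates the average value of the outcomes (Jensen's inequality). The GM still gets a single ranked list; only the point where uncertainty is collapsed has moved.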
Simple baselines still compete.
Tom Tango's Marcel ("Marcel the Monkey") projection system, built on fixed weights and regression to the mean, still holds up against sophisticated projections decades later. Build the dumb baseline first.
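A Marcel-style baseline fits in a few lines. This sketch keeps the two ingredients named above, fixed weights on recent seasons and regression to the league mean, using the 5/4/3 weighting commonly associated with Marcel for hitters. The regression amount (`regression_pa`) is an illustrative assumption rather than Tango's exact specification, and Marcel's age adjustment is omitted.

```python
def marcel_style_rate(seasons, league_rate, weights=(5, 4, 3), regression_pa=1200):
    """Project a rate stat (e.g. on-base percentage) from recent seasons.

    seasons: list of (successes, plate_appearances), most recent first.
    Only as many weights are used as there are seasons.
    """
    weighted_successes = sum(w * s for w, (s, _) in zip(weights, seasons))
    weighted_pa = sum(w * pa for w, (_, pa) in zip(weights, seasons))
    # Regression to the mean: blend in `regression_pa` plate appearances
    # of league-average performance.
    return ((weighted_successes + league_rate * regression_pa)
            / (weighted_pa + regression_pa))


veteran = [(240, 600), (240, 600), (240, 600)]  # .400 OBP three years running
rookie = [(40, 100)]                            # .400 OBP in one short season
print(f"{marcel_style_rate(veteran, 0.320):.3f}")  # stays near .400
print(f"{marcel_style_rate(rookie, 0.320):.3f}")   # pulled hard toward .320
```

There is no fitting step at all, which is the point: any sophisticated projection should be checked against something this simple before it earns its complexity.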
Causal tooling lags behind Bayesian tooling.
Bayesian methods have mature off-the-shelf libraries. Causal inference doesn't. If you want to do counterfactuals on observational data, you are mostly writing it from scratch. Methods at the frontier are always like this. The teams that get there first have an edge.
Build processes to integrate expert information into your models.
Bayesian methods give you two natural places to bring in domain experts: prior elicitation (using their knowledge to inform the model upfront) and validation (sniff-testing the outputs). Fonnesbeck used both at the Yankees. For AI teams, the lesson is to design your modeling process so seasoned domain experts have a way in at both ends, not just at review time.
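A minimal sketch of both entry points, with hypothetical helper names. For prior elicitation, an expert's best guess and plausible range for a rate (here a strikeout rate) become a Beta prior; the range is treated as roughly a 95% interval, about four standard deviations wide, which is a normal approximation and an assumption. For validation, model estimates are flagged whenever they fall outside the range the expert called plausible, so a person reviews them.

```python
def beta_from_expert(best_guess, low, high):
    """Turn an expert's best guess and plausible range into Beta(alpha, beta).

    Assumes [low, high] is roughly a 95% interval, i.e. about four standard
    deviations wide, and matches mean and variance (method of moments).
    """
    sd = (high - low) / 4
    var = sd ** 2
    if not 0 < best_guess < 1 or var >= best_guess * (1 - best_guess):
        raise ValueError("range too wide for a Beta prior with this mean")
    k = best_guess * (1 - best_guess) / var - 1  # alpha + beta
    return best_guess * k, (1 - best_guess) * k


def sniff_test(estimates, low, high):
    """Return the names whose estimates fall outside the expert's range."""
    return [name for name, value in estimates.items() if not low <= value <= high]


# "I'd expect about 25%, and I'd be surprised outside 20-30%."
alpha, beta = beta_from_expert(0.25, 0.20, 0.30)
print(f"prior: Beta({alpha:.1f}, {beta:.1f})")

# Hypothetical model outputs reviewed against the same expert range.
print(sniff_test({"pitcher_a": 0.24, "pitcher_b": 0.37}, 0.20, 0.30))
```

The design choice is to make the expert's input a function call in the pipeline rather than a meeting: the same stated range shapes the prior on the way in and triggers review on the way out.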