Why AutoML failed to live up to the hype

Why AutoML failed to live up to the hype

Hint: It doesn't actually automate ML

Duncan Gilchrist
Jeremy Hermann
September 11, 2024

The mid-2010s were an exciting time for data science. Big data technologies like Hadoop and Spark were being adopted at scale, cloud computing was making unprecedented computational power accessible, and the field of machine learning was advancing rapidly. Just as every forward-thinking company was racing to adopt big data and machine learning, excitement about a bold new promise emerged: AutoML, or Automated Machine Learning.

The idea behind AutoML is conceptually simple: if you know what you’re trying to predict (like fraud or churn), then isn’t building the very best ML model just a giant optimization problem? Can’t the entire machine learning loop be automated?

Google Trends show how popularity in the idea exploded in late 2017:

The vision of AutoML was understandably appealing: automating ML itself would allow companies to leverage the power of machine learning without needing teams of expensive, hard to hire, hard to manage data scientists.

Yet AutoML didn't revolutionize data science as promised; and in fact, there is generally a sense of skepticism about AutoML in the data community. You can see in the Google Trends that the popularity of AutoML today is below its peak in 2019.

During our time at Uber, we used AutoML to help with hyperparameter tuning. We’ve seen real wins — it definitely creates value. But ultimately, we’ve come to think of AutoML as helping with only the final 10% of building models.

Don’t get us wrong, that last 10% is important! And yet, that 10% is far from enough to obviate the need for data science teams. And when we talk to data scientists, a surprising number of them actually still tune their models by hand.

So what, exactly, went wrong?

Where AutoML fell short

1. AutoML doesn't solve the hardest problems

The most glaring issue with AutoML is that it is a misnomer: it focuses on a sliver of the ML workflow and skips the hardest parts of ML altogether.

At Delphina, we think about the ML process in four steps: problem framing, data prep, building models, and deployment. AutoML tools primarily focus on a narrow set of tasks within the third “build” step, mostly around choosing, training, and tuning the model.

Specifically, most AutoML tools expect you to feed in a flat dataframe or CSV of data that includes a column you want to predict, and then they will try lots of different model types and hyperparameter configurations to best predict that outcome.

Again, this is helpful – but data scientists spend the majority of their time on other tasks, largely before model selection and building. For example, Anaconda’s 2022 survey of the state of data science found that data scientists only spend 18% of their time on model selection and training. The lion’s share goes to other work like:

  • Problem framing: The first step to impactful ML projects is making sure you're solving the right problem. If you don’t get this right, it’s easy to waste months of work and fail to deliver value.
  • Finding the right data in a large relational warehouse: Data scientists must painstakingly navigate through hundreds or thousands of data fields, often with confusing naming conventions and unclear ownership.
  • Understanding and cleaning that data: Once the right data is located, it needs to be reshaped and cleaned. This involves deciphering how the data was created and handling various data quality issues.
Photo by Pierre Bamin on Unsplash

AutoML tools focus on the cleanest part of the workflow — the constrained space of optimizing what data they've been given – and leave data scientists to manually grapple with the rest.

2. AutoML is too opaque

A second significant limitation of AutoML tools is their lack of transparency. These systems produce a fitted model as their output, but they usually don't provide visibility into how they arrived at that particular model, nor make it easy to understand why it's good. This "black box" nature might be OK if the AutoML is only picking hyperparameters, but quickly becomes problematic if it’s trying to do a lot more – even simple data cleaning or column transformations.

That’s because data scientists often need to understand the guts of their models – e.g. what patterns the model is picking up on, and how those might play out in different situations. In order to make targeted improvements to models or address specific weaknesses, data scientists need visibility into what’s actually happening under the hood.

3. AutoML isn’t easy enough to use

The solution to the opacity problem is for a scientist to crack open the AutoML tool and play with the inner workings of the system. But this is easier said than done.

This creates a paradox: unless the off-the-shelf AutoML gets the job done, practitioners need to pick up an entirely new set of skills to customize it. And so data scientists proficient with libraries like XGBoost and Pandas find it easier to stick with what’s tried and true.

Where data scientists go from here

The moral of this story: in data science, you can’t “just optimize it”. There’s no magic wand that you can plug into Snowflake to instantly solve all your data problems.

AutoML can do good work with small-scale clean datasets. Give it a 1,000-row CSV file and it will produce something reasonable. But it doesn’t come close to solving the real problems that large enterprises are dealing with.

The result is that data scientists continue to do the hard work by hand — at least until GenAI’s ever-increasing abilities start to fill these gaps. Watch this space!

Related articles

All Blog Posts