
For successful AI projects, celebrate your graveyard and be prepared to fail fast


Image of an origami crane and several crumpled pieces of paper to represent success from failure.
Image Credits: Wachiwit / Getty Images

AI teams invest a lot of rigor in defining new project guidelines. But the same is not true for killing existing projects. In the absence of clear guidelines, teams let infeasible projects drag on for months.

They put on a dog and pony show during project review meetings for fear of becoming the messengers of bad news. By streamlining the process to fail fast on infeasible projects, teams can significantly increase their overall success with AI initiatives.

AI projects are different from traditional software projects. They have many more unknowns: the availability of the right datasets, whether model training can meet the required accuracy threshold, the fairness and robustness of recommendations in production, and more.

In order to fail fast, AI initiatives should be managed as a conversion funnel analogous to marketing and sales funnels. Projects start at the top of the five-stage funnel and can drop off at any stage, either to be temporarily put on ice or permanently suspended and added to the AI graveyard. Each stage of the AI funnel defines a clear set of unknowns to be validated with a list of time-bound success criteria.

The AI project funnel has five stages:

Image Credits: Sandeep Uttamchandani

1. Problem definition: “If we build it, will they come?”

This is the top of the funnel. AI projects require significant investment, not just during initial development but also for ongoing monitoring and refinement. This makes it important to verify that the problem is truly worth solving, weighing the potential business value against the effort to build. Even if the problem is worth solving, AI may not be required. Simpler human-encoded heuristics might solve the problem.

Developing the AI solution is only half the battle. The other half is how the solution will actually be used and integrated. For instance, in developing an AI solution for predicting customer churn, there needs to be a clear understanding of how attrition predictions will be incorporated into the customer support team's workflow. Even a powerful AI project will fail to deliver business value without this level of integration clarity.

To successfully exit this stage, the following statements need to be true:

  • The AI project will produce tangible business value if delivered successfully.
  • There are no cheaper alternatives that can address the problem with the required accuracy threshold.
  • There is a clear path to incorporate the AI recommendations within the existing flow to make an impact.

In my experience, the early stages of a project have a higher ratio of aspiration to ground realities. Killing an ill-formed project early can keep teams from building “solutions in search of problems.”

2. Data availability: “We have the data to build it.”

At this stage of the funnel, we have verified the problem is worth solving. We now need to confirm that the data is available to build the perception, learning and reasoning capabilities required in the AI project. Data needs vary based on the type of AI project: the requirements for a project building classification intelligence will be different from one providing recommendations or ranking.

Data availability broadly translates to having the right quality, quantity and features. The right quality means that the data samples accurately reflect the phenomenon we are trying to model and satisfy properties such as being independent and identically distributed. Common quality checks involve uncovering data collection errors, inconsistent semantics and errors in labeled samples.

The right quantity refers to the amount of data that needs to be available. A common misconception is that a significant amount of data is required for training machine learning models. This is not always true. With transfer learning from pretrained models, it is possible to get started with very little data. Also, more data does not always mean useful data. For instance, historic data spanning 10 years may not be a true reflection of current customer behavior. Finally, the right features need to be available to build the model. This is typically iterative and involves ML model design.

To successfully exit this stage, the following statements need to be true:

  • The datasets for the required features are available.
  • The corresponding datasets meet the quality requirements.
  • There are enough historic data samples available in those datasets.
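As a rough illustration, the three exit criteria above could be turned into an automated check. This is only a sketch under stated assumptions: the function name, the record format (a list of dictionaries), and the thresholds `min_samples` and `max_missing_ratio` are hypothetical, and the share of missing values is a crude stand-in for a fuller quality review.

```python
def check_data_availability(records, required_features,
                            min_samples=1000, max_missing_ratio=0.05):
    """Return findings for the three exit criteria (thresholds are illustrative)."""
    findings = {}

    # Criterion 1: every required feature has a value in at least one record.
    present = set()
    for row in records:
        present.update(key for key, value in row.items() if value is not None)
    findings["missing_features"] = sorted(set(required_features) - present)

    # Criterion 2 (crude quality proxy): share of missing values per feature.
    total = len(records)
    too_sparse = {}
    for feature in required_features:
        missing = sum(1 for row in records if row.get(feature) is None)
        ratio = missing / total if total else 1.0
        if ratio > max_missing_ratio:
            too_sparse[feature] = ratio
    findings["too_sparse"] = too_sparse

    # Criterion 3: enough historic samples.
    findings["enough_samples"] = total >= min_samples

    findings["passed"] = (not findings["missing_features"]
                          and not too_sparse
                          and findings["enough_samples"])
    return findings
```

A failed check does not necessarily kill the project; as noted below, it often means putting the project on ice until the missing datasets are gathered.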

In my experience, projects are often put on ice at this stage. The required features are missing, and it may take several months for the application teams to gather the datasets.

3. Model training: “The project meets the accuracy thresholds.”

At this stage, we have confirmed the data is available and have iterated on ML model features. Now, it’s time to verify whether a model can actually be built to satisfy the required accuracy threshold.

Training is an iterative process in which different combinations of ML algorithms, model configurations, datasets and input features are tried with the goal of meeting the accuracy threshold. Training is resource-intensive, and given large datasets, infrastructure capacity can become the limiting factor. This stage verifies that it is feasible to build the model using the existing infrastructure resources or within a feasible cloud budget.


During the training phase, there is the potential for “false alarms,” when the team achieves accuracy numbers that are too good to be true. Before getting excited, it is important to double-check that the training and validation datasets do not share duplicate samples. Also, initial tests can look promising and then fail to generalize over the entire dataset. Randomizing the dataset before training helps to avoid the roller coaster of accuracy variations.
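Both checks can be sketched in a few lines. The example below is a minimal illustration, assuming each sample is a flat sequence of hashable values; the function names are hypothetical, and exact-match comparison will not catch near-duplicates.

```python
import random

def find_leaked_samples(train_rows, validation_rows):
    """Return validation rows that also appear verbatim in the training set."""
    train_set = {tuple(row) for row in train_rows}
    return [row for row in validation_rows if tuple(row) in train_set]

def shuffled_split(rows, validation_fraction=0.2, seed=0):
    """Deduplicate, shuffle, then split so row ordering does not skew results."""
    # dict.fromkeys keeps the first occurrence of each distinct row.
    unique_rows = list(dict.fromkeys(tuple(row) for row in rows))
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(unique_rows)
    cut = int(len(unique_rows) * (1 - validation_fraction))
    return unique_rows[:cut], unique_rows[cut:]
```

If `find_leaked_samples` returns anything for an existing split, the reported validation accuracy is likely inflated and worth re-measuring on a clean split.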

To successfully exit this stage, the AI project must be able to meet the required accuracy threshold after training.

4. Results fairness: “Generated results are not garbage in, garbage out.”

We have confirmed the project can meet accuracy thresholds. Now, it’s time to verify that the results generated are actually fair with respect to bias, explainability, and compliance with privacy and data rights regulations.

Ensuring the fairness of AI recommendations is a topic of significant research. Most datasets are inherently biased and may not capture all the available attributes. Understanding the original purpose and assumptions of the dataset is important. Another common form of bias is underrepresentation: for instance, a loan underwriting application that was not trained on a certain category of users or range of incomes. It is important to evaluate model performance not just for overall accuracy but also across various data slices.
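Slice-level evaluation can be as simple as grouping predictions by an attribute before computing accuracy. The sketch below is illustrative only: the function name and the idea of slicing by a single key (such as an income band) are assumptions, and it expects non-empty inputs of equal length.

```python
from collections import defaultdict

def accuracy_by_slice(labels, predictions, slice_keys):
    """Return overall accuracy and accuracy per slice (e.g., per income band)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for y, y_hat, key in zip(labels, predictions, slice_keys):
        total[key] += 1
        correct[key] += int(y == y_hat)
    per_slice = {key: correct[key] / total[key] for key in total}
    overall = sum(correct.values()) / sum(total.values())
    return overall, per_slice

labels      = [1, 0, 1, 1, 0, 1, 0, 0]
predictions = [1, 0, 1, 1, 0, 1, 1, 1]
income_band = ["high"] * 6 + ["low"] * 2
overall, per_slice = accuracy_by_slice(labels, predictions, income_band)
# overall is 0.75, yet every prediction in the "low" slice is wrong
```

A respectable overall number can hide a slice where the model fails completely, which is exactly the underrepresentation problem described above.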

It is not sufficient for the AI solution to be accurate; it also needs to be explainable, i.e., able to show how the algorithm arrived at its conclusions. Several regulated industries using automated decision-making tools are required to provide meaningful information about the generated results to their customers. Explainability can be supported in different forms: result visualization, feature correlations, what-if analysis, model cause-effect interpretability, etc.

To successfully exit this stage, the following statements need to be true:

  • Results have the appropriate checks and bounds for bias and are explainable.
  • The data used by the AI project meets user privacy and compliance regulations such as GDPR and CCPA.

5. Operational fitness: “Is it ready for production?”

The last stage is to confirm operational fitness. Not all projects require the same operational rigor. I divide projects into a 2×2 matrix based on whether the training and inference are online versus offline. Offline training and inference are the easiest, while online training requires robust data pipelines and monitoring.

There are three core dimensions of operational fitness: model complexity, data pipeline robustness and retraining governance. Complex models are difficult to maintain and debug in production. The key is striking the right balance between simplicity and accuracy: A simple model may be less accurate, while a complex model may be more accurate but may not generalize to new data samples due to overfitting. Similarly, data pipelines are complex to manage given changing data schemas, quality issues and nonstandard business metrics. Finally, retraining needs to take into account changing accuracy due to shifts in data distribution (data drift) as well as shifts in the semantics of features (concept drift).

To successfully exit this stage, the following statements need to be true:

  • Models have been optimized with the right balance between complexity and accuracy.
  • Data pipelines are robust with the required level of monitoring.
  • The right level of data and concept drift monitoring is implemented for model retraining.
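One common way to monitor data drift for a single numeric feature is the population stability index (PSI), which compares how a reference sample and a current sample fall into the same bins. The sketch below is one possible implementation, not the author's method: the equal-width binning, the `eps` smoothing for empty bins, and the function name are assumptions. PSI measures shifts in input distribution only; concept drift still has to be tracked through model accuracy on fresh labeled data.

```python
import math

def population_stability_index(reference, current, bins=10, eps=1e-6):
    """PSI between a reference and a current sample of one numeric feature."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values are equal

    def bin_shares(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1  # clamp out-of-range values
        # eps keeps empty bins from producing log(0) or division by zero
        return [max(c / len(values), eps) for c in counts]

    ref, cur = bin_shares(reference), bin_shares(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))
```

A frequently quoted rule of thumb treats values above roughly 0.25 as a significant shift, but the right retraining trigger depends on the feature and the project, so any threshold should be validated against observed accuracy.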

To succeed in AI initiatives, teams need to fail fast. The five-stage conversion funnel provides a vocabulary for AI teams to communicate the status of projects to business teams, replacing their black-box perception of these projects with a list of known unknowns. The funnel also helps identify common drop-off stages across projects that are potential areas of improvement. In a fail-fast culture, the AI graveyard is celebrated for the lessons learned that can be applied to future projects.
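Tracking drop-offs does not require special tooling. The sketch below is a hypothetical illustration: the `Project` record and `dropoff_by_stage` helper are invented for this example, and only the five stage names come from the funnel described in this article.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

STAGES = [
    "problem definition",
    "data availability",
    "model training",
    "results fairness",
    "operational fitness",
]

@dataclass
class Project:
    name: str
    dropped_at: Optional[str] = None  # stage where the project stopped, if any
    lesson: str = ""                  # what the graveyard entry should record

def dropoff_by_stage(projects):
    """Count how many projects dropped off at each funnel stage."""
    counts = Counter(p.dropped_at for p in projects if p.dropped_at)
    return {stage: counts.get(stage, 0) for stage in STAGES}

portfolio = [
    Project("churn-predictor"),
    Project("loan-scoring", "data availability", "income features missing"),
    Project("ticket-routing", "data availability", "labels inconsistent"),
    Project("demand-forecast", "model training", "accuracy below threshold"),
]
summary = dropoff_by_stage(portfolio)
```

In this made-up portfolio, `summary` shows two of the three drop-offs at the data availability stage, which would point to data collection as the area to improve first.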
