A newcomer to AI data labeling, Encord looks to ride a rising tidal wave

Image Credits: Encord

Before you can even think about building an algorithm to read an X-ray or interpret a blood smear, the machine has to know what’s what in an image. All of the promise of AI in healthcare, an area that attracted $11.3 billion in private investment in 2021, can’t be realized without carefully labeled data sets that tell machines exactly what they’re looking for.

Creating those labeled data sets is becoming an industry in itself, one that already boasts companies valued well north of unicorn status. Today, Encord, a small startup just out of Y Combinator, is looking to take a piece of the action. Encord has launched a beta version of CordVision, its own AI-assisted labeling program for generating labeled data sets for computer vision projects. The launch follows pilot programs at Stanford Medicine, Memorial Sloan Kettering and King’s College London. The software has also been tested by Kheiron Medical and Viz AI.

Encord has developed a set of tools that let radiologists zoom in on DICOM images, the standard format for storing and transmitting medical images. And instead of having a radiologist sit down and annotate an entire image, the software is designed so that only key portions of the image need to be labeled.

Encord was founded in 2020 by Eric Landau, who has a background in applied physics, and Ulrik Stig Hansen. Hansen was working on a master’s thesis project at Imperial College London centered around visualizing large medical image data sets. It was Hansen who initially noticed how time-consuming it was to curate labeled data sets.

Those labeled data sets matter because they provide the “ground truth” that algorithms learn from. There are ways to build AI that don’t require labeled data sets, but for the most part AI, especially in healthcare, has relied on supervised learning, which does require them.

To create a labeled data set, several doctors will literally go through the images one by one, drawing polygons around relevant features. In other cases, labeling can be done with open source tools or sensors. Either way, the scientific literature suggests this step is a major bottleneck in healthcare AI, especially in radiology, an area where AI was predicted to make major strides but has so far failed to deliver any major paradigm shift.
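
For a sense of what each of those annotations amounts to, here is a hypothetical minimal representation of a polygon label: a class name plus vertices in pixel coordinates, with the enclosed area computed by the shoelace formula. Real labeling tools use their own, richer schemas; the `PolygonLabel` structure below is an assumption for illustration only.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class PolygonLabel:
    """Hypothetical polygon annotation: a class name plus (x, y) pixel vertices."""
    label: str
    vertices: List[Point]

    def area(self) -> float:
        """Enclosed area in square pixels, via the shoelace formula."""
        n = len(self.vertices)
        twice_area = 0.0
        for i in range(n):
            x1, y1 = self.vertices[i]
            x2, y2 = self.vertices[(i + 1) % n]  # wrap around to close the polygon
            twice_area += x1 * y2 - x2 * y1
        return abs(twice_area) / 2

    def bounding_box(self) -> Tuple[float, float, float, float]:
        """(min_x, min_y, max_x, max_y) of the outlined region."""
        xs = [x for x, _ in self.vertices]
        ys = [y for _, y in self.vertices]
        return min(xs), min(ys), max(xs), max(ys)


# A 20 x 15 pixel rectangle drawn around a (made-up) finding.
nodule = PolygonLabel("nodule", [(10, 10), (30, 10), (30, 25), (10, 25)])
print(nodule.area())          # 300.0
print(nodule.bounding_box())  # (10, 10, 30, 25)
```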

“I know there’s a lot of skepticism [of AI in the medical world]. We think the progress is really slow,” Landau told TechCrunch. “We think that transitioning to an approach where you really think about the training data in the first place will help accelerate the progression of these models.”

As the authors of a 2021 paper in Frontiers in Radiology note, labeling a data set of about 100,000 images can take as much as 24 work-years of human effort. A 2021 position statement issued by the European Association of Nuclear Medicine (EANM) and the European Association of Cardiovascular Imaging (EACVI) likewise notes that “obtaining labeled data in medical image analysis can be time-consuming and expensive.” But it also points out that new techniques are emerging that can speed things up.

Image Credits: Encord DICOM labeling platform

Ironically, those new techniques are themselves forms of artificial intelligence. That 2021 Frontiers in Radiology paper, for instance, showed that an active learning approach could make the process 87% faster: returning to the 100,000-image example, labeling would take just 3.2 work-years instead of 24.
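
As a quick sanity check on the figures reported here, the 87% number follows directly from the two work-year estimates: cutting 24 work-years to 3.2 removes about 86.7% of the effort. The helper name below is ours, for illustration only.

```python
def percent_time_saved(baseline_years: float, assisted_years: float) -> float:
    """Percentage of labeling effort saved relative to the manual baseline."""
    return (1 - assisted_years / baseline_years) * 100


# Work-year estimates for the 100,000-image example cited above.
saved = percent_time_saved(baseline_years=24, assisted_years=3.2)
print(f"{saved:.1f}% less labeling time")  # prints "86.7% less labeling time"
```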

CordVision is essentially built on a form of active learning called micro-modeling. Broadly, the technique works by having a team label a small, representative sample of the images. A narrowly focused AI model is trained on those images and then applied to the wider pool, which it labels. Human reviewers then check the model’s work instead of labeling from scratch.
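
A minimal sketch of that workflow, under heavy assumptions: each image is reduced to a small feature vector, a nearest-centroid classifier stands in for the narrowly trained model, and a confidence threshold decides which pre-labels go to human reviewers. The function names, the margin-based confidence and the `review_threshold` value are all hypothetical; CordVision’s actual models are not described in this article.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def train_centroids(seed: List[Tuple[Vector, str]]) -> Dict[str, List[float]]:
    """'Train' a tiny model: average the feature vectors of each hand-labeled class."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for features, label in seed:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(centroids: Dict[str, List[float]], features: Vector) -> Tuple[str, float]:
    """Nearest class plus a margin-based confidence in [0, 1]."""
    ranked = sorted((math.dist(features, c), label) for label, c in centroids.items())
    best_dist, best_label = ranked[0]
    if len(ranked) == 1:
        return best_label, 1.0
    second_dist = ranked[1][0]
    # Close to 1 when the best class is much nearer than the runner-up.
    confidence = 1.0 - best_dist / second_dist if second_dist > 0 else 0.0
    return best_label, confidence


def micro_model_prelabel(seed, unlabeled, review_threshold=0.5):
    """Pre-label the wider pool; route low-confidence items to human review."""
    centroids = train_centroids(seed)
    auto, needs_review = [], []
    for features in unlabeled:
        label, conf = predict(centroids, features)
        (auto if conf >= review_threshold else needs_review).append((features, label))
    return auto, needs_review


# Small hand-labeled seed set, then a wider unlabeled pool.
seed = [((0.0, 0.0), "normal"), ((0.2, 0.1), "normal"),
        ((5.0, 5.0), "lesion"), ((4.8, 5.2), "lesion")]
pool = [(0.1, 0.3), (5.1, 4.9), (2.5, 2.5)]
auto, review = micro_model_prelabel(seed, pool)
print([label for _, label in auto])  # ['normal', 'lesion']
print(review)                        # [((2.5, 2.5), 'normal')]: ambiguous, sent to a human
```

The threshold is the main design lever in this sketch: raising it sends more pre-labels to reviewers and trades speed for safety.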

Landau breaks it down well in a blog post on his Medium page: Imagine building an algorithm to detect Batman in the Batman movies. One micro-model might be trained on five images of Christian Bale’s Batman, another on images of Ben Affleck’s Batman, and so on. Together, these small models make up the bigger algorithm, which you then set loose on the series as a whole.
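
Landau’s analogy maps onto a simple composition pattern: several narrow detectors, each fit on a few examples of one variant, combined so that the overall detector reports whichever of them fire. The sketch below is hypothetical throughout (the class and function names, the 2-D feature vectors and the distance-threshold “model” are our assumptions, not Encord’s code); it only illustrates the structure of the idea.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class MicroModel:
    """Hypothetical detector for a single Batman variant (one actor's suit)."""
    name: str
    centroid: List[float]
    radius: float

    @classmethod
    def fit(cls, name: str, examples: Sequence[Sequence[float]], radius: float) -> "MicroModel":
        # "Training" here just averages a handful of example feature vectors.
        dims = len(examples[0])
        centroid = [sum(e[i] for e in examples) / len(examples) for i in range(dims)]
        return cls(name, centroid, radius)

    def detects(self, features: Sequence[float]) -> bool:
        # Fire when a frame's features fall within `radius` of the learned centroid.
        return math.dist(features, self.centroid) <= self.radius


def detect_batman(models: List[MicroModel], features: Sequence[float]) -> List[str]:
    """Combined detector: names of every micro-model that fires on this frame."""
    return [m.name for m in models if m.detects(features)]


# Five toy "images" per variant, as 2-D feature vectors.
bale = MicroModel.fit("bale", [[1.0, 0.0], [1.1, 0.1], [0.9, -0.1], [1.0, 0.1], [1.0, -0.1]], radius=0.5)
affleck = MicroModel.fit("affleck", [[0.0, 1.0], [0.1, 1.1], [-0.1, 0.9], [0.0, 1.1], [0.0, 0.9]], radius=0.5)
models = [bale, affleck]

print(detect_batman(models, [1.05, 0.0]))  # ['bale']
print(detect_batman(models, [5.0, 5.0]))   # [] (no Batman in this frame)
```

In this sketch, supporting another Batman means fitting one more micro-model and appending it to `models`; the existing ones are left untouched.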

“That’s something that we found works quite well, because you could get away with doing very, very few annotations and bootstrapping the process,” he said.

Encord has published data to back up Landau’s claims. For instance, one study conducted with King’s College London compared CordVision with a labeling program developed by Intel. Five labelers annotated 25,744 endoscopy video frames, and the gastroenterologists who used CordVision worked 6.4 times faster.

The method was also effective when applied to a test set of 15,521 COVID-19 X-rays. Human reviewers checked just 5% of the images, and the AI labeling model reached a final accuracy of 93.7%.
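
For scale, 5% of 15,521 X-rays is about 776 images. The article doesn’t say how that 5% was chosen; the sketch below assumes one common policy, sending the model’s least confident pre-labels to humans. The function names, image IDs and confidence scores are hypothetical.

```python
from typing import List, Tuple


def review_budget(n_images: int, fraction: float) -> int:
    """Number of images humans will check (at least one)."""
    return max(1, round(n_images * fraction))


def select_for_review(predictions: List[Tuple[str, float]], fraction: float) -> List[str]:
    """Pick the least confident `fraction` of (image_id, confidence) pre-labels."""
    budget = review_budget(len(predictions), fraction)
    ranked = sorted(predictions, key=lambda p: p[1])  # lowest confidence first
    return [image_id for image_id, _ in ranked[:budget]]


print(review_budget(15_521, 0.05))  # 776

preds = [("xr-001", 0.98), ("xr-002", 0.51), ("xr-003", 0.87),
         ("xr-004", 0.55), ("xr-005", 0.99)]
print(select_for_review(preds, 0.4))  # ['xr-002', 'xr-004']
```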

That said, Encord is far from the only company that has identified this bottleneck and set out to use AI to smooth out the labeling process. Existing companies in this space already command large valuations: Scale AI reached a $7.3 billion valuation in 2021, and Snorkel has reached unicorn status.

Encord’s biggest competitor, by Landau’s admission, is probably Labelbox. Labelbox boasted about 50 customers when TechCrunch covered it at the Series A stage. In January, the company closed a $110 million Series D, putting it within spitting distance of the $1 billion mark.

Encord is still a very small fish, but it’s caught up in a data labeling tidal wave. Landau says the company is going after organizations that still use open source or internal tools to do their own data labeling.

So far, the company has raised $17.1 million in seed and Series A funding since graduating from Y Combinator, and it has grown from its two founders to a team of 20 people. Encord, Landau says, isn’t burning through cash. The company isn’t seeking new funding right now and believes its current raises will be enough to carry the tool through commercialization.
