Speechmatics raises $62M for its inclusive approach to speech-to-text AI

Image Credits: Speechmatics

Last week I wrote about an AI startup that’s building technology that can alter, in real time, the accent of someone’s speech. But what if the AI goal is instead to let people be understood just as they speak, whatever their accent, and to remove some of the bias inherent in a lot of AI systems in the process? There’s a major need for that, too, and now a U.K. startup called Speechmatics, which has built AI to transcribe speech to text regardless of a speaker’s accent or manner of speaking, is announcing $62 million in funding to expand its business.

Susquehanna Growth Equity out of the U.S. led the round, with U.K. investors AlbionVC and IQ Capital also participating. This Series B is a big step up for Speechmatics. The company was spun out of AI research in Cambridge back in 2006 by founder Dr. Tony Robinson, and before this round had raised only around $10 million (Albion and IQ are among those past backers, along with the CIA-backed In-Q-Tel and others).

In the interim it has built up a base of some 170 customers. It sells only B2B, powering consumer-facing or business-facing services, and while it doesn’t disclose the full list, names include what3words, 3Play Media, Veritone, Deloitte UK and Vonage. These customers use the tech not just for transcription in the traditional sense but also for taking in spoken words to help other parts of an app function, such as automatic captioning, or to power wider accessibility features.

Its engine can today transcribe speech in 34 languages. Beyond using the funding to keep improving accuracy and to build out business development, the company will add more languages and explore different use cases, such as speech-to-text for the trickier environment of motor vehicles (where motor noise and vibrations affect how well AI systems can pick up speech).

“What we have done is gather millions of hours of data in our effort to tackle AI bias. Our goal is to understand any and every voice, in multiple languages,” said Katy Wigdahl, the startup’s CEO (a title she previously co-held with Robinson, who recently stepped back from an executive role).

This manifests in the company’s product focus as well as its mission, and that’s something it’s also looking to expand.

“The way we look at language is global,” Wigdahl said. “Google will have a different pack for every version of English but our one pack will understand every one.” It initially made its tech available only through a private API sold to customers; now, in an effort to attract more users (and potentially more paying ones), it is also offering more open API tools for developers to experiment with, along with a drag-and-drop sampler on its site.

And indeed, if one of Speechmatics’ challenges is training AI to be more human in its understanding of how people speak, the other is carving out a name for itself against other major providers of speech-to-text technology.

Wigdahl said the company today competes against “Big Tech” — that is, major companies like Amazon, Google and Microsoft (which now has Nuance) that have built speech recognition engines and provide the tech as a service to third parties.

But it says it consistently scores better than these in tests of how well it understands languages as they are actually spoken, in all their variety. (One test it cited to me was Stanford’s ‘Racial Disparities in Speech Recognition’ study, where it recorded “an overall accuracy of 82.8% for African American voices compared to Google (68.6%) and Amazon (68.6%).” It said that “equates to a 45% reduction in speech recognition errors — the equivalent of three words in an average sentence.”) It also provided TC with a “competitor weighted average”:

Image Credits: Speechmatics
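The 45% figure can be sanity-checked with a little arithmetic. The sketch below assumes that “accuracy” here simply means 100% minus the error rate; the study’s exact metric definition is our assumption, and `relative_error_reduction` is a hypothetical helper written for illustration. On that reading, Speechmatics’ error rate is 17.2% against 31.4% for the competitors, a relative reduction of roughly 45%.

```python
def relative_error_reduction(ours_accuracy: float, baseline_accuracy: float) -> float:
    """Fractional drop in error rate, given two accuracies in percent.

    Assumption (not from the study): error rate = 100 - accuracy.
    """
    ours_error = 100.0 - ours_accuracy            # 100 - 82.8 = 17.2
    baseline_error = 100.0 - baseline_accuracy    # 100 - 68.6 = 31.4
    # How much smaller our error rate is, relative to the baseline's error rate
    return (baseline_error - ours_error) / baseline_error


reduction = relative_error_reduction(82.8, 68.6)
print(f"{reduction:.1%}")  # prints 45.2%
```

Note that this is a relative reduction: the absolute gap is 14.2 percentage points, which is why the claimed cut in errors sounds larger than the raw accuracy difference.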

There is indeed a massive opportunity here, though. Between smaller developers and outsized technology giants like Apple, Google, Microsoft and Amazon sit hundreds of large companies that may lack the scale (or the interest) to build in-house AI for this purpose. Take a company like Spotify: companies of that kind are definitely interested in the technology, and would definitely prefer not to rely on those huge companies, which are sometimes their competitors and sometimes their outright foils. (To be clear, Wigdahl did not tell me Spotify was a customer, but said that it is a typical example of the kind of size and situation in which a company might knock on Speechmatics’ door.)

That is also part of why investors are so keen to fund this company. Susquehanna has a history of backing companies that look like they might give the power players a run for their money (it was an early and big backer of TikTok).

“The Speechmatics team are undoubtedly a different pedigree of technologists,” said Jonathan Klahr, managing director of Susquehanna Growth Equity, in a statement. “We started tracking Speechmatics when our portfolio companies told us that again and again Speechmatics win on accuracy against all the other options including those coming from ‘Big Tech’ players. We are primed to work with the team to ensure that more companies can get exposed to and adopt this superior technology.” Klahr is joining the board with this round.

Indeed, as technology becomes more woven into everyday life and its makers look for ways to remove any friction from using it, voice has emerged as a major opportunity, as well as a pain point. Technology that can reliably “read” and understand all kinds of voices could be applied in all kinds of ways.

“Our view is voice will become the increasingly dominant human-machine interface and Speechmatics are the category leaders in applying deep learning to speech, with category defining accuracy and understanding across industry use-case and requirements,” added Robert Whitby-Smith, a partner at AlbionVC. “We have witnessed the impressive growth of the team and product over the last few years since our Series A investment in 2019 and as responsible investors we are delighted to support the company’s inclusive mission to understand every voice globally.” 
