Startups

Xayn is privacy-safe, personalized mobile web search powered by on-device AIs

Comment

As TC readers know, the tricky trade-off of the modern web is privacy for convenience. Online tracking is how this ‘great intimacy robbery’ is pulled off. Mass surveillance of what Internet users are looking at underpins Google’s dominant search engine and Facebook’s social empire, to name two of the highest profile ad-funded business models.

TechCrunch’s own corporate overlord, Verizon, also gathers data from a variety of end points — mobile devices, media properties like this one — to power its own ad targeting business.

Countless others rely on obtaining user data to extract some perceived value. Few if any of these businesses are wholly transparent about how much and what sort of private intelligence they’re amassing — or, indeed, exactly what they’re doing with it. But what if the web didn’t have to be like that?

Berlin-based Xayn wants to change this dynamic — starting with personalized but privacy-safe web search on smartphones.

Today it’s launching a search engine app (on Android and iOS) that offers the convenience of personalized results but without the ‘usual’ shoulder surfing. This is possible because the app runs on-device AI models that learn locally. The promise is no data is ever uploaded (though trained AI models themselves can be).

The team behind the app, which is comprised of 30% PhDs, has been working on the core privacy vs convenience problem for some six years (though the company was only founded in 2017); initially as an academic research project — going on to offer an open source framework for masked federated learning, called XayNet. The Xayn app is based on that framework.

They’ve raised some €9.5 million in early stage funding to date — with investment coming from European VC firm Earlybird; Dominik Schiener (Iota co-founder); and the Swedish authentication and payment services company, Thales AB.

Now they’re moving to commercialize their XayNet technology by applying it within a user-facing search app — aiming for what CEO and co-founder, Dr Leif-Nissen Lundbæk bills as a “Zoom”-style business model, in reference to the ubiquitous videoconferencing tool which has both free and paid users.

This means Xayn’s search is not ad-supported. That’s right; you get zero ads in search results.

Instead, the idea is for the consumer app to act as a showcase for a b2b product powered by the same core AI tech. The pitch to business/public sector customers is speedier corporate/internal search without compromising commercial data privacy.

Lundbæk argues businesses are sorely in need of better search tools to (safely) apply to their own data, saying studies have shown that search in general costs around 18% of working time globally. He also cites a study by one city authority that found staff spent 37% of their time at work searching for documents or other digital content.

“It’s a business model that Google has tried but failed to succeed,” he argues, adding: “We are solving not only a problem that normal people have but also that companies have… For them privacy is not a nice to have; it needs to be there otherwise there is no chance of using anything.”

On the consumer side there will also be some premium add-ons headed for the app — so the plan is for it to be a freemium download.

Swipe to nudge the algorithm

One key thing to note is Xayn’s newly launched web search app gives users a say in whether the content they’re seeing is useful to them (or not).

It does this via a Tinder-style swipe right (or left) mechanic that lets users nudge its personalization algorithm in the right direction — starting with a home screen populated with news content (localized by country) but also extending to the search result pages.

The news-focused homescreen is another notable feature. And it sounds like different types of homescreen feeds may be on the premium cards in future.

Another key feature of the app is the ability to toggle personalized search results on or off entirely — just tap the brain icon at the top right to switch the AI off (or back on). Results without the AI running can’t be swiped, except for bookmarking/sharing.

Elsewhere, the app includes a history page which lists searches from the past seven days (by default). The other options offered are: Today, 30 days, or all history (and a bin button to purge searches).

There’s also a ‘Collections’ feature that lets you create and access folders for bookmarks.

As you scroll through search results you can add an item to a Collection by swiping right and selecting the bookmark icon — which then opens a prompt to choose which one to add it to.

The swipe-y interface feels familiar and intuitive, if slightly laggy to load content in the TestFlight beta version TechCrunch checked out ahead of launch.

Swiping left on a piece of content opens a bright pink color-block stamped with a warning ‘x’. Keep going and you’ll send the item vanishing into the ether, presumably seeing fewer like it in future.

Whereas a swipe right affirms a piece of content is useful. This means it stays in the feed, outlined in Xayn green. (Swiping right also reveals the bookmark option and a share button.)

While there are pro-privacy/non-tracking search engines on the market already — such as US-based DuckDuckGo or France’s Qwant — Xayn argues the user experience of such rivals tends to fall short of what you get with a tracking search engine like Google, i.e. in terms of the relevance of search results and thus time spent searching.

Simply put: You probably have to spend more time ‘DDGing’ or ‘Qwanting’ to get the specific answers you need vs Googling — hence the ‘convenience cost’ associated with safeguarding your privacy when web searching.

Xayn’s contention is there’s a third, smarter way of getting to keep your ‘virtual clothes’ on when searching online. This involves implementing AI models that learn on-device and can be combined in a privacy-safe way so that results can be personalized without putting people’s data at risk.

“Privacy is the very fundament… It means that quite like other privacy solutions we track nothing. Nothing is sent to our servers; we don’t store anything of course; we don’t track anything at all. And of course we make sure that any connection that is there is basically secured and doesn’t allow for any tracking at all,” says Lundbæk, explaining the team’s AI-fuelled, decentralized/edge-computing approach.

On-device reranking

Xayn is drawing on a number of search index sources, including (but not solely) Microsoft’s Bing, per Lundbæk, who described this bit of what it’s doing as “relatively similar” to DuckDuckGo (which has its own web crawling bots).

The big difference is that it’s also applying its own reranking algorithms in order generate privacy-safe personalized search results (whereas DDG uses a contextual ads-based business model — looking at simple signals like location and keyword search to target ads without needing to profile users).

The downside to this sort of approach, according to Lundbæk, is users can get flooded with ads — as a consequence of the simpler targeting meaning the business serves more ads to try to increase chances of a click. And loads of ads in search results obviously doesn’t make for a great search experience.

“We get a lot of results on device level and we do some ad hoc indexing — so we build on the device level and on index — and with this ad hoc index we apply our search algorithms in order to filter them, and only present you what is more relevant and filter out everything else,” says Lundbæk, sketching how Xayn works. “Or basically downgrade it a bit… but we also try to keep it fresh and explore and also bump up things where they might not be super relevant for you but it gives you some guarantees that you won’t end up in some kind of bubble.”

Some of what Xayn’s doing is in the arena of federated learning (FL) — a technology Google has been dabbling in in recent years, including pushing a ‘privacy-safe’ proposal for replacing third party tracking cookies. But Xayn argues the tech giant’s interests, as a data business, simply aren’t aligned with cutting off its own access to the user data pipe (even if it were to switch to applying FL to search).

Whereas its interests — as a small, pro-privacy German startup — are markedly different. Ergo, the privacy-preserving technology it’s spent years building has a credible interest in safeguarding people’s data, is the claim.

Google wants to phase out support for third-party cookies in Chrome within two years

“At Google there’s actually [fewer] people working on federate learning than in our team,” notes Lundbæk, adding: “We’ve been criticizing TFF [Google-designed TensorFlow Federated] at lot. It is federated learning but it’s not actually doing any encryption at all — and Google has a lot of backdoors in there.

“You have to understand what does Google actually want to do with that? Google wants to replace [tracking] cookies — but especially they want to replace this kind of bumpy thing of asking for user consent. But of course they still want your data. They don’t want to give you any more privacy here; they want to actually — at the end — get your data even easier. And with purely federated learning you actually don’t have a privacy solution.

“You have to do a lot in order to make it privacy preserving. And pure TFF is certainly not that privacy-preserving. So therefore they will use this kind of tech for all the things that are basically in the way of user experience — which is, for example, cookies but I would be extremely surprised if they used it for search directly. And even if they would do that there is a lot of backdoors in their system so it’s pretty easy to actually acquire the data using TFF. So I would say it’s just a nice workaround for them.”

“Data is basically the fundamental business model of Google,” he adds. “So I’m sure that whatever they do is of course a nice step in the right direction… but I think Google is playing a clever role here of kind of moving a bit but not too much.”

So how, then, does Xayn’s reranking algorithm work?

The app runs four AI models per device, combining encrypted AI models of respective devices asynchronously — with homomorphic encryption — into a collective model. A second step entails this collective model being fed back to individual devices to personalize served content, it says. 

The four AI models running on the device are one for natural language processing; one for grouping interests; one for analyzing domain preferences; and one for computing context.

“The knowledge is kept but the data is basically always staying on your device level,” is how Lundbæk puts it.

“We can simply train a lot of different AI models on your phone and decide whether we, for example, combine some of this knowledge or whether it also stays on your device.”

“We have developed a quite complex solution of four different AI models that work in composition with each other,” he goes on, noting that they work to build up “centers of interest and centers of dislikes” per user — again, based on those swipes — which he says “have to be extremely efficient — they have to be moving, basically, also over time and with your interests”.

The more the user interacts with Xayn, the more precise its personalization engine gets as a result of on-device learning — plus the added layer of users being able to get actively involved by swiping to give like/dislike feedback.

The level of personalization is very individually focused — Lundbæk calls it “hyper personalization” — more so than a tracking search engine like Google, which he notes also compares cross-user patterns to determine which results to serve — something he says Xayn absolutely does not do.

Small data, not big data

“We have to focus entirely on one user so we have a ‘small data’ problem, rather than a big data problem,” says Lundbæk. “So we have to learn extremely fast — only from eight to 20 interactions we have to already understand a lot from you. And the crucial thing is of course if you do such a rapid learning then you have to take even more care about filter bubbles — or what is called filter bubbles. We have to prevent the engine going into some kind of biased direction.”

To avoid this echo chamber/filter bubble type effect, the Xayn team has designed the engine to function in two distinct phases which it switches between: Called ‘exploration’ and (more unfortunately) ‘exploitation’ (i.e. just in the sense that it already knows something about the user so can be pretty certain what it serves will be relevant).

“We have to keep fresh and we have to keep exploring things,” he notes — saying that’s why it developed one of the four AIs (a dynamic contextual multi-armed bandit reinforcement learning algorithm for computing context).

Aside from this app infrastructure being designed natively to protect user privacy, Xayn argues there are a bunch of other advantages — such as being able to derive potentially very clear interests signs from individuals; and avoiding the chilling effect that can result from tracking services creeping users out (to the point people they avoid making certain searches in order to prevent them from influencing future results).

“You as the user can decide whether you want the algorithm to learn — whether you want it to show more of this or less of this — by just simply swiping. So it’s extremely easy, so you can train your system very easily,” he argues.

There is potentially a slight downside to this approach, too, though — assuming the algorithm (when on) does some learning by default (i.e in the absence of any life/dislike signals from the user).

This is because it puts the burden on the user to interact (by swiping their feedback) in order to get the best search results out of Xayn. So that’s an active requirement on users, rather than the typical passive background data mining and profiling web users are used to from tech giants like Google (which is, however, horrible for their privacy).

It means there’s an ‘ongoing’ interaction cost to using the app — or at least getting the most relevant results out of it. You might not, for instance, be advised to let a bunch of organic results just scroll past if they’re really not useful but rather actively signal disinterest on each.

For the app to be the most useful it may ultimately pay to carefully weight each item and provide the AI with a utility verdict. (And in a competitive battle for online convenience every little bit of digital friction isn’t going to help.)

Asked about this specifically, Lundbæk told us: “Without swiping the AI only learns from very weak likes but not from dislikes. So the learning takes place (if you turn the AI on) but it’s very slight and does not have a big effect. These conditions are quite dynamic, so from the experience of liking something after having visited a website, patterns are learned. Also, only 1 of the 4 AI models (the domain learning one) learns from pure clicks; the others don’t.”

Xayn does seem alive to the risk of the swiping mechanic resulting in the app feeling arduous. Lundbæk says the team is looking to add “some kind of gamification aspect” in the future — to flip the mechanism from pure friction to “something fun to do”. Though it remains to be seen what they come up with on that front.

There is also inevitably a bit of lag involved in using Xayn vs Google — by merit of the former having to run on-device AI training (whereas Google merely hoovers your data into its cloud where it’s able to process it at super-speeds using dedicated compute hardware, including bespoke chipsets).

“We have been working for over a year on this and the core focus point was bringing it on the street, showing that it works — and of course it is slower than Google,” Lundbæk concedes.

“Google doesn’t need to do any of these [on-device] processes and Google has developed even its own hardware; they developed TPUs exactly for processing this kind of model,” he goes on. “If you compare this kind of hardware it’s pretty impressive that we were even able to bring [Xayn’s on-device AI processing] even on the phone. However of course it’s slower than Google.”

Lundbæk says the team is working on increasing the speed of Xayn. And anticipates further gains as it focuses more on that type of optimization — trailing a version that’s 40x faster than the current iteration.

“It won’t at the end be 40x faster because we will use this also to analyze even more content — to give you can even broader view — but it will be faster over time,” he adds.

On the accuracy of search results vs Google, he argues the latter’s ‘network effect’ competitive advantage — whereby its search reranking benefits from Google having more users — is not unassailable because of what edge AI can achieve working smartly atop ‘small data’.

Though, again, for now Google remains the search standard to beat.

“Right now we compare ourselves, mostly against Bing and DuckDuckGo and so on. Obviously there we get much better results [than compared to Google] but of course Google is the market leader and is using quite some heavy personalization,” he says, when we ask about benchmarking results vs other search engines.

“But the interesting thing is so far Google is not only using personalization but they also use kind of a network effect. PageRank is very much a network effect where the most users they have the better the results get, because they track how often people click on something and bump this also up.

“The interesting effect there is that right now, through AI technology — like for example what we use — the network effect becomes less and less important. So actually I would say that there isn’t really any network effect anymore if you really want to compete with pure AI technology. So therefore we can get almost as relevant results as Google right now and we surely can also, over time, get even better results or competing results. But we are different.”

In our (brief) tests of the beta app Xayn’s search results didn’t obviously disappoint for simple searches (and would presumably improve with use). Though, again, the slight load lag adds a modicum of friction which was instantly obvious compared to the usual search competition.

Not a deal breaker — just a reminder that performance expectations in search are no cake walk (even if you can promise a cookie-free experience).

Europe’s top court says active consent is needed for tracking cookies

An opportunity for competition?

“So far Google has so far had the advantage of a network effect — but this network effect gets less and less dominant and you see already more and more alternatives to Google popping up,” Lundbæk argues, suggesting privacy concerns are creating an opportunity for increased competition in the search space.

“It’s not anymore like Facebook or so where there’s one network where everyone has to be. And I think this is actually a nice situation because competition is always good for technical innovations and for also satisfying different customer needs.”

Of course the biggest challenge for any would-be competitor to Google search — which carves itself a marketshare in Europe in excess of 90% — is how to poach (some of) its users.

Lundbæk says the startup has no plans to splash millions on marketing at this point. Indeed, he says they want to grow usage sustainably, with the aim of evolving the product “step by step” with a “tight community” of early adopters — relying on cross-promotion from others in the pro-privacy tech space, as well as reaching out to relevant influencers.

He also reckons there’s enough mainstream media interest in the privacy topic to generate some uplift.

“I think we have such a relevant topic — especially now,” he says. “Because we want to show also not only for ourselves that you can do this for search but we think we show a real nice example that you can do this for any kind of case.

“You don’t always need the so-called ‘best’ big players from the US which are of course getting all of your data, building up profiles. And then you have these small, cute privacy-preserving solutions which don’t use any of this but then offer a bad user experience. So we want to show that this shouldn’t be the status quo anymore — and you should start to build alternatives that are really build on European values.”

And it’s certainly true EU lawmakers are big on tech sovereignty talk these days, even though European consumers mostly continue to embrace big (US) tech.

Perhaps more pertinently, regional data protection requirements are making it increasing challenging to rely on US-based services for processing data. Compliance with the GDPR data protection framework is another factor businesses need to consider. All of which is driving attention onto ‘privacy-preserving’ technologies.

Europe puts out advice on fixing international data transfers that’s cold comfort for Facebook

 

Xayn’s team is hoping to be able spread its privacy-preserving gospel to general users by growing the b2b side of the business, according to Lundbæk — so it’s hoping some home use will follow once employees get used to convenient private search via their workplaces, in a small-scale reverse of the business consumerization trend that was powered by modern smartphones (and people bringing their own device to work).

“We these kind of strategies I think we can step by step build up in our communities and spread the word — so we think we don’t even need to really spend millions of euros in marketing campaigns to get more and more users,” he adds.

While Xayn’s initial go-to-market push has been focused on getting the mobile apps out, a desktop version is also planned for Q1 next year.

The challenge there is getting the app to work as a browser extension as the team obviously doesn’t want to build its own browser to house Xayn. tl;dr: Competing with Google search is mountain enough to climb, without trying to go after Chrome (and Firefox, and so on).

“We developed our entire AI in Rust which is a safe language. We are very much driven by security here and safety. The nice thing is it can work everywhere — from embedded systems towards mobile systems, and we can compile into web assembly so it runs also as a browser extension in any kind of browser,” he adds. “Except for Internet Explorer of course.”

More TechCrunch

Government officials and AI industry executives agreed on Tuesday to apply elementary safety measures in the fast-moving field and establish an international safety research network. Nearly six months after the…

In Seoul summit, heads of states and companies commit to AI safety

Copilot, Microsoft’s brand of generative AI, will soon be far more deeply integrated into the Windows 11 experience.

Microsoft wants to make Windows an AI operating system, launches Copilot+ PCs

Some startups choose to bootstrap from the beginning while others find themselves forced into self funding by a lack of investor interest or a business model that doesn’t fit traditional…

VCs wanted FarmboxRx to become a meal kit, the company bootstrapped instead

Uber and Lyft drivers in Minnesota will see higher pay thanks to a deal between the state and the country’s two largest ride-hailing companies. The upshot: a new law that…

Uber’s and Lyft’s ride-hailing deal with Minnesota comes at a cost

Andreessen Horowitz’s American Dynamism fund has established a new fellowship program aimed at introducing top engineers and technologists to venture investing, a move that could help the firm identify less…

a16z’s American Dynamism team launches program to introduce technical minds to VC

Another fintech startup, and its customers, has been gravely impacted by the implosion of banking-as-a-service startup Synapse. Copper Banking, a digital banking service aimed at teens, notified its customers on…

Teen fintech Copper had to abruptly discontinue its banking, debit products

Autodesk — the 3D tools behemoth — has acquired Wonder Dynamics, a startup that lets creators quickly and easily make complex characters and visual effects using AI-powered image analysis. The…

Autodesk acquires AI-powered VFX startup Wonder Dynamics

Farcaster, a blockchain-based social protocol founded by two Coinbase alumni, announced on Tuesday that it closed a $150 million fundraise. Led by Paradigm, the platform also raised money from a16z…

Farcaster, a crypto-based social network, raised $150M with just 80K daily users

Microsoft announced on Tuesday during its annual Build conference that it’s bringing “Windows Volumetric Apps” to Meta Quest headsets. The partnership will allow Microsoft to bring Windows 365 and local…

Microsoft’s new ‘Volumetric Apps’ for Quest headsets extend Windows apps into the 3D space

The spam reached Bluesky by first crossing over two other decentralized networks: Mastodon and Nostr.

The ‘vote Trump’ spam that hit Bluesky in May came from decentralized rival Nostr

Welcome to TechCrunch Fintech! This week, we’re looking at the continued fallout from Synapse’s bankruptcy, how Layer wants to disrupt SMB accounting, and much more! To get a roundup of…

There’s a real appetite for a fintech alternative to QuickBooks

The company is hoping to produce electricity at $13 per megawatt hour, which would be more than 50% cheaper than traditional onshore wind.

Bill Gates-backed wind startup AirLoom is raising $12M, filings reveal

Generative AI makes stuff up. It can be biased. Sometimes it spits out toxic text. So can it be “safe”? Rick Caccia, the CEO of WitnessAI, believes it can. “Securing…

WitnessAI is building guardrails for generative AI models

It’s not often that you hear about a seed round above $10 million. H, a startup based in Paris and previously known as Holistic AI, has announced a $220 million…

French AI startup H raises $220M seed round

Hey there, Series A to B startups with $35 million or less in funding — we’ve got an exciting opportunity that’s tailor-made for your growth journey! If you’re looking to…

Boost your startup’s growth with a ScaleUp package at TC Disrupt 2024

TikTok is pulling out all the stops to prevent its impending ban in the United States. Aside from initiating legal action against the U.S. government, that means shaping up its…

As a US ban looms, TikTok announces a $1M program for socially driven creators

Microsoft wants to put its Copilot everywhere. It’s only a matter of time before Microsoft renames its annual Build developer conference to Microsoft Copilot. Hopefully, some of those upcoming events…

Microsoft’s Power Automate no-code platform adds AI flows

Build is Microsoft’s largest developer conference and of course, it’s all about AI this year. So it’s no surprise that GitHub’s Copilot, GitHub’s “AI pair programming tool,” is taking center…

GitHub Copilot gets extensions

Microsoft wants to make its brand of generative AI more useful for teams — specifically teams across corporations and large enterprise organizations. This morning at its annual Build dev conference,…

Microsoft intros a Copilot for teams

Microsoft’s big focus at this year’s Build conference is generative AI. And to that end, the tech giant announced a series of updates to its platforms for building generative AI-powered…

Microsoft upgrades its AI app-building platforms

The U.K.’s data protection watchdog has closed an almost year-long investigation of Snap’s AI chatbot, My AI — saying it’s satisfied the social media firm has addressed concerns about risks…

UK data protection watchdog ends privacy probe of Snap’s GenAI chatbot, but warns industry

U.S. cell carrier Patriot Mobile experienced a data breach that included subscribers’ personal information, including full names, email addresses, home ZIP codes and account PINs, TechCrunch has learned. Patriot Mobile,…

Conservative cell carrier Patriot Mobile hit by data breach

It’s been three years since Spotify acquired live audio startup Betty Labs, and yet the music streaming service isn’t leveraging the technology to its fullest potential — at least not…

Spotify’s ‘Listening Party’ feature falls short of expectations

Alchemist Accelerator has a new pile of AI-forward companies demoing their wares today, if you care to watch, and the program itself is making some international moves into Tokyo and…

Alchemist’s latest batch puts AI to work as accelerator expands to Tokyo, Doha

“Late Pledge” allows campaign creators to continue collecting money even after the campaign has closed.

Kickstarter now lets you pledge after a campaign closes

Stack AI’s co-founders, Antoni Rosinol and Bernardo Aceituno, were PhD students at MIT wrapping up their degrees in 2022 just as large language models were becoming more mainstream. ChatGPT would…

Stack AI wants to make it easier to build AI-fueled workflows

Pinecone, the vector database startup founded by Edo Liberty, the former head of Amazon’s AI Labs, has long been at the forefront of helping businesses augment large language models (LLMs)…

Pinecone launches its serverless vector database out of preview

Young geothermal energy wells can be like budding prodigies, each brimming with potential to outshine their peers. But like people, most decline with age. In California, for example, the amount…

Special mud helps XGS Energy get more power out of geothermal wells

Featured Article

Sonos finally made some headphones

The market play is clear from the outset: The $449 headphones are firmly targeted at an audience that would otherwise be purchasing the Bose QC Ultra or Apple AirPods Max.

11 hours ago
Sonos finally made some headphones

Adobe says the feature is up to the task, regardless of how complex of a background the object is set against.

Adobe brings Firefly AI-powered Generative Remove to Lightroom