
This startup is setting a DALL-E 2-like AI free, consequences be damned


Image Credits: Bryce Durbin / TechCrunch

DALL-E 2, OpenAI’s powerful text-to-image AI system, can create photos in the style of cartoonists, 19th-century daguerreotypists, stop-motion animators and more. But it has an important, artificial limitation: a filter that prevents it from creating images depicting public figures and content deemed too toxic.

Now an open source alternative to DALL-E 2 is on the cusp of being released, and it’ll have few — if any — such content filters.

London- and Los Altos-based startup Stability AI this week announced the release of a DALL-E 2-like system, Stable Diffusion, to just over a thousand researchers ahead of a public launch in the coming weeks. A collaboration between Stability AI, media creation company RunwayML, Heidelberg University researchers and the research groups EleutherAI and LAION, Stable Diffusion is designed to run on most high-end consumer hardware, generating 512×512-pixel images in just a few seconds given any text prompt.

Stable Diffusion sample outputs. Image Credits: Stability AI

“Stable Diffusion will allow both researchers and soon the public to run this under a range of conditions, democratizing image generation,” Stability AI CEO and founder Emad Mostaque wrote in a blog post. “We look forward to the open ecosystem that will emerge around this and further models to truly explore the boundaries of latent space.”

But Stable Diffusion’s lack of safeguards compared to systems like DALL-E 2 poses tricky ethical questions for the AI community. Even if the results aren’t perfectly convincing yet, making fake images of public figures opens a large can of worms. And making the raw components of the system freely available leaves the door open to bad actors who could train them on subjectively inappropriate content, like pornography and graphic violence.

Creating Stable Diffusion

Stable Diffusion is the brainchild of Mostaque. After graduating from Oxford with a master’s in mathematics and computer science, Mostaque served as an analyst at various hedge funds before shifting gears to more public-facing work. In 2019, he co-founded Symmitree, a project that aimed to reduce the cost of smartphones and internet access for people living in impoverished communities. And in 2020, Mostaque was the chief architect of Collective & Augmented Intelligence Against COVID-19, an alliance that used software to help policymakers make decisions in the face of the pandemic.

He co-founded Stability AI in 2020, motivated both by a personal fascination with AI and what he characterized as a lack of “organization” within the open source AI community.

An image of former president Barack Obama created by Stable Diffusion. Image Credits: Stability AI

“Nobody has any voting rights except our 75 employees — no billionaires, big funds, governments or anyone else with control of the company or the communities we support. We’re completely independent,” Mostaque told TechCrunch in an email. “We plan to use our compute to accelerate open source, foundational AI.”

Mostaque says that Stability AI funded the creation of LAION 5B, an open source, 250-terabyte dataset containing 5.6 billion images scraped from the internet. (“LAION” stands for Large-scale Artificial Intelligence Open Network, a nonprofit organization with the goal of making AI, datasets and code available to the public.) The company also worked with the LAION group to create a subset of LAION 5B called LAION-Aesthetics, which contains 2 billion AI-filtered images ranked as particularly “beautiful” by testers of Stable Diffusion.

The initial version of Stable Diffusion was based on LAION-400M, the predecessor to LAION 5B, which was known to contain depictions of sex, slurs and harmful stereotypes. LAION-Aesthetics attempts to correct for this, but it’s too early to tell to what extent it’s successful.

A collage of images created by Stable Diffusion. Image Credits: Stability AI

In any case, Stable Diffusion builds on research incubated at OpenAI as well as Runway and Google Brain, one of Google’s AI R&D divisions. The system was trained on text-image pairs from LAION-Aesthetics to learn the associations between written concepts and images, like how the word “bird” can refer not only to bluebirds but also to parakeets and bald eagles, as well as to more abstract notions.

At runtime, Stable Diffusion, like DALL-E 2, frames image generation as a process of “diffusion.” It starts with pure noise and refines an image over time, making it incrementally closer to a given text description until there’s no noise left at all.
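
For readers curious what “starting with pure noise and refining” looks like mechanically, here is a minimal toy sketch in Python using NumPy. It follows a standard DDPM-style reverse loop with a linear noise schedule, under loud assumptions: the noise predictor is a hypothetical “oracle” that already knows the target image, whereas in Stable Diffusion that role is played by a trained neural network conditioned on the text prompt, and the real system denoises a compressed latent representation rather than raw pixels. The function name `toy_reverse_diffusion` and its parameters are illustrative, not part of any Stability AI code.

```python
import numpy as np


def toy_reverse_diffusion(target, num_steps=1000, seed=0):
    """Toy DDPM-style reverse (denoising) loop on a small array.

    Assumption: the noise predictor is a hypothetical "oracle" that knows the
    clean target. A real model would learn this prediction from data and be
    conditioned on a text prompt; this only shows the loop's structure.
    """
    rng = np.random.default_rng(seed)
    target = np.asarray(target, dtype=float)

    # Linear noise schedule: beta_t is the variance of noise added at step t.
    betas = np.linspace(1e-4, 0.02, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)

    # Start from pure Gaussian noise with the same shape as the "image".
    x = rng.standard_normal(target.shape)

    for t in reversed(range(num_steps)):
        # Oracle noise estimate, inverting the forward process
        # x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps.
        eps_hat = (x - np.sqrt(alpha_bars[t]) * target) / np.sqrt(1.0 - alpha_bars[t])

        # DDPM mean update: subtract the predicted noise and rescale.
        x = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])

        # Inject fresh noise on every step except the last, so the final
        # output contains no leftover noise.
        if t > 0:
            x = x + np.sqrt(betas[t]) * rng.standard_normal(target.shape)

    return x
```

Calling `toy_reverse_diffusion(np.array([[0.0, 1.0], [1.0, 0.0]]))` returns an array that matches the target up to floating-point error. Because the oracle is exact, the interesting part is the shape of the loop: each step removes a little predicted noise and adds back a smaller amount, which is the incremental refinement described above.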

Boris Johnson wielding various weapons, generated by Stable Diffusion. Image Credits: Stability AI

Stability AI used a cluster of 4,000 Nvidia A100 GPUs running in AWS to train Stable Diffusion over the course of a month. CompVis, the machine vision and learning research group at Ludwig Maximilian University of Munich, oversaw the training, while Stability AI donated the compute power.

Stable Diffusion can run on graphics cards with around 5GB of VRAM. That’s roughly the capacity of mid-range cards like Nvidia’s GTX 1660, priced around $230. Work is underway on bringing compatibility to AMD’s MI200 data center cards and even MacBooks with Apple’s M1 chip (although in the latter case, without GPU acceleration, image generation will take as long as a few minutes).

“We have optimized the model, compressing the knowledge of over 100 terabytes of images,” Mostaque said. “Variants of this model will be on smaller datasets, particularly as reinforcement learning with human feedback and other techniques are used to take these general digital brains and make them even smaller and focused.”

Samples from Stable Diffusion. Image Credits: Stability AI

For the past few weeks, Stability AI has allowed a limited number of users to query the Stable Diffusion model through its Discord server, slowly increasing the maximum number of queries to stress-test the system. Stability AI says that more than 15,000 testers have used Stable Diffusion to create 2 million images a day.

Far-reaching implications

Stability AI plans to take a dual approach in making Stable Diffusion more widely available. It’ll host the model in the cloud behind tunable filters for specific content, allowing people to continue using it to generate images without having to run the system themselves. In addition, the startup will release what it calls “benchmark” models under a permissive license that can be used for any purpose — commercial or otherwise — as well as compute to train the models.

That will make Stability AI the first to release an image generation model nearly as high-fidelity as DALL-E 2. While other AI-powered image generators have been available for some time, including Midjourney, NightCafe and Pixelz.ai, none have open sourced their frameworks. Others, like Google and Meta, have chosen to keep their technologies under tight wraps, allowing only select users to pilot them for narrow use cases.

Stability AI will make money by training “private” models for customers and acting as a general infrastructure layer, Mostaque said — presumably with a sensitive treatment of intellectual property. The company claims to have other commercializable projects in the works, including AI models for generating audio, music and even video.

Sand sculptures of Harry Potter and Hogwarts, generated by Stable Diffusion. Image Credits: Stability AI

“We will provide more details of our sustainable business model soon with our official launch, but it is basically the commercial open source software playbook: services and scale infrastructure,” Mostaque said. “We think AI will go the way of servers and databases, with open beating proprietary systems — particularly given the passion of our communities.”

With the hosted version of Stable Diffusion, the one available through Stability AI’s Discord server, Stability AI doesn’t permit every kind of image generation. The startup’s terms of service ban some lewd or sexual material (although not scantily clad figures), hateful or violent imagery (such as antisemitic iconography, racist caricatures, misogynistic and misandrist propaganda), prompts containing copyrighted or trademarked material, and personal information like phone numbers and Social Security numbers. But while Stability AI has implemented a keyword filter in the server similar to OpenAI’s, which prevents the model from even attempting to generate an image that might violate the usage policy, it appears to be more permissive than most.

(A previous version of this article implied that Stability AI wasn’t using a keyword filter. That’s not the case; TechCrunch regrets the error.)

A Stable Diffusion generation, given the prompt: “very sexy woman with black hair, pale skin, in bikini, wet hair, sitting on the beach.” Image Credits: Stability AI

Stability AI also doesn’t have a policy against images with public figures. That presumably makes deepfakes fair game (and Renaissance-style paintings of famous rappers), though the model struggles with faces at times, introducing odd artifacts that a skilled Photoshop artist rarely would.

“Our benchmark models that we release are based on general web crawls and are designed to represent the collective imagery of humanity compressed into files a few gigabytes big,” Mostaque said. “Aside from illegal content, there is minimal filtering, and it is on the user to use it as they will.”

An image of Hitler generated by Stable Diffusion. Image Credits: Stability AI

Potentially more problematic are the soon-to-be-released tools for creating custom and fine-tuned Stable Diffusion models. An “AI furry porn generator” profiled by Vice offers a preview of what might come; an art student going by the name of CuteBlack trained an image generator to churn out illustrations of anthropomorphic animal genitalia by scraping artwork from furry fandom sites. The possibilities don’t stop at pornography. In theory, a malicious actor could fine-tune Stable Diffusion on images of riots and gore, for instance, or propaganda.

Already, testers in Stability AI’s Discord server are using Stable Diffusion to generate a range of content disallowed by other image generation services, including images of the war in Ukraine, nude women, an imagined Chinese invasion of Taiwan and controversial depictions of religious figures like the Prophet Muhammad. Doubtless, some of these images are against Stability AI’s own terms, but the company is currently relying on the community to flag violations. Many bear the telltale signs of an algorithmic creation, like disproportionate limbs and an incongruous mix of art styles. But others are passable at first glance. And the tech will continue to improve, presumably.

Nude women generated by Stable Diffusion. Image Credits: Stability AI

Mostaque acknowledged that the tools could be used by bad actors to create “really nasty stuff,” and CompVis says that the public release of the benchmark Stable Diffusion model will “incorporate ethical considerations.” But Mostaque argues that making the tools freely available allows the community to develop countermeasures.

“We hope to be the catalyst to coordinate global open source AI, both independent and academic, to build vital infrastructure, models and tools to maximize our collective potential,” Mostaque said. “This is amazing technology that can transform humanity for the better and should be open infrastructure for all.”

A generation from Stable Diffusion, given the prompt “9/11 2.0 September 11th 2022 terrorist attack.”

Not everyone agrees, as evidenced by the controversy over “GPT-4chan,” an AI model trained on one of 4chan’s infamously toxic discussion boards. AI researcher Yannic Kilcher made GPT-4chan — which learned to output racist, antisemitic and misogynist hate speech — available earlier this year on Hugging Face, a hub for sharing trained AI models. Following discussions on social media and Hugging Face’s comment section, the Hugging Face team first “gated” access to the model before removing it altogether, but not before it was downloaded more than a thousand times.

“War in Ukraine” images generated by Stable Diffusion. Image Credits: Stability AI

Meta’s recent chatbot fiasco illustrates the challenge of keeping even ostensibly safe models from going off the rails. Just days after making its most advanced AI chatbot to date, BlenderBot 3, available on the web, Meta was forced to confront media reports that the bot made frequent antisemitic comments and repeated false claims about former U.S. President Donald Trump winning reelection two years ago.

The publisher of AI Dungeon, Latitude, encountered a similar content problem. Some players of the text-based adventure game, which is powered by OpenAI’s text-generating GPT-3 system, observed that it would sometimes bring up extreme sexual themes, including pedophilia. That was the result of fine-tuning on fiction stories with gratuitous sex. Facing pressure from OpenAI, Latitude implemented a filter and started automatically banning gamers for purposefully prompting content that wasn’t allowed.

BlenderBot 3’s toxicity came from biases in the public websites that were used to train it. It’s a well-known problem in AI: even when fed filtered training data, models tend to amplify biases in that data, like photo sets that portray men as executives and women as assistants. With DALL-E 2, OpenAI has attempted to combat this by implementing techniques, including dataset filtering, that help the model generate more “diverse” images. But some users claim that these techniques have made the model less accurate than before at creating images based on certain prompts.

Stable Diffusion contains little in the way of mitigations besides training dataset filtering. So what’s to prevent someone from generating, say, photorealistic images of protests, pornographic pictures of underage actors, “evidence” of fake moon landings and general misinformation? Nothing really. But Mostaque says that’s the point.

Given the prompt “protests against the dilma government, brazil [sic],” Stable Diffusion created this image. Image Credits: Stability AI

“A percentage of people are simply unpleasant and weird, but that’s humanity,” Mostaque said. “Indeed, it is our belief this technology will be prevalent, and the paternalistic and somewhat condescending attitude of many AI aficionados is misguided in not trusting society … We are taking significant safety measures including formulating cutting-edge tools to help mitigate potential harms across release and our own services. With hundreds of thousands developing on this model, we are confident the net benefit will be immensely positive and as billions use this tech harms will be negated.”

Note: While the images in this article are credited to Stability AI, the company’s terms make it clear that generated images belong to the users who prompted them. In other words, Stability AI doesn’t assert rights over images created by Stable Diffusion.
