
Stable Diffusion 3 Is Here: Is It That Impressive?

4 min read · Mar 7, 2024

The excitement surrounding AI’s most significant week has yet to diminish. Shortly after OpenAI unveiled Sora, a model capable of creating stunning videos, and Google introduced Gemini 1.5 with its context window of up to 1 million tokens, Stability AI has offered a sneak peek at Stable Diffusion 3.

Introducing Stable Diffusion 3

Stable Diffusion 3, the newest and most advanced text-to-image model from Stability AI, shows marked improvements in three areas: handling prompts with multiple subjects, overall image quality, and rendering text inside images.

The Stable Diffusion 3 family ranges from 800 million to 8 billion parameters and combines a diffusion transformer architecture (similar to the one behind Sora) with flow matching.
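To give a feel for what flow matching trains on, here is a minimal sketch of one common formulation, a straight-line path between a noise sample and a data sample. The function name `flow_matching_pair`, the convention that t = 0 is noise and t = 1 is data, and the toy 4-element vectors are all assumptions made for illustration; nothing here is taken from Stability AI's code, and the exact variant used in Stable Diffusion 3 may differ.

```python
import numpy as np


def flow_matching_pair(noise: np.ndarray, data: np.ndarray, t: float):
    """Build one training pair for a straight-line flow matching path.

    Assumed convention (illustrative): t = 0 is pure noise, t = 1 is data.
    Returns the interpolated sample x_t and the velocity a model would be
    trained to predict at that point.
    """
    x_t = (1.0 - t) * noise + t * data  # point on the line between noise and data
    velocity = data - noise             # constant direction along that line
    return x_t, velocity


# Toy vectors standing in for a latent image and a Gaussian noise sample.
rng = np.random.default_rng(0)
data = rng.normal(size=(4,))
noise = rng.normal(size=(4,))

x_t, v = flow_matching_pair(noise, data, t=0.25)

# Following the velocity for the remaining time (1 - t) lands on the data sample.
recovered = x_t + (1.0 - 0.25) * v
```

In training, a network would see `x_t` and `t` and be asked to predict `v`; at sampling time, the learned velocity is integrated from noise toward an image.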

Diffusion Transformer Architecture

The Diffusion Transformer (DiT) architecture brings transformers into diffusion models. Where standard diffusion models typically use a convolutional U-Net backbone, DiTs use a transformer that operates on patches of the latent image.
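The idea of a transformer working on latent image patches can be sketched in a few lines of NumPy. This is only an illustration of the patch-splitting step: the helper name `patchify`, the 4-channel 32×32 latent, and the patch size of 2 are assumptions chosen for the example, not values confirmed for Stable Diffusion 3.

```python
import numpy as np


def patchify(latent: np.ndarray, patch_size: int) -> np.ndarray:
    """Split a (C, H, W) latent into a sequence of flattened patches.

    Returns an array of shape (num_patches, patch_size * patch_size * C),
    i.e. one row ("token") per patch, in row-major patch order.
    """
    c, h, w = latent.shape
    if h % patch_size or w % patch_size:
        raise ValueError("height and width must be divisible by patch_size")
    gh, gw = h // patch_size, w // patch_size
    # (C, H, W) -> (C, gh, p, gw, p): split each spatial axis into grid and offset
    x = latent.reshape(c, gh, patch_size, gw, patch_size)
    # -> (gh, gw, p, p, C): group all values belonging to the same patch
    x = x.transpose(1, 3, 2, 4, 0)
    # -> (gh * gw, p * p * C): one flat vector per patch
    return x.reshape(gh * gw, patch_size * patch_size * c)


# Hypothetical latent: 4 channels, 32 x 32 spatial resolution.
latent = np.arange(4 * 32 * 32, dtype=np.float32).reshape(4, 32, 32)
tokens = patchify(latent, patch_size=2)
print(tokens.shape)  # (256, 16)
```

Each row of `tokens` would then be linearly projected and fed to the transformer as one sequence element, much as words are in a language model.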

Written by Daniel García
