Stability AI, the company behind the AI image generator Stable Diffusion, has previewed an entirely new text-to-image model called Stable Cascade.
Announced on Monday, the new model is built on the Würstchen architecture, which “combines competitive performance with unprecedented cost-effectiveness for large-scale text-to-image diffusion models.”
Stable Diffusion is well known as a base for bespoke AI image generators thanks to its open-source approach: users can download the software and run it offline on their own computers. The company says that “Stable Cascade is exceptionally easy to train and finetune on consumer hardware.”
“Today we are launching Stable Cascade in research preview,” Stability AI writes.
“This innovative text-to-image model introduces an interesting three-stage approach, setting new benchmarks for quality, flexibility, fine-tuning, and efficiency with a focus on further eliminating hardware barriers.
“Additionally, we are releasing training and inference code that can be found on the Stability GitHub page to allow further customization of the model and its outputs. The model is available for inference in the diffusers library.”
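For readers who want to try the diffusers route, the following is a minimal sketch modeled on the usage published for the Hugging Face checkpoints. It assumes a recent diffusers release that includes the Stable Cascade pipelines, the `accelerate` package, a CUDA GPU, and the Hub repositories `stabilityai/stable-cascade-prior` and `stabilityai/stable-cascade`. The prompt, step counts, guidance values and output path are illustrative choices rather than recommended settings, and the `generate_image` wrapper is a hypothetical helper.

```python
def generate_image(prompt: str, out_path: str = "cascade.png") -> None:
    """Run the Stage C prior, then the Stage B/A decoder, via diffusers."""
    # Heavy imports live inside the function so the module imports cheaply.
    import torch
    from diffusers import StableCascadeDecoderPipeline, StableCascadePriorPipeline

    # The prior pipeline wraps Stage C (text -> compact image embeddings).
    prior = StableCascadePriorPipeline.from_pretrained(
        "stabilityai/stable-cascade-prior", torch_dtype=torch.bfloat16
    )
    # The decoder pipeline wraps Stages B and A (embeddings -> pixels).
    decoder = StableCascadeDecoderPipeline.from_pretrained(
        "stabilityai/stable-cascade", torch_dtype=torch.bfloat16
    )
    # Move idle sub-models to the CPU to lower peak GPU memory use.
    prior.enable_model_cpu_offload()
    decoder.enable_model_cpu_offload()

    prior_output = prior(
        prompt=prompt,
        height=1024,
        width=1024,
        guidance_scale=4.0,
        num_inference_steps=20,
    )
    image = decoder(
        image_embeddings=prior_output.image_embeddings,
        prompt=prompt,
        guidance_scale=0.0,
        num_inference_steps=10,
        output_type="pil",
    ).images[0]
    image.save(out_path)
```

Calling `generate_image("an astronaut riding a horse, watercolor")` downloads the weights on first use and writes a PNG to the working directory. Splitting the work into two pipelines mirrors the article's point: the prior (Stage C) and the decoders (Stages B and A) are separate models.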
While Stable Diffusion uses a single model, Stable Cascade is a pipeline of three smaller models, called Stages A, B, and C. VentureBeat notes that this modular architecture allows for greater efficiency and customization.
Stage C transforms the text prompt into a tiny 24×24 latent representation, which is then passed to Stages A and B, latent decoders that turn it into a high-resolution image.
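To make the data flow concrete, here is a minimal, shape-only Python sketch of the three-stage pipeline. It follows Stability AI's description of the architecture in assuming that Stage C's 24×24 latent corresponds to a 1024×1024 output and that Stage B decodes before Stage A. The channel counts, the 256×256 intermediate latent, and every name in the code (`FakeTensor`, `stage_c`, `generate`, and so on) are illustrative assumptions, not the released implementation. No neural networks are involved: each stub just returns the shape its stage would produce.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeTensor:
    """Hypothetical stand-in for a tensor that records only (channels, height, width)."""
    shape: tuple[int, int, int]


# Assumed shapes for a 1024x1024 output (illustrative, not read from the release).
IMAGE_SIZE = 1024
STAGE_C_SHAPE = (16, 24, 24)               # compact text-conditioned latent
STAGE_B_SHAPE = (4, 256, 256)              # larger latent handed to Stage A
IMAGE_SHAPE = (3, IMAGE_SIZE, IMAGE_SIZE)  # RGB pixels


def stage_c(prompt: str) -> FakeTensor:
    """Text-conditional generation: prompt -> tiny 24x24 latent."""
    # A real model would condition on the prompt; this stub only returns the shape.
    return FakeTensor(STAGE_C_SHAPE)


def stage_b(latent: FakeTensor) -> FakeTensor:
    """First decoder: expand the compact latent into a larger latent."""
    if latent.shape != STAGE_C_SHAPE:
        raise ValueError(f"unexpected Stage C latent shape {latent.shape}")
    return FakeTensor(STAGE_B_SHAPE)


def stage_a(latent: FakeTensor) -> FakeTensor:
    """Second decoder: map the larger latent to pixel space."""
    if latent.shape != STAGE_B_SHAPE:
        raise ValueError(f"unexpected Stage B latent shape {latent.shape}")
    return FakeTensor(IMAGE_SHAPE)


def generate(prompt: str) -> FakeTensor:
    """Run the stages in order C -> B -> A."""
    return stage_a(stage_b(stage_c(prompt)))


def spatial_compression(latent_shape: tuple[int, int, int], image_size: int = IMAGE_SIZE) -> float:
    """Ratio of image side length to latent side length."""
    return image_size / latent_shape[1]
```

Here `generate("a red fox").shape` is `(3, 1024, 1024)`, and `spatial_compression(STAGE_C_SHAPE)` is about 42.7 (1024 / 24). Stable Diffusion's VAE, by comparison, downsamples by a factor of 8. Working on so few latent positions is one plausible reason Stage C is cheaper to train, although this sketch does not model cost.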
“By decoupling the text-conditional generation (Stage C) from the decoding to the high-resolution pixel space (Stage A & B), we can allow additional training or finetunes, including ControlNets and LoRAs to be completed singularly on Stage C,” Stability AI writes.
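The decoupling described above can be illustrated with a small, hypothetical Python sketch. During fine-tuning, only Stage C is marked trainable and receives an adapter such as a LoRA, while the Stage A and B decoders stay frozen. The `Stage` class and `prepare_stage_c_finetune` function are invented for illustration. In a real PyTorch training script, the freezing would typically be done by calling `requires_grad_(False)` on the frozen modules.

```python
from dataclasses import dataclass, field


@dataclass
class Stage:
    """Hypothetical stand-in for one model in the pipeline."""
    name: str
    trainable: bool = True
    adapters: list[str] = field(default_factory=list)


def prepare_stage_c_finetune(stages: dict[str, Stage], adapter: str) -> list[str]:
    """Freeze the decoders (A, B) and attach an adapter to Stage C only.

    Returns the names of the stages that would receive gradient updates.
    """
    for name, stage in stages.items():
        # Only Stage C stays trainable; the decoders are left untouched.
        stage.trainable = name == "C"
    stages["C"].adapters.append(adapter)
    return [name for name, stage in stages.items() if stage.trainable]
```

For example, with `pipeline = {n: Stage(n) for n in ("A", "B", "C")}`, calling `prepare_stage_c_finetune(pipeline, "lora:my-style")` returns `["C"]`. Because the decoders never change, the same Stages A and B can be reused across many Stage C fine-tunes.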
According to the company, this approach delivers a 16x cost reduction compared with training a similar-sized Stable Diffusion model.
An unofficial demo of Stable Cascade has been released. The model is currently under a “non-commercial license that permits non-commercial use only.”
Stability AI continues to release new iterations of its text-to-image models. Last July it released Stable Diffusion XL 1.0, at the time its most advanced model, and in November it released SDXL Turbo, which lets users generate AI images in real time.