Stable Diffusion 3 has finally been open-sourced: the 2B-parameter Stable Diffusion 3 Medium model is now available for download on HuggingFace.
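At 2B parameters, Stable Diffusion 3 Medium is moderately sized. It runs well on consumer desktops and laptops and also on enterprise-grade GPUs. Stable Diffusion 3 uses a Multimodal Diffusion Transformer (MMDiT) architecture, and its text encoders are OpenCLIP-ViT/G, CLIP-ViT/L and T5-xxl.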
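The following shape-only numpy sketch shows how the outputs of three text encoders can be merged into a single conditioning input. It follows how the diffusers SD3 pipeline builds its conditioning, to my understanding. The per-token outputs of CLIP-ViT/L (width 768) and OpenCLIP-ViT/G (width 1280) are concatenated along the channel axis and zero-padded to T5-XXL's width of 4096. The result is then concatenated with the T5 tokens along the sequence axis. The two pooled CLIP vectors form a separate 2048-dim embedding. Random arrays stand in for real encoder outputs. The 77-token CLIP length and 256-token T5 length are assumptions (what I believe the diffusers defaults are), not figures from this article.

```python
import numpy as np

# Assumed dimensions (hypothetical defaults, not taken from the article)
CLIP_TOKENS = 77   # CLIP tokenizers pad/truncate to 77 tokens
T5_TOKENS = 256    # assumed T5 max sequence length
CLIP_L_DIM = 768   # CLIP-ViT/L hidden width
CLIP_G_DIM = 1280  # OpenCLIP-ViT/G hidden width
T5_DIM = 4096      # T5-XXL hidden width

rng = np.random.default_rng(0)

def build_conditioning(batch: int = 1):
    """Combine three (random stand-in) encoder outputs into SD3-style conditioning."""
    clip_l = rng.standard_normal((batch, CLIP_TOKENS, CLIP_L_DIM))
    clip_g = rng.standard_normal((batch, CLIP_TOKENS, CLIP_G_DIM))
    t5 = rng.standard_normal((batch, T5_TOKENS, T5_DIM))
    pooled_l = rng.standard_normal((batch, CLIP_L_DIM))
    pooled_g = rng.standard_normal((batch, CLIP_G_DIM))

    # 1. Concatenate the two CLIP token sequences along channels: 768 + 1280 = 2048
    clip = np.concatenate([clip_l, clip_g], axis=-1)
    # 2. Zero-pad the channel axis up to the T5 width (2048 -> 4096)
    clip = np.pad(clip, ((0, 0), (0, 0), (0, T5_DIM - clip.shape[-1])))
    # 3. Concatenate CLIP and T5 tokens along the sequence axis: 77 + 256 tokens
    tokens = np.concatenate([clip, t5], axis=1)
    # 4. The pooled CLIP vectors form a separate global embedding: 768 + 1280 = 2048
    pooled = np.concatenate([pooled_l, pooled_g], axis=-1)
    return tokens, pooled

tokens, pooled = build_conditioning()
print(tokens.shape, pooled.shape)  # (1, 333, 4096) (1, 2048)
```

The token sequence is what the transformer attends over. The pooled vector acts as global conditioning together with the timestep.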
Related article by 小小将: "Text-to-Image with SD3: Toward the Transformer Era"
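The central idea of MMDiT, as I understand it from the SD3 paper, is this. Text tokens and image tokens each keep their own weights, but both attend through one shared attention operation over the concatenated sequence, so information flows in both directions. The numpy sketch below shows only that joint-attention step, with a single head and toy sizes. The real blocks also use multiple heads, adaLN modulation driven by the timestep and pooled text embedding, and per-stream MLPs. All of these are omitted here, and every name and dimension in the sketch is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 64  # toy hidden width (hypothetical; real SD3 blocks are much wider)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def make_qkv(d):
    """Separate Q/K/V projection weights for one modality."""
    return [rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3)]

# Each modality has its OWN projection weights ...
img_w = make_qkv(D)
txt_w = make_qkv(D)

def joint_attention(img, txt):
    """Single-head joint attention: per-modality projections, one shared attention op."""
    q_i, k_i, v_i = (img @ w for w in img_w)
    q_t, k_t, v_t = (txt @ w for w in txt_w)
    # ... but attention runs over the concatenated sequence, so every image token
    # can attend to every text token and vice versa.
    q = np.concatenate([q_i, q_t], axis=0)
    k = np.concatenate([k_i, k_t], axis=0)
    v = np.concatenate([v_i, v_t], axis=0)
    attn = softmax(q @ k.T / np.sqrt(D))
    out = attn @ v
    # Split back into the two streams, which would continue through their own MLPs.
    n_img = img.shape[0]
    return out[:n_img], out[n_img:], attn

img_tokens = rng.standard_normal((16, D))  # e.g. a 4x4 grid of latent patches
txt_tokens = rng.standard_normal((8, D))   # e.g. 8 text tokens
img_out, txt_out, attn = joint_attention(img_tokens, txt_tokens)
print(img_out.shape, txt_out.shape, attn.shape)  # (16, 64) (8, 64) (24, 24)
```

A cross-attention UNet lets image features read from the text but not the other way around. Here, the text representation is also updated by the image at every block.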
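StabilityAI also disclosed the rough composition of the SD3 Medium training data. The model was first pretrained on about 1 billion image-text pairs and then fine-tuned on high-quality data. That data comprised 30M high-quality aesthetic images focused on specific visual content and styles, plus 3M preference-data images.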
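SD3 Medium's strengths are as follows:
- Photorealism: it avoids the artifacts commonly seen in hands and faces and delivers high-quality images without complex workflows.
- Prompt adherence: it understands complex prompts involving spatial relationships, compositional elements, actions and styles.
- Typography: thanks to the Diffusion Transformer architecture, it achieves unprecedented results in generating text free of artifacts and spelling errors.
- Resource efficiency: its low VRAM footprint makes it well suited to standard consumer GPUs without sacrificing performance.
- Fine-tuning: it can absorb fine details from small datasets, which makes it well suited to customization.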
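StabilityAI has also partnered with NVIDIA to accelerate all Stable Diffusion models, including Stable Diffusion 3 Medium, with TensorRT. The TensorRT-optimized versions are 50% faster. A TensorRT-optimized build of Stable Diffusion 3 Medium has also been released: stabilityai/stable-diffusion-3-medium-tensorrt · Hugging Face. On an A100, generating a 1024x1024 image with 50 steps takes only about 5.6 s. With TensorRT int8, inference can be a further 1.2x to 1.4x faster while halving VRAM usage.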
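The following back-of-the-envelope arithmetic puts these figures in perspective. It only rearranges the numbers quoted above. The per-step time assumes the 50 denoising steps dominate the 5.6 s total. Reading the int8 memory saving as the ratio of 2 bytes per fp16 weight to 1 byte per int8 weight is my own interpretation. The helper names are hypothetical.

```python
# Figures quoted above: TensorRT SD3 Medium, A100, 1024x1024, 50 steps
BASELINE_S = 5.6
STEPS = 50

def per_step_seconds(total_s: float, steps: int) -> float:
    """Average time per denoising step, assuming the steps dominate total latency."""
    return total_s / steps

def int8_latency_range(baseline_s: float, low: float = 1.2, high: float = 1.4):
    """Latency after an extra speedup of low-x to high-x (fastest first)."""
    return baseline_s / high, baseline_s / low

def weight_bytes(num_params: float, bytes_per_param: int) -> float:
    """Raw weight storage in bytes."""
    return num_params * bytes_per_param

fast, slow = int8_latency_range(BASELINE_S)
print(f"per step: {per_step_seconds(BASELINE_S, STEPS):.3f} s")  # per step: 0.112 s
print(f"int8 total: {fast:.2f} s to {slow:.2f} s")                 # int8 total: 4.00 s to 4.67 s
# fp16 stores 2 bytes per weight and int8 stores 1, so weight memory halves
print(weight_bytes(2e9, 2) / weight_bytes(2e9, 1))                 # 2.0
```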
AMD has also optimized SD3 Medium inference for a range of AMD devices, including its latest APUs, consumer GPUs and MI-300X enterprise GPUs.
Besides the model itself, StabilityAI also released ComfyUI workflows for SD3.
In my own testing, 12GB of VRAM is enough to run it in ComfyUI.
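A rough weight-memory estimate helps explain why 12GB can be enough. It counts weights only and ignores activations, the VAE and framework overhead. Only the 2B MMDiT figure comes from this article. The text-encoder parameter counts below are approximate public estimates and should be treated as assumptions.

```python
GIB = 1024 ** 3

# Approximate parameter counts. Only the 2B MMDiT figure comes from the article;
# the text-encoder sizes are rough estimates (assumptions).
PARAMS = {
    "mmdit": 2.0e9,
    "clip_l_text": 0.123e9,   # assumption
    "clip_g_text": 0.695e9,   # assumption
    "t5_xxl_encoder": 4.7e9,  # assumption
}

def weights_gib(bytes_per_param: dict) -> float:
    """Total weight memory in GiB for the given per-component precision (bytes/param)."""
    return sum(PARAMS[name] * bytes_per_param[name] for name in PARAMS) / GIB

all_fp16 = {name: 2 for name in PARAMS}          # every component in fp16
t5_8bit = {**all_fp16, "t5_xxl_encoder": 1}      # T5 stored at 1 byte per weight

print(f"MMDiT alone (fp16): {PARAMS['mmdit'] * 2 / GIB:.2f} GiB")
print(f"everything in fp16: {weights_gib(all_fp16):.2f} GiB")
print(f"T5 in 8-bit:        {weights_gib(t5_8bit):.2f} GiB")
```

Under these assumptions, keeping everything in fp16 comes to roughly 14 GiB of weights. Storing T5 at 8 bits brings that to roughly 9.6 GiB. A 12GB result therefore suggests that not all weights sit in VRAM at fp16 at the same time, for example because of offloading or a lower-precision T5. I have not verified which mechanism ComfyUI uses.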
A few quick test prompts:
A closeup shot of a beautiful teenage girl in a white dress wearing small silver earrings in the garden, under the soft morning light
The portrait is decent.
A realistic standup pouch product photo mockup decorated with bananas, raisins and apples with the words “ORGANIC SNACKS” featured prominently
The text renders correctly.
Wide angle shot of Český Krumlov Castle with the castle in the foreground and the town sprawling out in the background, highly detailed, natural lighting
The architecture here is only mediocre.
A magazine quality shot of a delicious salmon steak, with rosemary and tomatoes, and a cozy atmosphere
Middling; it just doesn't look real enough.
A pixar style illustration of a happy hedgehog, standing beside a wooden signboard saying “SUNFLOWERS”, in a meadow surrounded by blooming sunflowers
Decent.
A very simple, clean and minimalistic kid’s coloring book page of a young boy riding a bicycle, with thick lines, and small a house in the background
Good.
an old apothecary. On the counter there are three old potions: a blue potion with the handwritten label “Mana” a green potion with the label “Health”, a red potion with the label “Poison”
This one is very hard, but it got it right.
photo of three people, a wizard holding the sign with the text “Magic”, a witch with the sign “Hex” and another alchemist with the sign with the text “potions”. Indoor scene, colored signs
This one falls a little short.
A horse riding an astronaut.
It still can't make the horse ride the astronaut.
A stack of 3 plates. A blue plate is on the top, sitting on a blue plate. The blue plate is in the middle, sitting on a green plate. The green plate is on the bottom.
This one is roughly correct.
A wine glass on top of a dog.
Spatial relationships are handled fairly reliably.
Elon Musk is swimming
The celebrity is lost: the result is not recognizably Elon Musk.
In summary, SD3 is genuinely capable, but its image quality still lags a bit behind Midjourney.
SD3 is also already supported in diffusers: Diffusers welcomes Stable Diffusion 3.
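```python
import torch
from diffusers import StableDiffusion3Pipeline

# The model repo is gated: accept the license on Hugging Face and log in
# (e.g. `huggingface-cli login`) before the first download.
pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",
    torch_dtype=torch.float16,  # half precision to reduce VRAM use
)
pipe = pipe.to("cuda")

image = pipe(
    "A cat holding a sign that says hello world",
    negative_prompt="",
    num_inference_steps=28,  # number of denoising steps
    guidance_scale=7.0,      # classifier-free guidance strength
).images[0]
image.save("sd3_hello_world.png")  # in a notebook, evaluating `image` displays it instead
```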
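The sketch below makes `num_inference_steps` and `guidance_scale` more concrete. SD3 is trained as a rectified-flow model, and the diffusers pipeline samples it with a flow-matching Euler scheduler, applying classifier-free guidance at every step. The numpy code is a toy version of that loop, not the diffusers implementation. The "model" is a hypothetical function that returns the exact velocity toward a fixed target. The sigma schedule is plainly linear, whereas the real scheduler also shifts the timesteps.

```python
import numpy as np

def guided_velocity(v_uncond, v_cond, guidance_scale):
    """Classifier-free guidance: extrapolate from the unconditional toward the conditional prediction."""
    return v_uncond + guidance_scale * (v_cond - v_uncond)

def toy_velocity(x, sigma, target):
    """Hypothetical stand-in for the MMDiT: the exact velocity for data point `target`
    on the path x_sigma = (1 - sigma) * target + sigma * noise."""
    return (x - target) / sigma

def sample(noise, target_uncond, target_cond, num_inference_steps=28, guidance_scale=7.0):
    # Assumption: linear sigmas from 1 (pure noise) to 0 (clean), no timestep shift
    sigmas = np.linspace(1.0, 0.0, num_inference_steps + 1)
    x = noise
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        v_u = toy_velocity(x, sigma, target_uncond)  # "empty prompt" branch
        v_c = toy_velocity(x, sigma, target_cond)    # "with prompt" branch
        v = guided_velocity(v_u, v_c, guidance_scale)
        # Euler step; sigma decreases, so (sigma_next - sigma) is negative
        x = x + (sigma_next - sigma) * v
    return x

rng = np.random.default_rng(0)
noise = rng.standard_normal(4)
out = sample(noise, np.zeros(4), np.ones(4), guidance_scale=7.0)
print(out)  # all entries are approximately 7.0
```

Because the toy velocity is exact, the loop lands on `uncond + guidance_scale * (cond - uncond)`. A scale of 1.0 reproduces the conditional target, and larger values push the result further past it in the direction the prompt points.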
You can also try it online on HuggingFace: Stable Diffusion 3 Medium - a Hugging Face Space by stabilityai.