I2VGen-XL is a diffusion model that can generate higher-resolution videos than SVD, and it accepts a text prompt in addition to an input image. The model is trained with two hierarchical encoders (a detail encoder and a global encoder) to better capture the low-level and high-level details of the image. These learned details are then used to train a video diffusion model that refines the resolution and detail of the generated video. To use I2VGen-XL, load the [I2VGenXLPipeline] and pass both a text prompt and an image to generate a video.
```py
import os
os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"

import torch
from diffusers import I2VGenXLPipeline
from diffusers.utils import export_to_gif, load_image

pipeline = I2VGenXLPipeline.from_pretrained(
    "ali-vilab/i2vgen-xl", torch_dtype=torch.float16, variant="fp16"
)
pipeline.enable_model_cpu_offload()

image_url = "https://hf-mirror.com/datasets/diffusers/docs-images/resolve/main/i2vgen_xl_images/img_0009.png"
image = load_image(image_url).convert("RGB")

prompt = "Papers were floating in the air on a table in the library"
negative_prompt = "Distorted, discontinuous, Ugly, blurry, low resolution, motionless, static, disfigured, disconnected limbs, Ugly faces, incomplete arms"
generator = torch.manual_seed(8888)

frames = pipeline(
    prompt=prompt,
    image=image,
    num_inference_steps=30,
    negative_prompt=negative_prompt,
    guidance_scale=9.0,
    generator=generator,
).frames[0]
export_to_gif(frames, "i2v.gif")
```
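The `guidance_scale=9.0` argument above controls classifier-free guidance: at each denoising step the pipeline combines an unconditional and a prompt-conditioned noise prediction, and larger scales push the result further toward the prompt. A minimal sketch of that combination, using plain floats as hypothetical stand-ins for the model's noise predictions (not the pipeline's actual internals):

```py
def cfg_combine(noise_uncond: float, noise_cond: float, guidance_scale: float) -> float:
    """Classifier-free guidance combination.

    guidance_scale = 1.0 reproduces the conditional prediction unchanged;
    larger values amplify the direction pointing from the unconditional
    toward the conditional prediction.
    """
    return noise_uncond + guidance_scale * (noise_cond - noise_uncond)

# With guidance_scale=9.0, the prompt-conditioned direction is amplified 9x
# relative to the unconditional baseline.
print(cfg_combine(0.25, 0.5, 9.0))  # 0.25 + 9.0 * 0.25 = 2.5
```

This is why very large guidance scales tend to oversaturate or distort results: the combined prediction moves far outside the range of either individual prediction.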
The original image:

The generated animation: