Text-to-Video up to 20 s: Sharing the 书生·筑梦 Vchitect 2.0 Model

Latest recommended article published on 2025-05-18 18:00:23

杰说新技术

Article link: https://blog.csdn.net/m0_71062934/article/details/143837935

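Vchitect 2.0, also known as 书生·筑梦 2.0, is a new-generation large video generation model released by Shanghai AI Laboratory.

Vchitect 2.0 combines text-to-video, image-to-video, frame interpolation with super-resolution, and a training system in one package. It can generate videos from 5 to 20 seconds long at resolutions up to 720x480.

Vchitect 2.0 also supports several aspect ratios, including landscape, portrait, 4:3, 9:16 and 16:9, which greatly broadens the scenarios it can be used in.

In terms of architecture, Vchitect 2.0 is a diffusion Transformer. Its Transformer blocks use a parallel structure to process the spatial and temporal information of a video, combining self-attention, cross-attention and temporal attention.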
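To make the idea of "parallel" attention branches more concrete, here is a minimal PyTorch sketch. It is not the Vchitect 2.0 implementation: the class name `ParallelSTBlock`, the tensor layout and the choice to simply sum the three branches are all assumptions made for illustration. It only shows how spatial self-attention, text cross-attention and temporal attention can read the same input side by side instead of being stacked one after another.

```python
import torch
from torch import nn


class ParallelSTBlock(nn.Module):
    """Toy block (hypothetical): three attention branches run in parallel."""

    def __init__(self, dim: int, heads: int) -> None:
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.spatial = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.temporal = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x: torch.Tensor, text: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, tokens, dim); text: (batch, text_len, dim)
        b, t, n, c = x.shape
        h = self.norm(x)

        # Spatial self-attention: tokens attend to tokens of the same frame.
        s = h.reshape(b * t, n, c)
        spatial_out, _ = self.spatial(s, s, s)

        # Cross-attention: every frame's tokens attend to the text tokens.
        txt = text.repeat_interleave(t, dim=0)  # (batch * frames, text_len, dim)
        cross_out, _ = self.cross(s, txt, txt)

        # Temporal attention: one token position attends across all frames.
        tm = h.permute(0, 2, 1, 3).reshape(b * n, t, c)
        temporal_out, _ = self.temporal(tm, tm, tm)
        temporal_out = temporal_out.reshape(b, n, t, c).permute(0, 2, 1, 3)

        # All branches read the same normalized input; their outputs are summed
        # and added back to x as a residual (an assumption of this sketch).
        fused = (
            spatial_out.reshape(b, t, n, c)
            + cross_out.reshape(b, t, n, c)
            + temporal_out
        )
        return x + fused
```

Calling `ParallelSTBlock(dim=16, heads=4)` on a random tensor of shape `(2, 3, 5, 16)` with text features of shape `(2, 7, 16)` returns a tensor with the same shape as the video input, which is what lets such blocks be stacked.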

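In addition, Vchitect 2.0 open-sources its training and inference framework, LiteGen. LiteGen provides optimizations aimed specifically at diffusion workloads, including Activation Offload and Sequence Parallel, which reduce GPU memory use and make it possible to train with longer sequences.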
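Sequence Parallel, in general terms, splits one long token sequence across several GPUs so that each device holds only a slice of the activations. The sketch below shows just the index arithmetic behind such a split. The function name `shard_bounds` and the sizes in the usage loop are made up for illustration; this is not LiteGen's API, and how LiteGen actually partitions sequences is not described in this article.

```python
def shard_bounds(seq_len: int, world_size: int, rank: int) -> tuple[int, int]:
    """Return the [start, end) token range held by `rank` (hypothetical helper)."""
    if world_size <= 0 or not 0 <= rank < world_size:
        raise ValueError("rank must satisfy 0 <= rank < world_size")
    base, extra = divmod(seq_len, world_size)
    # The first `extra` ranks each take one additional token,
    # so shard sizes differ by at most one.
    start = rank * base + min(rank, extra)
    end = start + base + (1 if rank < extra else 0)
    return start, end


# Hypothetical sizes: 40 frames of 48 x 27 latent tokens, split over 8 GPUs.
seq_len = 40 * 48 * 27
for rank in range(8):
    print(rank, shard_bounds(seq_len, world_size=8, rank=rank))
```

Because the shards are contiguous and cover the whole sequence exactly once, each rank's activation memory scales with roughly `seq_len / world_size` rather than `seq_len`.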

GitHub project: https://github.com/Vchitect/Vchitect-2.0

I. Environment setup

1. Python environment

Python 3.10 or later is recommended.
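A quick way to confirm that the interpreter you are about to use meets this recommendation is a small check like the one below. The helper name `python_is_supported` is made up for this example; only the 3.10 threshold comes from the article.

```python
import sys

MIN_PYTHON = (3, 10)  # minimum version recommended in this article


def python_is_supported(version=None, minimum=MIN_PYTHON) -> bool:
    """True when `version` (default: the running interpreter) is >= `minimum`."""
    if version is None:
        version = sys.version_info
    # Compare only (major, minor); patch releases do not matter here.
    return tuple(version[:2]) >= tuple(minimum)


if not python_is_supported():
    print(
        f"Python {sys.version_info.major}.{sys.version_info.minor} is older than "
        f"{MIN_PYTHON[0]}.{MIN_PYTHON[1]}; consider creating a newer environment."
    )
```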

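2. Installing the pip packages

The second command below reads requirements.txt, so run it from the root of the cloned Vchitect-2.0 repository.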

pip install torch==2.4.0+cu118 torchvision==0.19.0+cu118 torchaudio==2.4.0 --extra-index-url https://download.pytorch.org/whl/cu118

pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple

3. Downloading the Vchitect-XL-2B model:

git lfs install

git clone https://huggingface.co/Vchitect/Vchitect-XL-2B
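If Git LFS was not active when the repository was cloned, the large weight files end up as small text "pointer" stubs rather than the real data. The sketch below scans a directory for such stubs by looking for the standard Git LFS pointer header. The function name `find_lfs_pointers` is made up for this example, and the directory name in the usage note assumes the default folder created by the `git clone` command above.

```python
from pathlib import Path

# Every Git LFS pointer file starts with this line.
LFS_HEADER = b"version https://git-lfs.github.com/spec/v1"


def find_lfs_pointers(model_dir: str) -> list[str]:
    """List files under `model_dir` that are still Git LFS pointer stubs."""
    root = Path(model_dir)
    pointers = []
    for path in sorted(root.rglob("*")):
        # Skip directories and Git's own metadata folder.
        if not path.is_file() or ".git" in path.relative_to(root).parts:
            continue
        with path.open("rb") as fh:
            if fh.read(len(LFS_HEADER)) == LFS_HEADER:
                pointers.append(str(path))
    return pointers
```

For example, `find_lfs_pointers("Vchitect-XL-2B")` returning a non-empty list would suggest the weights were not fetched; running `git lfs pull` inside the cloned folder normally downloads them.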

II. Functional test

1. Running the test:

(1) Calling the model from Python code

import torch
import random
import numpy as np
import os
import argparse
from models.pipeline import VchitectXLPipeline
from utils import save_as_mp4

def set_random_seed(seed):
    random.seed(seed)
    os.environ['PYTHONHASHSEED'] = str(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)

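def generate_video_from_prompt(pipe, prompt, seed, save_dir, idx):
    # Re-seed before every sample so each (prompt, seed) pair is reproducible.
    set_random_seed(seed)

    # Run the pipeline under bfloat16 autocast; torch.amp.autocast replaces the
    # deprecated torch.cuda.amp.autocast in recent PyTorch releases.
    with torch.amp.autocast("cuda", dtype=torch.bfloat16):
        video = pipe(
            prompt,
            negative_prompt="",
            num_inference_steps=100,
            guidance_scale=7.5,
            width=768,
            height=432,  # width x height options: 480x288, 624x352, 432x240, 768x432
            frames=40
        )

    os.makedirs(save_dir, exist_ok=True)
    duration = 1000 / 8  # milliseconds per frame, i.e. 8 fps (40 frames = 5 s)
    save_path = os.path.join(save_dir, f"sample_{idx}_seed{seed}.mp4")
    save_as_mp4(video, save_path, duration=duration)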

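def process_test_file(args):
    # Check the prompt file first so a bad path fails before the model is loaded.
    if not os.path.isfile(args.test_file):
        print(f"Test file {args.test_file} not found.")
        return

    pipe = VchitectXLPipeline(args.ckpt_path)

    # One prompt per line; each prompt is sampled with seeds 0 to 4.
    with open(args.test_file, 'r', encoding='utf-8') as f:
        for idx, line in enumerate(f, start=1):
            prompt = line.strip()
            if not prompt:
                continue  # skip blank lines instead of sampling an empty prompt
            for seed in range(5):
                generate_video_from_prompt(pipe, prompt, seed, args.save_dir, idx)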

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--test_file", type=str, required=True, help="Path to the test file.")
    parser.add_argument("--save_dir", type=str, required=True, help="Directory to save the videos.")
    parser.add_argument("--ckpt_path", type=str, required=True, help="Path to the checkpoint file.")

    args = parser.parse_args()
    process_test_file(args)

if __name__ == "__main__":
    main()
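The script ties clip length to two numbers: `frames=40` and a per-frame duration of `1000 / 8` milliseconds. The small helpers below spell out that arithmetic. The function names are made up for this example, and the 8 fps figure is taken from the script's own comment. Whether the pipeline can actually generate a given number of frames on your GPU is not something this sketch checks.

```python
def ms_per_frame(fps: float) -> float:
    """Per-frame duration in milliseconds, as passed to save_as_mp4 above."""
    return 1000 / fps


def clip_seconds(frames: int, fps: float = 8) -> float:
    """Playback length in seconds of a clip with `frames` frames."""
    return frames / fps


def frames_for(seconds: float, fps: float = 8) -> int:
    """Number of frames needed for a clip of the requested length."""
    return round(seconds * fps)


print(ms_per_frame(8))   # 125.0 ms per frame
print(clip_seconds(40))  # 5.0 s, the setting used in the script
print(frames_for(20))    # 160 frames for a 20 s clip at 8 fps
```

So the script as written produces 5-second clips, and reaching the 20-second upper bound mentioned earlier would mean requesting about 160 frames at the same frame rate, with correspondingly higher memory and compute cost.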

To be continued...

For more details, follow 杰哥新技术.