Detailed Steps for Deploying the AI Text-to-Video Tool CogVideoX on Huawei Ascend NPUs

CogVideoX is a large video-generation model developed by Zhipu AI. Without requiring complex video-production skills or tools, it can turn a text description or a static image into a high-quality, visually appealing video.

https://github.com/THUDM/CogVideo

I. Deploying to the Ascend NPU

Ascend environment:

Chip: Ascend 910B3
CANN version: CANN 7.0.1.5
Driver version: 23.0.6
Operating system: Huawei Cloud EulerOS 2.0
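
Before going further, it is worth confirming that the driver and firmware can actually see the card. A minimal check, assuming the standard Ascend device-management tool npu-smi is on the PATH:

npu-smi info   # lists the NPU devices, driver/firmware status, and utilization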

1. Environment setup

Create a Python 3.10 environment with conda.

conda create --name cogvideo python=3.10
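
Activate the environment so that the packages below are installed into it:

conda activate cogvideo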

Clone the code from GitHub.

git clone https://github.com/THUDM/CogVideo.git
cd CogVideo

Install the dependencies.

pip install -r requirements.txt
# Install the torch_npu build that matches the installed PyTorch version
pip install torch_npu
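
torch_npu must match the PyTorch version pulled in by requirements.txt. A quick way to see which versions ended up in the environment (a sanity check only):

python -c "import torch; print(torch.__version__)"
pip show torch_npu | grep Version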

Create a test script verify_npu.py to verify the NPU environment:

import torch
import torch_npu
 
# Check whether the NPU is available
if torch.npu.is_available():
    print("NPU is available.")
    device_id = 0
    torch.npu.set_device(device_id)

    device = 'npu:' + str(device_id)
    tensor = torch.tensor([1.0, 2.0, 3.0], device=device)
    print(tensor)
else:
    print("NPU is not available.")

If the output is "NPU is available.", the NPU environment is working.

2. Download the model

The model is large; reserve about 45 GB of disk space.

To download the model with Git, git-lfs must be installed first.

wget https://github.com/git-lfs/git-lfs/releases/download/v3.5.1/git-lfs-linux-arm64-v3.5.1.tar.gz
tar xvf git-lfs-linux-arm64-v3.5.1.tar.gz
cd git-lfs-3.5.1
sudo ./install.sh

Verify the git-lfs installation:

git lfs version

Download the model:

mkdir THUDM
cd THUDM
git lfs install
git clone https://www.modelscope.cn/ZhipuAI/CogVideoX-5b.git
cd CogVideoX-5b
git lfs pull
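
Before moving on, it is worth confirming that the LFS objects were actually downloaded rather than left as pointer files; the ~45 GB figure comes from the note above:

git lfs ls-files        # lists the weight files tracked by LFS
du -sh .                # total size should be roughly 45 GB once the pull completes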

3. Run the code on the NPU

Create run_t2v_npu.py:

import torch
import torch_npu
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video

# Check whether an NPU is available
if torch.npu.is_available():
    print("NPU is available.")
    device = torch.device('npu')

    torch_dtype = torch.bfloat16  # BF16 is recommended
    print(f"Using data type: {torch_dtype}")

    # Create a random-number generator
    generator = torch.Generator(device=device).manual_seed(42)
    print("Generator created")

    prompt = "A panda, dressed in a small, red jacket and a tiny hat, sits on a wooden stool in a serene bamboo forest. The panda's fluffy paws strum a miniature acoustic guitar, producing soft, melodic tunes. Nearby, a few other pandas gather, watching curiously and some clapping in rhythm. Sunlight filters through the tall bamboo, casting a gentle glow on the scene. The panda's face is expressive, showing concentration and joy as it plays. The background includes a small, flowing stream and vibrant green foliage, enhancing the peaceful and magical atmosphere of this unique musical performance."
    print("Prompt set")

    pipe = CogVideoXPipeline.from_pretrained(
        "THUDM/CogVideoX-5b",
        torch_dtype=torch_dtype
    )
    print("Pipeline loaded")

    # Make sure the model and data are on the selected device
    pipe.to(device)
    print("Model moved to device")

    video = pipe(
        prompt=prompt,
        num_videos_per_prompt=1,
        num_inference_steps=50,
        num_frames=49,
        guidance_scale=6,
        generator=generator,
    ).frames[0]
    print("Video generated")

    export_to_video(video, "output.mp4", fps=8)
    print("Video exported")

else:
    print("NPU is not available.")

Running it fails with an error:

(cogvideo) [root@devserver-314b-1 CogVideo]# python run_t2v_npu.py
Using device: npu
Using data type: torch.bfloat16
Generator created
Prompt set
Loading checkpoint shards: 100%|████████████████████████████████████████████████████| 2/2 [00:00<00:00,  4.63it/s]
Loading pipeline components...: 100%|███████████████████████████████████████████████| 5/5 [00:01<00:00,  2.52it/s]
Pipeline loaded
Model moved to device
  0%|                                                                                      | 0/50 [00:00<?, ?it/s][W1114 14:42:44.058506898 compiler_depend.ts:387] Warning: EH9999: Inner Error!
EH9999  [Init][Env]init env failed![FUNC:ReportInnerError][FILE:log_inner.cpp][LINE:145]
        TraceBack (most recent call last):
        build op model failed, result = 500001[FUNC:ReportInnerError][FILE:log_inner.cpp][LINE:145]
 (function ExecFunc)
  0%|                                                                                      | 0/50 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/dev/shm/wangshijing/CogVideo/run_t2v_npu.py", line 31, in <module>
    video = pipe(prompt=prompt,num_videos_per_prompt=1,num_inference_steps=50,num_frames=49,guidance_scale=6,generator=generator,).frames[0]
  File "/root/miniconda3/envs/cogvideo/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/root/miniconda3/envs/cogvideo/lib/python3.10/site-packages/diffusers/pipelines/cogvideo/pipeline_cogvideox.py", line 684, in __call__
    noise_pred = self.transformer(
  File "/root/miniconda3/envs/cogvideo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/envs/cogvideo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/cogvideo/lib/python3.10/site-packages/diffusers/models/transformers/cogvideox_transformer_3d.py", line 473, in forward
    hidden_states, encoder_hidden_states = block(
  File "/root/miniconda3/envs/cogvideo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/envs/cogvideo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/cogvideo/lib/python3.10/site-packages/diffusers/models/transformers/cogvideox_transformer_3d.py", line 132, in forward
    attn_hidden_states, attn_encoder_hidden_states = self.attn1(
  File "/root/miniconda3/envs/cogvideo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/envs/cogvideo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/cogvideo/lib/python3.10/site-packages/diffusers/models/attention_processor.py", line 495, in forward
    return self.processor(
  File "/root/miniconda3/envs/cogvideo/lib/python3.10/site-packages/diffusers/models/attention_processor.py", line 1928, in __call__
    value = attn.to_v(hidden_states)
  File "/root/miniconda3/envs/cogvideo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/envs/cogvideo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/cogvideo/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 125, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: The Inner error is reported as above. The process exits for this inner error, and the current working operator name is Conv2D.
Since the operator is called asynchronously, the stacktrace may be inaccurate. If you want to get the accurate stacktrace, pleace set the environment variable ASCEND_LAUNCH_BLOCKING=1.
[ERROR] 2024-11-26-14:42:44 (PID:1154952, Device:0, RankID:-1) ERR00100 PTA call acl api failed
[W1114 14:42:44.066216328 compiler_depend.ts:659] Warning: 0Failed to find function aclrtSynchronizeDeviceWithTimeout (function operator())

Set this environment variable to get a more accurate stack trace:

export ASCEND_LAUNCH_BLOCKING=1
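
The variable only affects processes launched from the same shell; it can also be set for a single run without exporting it:

ASCEND_LAUNCH_BLOCKING=1 python run_t2v_npu.py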

Running again, the error changes to:

(cogvideo) [root@devserver-314b-1 CogVideo]# python run_t2v_npu.py
Using device: npu
Using data type: torch.bfloat16
Generator created
Prompt set
Loading checkpoint shards: 100%|████████████████████████████████████████████████████| 2/2 [00:00<00:00,  4.85it/s]
Loading pipeline components...: 100%|███████████████████████████████████████████████| 5/5 [00:01<00:00,  2.51it/s]
Pipeline loaded
Model moved to device
  0%|                                                                                      | 0/50 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/dev/shm/wangshijing/CogVideo/run_t2v_npu.py", line 31, in <module>
    video = pipe(prompt=prompt,num_videos_per_prompt=1,num_inference_steps=50,num_frames=49,guidance_scale=6,generator=generator,).frames[0]
  File "/root/miniconda3/envs/cogvideo/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/root/miniconda3/envs/cogvideo/lib/python3.10/site-packages/diffusers/pipelines/cogvideo/pipeline_cogvideox.py", line 684, in __call__
    noise_pred = self.transformer(
  File "/root/miniconda3/envs/cogvideo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/envs/cogvideo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/cogvideo/lib/python3.10/site-packages/diffusers/models/transformers/cogvideox_transformer_3d.py", line 446, in forward
    hidden_states = self.patch_embed(encoder_hidden_states, hidden_states)
  File "/root/miniconda3/envs/cogvideo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/envs/cogvideo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/cogvideo/lib/python3.10/site-packages/diffusers/models/embeddings.py", line 412, in forward
    image_embeds = self.proj(image_embeds)
  File "/root/miniconda3/envs/cogvideo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/envs/cogvideo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/cogvideo/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 554, in forward                                                                                                                 
    return self._conv_forward(input, self.weight, self.bias)
  File "/root/miniconda3/envs/cogvideo/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 549, in _conv_forward
    return F.conv2d(
RuntimeError: InnerRun:build/CMakeFiles/torch_npu.dir/compiler_depend.ts:218 OPS function error: Conv2D, error code is 500001
[ERROR] 2024-11-26-14:45:31 (PID:1155878, Device:0, RankID:-1) ERR01100 OPS call acl api failed
[Error]: The internal ACL of the system is incorrect.
        Rectify the fault based on the error information in the ascend log.
EC0010: Failed to import Python module [ModuleNotFoundError: No module named 'tbe'.].
        Solution: Check that all required components are properly installed and the specified Python path matches the Python installation directory. (If the path does not match the directory, run set_env.sh in the installation package.)
        TraceBack (most recent call last):
        [GraphOpt][InitializeInner][InitTbeFunc] Failed to init tbe.[FUNC:InitializeInner][FILE:tbe_op_store_adapter.cc][LINE:1623]
        [SubGraphOpt][PreCompileOp][InitAdapter] InitializeAdapter adapter [tbe_op_adapter] failed! Ret [4294967295][FUNC:InitializeAdapter][FILE:op_store_adapter_manager.cc][LINE:85]
        [SubGraphOpt][PreCompileOp][Init] Initialize op store adapter failed, OpsStoreName[tbe-custom].[FUNC:Initialize][FILE:op_store_adapter_manager.cc][LINE:126]
        [FusionMngr][Init] Op store adapter manager init failed.[FUNC:Initialize][FILE:fusion_manager.cc][LINE:124]                                                                                                                 
        PluginManager InvokeAll failed.[FUNC:Initialize][FILE:ops_kernel_manager.cc][LINE:96]
        OpsManager initialize failed.[FUNC:InnerInitialize][FILE:gelib.cc][LINE:237]
        GELib::InnerInitialize failed.[FUNC:Initialize][FILE:gelib.cc][LINE:165]
        [Initialize][Ge]GEInitialize failed. ge result = 4294967295[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
        [Init][Compiler]Init compiler failed[FUNC:ReportInnerError][FILE:log_inner.cpp][LINE:145]
        [Set][Options]OpCompileProcessor init failed![FUNC:ReportInnerError][FILE:log_inner.cpp][LINE:145]
        [Init][Env]init env failed![FUNC:ReportInnerError][FILE:log_inner.cpp][LINE:145]
        build op model failed, result = 500001[FUNC:ReportInnerError][FILE:log_inner.cpp][LINE:145]
                                                                                                                  
[W1114 14:45:31.433200230 compiler_depend.ts:659] Warning: 0Failed to find function aclrtSynchronizeDeviceWithTimeout (function operator())

The error message indicates that the Python module tbe cannot be found. Try running the CANN environment setup script:

source /usr/local/Ascend/ascend-toolkit/set_env.sh
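
To avoid hitting the same tbe import failure in a new shell, the setup script can be sourced automatically on login (adjust the path if the toolkit is installed elsewhere):

echo 'source /usr/local/Ascend/ascend-toolkit/set_env.sh' >> ~/.bashrc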

Running again, the error changes to:

(cogvideo) [root@devserver-314b-1 CogVideo]# python run_t2v_npu.py
Using device: npu
Using data type: torch.bfloat16
Generator created
Prompt set
Loading checkpoint shards: 100%|████████████████████████████████████████████████████| 2/2 [00:00<00:00,  4.74it/s]
Loading pipeline components...: 100%|███████████████████████████████████████████████| 5/5 [00:01<00:00,  2.51it/s]
Pipeline loaded
Model moved to device
  0%|                                                                                      | 0/50 [00:00<?, ?it/s]SystemError: PY_SSIZE_T_CLEAN macro must be defined for '#' formats
  0%|                                                                                      | 0/50 [00:08<?, ?it/s]
Traceback (most recent call last):
  File "/dev/shm/wangshijing/CogVideo/run_t2v_npu.py", line 31, in <module>
    video = pipe(prompt=prompt,num_videos_per_prompt=1,num_inference_steps=50,num_frames=49,guidance_scale=6,generator=generator,).frames[0]
  File "/root/miniconda3/envs/cogvideo/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/root/miniconda3/envs/cogvideo/lib/python3.10/site-packages/diffusers/pipelines/cogvideo/pipeline_cogvideox.py", line 684, in __call__
    noise_pred = self.transformer(
  File "/root/miniconda3/envs/cogvideo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/envs/cogvideo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/cogvideo/lib/python3.10/site-packages/diffusers/models/transformers/cogvideox_transformer_3d.py", line 473, in forward
    hidden_states, encoder_hidden_states = block(
  File "/root/miniconda3/envs/cogvideo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/envs/cogvideo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/cogvideo/lib/python3.10/site-packages/diffusers/models/transformers/cogvideox_transformer_3d.py", line 132, in forward
    attn_hidden_states, attn_encoder_hidden_states = self.attn1(
  File "/root/miniconda3/envs/cogvideo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/envs/cogvideo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/cogvideo/lib/python3.10/site-packages/diffusers/models/attention_processor.py", line 495, in forward
    return self.processor(
  File "/root/miniconda3/envs/cogvideo/lib/python3.10/site-packages/diffusers/models/attention_processor.py", line 1950, in __call__
    hidden_states = F.scaled_dot_product_attention(
RuntimeError: aclnnFusedInferAttentionScoreV2 or aclnnFusedInferAttentionScoreV2GetWorkspaceSize not in libopapi.so, or libopapi.sonot found.
[ERROR] 2024-11-26-14:55:15 (PID:1158036, Device:0, RankID:-1) ERR01004 OPS invalid pointer
[W1114 14:55:15.526441221 compiler_depend.ts:659] Warning: 0Failed to find function aclrtSynchronizeDeviceWithTimeout (function operator())

An internal error occurs in the F.scaled_dot_product_attention call: the functions aclnnFusedInferAttentionScoreV2 and aclnnFusedInferAttentionScoreV2GetWorkspaceSize cannot be found in libopapi.so.

Check whether the dependencies of libopapi.so are loaded correctly:

ldd /usr/local/Ascend/ascend-toolkit/7.0.1.5/aarch64-linux/lib64/libopapi.so

Check the symbol table of libopapi.so:

nm -gC /usr/local/Ascend/ascend-toolkit/7.0.1.5/aarch64-linux/lib64/libopapi.so | grep aclnnFusedInferAttentionScoreV2

The function does not appear in the output:

nm: /usr/local/Ascend/ascend-toolkit/7.0.1.5/aarch64-linux/lib64/libopapi.so: no symbols
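
Note that nm -gC only reads the regular symbol table, which is stripped from this library, so "no symbols" does not by itself prove the kernel is absent. The dynamic symbol table can still be queried:

nm -D /usr/local/Ascend/ascend-toolkit/7.0.1.5/aarch64-linux/lib64/libopapi.so | grep FusedInferAttentionScore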

Searching for the missing kernels turns up a related project, cann-ops-adv, a fused-operator library for Ascend hardware (adv stands for advanced).

Asking in the project's Issues reveals that this operator library currently supports only torch 2.1 (stable) and torch 2.3 (beta). Switch to the torch 2.1 dependencies:

pip install torch==2.1.0 torch_npu==2.1.0 torchvision==0.16.0
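
After downgrading, a quick sanity check that the two packages import together and report the expected versions:

python -c "import torch, torch_npu; print(torch.__version__, torch_npu.__version__)"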

Running with this setup fails with out-of-memory.

Switch to NPU device 1, reduce the number of generated frames, and lower the model precision:

import torch
import torch_npu
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video

# Check whether an NPU is available
if torch.npu.is_available():
    print("NPU is available.")
    device_id = 1
    torch.npu.set_device(device_id)

    # Use NPU device 1
    device = torch.device('npu:1')

    # torch_dtype = torch.bfloat16  # BF16 is recommended
    torch_dtype = torch.float16  # lower precision to reduce memory
    print(f"Using data type: {torch_dtype}")

    # Create a random-number generator
    generator = torch.Generator(device=device).manual_seed(42)
    print("Generator created")

    prompt = "A panda, dressed in a small, red jacket and a tiny hat, sits on a wooden stool in a serene bamboo forest. The panda's fluffy paws strum a miniature acoustic guitar, producing soft, melodic tunes. Nearby, a few other pandas gather, watching curiously and some clapping in rhythm. Sunlight filters through the tall bamboo, casting a gentle glow on the scene. The panda's face is expressive, showing concentration and joy as it plays. The background includes a small, flowing stream and vibrant green foliage, enhancing the peaceful and magical atmosphere of this unique musical performance."
    print("Prompt set")

    pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-5b", torch_dtype=torch_dtype)
    print("Pipeline loaded")

    # Make sure the model and data are on the selected device
    pipe.to(device)
    print("Model moved to device")
    print(f"Current NPU device: {torch.npu.current_device()}")

    video = pipe(
        prompt=prompt,
        num_videos_per_prompt=1,
        num_inference_steps=50,
        num_frames=24,            # fewer frames to reduce memory
        guidance_scale=6,
        generator=generator,
    ).frames[0]
    print("Video generated")

    export_to_video(video, "output.mp4", fps=8)
    print("Video exported")

else:
    print("NPU is not available.")

While the model is running, a warning appears:

SystemError: PY_SSIZE_T_CLEAN macro must be defined for '#' formats

This is usually caused by an incompatible Python version or build options; creating a new Python 3.8 environment resolves it.
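
For reference, a fresh Python 3.8 environment can be created the same way as before, after which the dependency installation from step 1 is repeated inside it (the environment name is arbitrary; whether every pinned dependency still supports Python 3.8 has not been checked here):

conda create --name cogvideo-py38 python=3.8
conda activate cogvideo-py38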

The generated video is saved as output.mp4.
