MLU370-M8 CogVLM Deployment Guide

小军军军军军军

Last modified: 2024-03-11 16:40:42

First published: 2024-03-04 18:53:26

Article link: https://blog.csdn.net/xiaojunjun200211/article/details/136459208

1. Environment Selection
2. Environment Installation
3. Model Download
4. Code Modifications
- Modify the cogvlm-chat code
5. Results

1. Environment Selection

[Image: environment selection screenshot]

2. Environment Installation

transformers

git clone -b v4.38.0 https://githubfast.com/huggingface/transformers.git
python /torch/src/catch/tools/torch_gpu2mlu/torch_gpu2mlu.py -i transformers/
pip install -e ./transformers_mlu/

accelerate

git clone -b v0.27.2 https://githubfast.com/huggingface/accelerate.git
python /torch/src/catch/tools/torch_gpu2mlu/torch_gpu2mlu.py -i accelerate/
pip install -e ./accelerate_mlu/

Regular packages

The following packages can be installed directly with pip install:

modelscope
SwissArmyTransformer>=0.4.9
# transformers>=4.36.2
# xformers>=0.0.22
# torch>=2.1.0
# torchvision>=0.16.2
spacy>=3.6.0
pillow>=10.2.0
# deepspeed>=0.13.1
seaborn>=0.13.2
loguru~=0.7.2
streamlit>=1.31.0
timm>=0.9.12
# accelerate>=0.26.1
pydantic>=2.6.0
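
After installing, it can be useful to confirm that the packages in the list above are actually present at a new enough version. The following is only a minimal sketch using the standard library: the helper names (parse_requirement, version_tuple, check_requirements) are made up for this illustration, the version comparison is deliberately naive (leading numeric parts only), and `~=` is treated simply as a minimum version.

```python
import re
from importlib import metadata

# The uncommented entries from the requirement list above.
REQUIREMENTS = """
modelscope
SwissArmyTransformer>=0.4.9
spacy>=3.6.0
pillow>=10.2.0
seaborn>=0.13.2
loguru~=0.7.2
streamlit>=1.31.0
timm>=0.9.12
pydantic>=2.6.0
"""


def parse_requirement(line):
    """Split 'name>=1.2.3' into (name, operator, version); the last two may be None."""
    match = re.fullmatch(
        r"\s*([A-Za-z0-9_.\-]+)\s*(?:(>=|~=|==)\s*([0-9.]+))?\s*", line
    )
    if match is None:
        raise ValueError(f"unsupported requirement line: {line!r}")
    return match.group(1), match.group(2), match.group(3)


def version_tuple(version):
    """Naive conversion of the leading numeric part: '0.13.2' -> (0, 13, 2)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    return tuple(int(p) for p in match.group(0).split(".")) if match else ()


def check_requirements(text):
    """Return (name, status) pairs where status is 'ok', 'missing' or 'too old'."""
    results = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and commented-out entries
        name, _operator, minimum = parse_requirement(line)
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            results.append((name, "missing"))
            continue
        if minimum and version_tuple(installed) < version_tuple(minimum):
            results.append((name, "too old"))
        else:
            results.append((name, "ok"))
    return results


if __name__ == "__main__":
    for name, status in check_requirements(REQUIREMENTS):
        print(f"{name}: {status}")
```

Any entry reported as missing or too old can then be installed or upgraded with pip install.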

3. Model Download

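# Download the models
from modelscope import snapshot_download
# Keep the two paths in separate variables so the first is not overwritten
model_dir = snapshot_download('ZhipuAI/cogvlm-chat')
tokenizer_dir = snapshot_download('AI-ModelScope/vicuna-7b-v1.5')
print(model_dir, tokenizer_dir)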

Copy the downloaded models to the storage volume so that they can be referenced by absolute path later.
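
The copy step can also be scripted. Below is a minimal sketch, assuming the directory returned by snapshot_download is an ordinary folder whose last path component is the model name; the helper name copy_model_to_volume and the target path in the usage comment are hypothetical.

```python
import shutil
from pathlib import Path


def copy_model_to_volume(cache_dir, volume_root):
    """Copy a downloaded model directory into volume_root and return its absolute path."""
    source = Path(cache_dir)
    if not source.is_dir():
        raise FileNotFoundError(f"model directory not found: {source}")
    target = Path(volume_root) / source.name
    # dirs_exist_ok=True lets the copy be re-run without failing (Python 3.8+).
    shutil.copytree(source, target, dirs_exist_ok=True)
    return target.resolve()


# Hypothetical usage; replace the second argument with your own storage volume:
# print(copy_model_to_volume(model_dir, "/workspace/volume/models"))
```

The returned path is what would later be used as the absolute path in the demo script.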

4. Code Modifications

Download the community code from GitHub

git clone https://github.com/THUDM/CogVLM.git
# Convert the operators to MLU
python /torch/src/catch/tools/torch_gpu2mlu/torch_gpu2mlu.py -i CogVLM/
cd CogVLM_mlu/

Modify basic_demo/cli_demo_hf.py [change each default to your own absolute path]

parser.add_argument("--from_pretrained", type=str, default="/workspace/volume/shixisheng/wmy/models/cogvlm-chat", help='pretrained ckpt')
parser.add_argument("--local_tokenizer", type=str, default="/workspace/volume/shixisheng/wmy/models/vicuna-7b-v1.5", help='tokenizer path')
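
If you would rather not hard-code paths in the script, one alternative is to read the defaults from environment variables. This is only a sketch of that idea and not how cli_demo_hf.py is written: the variable names COGVLM_MODEL_DIR and COGVLM_TOKENIZER_DIR and the helpers build_parser and parse_paths are assumptions introduced here.

```python
import argparse
import os


def build_parser(env=None):
    """Build a parser whose path defaults come from (hypothetical) environment variables."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser()
    parser.add_argument("--from_pretrained", type=str,
                        default=env.get("COGVLM_MODEL_DIR"), help="pretrained ckpt")
    parser.add_argument("--local_tokenizer", type=str,
                        default=env.get("COGVLM_TOKENIZER_DIR"), help="tokenizer path")
    return parser


def parse_paths(argv=None, env=None):
    """Parse arguments and stop early if either path is still unset."""
    args = build_parser(env).parse_args(argv)
    for name in ("from_pretrained", "local_tokenizer"):
        if not getattr(args, name):
            raise SystemExit(f"--{name} is not set and no environment default was found")
    return args


if __name__ == "__main__":
    print(parse_paths())
```

A value passed on the command line still takes precedence over the environment default, which matches how the launch command later in this post overrides --from_pretrained.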

Modify the cogvlm-chat code

Modify cogvlm-chat/visual.py [xformers is still being adapted for MLU, so the attention computation is implemented in PyTorch instead]

1. Comment out:
import xformers.ops as xops


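2. Add at the top of the file:
import torch.nn.functional as F  # needed for F.dropout below (harmless if already imported)

def memory_efficient_attention_pytorch(query, key, value, attn_bias=None, p=0., scale=None):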
    # query     [batch, seq_len, n_head, head_dim]
    # key       [batch, seq_len, n_head, head_dim]
    # value     [batch, seq_len, n_head, head_dim]
    # attn_bias [batch, n_head, seq_len, seq_len]

    if scale is None:
        scale = 1 / query.shape[-1] ** 0.5
    
    # BLHC -> BHLC
    query = query.transpose(1, 2)
    key = key.transpose(1, 2)
    value = value.transpose(1, 2)

    query = query * scale
    # BHLC @ BHCL -> BHLL
    attn = query @ key.transpose(-2, -1)
    if attn_bias is not None:
        attn = attn + attn_bias
    attn = attn.softmax(-1)
    attn = F.dropout(attn, p)
    # BHLL @ BHLC -> BHLC
    out = attn @ value
    # BHLC -> BLHC
    out = out.transpose(1, 2)
    return out


3. Comment out:
out = xops.memory_efficient_attention(
    q, k, v, scale=self.scale,
)
and add below it:
out = memory_efficient_attention_pytorch(
    q, k, v, scale=self.scale
)


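4. Make the tensor contiguous in memory (the final transpose in the new function returns a non-contiguous tensor, which view cannot reshape):
output = self.dense(out.view(B, L, -1))
change to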
output = self.dense(out.contiguous().view(B, L, -1))
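
For reference, the function added in step 2 computes standard scaled dot-product attention for each head. Written as a formula (dropout omitted; the symbols are shorthand introduced here, with $Q$, $K$, $V$ the per-head query, key and value, $B$ the optional attn_bias, $d$ the head_dim and $s$ the scale):

```latex
\mathrm{out} = \operatorname{softmax}\!\left(s\,QK^{\top} + B\right)V,
\qquad s = \frac{1}{\sqrt{d}} \ \text{ when no scale is passed}
```

The softmax is taken over the last (key) dimension, which corresponds to attn.softmax(-1) in the code.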

5. Results

Launch command

python basic_demo/cli_demo_hf.py --from_pretrained /workspace/volume/shixisheng/wmy/models/cogvlm-chat --fp16

--from_pretrained takes the absolute path to the model.

Startup is somewhat slow, but this does not affect later use.

[Image: the picture used for inference]

(pytorch) root@notebook-shixisheng-0111-104304-9bjbr4-notebook-0:/workspace/volume/shixisheng/wmy/CogVLM_mlu# python basic_demo/cli_demo_hf.py 
========Use torch type as:torch.float16 with device:mlu========


/torch/venv3/pytorch/lib/python3.10/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: 
  warn(f"Failed to load image Python extension: {e}")
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████| 8/8 [00:08<00:00,  1.02s/it]
image path >>>>> /workspace/volume/shixisheng/wmy/CogVLM_mlu/20240311-163355.jpg
Human:图中的狗是什么品种
Keyword arguments {'add_special_tokens': False} not recognized.
/workspace/volume/shixisheng/wmy/CogVLM_mlu/basic_demo/cli_demo_hf.py:87: UserWarning:  MLU operators don't support 64-bit calculation. so the 64 bit data will be forcibly converted to 32-bit for calculation.  (Triggered internally at /torch/catch/torch_mlu/csrc/aten/utils/tensor_util.cpp:159.)
  'input_ids': input_by_model['input_ids'].unsqueeze(0).to(DEVICE),
[2024-3-11 16:36:47] [CNNL] [Warning]:[cnnlGetConvolutionForwardAlgorithm] is deprecated and will be removed in the future release. See cnnlFindConvolutionForwardAlgorithm() API for replacement.
[2024-3-11 16:36:47] [CNNL] [Warning]:[cnnlStridedSlice] is deprecated and will be removed in the future release, please use [cnnlStridedSlice_v2] instead.
/root/.cache/huggingface/modules/transformers_modules/cogvlm-chat/modeling_cogvlm.py:126: UserWarning: It's recommended to use torch2.0 or higher.
  warnings.warn("It's recommended to use torch2.0 or higher.")

Cog: The dog in the image is a Golden Retriever.

Q (on the MLU370): What breed is the dog in the picture?
Cog: The dog in the image is a Golden Retriever.
Q (on the MLU370): Describe this picture.
Cog: The image showcases an indoor setting, likely a living room, with a young woman kneeling down to play with a Golden Retriever dog. The room is well-lit, with a large window allowing natural light to flood in. There's a comfortable-looking sofa in the background adorned with various cushions. A tall potted plant stands to the right, and a unique white fluffy lamp hangs from the ceiling. The floor is covered with a patterned rug, and the overall ambiance of the room is warm and inviting.
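
The Python UserWarning lines in the log above did not stop inference in this run. If they clutter the console, they can probably be filtered with the standard warnings module. The sketch below assumes these messages go through Python's warnings machinery (as the UserWarning prefix suggests); the helper name silence_known_warnings is made up here. The [CNNL] lines are printed by the native library and would not be affected.

```python
import warnings

# Message patterns copied from the log above. filterwarnings treats each one as a
# regular expression matched against the start of the warning message.
KNOWN_WARNING_PATTERNS = [
    r"\s*MLU operators don't support 64-bit calculation",
    r"Failed to load image Python extension",
    r"It's recommended to use torch2\.0 or higher",
]


def silence_known_warnings():
    """Ignore only the UserWarnings listed above; every other warning still shows."""
    for pattern in KNOWN_WARNING_PATTERNS:
        warnings.filterwarnings("ignore", message=pattern, category=UserWarning)
```

Calling silence_known_warnings() near the top of the demo script, before the model is loaded, would be the natural place to apply it.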

See you next time, bye-bye!

Reference: https://www.zhihu.com/question/602057035