MLU370-M8 CogVLM Deployment Guide
I. Choosing the environment
II. Installing the environment
transformers
git clone -b v4.38.0 https://githubfast.com/huggingface/transformers.git
python /torch/src/catch/tools/torch_gpu2mlu/torch_gpu2mlu.py -i transformers/
pip install -e ./transformers_mlu/
accelerate
git clone -b v0.27.2 https://githubfast.com/huggingface/accelerate.git
python /torch/src/catch/tools/torch_gpu2mlu/torch_gpu2mlu.py -i accelerate/
pip install -e ./accelerate_mlu/
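Both packages follow the same clone, convert, install pattern, with the converted tree appearing next to the original under an _mlu suffix. A minimal sketch of that pattern follows; `conversion_commands` is a helper name made up for this guide, and the sketch only builds and prints the commands rather than running them.

```python
import shlex

# Converter path as used in the commands above.
CONVERTER = "/torch/src/catch/tools/torch_gpu2mlu/torch_gpu2mlu.py"

def conversion_commands(name, tag, url):
    """Build the clone / convert / install commands for one repository."""
    return [
        ["git", "clone", "-b", tag, url],
        ["python", CONVERTER, "-i", f"{name}/"],
        # The converted tree is expected beside the original, with an _mlu suffix.
        ["pip", "install", "-e", f"./{name}_mlu/"],
    ]

for cmd in conversion_commands(
    "accelerate", "v0.27.2", "https://githubfast.com/huggingface/accelerate.git"
):
    print(shlex.join(cmd))
```

To execute the steps instead of printing them, each list could be passed to subprocess.run(cmd, check=True).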
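Regular installation
Install the remaining packages with pip install. Entries commented out with # are skipped here; transformers and accelerate were already installed from the converted sources above.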
modelscope
SwissArmyTransformer>=0.4.9
# transformers>=4.36.2
# xformers>=0.0.22
# torch>=2.1.0
# torchvision>=0.16.2
spacy>=3.6.0
pillow>=10.2.0
# deepspeed>=0.13.1
seaborn>=0.13.2
loguru~=0.7.2
streamlit>=1.31.0
timm>=0.9.12
# accelerate>=0.26.1
pydantic>=2.6.0
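After installing, it can help to confirm that each package is visible to the interpreter. The following is a minimal sketch using only the standard library; `report_installed` is a helper name made up for this guide, the list mirrors the uncommented entries above, and the check reports installed versions without enforcing the version bounds.

```python
from importlib import metadata

# Distribution names taken from the uncommented entries of the list above.
REQUIRED = [
    "modelscope", "SwissArmyTransformer", "spacy", "pillow", "seaborn",
    "loguru", "streamlit", "timm", "pydantic",
]

def report_installed(names):
    """Return {name: version string, or None if the package is not installed}."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found

if __name__ == "__main__":
    for name, version in report_installed(REQUIRED).items():
        print(f"{name:24s} {version or 'MISSING'}")
```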
III. Downloading the models
# Download the models
from modelscope import snapshot_download
model_dir = snapshot_download('ZhipuAI/cogvlm-chat')
tokenizer_dir = snapshot_download('AI-ModelScope/vicuna-7b-v1.5')  # separate name so the first path is not overwritten
Copy the downloaded models to the storage volume so that they can be referenced by absolute path later.
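The copy can be done by hand or scripted. Below is a minimal sketch using the standard library; `copy_model` is a helper name made up for this guide, and the source and destination paths are whatever snapshot_download returned and wherever your storage volume is mounted.

```python
import shutil
from pathlib import Path

def copy_model(src_dir, volume_root):
    """Copy a downloaded model directory into volume_root and return the new path."""
    src = Path(src_dir)
    dst = Path(volume_root) / src.name
    # dirs_exist_ok=True lets the copy be re-run after an interruption (Python 3.8+).
    shutil.copytree(src, dst, dirs_exist_ok=True)
    return str(dst.resolve())
```

For example, copy_model(path_returned_by_snapshot_download, "/path/to/volume/models") returns the absolute path to use as a default in the next section; both arguments here are placeholders.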
IV. Modifying the code
Download the community code from GitHub
git clone https://github.com/THUDM/CogVLM.git
# Convert the operators (GPU to MLU)
python /torch/src/catch/tools/torch_gpu2mlu/torch_gpu2mlu.py -i CogVLM/
cd CogVLM_mlu/
Edit basic_demo/cli_demo_hf.py [change each default to your own absolute path]
parser.add_argument("--from_pretrained", type=str, default="/workspace/volume/shixisheng/wmy/models/cogvlm-chat", help='pretrained ckpt')
parser.add_argument("--local_tokenizer", type=str, default="/workspace/volume/shixisheng/wmy/models/vicuna-7b-v1.5", help='tokenizer path')
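Modify the cogvlm-chat code
Edit cogvlm-chat/visual.py [xformers is still being adapted for this platform, so the attention computation is implemented in plain PyTorch instead]
1. Comment out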
import xformers.ops as xops
2. Add at the top of the file
import torch.nn.functional as F  # needed for F.dropout below

def memory_efficient_attention_pytorch(query, key, value, attn_bias=None, p=0., scale=None):
    # query     [batch, seq_len, n_head, head_dim]
    # key       [batch, seq_len, n_head, head_dim]
    # value     [batch, seq_len, n_head, head_dim]
    # attn_bias [batch, n_head, seq_len, seq_len]
    if scale is None:
        scale = 1 / query.shape[-1] ** 0.5
    # BLHC -> BHLC
    query = query.transpose(1, 2)
    key = key.transpose(1, 2)
    value = value.transpose(1, 2)
    query = query * scale
    # BHLC @ BHCL -> BHLL
    attn = query @ key.transpose(-2, -1)
    if attn_bias is not None:
        attn = attn + attn_bias
    attn = attn.softmax(-1)
    attn = F.dropout(attn, p)
    # BHLL @ BHLC -> BHLC
    out = attn @ value
    # BHLC -> BLHC
    out = out.transpose(1, 2)
    return out
3. Comment out
out = xops.memory_efficient_attention(
q, k, v, scale=self.scale,
)
and add below it
out = memory_efficient_attention_pytorch(
q, k, v, scale=self.scale,
)
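4. Make the output contiguous in memory [the PyTorch implementation ends with a transpose, which returns a non-contiguous tensor that view() cannot flatten]
Change
output = self.dense(out.view(B, L, -1))
to
output = self.dense(out.contiguous().view(B, L, -1))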
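Steps 2 to 4 can be sanity-checked on the CPU before running on the MLU. The sketch below assumes PyTorch is importable. It repeats the replacement function in condensed form so that it runs on its own, compares it with an independent einsum formulation (`reference_attention`, a helper made up for this check), and then shows why step 4 needs contiguous(). The tensor sizes are arbitrary.

```python
import torch
import torch.nn.functional as F

def memory_efficient_attention_pytorch(query, key, value, attn_bias=None, p=0., scale=None):
    # Condensed copy of the replacement function; tensors are [batch, seq_len, n_head, head_dim].
    if scale is None:
        scale = 1 / query.shape[-1] ** 0.5
    query, key, value = (t.transpose(1, 2) for t in (query, key, value))
    attn = (query * scale) @ key.transpose(-2, -1)
    if attn_bias is not None:
        attn = attn + attn_bias
    attn = F.dropout(attn.softmax(-1), p)
    return (attn @ value).transpose(1, 2)

def reference_attention(query, key, value):
    # Hypothetical helper: the same math written with einsum, used only for comparison.
    scale = query.shape[-1] ** -0.5
    scores = torch.einsum("blhc,bmhc->bhlm", query, key) * scale
    return torch.einsum("bhlm,bmhc->blhc", scores.softmax(-1), value)

torch.manual_seed(0)
B, L, H, C = 2, 5, 4, 8  # arbitrary small sizes
q, k, v = (torch.randn(B, L, H, C) for _ in range(3))

out = memory_efficient_attention_pytorch(q, k, v)
matches = torch.allclose(out, reference_attention(q, k, v), atol=1e-5)

# The final transpose leaves `out` as a strided view, so view() cannot merge the last two dims.
try:
    out.view(B, L, -1)
    view_failed = False
except RuntimeError:
    view_failed = True

flat = out.contiguous().view(B, L, -1)  # step 4: copy into contiguous memory first
print(matches, out.is_contiguous(), view_failed, tuple(flat.shape))
```

out.reshape(B, L, -1) would also work, since reshape copies when a view is not possible. Note that F.dropout defaults to training=True, so a nonzero p would apply dropout even at inference time; the call in step 3 leaves p at its default of 0.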
5. Running the demo
Launch command
python basic_demo/cli_demo_hf.py --from_pretrained /workspace/volume/shixisheng/wmy/models/cogvlm-chat --fp16
--from_pretrained takes the absolute path of the model
Startup is somewhat slow, but this does not affect subsequent use.
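The relationship between the edited default and the command-line flag can be illustrated with a small argparse sketch. The paths are placeholders, and --fp16 is assumed here to be a plain on/off switch, as its use in the launch command suggests.

```python
import argparse

parser = argparse.ArgumentParser()
# Placeholder default; the real script uses the absolute path set in the code-modification step.
parser.add_argument("--from_pretrained", type=str, default="/path/to/default-model", help="pretrained ckpt")
parser.add_argument("--fp16", action="store_true")  # assumed to be an on/off switch

defaults = parser.parse_args([])  # no flags: the default path is used
overridden = parser.parse_args(["--from_pretrained", "/path/to/cogvlm-chat", "--fp16"])

print(defaults.from_pretrained, defaults.fp16)
print(overridden.from_pretrained, overridden.fp16)
```

A value given on the command line takes precedence over the default, so once the default points at your model the script can also be started without --from_pretrained.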
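Inference was run on a single test image; its path is entered at the image path prompt in the session below.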
(pytorch) root@notebook-shixisheng-0111-104304-9bjbr4-notebook-0:/workspace/volume/shixisheng/wmy/CogVLM_mlu# python basic_demo/cli_demo_hf.py
========Use torch type as:torch.float16 with device:mlu========
/torch/venv3/pytorch/lib/python3.10/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension:
warn(f"Failed to load image Python extension: {e}")
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████| 8/8 [00:08<00:00, 1.02s/it]
image path >>>>> /workspace/volume/shixisheng/wmy/CogVLM_mlu/20240311-163355.jpg
Human:图中的狗是什么品种
Keyword arguments {'add_special_tokens': False} not recognized.
/workspace/volume/shixisheng/wmy/CogVLM_mlu/basic_demo/cli_demo_hf.py:87: UserWarning: MLU operators don't support 64-bit calculation. so the 64 bit data will be forcibly converted to 32-bit for calculation. (Triggered internally at /torch/catch/torch_mlu/csrc/aten/utils/tensor_util.cpp:159.)
'input_ids': input_by_model['input_ids'].unsqueeze(0).to(DEVICE),
[2024-3-11 16:36:47] [CNNL] [Warning]:[cnnlGetConvolutionForwardAlgorithm] is deprecated and will be removed in the future release. See cnnlFindConvolutionForwardAlgorithm() API for replacement.
[2024-3-11 16:36:47] [CNNL] [Warning]:[cnnlStridedSlice] is deprecated and will be removed in the future release, please use [cnnlStridedSlice_v2] instead.
/root/.cache/huggingface/modules/transformers_modules/cogvlm-chat/modeling_cogvlm.py:126: UserWarning: It's recommended to use torch2.0 or higher.
warnings.warn("It's recommended to use torch2.0 or higher.")
Cog: The dog in the image is a Golden Retriever.
MLU370 Q: What breed is the dog in the picture?
Cog: The dog in the image is a Golden Retriever.
MLU370 Q: Describe this picture.
Cog: The image showcases an indoor setting, likely a living room, with a young woman kneeling down to play with a Golden Retriever dog. The room is well-lit, with a large window allowing natural light to flood in. There's a comfortable-looking sofa in the background adorned with various cushions. A tall potted plant stands to the right, and a unique white fluffy lamp hangs from the ceiling. The floor is covered with a patterned rug, and the overall ambiance of the room is warm and inviting.
See you next time, bye-bye!