Qwen2 model deployment and fine-tuning

Reference video: "Latest Qwen2 LLM environment setup + LoRA fine-tuning + model deployment detailed tutorial! Real-world comparison against GLM4!"

Download the code from GitHub: https://github.com/QwenLM/Qwen2

Download the model from ModelScope: https://modelscope.cn/models/qwen/Qwen2-7B-Instruct/files

Create a conda environment

conda create -n qwen2 python=3.10
conda activate qwen2
pip install ipykernel
python -m ipykernel install --user --name qwen2 --display-name "Python (qwen2)"

Download the Qwen2 project

https://github.com/QwenLM/Qwen2
Enter the autodl-tmp data directory:
cd autodl-tmp
git clone https://github.com/QwenLM/Qwen2.git

Download the Qwen2 model

https://modelscope.cn/models/qwen/Qwen2-7B-Instruct/files
pip install modelscope
modelscope download qwen/Qwen2-7B-Instruct

# Alternatively, download the model to a fixed local path from Python
# (the CLI command above saves into the default modelscope cache instead):
from modelscope import snapshot_download
model_dir = snapshot_download('qwen/Qwen2-7B-Instruct', local_dir="/root/autodl-tmp/Qwen2-7B-Instruct")
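
As a quick sanity check that the download is complete, the tokenizer can be loaded from the local directory (a minimal sketch, path as above):

from transformers import AutoTokenizer

# If the config/tokenizer files downloaded correctly, this loads without error.
tokenizer = AutoTokenizer.from_pretrained("/root/autodl-tmp/Qwen2-7B-Instruct")
print(tokenizer("hello"))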

Then edit the demo script under this directory so it loads the model from the local download path instead of pulling it from the Hub, as sketched below:
/root/autodl-tmp/Qwen2/examples/demo
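A minimal sketch of the edit, assuming this copy of cli_demo.py keeps the model path in a module-level constant (the exact variable name may differ in your checkout):

# cli_demo.py: point the demo at the locally downloaded weights
# (hypothetical constant name; match whatever your script actually uses)
DEFAULT_CKPT_PATH = "/root/autodl-tmp/Qwen2-7B-Instruct"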
Install the runtime dependencies:

pip install torch==2.0.0    # note: 2.0.0 turns out to trigger the bf16 triu bug shown below
pip install transformers
pip install accelerate

python cli_demo.py


User: hello

Qwen2-Instruct: The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Exception in thread Thread-2 (generate):
Traceback (most recent call last):
  File "/root/miniconda3/envs/qwen2/lib/python3.10/threading.py", line 1009, in _bootstrap_inner
    self.run()
  File "/root/miniconda3/envs/qwen2/lib/python3.10/threading.py", line 946, in run
    self._target(*self._args, **self._kwargs)
  File "/root/miniconda3/envs/qwen2/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/root/miniconda3/envs/qwen2/lib/python3.10/site-packages/transformers/generation/utils.py", line 2024, in generate
    result = self._sample(
  File "/root/miniconda3/envs/qwen2/lib/python3.10/site-packages/transformers/generation/utils.py", line 2982, in _sample
    outputs = self(**model_inputs, return_dict=True)
  File "/root/miniconda3/envs/qwen2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/qwen2/lib/python3.10/site-packages/transformers/models/qwen2/modeling_qwen2.py", line 1104, in forward
    outputs = self.model(
  File "/root/miniconda3/envs/qwen2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/qwen2/lib/python3.10/site-packages/transformers/models/qwen2/modeling_qwen2.py", line 888, in forward
    causal_mask = self._update_causal_mask(
  File "/root/miniconda3/envs/qwen2/lib/python3.10/site-packages/transformers/models/qwen2/modeling_qwen2.py", line 1000, in _update_causal_mask
    causal_mask = _prepare_4d_causal_attention_mask_with_cache_position(
  File "/root/miniconda3/envs/qwen2/lib/python3.10/site-packages/transformers/models/qwen2/modeling_qwen2.py", line 101, in _prepare_4d_causal_attention_mask_with_cache_position
    causal_mask = torch.triu(causal_mask, diagonal=1)
RuntimeError: "triu_tril_cuda_template" not implemented for 'BFloat16'
Traceback (most recent call last):
  File "/root/autodl-tmp/Qwen2/examples/demo/cli_demo.py", line 266, in <module>
    main()
  File "/root/autodl-tmp/Qwen2/examples/demo/cli_demo.py", line 252, in main
    for new_text in _chat_stream(model, tokenizer, query, history):
  File "/root/autodl-tmp/Qwen2/examples/demo/cli_demo.py", line 149, in _chat_stream
    for new_text in streamer:
  File "/root/miniconda3/envs/qwen2/lib/python3.10/site-packages/transformers/generation/streamers.py", line 223, in __next__
    value = self.text_queue.get(timeout=self.timeout)
  File "/root/miniconda3/envs/qwen2/lib/python3.10/queue.py", line 179, in get
    raise Empty
_queue.Empty

Fix: the "triu_tril_cuda_template" error is a PyTorch 2.0.x limitation: torch.triu has no CUDA kernel for bfloat16, and Qwen2's causal-mask construction calls it. Upgrading PyTorch resolves it (the attention-mask message above is a separate, non-fatal transformers warning):

pip install torch==2.1.1

Reference: https://github.com/meta-llama/llama3/issues/80
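
A minimal repro of the underlying bug, assuming a CUDA device; torch.triu is the exact call modeling_qwen2.py makes when building the causal mask:

import torch

x = torch.zeros(4, 4, dtype=torch.bfloat16, device="cuda")
# On torch 2.0.x this raises:
#   RuntimeError: "triu_tril_cuda_template" not implemented for 'BFloat16'
# On torch >= 2.1 it returns the expected upper-triangular matrix.
print(torch.triu(x, diagonal=1))

After upgrading, running the demo again produces a different error: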

User: 你好

Qwen2-Instruct: The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Exception in thread Thread-2 (generate):
Traceback (most recent call last):
  File "/root/miniconda3/envs/qwen2/lib/python3.10/threading.py", line 1009, in _bootstrap_inner
    self.run()
  File "/root/miniconda3/envs/qwen2/lib/python3.10/threading.py", line 946, in run
    self._target(*self._args, **self._kwargs)
  File "/root/miniconda3/envs/qwen2/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/root/miniconda3/envs/qwen2/lib/python3.10/site-packages/transformers/generation/utils.py", line 2024, in generate
    result = self._sample(
  File "/root/miniconda3/envs/qwen2/lib/python3.10/site-packages/transformers/generation/utils.py", line 2982, in _sample
    outputs = self(**model_inputs, return_dict=True)
  File "/root/miniconda3/envs/qwen2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/envs/qwen2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/qwen2/lib/python3.10/site-packages/transformers/models/qwen2/modeling_qwen2.py", line 1104, in forward
    outputs = self.model(
  File "/root/miniconda3/envs/qwen2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/envs/qwen2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/qwen2/lib/python3.10/site-packages/transformers/models/qwen2/modeling_qwen2.py", line 915, in forward
    layer_outputs = decoder_layer(
  File "/root/miniconda3/envs/qwen2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/envs/qwen2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/qwen2/lib/python3.10/site-packages/transformers/models/qwen2/modeling_qwen2.py", line 655, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/root/miniconda3/envs/qwen2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/envs/qwen2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/qwen2/lib/python3.10/site-packages/transformers/models/qwen2/modeling_qwen2.py", line 580, in forward
    attn_output = torch.nn.functional.scaled_dot_product_attention(
RuntimeError: cutlassF: no kernel found to launch!

This "RuntimeError: cutlassF: no kernel found to launch!" error is also reported when running on Kaggle.

Fixing the error
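
A common workaround, assuming the GPU has no fused scaled_dot_product_attention kernel for the chosen dtype (typical of pre-Ampere cards with bfloat16): load the model in float16 and fall back to the eager attention path. The model path matches the download above; attn_implementation is a standard transformers from_pretrained argument.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "/root/autodl-tmp/Qwen2-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,      # avoid bf16 on cards without bf16 kernels
    attn_implementation="eager",    # bypass the fused cutlass/flash SDPA kernels
    device_map="auto",
)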

