Getting Llama3-8B Running Quickly on the MLU370-M8

小军军军军军军

Article link: https://blog.csdn.net/xiaojunjun200211/article/details/137955586

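This article shows how to set up the transformers and accelerate libraries in a specific platform environment, download the Meta-Llama-3-8B-Instruct model, and run it on an MLU, and it shows how the model is used and what it produces.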

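Note: half-jokingly, downloading the model took 4 minutes 30 seconds, and getting it to run took only 30 seconds. Simple and handy, isn't it?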

I. Platform Environment

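Tutorials currently exist for three PyTorch versions: 1.9, 1.13.1, and 2.1. If you are on a different version, you can still follow this one and adapt the steps as needed.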

Image used: pytorch:v1.17_torch1.13.1_ubuntu20.04_py310

II. Environment Setup

1. transformers

git clone -b v4.38.2 https://githubfast.com/huggingface/transformers.git
python /torch/src/catch/tools/torch_gpu2mlu/torch_gpu2mlu.py -i transformers/
pip install -e ./transformers_mlu/

2. accelerate

git clone -b v0.27.2 https://githubfast.com/huggingface/accelerate.git
python /torch/src/catch/tools/torch_gpu2mlu/torch_gpu2mlu.py -i accelerate/
pip install -e ./accelerate_mlu/
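
After both editable installs, it can be useful to confirm that the packages are visible to the interpreter before loading an 8B model. The sketch below is one way to do that, not part of the original steps. The helper name `find_missing` is hypothetical, and the module list is an assumption based on the packages named above plus `torch_mlu`, which appears in the runtime log paths later in this article.

```python
import importlib.util

# Assumed module names: the two libraries installed above, plus torch and
# torch_mlu (the Cambricon backend that shows up in the runtime log paths).
REQUIRED = ("torch", "torch_mlu", "transformers", "accelerate")


def find_missing(modules=REQUIRED):
    """Return the names in `modules` that cannot be located for import."""
    missing = []
    for name in modules:
        # find_spec locates a top-level module without actually importing it,
        # so this check is cheap and does not initialize any device.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = find_missing()
    print("missing modules:", ", ".join(missing) if missing else "none")
```

An empty result only means the modules can be found; it does not prove that the converted code runs correctly on the MLU.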

III. Model Download

git clone https://www.modelscope.cn/LLM-Research/Meta-Llama-3-8B-Instruct.git
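
Large weight files in repositories like this one are normally stored with Git LFS, so a clone made without `git-lfs` installed can leave small pointer stubs in place of the real weights. The sketch below checks for that. The helper name `find_lfs_pointers` is hypothetical, and the `*.safetensors` pattern is an assumption about how the weights are packaged; adjust it if your clone uses a different extension.

```python
from pathlib import Path

# Every Git LFS pointer file begins with this line.
LFS_HEADER = b"version https://git-lfs.github.com/spec/v1"


def find_lfs_pointers(model_dir, pattern="*.safetensors"):
    """List weight files in model_dir that are still Git LFS pointer stubs."""
    stubs = []
    for path in sorted(Path(model_dir).glob(pattern)):
        # Only the first few bytes are read, so multi-GB shards stay cheap.
        with open(path, "rb") as f:
            head = f.read(len(LFS_HEADER))
        if head == LFS_HEADER:
            stubs.append(path.name)
    return stubs


if __name__ == "__main__":
    # Assumes the clone above was made in the current directory.
    print(find_lfs_pointers("Meta-Llama-3-8B-Instruct"))
```

If the list is non-empty, installing `git-lfs` and running `git lfs pull` inside the cloned directory should fetch the actual weights.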

IV. Run Directly

import transformers
import torch
# Change this to your own model path
model_id = "/workspace/volume/gpt/zhouguojun/llama3/Meta-Llama-3-8B-Instruct"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.float16},
    device="mlu",
)

messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]

prompt = pipeline.tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

terminators = [
    pipeline.tokenizer.eos_token_id,
    pipeline.tokenizer.convert_tokens_to_ids("<|eot_id|>")
]

outputs = pipeline(
    prompt,
    max_new_tokens=256,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)
print(outputs[0]["generated_text"][len(prompt):])

Then simply run the script.
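
To make the `apply_chat_template` and `[len(prompt):]` steps in the script more concrete, here is a pure-Python sketch of the prompt string the Llama 3 Instruct format produces for the messages above. It is an approximation written by hand; the chat template shipped with the tokenizer remains authoritative. The function names `format_llama3_prompt` and `strip_prompt` are hypothetical.

```python
def format_llama3_prompt(messages):
    """Approximate the Llama 3 Instruct chat format for role/content dicts."""
    parts = ["<|begin_of_text|>"]
    for m in messages:
        # Each turn is a role header, a blank line, the text, then <|eot_id|>.
        parts.append(
            f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n"
            f"{m['content'].strip()}<|eot_id|>"
        )
    # Counterpart of add_generation_prompt=True: open the assistant turn so
    # the model continues with its reply.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


def strip_prompt(generated_text, prompt):
    """Drop the echoed prompt, as the script does with [len(prompt):]."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):]
    return generated_text


if __name__ == "__main__":
    messages = [
        {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
        {"role": "user", "content": "Who are you?"},
    ]
    print(format_llama3_prompt(messages))
```

This also shows why the script passes `<|eot_id|>` as an extra terminator: in this format each turn ends with `<|eot_id|>`, so generation should stop when the model emits it.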

V. Results

Instruction: You are a pirate chatbot who always responds in pirate speak!
Question: Who are you?
Model answer: Arrrr, me hearty! Me name be Captain Chatbot, the scurviest pirate to ever sail the Seven Seas! Me be a chatbot, but don't ye worry, I be as cunning as a barnacle on a sunken ship! Me purpose be to swab the decks of yer queries and respond with answers as sharp as me trusty cutlass! So hoist the colors, me matey, and let's set sail fer a swashbucklin' good time!

Loading checkpoint shards: 100%|██████████| 4/4 [00:54<00:00, 13.71s/it]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
/workspace/volume/gpt/zhouguojun/3rd/transformers4.38.2/transformers_mlu/src/transformers/pipelines/base.py:1015: UserWarning:  MLU operators don't support 64-bit calculation. so the 64 bit data will be forcibly converted to 32-bit for calculation.  (Triggered internally at /torch/catch/torch_mlu/csrc/aten/utils/tensor_util.cpp:159.)
  return inputs.to(device)
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
[2024-4-19 10:27:6] [CNNL] [Warning]:[cnnlStridedSlice] is deprecated and will be removed in the future release, please use [cnnlStridedSlice_v2] instead.
[2024-4-19 10:27:6] [CNNL] [Warning]:[cnnlRandCreateGenerator_v2] will be deprecated.
Arrrr, shiver me timbers! Me name be Captain Chatbot, the scurviest pirate to ever sail the Seven Seas! Me be a swashbucklin' chatbot, ready to engage ye in a battle o' wits and words! So hoist the colors, me hearty, and let's set sail fer a treasure trove o' conversation!

+------------------------------------------------------------------------------+
| CNMON v5.10.22                                               Driver v5.10.22 |
+-------------------------------+----------------------+-----------------------+
| Card  VF  Name       Firmware |               Bus-Id | Util        Ecc-Error |
| Fan   Temp      Pwr:Usage/Cap |         Memory-Usage | Mode     Compute-Mode |
|===============================+======================+=======================|
| 0     /   MLU370-M8    v1.1.4 |         0000:49:00.0 | 77%                 0 |
|  0%   23C         79 W/ 300 W | 17524 MiB/ 42396 MiB | FULL          Default |
+-------------------------------+----------------------+-----------------------+

+------------------------------------------------------------------------------+
| Processes:                                                                   |
|  Card  MI  PID     Command Line                             MLU Memory Usage |
|==============================================================================|
|  0     /   2798    python                                          17123 MiB |
+------------------------------------------------------------------------------+
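
As a rough sanity check on the memory figure in the process table, the float16 weights alone can be estimated from the parameter count. The sketch below does that arithmetic. The figure of about 8.03 billion parameters is an assumption not stated in this article, and `fp16_weight_mib` is a hypothetical helper name.

```python
MIB = 1024 ** 2


def fp16_weight_mib(num_params, bytes_per_param=2):
    """Estimate weight memory in MiB; float16 uses 2 bytes per parameter."""
    return num_params * bytes_per_param / MIB


# Assumption: Llama 3 8B has roughly 8.03e9 parameters.
approx_params = 8.03e9
# Taken from the cnmon process table above.
observed_mib = 17123

weights_mib = fp16_weight_mib(approx_params)
overhead_mib = observed_mib - weights_mib

print(f"estimated fp16 weights: {weights_mib:.0f} MiB")
print(f"remaining usage:        {overhead_mib:.0f} MiB")
```

Under this assumption most of the reported usage is the weights themselves. The remainder is plausibly activations, the KV cache, and runtime allocations, although the output shown here does not break that down.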