DeepSpeed-Chat deployment notes on a V100 server

jxy210

Tags: server, python, conda, chatgpt

Article link: https://blog.csdn.net/qq_52807918/article/details/138802066

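References

DeepSpeed-Chat full-pipeline training in practice (DeepSpeed-Chat全流程训练实战):
https://zhuanlan.zhihu.com/p/643643325
Hands-on introduction to large-model training (大模型训练入门实战):
https://techdiylife.github.io/big-model-training/deepspeed/deepspeed-chat.html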

Environment setup

conda create -n dsp python=3.10
conda activate dsp
# Download the matching version of DeepSpeedExamples
git clone https://github.com/waitxian/DeepSpeedExamples.git
cd DeepSpeedExamples/applications/DeepSpeed-Chat/
# Install the CUDA build of torch first; it is fine once the CUDA check (e.g. torch.cuda.is_available()) returns True
pip install torch==2.0.0 torchvision==0.15.1 torchaudio==2.0.1
pip install -r requirements.txt
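
The "returns True" check mentioned above can be scripted. The following is a minimal sketch, assuming the check meant is `torch.cuda.is_available()`; the helper name `cuda_status` is made up for illustration and is not part of DeepSpeed-Chat.

```python
import importlib.util


def cuda_status():
    """Report whether torch is installed and whether it can see a CUDA device."""
    # Avoid a hard failure when torch has not been installed yet.
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False, "cuda_available": False, "device_count": 0}

    import torch

    available = torch.cuda.is_available()
    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        "cuda_available": available,
        # Only query the device count when CUDA is usable.
        "device_count": torch.cuda.device_count() if available else 0,
    }


if __name__ == "__main__":
    print(cuda_status())
```

If `cuda_available` comes back `False` on a GPU machine, the installed torch wheel is probably a CPU-only build or does not match the driver.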

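Code changes

1. Load a local model
Download the model from a mirror site and change the path to its local location (if loading fails, the most likely cause is that the model was not downloaded completely).

# Check the size of the model files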
ls -lh
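
Since an incomplete download is the usual cause of a failed load, the `ls -lh` check can also be done programmatically. This is only a sketch: the file names in `EXPECTED_FILES` are an assumption about what an OPT checkpoint contains and should be adjusted to whatever the mirror actually lists, and `check_model_dir` is a hypothetical helper.

```python
from pathlib import Path

# Assumed file names for the checkpoint; adjust to what the mirror lists.
EXPECTED_FILES = ("config.json", "pytorch_model.bin")


def check_model_dir(model_dir, expected=EXPECTED_FILES):
    """Return (sizes, problems): file sizes in bytes, plus missing or empty files."""
    root = Path(model_dir)
    sizes = {p.name: p.stat().st_size for p in root.iterdir() if p.is_file()}
    # A file that is absent or has zero bytes counts as a problem.
    problems = [name for name in expected if sizes.get(name, 0) == 0]
    return sizes, problems
```

An empty `problems` list only shows that the files exist and are non-empty; comparing the sizes against the ones shown on the mirror page is still worthwhile.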

main.py


    # Original call, which reads the path from the command-line argument:
    # tokenizer = AutoTokenizer.from_pretrained(args.model_name_or_path,
    #                                           fast_tokenizer=True)
    # Replaced with the local model directory:
    tokenizer = AutoTokenizer.from_pretrained("/tmp/DeepSpeedExamples20230415/applications/DeepSpeed-Chat/model/opt-1.3b", fast_tokenizer=True)

    # Original call:
    # model = create_hf_model(AutoModelForCausalLM, args.model_name_or_path,
    #                         tokenizer, ds_config)
    # Replaced with the local model directory:
    model = create_hf_model(AutoModelForCausalLM, "/tmp/DeepSpeedExamples20230415/applications/DeepSpeed-Chat/model/opt-1.3b",
                            tokenizer, ds_config)

run_1.3b.sh


deepspeed --num_gpus 1 main.py --model_name_or_path  "/tmp/DeepSpeedExamples20230415/applications/DeepSpeed-Chat/model/opt-1.3b" \
   --gradient_accumulation_steps 2 --lora_dim 128 --zero_stage $ZERO_STAGE \
   --deepspeed --output_dir $OUTPUT &> $OUTPUT/training.log

2. Load local training data

Only the load path needs to change; the dataset names can stay as they are.

class DahoasRmstaticDataset(PromptRawDataset):

    def __init__(self, output_path, seed, local_rank):
        super().__init__(output_path, seed, local_rank)
        self.dataset_name = "Dahoas/rm-static"
        self.dataset_name_clean = "Dahoas_rm_static"
        self.raw_datasets = load_dataset("/tmp/DeepSpeedExamples20230415/applications/DeepSpeed-Chat/dataset")
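
Instead of hard-coding the local path, one option is to fall back to the hub name when no local copy exists. The sketch below is an illustration only; `resolve_dataset_source` is a hypothetical helper, not something DeepSpeed-Chat provides, and its result would be passed to the `load_dataset(...)` call shown above.

```python
import os


def resolve_dataset_source(local_dir, hub_name="Dahoas/rm-static"):
    """Prefer a non-empty local directory; otherwise fall back to the hub name."""
    if os.path.isdir(local_dir) and os.listdir(local_dir):
        return local_dir
    return hub_name
```

With this, the same class works both on an offline server (local directory present) and on a machine that can reach Hugging Face.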

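If the server can reach Hugging Face, the code can load the data directly.
Model download: the code downloads the required model automatically. By default the model is stored in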

~/.cache/huggingface/hub/models--facebook--opt-1.3b
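
The folder name above follows the hub cache naming pattern `models--<org>--<name>`. A small sketch that derives it from a repo id, assuming the default cache root and ignoring any environment-variable overrides (`model_cache_dir` is a hypothetical helper):

```python
from pathlib import Path


def model_cache_dir(repo_id, cache_root="~/.cache/huggingface/hub"):
    """Build the expected cache folder for a model repo id such as 'facebook/opt-1.3b'."""
    # The cache folder is 'models--' plus the repo id with '/' replaced by '--'.
    folder = "models--" + repo_id.replace("/", "--")
    return Path(cache_root).expanduser() / folder


if __name__ == "__main__":
    print(model_cache_dir("facebook/opt-1.3b"))
```

Checking whether this directory exists is a quick way to tell if the automatic download has already happened.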

Data download: this training step uses the following datasets

Dahoas/rm-static                        # dialogue (prompt, response, chosen, rejected)
Dahoas/full-hh-rlhf                     # dialogue (prompt, response, chosen, rejected)
Dahoas/synthetic-instruct-gptj-pairwise # dialogue (prompt, chosen, rejected)
yitingxie/rlhf-reward-datasets          # dialogue (prompt, chosen, rejected)
openai/webgpt_comparisons               # human-scored data (comparisons with human feedback, 19,578 comparisons)
stanfordnlp/SHP                         # 385k human-labeled examples across 18 domains

Training

python3 train.py --step 1 --deployment-type single_gpu  # single-GPU training
python3 train.py --step 1 --deployment-type single_node # multi-GPU training on one node
python3 train.py --step 1 --deployment-type multi_node  # multi-node training

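When I first ran single-node multi-GPU training, I could not get all three cards to be used at the same time. The GPUs have to be selected with --include localhost:0,1,2; export CUDA_VISIBLE_DEVICES=0,1,2 cannot be used for this. Even so, it still did not work. What finally fixed it was increasing the shared memory size through the shm-size parameter, which resolves some of the problems that appear in multi-GPU training. The shared memory size therefore has to be specified when the container is created.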
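
Before launching a multi-GPU run, the shared memory available inside the container can be checked. This is a minimal sketch, assuming a Linux container where shared memory is mounted at `/dev/shm`; `shared_memory_gib` is a hypothetical helper. With Docker, the size is set at creation time via `docker run --shm-size=...` (that the container here is a Docker container is an assumption).

```python
import os
import shutil


def shared_memory_gib(path="/dev/shm"):
    """Return the total size of the shared-memory mount in GiB, or None if it is absent."""
    if not os.path.isdir(path):
        return None
    # disk_usage reports bytes; convert to GiB.
    return shutil.disk_usage(path).total / 2**30


if __name__ == "__main__":
    size = shared_memory_gib()
    print("no /dev/shm found" if size is None else f"/dev/shm: {size:.2f} GiB")
```

A very small value here is a hint that the container was created without an explicit shared memory size.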

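In a test run of the 1.3b model on a single node with multiple GPUs, configured with batch=8 and zero2 (third stage), memory on all three cards was nearly full and one epoch took about 12 min.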

4. Evaluation and testing
Open run_prompt.sh and add the baseline model and the fine-tuned model:

#export CUDA_VISIBLE_DEVICES=0
python prompt_eval.py \
    --model_name_or_path_baseline "/tmp/DeepSpeedExamples20230415/applications/DeepSpeed-Chat/model/opt-1.3b" \
    --model_name_or_path_finetune "/tmp/DeepSpeedExamples20230415/applications/DeepSpeed-Chat/output/actor-models/1.3b"

The evaluation script calls prompt_eval.py to print the outputs of the baseline model and the fine-tuned model respectively.

To run it, switch to the step1_supervised_finetuning directory first.

cd training/step1_supervised_finetuning
bash evaluation_scripts/run_prompt.sh
