Deploying the codellama-13b-instruct model on multiple GPUs with FastAPI

2022.06.11来北漂

Last modified: 2023-09-06 14:20:42

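Tags: flask python llama pytorch language model

First published: 2023-09-06 10:15:03

Copyright notice: This is an original article by the author, licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.

Article link: https://blog.csdn.net/weixin_57683403/article/details/132709297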


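# When the 13B model is deployed across multiple GPUs, the two GPUs are driven by two separate processes (rank=0 and rank=1). Serving the model directly in this setup makes the program die, so it cannot be deployed as-is.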

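from typing import List, Optional

import torch.distributed as dist
import uvicorn
from fastapi import FastAPI
from pydantic import BaseModel

# `generator` is assumed to be built earlier in the script (for example with
# Llama.build(...) as in the Code Llama example scripts), which also initializes
# torch.distributed so that dist.get_rank() works here.
app = FastAPI()


class Config(BaseModel):
    # Default prompt: a function signature for the model to complete.
    prompts: List[str] = ["import socket\n\ndef ping_exponential_backoff(host: str):"]
    max_gen_len: Optional[int] = None
    temperature: float = 0.2
    top_p: float = 0.90


if dist.get_rank() == 0:
    # Only rank 0 registers the endpoint and runs the web server.
    @app.post("/llama/")
    def generate(config: Config):
        # Only the first prompt of each request is used.
        prompts = [config.prompts[0]]
        print(prompts)
        # Send the exact same arguments to the other ranks so that every
        # process calls text_completion with identical inputs.
        dist.broadcast_object_list(
            [prompts, config.max_gen_len, config.temperature, config.top_p], src=0
        )
        results = generator.text_completion(
            prompts,  # type: ignore
            max_gen_len=config.max_gen_len,
            temperature=config.temperature,
            top_p=config.top_p,
        )
        print(results)
        return {"responses": results}

    uvicorn.run(app, host="127.0.0.1", port=5000)
else:
    # The other ranks block until rank 0 broadcasts a request, then run the
    # same generation so the model-parallel computation stays in step.
    while True:
        config = [None] * 4
        dist.broadcast_object_list(config, src=0)
        try:
            generator.text_completion(
                config[0], max_gen_len=config[1], temperature=config[2], top_p=config[3]
            )
        except Exception as exc:
            # Report the failure instead of silently swallowing it.
            print(f"rank {dist.get_rank()} generation failed: {exc}")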

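Deploying by checking the rank works: only rank 0 serves HTTP requests, while the other ranks wait for the broadcast parameters, so the error no longer occurs ^_^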
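To show how the endpoint might be called from outside, here is a minimal client sketch using only the standard library. It assumes the server above is running locally on port 5000 and that the JSON body mirrors the `Config` fields; the helper names `build_request` and `complete` are hypothetical and not part of the original post.

```python
import json
import urllib.request

# Assumption: the server from this post is listening on 127.0.0.1:5000.
LLAMA_URL = "http://127.0.0.1:5000/llama/"


def build_request(prompt, max_gen_len=None, temperature=0.2, top_p=0.9, url=LLAMA_URL):
    """Build a POST request whose JSON body matches the Config fields."""
    payload = {
        "prompts": [prompt],
        "max_gen_len": max_gen_len,
        "temperature": temperature,
        "top_p": top_p,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the prompt to the server and return the "responses" field."""
    with urllib.request.urlopen(build_request(prompt, **kwargs)) as resp:
        return json.loads(resp.read().decode("utf-8"))["responses"]
```

With the server running, `complete("def fib(n):", max_gen_len=64)` would return whatever the endpoint places under `"responses"`.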


Latest comments

2022.06.11来北漂: I haven't tried that.
惊鸿旧影: Have you tried simulating multiple users calling curl at the same time? As soon as a second curl request arrives before the previous one has finished its output, I get an error.
2022.06.11来北漂: CUDA_VISIBLE_DEVICES=0,1 torchrun --nproc_per_node 2 web_server.py --ckpt_dir ./CodeLlama-13b-Instruct --tokenizer_path ./CodeLlama-13b-Instruct/tokenizer.model This is the command I deploy with; FastAPI is mainly used to expose the API.
惊鸿旧影: May I ask whether you use torchrun for the multi-GPU deployment? And what does the calling logic from outside look like?
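The concurrent-request error raised in the comments was not investigated by the author. One plausible cause is that FastAPI runs plain `def` endpoints in a thread pool, so two requests could interleave their broadcast and generation steps on rank 0. A possible mitigation, sketched below and untested against the real model, is to guard that step with a `threading.Lock`. Here `run_generation` is a hypothetical stand-in for the `dist.broadcast_object_list(...)` and `generator.text_completion(...)` calls, and `stats` exists only to make the serialization observable.

```python
import threading
import time

# Hypothetical lock guarding the broadcast + generation step on rank 0.
generation_lock = threading.Lock()

# Bookkeeping used only to observe how many requests run at once.
stats = {"active": 0, "max_active": 0}


def run_generation(prompt):
    """Stand-in for broadcasting the parameters and calling text_completion."""
    stats["active"] += 1
    stats["max_active"] = max(stats["max_active"], stats["active"])
    time.sleep(0.01)  # simulate model work
    stats["active"] -= 1
    return f"completion for {prompt!r}"


def handle_request(prompt):
    """What the endpoint body would do: one request at a time enters."""
    with generation_lock:
        return run_generation(prompt)


def simulate_clients(n=4):
    """Fire n overlapping requests from separate threads and wait for them."""
    threads = [
        threading.Thread(target=handle_request, args=(f"prompt {i}",))
        for i in range(n)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

Requests then queue behind each other rather than running in parallel, which trades throughput for keeping all ranks in step.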