Using FastAPI (Deploying an LLM)

Installation

To install FastAPI you need both the package itself and an ASGI server:

pip install fastapi
pip install "uvicorn[standard]"

Like WSGI, ASGI (Asynchronous Server Gateway Interface) describes a common interface between Python web applications and web servers. Unlike WSGI, ASGI allows each application to handle multiple asynchronous events.
Common ASGI servers include uvicorn, hypercorn, and daphne; gunicorn itself is a WSGI server, though it is often used as a process manager running uvicorn workers.
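
To make the interface concrete: a bare ASGI application is just an async callable with the three-argument signature below, and a FastAPI app implements this same interface under the hood. A minimal sketch (the file name asgi_demo.py is my assumption, not from the text):

# asgi_demo.py — run with: python -m uvicorn asgi_demo:app
async def app(scope, receive, send):
    assert scope["type"] == "http"  # handle plain HTTP requests only
    await send({"type": "http.response.start", "status": 200,
                "headers": [(b"content-type", b"text/plain")]})
    await send({"type": "http.response.body", "body": b"Hello, ASGI!"})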

Basic usage

Reference
Visit in a browser: http://127.0.0.1:8000
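
A minimal sketch of such an app (the file name main.py and the example route are my assumptions, not taken from the reference):

from fastapi import FastAPI

app = FastAPI()

@app.get("/")
def read_root():
    # returns a JSON body at the root path
    return {"Hello": "World"}

Run python -m uvicorn main:app --reload, then open the URL above.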

LLM deployment

Reference

Run it from the command line: python -m uvicorn main:app --reload
Or start it from inside the Python file: uvicorn.run(app, host='127.0.0.1', port=8000, workers=1)
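
One caveat: in recent uvicorn versions, --reload and workers > 1 only take effect when the application is passed as an import string ("module:attribute") rather than as an object, so a multi-worker launch from Python looks like this (a sketch assuming the file is named main.py):

import uvicorn

if __name__ == '__main__':
    # pass the import string 'main:app'; passing the app object here
    # makes uvicorn refuse the workers/reload options
    uvicorn.run('main:app', host='127.0.0.1', port=8000, workers=2)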

Test with curl (the endpoint below expects a "query" field): curl -X POST "http://127.0.0.1:8000/ask_post" -H "Content-Type:application/json" -d '{"query":"test"}'

curl -X POST "http://127.0.0.1:8000/ask_post" \
    -H 'Content-Type: application/json' \
    -d '{"query": "早上好,你早饭吃的什么?", "params":{"temperature":0.7, "num_return_sequences":5}}'

Full code

import os  # note: import os and set these env vars before importing transformers/torch
os.environ["WANDB_DISABLED"] = "true"  # disable wandb
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
from transformers import AutoTokenizer, AutoModelForCausalLM
from fastapi import FastAPI, Request
import argparse
import uvicorn
import json

MODEL_NAME="Qwen2-7B-Instruct"

app = FastAPI()

def load_model_and_tokenizer(model_name):
    path = f'/data1/shares/{model_name}'
    model = AutoModelForCausalLM.from_pretrained(path)
    tokenizer = AutoTokenizer.from_pretrained(path)
    # model = model.to(f'cuda:{device}')
    model = model.cuda()
    print('Loaded model onto CUDA successfully...')
    return model, tokenizer

def generate(model, tokenizer, query_text, params):
    default_params = {'temperature': 0.7,
                      'do_sample': True,  # sampling must be on: greedy decoding errors out when num_return_sequences > 1
                      'num_return_sequences': 2,
                      'max_new_tokens': 256}
    if params is None:
        params = default_params
    else:
        for k, v in default_params.items():
            if k not in params:
                params[k] = v

    inputs = query_text
    inputs_num = len(inputs)  # character length of the prompt, used later to strip the echoed prompt
    inputs = tokenizer(inputs, return_tensors="pt")
    inputs = {k:v.cuda() for k, v in inputs.items()}
    outputs = model.generate(**inputs, **params,
                            eos_token_id=tokenizer.eos_token_id, pad_token_id=tokenizer.pad_token_id)
    generate_strs = tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True)
    generate_strs = [s[inputs_num:] for s in generate_strs]  # drop the echoed prompt from each output
    return generate_strs, generate_strs[0]


@app.get("/model_name")
def read_root():
    return {"MODEL_NAME": MODEL_NAME}

#http://127.0.0.1:8000/ask?query=query
@app.get("/ask")
def ask(query):
    global model, tokenizer
    return {"query": query, "embed_query": generate(model, tokenizer, query, None)}

@app.post("/ask_post")
async def ask_post(request: Request):
    global model, tokenizer  # use the globally loaded model and tokenizer inside this handler
    final_json = await request.json()  # parse the JSON body of the POST request into a dict
    print(f'>>> final_json:\n{final_json}')

    params = None
    if "params" in final_json:
        params = final_json['params']
    
    query = final_json['query']

    data = generate(model, tokenizer, query, params)
    return {"data": data, "query": query}


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument("--model", type=str, default="Qwen2-7B-Instruct", 
                        choices=['bge-large-zh-v1.5', 'Qwen2-7B-Instruct'])
    args = parser.parse_args()
    MODEL_NAME = args.model

    model, tokenizer = load_model_and_tokenizer(MODEL_NAME)
    model.eval()
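    # NB: the model loads in this __main__ block, so start the server with
    # `python main.py`; launching via `python -m uvicorn main:app` skips this block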
    uvicorn.run(app, host='127.0.0.1', port=8000, workers=1)
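
For comparison, the same POST endpoint can be written with a typed pydantic model instead of parsing the raw Request by hand; FastAPI then validates the body automatically. A minimal sketch, reusing app, generate, model, and tokenizer from the script above (the route name /ask_post_typed is my assumption):

from typing import Optional
from pydantic import BaseModel

class AskRequest(BaseModel):
    query: str
    params: Optional[dict] = None  # optional generation params, e.g. {"temperature": 0.7}

@app.post("/ask_post_typed")
def ask_post_typed(req: AskRequest):
    # FastAPI parses and validates the JSON body into AskRequest; malformed input gets a 422
    data = generate(model, tokenizer, req.query, req.params)
    return {"data": data, "query": req.query}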

Accessing it with requests

import json
import requests

HEADERS = {"Content-Type": "application/json"}  # the original referenced self.headers; a module-level dict works for a standalone function

def fun(prompt, params):
    data = {"query": prompt, "params": params or {}}
    data["params"]["num_return_sequences"] = 1  # only need one sequence back
    json_str_data = json.dumps(data, ensure_ascii=False)
    # encode explicitly so non-ASCII prompts are sent as UTF-8
    response = requests.post(url='http://127.0.0.1:8000/ask_post',
                             headers=HEADERS, data=json_str_data.encode('utf-8'))
    return response.json()["data"][0]  # data[0] is the list of generated strings (data[1] is the first string)
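
Called, for example, like this (the prompt and params are just examples):

print(fun('早上好,你早饭吃的什么?', {'temperature': 0.7}))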

Common issues

  • At runtime I got an error that uvicorn wasn't installed even though it was, plus 'uvicorn' is a package and cannot be directly executed. The cause: the python found on PATH didn't match the python that pip installed into. Either fix the PATH (e.g. export PATH=$PATH:/usr/local/python3/bin), or run it as python -m uvicorn, which is safer because it avoids picking up a uvicorn executable from some other Python's bin directory. [Most importantly: the package is uvicorn, not unicorn; I had installed the wrong one...]