A Unified Wrapper Interface Solution for Large Language Models (LLMs)

Jack_software

Last modified 2024-06-19 18:25:34

First published 2024-03-18 16:56:38

Article link: https://blog.csdn.net/jack_software/article/details/136814353

Goal: build a generic chat completion interface for a Java framework, so that different large models can all be called in one unified way.

Constraint: only the chat (text) interface is wrapped for now; other interfaces are left for later planning.

OpenAI chat completion interface analysis

==> Official documentation

Default request with curl

curl http://chat.xxxxxx.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Hello!"
      }
    ]
  }'

Response

{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "gpt-3.5-turbo-0125",
  "system_fingerprint": "fp_44709d6fcb",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "\n\nHello there, how may I assist you today?"
    },
    "logprobs": null,
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 9,
    "completion_tokens": 12,
    "total_tokens": 21
  }
}
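
As a rough illustration of how a wrapper might consume this response, the sketch below pulls the assistant text, the finish reason, and the token usage out of a body shaped like the one above. The names `ChatResult` and `parse_chat_completion` are hypothetical and introduced only for this example; they are not part of any SDK.

```python
import json
from dataclasses import dataclass


@dataclass
class ChatResult:
    """Hypothetical unified result type for a non-streaming call."""
    content: str
    finish_reason: str
    total_tokens: int


def parse_chat_completion(raw: str) -> ChatResult:
    """Extract the first choice and the usage block from a chat.completion body."""
    body = json.loads(raw)
    choice = body["choices"][0]
    return ChatResult(
        content=choice["message"]["content"],
        finish_reason=choice["finish_reason"],
        total_tokens=body["usage"]["total_tokens"],
    )


# A trimmed-down copy of the response above, used as sample input.
sample = json.dumps({
    "object": "chat.completion",
    "choices": [{
        "index": 0,
        "message": {"role": "assistant", "content": "Hello there!"},
        "finish_reason": "stop",
    }],
    "usage": {"prompt_tokens": 9, "completion_tokens": 12, "total_tokens": 21},
})
result = parse_chat_completion(sample)
print(result.content, result.total_tokens)
```

Only the first entry of choices is read here, which matches the default of n = 1.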

Streaming request with Python

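from openai import OpenAI

# With the default endpoint this would simply be: client = OpenAI()
# The API key is read from the OPENAI_API_KEY environment variable.
client = OpenAI(
    # The base URL includes the /v1 prefix so that requests reach the same
    # /v1/chat/completions path as the curl example.
    base_url="http://chat.xxxxxx.com/v1"
)
completion = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
    stream=True,
)

# Each chunk carries an incremental delta rather than a complete message.
for chunk in completion:
    print(chunk.choices[0].delta)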

Response (a sequence of chunks)

{
  "id": "chatcmpl-123",
  "object": "chat.completion.chunk",
  "created": 1694268190,
  "model": "gpt-3.5-turbo-0125",
  "system_fingerprint": "fp_44709d6fcb",
  "choices": [
    {
      "index": 0,
      "delta": {
        "role": "assistant",
        "content": ""
      },
      "logprobs": null,
      "finish_reason": null
    }
  ]
}

{
  "id": "chatcmpl-123",
  "object": "chat.completion.chunk",
  "created": 1694268190,
  "model": "gpt-3.5-turbo-0125",
  "system_fingerprint": "fp_44709d6fcb",
  "choices": [
    {
      "index": 0,
      "delta": {
        "content": "Hello"
      },
      "logprobs": null,
      "finish_reason": null
    }
  ]
}

....

{
  "id": "chatcmpl-123",
  "object": "chat.completion.chunk",
  "created": 1694268190,
  "model": "gpt-3.5-turbo-0125",
  "system_fingerprint": "fp_44709d6fcb",
  "choices": [
    {
      "index": 0,
      "delta": {},
      "logprobs": null,
      "finish_reason": "stop"
    }
  ]
}
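
To get back something shaped like a regular message, the chunks above have to be folded together. The following is a minimal sketch of that step, assuming the chunks have already been decoded into dictionaries; `merge_chunks` is a hypothetical helper name, not an SDK function.

```python
from typing import Iterable


def merge_chunks(chunks: Iterable[dict]) -> dict:
    """Fold chat.completion.chunk objects into one message-like dict."""
    role = None
    parts = []
    finish_reason = None
    for chunk in chunks:
        choice = chunk["choices"][0]
        delta = choice.get("delta", {})
        # The first chunk carries the role; later chunks carry text pieces.
        role = delta.get("role", role)
        if delta.get("content"):
            parts.append(delta["content"])
        # finish_reason stays null until the final chunk.
        finish_reason = choice.get("finish_reason") or finish_reason
    return {"role": role, "content": "".join(parts), "finish_reason": finish_reason}


chunks = [
    {"choices": [{"index": 0, "delta": {"role": "assistant", "content": ""}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {"content": "Hello"}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {"content": " there"}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}]},
]
merged = merge_chunks(chunks)
print(merged)
```

The sketch handles a single choice per chunk; with n greater than 1 the pieces would need to be grouped by index first.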

Request parameter analysis

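model: optional, defaults to gpt-3.5-turbo (the official OpenAI API marks model as required, so this default presumably belongs to the wrapper)
messages: required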

[
  {"role": "system", "content": "You are a helpful assistant."},
  {"role": "user", "content": "Hello!"}
]

stream: optional, defaults to false

If set, partial message deltas will be sent, like in ChatGPT. Tokens will be sent as data-only server-sent events as they become available, with the stream terminated by a data: [DONE] message
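
A client that does not use the SDK has to read those server-sent events itself. The sketch below shows one way this could look, assuming the response body is already available as an iterable of text lines; `iter_sse_chunks` is a hypothetical name, and a real client would also need to handle HTTP transport and errors.

```python
import json
from typing import Iterable, Iterator


def iter_sse_chunks(lines: Iterable[str]) -> Iterator[dict]:
    """Yield decoded JSON payloads from data-only SSE lines until [DONE]."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and anything that is not data
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # the stream terminator described above
        yield json.loads(payload)


raw_lines = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "",
    "data: [DONE]",
    'data: {"choices": [{"delta": {"content": "never read"}}]}',
]
text = "".join(c["choices"][0]["delta"]["content"] for c in iter_sse_chunks(raw_lines))
print(text)
```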

temperature: optional, defaults to 1

What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.

We generally recommend altering this or top_p but not both.

top_p: optional, defaults to 1

An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.

We generally recommend altering this or temperature but not both.
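
To make the top_p description more concrete, here is a toy illustration of nucleus filtering over a made-up four-token distribution. It is only a sketch of the idea in the quoted text, not how any particular model server implements sampling; `nucleus_filter` and the probabilities are invented for the example.

```python
def nucleus_filter(probs: dict, top_p: float) -> dict:
    """Keep the most likely tokens until their cumulative mass reaches top_p,
    then renormalize the survivors so they sum to 1."""
    kept = {}
    cumulative = 0.0
    for token, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[token] = p
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(kept.values())
    return {token: p / total for token, p in kept.items()}


# Made-up next-token probabilities.
probs = {"Hello": 0.5, "Hi": 0.3, "Hey": 0.15, "Yo": 0.05}
narrow = nucleus_filter(probs, 0.1)  # the single most likely token already covers 10%
wide = nucleus_filter(probs, 0.9)    # three tokens are needed to cover 90%
print(narrow)
print(sorted(wide))
```

With top_p = 0.1 only "Hello" remains as a candidate, while top_p = 0.9 keeps "Hello", "Hi", and "Hey" and drops the unlikely tail.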

Other request parameters that may be needed

max_tokens: optional, no default

The maximum number of tokens that can be generated in the chat completion.

n (integer or null): optional, defaults to 1

How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.
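
The parameters analyzed above could be gathered into a single request object on the wrapper side. The sketch below is one possible shape, written in Python for brevity even though the target framework is Java. `ChatRequest` and `to_payload` are hypothetical names; the defaults mirror the list above, and the 0 to 1 range check on top_p is an assumption based on it being a probability mass.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChatRequest:
    """Hypothetical unified request; defaults follow the parameter analysis."""
    messages: list
    model: str = "gpt-3.5-turbo"
    stream: bool = False
    temperature: float = 1.0
    top_p: float = 1.0
    max_tokens: Optional[int] = None
    n: int = 1

    def to_payload(self) -> dict:
        if not self.messages:
            raise ValueError("messages is required")
        if not 0 <= self.temperature <= 2:
            raise ValueError("temperature must be between 0 and 2")
        # Assumption: top_p is a probability mass, so it is limited to [0, 1].
        if not 0 <= self.top_p <= 1:
            raise ValueError("top_p must be between 0 and 1")
        payload = {
            "model": self.model,
            "messages": self.messages,
            "stream": self.stream,
            "temperature": self.temperature,
            "top_p": self.top_p,
            "n": self.n,
        }
        # max_tokens has no default, so it is only sent when explicitly set.
        if self.max_tokens is not None:
            payload["max_tokens"] = self.max_tokens
        return payload


request = ChatRequest(messages=[{"role": "user", "content": "Hello!"}], temperature=0.2)
payload = request.to_payload()
print(payload)
```

Validating in one place keeps provider-specific adapters free of repeated range checks.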

Comparison of output fields (default vs. streaming results)

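Streaming results do not include the usage field, so the token count has to be computed separately; the official API does not provide a method for this.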

Official guidance on counting tokens

Another small drawback of streaming responses is that the response no longer includes the usage field to tell you how many tokens were consumed. After receiving and combining all of the responses, you can calculate this yourself using tiktoken.

How to use the Python library tiktoken

In streaming results, the message field inside choices of a regular response is replaced by a delta field.
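
As a sketch of the "combine all responses, then count" approach, the helper below joins the streamed delta contents and hands the text to an encoder. `count_stream_tokens` is a hypothetical name. The encoder is passed in as a parameter so the example runs without extra packages; a whitespace splitter stands in for a real tokenizer here, and the comment shows how tiktoken would typically be plugged in. Note that this counts only the generated text, not the prompt.

```python
from typing import Callable, Iterable, List


def count_stream_tokens(deltas: Iterable[dict], encode: Callable[[str], List[int]]) -> int:
    """Join the streamed delta contents and count tokens with the given encoder."""
    text = "".join(d.get("content") or "" for d in deltas)
    return len(encode(text))


# With tiktoken installed, a real encoder could be obtained like this:
#   import tiktoken
#   encode = tiktoken.encoding_for_model("gpt-3.5-turbo").encode
# The stand-in below just splits on whitespace; it is NOT a real tokenizer.
def fake_encode(text: str) -> List[int]:
    return list(range(len(text.split())))


deltas = [
    {"role": "assistant", "content": ""},
    {"content": "Hello"},
    {"content": " there"},
    {},  # the final chunk has an empty delta
]
completion_tokens = count_stream_tokens(deltas, fake_encode)
print(completion_tokens)
```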

灵医Bot Chat interface analysis

Endpoint path: /api/01bot/sse-gateway/stream

For server-side calls, the 灵医Bot Chat documentation refers to "Server Sent Events server and client for Golang".

Header parameters

Name	Example	Type	Required	Description
Content-Type	application/json	string	Yes	Content type
X-IHU-Authorization-V2	See the authentication document (《鉴权认证文档》)	string	Yes	Signature string

Non-streaming request example (single turn): parameters

{
    "model": "test-model",
    "stream": false,
    "messages": [
        {
            "version": "api-v2",
            "created": 1683944235,
            "role": "user",
            "content": [{
                "type": "text",
                "body": "患者3天前面部肿胀伴多发红疹，自觉瘙痒，是什么疾病？"
            }]
        }
    ]
}
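
Compared with the OpenAI format, each message here carries version and created fields, and content is a list of typed parts rather than a plain string. An adapter in the unified wrapper might bridge the two roughly as sketched below. `to_lingyi_payload` and `build_headers` are hypothetical names; the field layout is copied from the example above, the signature is treated as an opaque string because its computation is defined in the authentication document, and whether roles other than user are accepted is not confirmed by the source.

```python
import time
from typing import Optional


def to_lingyi_payload(messages: list, model: str, stream: bool = False,
                      created: Optional[int] = None) -> dict:
    """Convert OpenAI-style messages into the request shape shown above."""
    timestamp = int(time.time()) if created is None else created
    return {
        "model": model,
        "stream": stream,
        "messages": [
            {
                "version": "api-v2",
                "created": timestamp,
                "role": m["role"],
                # A plain string becomes a single text part.
                "content": [{"type": "text", "body": m["content"]}],
            }
            for m in messages
        ],
    }


def build_headers(signature: str) -> dict:
    """Headers from the table above; producing the signature is out of scope."""
    return {
        "Content-Type": "application/json",
        "X-IHU-Authorization-V2": signature,
    }


payload = to_lingyi_payload(
    [{"role": "user", "content": "Hello!"}], model="test-model", created=1683944235
)
headers = build_headers("<signature from the authentication document>")
print(payload)
```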