A Unified Wrapper Interface Solution for Large Language Models (LLMs)

Goal: build a generic chat completion interface for a Java framework, so that different large language models can be called in one uniform way.

Scope: only the chat (text) interface is wrapped for now; other interfaces are left for later planning.

OpenAI chat completion interface analysis

==> Official documentation

  • Default (non-streaming) request with curl

curl http://chat.xxxxxx.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Hello!"
      }
    ]
  }'

Response

{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "gpt-3.5-turbo-0125",
  "system_fingerprint": "fp_44709d6fcb",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "\n\nHello there, how may I assist you today?"
    },
    "logprobs": null,
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 9,
    "completion_tokens": 12,
    "total_tokens": 21
  }
}

  • Streaming request with Python

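from openai import OpenAI
# client = OpenAI()  # default endpoint

# The API key is read from the OPENAI_API_KEY environment variable.
# base_url includes /v1 so that the client calls the same
# /v1/chat/completions path as the curl example above.
client = OpenAI(
    base_url="http://chat.xxxxxx.com/v1"
)
completion = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
    stream=True,
)

# With stream=True the call returns an iterator of chunks.
for chunk in completion:
    print(chunk.choices[0].delta)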

Response

{
  "id": "chatcmpl-123",
  "object": "chat.completion.chunk",
  "created": 1694268190,
  "model": "gpt-3.5-turbo-0125",
  "system_fingerprint": "fp_44709d6fcb",
  "choices": [
    {
      "index": 0,
      "delta": {
        "role": "assistant",
        "content": ""
      },
      "logprobs": null,
      "finish_reason": null
    }
  ]
}

{
  "id": "chatcmpl-123",
  "object": "chat.completion.chunk",
  "created": 1694268190,
  "model": "gpt-3.5-turbo-0125",
  "system_fingerprint": "fp_44709d6fcb",
  "choices": [
    {
      "index": 0,
      "delta": {
        "content": "Hello"
      },
      "logprobs": null,
      "finish_reason": null
    }
  ]
}

....

{
  "id": "chatcmpl-123",
  "object": "chat.completion.chunk",
  "created": 1694268190,
  "model": "gpt-3.5-turbo-0125",
  "system_fingerprint": "fp_44709d6fcb",
  "choices": [
    {
      "index": 0,
      "delta": {},
      "logprobs": null,
      "finish_reason": "stop"
    }
  ]
}
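
Because each streamed chunk carries only a fragment of the reply, a wrapper usually has to stitch the fragments back together. The following is a minimal Python sketch of that step, assuming chunks shaped like the ones above; the function name `merge_stream_chunks` is hypothetical, and the sketch only follows the first choice (index 0).

```python
from typing import Iterable, Optional


def merge_stream_chunks(chunks: Iterable[dict]) -> dict:
    """Fold chat.completion.chunk objects into one assistant message."""
    role: Optional[str] = None
    parts = []
    finish_reason: Optional[str] = None
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            if choice.get("index", 0) != 0:
                continue  # this sketch only follows the first choice
            delta = choice.get("delta") or {}
            role = delta.get("role", role)            # role arrives in the first chunk
            parts.append(delta.get("content") or "")  # the last chunk has an empty delta
            finish_reason = choice.get("finish_reason") or finish_reason
    return {"role": role, "content": "".join(parts), "finish_reason": finish_reason}


chunks = [
    {"choices": [{"index": 0, "delta": {"role": "assistant", "content": ""}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {"content": "Hello"}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}]},
]
merged = merge_stream_chunks(chunks)
```

The merged result has the same role/content shape as the message field of a non-streaming response, which is convenient when both modes must feed the same downstream code.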

Request parameter analysis

  • model: treated here as optional, defaulting to gpt-3.5-turbo (the official OpenAI API itself marks model as required)

  • messages: required

[
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"}
]

  • stream: optional, defaults to false

If set, partial message deltas will be sent, like in ChatGPT. Tokens will be sent as data-only server-sent events as they become available, with the stream terminated by a data: [DONE] message
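
To illustrate the wire format described above, here is a small Python sketch that reads data-only server-sent events and stops at the `data: [DONE]` terminator. It assumes each event's JSON payload fits on a single `data:` line; the function name `iter_sse_data` and the sample lines are made up for illustration.

```python
import json
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the JSON payload of each data-only event until [DONE]."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and any non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # end-of-stream marker, nothing after it is read
        yield json.loads(payload)


sample = [
    'data: {"choices": [{"index": 0, "delta": {"content": "Hel"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "lo"}}]}',
    "",
    "data: [DONE]",
    'data: {"choices": [{"index": 0, "delta": {"content": "ignored"}}]}',
]
events = list(iter_sse_data(sample))
text = "".join(e["choices"][0]["delta"]["content"] for e in events)
```

In a real client the `lines` argument would come from the HTTP response body read line by line; that part is left out here.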

  • temperature: optional, defaults to 1

What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.

We generally recommend altering this or top_p but not both.
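
To make the effect of temperature concrete, the sketch below applies the common "divide the logits by the temperature, then softmax" formulation to three made-up token scores. This is only an illustration of the general technique; how the service applies temperature internally is not documented here, and the sketch rejects a temperature of 0 even though the API accepts it.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0 in this sketch")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # made-up scores for three candidate tokens
focused = softmax_with_temperature(logits, 0.2)   # low temperature
default = softmax_with_temperature(logits, 1.0)
flatter = softmax_with_temperature(logits, 2.0)   # high temperature
```

With the low temperature almost all probability lands on the top token, while the high temperature spreads it more evenly across the three candidates.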

  • top_p: optional, defaults to 1

An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.

We generally recommend altering this or temperature but not both.
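
Nucleus sampling can be sketched in a few lines: sort the candidate tokens by probability, keep the smallest prefix whose cumulative mass reaches top_p, and renormalize. The Python below is an illustration under that textbook definition, with a made-up distribution; the function name `nucleus_filter` is hypothetical and this is not the service's actual implementation.

```python
from typing import Dict


def nucleus_filter(probs: Dict[str, float], top_p: float) -> Dict[str, float]:
    """Keep the most likely tokens whose cumulative mass reaches top_p, then renormalize."""
    if not 0 < top_p <= 1:
        raise ValueError("top_p must be in (0, 1]")
    kept: Dict[str, float] = {}
    cumulative = 0.0
    for token, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[token] = p
        cumulative += p
        if cumulative >= top_p:
            break  # the nucleus is complete
    total = sum(kept.values())
    return {token: p / total for token, p in kept.items()}


probs = {"Hello": 0.5, "Hi": 0.3, "Hey": 0.15, "Yo": 0.05}  # made-up distribution
narrow = nucleus_filter(probs, 0.1)  # only the single most likely token survives
wide = nucleus_filter(probs, 0.9)    # the three most likely tokens survive
```

Sampling then happens only among the surviving tokens, which is why a small top_p makes the output more focused.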

Other request parameters that may be needed

  • max_tokens: optional, no default

The maximum number of tokens that can be generated in the chat completion.

  • n (integer or null): optional, defaults to 1

How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.
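
The parameters analysed above could be collected into a single request model for the unified interface. The following is one possible sketch, written in Python to match the article's own example; a Java class for the framework would have the same fields. The class name `ChatRequest`, the choice to omit unset optional fields, and the 0 to 1 range check on top_p are assumptions of this sketch.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ChatRequest:
    messages: List[Dict[str, str]]       # required
    model: str = "gpt-3.5-turbo"         # default as listed in the analysis above
    stream: bool = False
    temperature: Optional[float] = None  # None means "use the server default"
    top_p: Optional[float] = None
    max_tokens: Optional[int] = None
    n: Optional[int] = None

    def to_payload(self) -> dict:
        """Validate the fields and build the JSON body, omitting unset options."""
        if not self.messages:
            raise ValueError("messages is required")
        if self.temperature is not None and not 0 <= self.temperature <= 2:
            raise ValueError("temperature must be between 0 and 2")
        if self.top_p is not None and not 0 <= self.top_p <= 1:
            raise ValueError("top_p must be between 0 and 1")  # assumed range
        payload = {"model": self.model, "messages": self.messages, "stream": self.stream}
        for name in ("temperature", "top_p", "max_tokens", "n"):
            value = getattr(self, name)
            if value is not None:
                payload[name] = value
        return payload


request = ChatRequest(messages=[{"role": "user", "content": "Hello!"}], temperature=0.2)
payload = request.to_payload()
```

Leaving unset options out of the payload lets each backend apply its own defaults instead of having the wrapper hard-code them.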

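Output comparison (default vs. streaming results)

  • Streaming results have no usage field. Token counts have to be computed separately, and the API itself does not provide a method for this.

    Official guidance on counting tokens:

    Another small drawback of streaming responses is that the response no longer includes the usage field to tell you how many tokens were consumed. After receiving and combining all of the responses, you can calculate this yourself using tiktoken.

    How to use the Python library tiktoken

  • In streaming results, the message field inside choices (as returned by a normal request) is replaced by a delta field.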
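
The two differences listed above can be hidden behind one small normalizing step. Below is a Python sketch, assuming payloads shaped like the examples earlier in this article; the function name `normalize_choice` and the keys of the returned dictionary are hypothetical.

```python
def normalize_choice(payload: dict, index: int = 0) -> dict:
    """Map a full response or a streaming chunk onto one common shape."""
    choice = payload["choices"][index]
    # Non-streaming responses carry "message"; streaming chunks carry "delta".
    body = choice.get("message") or choice.get("delta") or {}
    return {
        "streaming": payload.get("object") == "chat.completion.chunk",
        "role": body.get("role"),
        "text": body.get("content") or "",
        "finish_reason": choice.get("finish_reason"),
        "usage": payload.get("usage"),  # absent (None) for streaming chunks
    }


full = {
    "object": "chat.completion",
    "choices": [{"index": 0, "message": {"role": "assistant", "content": "Hi"}, "finish_reason": "stop"}],
    "usage": {"prompt_tokens": 9, "completion_tokens": 12, "total_tokens": 21},
}
chunk = {
    "object": "chat.completion.chunk",
    "choices": [{"index": 0, "delta": {"content": "Hello"}, "finish_reason": None}],
}
from_full = normalize_choice(full)
from_chunk = normalize_choice(chunk)
```

Callers can then check `usage` for None to decide whether token counts still need to be computed on the client side.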

灵医Bot Chat interface analysis

Endpoint path: /api/01bot/sse-gateway/stream

For server-side calls to 灵医Bot Chat, refer to "Server Sent Events server and client for Golang".

Header parameters

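Name | Example | Type | Required | Description
Content-Type | application/json | string | | Content-Type of the request
X-IHU-Authorization-V2 | see the authentication document (《鉴权认证文档》) | string | | Signature string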

Non-streaming request example (single turn): parameters

{
    "model": "test-model",
    "stream": false,
    "messages": [
        {
            "version": "api-v2",
            "created": 1683944235,
            "role": "user",
            "content": [{
                "type": "text",
                "body": "患者3天前面部肿胀伴多发红疹,自觉瘙痒,是什么疾病?"
            }]
        }
    ]
}

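Differences from OpenAI

Each entry in messages has two additional fields, version and created.

The documentation marks a default parameter in message as required (a fallback switch for the fallback strategy: 0 = no fallback, 1 = fall back to 一言), but the example given does not include it.

The structure of content differs: it changes from a string to a list of objects with type and body fields.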
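
Given these differences, an adapter from OpenAI-style messages to the 灵医Bot message shape might look like the Python sketch below. It is based only on the single request example above: the function name `to_lingyi_messages` is hypothetical, roles are passed through unchanged (whether 灵医Bot accepts roles other than user is not confirmed here), and the default fallback switch is left out because the example does not show where it goes.

```python
import time
from typing import Dict, List, Optional


def to_lingyi_messages(
    messages: List[Dict[str, str]],
    created: Optional[int] = None,
    version: str = "api-v2",
) -> List[dict]:
    """Wrap OpenAI-style {role, content} messages in the 灵医Bot structure."""
    timestamp = int(time.time()) if created is None else created
    return [
        {
            "version": version,
            "created": timestamp,
            "role": message["role"],
            # A plain string becomes a list of {type, body} objects.
            "content": [{"type": "text", "body": message["content"]}],
        }
        for message in messages
    ]


converted = to_lingyi_messages(
    [{"role": "user", "content": "Hello!"}],
    created=1683944235,
)
```

Keeping the conversion in one function means the unified interface can accept OpenAI-style messages everywhere and translate only at the edge, just before the request is sent.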

Non-streaming response parameters (single turn)

{
   
    