Summary of OpenAI Capabilities

金闪闪_Li

Last modified 2023-11-26 22:50:57


Category: AI Agent  Tags: agi

First published 2023-11-26 22:46:38

Article link: https://blog.csdn.net/u010618499/article/details/134634795


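Preface:

This article summarizes the model capabilities described in the official OpenAI documentation, with an example API call for each. In my view, function calling is OpenAI's standout feature.

Before using any of them, instantiate the OpenAI client object: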

from openai import OpenAI
import json
client = OpenAI(api_key='your OpenAI API key')

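I. Capabilities

1. Text generation

Text generation models take the user's input text and produce text in response. The main text generation models are gpt-3.5 and gpt-4. API calls take messages in a fixed JSON format, in which the user's input has the role user and the model's reply has the role assistant. For example: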

response = client.chat.completions.create(
  model="gpt-3.5-turbo", # or "gpt-4"
  messages=[
    {"role": "system", "content": "You are a helpful assistant, please answer user's question with Chinese."},
    {"role": "user", "content": "如何成为AI算法专家?"}  # "How do I become an AI algorithm expert?"
  ]
)
print(response.choices[0].message.content)


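2. Function calling

Function calling lets ChatGPT make use of user-defined functions, which greatly extends what it can do and enables all kinds of useful features.

The first thing to be clear about: ChatGPT does not execute your functions itself. Based on the user input (prompt) and the function descriptions, it decides when a function should be called and outputs a JSON object containing the function name and arguments. Your code reads that JSON object and calls the function. You then append the function's result to the input as a new message and call the model again, so that it can summarize the result for the user. Here is an example of function calling: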

# Step 1: define the external function
# A weather lookup function that takes location and unit. It is a stub: whatever the location, it returns the same temperature. In real use, implement whatever your application needs.
def get_current_weather(location, unit="fahrenheit"):   
    return json.dumps({"location": location, "temperature": "2", "unit": "celsius"})

# Step 2: define the function description
tools = [
  {
    "type": "function",
    "function": {
      "name": "get_current_weather",
      "description": "Get the current weather in a given location", # what the function does; this matters, because the model uses it to decide whether to call the function
      "parameters": {
        "type": "object",
        "properties": {
          "location": { # parameter 1: the location
            "type": "string",
            "description": "The city and state, e.g. San Francisco, CA" # description of parameter 1
          },
          "unit": { # parameter 2: the temperature unit
            "type": "string",
            "enum": [ 
              "celsius",
              "fahrenheit"
            ]
          }
        },
        "required": [
          "location"
        ]
      }
    }
  }
]

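# Step 3: define the input to send to the model
# The user message asks, in Chinese: "What is the weather like in Beijing today? You may use a tool to look it up; tell me the temperature you find and give me some travel advice."
messages = [{"role": "system", "content": "You are a helpful assistant, please answer user's question with Chinese."},
            {"role": "user", "content": "今天北京的天气怎么样？ 可以使用工具来查询，并告诉我查询到的气温是多少，给我一些出行建议。"}]

# Step 4: call the API with the user's question and tell the model which tools it may use
response = client.chat.completions.create(
    model="gpt-4-1106-preview", # or gpt-3.5-turbo-1106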
    messages=messages,
    tools=tools,
    tool_choice="auto"
)

# Step 5: parse the model's output and check whether it wants to call our function
response_message = response.choices[0].message
print("response_message: ", response_message)

# tool_calls is None when the model answers directly, so guard before indexing
tool_calls = response_message.tool_calls or []
if tool_calls and tool_calls[0].function.name == "get_current_weather":
    tool_call = tool_calls[0]
    # The model chose get_current_weather, so pass the arguments it returned into our function
    function_args = json.loads(tool_call.function.arguments)
    function_response = get_current_weather(location=function_args.get("location"),
                                            unit=function_args.get("unit", "fahrenheit"))

    # Append both the model's first reply and the function's result to messages
    messages.append(response_message)
    messages.append( # the function's return value goes into messages and is sent back to the model
                {
                    "tool_call_id": tool_call.id,
                    "role": "tool",
                    "name": "get_current_weather",
                    "content": function_response,
                })  
    print("messages: ", messages)

# Step 6: send the extended messages back to the model
second_response = client.chat.completions.create(
    model="gpt-4-1106-preview", messages=messages)

print("second_response: ", second_response)

Output:

response_message: # the model's first output shows that it decided to call get_current_weather, with location set to Beijing and the unit set to celsius
ChatCompletionMessage(content=None, role='assistant', function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='call_Woqzc7L01FHbb7tu33j1Q1l6', function=Function(arguments='{"location":"Beijing","unit":"celsius"}', name='get_current_weather'), type='function')])

messages:  # the model's first reply and the function's result are both appended to messages and sent back to the model
[{'role': 'system', 'content': "You are a helpful assistant, please answer user's question with Chinese."}, 
{'role': 'user', 'content': '今天北京的天气怎么样？ 可以使用工具来查询，并告诉我查询到的气温是多少，给我一些出行建议。'}, 
ChatCompletionMessage(content=None, role='assistant', function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='call_Woqzc7L01FHbb7tu33j1Q1l6', function=Function(arguments='{"location":"Beijing","unit":"celsius"}', name='get_current_weather'), type='function')]),
{'tool_call_id': 'call_Woqzc7L01FHbb7tu33j1Q1l6', 'role': 'tool', 'name': 'get_current_weather', 'content': '{"location": "Beijing", "temperature": "2", "unit": "celsius"}'}]

second_response: # the model's second reply: using the function result, it reports 2 degrees Celsius in Beijing and gives travel advice (in Chinese)
ChatCompletion(
id='chatcmpl-8P8dtR0HLgWeviTZvUlNwlccyJsTW', 
choices=[Choice(finish_reason='stop', 
index=0, 
message=ChatCompletionMessage(
	content='今天北京的气温是2摄氏度，出行建议穿着温暖，可以穿羽绒服或者其他保暖良好的衣物，手套和帽子也是很好的选择，来防止寒冷。如果您骑车或步行，建议务必注意保暖。此外，最好随时关注最新的天气预报，以便适应可能的天气变化。', 
	role='assistant', 
	function_call=None, 
	tool_calls=None))], 
created=1701002113, 
model='gpt-4-1106-preview', 
object='chat.completion',
system_fingerprint='fp_a24b4d720c', 
usage=CompletionUsage(completion_tokens=127,
prompt_tokens=120, 
total_tokens=247)
)
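
The example above handles only the first tool call. A reply can contain several tool calls, and each one needs its own tool message before the second request. Below is a minimal sketch of a name-to-function dispatch. The names AVAILABLE_TOOLS and run_tool_calls are hypothetical helpers, not part of the openai package, and the stand-in objects only imitate the shape of response_message.tool_calls shown in the output above.

```python
import json
from types import SimpleNamespace

def get_current_weather(location, unit="fahrenheit"):
    # Same stub as in the example above: always reports 2 degrees Celsius
    return json.dumps({"location": location, "temperature": "2", "unit": "celsius"})

# Hypothetical registry mapping tool names to local Python functions
AVAILABLE_TOOLS = {"get_current_weather": get_current_weather}

def run_tool_calls(tool_calls):
    """Run every requested tool and return one `tool` message per call."""
    tool_messages = []
    for call in tool_calls or []:  # tool_calls is None when no tool was requested
        func = AVAILABLE_TOOLS.get(call.function.name)
        if func is None:
            content = json.dumps({"error": f"unknown tool {call.function.name}"})
        else:
            args = json.loads(call.function.arguments)
            content = func(**args)
        tool_messages.append({
            "tool_call_id": call.id,
            "role": "tool",
            "name": call.function.name,
            "content": content,
        })
    return tool_messages

# Stand-in for response_message.tool_calls, shaped like the output shown above
fake_calls = [
    SimpleNamespace(
        id="call_1",
        function=SimpleNamespace(
            name="get_current_weather",
            arguments='{"location":"Beijing","unit":"celsius"}',
        ),
    )
]
tool_messages = run_tool_calls(fake_calls)
print(tool_messages)
```

With a real response, you would append response_message to messages and then extend messages with run_tool_calls(response_message.tool_calls) before making the second request.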

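3. Embeddings

A text embedding encodes text as a vector, and the distance between vectors measures how similar the texts are. This is useful for search, clustering, recommendation, classification, and similar tasks. OpenAI's embedding model is text-embedding-ada-002. It accepts at most 8191 input tokens and produces 1536-dimensional vectors. Cosine similarity is the usual choice for comparing vectors.

cl100k_base is the tokenizer vocabulary used by text-embedding-ada-002. The tokenization tool is tiktoken, and the tokenization method is BPE (byte-pair encoding).

text-embedding-ada-002 is a text embedding model: it only converts input text into vectors. Storing and searching those vectors requires a vector database; recommended tools include Milvus and Pinecone.

Embeddings API call example: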

response = client.embeddings.create(
    input="Your text string goes here", # the text to embed
    model="text-embedding-ada-002"
)

print(response)

Output:

{
  "data": [
    {
      "embedding": [
        -0.006929283495992422,
        -0.005336422007530928,
        ...
        -4.547132266452536e-05,
        -0.024047505110502243
      ],
      "index": 0,
      "object": "embedding"
    }
  ],
  "model": "text-embedding-ada-002",
  "object": "list",
  "usage": {
    "prompt_tokens": 5,
    "total_tokens": 5
  }
}
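
Once two texts have been embedded, their similarity can be computed directly from the vectors. Here is a minimal sketch of cosine similarity in plain Python. The three-dimensional vectors are made up for illustration; with the API you would pass the embedding lists taken from response.data[0].embedding instead.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional vectors standing in for real 1536-dimensional embeddings
query = [1.0, 0.0, 1.0]
doc_same_direction = [2.0, 0.0, 2.0]
doc_orthogonal = [0.0, 1.0, 0.0]

print(cosine_similarity(query, doc_same_direction))  # close to 1: same direction
print(cosine_similarity(query, doc_orthogonal))      # 0: nothing in common
```

A vector database such as Milvus or Pinecone does the same kind of comparison at scale, with indexes that avoid scanning every stored vector.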

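4. Fine-tuning

If ChatGPT's output does not meet your business needs, you can fine-tune it on your own data. The model available for fine-tuning is gpt-3.5; fine-tuning gpt-4 requires a separate application.

Fine-tuning data format: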

{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "What's the capital of France?"}, {"role": "assistant", "content": "Paris, as if everyone doesn't know that already."}]}
{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "Who wrote 'Romeo and Juliet'?"}, {"role": "assistant", "content": "Oh, just some guy named William Shakespeare. Ever heard of him?"}]}
{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "How far is the Moon from Earth?"}, {"role": "assistant", "content": "Around 384,400 kilometers. Give or take a few, like that really matters."}]}
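
Before uploading, it can help to check that every line of the file parses and has the shape shown above. The sketch below is a hypothetical helper, validate_jsonl_line. The rules it checks are inferred from the sample data in this article and are an assumption, not OpenAI's official validation.

```python
import json

ALLOWED_ROLES = {"system", "user", "assistant"}

def validate_jsonl_line(line):
    """Return a list of problems found in one training example (empty if none)."""
    problems = []
    try:
        example = json.loads(line)
    except json.JSONDecodeError:
        return ["not valid JSON"]
    messages = example.get("messages") if isinstance(example, dict) else None
    if not isinstance(messages, list) or not messages:
        return ["missing or empty 'messages' list"]
    for i, msg in enumerate(messages):
        if not isinstance(msg, dict):
            problems.append(f"message {i} is not an object")
            continue
        if msg.get("role") not in ALLOWED_ROLES:
            problems.append(f"message {i} has unexpected role {msg.get('role')!r}")
        if not isinstance(msg.get("content"), str):
            problems.append(f"message {i} has no string content")
    if not any(isinstance(m, dict) and m.get("role") == "assistant" for m in messages):
        problems.append("no assistant message to learn from")
    return problems

good = '{"messages": [{"role": "user", "content": "Hi"}, {"role": "assistant", "content": "Hello!"}]}'
bad = '{"messages": [{"role": "user", "content": "Hi"}]}'
print(validate_jsonl_line(good))
print(validate_jsonl_line(bad))
```

To check a whole file, call the function on each non-empty line of mydata.jsonl and report the line numbers that return problems.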

Fine-tuning API call example:

# Upload the fine-tuning data
client.files.create(file=open("mydata.jsonl", "rb"), purpose="fine-tune")
# Create the fine-tuning job (replace file-abc123 with the id of the uploaded file)
client.fine_tuning.jobs.create(training_file="file-abc123", model="gpt-3.5-turbo")
# Run inference with the fine-tuned model
response = client.chat.completions.create(
  model="ft:gpt-3.5-turbo:my-org:custom_suffix:id", # name of the fine-tuned model
  messages=[
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"}
  ]
)
print(response.choices[0].message) # text generation with the fine-tuned model

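5. Image generation

Image generation models produce an image from input text. The available models are dall-e-3 and dall-e-2. dall-e-3 currently supports text-to-image only, while dall-e-2 also supports image editing and image variations. Image editing modifies an image according to the user's input; image variation generates images similar in style and content to an original.

Text-to-image API call: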

response = client.images.generate(
  model="dall-e-3",
  prompt="a white siamese cat",
  size="1024x1024",
  quality="standard",
  n=1,
)
image_url = response.data[0].url


Image editing API call:

response = client.images.edit(
  model="dall-e-2",
  image=open("sunlit_lounge.png", "rb"),
  mask=open("mask.png", "rb"),
  prompt="A sunlit indoor lounge area with a pool containing a flamingo",
  n=1,
  size="1024x1024"
)
image_url = response.data[0].url


Image variation API call:

response = client.images.create_variation(
  image=open("image_edit_original.png", "rb"),
  n=2,
  size="1024x1024"
)

image_url = response.data[0].url

6. Vision

Currently gpt-4-vision-preview, part of the gpt-4 family, can understand image content, for example describing what an uploaded image contains or answering questions about it.

Sample image URL: https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg


Vision API call:

response = client.chat.completions.create(
  model="gpt-4-vision-preview",
  messages=[
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "图片里有什么？"},  # "What is in this image?"
        {
          "type": "image_url",
          "image_url": {
            "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
          },
        },
      ],
    }
  ],
  max_tokens=300,
)

print(response.choices[0].message.content)

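Output (the model replied in Chinese; an English translation follows):

这张图片展示了一片开阔的自然景观，中央是一条延伸向远方的木栈道，两旁是茂密的绿色草地。远处可以看到树木和一些灌木丛。天空呈现出淡蓝色并散布着一些细长的白云，整体给人一种宁静和谐的感觉。这个场景可能是一个自然保护区或者公园的一部分，木栈道为游客提供了一条穿越湿地或草地而不破坏自然环境的路径。

Translation: This image shows an open natural landscape. In the center, a wooden boardwalk stretches into the distance, with lush green grass on both sides. Trees and some shrubs are visible in the distance. The sky is pale blue with a few thin white clouds, and the whole scene feels calm and harmonious. It may be part of a nature reserve or park, where the boardwalk gives visitors a path across the wetland or grassland without damaging the environment.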

7. Text to speech

Converts text to audio. The available voices are alloy, echo, fable, onyx, nova, and shimmer. The speech synthesis model is tts-1.

Text-to-speech API call:

speech_file_path = r"D:\Code\z\speech.mp3"
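response = client.audio.speech.create(
  model="tts-1",
  voice="alloy", # choose the voice
  # Chinese sample text, roughly: "What the horse sees is decided by the person. The bigger your heart, the bigger your stage. What you are in others' eyes ultimately depends on what you show."
  input="马看到什么，是由人决定的，心有多大，舞台就有多大。你在别人眼里是什么，归根到底取决于你展示的是什么。"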
)

response.stream_to_file(speech_file_path)

8. Speech to text

Converts audio to text. The model is whisper-1.

API call:

audio_file= open(r"D:\Code\z\speech.mp3", "rb")
transcript = client.audio.transcriptions.create(
  model="whisper-1", 
  file=audio_file
)
print(transcript.text)

'马看到什么, 是由人决定的。心有多大, 舞台就有多大。 你在别人眼里是什么, 归根到底取决于你展示的是什么。'

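II. Prompt-writing tips for ChatGPT

Six strategies for getting better results

1. Write clear instructions

Include as much detail as possible in the input
Ask the model to adopt a role
Use delimiters to clearly mark the distinct parts of the input
Specify the steps required to complete the task
Provide some examples
Specify the desired length of the output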
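
Several of these tactics can be combined in a single request. The sketch below builds a messages list that assigns a role, wraps the input in triple-quote delimiters, spells out the steps, and caps the output length. The helper name build_summary_messages and the prompt wording are illustrative assumptions, not text from the OpenAI guide.

```python
def build_summary_messages(article, max_sentences=2):
    """Build a chat `messages` list that applies the tactics listed above."""
    system = (
        "You are a senior technical editor. "                       # role
        "The user will send an article wrapped in triple quotes. "   # delimiters
        "Step 1: identify the main claim. "                          # explicit steps
        "Step 2: summarize the article for a general reader. "
        f"Answer in at most {max_sentences} sentences."              # output length
    )
    user = f'"""{article}"""'
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]

messages = build_summary_messages(
    "Embeddings map text to vectors so similar texts end up close together."
)
for m in messages:
    print(m["role"], ":", m["content"])
```

The resulting list can be passed as the messages argument of client.chat.completions.create.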

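2. Provide reference text

Instruct the model to answer using the reference text
Instruct the model to answer with citations from the reference text

3. Split complex tasks into simpler subtasks

Use intent classification to identify the instructions most relevant to the user's input
When the conversation context is too long, summarize or filter the earlier dialogue
Summarize long documents piece by piece and build the full summary recursively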
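
The last tactic, recursive summarization, can be sketched as follows. The summarizer is passed in as a function so the logic can be run offline. first_words is a stand-in that just keeps the first few words; in real use you would replace it with a call to the chat API. Both helper names and the character-based chunking are assumptions made for illustration.

```python
def split_into_chunks(text, chunk_size):
    """Cut text into consecutive pieces of at most chunk_size characters."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

def summarize_recursively(text, summarize, chunk_size=1000):
    """Summarize chunk by chunk, then summarize the joined summaries until they fit in one chunk."""
    if len(text) <= chunk_size:
        return summarize(text)
    partial = [summarize(chunk) for chunk in split_into_chunks(text, chunk_size)]
    combined = " ".join(partial)
    if len(combined) >= len(text):
        # The summarizer did not shrink the text; stop instead of recursing forever
        return combined
    return summarize_recursively(combined, summarize, chunk_size)

def first_words(text, n=5):
    # Stand-in summarizer so the sketch runs offline; a real one would call the chat API
    return " ".join(text.split()[:n])

long_text = "word " * 2000
print(summarize_recursively(long_text, first_words, chunk_size=1000))
```

Splitting on characters is the simplest option; splitting on paragraphs or token counts would respect sentence boundaries and model limits better.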

4. Give the model time to think

Instruct the model to work out its own solution before rushing to a conclusion
Use an inner monologue to hide the model's reasoning process; the inner monologue can be written in …
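Ask the model whether it missed anything in its earlier passes

5. Use external tools

Use embeddings-based search to implement efficient knowledge retrieval
Use code execution to perform more accurate calculations or call external APIs
Give the model access to specific functions

6. Test systematically

Evaluate model outputs against gold-standard answers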
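
One simple way to approximate such an evaluation is to list the facts a gold-standard answer contains and count how many appear in the model's output. The sketch below uses a hypothetical helper, score_against_gold, and made-up test cases. Substring matching is a crude proxy, so treat it as a starting point rather than a full evaluation method.

```python
def score_against_gold(answer, required_facts):
    """Return the fraction of required facts that appear in the answer (case-insensitive)."""
    if not required_facts:
        return 1.0
    answer_lower = answer.lower()
    hits = sum(1 for fact in required_facts if fact.lower() in answer_lower)
    return hits / len(required_facts)

# Hypothetical test set: each case pairs a model answer with facts from a gold answer
cases = [
    {"answer": "Paris is the capital of France.", "facts": ["Paris"]},
    {"answer": "The Moon is far away.", "facts": ["384,400", "kilometers"]},
]
scores = [score_against_gold(c["answer"], c["facts"]) for c in cases]
print(scores)
print(sum(scores) / len(scores))
```

Running the same cases after every prompt change shows whether the change helped or hurt overall.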

OpenAI Cookbook: https://cookbook.openai.com/

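Summary

OpenAI covers most areas of AI application, including text generation, image generation, speech synthesis, speech recognition, and multimodal models.
Function calling greatly extends what ChatGPT can be applied to. Users can give their imagination free rein and build practical tools, for example ones that look up the weather, read files, or analyze data, and in this way build a powerful AI Agent.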
