Calculation of token pricing in OpenAI calls


Background:

I'm trying to price the tokens used in a call to OpenAI. I have a txt file with plain text that was uploaded to Qdrant. When I ask the following question:


Who is Michael Jordan?

and use the get_openai_callback function to track the number of tokens and the cost of the operation, one of the values in the output doesn't make sense to me.


Tokens Used: 85
    Prompt Tokens: 68
    Completion Tokens: 17
Successful Requests: 1
Total Cost (USD): $0.00013600000000000003
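
For context, the output above comes from wrapping the chain call in the callback, roughly as in the sketch below. This is a minimal reconstruction, assuming a stock RetrievalQA chain over an in-memory Qdrant store built from the txt file; the variable names and setup are illustrative, not my exact code.

from langchain.callbacks import get_openai_callback
from langchain.chains import RetrievalQA
from langchain.embeddings import OpenAIEmbeddings
from langchain.llms import OpenAI
from langchain.vectorstores import Qdrant

# In-memory stand-in for the Qdrant collection built from the txt file
text = ('Michael Jeffrey Jordan is an American businessman and former '
        'basketball player who played as a shooting guard')
store = Qdrant.from_texts([text], OpenAIEmbeddings(), location=':memory:')

llm = OpenAI(model_name='gpt-3.5-turbo-instruct')
qa = RetrievalQA.from_chain_type(llm=llm, retriever=store.as_retriever())

with get_openai_callback() as cb:
    qa.run('Who is Michael Jordan?')

print(cb)  # prints the Tokens Used / Prompt Tokens / Completion Tokens / Total Cost summary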

Why does the Prompt Tokens value differ from the input? The number of tokens in the input text (which is what I understand as the prompt tokens) is:


import tiktoken

query = 'Who is Michael Jordan'

encoding = tiktoken.encoding_for_model('gpt-3.5-turbo-instruct')
print(f"Tokens: {len(encoding.encode(query))}")

4

But the value reported in the response is 68. I considered the idea that Prompt Tokens might be the sum of the base tokens (from the txt file) and the question tokens, but the math doesn't add up.


Number of tokens in the txt file: 17

txt file: 'Michael Jeffrey Jordan is an American businessman and former basketball player who played as a shooting guard'


query tokens + file tokens: 21 (4 + 17)

Could anyone help me understand the pricing calculation?


I tried searching OpenAI's own documentation, GitHub, and other forums, but the information doesn't seem easy to find, or it isn't public. I want to understand whether I'm missing something or whether this is a calculation users don't have access to.


UPDATE (for any future questions from other users):

import langchain
langchain.debug = True  # log the full prompts/payloads LangChain sends to the API

Run the call inside get_openai_callback() again and the full log appears on screen. The value of the "prompts" key is a list containing a single string: the complete instruction sent to the model, including how the response should be given. The number of tokens in that string is the value that appears as Prompt Tokens.

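For example, with the default QA ("stuff") prompt, the string in the log looks roughly like the reconstruction below, and counting it with tiktoken lands near the reported 68. The template wording here is an approximation and depends on the LangChain version.

import tiktoken

# Approximate reconstruction of the full prompt shown by langchain.debug;
# the exact template text depends on the chain type and LangChain version.
full_prompt = (
    "Use the following pieces of context to answer the question at the end. "
    "If you don't know the answer, just say that you don't know, "
    "don't try to make up an answer.\n\n"
    "Michael Jeffrey Jordan is an American businessman and former basketball "
    "player who played as a shooting guard\n\n"
    "Question: Who is Michael Jordan?\n"
    "Helpful Answer:"
)

encoding = tiktoken.encoding_for_model('gpt-3.5-turbo-instruct')
print(f"Prompt tokens: {len(encoding.encode(full_prompt))}")  # close to 68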

Solution:

Prompt Tokens include your question and any context provided, plus additional system messages and formatting added by the API, while Completion Tokens are the tokens generated in the response.


In your example:

Visible query: "Who is Michael Jordan?" (4 tokens)
Text from the file: "Michael Jeffrey Jordan is an American businessman and former basketball player who played as a shooting guard" (17 tokens)
Expected: 4 + 17 = 21 tokens.


However, you see 68 prompt tokens because the API adds tokens for roles, instructions, and other metadata. To see the exact token count, you can log the full request payload or use OpenAI's token counting tools. This extra context explains why the prompt token count is higher than expected.

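As a rough check of the cost itself, the callback multiplies each token count by a per-1K-token price. Assuming the rates LangChain used for gpt-3.5-turbo-instruct around the time of the post ($0.0015 per 1K prompt tokens and $0.002 per 1K completion tokens, an assumption, not a quoted figure), the arithmetic reproduces the reported total:

prompt_tokens = 68
completion_tokens = 17

# Assumed per-1K-token prices for gpt-3.5-turbo-instruct; check the current
# pricing page or LangChain's cost table for up-to-date values.
prompt_cost = prompt_tokens / 1000 * 0.0015         # 0.000102
completion_cost = completion_tokens / 1000 * 0.002  # 0.000034

print(prompt_cost + completion_cost)  # ~0.000136, matching Total Cost (USD);
                                      # the long decimal tail is floating-point noise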
