OpenAI API Chinese Documentation - Rate Limits

F2API


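Rate limits

Overview

What are rate limits?

A rate limit is a restriction that an API imposes on the number of times a user or client can access the server within a specified period of time.

Why do we have rate limits?

Rate limits are a common practice for APIs, and they are put in place for a few different reasons:

**They help protect against abuse or misuse of the API.** For example, a malicious actor could flood the API with requests in an attempt to overload it or cause disruptions in service. By setting rate limits, OpenAI can prevent this kind of activity.
**Rate limits help ensure that everyone has fair access to the API.** If one person or organization makes an excessive number of requests, it could bog down the API for everyone else. By throttling the number of requests that a single user can make, OpenAI ensures that the largest number of people have an opportunity to use the API without experiencing slowdowns.
**Rate limits can help OpenAI manage the aggregate load on its infrastructure.** If requests to the API increase dramatically, it could tax the servers and cause performance issues. By setting rate limits, OpenAI can help maintain a smooth and consistent experience for all users.

Please work through this document in its entirety to get a better understanding of how OpenAI's rate limit system works. We include code examples and possible solutions for handling common issues. We recommend following this guidance before filling out the Rate Limit Increase Request form. The last section explains how to fill out that form.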

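What are the rate limits for our API?

You can view the rate limits for your organization under the rate limits section of the account management page.

We enforce rate limits at the organization level, not the user level. The limits depend on the specific endpoint used and on the type of account you have. Rate limits are measured in two ways: RPM (requests per minute) and TPM (tokens per minute). The table below highlights the default rate limits for our API. After you fill out the Rate Limit Increase Request form, these limits can be increased depending on your use case.

The TPM (tokens per minute) unit differs depending on the model:

TYPE	1 TPM EQUALS
davinci	1 token per minute
curie	25 tokens per minute
babbage	100 tokens per minute
ada	200 tokens per minute

In practical terms, this means you can send approximately 200x more tokens per minute to an `ada` model than to a `davinci` model.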
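To make the weighting concrete, here is a minimal Python sketch. It converts a TPM limit into an approximate number of raw tokens per minute for each base model family. The multipliers are taken from the table above. The names `TPM_MULTIPLIER` and `raw_tokens_per_minute` are hypothetical helpers, not part of the `openai` library.

```python
# Multipliers from the table above: how many raw tokens one TPM unit is worth
# for each base model family. (Hypothetical helper, not an OpenAI API.)
TPM_MULTIPLIER = {
    "davinci": 1,
    "curie": 25,
    "babbage": 100,
    "ada": 200,
}


def raw_tokens_per_minute(tpm_limit: int, model_family: str) -> int:
    """Return the approximate number of raw tokens per minute a TPM limit allows."""
    try:
        multiplier = TPM_MULTIPLIER[model_family]
    except KeyError:
        raise ValueError(f"unknown model family: {model_family!r}") from None
    return tpm_limit * multiplier


if __name__ == "__main__":
    limit = 150_000  # an example TPM limit
    for family in TPM_MULTIPLIER:
        print(f"{family:>8}: {raw_tokens_per_minute(limit, family):,} tokens/min")
```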

	Text & Embedding	Chat	Codex	Edit	Image	Audio
Free trial users	3 RPM, 150,000 TPM	3 RPM, 40,000 TPM	3 RPM, 40,000 TPM	3 RPM, 150,000 TPM	5 images/min	3 RPM
Pay-as-you-go users (first 48 hours)	60 RPM, 250,000 TPM	60 RPM, 60,000 TPM	20 RPM, 40,000 TPM	20 RPM, 150,000 TPM	50 images/min	50 RPM
Pay-as-you-go users (after 48 hours)	3,500 RPM, 350,000 TPM	3,500 RPM, 90,000 TPM	20 RPM, 40,000 TPM	20 RPM, 150,000 TPM	50 images/min	50 RPM

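Either limit can be hit, depending on which is reached first. For example, you might send 20 requests of only 100 tokens each to the Codex endpoint. That would fill your limit even though you had not sent 40k tokens within those 20 requests.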
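The "whichever comes first" rule can be modelled with a small client-side tracker that checks both budgets over a sliding 60-second window. This is a hypothetical sketch: `DualLimitTracker` is not an OpenAI API, and OpenAI's server-side accounting may differ in detail. Timestamps are passed in explicitly so the logic is easy to test. In real code you would pass `time.monotonic()`.

```python
from collections import deque


class DualLimitTracker:
    """Track usage against both an RPM and a TPM limit over a sliding window.

    Hypothetical client-side sketch; the real limits are enforced server-side.
    """

    def __init__(self, rpm_limit: int, tpm_limit: int, window: float = 60.0):
        self.rpm_limit = rpm_limit
        self.tpm_limit = tpm_limit
        self.window = window
        self.events = deque()  # (timestamp, tokens) for each accepted request
        self.tokens_in_window = 0

    def _evict(self, now: float) -> None:
        # Drop requests that have fallen out of the sliding window.
        while self.events and now - self.events[0][0] >= self.window:
            _, tokens = self.events.popleft()
            self.tokens_in_window -= tokens

    def try_acquire(self, tokens: int, now: float) -> bool:
        """Record a request if it fits under BOTH limits; return False otherwise."""
        self._evict(now)
        if len(self.events) + 1 > self.rpm_limit:
            return False  # the request cap was reached first
        if self.tokens_in_window + tokens > self.tpm_limit:
            return False  # the token budget was reached first
        self.events.append((now, tokens))
        self.tokens_in_window += tokens
        return True


if __name__ == "__main__":
    # The Codex example: 20 RPM and 40,000 TPM, sending 100-token requests.
    tracker = DualLimitTracker(rpm_limit=20, tpm_limit=40_000)
    accepted = sum(tracker.try_acquire(100, now=0.0) for _ in range(21))
    print(accepted, tracker.tokens_in_window)  # prints: 20 2000
```

With the Codex limits from the table (20 RPM, 40,000 TPM), the tracker refuses the 21st 100-token request even though only 2,000 tokens have been used. This is the situation described above.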

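GPT-4 rate limits

During the limited beta rollout of GPT-4, the model will have more aggressive rate limits to keep up with demand. Default rate limits for `gpt-4`/`gpt-4-0314` are 40k TPM and 200 RPM. Default rate limits for `gpt-4-32k`/`gpt-4-32k-0314` are 80k TPM and 400 RPM. **We are unable to accommodate requests for rate limit increases due to capacity constraints.** In its current state, the model is intended for experimentation and prototyping, not high-volume production use cases.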

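How do rate limits work?

Suppose your rate limit is 60 requests per minute and 150k `davinci` tokens per minute. You'll be limited either by reaching the requests/min cap or by running out of tokens, whichever happens first. For example, if your max requests/min is 60, you should be able to send 1 request per second. If you send 1 request every 800ms, then once you hit your rate limit you'd only need to make your program sleep 200ms to send one more request. Otherwise, subsequent requests would fail. With the default of 3,000 requests/min, customers can effectively send 1 request every 20ms, or every .02 seconds.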
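The arithmetic above amounts to a minimum spacing of 60 / RPM seconds: 60 RPM means one request per second, and 3,000 RPM means one every 0.02 seconds. The hypothetical `RequestPacer` below is a minimal sketch of client-side pacing under that assumption. It only spaces requests evenly and does not track the TPM limit.

```python
import time


def min_interval(rpm: float) -> float:
    """Seconds to wait between requests so that `rpm` requests/minute is never exceeded."""
    if rpm <= 0:
        raise ValueError("rpm must be positive")
    return 60.0 / rpm


class RequestPacer:
    """Space requests evenly so they stay under an RPM limit (hypothetical helper).

    `clock` and `sleep` are injectable so the pacer can be tested without waiting.
    """

    def __init__(self, rpm: float, clock=time.monotonic, sleep=time.sleep):
        self.interval = min_interval(rpm)
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = float("-inf")  # earliest time the next request may start

    def wait(self) -> float:
        """Block until the next request may be sent; return the seconds slept."""
        now = self.clock()
        delay = max(0.0, self.next_allowed - now)
        if delay > 0:
            self.sleep(delay)
        # The following request may go out one interval after this one starts.
        self.next_allowed = max(now, self.next_allowed) + self.interval
        return delay
```

Call `pacer.wait()` immediately before each API request. Suppose the limit is 60 RPM, the previous request went out at t = 0, and the next one is attempted at t = 0.8 s. Then `wait()` sleeps for the remaining 0.2 s, in line with the 800ms / 200ms example above.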

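What happens if I hit a rate limit error?

Rate limit errors look like this:

Rate limit reached for default-text-davinci-002 in organization org-{id} on requests per min. Limit: 20.000000 / min. Current: 24.000000 / min

If you hit a rate limit, you've made too many requests in a short period of time. The API will refuse further requests until a specified amount of time has passed.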

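Rate limits vs `max_tokens`

Each model we offer has a limited number of tokens that can be passed in as input when making a request. You cannot increase the maximum number of tokens a model takes in. For example, if you are using `text-ada-001`, the maximum number of tokens you can send to this model is 2,048 tokens per request.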
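In practice, a completion request's prompt tokens plus its `max_tokens` value must fit within the model's context length. The minimal sketch below assumes you already know the prompt's token count, for example from a tokenizer such as `tiktoken`. `clamp_max_tokens` is a hypothetical helper, and the 2,048 default is the `text-ada-001` figure quoted above.

```python
def clamp_max_tokens(prompt_tokens: int, requested_max_tokens: int,
                     context_length: int = 2048) -> int:
    """Return a max_tokens value that keeps prompt + completion within the context.

    Raises ValueError if the prompt alone already fills the context window.
    """
    available = context_length - prompt_tokens  # room left for the completion
    if available <= 0:
        raise ValueError(
            f"prompt uses {prompt_tokens} tokens; context length is {context_length}"
        )
    return min(requested_max_tokens, available)
```

For example, a 2,000-token prompt sent to a 2,048-token model leaves room for at most 48 completion tokens, so a requested `max_tokens` of 100 would be reduced to 48.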

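Error mitigation

What are some steps I can take to mitigate this?

The OpenAI Cookbook has a Python notebook that explains in detail how to avoid rate limit errors.

You should also exercise caution when providing programmatic access, bulk processing features, and automated social media posting. Consider enabling these only for trusted customers.

To protect against automated and high-volume misuse, set a usage limit for individual users within a specified time frame (daily, weekly, or monthly). Consider implementing a hard cap or a manual review process for users who exceed the limit.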
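One way to implement such a per-user limit is a token quota per fixed window. The sketch below is hypothetical application code, not an OpenAI feature. `PerUserQuota` counts tokens per user per day, or per any window length you choose. It returns `False` when a request would exceed the cap, at which point your application can block the request or route the user to manual review.

```python
import time
from collections import defaultdict


class PerUserQuota:
    """Cap how many tokens each end user may consume per fixed time window.

    Hypothetical application-side helper; windows are aligned to multiples of
    `window_seconds` (86_400 gives a daily quota).
    """

    def __init__(self, limit: int, window_seconds: int = 86_400, clock=time.time):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock
        self.usage = defaultdict(int)  # (user_id, window_index) -> tokens used

    def _key(self, user_id: str):
        window_index = int(self.clock() // self.window_seconds)
        return (user_id, window_index)

    def allow(self, user_id: str, tokens: int) -> bool:
        """Record usage and return True if it fits; False means block or flag for review."""
        key = self._key(user_id)
        if self.usage[key] + tokens > self.limit:
            return False
        self.usage[key] += tokens
        return True
```

A production version would also expire old windows and persist counts outside process memory.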

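Retrying with exponential backoff

One easy way to avoid rate limit errors is to automatically retry requests with a random exponential backoff. When a rate limit error is hit, the client sleeps briefly and then retries the unsuccessful request. If the request still fails, the sleep length is increased and the process is repeated. This continues until the request succeeds or a maximum number of retries is reached. This approach has several benefits:

Automatic retries mean you can recover from rate limit errors without crashes or missing data
Exponential backoff means that your first retries can be tried quickly, while still benefiting from longer delays if your first few retries fail
Adding random jitter to the delay keeps all the retries from hitting at the same time

Note that unsuccessful requests contribute to your per-minute limit, so continuously resending a request won't work.

Below are a few example solutions for Python that use exponential backoff.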

Example #1: Using the Tenacity library

Tenacity is an Apache 2.0 licensed general-purpose retrying library, written in Python, to simplify the task of adding retry behavior to just about anything. To add exponential backoff to your requests, you can use the `tenacity.retry` decorator. The example below uses the `tenacity.wait_random_exponential` function to add random exponential backoff to a request.

import openai
from tenacity import (
    retry,
    stop_after_attempt,
    wait_random_exponential,
)  # for exponential backoff
 
@retry(wait=wait_random_exponential(min=1, max=60), stop=stop_after_attempt(6))
def completion_with_backoff(**kwargs):
    return openai.Completion.create(**kwargs)
 
completion_with_backoff(model="text-davinci-003", prompt="Once upon a time,")

Note that the Tenacity library is a third-party tool, and OpenAI makes no guarantees about its reliability or security.

Example #2: Using the backoff library

Another Python library that provides function decorators for backoff and retry is backoff:

import backoff 
import openai 
@backoff.on_exception(backoff.expo, openai.error.RateLimitError)
def completions_with_backoff(**kwargs):
    return openai.Completion.create(**kwargs)
 
completions_with_backoff(model="text-davinci-003", prompt="Once upon a time,")

Like Tenacity, the backoff library is a third-party tool, and OpenAI makes no guarantees about its reliability or security.

Example #3: Manual backoff implementation

If you don’t want to use third-party libraries, you can implement your own backoff logic following this example:

# imports
import random
import time
 
import openai
 
# define a retry decorator
def retry_with_exponential_backoff(
    func,
    initial_delay: float = 1,
    exponential_base: float = 2,
    jitter: bool = True,
    max_retries: int = 10,
    errors: tuple = (openai.error.RateLimitError,),
):
    """Retry a function with exponential backoff."""
 
    def wrapper(*args, **kwargs):
        # Initialize variables
        num_retries = 0
        delay = initial_delay
 
        # Loop until a successful response or max_retries is hit or an exception is raised
        while True:
            try:
                return func(*args, **kwargs)
 
            # Retry on specific errors
            except errors as e:
                # Increment retries
                num_retries += 1
 
                # Check if max retries has been reached
                if num_retries > max_retries:
                    raise Exception(
                        f"Maximum number of retries ({max_retries}) exceeded."
                    ) from e

                # Grow the delay exponentially, with optional random jitter
                delay *= exponential_base * (1 + jitter * random.random())

                # Sleep for the delay before retrying
                time.sleep(delay)

            # Any error not listed in `errors` propagates to the caller unchanged
 
    return wrapper
    
@retry_with_exponential_backoff
def completions_with_backoff(**kwargs):
    return openai.Completion.create(**kwargs)

Again, OpenAI makes no guarantees on the security or efficiency of this solution but it can be a good starting place for your own solution.

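Batching requests

The OpenAI API has separate limits for requests per minute and tokens per minute.

If you're hitting the limit on requests per minute but have spare capacity on tokens per minute, you can increase your throughput by batching multiple tasks into each request. This lets you process more tokens per minute, especially with our smaller models.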
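A rough calculation shows why batching helps. Each request can carry several prompts, so throughput is bounded by RPM × batch size or by TPM divided by the tokens each prompt consumes, whichever is smaller. The sketch below assumes a fixed token cost per prompt (prompt plus completion). `max_prompts_per_minute` is a hypothetical helper for this estimate.

```python
def max_prompts_per_minute(rpm: int, tpm: int, tokens_per_prompt: int, batch_size: int) -> int:
    """Upper bound on prompts processed per minute for a given batch size.

    Each request carries `batch_size` prompts, and each prompt is assumed to cost
    `tokens_per_prompt` tokens. Throughput is capped by whichever limit binds first.
    """
    by_requests = rpm * batch_size        # bound imposed by requests per minute
    by_tokens = tpm // tokens_per_prompt  # bound imposed by tokens per minute
    return min(by_requests, by_tokens)


if __name__ == "__main__":
    # Example: 20 RPM and 40,000 TPM with roughly 100 tokens per prompt
    for batch in (1, 10, 50):
        print(batch, max_prompts_per_minute(20, 40_000, 100, batch))
```

With these numbers, batch sizes of 1, 10 and 50 give upper bounds of 20, 200 and 400 prompts per minute. Beyond a batch size of 20, the token limit becomes the bottleneck, so larger batches no longer help.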

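Sending a batch of prompts works exactly like a normal API call, except that you pass a list of strings to the `prompt` parameter instead of a single string.

Example without batching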

import openai
 
num_stories = 10
prompt = "Once upon a time,"
 
# serial example, with one story completion per request
for _ in range(num_stories):
    response = openai.Completion.create(
        model="curie",
        prompt=prompt,
        max_tokens=20,
    )
    # print story
    print(prompt + response.choices[0].text)

Example with batching

import openai  # for making OpenAI API requests
 
 
num_stories = 10
prompts = ["Once upon a time,"] * num_stories
 
# batched example, with 10 story completions per request
response = openai.Completion.create(
    model="curie",
    prompt=prompts,
    max_tokens=20,
)
 
# match completions to prompts by index
stories = [""] * len(prompts)
for choice in response.choices:
    stories[choice.index] = prompts[choice.index] + choice.text
 
# print stories
for story in stories:
    print(story)

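Warning: the response object may not return completions in the order of the prompts, so always remember to match responses back to prompts using the `index` field.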
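When you have more prompts than fit in one request, you can split them into batches before sending. The hypothetical `pack_prompts` helper below groups prompt indices greedily under both a maximum batch size and an estimated token budget. You supply `count_tokens`, for example a function based on a tokenizer. The limits you pass in are assumptions about your own account, not documented values.

```python
from typing import Callable, List, Sequence


def pack_prompts(
    prompts: Sequence[str],
    max_batch_size: int,
    token_budget: int,
    count_tokens: Callable[[str], int],
) -> List[List[int]]:
    """Greedily group prompt indices into batches, preserving order.

    A batch is closed when adding the next prompt would exceed `max_batch_size`
    prompts or `token_budget` estimated tokens. Returning indices (not strings)
    makes it easy to map each completion's `index` back to the original prompt.
    """
    batches: List[List[int]] = []
    current: List[int] = []
    current_tokens = 0
    for i, prompt in enumerate(prompts):
        cost = count_tokens(prompt)
        too_many = len(current) >= max_batch_size
        too_big = current_tokens + cost > token_budget
        if current and (too_many or too_big):
            batches.append(current)  # close the full batch and start a new one
            current, current_tokens = [], 0
        current.append(i)
        current_tokens += cost
    if current:
        batches.append(current)
    return batches
```

Each batch is a list of original indices, so a completion in the response for batch `b` maps back to `prompts[b[choice.index]]`. This keeps the index matching described in the warning above intact. A single prompt larger than `token_budget` still gets a batch of its own.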

Here are some examples of how you might fill out this form:

DALL-E API examples

Model	Estimate Tokens/Minute	Estimate Requests/Minute	# of users	Evidence of need	1 hour max throughput cost
DALL-E API	N/A	50	1000	Our app is currently in production and based on our past traffic, we make about 10 requests per minute.	$60
DALL-E API	N/A	150	10,000	Our app is gaining traction in the App Store and we’re starting to hit rate limits. Can we get triple the default limit of 50 img/min? If we need more we’ll submit a new form. Thanks!	$180

Language model examples

Model	Estimate Tokens/Minute	Estimate Requests/Minute	# of users	Evidence of need	1 hour max throughput cost
text-davinci-003	325,000	4,000	50	We’re releasing to an initial group of alpha testers and need a higher limit to accommodate their initial usage. We have a link here to our google drive which shows analytics and api usage.	$390
text-davinci-002	750,000	10,000	10,000	Our application is receiving a lot of interest; we have 50,000 people on our waitlist. We’d like to roll out to groups of 1,000 people/day until we reach 50,000 users. Please see this link of our current token/minute traffic over the past 30 days. This is for 500 users, and based on their usage, we think 750,000 tokens/minute and 10,000 requests/minute will work as a good starting point.	$900

Code model examples

Model	Estimate Tokens/Minute	Estimate Requests/Minute	# of users	Evidence of need	1 hour max throughput cost
code-davinci-002	150,000	1,000	15	We are a group of researchers working on a paper. We estimate that we will need a higher rate limit on code-davinci-002 in order to complete our research before the end of the month. These estimates are based on the following calculation […]	Codex models are currently in free beta so we may not be able to provide immediate increases for these models.
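The "1 hour max throughput cost" column is the requested rate sustained for 60 minutes, multiplied by the unit price. The sketch below reproduces the example figures under one assumption: a price of $0.02 per 1,000 tokens for the `davinci` models and $0.02 per generated image. Those prices are inferred only because they match the numbers in the tables, so check current pricing before relying on them.

```python
def hourly_token_cost(tokens_per_minute: float, price_per_1k_tokens: float) -> float:
    """Cost of sustaining `tokens_per_minute` for one hour."""
    tokens_per_hour = tokens_per_minute * 60
    return tokens_per_hour / 1000 * price_per_1k_tokens


def hourly_image_cost(images_per_minute: float, price_per_image: float) -> float:
    """Cost of sustaining `images_per_minute` for one hour."""
    return images_per_minute * 60 * price_per_image


if __name__ == "__main__":
    # ASSUMED prices, inferred from the example tables: $0.02 / 1K tokens, $0.02 / image
    print(round(hourly_token_cost(325_000, 0.02), 2))  # text-davinci-003 row
    print(round(hourly_token_cost(750_000, 0.02), 2))  # text-davinci-002 row
    print(round(hourly_image_cost(150, 0.02), 2))      # DALL-E 150 images/min row
```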
