软件的动态自修改是 AI 为软件开发带来的最大机会之一

本文分析了 OpenAI 的 o1 模型在复杂任务推理中的表现,特别是在将客户服务政策转化为可执行程序方面。通过与 GPT-4 和 GPT-4o 的对比,发现 o1 在逻辑能力和整体把握上有所提升,但在复杂任务中并未显示出显著优势。文章还深度探讨了 o1 对软件开发的潜在影响,包括设计时和运行时的推理能力,并指出大模型可能为软件带来动态自修改的重要机遇。然而,这种能力的实际应用效果仍需进一步研究和人类监督。

作者 | 卢威

责编 | 王启隆

出品 | 《新程序员​​​​​​​》编辑部

59956b13210f2c072e383ec7e39bdd15.jpeg

近日,OpenAI 公司发布的 o1 模型声称其大大提高了模型的推理能力,大致的原理如下图 1,是在推理时输出正式的结果之前生成思维链(CoT),并且可能会生成多条链或者链上一个步骤的多个可能,有一个评价(又称奖励,reward)机制决定哪个链条更可能帮助生成更好的输出。至于现在很多 Agent(智能体)中实现的虚线部分,用于进一步提高整个系统的成功率,o1 应该还没有实现。

f402df12e748de63f19602ea04221ea3.png

图 1  o1 的推理过程

模型如何学会生成中间思维链呢?估计是通过在训练数据中加入大量的合成思维链,因为为各种问题手工编写正确的思维链实在太费人工了。但是,合成思维链的质量很难保证,除了数学和编程算法竞赛这些任务有辅助工具——符号证明器或代码编译、解释器——可以比较容易地判断一个思维过程是否正确外。

以上的原理和对应的实际表现在《全面测试 OpenAI o1:逻辑推理天衣无缝,空间推理一碰就碎?》一文中有深入的分析。

下面我们分析 OpenAI 官方提供的一个例子:Using reasoning for routine generation,因为它体现了 o1 着力提高的能力对未来软件开发的意义。

官方示例链接:https://cookbook.openai.com/examples/o1/using_reasoning_for_routine_generation

054b6607dd4b1bea33c91f64b4e50512.png

使用 o1 转化客户服务政策

首先,核心任务是让 o1 根据一份指示,把一份客户服务帮助文档转化为可由(另一个)大模型执行的程序。这与我们在《大语言模型需要自己的语言吗?》一文中讨论的“中间态”是同样的思想。

提示词(Prompt)的第一部分是以下固定的指示,最后再拼接一个可变的客户服务政策(POLICY)。

You are a helpful assistant tasked with taking an external facing help center article and converting it into a internal-facing programmatically executable routine optimized for an LLM. 

The LLM using this routine will be tasked with reading the policy, answering incoming questions from customers, and helping drive the case toward resolution. 

Please follow these instructions: 

1. **Review the customer service policy carefully** to ensure every step is accounted for. It is crucial not to skip any steps or policies. 

2. **Organize the instructions into a logical, step-by-step order**, using the specified format. 

3. **Use the following format**: 

- **Main actions are numbered** (e.g., 1, 2, 3). 

- **Sub-actions are lettered** under their relevant main actions (e.g., 1a, 1b). 

    **Sub-actions should start on new lines** 

- **Specify conditions using clear 'if...then...else' statements** (e.g., 'If the product was purchased within 30 days, then…’). 

- **For instructions that require more information from the customer**, provide polite and professional prompts to ask for additional information. 

- **For actions that require data from external systems**, write a step to call a function using backticks for the function name (e.g., `call the check_delivery_date function`). 

    - **If a step requires the customer service agent to take an action** (e.g., process a refund), generate a function call for this action (e.g., `call the process_refund function`). 

    - **Define any new functions** by providing a brief description of their purpose and required parameters. 

- **If there is an action an assistant can performon behalf of the user**, include a function call for this action (e.g., `call the change_email_address function`), and ensure the function is defined with its purpose and required parameters. 

    - This action may not be explicitly defined in the help center article, but can be done to help the user resolve their inquiry faster 

- **The step prior to case resolution should always be to ask if there is anything more you can assist with**. 

- **End with a final action for case resolution**: calling the `case_resolution` function should always be the final step. 

4. **Ensure compliance** by making sure all steps adhere to company policies, privacy regulations, and legal requirements. 

5. **Handle exceptions or escalations** by specifying steps for scenarios that fall outside the standard policy. 

**Important**: If at any point you are uncertain, respond with "I don't know." 

Please convert the customer service policy into the formatted routine, ensuring it is easy to follow and execute programmatically. 

POLICY:

该提示词的中文翻译如下:

您是一位乐于助人的助手,负责将面向外部的帮助中心文章转换为面向内部的、针对 LLM 优化的、可编程执行的例程。

使用此例程的 LLM 将负责阅读政策、回答客户提出的问题,并帮助推动案件得到解决。

请遵循以下指示:

1. **仔细查看客户服务政策**,确保每一步都得到考虑。不要跳过任何步骤或政策,这一点至关重要。

2. **使用指定的格式,将说明组织成合乎逻辑的、循序渐进的顺序**。

3. **使用以下格式**:

- **主要操作已编号**(例如,1、2、3)。

- **子操作在其相关主要操作下以字母表示**(例如,1a、1b)。

**子操作应在新行开始**

- **使用清晰的“if...then...else”语句指定条件**(例如,“如果产品是在 30 天内购买的,则……”)。

- **对于需要客户提供更多信息的说明**,提供礼貌且专业的提示以询问更多信息。

- **对于需要外部系统数据的操作**,编写一个步骤来调用函数,并使用反引号作为函数名称(例如,`调用 check_delivery_date 函数`)。

- **如果某个步骤需要客户服务代理采取行动**(例如处理退款),请为此操作生成函数调用(例如,`调用 process_refund 函数`)。

- **定义任何新功能**,提供其用途和所需参数的简要说明。

- **如果助理可以代表用户执行某项操作**,请包含此操作的函数调用(例如,`调用 change_email_address 函数`),并确保该函数已定义其目的和所需参数。

- 帮助中心文章中可能未明确定义此操作,但可以执行此操作以帮助用户更快地解决他们的查询

- **案例解决之前的步骤应始终是询问是否还有其他您可以提供帮助的内容**。

- **以案例解决的最终操作结束**:调用`case_resolution`函数应始终是最后一步。

4. **确保合规**,确保所有步骤均遵守公司政策、隐私法规和法律要求。

5. **处理异常或升级**,指定超出标准政策的场景的步骤。

**重要**:如果您在任何时候不确定,请回答“我不知道”。

请将客户服务政策转换为格式化的例程,确保其易于遵循和以编程方式执行。

政策:

在这个例子中,使用了 4 个 OpenAI 用户服务政策的文档来测试。由于篇幅所限,我们只分析三个,而且从 o1 与 GPT-4/4o 有重要差别的例子开始。

第一个例子

要转化的服务政策如下:

How can I set up prepaid billing?

How it works

Prepaid billing allows API users to pre-purchase usage. The credits you've bought will be applied to your monthly invoice. This means that any API usage you incur will first be deducted from the prepaid credits. If your usage exceeds the credits you've purchased, you'll then be billed for the additional amount.

Prepaid billing helps developers know what they are committing to upfront which can provide more predictability for budgeting and spend management. 

Setting up prepaid billing

If you're on a Monthly Billing plan, you may also choose to switch to prepaid billing and purchase credits upfront for API usage. 

- Go to your billing overview in your account settings

- Click "Start payment plan" (you may see variations like "Buy credits")

Note: If you previously had an arrears billing plan, you'll need to cancel this existing payment plan first.

- Choose the initial amount of credits you want to purchase. The minimum purchase is $5. The maximum purchase will be based on your trust tier.

- Confirm and purchase your initial amount of credits.

- Use auto-recharge to set an automatic recharge amount, which is the amount of credits that will be added to your account when your balance falls below a set threshold.

Please note that any purchased credits will expire after 1 year and they are non-refundable. 

After you’ve purchased credits, you should be able to start using the API. Note that there may be a couple minutes of delay while our systems update to reflect your credit balance.

Purchasing additional credits

Once you’ve consumed all your credits, your API requests will start returning an error letting you know you’ve hit your billing quota. If you’d like to continue your API usage, you can return to the billing portal and use the “Add to balance” button to purchase additional credits.

Delayed billing

Due to the complexity of our billing and processing systems, there may be delays in our ability to cut off access after you consume all of your credits. This excess usage may appear as a negative credit balance in your billing dashboard, and will be deducted from your next credit purchase.

该提示词的中文翻译如下:

如何设置预付费账单?

工作原理

预付费账单允许 API 用户预先购买使用量。您购买的信用额将计入您的月度发票。这意味着您产生的任何 API 使用量都将首先从预付费信用额中扣除。如果您的使用量超出了您购买的信用额,那么您将需要支付额外的金额。

预付费账单可以帮助开发人员了解他们预先承诺的内容,从而为预算和支出管理提供更多的可预测性。

设置预付费账单

如果您使用的是按月计费计划,您也可以选择切换到预付费账单并预先购买 API 使用信用额。

- 转到帐户设置中的账单概览

- 单击“开始付款计划”(您可能会看到“购买信用额”等变体)

注意:如果您之前有欠款账单计划,则需要先取消此现有付款计划。

- 选择您要购买的初始信用额。最低购买金额为 5 美元。最大购买量将取决于您的信任等级。

- 确认并购买您的初始信用额度。

- 使用自动充值设置自动充值金额,即当您的余额低于设定的阈值时将添加到您帐户的信用额度。

请注意,任何购买的信用额度将在 1 年后过期,并且不可退款。

购买信用额度后,您应该可以开始使用 API。请注意,我们的系统更新以反映您的信用余额时可能会有几分钟的延迟。

购买额外信用额度

一旦您用完了所有信用额度,您的 API 请求将开始返回错误,告知您已达到计费配额。如果您想继续使用 API,可以返回计费门户并使用“添加到余额”按钮购买额外信用额度。

延迟计费

由于我们的计费和处理系统的复杂性,在您用完所有信用额度后,我们可能会延迟切断访问。此超额使用量可能会在您的账单仪表板中显示为负信用余额,并将从您的下次信用购买中扣除。

略过 o1 的中间推理,最终输出:

1. `call check_billing_plan(user_id)`

- **Function:** `check_billing_plan(user_id)`

- **Purpose:** Retrieves the user's current billing plan (e.g., Monthly Billing, Prepaid Billing, or Arrears Billing).

- **Parameters:**

- `user_id`: The unique identifier of the user.

2. If the user has an arrears billing plan:

2a. Inform the user: "Please note that since you have an arrears billing plan, you'll need to cancel your existing payment plan before switching to prepaid billing. Would you like assistance with cancelling your current plan?"

2b. If the user agrees, `call cancel_payment_plan(user_id)`

- **Function:** `cancel_payment_plan(user_id)`

- **Purpose:** Cancels the user's current arrears billing plan.

- **Parameters:**

- `user_id`: The unique identifier of the user.

3. Guide the user to set up prepaid billing:

3a. Instruct the user: "Please go to your billing overview in your account settings."

3b. Instruct the user: "Click on 'Start payment plan' (you may see variations like 'Buy credits')."

3c. Inform the user: "Choose the initial amount of credits you want to purchase. The minimum purchase is $5, and the maximum purchase will be based on your trust tier."

3d. Instruct the user: "Confirm and purchase your initial amount of credits."

3e. Suggest to the user: "You can set up auto-recharge to automatically add credits to your account when your balance falls below a set threshold."

4. Inform the user about credit expiration and refund policy:

4a. Inform the user: "Please note that any purchased credits will expire after 1 year and they are non-refundable."

5. Inform the user about activation time:

5a. Inform the user: "After you’ve purchased credits, you should be able to start using the API. Note that there may be a couple of minutes delay while our systems update to reflect your credit balance."

6. Ask the user: "Is there anything else I can assist you with today?"

7. If the user has no further questions, `call case_resolution()`

- **Function:** `case_resolution()`

- **Purpose:** Marks the case as resolved and ends the interaction

以及 GPT-4 和 GPT-4o 的结果如下:

GPT-4 的结果:https://chatgpt.com/share/66f27544-9ab4-8011-9e1c-63b8bd4999e9

GPT-4o 的结果:https://chatgpt.com/share/66f27473-eac0-8011-98b5-62269b683aa3

排除提示词中“require data from external systems”的描述没有清楚地定义什么是外部系统,可能是一个导致 GPT-4/4o 和 o1 生成不同输出的原因,总体来说 o1 都表现得更“积极”地设计动作(action)和对应的函数(function)。

GPT-4 和 4o(在第 1 步)没能正确处理“arrears billing plan”的情况。o1 表现出了更强的逻辑能力,它不仅仅根据提示词的文字描述顺序来转化为输出,而是根据整个输入重新组织了逻辑。这一点在《全面测试 OpenAI o1:逻辑推理天衣无缝,空间推理一碰就碎?》中分配座位的测试也得到了体现。

第二个例子

另一个要转化的服务政策如下:

How do I delete my payment method?

Updated over a week ago

We keep your payment method on file to cover any outstanding charges on your account. To stop charges to your payment method, please follow the steps below.

## ChatGPT 

You can cancel your ChatGPT Plus subscription to stop further charges at any time: 

Click on 'My Plan' in the ChatGPT sidebar. 

Click on 'Manage my subscription' in the pop-up window.

Select 'Cancel Plan'. 

Please note that your cancellation will take effect the day after the next billing date, and you can continue using our services until then. To avoid being charged for your next billing period, please cancel your subscription at least 24 hours before your next billing date. 

## API

We'll need to keep a payment method on file to account for any outstanding usage costs. You're welcome to cancel your pay-as-you-go service, by clicking 'Cancel paid account' in your billing overview. After the current month's invoice has been issued, the current card will no longer be charged. 

If you'd like to continue using the service, add a new payment method in the billing overview page and select 'Set as default'. You'll then be able to delete the old payment method.

中文翻译如下:

如何删除我的付款方式?

一周前更新

我们会保留您的付款方式,以支付您帐户中未支付的费用。要停止向您的付款方式收费,请按照以下步骤操作。

## ChatGPT

您可以随时取消 ChatGPT Plus 订阅以停止进一步收费:

单击 ChatGPT 侧栏中的“我的计划”。

单击弹出窗口中的“管理我的订阅”。

选择“取消计划”。

请注意,您的取消将在下一个结算日期后的第二天生效,在此之前您可以继续使用我们的服务。为避免在下一个结算期被收费,请在下一个结算日期前至少 24 小时取消订阅。

## API

我们需要保留付款方式,以支付任何未支付的使用费用。欢迎您通过单击账单概览中的“取消付费帐户”来取消您的即用即付服务。本月的账单开具后,当前卡将不再扣款。

如果您想继续使用该服务,请在账单概览页面中添加新的付款方式,然后选择“设为默认”。然后,您就可以删除旧的付款方式。

o1 的最终输出如下

1. Verify the customer's account.

a. Politely ask the customer for their email address or account ID to locate their account.

b. `call the verify_customer_account(email_or_account_id)`.

2. Verify the customer's identity.

a. Politely ask the customer to provide security information to confirm their identity (e.g., the last four digits of the payment method on file).

b. `call the verify_customer_identity(account_id, security_information)`.

c. If the customer's identity cannot be verified, then:

    - Inform the customer that we are unable to proceed without identity verification for security reasons.

    - Provide guidance on how they can verify their identity.

    - Proceed to step 6.

3. Determine the customer's account type.

a. `call the check_account_type(account_id)`.

4. If the customer is a ChatGPT Plus subscriber, then:

a. Ask the customer if they would like assistance with canceling their ChatGPT Plus subscription.

b. If the customer agrees, then:

    - `call the cancel_subscription(account_id)`.

    - Inform the customer that their subscription has been canceled and the cancellation will take effect the day after the next billing date.

    - Remind the customer that they can continue using our services until then.

c. Else:

    - Provide the following steps for the customer to cancel their subscription:

    - Click on **'My Plan'** in the ChatGPT sidebar.

    - Click on **'Manage my subscription'** in the pop-up window.

    - Select **'Cancel Plan'**.

    - Inform the customer about the cancellation effective date and continued access until then.

    - Advise the customer to cancel at least 24 hours before their next billing date to avoid being charged for the next billing period.

5. Else if the customer is an API user, then:

a. Inform the customer that we need to keep a payment method on file to account for any outstanding usage costs.

b. Ask the customer if they would like assistance with canceling their pay-as-you-go service.

c. If the customer agrees, then:

    - `call the cancel_paid_account(account_id)`.

    - Inform the customer that after the current month's invoice has been issued, the current card will no longer be charged.

d. Else:

    - Provide the following steps for the customer to cancel their pay-as-you-go service:

    - Go to the **billing overview** page.

    - Click on **'Cancel paid account'**.

    - Inform the customer that after the current month's invoice has been issued, the current card will no longer be charged.

e. If the customer wants to continue using the service but change the payment method:

    - Ask the customer if they would like assistance with adding a new payment method and setting it as default.

    - If the customer agrees:

    - Politely request the new payment method details.

    - `call the add_payment_method(account_id, payment_details)`.

    - `call the set_default_payment_method(account_id, new_payment_method_id)`.

    - `call the delete_payment_method(account_id, old_payment_method_id)`.

    - Inform the customer that the old payment method has been deleted and the new one is set as default.

- Else:

    - Instruct the customer to add a new payment method in the billing overview page.

    - Ask them to select **'Set as default'** for the new payment method.

    - Inform them that they can then delete the old payment method.

6. Ask the customer if there is anything else you can assist them with.

7. `call the case_resolution()`.

---

**Function Definitions:**

- `verify_customer_account(email_or_account_id)`: Verifies the customer's account using their email address or account ID. 

**Parameters:** `email_or_account_id`

- `verify_customer_identity(account_id, security_information)`: Confirms the customer's identity using security information. 

**Parameters:** `account_id`, `security_information`

- `check_account_type(account_id)`: Retrieves the customer's account type (ChatGPT Plus subscriber or API user). 

**Parameters:** `account_id`

- `cancel_subscription(account_id)`: Cancels the ChatGPT Plus subscription for the customer. 

**Parameters:** `account_id`

- `cancel_paid_account(account_id)`: Cancels the API pay-as-you-go service for the customer. 

**Parameters:** `account_id`

- `add_payment_method(account_id, payment_details)`: Adds a new payment method to the customer's account. 

**Parameters:** `account_id`, `payment_details`

- `set_default_payment_method(account_id, payment_method_id)`: Sets a payment method as the default for the customer. 

**Parameters:** `account_id`, `payment_method_id`

- `delete_payment_method(account_id, payment_method_id)`: Deletes a payment method from the customer's account. 

**Parameters:** `account_id`, `payment_method_id`

- `case_resolution()`: Resolves the case and marks it as completed.

以及 GPT-4 和 GPT-4o 的结果如下:

GPT-4 的结果:https://chatgpt.com/share/66f27544-9ab4-8011-9e1c-63b8bd4999e9

GPT-4o 的结果:https://chatgpt.com/share/66f275d6-7678-8011-b7d8-9770e0ff4281

o1 的第 5 步其实是错的,只有第 4 步判断不是 ChatGPT Plus 订户才会执行到这里。GPT-4 和 4o 则都会让非 plus 用户也可能选择做此处理。o1 在上例中表现出的根据输入整体上构造逻辑关系的过程在此把并行关系错误构造为二选一关系。

第三个例子

再分析一个要转化的服务政策:

How can I get a Business Associate Agreement (BAA) with OpenAI?

Information about HIPAA compliance for healthcare companies

The Health Insurance Portability and Accountability Act (HIPAA) is a U.S. federal law that requires privacy and security protections for protected health information (PHI). Our API platform can be a great fit for any covered entity or business associate looking to process protected health information, and we’d be happy to assist you in fulfilling your HIPAA compliance. To use our API platform, you’ll first need a BAA with OpenAI.

How do I get started?

If you require a BAA before you can use our API, email us at baa@openai.com with details about your company and use case.

Our team will respond within 1-2 business days. We review each BAA request on a case-by-case basis and may need additional information. The process is usually completed within a few business days.

Can I get a BAA for ChatGPT?

If you're interested in exploring a BAA for ChatGPT Enterprise, please contact sales.

What happens if I’m not approved?

We are able to approve most customers that request BAAs, but occasionally a use case doesn’t pass our team's evaluation. In that case, we’ll give feedback and context as to why that is and give you the opportunity to update your intended use of our API and re-apply.

Are all API services covered by the BAA?

No, only endpoints that are eligible for zero retention are covered by the BAA. You can see a list of those endpoints. 

Is an enterprise agreement requirement to sign a BAA?

No, an enterprise agreement is not required to sign a BAA.

中文翻译:

如何获得与 OpenAI 的业务伙伴协议 (BAA)?

有关医疗保健公司遵守 HIPAA 的信息

《健康保险流通与责任法案》(HIPAA) 是美国联邦法律,要求对受保护的健康信息 (PHI) 进行隐私和安全保护。我们的 API 平台非常适合任何希望处理受保护健康信息的受保实体或业务伙伴,我们很乐意协助您实现 HIPAA 合规性。要使用我们的 API 平台,您首先需要与 OpenAI 签订 BAA。

我如何开始?

如果您在使用我们的 API 之前需要 BAA,请发送电子邮件至 baa@openai.com,提供有关您的公司和用例的详细信息。

我们的团队将在 1-2 个工作日内回复。我们会逐案审查每个 BAA 请求,可能需要更多信息。该过程通常在几个工作日内完成。

我可以获得 ChatGPT 的 BAA 吗?

如果您有兴趣了解 ChatGPT Enterprise 的 BAA,请联系销售人员。

如果我未获批准会怎样?

我们能够批准大多数请求 BAA 的客户,但偶尔会有用例未通过我们团队的评估。在这种情况下,我们会提供反馈和背景信息,说明原因,并让您有机会更新我们 API 的预期用途并重新申请。

BAA 是否涵盖所有 API 服务?

不,只有符合零保留条件的端点才受 BAA 的覆盖。您可以查看这些端点的列表。

签署 BAA 是否需要企业协议?

不,签署 BAA 不需要企业协议。

o1 的最终输出如下:

1. Thank the customer and ask for clarification:

a. "Thank you for reaching out! Could you please specify whether you require a Business Associate Agreement (BAA) for using our API or for ChatGPT Enterprise?"

2. If the customer requires a BAA for the API, then:

a. Inform the customer: "To obtain a BAA for our API, please email baa@openai.com with details about your company and use case."

b. Inform the customer: "Our team will respond within 1-2 business days."

c. Inform the customer: "We review each BAA request on a case-by-case basis and may need additional information."

d. Inform the customer: "The process is usually completed within a few business days."

e. Inform the customer: "Please note that only endpoints eligible for zero data retention are covered by the BAA."

i. Call the `provide_list_of_zero_retention_endpoints` function.

f. Inform the customer: "An enterprise agreement is not required to sign a BAA."

3. If the customer requires a BAA for ChatGPT Enterprise, then:

a. Inform the customer: "To explore a BAA for ChatGPT Enterprise, please contact our sales team."

b. Call the `provide_sales_contact_information` function.

4. If the customer is not approved, then:

a. Inform the customer: "We are able to approve most customers that request BAAs, but occasionally a use case doesn't pass our team's evaluation."

b. Inform the customer: "In that case, we'll provide feedback and context as to why and give you the opportunity to update your intended use of our API and re-apply."

5. Ask the customer if there is anything else you can assist with:

a. "Is there anything else I can assist you with today?"

6. Call the `case_resolution` function.

---

**Function Definitions:**

- `provide_list_of_zero_retention_endpoints`:

- **Purpose**: Provides the customer with a list of API endpoints that are eligible for zero data retention under the BAA.

- **Parameters**: None.

- `provide_sales_contact_information`:

- **Purpose**: Provides the customer with contact information to reach our sales team for ChatGPT Enterprise inquiries.

- **Parameters**: None.

- `case_resolution`:

- **Purpose**: Finalizes the case and marks it as resolved.

- **Parameters**: None.

以及 GPT-4 和 GPT-4o 的结果如下:

GPT-4 的结果:https://chatgpt.com/share/66f27544-9ab4-8011-9e1c-63b8bd4999e9

GPT-4o 的结果:https://chatgpt.com/share/66f27395-4ef8-8011-80a3-5a4034d1e43a

2i,3i 序号没有严格按照字母顺序递增,GPT-4/4o 更准确。这两个步骤是“主动”增加的,有可能是增加思维链推理的功劳。我们也尝试了多次重复生成,这两个步骤也会消失,改成其他的动作如代发邮件 `call the send_baa_request_email`。

与前两个政策的转化结果一起对比,还可以发现 o1 的第一次转化是将每个函数的定义描述紧跟着第一次使用之处,其他几次则统一放在最后。虽然提示词中的指示没有明确要求统一放在最后,但三个模型多数时候都遵循了这种“约定俗成”,只是 o1 的变化性似乎更大一些。

值得注意的是,所有这些测试中,这条指示都没有被三个模型执行:**Important**: If at any point you are uncertain, respond with "I don't know."

不同于数学推理,对这样一个真实场景的复杂任务,o1 没有体现出相对于 GPT-4/4o 的非常大的优势。如果说 o1 确实是使用更多的数学和代码方面的思维链来训练,结合推理时生成更高质量的思维链,那么可以解释他对输入进行更好的整体把握和逻辑梳理,更多的代码化的动作和函数的提炼。在这个方向上继续提高的话,对软件开发是有价值的,可以从以下两个方面来看。

d6d75cd574ab48e16b42c4babfd40b5a.png

软件设计时的推理

Reasoning at Software Design Time

一个典型的软件开发过程的需求分析和设计过程可以如下图 2 所示:

9aa5babf4a19287df09da36f670eb613.png

图 2  o1 的设计时(Design-Time)

将人类的知识或手工作业的过程变成自动化的软件的第一步就是将模糊和形式多样的事物描述尽可能转化为清晰无歧义的一致形式。正如上文的用户服务政策转化任务。

多种思维工具可以用于实现和这样的转化,包括:

ⓐ 清晰和统一地定义术语和概念,比如领域驱动设计(DDD)中的 Ubiquitous Language[1] 的形成。

ⓑ 将复杂的问题分而治之,比如领域驱动设计(DDD)中 Bounded Contexts[2] 的形成。

ⓒ 逻辑关系的形成,包括各种条件判断、边界条件等。

ⓓ 各种概念的外延扩展,更多情况的预判等。

前文的提示词中的转化指示部分运用了这些思维工具,但显然还可以借鉴软件工程中的相关经验来进一步改进。但从 o1 的表现来看,大模型的不确定性仍然难以消除。如果后续的环节仍需要人类参与,就是现在得到了比较广泛应用的 Copilot 模式,对整体个软件开发生命周期的帮助有限。如果后续的环节完全由大模型执行,那么每次的不确定性累积,最后的正确率难以满足实用要求。

所以我们在《大语言模型需要自己的语言吗?》中提出大模型输出的转化结果应该尽可能便于软件自动验证。这样的软件验证包括正则表达式、代码编译和解释器、确定性非常高的 NLP 工具,如 Duckling[3] 和 CoreNLP[4] 等。

[1]: https://martinfowler.com/bliki/UbiquitousLanguage.html 

[2]: https://martinfowler.com/bliki/BoundedContext.html

[3]: https://github.com/facebook/duckling

[4]: https://github.com/stanfordnlp/CoreNLP

客服系统也是现在的 Agent(智能体)经常实现的场景(见图 3)。

681d15bb1d25b7ce6b522d695717f492.png

图 3  客服系统中 o1 的设计时(Design-Time)

简单的做法是把客服政策文档放进向量和图(Graph)数据库,在大模型推理时用 RAG 模式为对话提供相关的信息。如果客服政策很复杂,那么用大模型根据用户理解意图来展开不同的针对性处理流程,实现所谓 Agent 的计划制定和执行(Planning)。

实际使用中,大模型制定的计划经常存在问题,或者不能成功地执行。o1 的思路继续发展出更好的推理(reasoning)能力应该可以减少这样的问题。同时开发者在 Agent 中越来越多地编写代码来固定地执行已知的标准操作程序(SOP),在大模型的模糊和灵活能力之间寻找平衡。以下我们讨论一个动态实现平衡的思路。

f3ec84c5072d8aafdad71af8cc0f8e2f.png

软件运行时的推理

Reasoning at Software Run Time

传统软件固化为代码后获得确定性与执行效率(下图 4 的深蓝色箭头),但是丧失了灵活性,比如旅行规划软件希望用户输入目的地城市但获得一个活动名称,如“火人节”。

ebed035e8a8d7cd2adb2a001dc3226db.png

图 4  o1 的运行时(Run-Time)

在运行时引入大模型来处理信息和执行流程的 Agent,很大程度上增加了软件的灵活性,但是也大大增加了不确定性。图 4 的浅蓝色虚线箭头描绘了另一个路径,利用大模型不断提高的推理和代码生成能力为新遇到的情况生成软件实现。这样的生成即可以在线进行,也可以离线分析软件运行中的案例特别是错误日志时进行。根据具体情况,人类对此过程的监督也是必要的,各种辅助监督的工具也会应运而生。

我们相信,这种软件的动态自修改应该是大模型为软件开发带来的最大机会之一。

44dff62abdae1485aa51c8b567bec44b.gif

大模型刷新一切,让我们有着诸多的迷茫,AI 这股热潮究竟会推着我们走向何方?面对时不时一夜变天,焦虑感油然而生,开发者怎么能够更快、更系统地拥抱大模型?《新程序员 007》以「大模型时代,开发者的成长指南」为核心,希望拨开层层迷雾,让开发者定下心地看到及拥抱未来。

读过本书的开发者这样感慨道:“让我惊喜的是,中国还有这种高质量、贴近开发者的杂志,我感到非常激动。最吸引我的是里面有很多人对 AI 的看法和经验和一些采访的内容,这些内容既真实又有价值。”

能学习到新知识、产生共鸣,解答久困于心的困惑,这是《新程序员》的核心价值。欢迎扫描下方二维码订阅纸书和电子书。

a2acb2110a4dfa7ac9dff237c38f7d3e.jpeg

评论 10
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值