Paper Reading: Reasoning Models Can Be Effective Without Thinking

[2504.09858] Reasoning Models Can Be Effective Without Thinking

1. Motivation

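In recent years, large language models (LLMs) have improved markedly on reasoning tasks, largely because they produce an explicit, lengthy "thinking" process during generation. Before giving a final answer, these models generate a series of intermediate thinking steps, including self-reflection, backtracking, and verification. This reasoning ability is typically acquired through reinforcement learning or through fine-tuning on distilled reasoning traces. However, the approach significantly increases inference-time compute cost, in the form of more tokens and higher latency.

The paper questions and examines whether the explicit "thinking" process in LLMs is necessary. Specifically, the authors ask: on reasoning tasks, does a model really need an explicit, long thinking process to produce high-quality solutions? If not, is there a more efficient way to reach similar reasoning performance?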

2. NoThinking Provides Better Accuracy-budget Tradeoffs than Thinking
2.1. Method

Most modern reasoning models, such as R1 and R1-Distill-Qwen, follow a similar structure in their generation process: the reasoning process within the thinking box, marked by <|beginning of thinking|> and <|end of thinking|>, followed by the final answer. Based on this structure, we define the two methods (Thinking and NoThinking) as follows:

  • Thinking: refers to the default method of querying the reasoning models to produce the following outputs: the reasoning process within the thinking box, the final solution, and the final answer (Figure 1 (blue)).
  • NoThinking: refers to a method that bypasses the explicit reasoning process through prompting, directly generating the final solution and answer. This is achieved by forcing the thinking box to be empty during the decoding process (Figure 1 (orange)).

For example, the prefilled thinking box looks like: <|beginning of thinking|> Okay, I think I have finished thinking. <|end of thinking|>
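As a rough illustration of the difference between the two methods, the sketch below builds the text that the model is asked to continue. The tag strings are copied from the notation in this post; real models use their own special tokens and chat templates, so the constants and the helper names (`build_thinking_prefix`, `build_nothinking_prefix`, `build_prompt`) are assumptions made for illustration, not the paper's code.

```python
# Placeholder tag strings, following the notation used in this post.
# Real reasoning models define their own special tokens.
BEGIN_THINK = "<|beginning of thinking|>"
END_THINK = "<|end of thinking|>"

# Dummy content placed inside the thinking box for NoThinking.
NOTHINKING_FILLER = "Okay, I think I have finished thinking."


def build_thinking_prefix() -> str:
    """Thinking: open the thinking box and let the model fill it in."""
    return BEGIN_THINK + "\n"


def build_nothinking_prefix() -> str:
    """NoThinking: prefill an already closed thinking box, so decoding
    continues directly with the final solution and answer."""
    return f"{BEGIN_THINK}\n{NOTHINKING_FILLER}\n{END_THINK}\n"


def build_prompt(question: str, no_thinking: bool) -> str:
    """Concatenate the question with the prefix for the chosen method."""
    prefix = build_nothinking_prefix() if no_thinking else build_thinking_prefix()
    return f"{question}\n{prefix}"


if __name__ == "__main__":
    print(build_prompt("What is 12 * 7?", no_thinking=True))
```

The only difference between the two prompts is whether the thinking box is left open or handed to the model already closed; everything else about decoding stays the same.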

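Budget Forcing

  • To compare Thinking and NoThinking fairly in the experiments, the authors use budget forcing to control the number of tokens each method consumes.
  • Specifically, when the model reaches the preset token budget, it is forced to produce the "Final Answer".
  • If the model reaches the token budget while still inside the thinking block, …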
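The budget rule can be sketched as a small decoding loop. This is a minimal sketch under stated assumptions, not the authors' implementation: the model is abstracted as a hypothetical `next_token` callable, the cue string `FINAL_ANSWER_CUE` and the `answer_tokens` allowance are illustrative choices, and the sketch covers only the rule that a final answer is forced once the budget is reached.

```python
from typing import Callable, List

# Assumed cue text appended when the budget runs out (illustrative only).
FINAL_ANSWER_CUE = "Final Answer:"


def generate_with_budget(
    next_token: Callable[[List[str]], str],  # hypothetical model interface
    prompt_tokens: List[str],
    budget: int,
    answer_tokens: int = 16,  # assumed allowance for the forced answer
    eos: str = "<eos>",
) -> List[str]:
    """Decode up to `budget` tokens. If the model has not stopped by then,
    append the final-answer cue and decode a short answer after it."""
    output: List[str] = []

    # Normal decoding, capped at the token budget.
    while len(output) < budget:
        tok = next_token(prompt_tokens + output)
        if tok == eos:
            return output  # finished within budget, nothing to force
        output.append(tok)

    # Budget reached: force the model to move on to its final answer.
    output.append(FINAL_ANSWER_CUE)
    for _ in range(answer_tokens):
        tok = next_token(prompt_tokens + output)
        if tok == eos:
            break
        output.append(tok)
    return output
```

Applying the same `budget` to both Thinking and NoThinking prompts is what makes their accuracy comparable at a matched token count.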