[2504.09858] Reasoning Models Can Be Effective Without Thinking
1. Motivation
In recent years, large language models (LLMs) have improved substantially on reasoning tasks, largely by introducing an explicit, lengthy "thinking" process during generation. Before producing the final answer, these models first generate a series of intermediate reasoning steps, including self-reflection, backtracking, and verification. This reasoning ability is typically acquired through reinforcement learning or by fine-tuning on distilled reasoning traces. However, this approach substantially increases inference-time cost, both in token usage and in latency.
This paper questions and probes the necessity of the explicit "thinking" process in LLMs. Specifically, the authors try to answer the following key questions: on reasoning tasks, do models really need an explicit, long thinking process to produce high-quality solutions? If not, is there a more efficient way to achieve comparable reasoning performance?
2. NoThinking Provides Better Accuracy-budget Tradeoffs than Thinking
2.1. Method
Most modern reasoning models, such as R1 and R1-Distill-Qwen, follow a similar structure in their generation process: the reasoning process within the thinking box, marked by <|beginning of thinking|> and <|end of thinking|>, followed by the final answer. Based on this structure, we define the two methods (Thinking and NoThinking) as follows:
- Thinking: refers to the default method of querying the reasoning models to produce the following outputs: the reasoning process within the thinking box, the final solution, and the final answer (Figure 1 (blue)).
- NoThinking: refers to a method that bypasses the explicit reasoning process through prompting, directly generating the final solution and answer. This is achieved by forcing the thinking box to be empty during the decoding process (Figure 1 (orange)).
  For example: <|beginning of thinking|> Okay, I think I have finished thinking. <|end of thinking|>
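The two prompt formats above can be sketched in a few lines. This is a minimal illustration, not the paper's code: the delimiter strings follow the paper's notation, and real reasoning models (e.g. R1-Distill-Qwen) use their own tokenizer-specific special tokens.

```python
# Hypothetical delimiters, written as in the paper's notation.
THINK_START = "<|beginning of thinking|>"
THINK_END = "<|end of thinking|>"

def thinking_prompt(question: str) -> str:
    # Thinking (default): decoding starts right after the opening
    # delimiter, so the model fills the thinking box itself.
    return f"{question}\n{THINK_START}"

def nothinking_prompt(question: str) -> str:
    # NoThinking: pre-fill an effectively empty thinking box so the
    # model skips straight to the final solution and answer.
    fake_thoughts = "Okay, I think I have finished thinking."
    return f"{question}\n{THINK_START} {fake_thoughts} {THINK_END}"

print(nothinking_prompt("What is 2 + 2?"))
```

In practice the NoThinking prefix is appended to the decoded context before generation begins, so the model never produces any reasoning tokens of its own.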
The Budget Forcing technique
- To compare Thinking and NoThinking fairly, the experiments use budget forcing to control the token usage of both methods.
- Concretely, when the model reaches a preset token budget, it is forced to generate the "Final Answer".
- If the model hits the token budget while still inside the thinking block, the thinking block is closed first so the model exits thinking before emitting the final answer.
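The budget-forcing rule above can be sketched as a wrapper around the decoding loop. This is a toy sketch under assumed interfaces: `step_fn` is a hypothetical callback that returns one decoded token string per call, standing in for a real model's decoding step.

```python
THINK_END = "<|end of thinking|>"

def generate_with_budget(step_fn, budget: int):
    """Collect up to `budget` tokens from step_fn(); once the budget is
    exhausted, close the thinking box if it is still open and force the
    model into the final-answer phase."""
    tokens = []
    in_thinking = True
    for _ in range(budget):
        tok = step_fn()
        if tok == THINK_END:
            in_thinking = False  # model closed its own thinking box
        tokens.append(tok)
    if in_thinking:
        tokens.append(THINK_END)  # forcibly exit the thinking block
    tokens.append("Final Answer:")  # force the answer phase
    return tokens

# Toy token stream that never leaves the thinking box on its own.
stream = iter(["Let", "me", "think", "harder", "about", "this"])
out = generate_with_budget(lambda: next(stream), budget=3)
print(out)
```

A real implementation would instead append the forcing strings to the context and resume decoding, but the control flow (budget check, close thinking, force answer) is the same.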