[Artificial Intelligence] The Think and No_Think Mechanisms in Qwen3

Latest recommended article published 2025-05-02 10:27:22

林九生

Link to this article: https://blog.csdn.net/linjiuxiansheng/article/details/147626819

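When working with large language models (LLMs), balancing inference speed against output quality has long been a central concern for developers and users. Qwen3, the new generation of large language models from Alibaba, improves noticeably on performance and output quality and also introduces a way to control the model's behavior dynamically: the /think and /no_think tags, which let users switch the model's reasoning mode from turn to turn in a multi-turn conversation.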

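This article explains how the /think and /no_think mechanism works in Qwen3 and walks through a code example of its use in practice, to help developers understand and apply this feature.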

1. Overview of the Qwen3 Think and No_Think Mechanism

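Qwen3 provides a soft switch mechanism: by adding a /think or /no_think tag to an input (a user prompt or the system message), users can control the model's reasoning behavior dynamically. The mechanism is especially useful in multi-turn conversations, where the reasoning mode can be changed from one turn to the next to trade inference speed against output quality. The tags only take effect when thinking is enabled at the chat-template level (enable_thinking=True, the default); with enable_thinking=False the model ignores them and produces no thinking content.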

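/think: enables thinking (deep reasoning) mode. The model writes out its reasoning in a <think>...</think> block before the final answer; answers are usually of higher quality, especially on complex tasks, but responses take longer.
/no_think: disables thinking. The model answers directly and faster, which suits latency-sensitive scenarios, but answers to complex reasoning questions may be less accurate. While thinking is enabled at the template level, the model still emits a <think>...</think> block, but it is empty.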
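Because the soft switch is plain text inside a message, attaching it needs no special API call. The following is a minimal sketch, assuming messages in the {"role": ..., "content": ...} format that apply_chat_template accepts; the helper name with_switch and its mode argument are hypothetical and not part of transformers or any Qwen library.

```python
from typing import Optional


def with_switch(content: str, mode: Optional[str] = None, role: str = "user") -> dict:
    """Build a chat message, optionally appending a Qwen3 soft-switch tag.

    Hypothetical helper: mode is None (no tag), "think" or "no_think".
    """
    if mode is None:
        # No tag: the turn inherits whatever mode was in force before
        return {"role": role, "content": content}
    if mode not in ("think", "no_think"):
        raise ValueError(f"unknown mode: {mode!r}")
    # The soft switch is just text appended to the message body
    return {"role": role, "content": f"{content} /{mode}"}


messages = [
    with_switch("You are a helpful assistant.", "no_think", role="system"),
    with_switch("How many r's in strawberries?"),
]
print(messages[0]["content"])  # You are a helpful assistant. /no_think
```

The resulting list can be passed unchanged to tokenizer.apply_chat_template, since the tag never leaves the content string.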

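In a multi-turn conversation the model follows the most recent instruction: if the user adds /no_think in one turn, subsequent turns stay in non-thinking mode until the user switches back with /think, and vice versa.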
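The "most recent tag wins" rule can be simulated in plain Python to reason about which mode a given turn will run in. This is a hypothetical illustration only: Qwen3 learns this behavior, and no library function computes it. The function name effective_modes, the "think" default (mirroring thinking being on by default), and the choice that the later tag wins when one turn contains both are all assumptions.

```python
from typing import List


def effective_modes(user_turns: List[str], default: str = "think") -> List[str]:
    """Return the mode in force for each user turn under a 'latest tag wins' rule.

    Hypothetical simulation: a turn without a tag inherits the previous mode,
    and if a turn contains both tags, the later one is assumed to win.
    """
    mode = default
    modes = []
    for text in user_turns:
        # rfind returns -1 when absent; "/think" is not a substring of "/no_think"
        last_think = text.rfind("/think")
        last_no_think = text.rfind("/no_think")
        if last_think > last_no_think:
            mode = "think"
        elif last_no_think > last_think:
            mode = "no_think"
        # Equal positions only happen when both are -1: keep the previous mode
        modes.append(mode)
    return modes


turns = [
    "How many r's in strawberries?",
    "Then, how many r's in blueberries? /no_think",
    "And in raspberries?",
    "Really? /think",
]
print(effective_modes(turns))  # ['think', 'no_think', 'no_think', 'think']
```

The third turn carries no tag and therefore stays in the non-thinking mode set by the second turn, which is exactly the carry-over behavior described above.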

2. Code Example Walkthrough

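The example below defines a QwenChatbot class on top of Hugging Face transformers and shows how /think and /no_think tags control the model's behavior across the turns of a conversation.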

from transformers import AutoModelForCausalLM, AutoTokenizer

class QwenChatbot:
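    def __init__(self, model_name="Qwen/Qwen3-30B-A3B"):
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        # torch_dtype="auto" keeps the checkpoint's dtype; device_map="auto" (needs accelerate) places weights on available devices
        self.model = AutoModelForCausalLM.from_pretrained(
            model_name, torch_dtype="auto", device_map="auto"
        )
        self.history = []  # previous turns as {"role": ..., "content": ...} dicts

    def generate_response(self, user_input):
        # Any /think or /no_think tag is simply part of user_input
        messages = self.history + [{"role": "user", "content": user_input}]
        text = self.tokenizer.apply_chat_template(
            messages,
            tokenize=False,
            add_generation_prompt=True
        )
        # Move the input tensors to the device the model was placed on
        inputs = self.tokenizer(text, return_tensors="pt").to(self.model.device)
        output_ids = self.model.generate(**inputs, max_new_tokens=32768)
        # Keep only the newly generated tokens (drop the prompt)
        response_ids = output_ids[0][len(inputs.input_ids[0]):].tolist()
        response = self.tokenizer.decode(response_ids, skip_special_tokens=True)
        # Update the conversation history
        self.history.append({"role": "user", "content": user_input})
        self.history.append({"role": "assistant", "content": response})
        return response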

# Example usage
if __name__ == "__main__":
    chatbot = QwenChatbot()
    # First input: no /think or /no_think tag, so the default thinking mode is used
    user_input_1 = "How many r's in strawberries?"
    print(f"User: {user_input_1}")
    response_1 = chatbot.generate_response(user_input_1)
    print(f"Bot: {response_1}")
    print("----------------------")
    # Second input switches to non-thinking mode with /no_think
    user_input_2 = "Then, how many r's in blueberries? /no_think"
    print(f"User: {user_input_2}")
    response_2 = chatbot.generate_response(user_input_2)
    print(f"Bot: {response_2}")
    print("----------------------")
    # Third input switches back to thinking mode with /think
    user_input_3 = "Really? /think"
    print(f"User: {user_input_3}")
    response_3 = chatbot.generate_response(user_input_3)
    print(f"Bot: {response_3}")
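The decoded response returned by generate_response still includes the <think>...</think> block. A minimal sketch for separating reasoning from the final answer at the token level, following the approach of the Qwen3 model card, which cuts the generated ids at the last </think> token. The helper name split_at_think_end is hypothetical; the model card hard-codes 151668 as the id of </think>, but looking it up with tokenizer.convert_tokens_to_ids("</think>") avoids depending on that constant.

```python
from typing import List, Tuple


def split_at_think_end(ids: List[int], think_end_id: int) -> Tuple[List[int], List[int]]:
    """Split generated token ids into (thinking_ids, answer_ids).

    The cut is placed just after the LAST occurrence of think_end_id, so the
    </think> token stays with the thinking part. If the token never appears,
    everything is treated as the answer.
    """
    try:
        # Reverse search gives the last occurrence; convert it to a cut point
        cut = len(ids) - ids[::-1].index(think_end_id)
    except ValueError:
        cut = 0
    return ids[:cut], ids[cut:]


# Made-up ids for illustration; 9 stands in for the real </think> id
print(split_at_think_end([101, 102, 9, 201, 202], 9))  # ([101, 102, 9], [201, 202])
print(split_at_think_end([201, 202], 9))               # ([], [201, 202])

# Hypothetical use inside QwenChatbot.generate_response, after response_ids is computed:
#   think_end_id = self.tokenizer.convert_tokens_to_ids("</think>")
#   thinking_ids, answer_ids = split_at_think_end(response_ids, think_end_id)
#   answer = self.tokenizer.decode(answer_ids, skip_special_tokens=True).strip("\n")
```

Stripping the reasoning before storing history is optional in this example: according to the Qwen3 model card, its chat template already drops the thinking part of earlier assistant turns when rendering the prompt.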