Claude 3.7 Sonnet In-Depth Review: The World's Strongest Coding Model and the First Hybrid Reasoning Model

佛州小李哥

Article link: https://blog.csdn.net/m0_66628975/article/details/145866308

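As a leader in the cloud computing industry, Amazon Web Services (AWS) keeps expanding its lineup of foundation models as generative AI develops rapidly. Shortly after launching DeepSeek-R1, AWS has now made Anthropic's Claude 3.7 Sonnet foundation model officially available on Amazon Bedrock. Claude 3.7 Sonnet is Anthropic's most powerful model to date. It stands out for two things: it is the first model in the Claude family with hybrid reasoning, meaning it can use extended thinking to solve complex problems through careful, step-by-step reasoning, and it offers state-of-the-art code generation. AWS has also added Claude 3.7 Sonnet to the list of models used by Amazon Q Developer. Because Amazon Q is integrated with Bedrock, developers can pick the most suitable model for a given coding task, such as Claude 3.7 Sonnet, and generate complete, complex code projects in one step, which helps speed up the entire software development lifecycle.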

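Key features and capabilities of Claude 3.7 Sonnet

The first hybrid reasoning model

Claude 3.7 Sonnet takes a new approach to how a model thinks. Instead of using two separate models, one for quick answers and another for complex problems, Claude 3.7 Sonnet integrates reasoning into a single model. This is closer to how the human brain works: whether we answer a simple question or work through a hard problem, we use the same brain.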

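The model has two modes, standard and extended thinking, and developers can switch between them in Amazon Bedrock. In standard mode, Claude 3.7 Sonnet is simply an upgraded version of Claude 3.5 Sonnet. In extended thinking mode, it reasons differently: it takes more time to analyze the problem in detail, plan a solution, and consider multiple perspectives before answering, which further improves performance. You can control speed and cost by turning reasoning on or off. Tokens produced during extended thinking count toward the context window and are billed as output tokens.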
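The switch between the two modes can be sketched as a small helper that builds the extra request fields for the Amazon Bedrock Converse API. This is only a sketch: the function name `build_reasoning_fields` and its default budget are my own assumptions, while the `thinking` field layout mirrors the API example later in this article.

```python
from typing import Any, Dict


def build_reasoning_fields(extended: bool, budget_tokens: int = 2000) -> Dict[str, Any]:
    """Return extra request fields for standard or extended thinking mode.

    Hypothetical helper: the result is meant to be passed as
    additionalModelRequestFields when calling the Converse API.
    """
    if not extended:
        # Standard mode: no thinking configuration is sent at all.
        return {}
    if budget_tokens <= 0:
        raise ValueError("budget_tokens must be positive")
    # Extended thinking mode: the model may spend up to budget_tokens on reasoning.
    return {"thinking": {"type": "enabled", "budget_tokens": budget_tokens}}


if __name__ == "__main__":
    print(build_reasoning_fields(extended=False))
    print(build_reasoning_fields(extended=True, budget_tokens=4000))
```

In standard mode the extra fields are empty, so the parameter can simply be left out of the request.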

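The world's strongest coding model

Claude 3.7 Sonnet is at the forefront of coding ability and excels at understanding context and solving problems creatively. According to Anthropic, the model achieves an industry-leading 70.3% on SWE-bench Verified in standard mode. Compared with Claude 3.5 Sonnet, Claude 3.7 Sonnet performs better on most benchmarks. These enhanced capabilities make it well suited to powering AI agents and complex workflows.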

Figure: Claude 3.7 Sonnet benchmarks

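Over 15x longer output than the previous generation

Compared with Claude 3.5 Sonnet, the model can produce much longer output. This is especially useful when a request asks for more detail, more examples, or additional background. To generate long-form content, try asking for a detailed outline first (for writing tasks, the outline can go down to the paragraph level and include word-count targets), then ask the model to index each paragraph of its response to the outline, and restate the word-count requirement so the reply stays within range. Claude 3.7 Sonnet supports up to 128K output tokens (64K is generally available; 128K is in beta).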
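The outline-first approach can be sketched as two prompt builders, one per step. The function names, the prompt wording, and the sample outline are illustrative assumptions of mine, not prompts recommended by Anthropic or AWS.

```python
from typing import List, Tuple


def outline_prompt(topic: str, total_words: int) -> str:
    """Step 1: ask for a paragraph-level outline with word-count targets."""
    return (
        f"Write a detailed outline for a piece about {topic}. "
        "Break it down to the paragraph level and give each paragraph "
        f"a word-count target. The targets should add up to about {total_words} words."
    )


def drafting_prompt(outline: List[Tuple[str, int]]) -> str:
    """Step 2: ask for the full text, indexed to the outline entries."""
    lines = [
        f"{i}. {title} (about {words} words)"
        for i, (title, words) in enumerate(outline, start=1)
    ]
    total = sum(words for _, words in outline)
    return (
        "Write the full text following this outline. Number each paragraph "
        "to match the outline entry it covers.\n"
        + "\n".join(lines)
        # Restate the overall word count so the reply stays in range.
        + f"\nKeep the total length close to {total} words."
    )


if __name__ == "__main__":
    print(outline_prompt("hybrid reasoning models", 1500))
    print(drafting_prompt([("Introduction", 300), ("How it works", 1200)]))
```

The outline returned in step 1 would be parsed (or simply pasted) into step 2, so the model drafts against a structure it already agreed to.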

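Low-cost AI with a controllable budget

When using Claude 3.7 Sonnet in Amazon Bedrock, you control the budget the model spends on thinking. This lets you trade off speed, cost, and performance: allocate more tokens to reasoning for complex problems, or cap the token count when you need a fast response, and tune performance to your use case.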
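One way to picture this trade-off is a helper that maps a task tier to a thinking budget. The tier names and budget values below are made up for illustration, and the constraints encoded here (a minimum budget of 1,024 tokens, and a budget that stays below the maximum output tokens) are taken from my reading of Anthropic's extended thinking documentation, so treat them as assumptions to verify.

```python
# Assumed minimum thinking budget; check the current Anthropic documentation.
MIN_BUDGET_TOKENS = 1024

# Hypothetical tiers: fast answers, a middle ground, and deep reasoning.
BUDGET_BY_TIER = {"quick": 1024, "balanced": 4000, "deep": 16000}


def pick_thinking_budget(tier: str, max_tokens: int, answer_reserve: int = 1000) -> int:
    """Pick a thinking budget that leaves room for the visible answer."""
    if tier not in BUDGET_BY_TIER:
        raise ValueError(f"unknown tier: {tier}")
    # The budget must stay below max_tokens, so reserve space for the answer.
    ceiling = max_tokens - answer_reserve
    if ceiling < MIN_BUDGET_TOKENS:
        raise ValueError("max_tokens is too small for extended thinking")
    return min(BUDGET_BY_TIER[tier], ceiling)


if __name__ == "__main__":
    for tier in ("quick", "balanced", "deep"):
        print(tier, pick_thinking_budget(tier, max_tokens=8000))
```

The returned value would be used as `budget_tokens` in the `thinking` configuration, with `max_tokens` set through the request's inference configuration.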

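Hands-on evaluation of Claude 3.7 Sonnet

To access the Claude 3.7 model, first enable model access in the Amazon Bedrock console. In the navigation pane, choose Model access under Bedrock configurations. Then choose Modify model access to request access to Claude 3.7 Sonnet.

To start using Claude 3.7 Sonnet, I go to Playgrounds in the navigation pane and choose Chat / Text. I then choose Select model, pick Anthropic under Categories, and pick Claude 3.7 Sonnet under Models. To enable extended thinking mode, I toggle Model reasoning under Configurations. After entering the following prompt, I choose Run: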

You’re the manager of a small restaurant facing these challenges:

Three staff members called in sick for tonight’s dinner service
You’re expecting a full house (80 seats)
There’s a large party of 20 coming at 7 PM
Your main chef is available but two kitchen helpers are among those who called in sick
You have 2 regular servers and 1 trainee available

How would you:
Reorganize the available staff to handle the situation
Prioritize tasks and service
Determine if you need to make any adjustments to reservations
Handle the large party while maintaining service quality
Minimize negative impact on customer experience
Explain your reasoning for each decision and discuss potential trade-offs

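Here is the result; the image below shows the model's reasoning process.

To test image-to-text understanding, I uploaded an architectural floor plan created with Amazon Bedrock. After importing the image into Claude, I received a detailed analysis of the floor plan along with reasoned insights.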
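The same kind of image test can be run through the Converse API instead of the console. The sketch below only builds the user message; the helper name `build_image_message`, the file name in the usage note, and the question text are my own assumptions, while the `image` content block layout follows the Converse API message format as I understand it.

```python
from typing import Any, Dict

# Image formats accepted by the Converse API image block, to my knowledge.
SUPPORTED_FORMATS = {"png", "jpeg", "gif", "webp"}


def build_image_message(image_bytes: bytes, image_format: str, question: str) -> Dict[str, Any]:
    """Build a user message that pairs an image with a text question."""
    if image_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported image format: {image_format}")
    return {
        "role": "user",
        "content": [
            # Raw bytes are passed directly; boto3 handles the encoding.
            {"image": {"format": image_format, "source": {"bytes": image_bytes}}},
            {"text": question},
        ],
    }


if __name__ == "__main__":
    message = build_image_message(b"\x89PNG", "png", "Describe this floor plan.")
    print(message["content"][1]["text"])
```

In practice the bytes would come from something like `open("floor_plan.png", "rb").read()` (a hypothetical file), and the message would be passed as `messages=[message]` to `client.converse`.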

Claude 3.7 Sonnet can also be accessed through the AWS SDK using the Amazon Bedrock API. For more details on its features and capabilities, visit the Anthropic's Claude in Amazon Bedrock product page.

Calling Claude 3.7 from code through the API for reasoning

"""
This example demonstrates how to use Anthropic Claude 3.7 Sonnet's reasoning capability
with Amazon Bedrock. It shows how to:
- Set up the Amazon Bedrock runtime client
- Create a message
- Configure reasoning parameters
- Send a request with reasoning enabled
- Process both the reasoning output and final response
"""

import boto3
from botocore.exceptions import ClientError

def reasoning_example():
    # Create the Amazon Bedrock runtime client
    client = boto3.client("bedrock-runtime", region_name="us-east-1")

    # Specify the model ID. For the latest available models, see:
    # https://docs.aws.amazon.com/bedrock/latest/userguide/models-supported.html
    model_id = "us.anthropic.claude-3-7-sonnet-20250219-v1:0"

    # Create the message with the user's prompt
    user_message = "Describe the purpose of a 'hello world' program in one line."
    conversation = [
        {
            "role": "user",
            "content": [{"text": user_message}],
        }
    ]

    # Configure reasoning parameters with a 2000 token budget
    reasoning_config = {
        "thinking": {
            "type": "enabled",
            "budget_tokens": 2000
        }
    }

    try:
        # Send message and reasoning configuration to the model
        response = client.converse(
            modelId=model_id,
            messages=conversation,
            additionalModelRequestFields=reasoning_config
        )

        # Extract the list of content blocks from the model's response
        content_blocks = response["output"]["message"]["content"]

        reasoning = None
        text = None

        # Process each content block to find reasoning and response text
        for block in content_blocks:
            if "reasoningContent" in block:
                reasoning = block["reasoningContent"]["reasoningText"]["text"]
            if "text" in block:
                text = block["text"]

        return reasoning, text

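    except Exception as e:
        # Catches botocore's ClientError (a subclass of Exception) and any other failure
        print(f"ERROR: Can't invoke '{model_id}'. Reason: {e}")
        # The bare exit() helper is not guaranteed in scripts; raise SystemExit instead
        raise SystemExit(1)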

if __name__ == "__main__":
    # Execute the example and display reasoning and final response
    reasoning, text = reasoning_example()
    print("\n<thinking>")
    print(reasoning)
    print("</thinking>\n")
    print(text)
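Because thinking tokens are billed as output tokens, it can help to look at the token usage that comes back with each response. The sketch below reads the `usage` and `stopReason` fields of a Converse response; the helper names are hypothetical, the prices are parameters you would fill in from the Amazon Bedrock pricing page, and I am assuming that thinking tokens are included in `outputTokens`.

```python
from typing import Any, Dict


def summarize_usage(response: Dict[str, Any]) -> Dict[str, Any]:
    """Extract token counts and the stop reason from a Converse response."""
    usage = response.get("usage", {})
    return {
        "input_tokens": usage.get("inputTokens", 0),
        # Assumption: thinking tokens are counted inside outputTokens.
        "output_tokens": usage.get("outputTokens", 0),
        "stop_reason": response.get("stopReason"),
    }


def estimate_cost(summary: Dict[str, Any],
                  input_price_per_million: float,
                  output_price_per_million: float) -> float:
    """Estimate request cost from token counts and caller-supplied prices."""
    return (
        summary["input_tokens"] * input_price_per_million
        + summary["output_tokens"] * output_price_per_million
    ) / 1_000_000


if __name__ == "__main__":
    # Stand-in response with made-up numbers, shaped like a Converse result.
    fake_response = {
        "usage": {"inputTokens": 1000, "outputTokens": 2000, "totalTokens": 3000},
        "stopReason": "end_turn",
    }
    summary = summarize_usage(fake_response)
    # Placeholder prices, not real Bedrock pricing.
    print(summary, estimate_cost(summary, 1.0, 2.0))
```

To use this with the example above, `reasoning_example` would need to return the full `response` as well, which it does not do as written.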

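The enhanced capabilities of Claude 3.7 Sonnet can deliver real-world value across industries. Enterprises can build advanced AI assistants and agents that interact directly with customers. In healthcare, it can assist with medical image analysis and research summarization; in finance, it can help solve complex financial modeling problems. For developers, it can act as a coding assistant that reviews code, explains technical concepts, and suggests improvements across multiple languages.

Anthropic's Claude 3.7 Sonnet is now available in the US East (N. Virginia), US East (Ohio), and US West (Oregon) Regions. Its pricing is the same as Claude 3.5 Sonnet; see the Amazon Bedrock pricing page for details. To get started with Claude 3.7 Sonnet in Amazon Bedrock, visit the Amazon Bedrock console and the Amazon Bedrock documentation.

Summary:

Overall, Claude 3.7 Sonnet, Anthropic's most powerful hybrid reasoning model to date and its strongest code generation model so far, combines fast responses with extended thinking and gives developers stronger coding support. Its controllable reasoning budget, longer output, and strong performance across tasks make it a good fit for quickly deploying advanced AI applications in many industry scenarios, bringing greater flexibility and efficiency to businesses and developers.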