Save 20% on costs with smarter cloud resource operations, powered by Amazon Bedrock Agent!

亚马逊云开发者 (Amazon Cloud Developers)

Published 2024-09-25 15:41:45

Tags: operations, artificial intelligence, big data

Original link: https://mp.weixin.qq.com/s?__biz=Mzg4NjU5NDUxNg==&mid=2247581168&idx=2&sn=48d0c7dfdefd4cd62e802f4123258628&chksm=ce0f07658c8f63ccfd17b4d98edfc7b398d930048a783120d56795d493fd5205ac4b42c62691&scene=126&sessionid=0

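Business background

This article uses a customer workload that runs on Amazon EBS volumes as a case study. A large share of the customer's Amazon EBS volumes are still of the gp2 type. Analyzing the fleet together with the customer, we found that converting these volumes to gp3 would save 20% of the cost.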
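As a rough illustration of where a figure like 20% can come from, the sketch below compares the monthly storage cost of the same capacity on gp2 and gp3. The per-GB prices are assumptions (us-east-1 list prices of 0.10 and 0.08 USD per GB-month; check the current pricing for your region), the function names are made up for this example, and the calculation ignores any IOPS or throughput provisioned above the gp3 baseline.

```python
# Assumed list prices in USD per GB-month (us-east-1); verify for your region.
GP2_PRICE_PER_GB = 0.10
GP3_PRICE_PER_GB = 0.08


def monthly_storage_cost(size_gb, price_per_gb):
    """Monthly storage cost for one volume, ignoring provisioned IOPS/throughput."""
    return size_gb * price_per_gb


def savings_ratio(size_gb):
    """Fraction of the gp2 storage cost saved by moving the volume to gp3."""
    gp2 = monthly_storage_cost(size_gb, GP2_PRICE_PER_GB)
    gp3 = monthly_storage_cost(size_gb, GP3_PRICE_PER_GB)
    return (gp2 - gp3) / gp2


if __name__ == "__main__":
    # Hypothetical 1024 GB volume
    print(f"{savings_ratio(1024):.0%}")
```

Because both prices scale linearly with size, the ratio is the same for every volume under these assumptions; only extra provisioned performance would change it.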

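Technology selection

The customer has more than 200 Amazon EBS volumes, the largest of which exceeds 1 TB. Although the underlying mechanism of an Amazon EBS type change is designed to be transparent to the applications on top, for safety and stability we advised the customer to work in batches: tag each Amazon EBS volume by attributes such as which workload uses it and which project it belongs to, then convert roughly 10 volumes per day.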
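The batching idea can be sketched as a small selection function. This is a minimal illustration rather than the customer's actual tooling: the function name `select_batch` and the `Project` tag used in the usage note are hypothetical, and `volumes` is assumed to be a list of dicts shaped like the `Volumes` entries returned by EC2 DescribeVolumes.

```python
def select_batch(volumes, tag_key, tag_value, batch_size=10):
    """Pick up to batch_size gp2 volumes that carry the given tag.

    `volumes` is a list of dicts shaped like the entries of the
    `Volumes` list returned by EC2 DescribeVolumes.
    """
    batch = []
    for volume in volumes:
        if volume.get("VolumeType") != "gp2":
            continue  # already converted, or a different volume type
        tags = {t["Key"]: t["Value"] for t in volume.get("Tags", [])}
        if tags.get(tag_key) == tag_value:
            batch.append(volume["VolumeId"])
        if len(batch) == batch_size:
            break  # today's batch is full
    return batch
```

Calling `select_batch(volumes, "Project", "billing")` once a day would return the next group of gp2 volumes for that project, since volumes converted on earlier days no longer match the gp2 filter.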

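To reduce the manual effort and make the whole process easier to monitor and operate, we gave the customer two operational options. In addition to converting Amazon EBS volume types in bulk with tags and an Amazon Lambda function, we provided an intelligent operations experience built on Amazon Bedrock Agent, so that the customer can query and modify the Amazon EBS volumes they care about through a conversation with a large language model.

This article focuses on how to configure the Amazon Bedrock Agent-based approach to Amazon EBS volume operations.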

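Solution results

How Amazon Bedrock Agent works

Amazon Bedrock is the generative AI service platform from Amazon Web Services, designed to simplify and accelerate the development of enterprise AI applications. It supports a range of pre-trained large language models (LLMs), which users can select and integrate easily.

Amazon Bedrock Agent is designed to simplify the automation of tasks and operations. It works through natural-language interaction, so users can carry out complex tasks such as managing Amazon EBS volumes. The Agent breaks a user request into multiple steps and calls APIs automatically to complete each operation.

The Agent can also integrate a knowledge base to improve its responses and give more accurate and detailed answers. Developers can create and configure an Agent without writing large amounts of code, while infrastructure and security are managed automatically, which makes integration with IT tools simpler.

Amazon Bedrock Agent helps the large language model work out the steps and methods (tools) for resolving a user request through a reasoning technique called ReAct (reasoning combined with acting). With ReAct, you build structured prompts that show the foundation model how to reason through a task and decide on actions that help reach a solution.

The structured prompt contains a series of examples of the ReAct cycle of Question, Thought, Action, and Observation. The Question is the user problem to solve. The Thought is a reasoning step that demonstrates to the foundation model how to approach the problem and determine which action to take. The Action is an API that the model may call from a set of allowed tools. The Observation is the result returned by executing that API (Action).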
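The Question, Thought, Action, Observation cycle can be sketched as a plain loop. This is a toy illustration and not how Amazon Bedrock Agent is implemented internally: `react_loop`, `scripted_model`, and the fake `list_volumes` tool are all hypothetical, and the "model" is a scripted stand-in so that the example runs without any AWS call.

```python
def react_loop(question, model, tools, max_steps=5):
    """Run a Thought/Action/Observation loop until the model gives a final answer.

    `model` is any callable that takes the transcript so far and returns a dict:
    either {"thought": ..., "action": name, "input": {...}}
    or {"thought": ..., "final": text}.
    """
    transcript = [("question", question)]
    for _ in range(max_steps):
        step = model(transcript)
        transcript.append(("thought", step["thought"]))
        if "final" in step:
            return step["final"], transcript
        # Action: call one of the allowed tools with the model-chosen input
        observation = tools[step["action"]](**step["input"])
        transcript.append(("action", step["action"]))
        # Observation: feed the tool result back for the next reasoning step
        transcript.append(("observation", observation))
    raise RuntimeError("no final answer within max_steps")


def scripted_model(transcript):
    """Stand-in for an LLM: asks for the volume list once, then answers."""
    observations = [value for kind, value in transcript if kind == "observation"]
    if not observations:
        return {"thought": "I need the volume list.",
                "action": "list_volumes", "input": {"region": "us-east-1"}}
    return {"thought": "I have the list.",
            "final": f"You have {len(observations[-1])} EBS volumes."}


tools = {"list_volumes": lambda region: ["vol-1", "vol-2"]}  # fake tool, no AWS call
answer, _ = react_loop("How many EBS volumes are in us-east-1?", scripted_model, tools)
print(answer)
```

In a real agent the scripted function is replaced by a call to the foundation model, and the tool table corresponds to the APIs the model is allowed to invoke.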

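This cycle is already built into Amazon Bedrock Agent, so users of the Agent only need to define and implement the Actions that the large language model can choose from and use. One conversation with the Agent proceeds as shown in the figure below:

1. The user asks a question: "How many Amazon EBS volumes do I have in the us-east-1 region?"
2. The question is sent to Amazon Bedrock Agent, which is backed by a large language model in Amazon Bedrock.
3. Amazon Bedrock Agent uses the model to analyze the question and consults the OpenAPI specification in the "Action Group" to find the appropriate API path and parameters.
4. Following the OpenAPI specification, Amazon Bedrock Agent passes the selected API path and parameters to the Amazon Lambda function.
5. The Amazon Lambda function uses the specified API to call the Amazon EBS interfaces in boto3, retrieves the actual list of Amazon EBS volume IDs in the given region, and returns the result to Amazon Bedrock Agent.
6. With the help of the model, Amazon Bedrock Agent turns the result into text that is easier for the user to read.
7. The user receives the final reply: "You have the following Amazon EBS volumes: [xxxx,xxxx,xxxx]".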
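From the caller's side, the whole exchange above is a single request to the agent. The sketch below assumes the boto3 `bedrock-agent-runtime` client and its `invoke_agent` operation. The helper name `ask_agent` is hypothetical, and the client is passed in as an argument so the function can be exercised with a stub.

```python
import uuid


def ask_agent(client, agent_id, agent_alias_id, question):
    """Send one question to a Bedrock Agent and return the concatenated reply text.

    `client` is expected to behave like boto3.client("bedrock-agent-runtime").
    """
    response = client.invoke_agent(
        agentId=agent_id,
        agentAliasId=agent_alias_id,
        sessionId=str(uuid.uuid4()),  # a new session per question
        inputText=question,
    )
    parts = []
    # The reply arrives as a stream of events; text is carried in "chunk" events
    for event in response["completion"]:
        chunk = event.get("chunk")
        if chunk:
            parts.append(chunk["bytes"].decode("utf-8"))
    return "".join(parts)
```

With boto3 installed and credentials configured, a call would look like `ask_agent(boto3.client("bedrock-agent-runtime"), "<agent-id>", "<agent-alias-id>", "How many Amazon EBS volumes do I have in us-east-1?")`, where both IDs are placeholders for your own agent. Reusing one `sessionId` across calls instead of generating a new one keeps the conversation context.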

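Technical implementation

To provide basic Amazon EBS operations, we need to implement at least three functions: listing Amazon EBS volumes, showing the details of a given volume, and changing the type of a volume. Each of the three must be described in OpenAPI format, including its API name, input parameters, and output format. A first draft of such an OpenAPI schema can be generated with a large language model in Amazon Bedrock and then adjusted to the actual requirements. The final content is as follows: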

openapi: 3.0.3
info:
  title: AWS EBS Service API
  description: API for managing AWS EBS volumes
  version: 1.0.0
servers:
  - url: https://api.example.com/v1
paths:
  /volumes:
    get:
      summary: List all EBS volumes in a region
      description: This endpoint retrieves a list of all EBS volume IDs in a specified region.
      parameters:
        - name: region
          in: query
          required: true
          schema:
            type: string
          description: The name of the region to list EBS volumes
      responses:
        '200':
          description: A list of EBS volume IDs
          content:
            application/json:
              schema:
                type: array
                items:
                  type: string
        '400':
          description: Invalid region name


  /volume_id:
    get:
      summary: Get the current status of an EBS volume
      description: This endpoint retrieves the current status information of a specific EBS volume, including volume ID, type, and size.
      parameters:
        - name: region
          in: query
          required: true
          schema:
            type: string
          description: The name of the region
        - name: volumeId
          in: query  # /volume_id has no {volumeId} placeholder, so this cannot be a path parameter
          required: true
          schema:
            type: string
          description: The ID of the EBS volume
      responses:
        '200':
          description: The current status of the EBS volume
          content:
            application/json:
              schema:
                type: object
                properties:
                  volumeId:
                    type: string
                  volumeType:
                    type: string
                  volumeSize:
                    type: integer
                  volumeState:
                    type: string
        '400':
          description: Invalid volume ID or region name


  /volume_change_type:
    post:
      summary: Change the type of an EBS volume
      description: This endpoint changes the type of a specified EBS volume. The operation is asynchronous, and the response indicates whether the command was successfully sent.
      parameters:
        - name: region
          in: query
          required: true
          schema:
            type: string
          description: The name of the region
        - name: volumeId
          in: query  # /volume_change_type has no {volumeId} placeholder, so this cannot be a path parameter
          required: true
          schema:
            type: string
          description: The ID of the EBS volume
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              properties:
                originalType:
                  type: string
                  description: The original type of the EBS volume
                targetType:
                  type: string
                  description: The target type to which the EBS volume should be changed
      responses:
        '202':
          description: Command accepted and being processed asynchronously
        '400':
          description: Invalid volume ID or type information

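When you create an Amazon Bedrock Agent, you need to configure a property called an Action Group. As the name suggests, an Action Group is a set of actions that the large language model can invoke. Each Action Group must declare the list of APIs it supports in the OpenAPI format shown above (the Action Group Schema) and must be bound to an Amazon Lambda function that implements the Actions.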
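The same binding can be expressed programmatically. The sketch below only assembles the request arguments and assumes they are later passed to the boto3 `bedrock-agent` client's `create_agent_action_group` operation. The helper name and the action group name `ebs-operations` are hypothetical, and the article itself configures the Action Group through the console.

```python
def build_action_group_request(agent_id, lambda_arn, openapi_yaml):
    """Assemble keyword arguments for bedrock-agent create_agent_action_group."""
    return {
        "agentId": agent_id,
        "agentVersion": "DRAFT",  # action groups are attached to the working draft
        "actionGroupName": "ebs-operations",  # hypothetical name
        "actionGroupExecutor": {"lambda": lambda_arn},  # Lambda that implements the Actions
        "apiSchema": {"payload": openapi_yaml},  # inline OpenAPI document as a string
    }
```

With boto3 available, the result would be used as `boto3.client("bedrock-agent").create_agent_action_group(**request)`, after which the agent has to be prepared again before the change takes effect.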

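You can have the Agent create an Amazon Lambda function for you automatically and then build the functionality you need on top of it, as shown in the figure below:

The Amazon Lambda function defined here, ebs-operations-gffka, contains the implementation logic for each API. The code is as follows: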

import json
import boto3


def list_volumes(region):
    ec2_client = boto3.client('ec2', region_name=region)
    volumes = ec2_client.describe_volumes()
    volume_ids = [volume['VolumeId'] for volume in volumes['Volumes']]
    return volume_ids


def get_volume_details(region, volume_id):
    ec2_client = boto3.client('ec2', region_name=region)
    response = ec2_client.describe_volumes(VolumeIds=[volume_id])
    volume_details = response['Volumes'][0]
    return volume_details
    
def modify_volume_type(region, volume_id, original_type, target_type):
    ec2_client = boto3.client('ec2', region_name=region)

    # modify_volume is asynchronous: the call returns once the request is accepted
    response = ec2_client.modify_volume(VolumeId=volume_id, VolumeType=target_type)
    task_status = {
        'message': 'Volume modification initiated successfully.',
        'modificationState': response['VolumeModification']['ModificationState'],
        'targetType': target_type,
        'originalType': original_type
    }
    # Return the status so the handler can pass it back to the Agent
    return task_status


def lambda_handler(event, context):
    agent = event['agent']
    actionGroup = event['actionGroup']
    apiPath = event['apiPath']
    httpMethod =  event['httpMethod']
    parameters = event.get('parameters', [])
    requestBody = event.get('requestBody', {})
    print(event)


    if apiPath == "/volumes" and httpMethod == "GET":
        region = next((param['value'] for param in parameters if param['name'] == 'region'), None)
        if region:
            volumes = list_volumes(region)
            responseBody = {
                "application/json": {
                    "body": volumes
                }
            }
            httpStatusCode = 200
        else:
            responseBody = {
                "application/json": {
                    "body": "Invalid region name"
                }
            }
            httpStatusCode = 400
            
    elif apiPath == "/volume_id" and httpMethod == "GET":
        region = next((param['value'] for param in parameters if param['name'] == 'region'), None)
        volume_id = next((param['value'] for param in parameters if param['name'] == 'volumeId'), None)
        if region is not None and volume_id is not None:
            volume_details = get_volume_details(region, volume_id)
            print(volume_details)


            volume_info = {
                "volumeId": volume_details["VolumeId"],
                "volumeType": volume_details["VolumeType"],
                "volumeSize": volume_details["Size"],
                "volumeState": volume_details["State"]
            }
            responseBody = {
                "application/json": {
                    "body": volume_info
                }
            }
            httpStatusCode = 200
        else:
            responseBody = {
                "application/json": {
                    "body": "Invalid region name or volume id"
                }
            }
            httpStatusCode = 400
    
    elif apiPath == "/volume_change_type" and httpMethod == "POST":
        region = next((param['value'] for param in parameters if param['name'] == 'region'), None)
        volume_id = next((param['value'] for param in parameters if param['name'] == 'volumeId'), None)
        # Use .get so that a missing request body does not raise a KeyError
        properties = requestBody.get('content', {}).get('application/json', {}).get('properties', [])
        # Initialize variables to hold the values
        original_type = None
        target_type = None
        
        # Loop through the properties to find originalType and targetType
        for prop in properties:
            if prop['name'] == 'originalType':
                original_type = prop['value']
            elif prop['name'] == 'targetType':
                target_type = prop['value']


        # The target type is required as well, otherwise modify_volume cannot be called
        if region is not None and volume_id is not None and target_type is not None:
            task_details = modify_volume_type(region, volume_id, original_type, target_type)
            print(task_details)
            responseBody = {
                "application/json": {
                    "body": task_details
                }
            }
            httpStatusCode = 200
        else:
            responseBody = {
                "application/json": {
                    "body": "Invalid region name, volume id, or target type"
                }
            }
            httpStatusCode = 400
    
    else:
        responseBody = {
            "application/json": {
                "body": "Invalid API path or HTTP method"
            }
        }
        httpStatusCode = 400
   


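    # The Bedrock Agent response format expects "body" to be a JSON-formatted string
    body = responseBody["application/json"]["body"]
    if not isinstance(body, str):
        responseBody["application/json"]["body"] = json.dumps(body, default=str)

    action_response = {
        'actionGroup': actionGroup,
        'apiPath': apiPath,
        'httpMethod': httpMethod,
        'httpStatusCode': httpStatusCode,  # status computed in the branch above
        'responseBody': responseBody
    }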


    dummy_api_response = {'response': action_response, 'messageVersion': event['messageVersion']}
    print("Response: {}".format(dummy_api_response))


    return dummy_api_response

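As the code shows, for each API defined in the Action Group Schema, the Amazon Lambda function reads apiPath and httpMethod from the incoming event to choose a branch. The parameters and requestBody fields of the event carry the arguments and the request body of the API call, and with this information the function can implement the actual logic.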
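To make the event structure concrete, here is a sketch of an event for the /volume_change_type call together with a helper that flattens its arguments. The field layout follows what the handler above reads. The volume ID, the action group name, and the helper `extract_arguments` are made up for illustration, and a real event carries additional fields such as the agent and session information.

```python
sample_event = {
    "messageVersion": "1.0",
    "actionGroup": "ebs-operations",  # hypothetical name
    "apiPath": "/volume_change_type",
    "httpMethod": "POST",
    "parameters": [
        {"name": "region", "type": "string", "value": "us-east-1"},
        {"name": "volumeId", "type": "string", "value": "vol-0123456789abcdef0"},
    ],
    "requestBody": {"content": {"application/json": {"properties": [
        {"name": "originalType", "type": "string", "value": "gp2"},
        {"name": "targetType", "type": "string", "value": "gp3"},
    ]}}},
}


def extract_arguments(event):
    """Flatten parameters and request-body properties into one name -> value dict."""
    args = {p["name"]: p["value"] for p in event.get("parameters", [])}
    content = event.get("requestBody", {}).get("content", {})
    for prop in content.get("application/json", {}).get("properties", []):
        args[prop["name"]] = prop["value"]
    return args


print(extract_arguments(sample_event))
```

A helper like this could replace the repeated `next(...)` lookups in the handler, at the cost of losing the distinction between parameters and body properties.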

For the Amazon Lambda function to operate on Amazon EBS, the execution role it uses must also have the corresponding permissions:
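The original screenshot of the role is not reproduced here. As a minimal sketch, the two EC2 calls used by the handler (`describe_volumes` and `modify_volume`) map to the IAM actions below. The variable name `ebs_policy` is hypothetical, and `Resource: "*"` is an assumption made for brevity that you may want to narrow for the modify action.

```python
import json

# Minimal identity policy sketch for the Lambda execution role.
# Resource "*" is used for brevity; this is an assumption, not the article's exact policy.
ebs_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["ec2:DescribeVolumes", "ec2:ModifyVolume"],
            "Resource": "*",
        }
    ],
}

print(json.dumps(ebs_policy, indent=2))
```

The printed JSON can be attached to the execution role as an inline policy, alongside the logging permissions the role already needs.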

Also note that the default execution timeout of an Amazon Lambda function is 3 seconds. Set it to a larger value that suits your situation, for example 3 minutes.
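The timeout can be changed in the console or through the API. The sketch below assumes the boto3 Lambda client's `update_function_configuration` operation. The helper name `set_lambda_timeout` is hypothetical, and the client is passed in so the function can be tried with a stub.

```python
def set_lambda_timeout(client, function_name, timeout_seconds=180):
    """Raise the function timeout; `client` should behave like boto3.client("lambda")."""
    if not 1 <= timeout_seconds <= 900:
        raise ValueError("Lambda timeouts must be between 1 and 900 seconds")
    response = client.update_function_configuration(
        FunctionName=function_name,
        Timeout=timeout_seconds,  # seconds; 180 corresponds to the 3 minutes suggested above
    )
    # The updated configuration is echoed back, including the new timeout
    return response["Timeout"]
```

For the function in this article the call would be `set_lambda_timeout(boto3.client("lambda"), "ebs-operations-gffka")`.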

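With the Action Group OpenAPI schema and the corresponding Amazon Lambda function in place, the remaining step is to choose the large language model the Agent uses and to give the model a clear understanding of its role through a prompt. Here we chose a large language model in Amazon Bedrock, the Claude 3.5 Sonnet model, with the following prompt:

"You are an Amazon Web Services operations expert. When Amazon Web Services users ask questions about the resources in their own accounts, you offer your own insights. However, if a user asks about the number, status, or similar properties of resources in their account, you call the Action Group to retrieve the actual information. If a user wants you to create, delete, or modify resources, you first ask the user to confirm, and only after receiving confirmation do you call the relevant Action Group to complete the task. When your output includes information about Amazon Web Services resources, you organize that content in JSON format before returning it."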