Using GenAI on Amazon Web Services to Build a Developer-Facing Skills Learning and Quiz Platform

Project Overview:

Every day I (小李哥) introduce a cutting-edge AI solution built on the Amazon Web Services (AWS) cloud platform, helping you quickly get familiar with AI best practices on AWS, the world's most popular cloud platform, and apply them in your daily work.

This article shows how to use Amazon Bedrock, AWS's managed foundation-model service, to build a cloud-native skills learning and certification exam-prep platform. Developers can use it to learn AWS fundamentals, sharpen their day-to-day development skills, and prepare efficiently for AWS certifications. The design is fully serverless and cloud-native, providing a scalable and secure AI solution. The solution architecture diagram is shown below:

Background Knowledge

What is Amazon Bedrock?

Amazon Bedrock is an AWS service designed to help developers easily build and scale generative AI applications. Bedrock provides access to a range of powerful foundation models from multiple providers, including AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon. Developers can use these models to create, customize, and deploy generative AI applications without training models from scratch. Bedrock supports a variety of generative AI use cases, including text generation, image generation, and code generation, simplifying development and accelerating innovation.
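As a quick way to see which foundation models are available, boto3's `bedrock` control-plane client (distinct from the `bedrock-runtime` client used to invoke models) offers `list_foundation_models()`. The sketch below filters such a response by provider; the response fragment is illustrative sample data, not output from a live API call:

```python
# A live call would look like:
#   import boto3
#   response = boto3.client("bedrock").list_foundation_models()

# Illustrative fragment shaped like that response (sample data, not live):
sample_response = {
    "modelSummaries": [
        {"modelId": "amazon.titan-text-premier-v1:0", "providerName": "Amazon"},
        {"modelId": "anthropic.claude-v2", "providerName": "Anthropic"},
        {"modelId": "stability.stable-diffusion-xl-v1", "providerName": "Stability AI"},
    ]
}

def models_by_provider(response, provider):
    """Return the model IDs offered by a given provider."""
    return [m["modelId"] for m in response["modelSummaries"]
            if m["providerName"] == provider]

print(models_by_provider(sample_response, "Amazon"))
# → ['amazon.titan-text-premier-v1:0']
```

This is also a convenient way to double-check the exact `modelId` string (such as `amazon.titan-text-premier-v1:0` used later in this article) before wiring it into code.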

Applying Amazon Bedrock to Developer Learning and Quizzes

Amazon Bedrock can provide intelligent support for developers' theoretical learning and testing. With Bedrock's generative AI models, developers can automatically generate study materials, quiz questions, and personalized learning suggestions. For example, Bedrock can generate quiz questions based on a developer's learning progress to assess how well the material has been mastered, then recommend targeted learning resources based on the quiz results, improving both learning efficiency and outcomes. This not only simplifies content creation but also delivers personalized, customized learning content for each developer. In this article, we will use Amazon Bedrock's generative AI capabilities to build exactly such a developer learning platform.

What This Solution Covers

1. Use a foundation model on Amazon Bedrock and the Python SDK to generate AWS skills quiz questions.

2. Create custom quiz-generation prompt templates based on AWS service FAQ documents.

3. Use a Lambda function to call Amazon Bedrock with the selected prompt template and generate quiz questions.

Step-by-Step Build Instructions:

1. Log in to the AWS console, open the Amazon Bedrock console, and confirm that model access to Titan Text G1 - Premier is enabled.

2. Open the Lambda console and create a Lambda function named "bedrock_function". Create two Python files, "lambda_function.py" and "template.py", and copy the following code into "lambda_function.py". This file generates skills quiz questions from the user's request. Note that it reads a TEMPLATE_FUNCTION environment variable naming which prompt-template generator in "template.py" to use, so set that variable in the Lambda configuration.

# Import necessary libraries
import json
import boto3
import os
import re
import logging

from langchain.prompts import ChatPromptTemplate, HumanMessagePromptTemplate
from template import sagemaker_faqs_template, bedrock_faqs_template, bedrock_rag_faqs_template


# Set up logging
logger = logging.getLogger()
logger.setLevel(logging.INFO)


# Create a Bedrock Runtime client
bedrock_client = boto3.client('bedrock-runtime')


# Define Lambda function
def lambda_handler(event, context):
    # Log the incoming event in JSON format
    logger.info('Event: %s', json.dumps(event))

    request_body = json.loads(event['body'])
    content = request_body.get('content')
        
    # Select the prompt-template generator named by the TEMPLATE_FUNCTION
    # environment variable; a dict lookup avoids eval() on configuration values
    template_function_name = os.environ['TEMPLATE_FUNCTION']
    template_generators = {
        'sagemaker_faqs_template': sagemaker_faqs_template,
        'bedrock_faqs_template': bedrock_faqs_template,
        'bedrock_rag_faqs_template': bedrock_rag_faqs_template,
    }
    template_generator = template_generators[template_function_name]

    template = template_generator(context=content)
    
    prompt_template = ChatPromptTemplate(
        messages=[HumanMessagePromptTemplate.from_template(template)],
        input_variables=["context"],
    )
    
    prompt = prompt_template.format(context=content)
    
    # Prepare the input data for the model
    input_data = {
        "inputText": prompt,
        "textGenerationConfig": {
            "temperature": 0.7,
            "topP": 0.95,
            "maxTokenCount": 2048,
            "stopSequences": []
        }
    }
    
    # Log the input data
    logger.info('Input data: %s', json.dumps(input_data))

    # Invoke the Bedrock Runtime with the cleaned body as payload
    response = bedrock_client.invoke_model(
        modelId="amazon.titan-text-premier-v1:0",
        body=json.dumps(input_data).encode("utf-8"),
        accept='application/json',
        contentType='application/json'
    )

    # Load the response body and decode it
    result = json.loads(response["body"].read().decode())
    
    # Log the response payload
    logger.info('Response payload: %s', json.dumps(result))
    
    # Extract the generated text from the response
    generated_text = ""
    if "results" in result and result["results"]:
        generated_text = result["results"][0].get("outputText", "").replace("\\n", "\n")

    # Parse the generated text and extract questions, options, and correct answers
    quiz = []
    questions = re.split(r'\n\nQ\d+:', generated_text)
    
    for question_text in questions[1:]:
        question_text = question_text.strip()
        
        if not question_text:
            continue

        # Extract the question (first line of the block); skip malformed blocks
        question_match = re.search(r'^(.*?)\n', question_text)
        if not question_match:
            continue
        question = question_match.group(1)
        
        # Extract the options
        options = re.findall(r'[a-d]\. (.*?)(?=\n[a-d]\.|$|(?=\n\n))', question_text)

        # Extract the correct answer index
        correct_answer_match = re.search(r'Correct Answer: ([a-d])\.', question_text)
        if correct_answer_match:
            correct_answer_index = ord(correct_answer_match.group(1).lower()) - ord('a')
        else:
            correct_answer_index = None

        quiz.append({
            'question': question,
            'options': options,
            'correct_answer_index': correct_answer_index
        })

    # Return the quiz as JSON
    return {
        'statusCode': 200,
        'headers': {
            'Access-Control-Allow-Headers': 'Content-Type',
            'Access-Control-Allow-Origin': '*',
            'Access-Control-Allow-Methods': 'OPTIONS,POST'
        },
        'body': json.dumps({'quiz': quiz})
    }
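To make the regex-based parsing above concrete, here is a self-contained sketch of the same extraction logic applied to a hypothetical model output in the format the prompt templates request:

```python
import re

# Hypothetical model output following the format requested by the templates
generated_text = """Quiz:

Q1: What is Amazon SageMaker Canvas?
a. A no-code ML service
b. A container registry
c. A message queue
d. A DNS service

Correct Answer: a.

Q2: Can a SageMaker Autopilot job be stopped manually?
a. No
b. Only via the CLI
c. Yes
d. Only within the first hour

Correct Answer: c."""

def parse_quiz(text):
    """Mirror the Lambda's parsing: split on question markers, then pull
    out the question line, the a-d options, and the correct answer index."""
    quiz = []
    for block in re.split(r'\n\nQ\d+:', text)[1:]:
        block = block.strip()
        if not block:
            continue
        question = re.search(r'^(.*?)\n', block).group(1)
        options = re.findall(r'[a-d]\. (.*?)(?=\n[a-d]\.|$|(?=\n\n))', block)
        answer = re.search(r'Correct Answer: ([a-d])\.', block)
        quiz.append({
            'question': question,
            'options': options,
            'correct_answer_index': ord(answer.group(1)) - ord('a') if answer else None,
        })
    return quiz

quiz = parse_quiz(generated_text)
print(quiz[0]['question'])
# → What is Amazon SageMaker Canvas?
```

Testing the parser locally against sample text like this is useful before deploying, since the Lambda's regexes silently return empty options or a `None` answer index when the model drifts from the requested format.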

3. Copy the following code into "template.py". This file defines the reference content the questions are based on, as well as the prompt template used to generate them.

def sagemaker_faqs_template(context):
    template = f"""
    Below is the SageMaker Low-code ML FAQ:

    Q: What is Amazon SageMaker Canvas?
    SageMaker Canvas is a no-code service with an intuitive, point-and-click interface that lets you create highly accurate ML-based predictions from your data. SageMaker Canvas lets you access and combine data from a variety of sources using a drag-and-drop user interface, automatically cleaning and preparing data to minimize manual cleanup. SageMaker Canvas applies a variety of state-of-the-art ML algorithms to find highly accurate predictive models and provides an intuitive interface to make predictions. You can use SageMaker Canvas to make much more precise predictions in a variety of business applications and easily collaborate with data scientists and analysts in your enterprise by sharing your models, data, and reports.

    Q: What is Amazon SageMaker Autopilot?
    SageMaker Autopilot is the industry’s first automated machine learning capability that gives you complete control and visibility into your ML models. SageMaker Autopilot automatically inspects raw data, applies feature processors, picks the best set of algorithms, trains and tunes multiple models, tracks their performance, and then ranks the models based on performance, all with just a few clicks. The result is the best-performing model that you can deploy at a fraction of the time normally required to train the model. You get full visibility into how the model was created and what’s in it, and SageMaker Autopilot integrates with SageMaker Studio. You can explore up to 50 different models generated by SageMaker Autopilot inside SageMaker Studio so it’s easy to pick the best model for your use case. SageMaker Autopilot can be used by people without ML experience to easily produce a model, or it can be used by experienced developers to quickly develop a baseline model on which teams can further iterate.

    Q: How does SageMaker Canvas pricing work?
    With SageMaker Canvas, you pay based on usage. SageMaker Canvas lets you interactively ingest, explore, and prepare your data from multiple sources, train highly accurate ML models with your data, and generate predictions. There are two components that determine your bill: session charges based on the number of hours for which SageMaker Canvas is used or logged into, and charges for training the model based on the size of the dataset used to build the model.

    Q: Can I stop an Amazon SageMaker Autopilot job manually?
    Yes. You can stop a job at any time. When a SageMaker Autopilot job is stopped, all ongoing trials will be stopped and no new trial will be started.

    Generate a multiple choice quiz with the following format: 
    
    Quiz:

    Q1: <Question 1 text>
    a. <Option 1 for Question 1>
    b. <Option 2 for Question 1>
    c. <Option 3 for Question 1>
    d. <Option 4 for Question 1>

    Correct Answer: <Correct option letter for Question 1>.

    Q2: <Question 2 text>
    a. <Option 1 for Question 2>
    b. <Option 2 for Question 2>
    c. <Option 3 for Question 2>
    d. <Option 4 for Question 2>

    Correct Answer: <Correct option letter for Question 2>.

    ... (Repeat the same format for all questions)
    
    Base it on this context: {{context}}
    """
    return template

def bedrock_faqs_template(context):
    template = f"""
    Below is the Bedrock FAQ:

    Q: What is Amazon Bedrock?
    Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) along with a broad set of capabilities that you need to build generative AI applications, simplifying development with security, privacy, and responsible AI. With the comprehensive capabilities of Amazon Bedrock, you can experiment with a variety of top FMs, customize them privately with your data using techniques such as fine-tuning and retrieval-augmented generation (RAG), and create managed agents that execute complex business tasks—from booking travel and processing insurance claims to creating ad campaigns and managing inventory—all without writing any code. Since Amazon Bedrock is serverless, you don't have to manage any infrastructure, and you can securely integrate and deploy generative AI capabilities into your applications using the AWS services you are already familiar with.

    Q: Which FMs are available on Amazon Bedrock?
    Amazon Bedrock customers can choose from some of the most cutting-edge FMs available today. This includes Anthropic's Claude, AI21 Labs' Jurassic-2, Stability AI's Stable Diffusion, Cohere's Command and Embed, Meta's Llama 2, and the Amazon Titan language and embeddings models.

    Q: How can I get started with Amazon Bedrock?
    With the serverless experience of Amazon Bedrock, you can quickly get started. Navigate to Amazon Bedrock in the AWS Management Console and try out the FMs in the playground. You can also create an agent and test it in the console. Once you’ve identified your use case, you can easily integrate the FMs into your applications using AWS tools without having to manage any infrastructure.

    Q: What are the most common use cases for Amazon Bedrock?
    You can quickly get started with use cases: Create new pieces of original content, such as short stories, essays, social media posts, and web page copy. Search, find, and synthesize information to answer questions from a large corpus of data. Create realistic and artistic images of various subjects, environments, and scenes from language prompts. Help customers find what they’re looking for with more relevant and contextual product recommendations than word matching. Get a summary of textual content such as articles, blog posts, books, and documents to get the gist without having to read the full content.

    Generate a multiple choice quiz with the following format: 
    
    Quiz:

    Q1: <Question 1 text>
    a. <Option 1 for Question 1>
    b. <Option 2 for Question 1>
    c. <Option 3 for Question 1>
    d. <Option 4 for Question 1>

    Correct Answer: <Correct option letter for Question 1>.

    Q2: <Question 2 text>
    a. <Option 1 for Question 2>
    b. <Option 2 for Question 2>
    c. <Option 3 for Question 2>
    d. <Option 4 for Question 2>

    Correct Answer: <Correct option letter for Question 2>.

    ... (Repeat the same format for all questions)
    
    Base it on this context: {{context}}
    """
    return template

def bedrock_rag_faqs_template(context):
    template = f"""
    Below is the Bedrock RAG FAQ:

    Q: What types of data formats are accepted by Knowledge Bases for Amazon Bedrock?
    Supported data formats include .pdf, .txt, .md, .html, .doc and .docx, .csv, .xls, and .xlsx files. Files must be uploaded to Amazon S3. Point to the location of your data in Amazon S3, and Knowledge Bases for Amazon Bedrock takes care of the entire ingestion workflow into your vector database.

    Q: Which embeddings model is used to convert documents into embeddings (vectors)?
    At present, Knowledge Bases for Amazon Bedrock uses the latest version of the Amazon Titan Text Embeddings model available in Amazon Bedrock. The Amazon Titan Text Embeddings model supports 8K tokens and 25+ languages and creates embeddings of 1,536 dimension size. 

    Q: Which vector databases are supported by Knowledge Bases for Amazon Bedrock?
    Knowledge Bases for Amazon Bedrock takes care of the entire ingestion workflow of converting your documents into embeddings (vector) and storing the embeddings in a specialized vector database. Knowledge Bases for Amazon Bedrock supports popular databases for vector storage, including vector engine for Amazon OpenSearch Serverless, Pinecone, Redis Enterprise Cloud, Amazon Aurora (coming soon), and MongoDB (coming soon). If you do not have an existing vector database, Amazon Bedrock creates an OpenSearch Serverless vector store for you.

    Q: Is it possible to do a periodic or event-driven sync from Amazon S3 to Knowledge Bases for Amazon Bedrock?
    Depending on your use case, you can use Amazon EventBridge to create a periodic or event-driven sync between Amazon S3 and Knowledge Bases for Amazon Bedrock.

    Generate a multiple choice quiz with the following format: 
    
    Quiz:

    Q1: <Question 1 text>
    a. <Option 1 for Question 1>
    b. <Option 2 for Question 1>
    c. <Option 3 for Question 1>
    d. <Option 4 for Question 1>

    Correct Answer: <Correct option letter for Question 1>.

    Q2: <Question 2 text>
    a. <Option 1 for Question 2>
    b. <Option 2 for Question 2>
    c. <Option 3 for Question 2>
    d. <Option 4 for Question 2>

    Correct Answer: <Correct option letter for Question 2>.

    ... (Repeat the same format for all questions)
    
    Base it on this context: {{context}}
    """
    return template
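One subtle point in the templates above: because they are Python f-strings, the doubled braces in `{{context}}` survive as a literal `{context}` placeholder, which LangChain's `ChatPromptTemplate` fills in later; the `context` argument passed to the template function is never interpolated into the f-string itself. A minimal sketch with a hypothetical trimmed-down template (`mini_faqs_template` is illustrative, not part of the project code):

```python
def mini_faqs_template(context):
    # The doubled braces escape f-string interpolation, so the returned
    # text still contains a literal "{context}" placeholder for LangChain
    # (or plain str.format) to fill later. The `context` parameter itself
    # is unused inside the f-string, mirroring the templates above.
    template = f"""
    Below is the FAQ:

    Q: Sample question?
    Sample answer.

    Generate a multiple choice quiz.

    Base it on this context: {{context}}
    """
    return template

t = mini_faqs_template(context="ignored at this stage")
print("{context}" in t)                        # → True (still unresolved)
print("Canvas" in t.format(context="Canvas"))  # → True (filled in afterwards)
```

This two-stage substitution is what lets the Lambda build the template once and then format it with the user-supplied `content` via `prompt_template.format(context=content)`.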

4. Create an API Gateway to expose a public API endpoint. After deploying it to a stage, we can obtain the API's invoke URL.
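The Lambda behind this endpoint expects a POST body with a `content` field and returns a JSON `quiz` array. A sketch of both payload shapes follows; the URL is a placeholder for your deployed stage URL, and the response values are illustrative:

```python
import json

# Placeholder for your deployed API Gateway stage invoke URL
API_URL = "https://<api-id>.execute-api.<region>.amazonaws.com/prod"

# Request body: the Lambda reads a "content" field from event["body"]
request_body = json.dumps({
    "content": "SageMaker low-code ML features. Generate 4 questions."
})

# A client could POST it, e.g. with the requests library (needs the real URL):
#   import requests
#   quiz = requests.post(API_URL, data=request_body).json()["quiz"]

# Response body shape produced by the Lambda (values illustrative):
sample_quiz_response = {
    "quiz": [
        {
            "question": "...",
            "options": ["...", "...", "...", "..."],
            "correct_answer_index": 0,
        }
    ]
}
print(json.loads(request_body)["content"])
```

The front-end page in the next steps sends exactly this kind of request and renders the returned `quiz` array as interactive questions.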

5. Next, open the CloudFront console and create a CDN distribution to accelerate the front end, then obtain the front-end URL.

6. Opening that URL takes us to the quiz-generation page. Enter the back-end API Gateway endpoint, then enter an instruction in the prompt field, such as: "The topic is SageMaker's low-code machine learning features. Generate 4 questions, each with 4 answer options." Click Submit to generate.

7. We can see that the model generated four different questions, each with four different answer options.

8. After finishing the quiz, click Submit Quiz to submit our answers.

9. Once the answers are submitted, correct answers are highlighted in green and incorrect ones in red. As shown, I answered every question correctly for a score of 100.

That concludes the full walkthrough of using a foundation model on Amazon Bedrock to generate AWS skills quiz questions that help developers learn cloud skills. I hope you'll join me again for more cutting-edge generative AI development solutions.
