No Ops Required: Build a RAG Application with Fully Managed AWS Services and Let AI Answer Your Business Questions in Seconds

Latest recommended article published on 2025-05-09 17:11:59

Official AWS Partner



Article tags: operations, aws, artificial intelligence

Article link: https://blog.csdn.net/awscloud/article/details/147788729


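To build a retrieval-augmented generation (RAG) application quickly on AWS, you can combine the managed services and steps below into an efficient, scalable solution. The detailed build process follows.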

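1. RAG core architecture

The core RAG (Retrieval-Augmented Generation) pipeline has four stages:

Data preparation: store and process the documents
Vectorization: convert text into vector embeddings
Retrieval: search the vectors by semantic similarity
Generation: have a large language model (LLM) produce the answer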
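
The retrieval stage comes down to ranking stored vectors by their similarity to the query vector. The following is a minimal pure-Python sketch of top-k retrieval by cosine similarity. The three-dimensional vectors and the function names are made up for illustration; a real deployment would use embeddings from a model and a vector index such as OpenSearch rather than a linear scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vector, documents, k=2):
    """Return the k (text, score) pairs most similar to query_vector.

    `documents` is a list of (text, vector) pairs.
    """
    scored = [(text, cosine_similarity(query_vector, vec)) for text, vec in documents]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical 3-dimensional embeddings, for illustration only.
docs = [
    ("refund policy", [1.0, 0.0, 0.0]),
    ("shipping times", [0.0, 1.0, 0.0]),
    ("return window", [0.9, 0.1, 0.0]),
]
results = top_k([1.0, 0.0, 0.0], docs, k=2)
```

The texts in `results` are what the generation stage would receive as context.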

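2. Choosing AWS services

(1) Data storage and processing

Amazon S3
Stores the raw documents (PDF, TXT, and so on) and serves as the core storage of the data lake.
Amazon Textract
Extracts text from unstructured documents such as PDFs.
Amazon Comprehend
Performs text analysis (key phrase extraction, entity recognition, and so on).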

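(2) Vectorization and retrieval

Amazon OpenSearch Service
A managed search engine with vector search support, used to store the vectors and quickly retrieve similar text.
AWS Lambda
Serverless compute that splits documents into chunks and calls a model to generate embeddings.
Amazon SageMaker
Optional; used to deploy a custom embedding model (such as Sentence-BERT).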

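(3) Large language model (LLM)

Amazon Bedrock
Hosts mainstream LLMs (such as Anthropic Claude and models from AI21 Labs) that you call directly to generate answers.
Amazon SageMaker JumpStart
Deploys open-source models (such as Llama-2 and Falcon).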

(4) Application layer

API Gateway + Lambda
Builds a RESTful API that handles user queries and returns the results.
AWS Amplify
Quickly deploys a front-end web interface.

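3. Implementation steps

Step 1: Document processing and vectorization

Upload the documents to S3.
Use a Lambda function to invoke Textract/Comprehend on each document and produce plain text.
Split the text into chunks (for example, 512 tokens each) and convert them to vectors with an embedding model (such as Amazon Titan Embeddings).
Store the vectors in an OpenSearch index.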
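
Before vectors can be stored, the OpenSearch index needs a k-NN vector field. The sketch below builds one possible index definition as a Python dict. The field names `vector` and `text` match the ones written by the indexing code in this step; the default dimension of 1536 is an assumption that fits Titan Text Embeddings G1 and must be changed to match whichever embedding model you actually use.

```python
def build_index_body(dimension=1536):
    """Index definition with a k-NN vector field plus the original chunk text.

    `dimension` must equal the length of the vectors your embedding model returns.
    """
    return {
        "settings": {"index": {"knn": True}},  # enable k-NN search on this index
        "mappings": {
            "properties": {
                "vector": {"type": "knn_vector", "dimension": dimension},
                "text": {"type": "text"},
            }
        },
    }


index_body = build_index_body()
# With an opensearch-py client (not created here) this could be passed as:
# client.indices.create(index="rag-index", body=index_body)
```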

Example Lambda processing logic (Python):

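import boto3

textract = boto3.client('textract')

# split_text, generate_embeddings, and opensearch (an OpenSearch client)
# are assumed to be defined elsewhere in the function package.

def handler(event, context):
    # Extract text from the document in S3 (synchronous call;
    # multi-page PDFs need start_document_text_detection instead)
    response = textract.detect_document_text(
        Document={'S3Object': {'Bucket': 'my-bucket', 'Name': 'doc.pdf'}}
    )
    text = " ".join(item['Text'] for item in response['Blocks'] if item['BlockType'] == 'LINE')
    # Split into chunks and generate one embedding per chunk
    chunks = split_text(text)
    embeddings = generate_embeddings(chunks)  # calls Bedrock or SageMaker
    # Store one OpenSearch document per chunk, since a vector field holds a single vector
    for chunk, vector in zip(chunks, embeddings):
        opensearch.index(index='rag-index', body={'vector': vector, 'text': chunk})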
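
The Lambda example calls `split_text` and `generate_embeddings` without defining them. The following is one possible sketch of both, under stated assumptions: chunks are measured in whitespace-separated words as a rough stand-in for model tokens, the `overlap` parameter is an addition not mentioned in the article, and `generate_embeddings` takes the Bedrock runtime client as an explicit second argument (unlike the one-argument call above) so it can be exercised without AWS credentials. The model ID `amazon.titan-embed-text-v1` and its `inputText`/`embedding` fields are used as I understand the Titan embeddings request format; check them against the model you enable.

```python
import json


def split_text(text, chunk_size=512, overlap=50):
    """Split text into chunks of at most chunk_size words.

    Consecutive chunks share `overlap` words so that a sentence cut at a
    boundary still appears whole in one of them.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


def generate_embeddings(chunks, bedrock_client, model_id="amazon.titan-embed-text-v1"):
    """Return one embedding (a list of floats) per chunk.

    `bedrock_client` is expected to behave like boto3.client("bedrock-runtime").
    """
    embeddings = []
    for chunk in chunks:
        response = bedrock_client.invoke_model(
            modelId=model_id,
            body=json.dumps({"inputText": chunk}),
        )
        embeddings.append(json.loads(response["body"].read())["embedding"])
    return embeddings
```

In a Lambda function the client would typically be created once at module load, for example `bedrock = boto3.client("bedrock-runtime")`, and passed in on each call.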

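Step 2: Semantic retrieval

When a user asks a question, API Gateway triggers a Lambda function.
The Lambda function converts the question into a vector and queries OpenSearch for the Top-K most similar text chunks.
The retrieved chunks are concatenated with the question to form the LLM input.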

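Example OpenSearch vector query (the outer "vector" key is the index field written in step 1; the inner "vector" array is the question embedding, abbreviated here):

{
  "size": 5,
  "query": {
    "knn": {
      "vector": {
        "vector": [0.1, 0.2, ..., 0.5],
        "k": 5
      }
    }
  }
}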
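
In the Lambda function the query body is built from the question embedding rather than written by hand, and the matching chunks then have to be pulled out of the search response. The sketch below shows one way to do both. The helper names are hypothetical, the field name defaults to `vector` (the field the step 1 indexing code writes), and the sample response is a hand-written stand-in that follows the usual `hits.hits[]._source` layout of an OpenSearch search result.

```python
def build_knn_query(question_vector, k=5, field="vector"):
    """Build a k-NN query body that returns the k nearest chunks."""
    return {
        "size": k,
        "query": {"knn": {field: {"vector": list(question_vector), "k": k}}},
    }


def extract_texts(search_response):
    """Collect the stored chunk text from each hit, best match first."""
    hits = search_response.get("hits", {}).get("hits", [])
    return [hit["_source"]["text"] for hit in hits]


query = build_knn_query([0.1, 0.2, 0.5], k=2)

# Hand-written stand-in for what a search call might return.
sample_response = {
    "hits": {
        "hits": [
            {"_score": 0.93, "_source": {"text": "Refunds are issued within 7 days."}},
            {"_score": 0.81, "_source": {"text": "Returns require a receipt."}},
        ]
    }
}
context_text = "\n\n".join(extract_texts(sample_response))
```

With an opensearch-py client the query would be sent as `client.search(index="rag-index", body=query)`, and `context_text` is what gets concatenated with the question for the LLM.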

Step 3: Generating the answer

Call an LLM on Amazon Bedrock to generate the final answer:

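import json
import boto3

bedrock = boto3.client(service_name='bedrock-runtime')

# context: the retrieved chunks joined into one string; question: the user's query (both from step 2)
# Claude v2's text-completion API expects the "\n\nHuman: ... \n\nAssistant:" turn format
prompt = f"\n\nHuman: Based on the following context: {context}\n\nAnswer this question: {question}\n\nAssistant:"
response = bedrock.invoke_model(
    modelId='anthropic.claude-v2',
    body=json.dumps({"prompt": prompt, "max_tokens_to_sample": 300})
)
answer = json.loads(response.get('body').read()).get('completion')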
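
The three steps can be tied together in the Lambda function that sits behind API Gateway. The following is a rough sketch, not the article's implementation: `make_handler`, `build_prompt`, and the `question` request field are hypothetical names, and the embedding, retrieval, and generation steps are passed in as plain callables so that the wiring can be tried locally with stubs before real Bedrock and OpenSearch calls are plugged in. The event and response shapes assume API Gateway's Lambda proxy integration.

```python
import json


def build_prompt(context, question):
    """Wrap context and question in the Human/Assistant turns Claude v2 expects."""
    return (
        f"\n\nHuman: Based on the following context:\n{context}\n\n"
        f"Answer this question: {question}\n\nAssistant:"
    )


def make_handler(embed, retrieve, generate):
    """Build a Lambda handler from three callables.

    embed(question) -> vector, retrieve(vector) -> list of text chunks,
    generate(prompt) -> answer string.
    """
    def handler(event, context):
        # With proxy integration the request body arrives as a JSON string.
        body = json.loads(event.get("body") or "{}")
        question = body.get("question", "").strip()
        if not question:
            return {"statusCode": 400, "body": json.dumps({"error": "question is required"})}
        passages = retrieve(embed(question))
        answer = generate(build_prompt("\n\n".join(passages), question))
        return {
            "statusCode": 200,
            "headers": {"Content-Type": "application/json"},
            "body": json.dumps({"answer": answer}, ensure_ascii=False),
        }
    return handler


# Local dry run with stub callables instead of AWS services.
handler = make_handler(
    embed=lambda q: [0.0],
    retrieve=lambda v: ["passage one", "passage two"],
    generate=lambda p: "stub answer",
)
```

Injecting the three callables keeps the handler free of AWS client code, which makes the request validation and prompt assembly easy to unit test.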