Building a User Ad Marketing Platform with Generative AI on AWS

Article link: https://blog.csdn.net/m0_66628975/article/details/141128470

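Project overview:

In this series, Xiao Li Ge introduces one cutting-edge AI solution built on the Amazon Web Services (AWS) cloud platform each day, to help readers quickly get up to speed on AWS AI best practices and apply them in their day-to-day work.

This post shows how to use Amazon Bedrock, the AWS managed foundation model service, together with Amazon Personalize, the AWS personalized recommendation service, to build a user-facing ad marketing platform. It applies generative AI to ad marketing with the goal of improving product conversion rates. The architecture is cloud native and serverless throughout, which gives a scalable and secure AI solution. The application is integrated with the AI models through an Application Load Balancer and Amazon ECS. The solution architecture is shown in the diagram below: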

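Background knowledge for this solution

What is Amazon Bedrock?

Amazon Bedrock is an AWS service that helps developers build and scale generative AI applications. It provides access to foundation models from several providers, including AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon. Developers can use these models to create, customize, and deploy generative AI applications without training a model from scratch. The available models cover text generation, image generation, code generation, and more, which simplifies development and speeds up innovation.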

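What is Amazon Personalize?

Amazon Personalize is an AWS machine learning service that helps developers build and deploy personalized recommendation systems. It draws on Amazon's years of experience with recommendation systems and uses automated machine learning to generate highly accurate personalized recommendations, so developers do not need a deep machine learning background.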

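Use cases

Customized ad marketing content:

Amazon Personalize can dynamically generate personalized ads and marketing content based on each user's behavior and preferences, improving ad performance and conversion rates.

Product recommendations:

On e-commerce platforms, Personalize can recommend relevant products based on a user's browsing and purchase history, increasing sales and user retention.

Content recommendations:

On streaming platforms, Personalize can recommend movies, TV series, or music that match a user's viewing or listening habits, improving the user experience.

Email personalization:

Personalize can be used to customize email content so that each user receives messages tailored to their preferences, which raises open and click-through rates.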

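What this solution covers

1. Import the user source data from an S3 bucket into Amazon Personalize.

2. Use that source data to train and deploy an Amazon Personalize recommendation model.

3. Apply prompt engineering to the Amazon Personalize recommendations and use a foundation model on Amazon Bedrock to generate customized marketing emails.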

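Step-by-step project setup:

1. Open the AWS console and confirm that access to the Titan Text G1 - Lite model is enabled. We will use this model to generate the ad content.

2. Open Amazon SageMaker, the AWS machine learning service, then create a Jupyter notebook instance named "Lab-Notebook" and open it.

3. Create a notebook named "personalized-marketing.ipynb", then paste and run the following code. First, we import the required dependencies.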

# Import packages
import boto3
import time
import pandas as pd
import json
import random

4. Next, we load the movie source data stored in our CSV file into DataFrames.

item_data = pd.read_csv('imdb/items.csv', sep=',', dtype={'PROMOTION': "string"})
item_data.head(5)
movies = pd.read_csv('imdb/items.csv', sep=',', usecols=[0,1], encoding='latin-1', dtype={'movieId': "str", 'imdbId': "str", 'tmdbId': "str"})
pd.set_option('display.max_rows', 25)
movies

5. Next, we set up the prerequisites for training the Amazon Personalize model: create the Amazon Personalize clients, look up the IAM role, and find the S3 bucket.

# Configure the SDK to Amazon Personalize
personalize = boto3.client('personalize')
personalize_runtime = boto3.client('personalize-runtime')

account_id = boto3.client('sts').get_caller_identity().get('Account')
print("account id:", account_id)

with open('/opt/ml/metadata/resource-metadata.json') as notebook_info:
    data = json.load(notebook_info)
    resource_arn = data['ResourceArn']
    region = resource_arn.split(':')[3]
print("region:", region)

# Set up a Boto3 client to access IAM functions 
iam = boto3.client('iam')

#  A role has been set up for this solution. The following obtains the ARN for that role 
#  and also prints the role name for your information

role_response = iam.get_role(RoleName='personalize_exec_role')
role_arn = role_response['Role']['Arn']

# The role name is the last path segment of the role ARN
role_name = role_arn.split('/')[-1]
role_name

# Set up a Boto3 client to access S3 functions 
s3 = boto3.client('s3')

# Get a list of all S3 buckets so that we can find the one that starts with "personalized-marketing"
response = s3.list_buckets()

# Filter buckets that start with 'personalized-marketing'
buckets_list = [bucket['Name'] for bucket in response['Buckets'] if bucket['Name'].startswith('personalized-marketing')]

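# Exactly one matching bucket is expected; fail early with a clear message if none is found
if not buckets_list:
    raise RuntimeError("No S3 bucket starting with 'personalized-marketing' was found")
data_bucket_name = buckets_list[0]

# Display the name of the bucket found
data_bucket_name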

6. Amazon Personalize reads its training datasets from S3, so we upload our source data to the S3 bucket we located in the previous step.

interactions_filename = 'interactions.csv'
items_filename = "items.csv"

interactions_file = interactions_filename

try:
    s3.get_object(
        Bucket=data_bucket_name,
        Key=interactions_filename,
    )
    print("{} already exists in the bucket {}".format(interactions_filename, data_bucket_name))
except s3.exceptions.NoSuchKey:
    # Uploading the file if it does not already exist
    boto3.Session().resource('s3').Bucket(data_bucket_name).Object(interactions_filename).upload_file(interactions_filename)
    print("File {} uploaded to bucket {}".format(interactions_filename, data_bucket_name))

items_file = "imdb/" + items_filename

try:
    s3.get_object(
        Bucket=data_bucket_name,
        Key=items_filename,
    )
    print("{} already exists in the bucket {}".format(items_filename, data_bucket_name))
except s3.exceptions.NoSuchKey:
    # Uploading the file if it does not already exist
    # Note that the following line will be needed for the DIY     
    boto3.Session().resource('s3').Bucket(data_bucket_name).Object(items_filename).upload_file(items_file)
    print("File {} uploaded to bucket {}".format(items_filename, data_bucket_name))
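
The two try/except blocks above repeat the same "upload only if missing" check. As a minimal sketch, the pattern could be factored into one helper. The function name `upload_if_missing` is hypothetical and not part of the original notebook; it only relies on the `get_object`, `upload_file`, and `exceptions.NoSuchKey` members that the boto3 S3 client provides.

```python
def upload_if_missing(s3_client, bucket, key, local_path):
    """Upload local_path to s3://bucket/key unless the key already exists.

    Returns True when an upload happened and False when the object was
    already in the bucket. s3_client is expected to be boto3.client('s3').
    """
    try:
        # get_object raises NoSuchKey when the key is absent
        s3_client.get_object(Bucket=bucket, Key=key)
        return False
    except s3_client.exceptions.NoSuchKey:
        # upload_file(Filename, Bucket, Key) copies the local file to S3
        s3_client.upload_file(local_path, bucket, key)
        return True


# Hypothetical usage with the names from this step:
# upload_if_missing(s3, data_bucket_name, 'interactions.csv', 'interactions.csv')
# upload_if_missing(s3, data_bucket_name, 'items.csv', 'imdb/items.csv')
```

Passing the client in as an argument keeps the helper easy to exercise with a stand-in object before it touches a real bucket.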

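7. Next, we create the dataset group used to train the Amazon Personalize model. A dataset group isolates one set of datasets from another. Within a single dataset group, Amazon Personalize works with three kinds of datasets: a users dataset, an items dataset, and an interactions dataset ("User-item-interactions") that records each user's history of purchasing or using items.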

marketing_dataset_group_name = "marketing-email-dataset"
try:     
    # Try to create the dataset group. This block will run fully if the dataset group does not exist yet
    # Refer to this section for the DIY
    create_dataset_group_response = personalize.create_dataset_group(
        name = marketing_dataset_group_name,
        domain='VIDEO_ON_DEMAND'
    )

    marketing_dataset_group_arn = create_dataset_group_response['datasetGroupArn']
    print(json.dumps(create_dataset_group_response, indent=2))
    print ('\nCreating the Dataset Group with dataset_group_arn = {}'.format(marketing_dataset_group_arn))

except personalize.exceptions.ResourceAlreadyExistsException as e:
    # If the dataset group already exists, get the unique identifier, marketing_dataset_group_arn, 
    # from the existing resource
    
    marketing_dataset_group_arn = 'arn:aws:personalize:'+region+':'+account_id+':dataset-group/'+marketing_dataset_group_name 
    print ('\nThe Dataset Group with dataset_group_arn = {} already exists'.format(marketing_dataset_group_arn))
    print ('\nWe will be using the existing Dataset Group dataset_group_arn = {}'.format(marketing_dataset_group_arn))
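
The except branches in this walkthrough rebuild resource ARNs by string concatenation. The sketch below collects that pattern in one place. The helper name `personalize_arn`, the region `us-east-1`, and the account ID `123456789012` are placeholders for illustration only; the resource paths mirror the ones concatenated in the notebook code.

```python
def personalize_arn(region, account_id, resource_type, resource_name):
    """Build an Amazon Personalize ARN in the form used by the notebook's
    fallback branches: arn:aws:personalize:<region>:<account>:<type>/<name>."""
    return f"arn:aws:personalize:{region}:{account_id}:{resource_type}/{resource_name}"


# Placeholder region and account ID, not real values
region = "us-east-1"
account_id = "123456789012"

dataset_group_arn = personalize_arn(
    region, account_id, "dataset-group", "marketing-email-dataset")
schema_arn = personalize_arn(
    region, account_id, "schema", "marketing_interactions_schema")
# A dataset ARN nests the dataset group name and the dataset type
interactions_dataset_arn = personalize_arn(
    region, account_id, "dataset", "marketing-email-dataset/INTERACTIONS")

print(dataset_group_arn)
print(schema_arn)
print(interactions_dataset_arn)
```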

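8. We take the user-item interactions dataset ("User-item-interactions") as the example. We define a JSON schema that describes the structure of the dataset so that Amazon Personalize can understand what the data means and use it correctly when training the recommendation model.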

interactions_schema_name = "marketing_interactions_schema"

interactions_schema = {
    "type": "record",
    "name": "Interactions",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {
            "name": "USER_ID",
            "type": "string"
        },
        {
            "name": "ITEM_ID",
            "type": "string"
        },
        {
            "name": "EVENT_TYPE", # "Watch", "Click", etc.
            "type": "string"
        },
        {
            "name": "TIMESTAMP",
            "type": "long"
        }
    ],
    "version": "1.0"
}

try:
    # Try to create the interactions dataset schema. This block will run fully 
    # if the interactions dataset schema does not exist yet
    create_schema_response = personalize.create_schema(
        name = interactions_schema_name,
        schema = json.dumps(interactions_schema),
        domain='VIDEO_ON_DEMAND'
    )
    print(json.dumps(create_schema_response, indent=2))
    marketing_interactions_schema_arn = create_schema_response['schemaArn']
    print ('\nCreating the Interactions Schema with marketing_interactions_schema_arn = {}'.format(marketing_interactions_schema_arn))
    
except personalize.exceptions.ResourceAlreadyExistsException:
    # If the interactions dataset schema already exists, get the unique identifier marketing_interactions_schema_arn
    # from the existing resource 
    
    marketing_interactions_schema_arn = 'arn:aws:personalize:'+region+':'+account_id+':schema/'+interactions_schema_name 
    print('The schema {} already exists.'.format(marketing_interactions_schema_arn))
    print ('\nWe will be using the existing Interactions Schema with marketing_interactions_schema_arn = {}'.format(marketing_interactions_schema_arn))
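
Before importing data, it can help to confirm that the CSV header contains every field named in the schema. The following is a minimal sketch of such a check using only the standard library. The function name `missing_schema_columns` and the sample rows are made up for illustration, and the check treats every schema field as required, which is a simplifying assumption.

```python
import csv
import io


def missing_schema_columns(csv_text, schema):
    """Return the schema field names that do not appear in the CSV header."""
    required = [field["name"] for field in schema["fields"]]
    header = next(csv.reader(io.StringIO(csv_text)), [])
    return [name for name in required if name not in header]


# Same fields as the interactions schema defined in this step
schema = {
    "type": "record",
    "name": "Interactions",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {"name": "USER_ID", "type": "string"},
        {"name": "ITEM_ID", "type": "string"},
        {"name": "EVENT_TYPE", "type": "string"},
        {"name": "TIMESTAMP", "type": "long"},
    ],
    "version": "1.0",
}

# Made-up sample rows, only to exercise the check
good_csv = "USER_ID,ITEM_ID,EVENT_TYPE,TIMESTAMP\n1,100,Watch,1700000000\n"
bad_csv = "USER_ID,ITEM_ID,EVENT_TYPE\n1,100,Watch\n"

print(missing_schema_columns(good_csv, schema))
print(missing_schema_columns(bad_csv, schema))
```

In the notebook, the same function could be given the text of interactions.csv before the file is uploaded.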

9. Next, we create the training dataset itself. The other two datasets can be created in the same way.

interactions_dataset_name = "marketing_interactions"
try:
    # Try to create the interactions dataset. This block will run fully 
    # if the interactions dataset does not exist yet
    
    dataset_type = 'INTERACTIONS'
    create_dataset_response = personalize.create_dataset(
        name = interactions_dataset_name,
        datasetType = dataset_type,
        datasetGroupArn = marketing_dataset_group_arn,
        schemaArn = marketing_interactions_schema_arn
    )

    marketing_interactions_dataset_arn = create_dataset_response['datasetArn']
    print(json.dumps(create_dataset_response, indent=2))
    print ('\nCreating the Interactions Dataset with marketing_interactions_dataset_arn = {}'.format(marketing_interactions_dataset_arn))
    
except personalize.exceptions.ResourceAlreadyExistsException:
    # If the interactions dataset already exists, get the unique identifier, marketing_interactions_dataset_arn, 
    # from the existing resource 
    marketing_interactions_dataset_arn =  'arn:aws:personalize:'+region+':'+account_id+':dataset/'+marketing_dataset_group_name+'/INTERACTIONS'
    print('The Interactions Dataset {} already exists.'.format(marketing_interactions_dataset_arn))
    print ('\nWe will be using the existing Interactions Dataset with marketing_interactions_dataset_arn = {}'.format(marketing_interactions_dataset_arn))

10. We create an import job that loads the data from S3 into the Amazon Personalize dataset for model training.

interactions_import_job_name = "dataset_import_interaction"
# Check if the import job already exists

# List the import jobs
interactions_dataset_import_jobs = personalize.list_dataset_import_jobs(
    datasetArn=marketing_interactions_dataset_arn,
    maxResults=100
)['datasetImportJobs']

# Check if there is an existing job with the prefix
job_exists = False  
job_arn = None

for job in interactions_dataset_import_jobs:
    if (interactions_import_job_name in job['jobName']):
        job_exists = True
        job_arn = job['datasetImportJobArn']
    
if (job_exists):
    marketing_interactions_dataset_import_job_arn = job_arn
    print('The Interactions Import Job {} already exists.'.format(marketing_interactions_dataset_import_job_arn))
    print ('\nWe will be using the existing Interactions Import Job with marketing_interactions_dataset_import_job_arn = {}'.format(marketing_interactions_dataset_import_job_arn))
        
else:
    # If there is no import job with the prefix, create it   
    create_dataset_import_job_response = personalize.create_dataset_import_job(
        jobName = interactions_import_job_name,
        datasetArn = marketing_interactions_dataset_arn,
        dataSource = {
            "dataLocation": f"s3://{data_bucket_name}/interactions.csv"
        },
        roleArn = role_arn
    )
    marketing_interactions_dataset_import_job_arn = create_dataset_import_job_response['datasetImportJobArn']
    print(json.dumps(create_dataset_import_job_response, indent=2))
    
    print ('\nImporting the Interactions Data with marketing_interactions_dataset_import_job_arn = {}'.format(marketing_interactions_dataset_import_job_arn))
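
The import job runs asynchronously, so later steps generally have to wait until its status becomes ACTIVE. The notebook imports `time` but does not show a wait loop, so here is one possible sketch. The helper name `wait_until_active`, the polling interval, and the timeout are assumptions; the status is read through a callable so the loop itself does not depend on boto3.

```python
import time


def wait_until_active(get_status, poll_seconds=60, max_wait_seconds=3 * 60 * 60,
                      sleep=time.sleep):
    """Poll get_status() until it returns 'ACTIVE'.

    Raises RuntimeError if a status ending in 'FAILED' is seen, and
    TimeoutError if max_wait_seconds elapse first.
    """
    waited = 0
    while True:
        status = get_status()
        if status == "ACTIVE":
            return status
        if status.endswith("FAILED"):
            raise RuntimeError(f"Resource creation failed with status: {status}")
        if waited >= max_wait_seconds:
            raise TimeoutError(f"Still {status} after {waited} seconds")
        sleep(poll_seconds)
        waited += poll_seconds


# Hypothetical usage with the import job created in this step:
# wait_until_active(
#     lambda: personalize.describe_dataset_import_job(
#         datasetImportJobArn=marketing_interactions_dataset_import_job_arn
#     )["datasetImportJob"]["status"]
# )
```

The same helper could be pointed at `describe_recommender` once the recommender has been created, reading `["recommender"]["status"]` instead.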

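11. We list the predefined recipes that Amazon Personalize provides for the video-on-demand domain (VIDEO_ON_DEMAND), then create and train a recommender that suggests videos to users. The recommender uses user information and viewing history to recommend the videos each user is most likely to enjoy.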

# List every VIDEO_ON_DEMAND recipe, following nextToken until no pages remain
list_recipes_args = {'domain': 'VIDEO_ON_DEMAND'}
display_available_recipes = []
while True:
    available_recipes = personalize.list_recipes(**list_recipes_args)
    display_available_recipes += available_recipes['recipes']
    if not available_recipes.get('nextToken'):
        break
    list_recipes_args['nextToken'] = available_recipes['nextToken']
display(display_available_recipes)

recommender_top_picks_for_you_name = "marketing_top_picks_for_you"

try:
    create_recommender_response = personalize.create_recommender(
        name = recommender_top_picks_for_you_name,
        recipeArn = 'arn:aws:personalize:::recipe/aws-vod-top-picks',
        datasetGroupArn = marketing_dataset_group_arn,
        recommenderConfig = {"enableMetadataWithRecommendations": True}
    )
    marketing_recommender_top_picks_arn = create_recommender_response["recommenderArn"]
    
    print (json.dumps(create_recommender_response))
    print ('\nCreating the Top Picks For You recommender with marketing_recommender_top_picks_arn = {}'.format(marketing_recommender_top_picks_arn))
    
except personalize.exceptions.ResourceAlreadyExistsException as e:
    marketing_recommender_top_picks_arn =  'arn:aws:personalize:'+region+':'+account_id+':recommender/'+recommender_top_picks_for_you_name
    print('The Top Picks For You recommender {} already exists.'.format(marketing_recommender_top_picks_arn))
    print ('\nWe will be using the existing Top Picks For You recommender with marketing_recommender_top_picks_arn = {}'.format(marketing_recommender_top_picks_arn))

12. We define a function that returns the videos each user is most likely to enjoy, based on their user ID.

def getRecommendedMoviesForUserId(
    user_id, 
    marketing_recommender_top_picks_arn, 
    item_data, 
    number_of_movies_to_recommend = 5):
    # For a user_id, get the top n (number_of_movies_to_recommend) movies by using Amazon Personalize 
    # and get the additional metadata for each movie (item_id) from the item_data
    # Return a list of movie dictionaries (movie_list) with the relevant data

    # Get recommended movies
    get_recommendations_response = personalize_runtime.get_recommendations(
        recommenderArn = marketing_recommender_top_picks_arn,
        userId = str(user_id),
        numResults = number_of_movies_to_recommend,
        metadataColumns = {
            "ITEMS": ['TITLE', 'GENRES']
        }
    )

    # Create a list of movies with title, genres 
    movie_list = []
    
    for recommended_movie in get_recommendations_response['itemList']:      
        movie_list.append(
            {
                'title' : recommended_movie['metadata']['title'],
                'genres' : recommended_movie['metadata']['genres'].replace('|', ' and ')
            }
        )
    return movie_list
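
To make the data flow of the function above easier to follow without calling AWS, the sketch below applies the same transformation to a made-up response and then formats the result as plain lines for a prompt. The sample `itemList` entries, the lowercase `title` and `genres` metadata keys (taken from the function above), and the helper names `to_movie_list` and `format_movies_for_prompt` are all assumptions for illustration.

```python
def to_movie_list(item_list):
    """Convert recommendation items into dictionaries with title and genres."""
    movies = []
    for item in item_list:
        metadata = item.get("metadata", {})
        movies.append({
            "title": metadata.get("title", "Unknown title"),
            # Genres arrive pipe-separated, e.g. "Action|Comedy"
            "genres": metadata.get("genres", "").replace("|", " and "),
        })
    return movies


def format_movies_for_prompt(movies):
    """Render one movie per line so the prompt does not contain a Python list repr."""
    return "\n".join(f"- {movie['title']} ({movie['genres']})" for movie in movies)


# Made-up items shaped like the 'itemList' entries the function above iterates over
sample_item_list = [
    {"itemId": "1", "metadata": {"title": "Example Movie A", "genres": "Action|Comedy"}},
    {"itemId": "2", "metadata": {"title": "Example Movie B", "genres": "Drama"}},
]

movies = to_movie_list(sample_item_list)
print(format_movies_for_prompt(movies))
```

Formatting the list as text is optional; passing the list of dictionaries straight into the prompt also works, but a line-per-movie layout is usually easier for a small model to read.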

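13. Now we combine Bedrock with the recommendation results to write an ad marketing email for the user. The code below defines the user's demographic information, for example that the user is a 50-year-old adult named Otto. It then takes the movies that Amazon Personalize recommends for the user ID, builds a prompt template from this information, and uses the Titan Text model to generate the marketing email.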

# Set up a Boto3 client to access the functions within Amazon Bedrock
bedrock = boto3.client('bedrock-runtime') 

# Model parameters
# The LLM you will be using
model_id = 'amazon.titan-text-lite-v1'

# The desired MIME type of the inference body in the response
accept = 'application/json'

# The MIME type of the input data in the request
content_type = 'application/json'

# The maximum number of tokens to use in the generated response
max_tokens_to_sample = 1000

# Sample user demographics
user_demographic_1 = f'The user is a 50 year old adult called Otto.'
user_demographic_3 = f'The user is a young adult called Jane.'

def generate_personalized_prompt(user_demographic, favorite_genre, movie_list, model_id, max_tokens_to_sample = 50):

    prompt_template = f'''You are a skilled publicist. Write a high-converting marketing email advertising several movies available in a video-on-demand streaming platform next week, 
    given the movie and user information below. Your email will leverage the power of storytelling and persuasive language. 
    You want the email to impress the user, so make it appealing to them based on the information contained in the <user> tags, 
    and take into account the user's favorite genre in the <genre> tags. 
    The movies to recommend and their information is contained in the <movie> tag. 
    All movies in the <movie> tag must be recommended. Give a summary of the movies and why the human should watch them. 
    Put the email between <email> tags.
    Sign it from "Cloud island movies".
    
    <user>
    {user_demographic}
    </user>

    <genre>
    {favorite_genre}
    </genre>

    <movie>
    {movie_list}
    </movie>

    '''

    prompt_input = json.dumps({
        "inputText":prompt_template,
        "textGenerationConfig": {
            # Use the caller's token limit instead of a hard-coded value
            "maxTokenCount": max_tokens_to_sample,
            "stopSequences": [],
            "temperature": 0.7,
            "topP": 0.9
        }
    })
      
    return prompt_input


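# Inputs for one example user. The user ID and favorite genre below are
# illustrative assumptions; use a USER_ID that exists in interactions.csv.
user_id = 1
user_demographic = user_demographic_1
user_favorite_genre = 'Drama'
movie_list = getRecommendedMoviesForUserId(user_id, marketing_recommender_top_picks_arn, item_data)

# Create prompt input
prompt_input_json = generate_personalized_prompt(user_demographic, user_favorite_genre, movie_list, model_id, max_tokens_to_sample)
prompt_input_json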


response = bedrock.invoke_model(
    body= prompt_input_json,
    modelId=model_id,
    accept=accept,    
    contentType=content_type
    )

response_body = json.loads(response.get('body').read())
model_output_string = response_body['results'][0]['outputText']
# model_output_str_clean = re.sub(r'<[^>]*>', '', model_output_string)

print(model_output_string)
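
The prompt asks the model to put the email between <email> tags, so a natural last step is to pull out just that part of the output. This is a minimal sketch; the helper name `extract_email` is hypothetical, and falling back to the full text when the tags are missing is an assumption about the desired behavior.

```python
import re


def extract_email(model_output):
    """Return the text between <email> and </email>, or the whole output
    (stripped) when the model did not emit the tags."""
    match = re.search(r"<email>(.*?)</email>", model_output,
                      flags=re.DOTALL | re.IGNORECASE)
    if match:
        return match.group(1).strip()
    return model_output.strip()


# Made-up model output, only to show the behavior
sample_output = "Sure, here is the email.\n<email>\nHi Otto,\nEnjoy the show.\n</email>"
print(extract_email(sample_output))
```

In the notebook, the function would be called as `extract_email(model_output_string)`.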