Lessons in a Serverless Stack that Uses NLP to Block Unsolicited Sales Emails

In parts I and II of this series, I described a system to block unsolicited sales emails by applying natural language processing. After training and deploying a model, I built the entire app using serverless infrastructure to understand the relative cost and effort to develop such a system. This post highlights various lessons learned in my journey. If you’re reading this, I hope it helps you avoid some face-palming mistakes!

First, An Overview of My System

Here’s an overview of my application to set the context for the following sections.

A user begins by authorizing the Service to access Gmail. That authorization causes Gmail to send new messages to a Google Pub/Sub topic. A subscriber takes each topic message and calls an API endpoint to process the email (predicting whether it’s sales spam or not). If the NLP model predicts that the email is spam, then the app causes the user’s Gmail to reply to the sender, prompting them to unblock the email by solving a Captcha. If the salesperson solves the Captcha, then their email is marked unread and brought to the user’s primary inbox.

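To make that hand-off concrete, here is one plausible shape for the subscriber, sketched as a Lambda function sitting behind an HTTP push subscription. The handler name, endpoint URL, and payload handling are placeholders of mine rather than the project's actual code; Gmail push notifications do arrive as base64-encoded Pub/Sub messages that decode to the watched address and a history ID.

import base64
import json

import requests  # assumed to be bundled with the function's package

# Hypothetical URL of the prediction endpoint described later in this post.
PREDICT_URL = "https://YOUR_ENDPOINT.execute-api.us-east-1.amazonaws.com/dev/handle_data_from_app"


def handle_gmail_notification(event, context):
    # Pub/Sub push delivers an HTTP POST whose `message.data` field is
    # base64-encoded JSON of the form {"emailAddress": "...", "historyId": ...}.
    envelope = json.loads(event["body"])
    notification = json.loads(base64.b64decode(envelope["message"]["data"]))

    # Forward the notification; the API uses the history ID to pull the new
    # message from Gmail and classify it.
    files = {"file": ("payload.json", json.dumps(notification), "application/json")}
    response = requests.post(PREDICT_URL, files=files, timeout=30)
    response.raise_for_status()
    return {"statusCode": 204}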

[Image: overview of the user-facing flow described above]

The following diagram provides a better view under the hood:

[Image: architecture diagram of the serverless components]

I implemented all of the blocks above using serverless infrastructure, leveraging DynamoDB, Lambda Functions, SQS queues, and Google Pub/Sub.

Serverless Machine Learning Model via Zappa

Here, ‘serverless’ means hosting that requires no permanent infrastructure. In that regard, “Zappa makes it super easy to build and deploy serverless, event-driven Python applications . . . on AWS Lambda + API Gateway.”

That means infinite scaling, zero downtime, zero maintenance — and at a fraction of the cost of your current deployments!

@Gun.io

Simplicity and versatility are Zappa’s greatest strengths. While you may read many blog posts about people using it for Django web applications, Zappa also provides turn-key ways to host serverless machine learning models accessible via API.

Configuring Zappa

I highly recommend reading Gun.io’s documentation. I’ll focus on the basics here to highlight Zappa’s elegant simplicity while calling out a few lessons.

First and foremost, navigate to your project root folder and configure a virtual environment with all of your required libraries, dependencies, and such. If you’re not just practicing, consider setting up a Docker environment for your Zappa app, because the closer your Zappa environment matches the AWS Lambda environment, the fewer difficult-to-debug problems you’ll have. Importantly, the virtual environment name should not be the same as the Zappa project name, as this may cause errors.

$ mkdir zappatest
$ cd zappatest
$ virtualenv ve
$ source ve/bin/activate
$ pip install zappa

Then run init to configure a variety of initial settings for your app:

$ zappa init

Open the zappa_settings.json file to edit or configure other settings, such as identifying what S3 bucket will store the Lambda function artifact. Here’s what I used:

{
    "dev": {
        "app_function": "api.app.app",
        "aws_region": "us-east-1",
        "profile_name": "default",
        "project_name": "serverless-ML",
        "runtime": "python3.6",
        "s3_bucket": "MY_BUCKET_NAME",
        "slim_handler": true,
        "debug": true
    }
}

Note that the “app_function”: “api.app.app” line points to a module (folder) api containing an app.py file in which app = Flask(__name__), as shown in this directory tree:

~env$ my_project_name
.
|- api
|  |- app.py
|  |- ...
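For reference, the api/app.py that “app_function” points at can start out as small as this sketch (the route and message are illustrative, not the project's actual code):

# api/app.py -- the module referenced by "app_function": "api.app.app"
from flask import Flask

app = Flask(__name__)  # Zappa wraps this WSGI app in a Lambda handler


@app.route("/")
def index():
    return "serverless-ML is alive", 200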

Finally, call an initial deployment to your environment:

$ zappa deploy YOUR_STAGE_NAME

That’s it! “If your application has already been deployed and you only need to upload new Python code, but not touch the underlying routes, you can simply” call zappa update YOUR_STAGE_NAME.

Loading a Model & Calling .predict() in a Zappa App

After training the model, pickle the artifact into a model.pkl file and store it in an S3 bucket. Then load the model into the Zappa application using the Boto3 library. Next, transform the input JSON into a format amenable to the model and return a response:

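The gist embedded in the original post isn't reproduced here; a minimal sketch of such an endpoint, with placeholder bucket, key, route, and field names, could look like this:

import json
import pickle

import boto3
from flask import Flask, request, jsonify

app = Flask(__name__)

S3_BUCKET = "MY_BUCKET_NAME"   # same bucket named in zappa_settings.json
MODEL_KEY = "model.pkl"


def load_model():
    # Pull the pickled artifact out of S3 and deserialize it.
    # NOTE: if the pickled pipeline references a local helper module
    # (e.g. preprocess), it must be resolvable first -- see the next section.
    body = boto3.client("s3").get_object(Bucket=S3_BUCKET, Key=MODEL_KEY)["Body"].read()
    return pickle.loads(body)


model = load_model()   # loaded once per container, reused across warm invocations


@app.route("/handle_data_from_app", methods=["POST"])
def handle_data_from_app():
    # The cURL test below posts a multipart field named 'file' containing JSON.
    # payload.json is assumed here to already contain the email text; in the
    # full pipeline the app would first fetch the message from Gmail.
    payload = json.load(request.files["file"])
    email_text = payload.get("body", "")            # assumed field name
    prediction = model.predict([email_text])[0]     # pipeline handles its own preprocessing
    return jsonify({"is_sales_spam": bool(prediction)})

Loading the model at import time keeps it cached in the Lambda container, so warm invocations skip the S3 round trip.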

Call zappa update YOUR_STAGE_NAME and the model is accessible via API. Hit the model with a cURL request to test it:

$ curl -F 'file=@payload.json' https://YOUR_ENDPOINT.execute-api.us-east-1.amazonaws.com/dev/handle_data_from_app

and watch the magic unfold in your CloudWatch logs:

[Image: CloudWatch logs for the prediction request]
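Zappa can also stream the same CloudWatch output straight to your terminal, which is handy while testing:

$ zappa tail YOUR_STAGE_NAME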

Including Helper Modules in Your Serverless ML Model

The trickiest part of the setup above is highlighted in lines 63–66, where I call my helper module preprocess in the __main__ namespace. Without line 66, the model.pkl pipeline attempts its various transformations by calling my preprocess module, but raises errors saying it cannot find that module name.

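Since the gist itself isn't reproduced here, the lines in question amount to something like this reconstruction (not the author's verbatim code):

# Bind the local helper onto __main__ so the unpickled pipeline can resolve
# the preprocess reference it recorded at training time.
import __main__
from helper import preprocess

__main__.preprocess = preprocess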

This happened because, before pickling my model, I used a local module (from helper import preprocess) as part of the pipeline called from within .predict(). So when I wanted to reuse that model, the environment in which it was called was not identical, and I spent hours trying to figure out how to get the environments to match up. Here’s the key lesson: Zappa wraps up the dependencies and libraries installed in your virtual environment into a zip file that gets uploaded to S3, which forms the content of the Lambda function. Essentially,
