Develop and Sell a Machine Learning App from Start to End

weixin_26739165

Original article: https://towardsdatascience.com/develop-and-sell-a-machine-learning-app-from-start-to-end-tutorial-ed5b5a2b6b2b

Getting Started

COVID-19 prediction end-to-end app

After developing and selling a Python API, I now want to expand the idea with a machine learning solution. So I decided to quickly write a COVID-19 prediction algorithm, deploy it, and make it sellable. If you want to see how I did it, check out the post for a step-by-step tutorial.

About this article

In this article, I take the ideas from my previous article “How to sell a Python API from start to end” further and build a machine learning application. If the steps described here are too rough, consider reading my previous article first.

There are a number of new and more complicated issues to cover in this project:

Machine learning content. The application walks through the basic steps of building a machine learning model, covering both the data preparation and the prediction.
Just-in-time evaluation of the prediction (the model itself is not retrained at request time). The dataset is freshly fetched and the prediction is performed on the latest data.
Deployment. Deploying a machine learning app has various challenges. In this article, we run into and solve the issue of storing the trained model outside the deployment package, in AWS S3.
It is not only an API but also has a minor frontend.

The article paints a picture of developing a Python API from start to finish and provides help in more difficult areas like the setup with AWS Lambda.

There were various difficulties, which allowed me to learn more about the deployment and building process. It is also a great way to build side projects and maybe even make some money.

The article consists of four major parts, namely:

Setting up the environment
Creating a problem solution with Python
Setting up AWS
Setting up Rapidapi

You will find all my code open on Github:

https://github.com/Createdd/ml_api_covid

You will find the end result here on Rapidapi:

https://rapidapi.com/Createdd/api/covid_new_cases_prediction

Disclaimer

I am not associated with any of the services I use in this article.

I do not consider myself an expert. If you have the feeling that I am missing important steps or neglected something, consider pointing it out in the comment section or get in touch with me. Also, always make sure to monitor your AWS costs so that you do not pay for things you do not know about.

I am always happy to receive constructive input on how to improve.

There are numerous things to improve and build upon. For example, the machine learning part is very low effort: the preparation is very rough and many steps are missing. From my professional work, I am aware of this fact. However, I cannot cover every detail in one article. Nevertheless, I am curious to hear your suggestions for improvement in the comments. :)

If you need more information on certain parts, feel free to point it out in the comments.

Regarding the term “tutorial”

I consider this a step-by-step tutorial. However, as I have been working as a developer for a long time, I assume some knowledge of certain tools. This probably makes it an intermediate/advanced tutorial.

I assume knowledge of:

Python
Git
Jupyter Notebook
Terminal/Shell/Unix commands

Stack used

We will use

Github (code hosting),
Anaconda (dependency and environment management),
Docker (for possible further usage in microservices),
Jupyter Notebook (code development and documentation),
Python (programming language),
AWS, especially AWS Lambda and S3 (for deployment),
Rapidapi (marketplace to sell on)

1. Create project formalities

It’s always the same but necessary. I follow these steps:

Create a local folder: mkdir NAME
Create a new repository on Github with NAME
Create a conda environment: conda create --name NAME python=3.7
Activate the conda environment: conda activate PATH_TO_ENVIRONMENT
Create a git repo: git init
Connect to the Github repo. Add a Readme file, commit it, and run

git remote add origin URL_TO_GIT_REPO
git push -u origin master

2. Develop a solution for a problem

As we will develop a machine learning solution, a Jupyter Notebook will be very useful.

Install packages and track Jupyter files properly

Install jupyter notebook and jupytext:

pip install notebook jupytext

Register the new environment in Jupyter:

ipython kernel install --name NAME --user

Set a hook in .git/hooks/pre-commit for tracking the notebook changes in git properly:

touch .git/hooks/pre-commit
code .git/hooks/pre-commit

Copy this into the file:

#!/bin/sh
# For every ipynb file in the git index, add a Python representation
jupytext --from ipynb --to py:light --pre-commit

Afterward, make the hook executable (on Mac):

chmod +x .git/hooks/pre-commit

Develop a solution to a problem

The goal

As the world is currently in a pandemic, I decided to use one of the many datasets on COVID-19 cases. Given the structure of the dataset, we want to predict the number of new infections per day for a country.

pip install -r requirements.txt

This will install all packages we need. Have a look at the /development/predict_covid.ipynb notebook to see which libraries are used.

The most important libraries are

pandas for transforming the dataset and
sklearn for machine learning

For the following subheadings, please check out the Jupyter notebook for more details:

https://github.com/Createdd/ml_api_covid/blob/master/development/predict_covid.ipynb

Download data

We will use the dataset from https://ourworldindata.org/coronavirus-source-data in csv format.

The license of the data is Attribution 4.0 International (CC BY 4.0)
The source code is available on Github

Preparation

In short, I did the following:

Check for missing data
Remove columns with more than 50% missing data
Remove rows with remaining missing content like continent or ISO code. (They are not useful for my app, which requires a country.)
Encode categorical data with labels
Fill in the remaining missing numerical data with the mean of the column
Split into training and test set
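
The preparation steps listed above could look roughly like the sketch below. It is not the notebook's code: the prepare helper, the 0.25 test share, and the tiny made-up DataFrame are assumptions for illustration; only the column names new_cases, new_tests, continent, iso_code and location follow the dataset used in this article.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder


def prepare(df, target="new_cases", max_missing=0.5):
    """Rough preparation mirroring the steps listed above (hypothetical helper)."""
    # 1. Drop columns where more than `max_missing` of the values are missing.
    keep = df.columns[df.isna().mean() <= max_missing]
    df = df[keep]
    # 2. Drop rows without a continent or ISO code (e.g. aggregates such as "World").
    df = df.dropna(subset=["continent", "iso_code"]).copy()
    # 3. Label-encode every non-numeric column.
    for col in df.select_dtypes(exclude=["number"]).columns:
        df[col] = LabelEncoder().fit_transform(df[col].fillna("MISSING"))
    # 4. Fill the remaining numerical gaps with the column mean.
    df = df.fillna(df.mean())
    # 5. Split into features/target and into training/test sets.
    X, y = df.drop(columns=[target]), df[target]
    return train_test_split(X, y, test_size=0.25, random_state=0)


# Tiny made-up frame standing in for the real CSV.
toy = pd.DataFrame({
    "iso_code": ["AUT", "AUT", "DEU", "DEU", None, "ITA", "ITA", "ITA"],
    "continent": ["Europe"] * 4 + [None] + ["Europe"] * 3,
    "location": ["Austria", "Austria", "Germany", "Germany", "World",
                 "Italy", "Italy", "Italy"],
    "total_cases": [10.0, 20.0, np.nan, 50.0, 500.0, 70.0, 90.0, 120.0],
    "new_tests": [np.nan] * 7 + [5.0],
    "new_cases": [1.0, 10.0, 5.0, 8.0, 60.0, 7.0, 20.0, 30.0],
})
X_train, X_test, y_train, y_test = prepare(toy)
```

In this toy frame, new_tests is dropped because most of its values are missing, and the "World" row is removed because it has no continent.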

Create classifier and predict

Create a Random Forest Regressor
Train it on the data and evaluate it
Perform hyperparameter tuning with RandomizedSearchCV
Save the trained model
Predict the new cases by providing a country name
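
As a minimal sketch of these modelling steps, the snippet below trains and tunes a random forest on synthetic data and exports it with joblib. The data, the parameter grid, and the file name rf_model.joblib are assumptions chosen for illustration; the notebook's actual settings may differ.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Synthetic stand-in for the prepared COVID features and the target.
rng = np.random.RandomState(0)
X = rng.rand(200, 4)
y = 100 * X[:, 0] + 10 * X[:, 1] + rng.rand(200)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Randomized search over a small, made-up hyperparameter grid.
search = RandomizedSearchCV(
    RandomForestRegressor(random_state=0),
    param_distributions={
        "n_estimators": [20, 50],
        "max_depth": [3, 5, None],
        "min_samples_leaf": [1, 2, 4],
    },
    n_iter=4,
    cv=3,
    random_state=0,
)
search.fit(X_train, y_train)
best = search.best_estimator_
score = best.score(X_test, y_test)  # R^2 on the held-out test set

# Export the tuned model and load it again, as the Flask app does later.
model_path = os.path.join(tempfile.mkdtemp(), "rf_model.joblib")
joblib.dump(best, model_path)
restored = joblib.load(model_path)
```

The restored model gives the same predictions as the tuned one, which is what makes it possible to train in the notebook and predict in the app.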

Build a server to execute the function with REST

For the API functionality, we will use a Flask server (in app.py):

https://github.com/Createdd/ml_api_covid/blob/master/app.py

Serve basic frontend

@app.route('/')
def home():
    return render_template("home.html")

This serves a basic HTML and CSS file.

<!DOCTYPE html>
<html>
<head>
  <meta charset="UTF-8">
  <title>Predict Covid</title>
  <link type="text/css" rel="stylesheet" href="{{ url_for('static', filename='./style.css') }}">
</head>
<body>
  <div class="page">
    <form id="form" action="{{ url_for('predict') }}" method="POST">
      <input type="text" name="country" placeholder="Country" required="required" /><br>
      <h3 class='res'>{{ pred }}</h3>
      <button id="button" type="submit" class="btn btn-primary btn-block btn-large">Predict</button>
    </form>
  </div>
</body>
</html>

Load prediction

This is a little more complex.

The key route is this:

@app.route('/predict', methods=['POST'])
def predict():
    # the form has a single field, the country name
    input_val = [x for x in request.form.values()][0]
    if input_val not in available_countries:
        return f'Country {input_val} is not in available list. Try one from the list! Go back in your browser', 400
    # load the pre-trained model only once the input is known to be valid
    rf = load_model(BUCKET_NAME, MODEL_FILE_NAME, MODEL_LOCAL_PATH)
    to_pred = get_prediction_params(input_val, url_to_covid)
    prediction = rf.predict(to_pred)[0]
    return render_template('home.html', pred=f'New cases will be {prediction}')

But before we can return the prediction result, we need to get the latest data and pre-process it again. This is done with

import pandas as pd
from sklearn.preprocessing import LabelEncoder

encoder = LabelEncoder()


def pre_process(df):
    # columns dropped because too many of their values are missing
    cols_too_many_missing = ['new_tests',
                             'new_tests_per_thousand',
                             'total_tests_per_thousand',
                             'total_tests',
                             'tests_per_case',
                             'positive_rate',
                             'new_tests_smoothed',
                             'new_tests_smoothed_per_thousand',
                             'tests_units',
                             'handwashing_facilities']
    df = df.drop(columns=cols_too_many_missing)

    # label-encode every categorical (object) column
    nominal_cols = df.select_dtypes(include=['object']).columns.tolist()
    for col in nominal_cols:
        df[col] = df[col].fillna('MISSING')
        df[col] = encoder.fit_transform(df[col])

    # fill the remaining numerical gaps with the column mean
    numerical_cols = df.select_dtypes(include=['float64']).columns.tolist()
    for col in numerical_cols:
        df[col] = df[col].fillna(df[col].mean())

    X = df.drop(columns=['new_cases'])
    y = df.new_cases
    return X, y


def get_prediction_params(input_val, url_to_covid):
    df_orig = pd.read_csv(url_to_covid)
    # the integer code of the country is its position in the fitted classes
    _ = encoder.fit_transform(df_orig['location'])
    encode_ind = encoder.classes_.tolist().index(input_val)
    X, _ = pre_process(df_orig)
    # last row for that country, shaped as a single sample
    to_pred = X[X.location == encode_ind].iloc[-1].values.reshape(1, -1)
    return to_pred

pre_process again transforms the downloaded dataset for machine learning purposes, whereas get_prediction_params takes the input value (the country to be predicted) and the URL of the latest dataset, and returns the last row of features for that country.
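
The lookup of the country code in get_prediction_params relies on how sklearn's LabelEncoder assigns integers: the fitted classes_ array is sorted, and a label's position in it is its code. A small sketch with made-up country names:

```python
from sklearn.preprocessing import LabelEncoder

# Made-up sample of the "location" column.
locations = ["Germany", "Austria", "Italy", "Austria", "Germany"]

encoder = LabelEncoder()
codes = encoder.fit_transform(locations)

# classes_ is sorted, so a label's index in it equals its integer code.
austria_code = encoder.classes_.tolist().index("Austria")
same_code = encoder.transform(["Austria"])[0]
```

Because the encoder is fitted again on every request, these codes only match the ones seen during training as long as the set of locations in the dataset has not changed.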

Those steps keep the prediction up to date with the latest data, but they also slow down the app.

You might wonder why we call rf = load_model(BUCKET_NAME, MODEL_FILE_NAME, MODEL_LOCAL_PATH). The reason is that we load the pre-trained model from an AWS S3 bucket to save space when running everything on AWS Lambda. Scroll down for more details.

But if we do not want to deploy it in the cloud, we can simply do something like joblib.load(PATH_TO_YOUR_EXPORTED_MODEL). In the notebook, we export the model with joblib.dump. More info on model exports can be found in the sklearn docs.
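
A function with the same signature as load_model could be sketched as follows. This is an assumption about its shape, not the repository's implementation: it downloads the file with the boto3 S3 client and loads it with joblib. The optional s3_client parameter and the FakeS3Client class are hypothetical additions that allow trying the function without AWS access.

```python
import os
import tempfile

import joblib


def load_model(bucket_name, model_file_name, model_local_path, s3_client=None):
    """Download a joblib-exported model from S3 and load it (sketch)."""
    if s3_client is None:
        import boto3  # imported lazily so the sketch also runs without AWS
        s3_client = boto3.client("s3")
    # boto3 signature: download_file(Bucket, Key, Filename)
    s3_client.download_file(bucket_name, model_file_name, model_local_path)
    return joblib.load(model_local_path)


class FakeS3Client:
    """Hypothetical stand-in for the boto3 S3 client, for offline use."""

    def __init__(self, stored_object):
        self.stored_object = stored_object

    def download_file(self, Bucket, Key, Filename):
        # Pretend the bucket holds a joblib dump of `stored_object`.
        joblib.dump(self.stored_object, Filename)


local_path = os.path.join(tempfile.mkdtemp(), "model.joblib")
model = load_model(
    "my-bucket",  # placeholder bucket name
    "model.joblib",
    local_path,
    s3_client=FakeS3Client({"kind": "placeholder model"}),
)
```

On AWS Lambda only /tmp is writable, so the local path would need to point there.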

But that is all there is to the Flask server: it provides a route for serving the HTML template and a route for prediction. Quite simple!

Running

env FLASK_APP=app.py FLASK_ENV=development flask run

will start the server.

BONUS: Make reproducible with Docker

Maybe you want to scale the app or allow other people to test it more easily. For this, we can create a Docker container. I will not explain in detail how it works, but if you are interested, check one of the links in my “Inspiration” section.

Building a Docker container is not necessary for making this application work!

Create Dockerfile

FROM python:3.7

ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1

ENV FLASK_APP=app.py
ENV FLASK_ENV=development

# install system dependencies
RUN apt-get update \
    && apt-get -y install gcc make \
    && rm -rf /var/lib/apt/lists/*

RUN python3 --version
RUN pip3 --version

RUN pip install --no-cache-dir --upgrade pip

WORKDIR /app

COPY ./requirements.txt /app/requirements.txt

RUN pip3 install --no-cache-dir -r requirements.txt

COPY . .

EXPOSE 8080

CMD ["gunicorn", "--bind", "0.0.0.0:8080", "app:app"]

Note: the last line starts the Flask server.

After creating the Dockerfile, run

docker build -t YOUR_APP_NAME .

and afterward

docker run -d -p 80:8080 YOUR_APP_NAME

Afterward, you will see your app running on http://localhost/

3. Deploy to AWS

Until now this was a rather easy path. Nothing too complicated, nothing too fancy. Now that we come to deployment, it gets interesting and challenging.

Again, I would strongly encourage you to check out my previous article https://towardsdatascience.com/develop-and-sell-a-python-api-from-start-to-end-tutorial-9a038e433966 if you have any issues with Zappa and AWS.

I will not go into as much detail here, but rather point out the pain points.

Set up zappa

After we have created the app locally, we need to set up the hosting on a real server. We will use zappa.

Zappa makes it super easy to build and deploy server-less, event-driven Python applications (including, but not limited to, WSGI web apps) on AWS Lambda + API Gateway. Think of it as “serverless” web hosting for your Python apps. That means infinite scaling, zero downtime, zero maintenance - and at a fraction of the cost of your current deployments!

pip install zappa

As we are using a conda environment, we need to specify it:

which python

will give you /Users/XXX/opt/anaconda3/envs/XXXX/bin/python (on Mac).

Remove the trailing bin/python and export the rest:

export VIRTUAL_ENV=/Users/XXXX/opt/anaconda3/envs/XXXXX/
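
If you would rather not edit the path by hand, the environment directory can also be derived in Python. A minimal sketch; the helper name virtual_env_from_python is made up for illustration:

```python
from pathlib import PurePosixPath


def virtual_env_from_python(python_path):
    """Strip the trailing bin/python from an interpreter path."""
    # parents[0] is .../bin, parents[1] is the environment directory
    return str(PurePosixPath(python_path).parents[1])


env_dir = virtual_env_from_python("/Users/XXX/opt/anaconda3/envs/XXXX/bin/python")
```

Inside the activated environment, python -c "import sys; print(sys.prefix)" prints the same directory.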

Now we can run

zappa init

to set up the config.

Just click through everything and you will have a zappa_settings.json like

{
    "dev": {
        "app_function": "app.app",
        "aws_region": "eu-central-1",
        "profile_name": "default",
        "project_name": "ml-api-covid",
        "runtime": "python3.7",
        "s3_bucket": "zappa-eyy4wkd2l",
        "slim_handler": true,
        "exclude": [
            "*.joblib", "development", "models"
        ]
    }
}

NOTA BENE! Do not enter your own name for the S3 bucket, as in my case it could not be found. I really don’t know what the problem with naming the bucket yourself is, but it never worked for me. There were multiple error messages and I could not resolve them. Just keep the suggested name and everything works fine. ;)

Note that we are NOT yet ready to deploy. First, we need to get some AWS credentials.

Set up AWS

Note: This takes quite some effort. Do not be discouraged by the complexity of AWS and its policy management.

AWS credentials

First, you need to get an AWS access key id and secret access key.

Set up credentials with users and roles in IAM

I will break it down as simply as possible:

Within the AWS Console, type IAM into the search box. IAM is the AWS user and permissions dashboard.
Create a group
Give your group a name (for example zappa_group)
Create your own specific inline policy for your group
In the Permissions tab, under the Inline Policies section, choose the link to create a new Inline Policy
In the Set Permissions screen, click the Custom Policy radio button and click the “Select” button on the right.
Create a Custom Policy written in json format
Read through and copy a policy discussed here: https://github.com/Miserlou/Zappa/issues/244
Scroll down to “My Custom policy” to see a snippet of my policy.
After pasting and modifying the json with your AWS Account Number, click the “Validate Policy” button to ensure you copied valid json. Then click the “Apply Policy” button to attach the inline policy to the group.
Create a user and add the user to the group
Back at the IAM Dashboard, create a new user with the “Users” left-hand menu option and the “Add User” button.
In the Add user screen, give your new user a name and select the Access Type for Programmatic access. Then click the “Next: Permissions” button.
In the Set permissions screen, select the group you created earlier in the Add user to group section and click “Next: Tags”.
Tags are optional. Add tags if you want, then click “Next: Review”.
Review the user details and click “Create user”
Copy the user’s keys
Don’t close the AWS IAM window yet. In the next step, you will copy and paste these keys into a file. At this point, it’s not a bad idea to copy and save these keys into a text file in a secure location. Make sure you don’t save keys under version control.

My Custom policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "iam:AttachRolePolicy",
                "iam:GetRole",
                "iam:CreateRole",
                "iam:PassRole",
                "iam:PutRolePolicy"
            ],
            "Resource": [
                "arn:aws:iam::XXXXXXXX:role/*-ZappaLambdaExecutionRole"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "apigateway:DELETE",
                "apigateway:GET",
                "apigateway:PATCH",
                "apigateway:POST",
                "apigateway:PUT",
                "events:DeleteRule",
                "events:DescribeRule",
                "events:ListRules",
                "events:ListRuleNamesByTarget",
                "events:ListTargetsByRule",
                "events:PutRule",
                "events:PutTargets",
                "events:RemoveTargets",
                "lambda:AddPermission",
                "lambda:CreateFunction",
                "lambda:DeleteFunction",
                "lambda:DeleteFunctionConcurrency",
                "lambda:GetAlias",
                "lambda:GetFunction",
                "lambda:GetFunctionConfiguration",
                "lambda:GetPolicy",
                "lambda:InvokeFunction",
                "lambda:ListVersionsByFunction",
                "lambda:RemovePermission",
                "lambda:UpdateFunctionCode",
                "lambda:UpdateFunctionConfiguration",
                "cloudformation:CreateStack",
                "cloudformation:DeleteStack",
                "cloudformation:DescribeStackResource",
                "cloudformation:DescribeStacks",
                "cloudformation:ListStackResources",
                "cloudformation:UpdateStack",
                "cloudfront:UpdateDistribution",
                "logs:DeleteLogGroup",
                "logs:DescribeLogStreams",
                "logs:FilterLogEvents",
                "route53:ListHostedZones"
            ],
            "Resource": [
                "*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:CreateBucket",
                "s3:ListBucket",
                "s3:ListBucketMultipartUploads",
                "s3:ListAllMyBuckets",
                "s3:GetObject"
            ],
            "Resource": [
                "arn:aws:s3:::zappa-*",
                "arn:aws:s3:::*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:DeleteObject",
                "s3:GetObject",
                "s3:PutObject",
                "s3:AbortMultipartUpload",
                "s3:ListMultipartUploadParts"
            ],
            "Resource": [
                "arn:aws:s3:::zappa-*/*"
            ]
        }
    ]
}

As you can see in the policy, I added S3-related permissions. This is because we want to download our pre-trained model from S3. More info on that later.

Add credentials in your project

Create a .aws folder in your home directory and open a credentials file in it with

mkdir ~/.aws
code ~/.aws/credentials

and paste your credentials from AWS

[dev]
aws_access_key_id = YOUR_KEY
aws_secret_access_key = YOUR_KEY

Same with the config

code ~/.aws/config
# and add:
[default]
region = YOUR_REGION (eg. eu-central-1)

Note that code opens a file with VS Code, my editor of choice.

Save the AWS access key id and secret access key assigned to the user you created in the file ~/.aws/credentials. Note that the .aws/ directory needs to be in your home directory and the credentials file has no file extension.
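
One thing worth double-checking at this point: as far as I understand zappa, the profile_name in zappa_settings.json refers to a section name in ~/.aws/credentials, so the two have to match. The sketch below checks this with the standard library only. The helper missing_profiles and the inline file contents are made up for illustration; in practice you would read the real files.

```python
import configparser
import json

# Inline stand-ins for ~/.aws/credentials and zappa_settings.json.
credentials_text = """
[default]
aws_access_key_id = YOUR_KEY
aws_secret_access_key = YOUR_SECRET_KEY
"""
zappa_settings_text = '{"dev": {"profile_name": "default", "aws_region": "eu-central-1"}}'


def missing_profiles(credentials_text, zappa_settings_text):
    """Return the zappa stages whose profile_name has no credentials section."""
    credentials = configparser.ConfigParser()
    credentials.read_string(credentials_text)
    settings = json.loads(zappa_settings_text)
    return [
        stage
        for stage, cfg in settings.items()
        if cfg.get("profile_name", "default") not in credentials.sections()
    ]


problems = missing_profiles(credentials_text, zappa_settings_text)
```

With a [dev] section in the credentials file and "profile_name": "default" in the settings, this check reports the dev stage. If deployment complains about a missing profile, either rename the section to [default] or set profile_name to dev.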

Deploy

Now you can deploy your API with

zappa deploy dev

However, there are a few things to consider:

Zappa will pack your entire environment and the whole content of the project root. This will be quite large.
There is an upload limit for AWS Lambda

Reduce uploading size

There are several discussions on how to reduce the upload size with zappa. Check out the Inspiration section for links.

First, we need to reduce the package size for the upload.

We will put all exploratory content into its own folder. I named it “development”. Afterward, you can specify excluded files and folders in zappa_settings.json with exclude:

{
    "dev": {
        ...
        "slim_handler": true,
        "exclude": [
            "*.ipynb", "*.joblib", "jupytext_conversion/", ".ipynb_checkpoints/",
            "predict_covid.ipynb", "development", "models"
        ]
    }
}

You can add everything to the list that doesn’t need to be packaged for deployment.
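
To get a feeling for which entries such a list removes, here is a rough sketch. It assumes that the patterns are matched as shell-style globs against file and directory names, the way Python's shutil.ignore_patterns does; whether zappa treats every pattern exactly like this is an assumption, so check the zappa docs. The project entries are made up.

```python
import shutil

# The exclude list from zappa_settings.json above.
exclude = ["*.ipynb", "*.joblib", "jupytext_conversion/", ".ipynb_checkpoints/",
           "predict_covid.ipynb", "development", "models"]

# Made-up top-level entries of a project folder.
project_entries = ["app.py", "templates", "static", "requirements_prod.txt",
                   "model.joblib", "predict_covid.ipynb", "development", "models"]

# ignore_patterns returns a callable that reports which names match a pattern.
ignore = shutil.ignore_patterns(*exclude)
excluded = ignore(".", project_entries)
packaged = [name for name in project_entries if name not in excluded]
```

Under this assumption, only app.py, templates, static and requirements_prod.txt remain in the package.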

Another issue is the environment dependencies. In our case, we have multiple dependencies that we don’t need for deployment. To solve this, I created a new “requirements_prod.txt” file. It contains only the dependencies that are needed on AWS.

Make sure to export your current packages with

pip freeze > requirements.txt

Afterward, uninstall all packages

pip uninstall -r requirements.txt -y

Install the packages needed for deployment and save them in the new file

pip install Flask pandas boto3 scikit-learn zappa
pip freeze > requirements_prod.txt

When you run zappa deploy dev, the package should be considerably smaller.

You will note that I also set slim_handler to true. This allows us to upload more than 50MB. Behind the scenes, zappa then puts the content into its own S3 bucket. Read the zappa docs for more info.

Load model to S3 Bucket

Since we excluded our model from the AWS Lambda upload, we need to get the model from somewhere else. We will use an AWS S3 bucket.

During the development process, I tried to upload it there programmatically as well, but in the end I just uploaded it by hand, as that was simply faster. (You can still try to upload it programmatically; I still have a commented-out file for this in the repo.)
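
For reference, a programmatic upload could be sketched like this. It is not the commented-out file from the repo: the function upload_model, the bucket and key names, and the RecordingS3Client stand-in are assumptions for illustration. The actual call is boto3's upload_file on the S3 client.

```python
def upload_model(local_path, bucket_name, object_key, s3_client=None):
    """Upload an exported model file to an S3 bucket (sketch)."""
    if s3_client is None:
        import boto3  # imported lazily so the sketch also runs without AWS
        s3_client = boto3.client("s3")
    # boto3 signature: upload_file(Filename, Bucket, Key)
    s3_client.upload_file(local_path, bucket_name, object_key)
    return f"s3://{bucket_name}/{object_key}"


class RecordingS3Client:
    """Hypothetical stand-in that records calls instead of talking to AWS."""

    def __init__(self):
        self.calls = []

    def upload_file(self, Filename, Bucket, Key):
        self.calls.append((Filename, Bucket, Key))


fake_client = RecordingS3Client()
uri = upload_model("models/rf_model.joblib", "my-model-bucket",
                   "rf_model.joblib", s3_client=fake_client)
```

Note that the custom policy shown earlier grants s3:PutObject only on zappa-* buckets, so uploading to a bucket with another name would need an additional permission for the IAM user.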

Go to https://console.aws.amazon.com/s3/

Click “create bucket”
Give it a name and leave the rest as default. Check for sufficient permissions.
Click “create bucket”

Check whether you have a sufficient policy for interacting with the bucket through boto3. You should have something similar to

{
      "Effect": "Allow",
      "Action": [
        "s3:CreateBucket",
        "s3:ListBucket",
        "s3:ListBucketMultipartUploads",
        "s3:ListAllMyBuckets",
        "s3:GetObject"
      ],
      "Resource": [
        "arn:aws:s3:::zappa-*",
        "arn:aws:s3:::*"
      ]
    }

Debugging and updates

By now, there shouldn’t be any errors anymore. However, if there are still some, you can debug with:

zappa status
# and
zappa tail

The most common errors are permission related (then check your permission policy) or about Python libraries that are incompatible. Either way, zappa will provide good enough error messages for debugging. The top errors in my experience are:

Policy issues with your user from IAM
Zappa and size issues
Boto3 and permission/location of files issues

If you update your code, don’t forget to update the deployment as well with

zappa update dev

AWS API Gateway: restrict access

To set up the API on a marketplace, we first need to restrict its usage with an API key and then set it up on the marketplace platform.

To break it down:

Go to your AWS Console and go to API Gateway
Click on your API
We want to create an x-api-key to restrict undesired access to the API and also to have metered usage
Create a usage plan for the API with the desired throttle and quota limits
Create an associated API stage
Add an API key
In the API key overview section, click “show” at the API key and copy it
Then associate the API with the key and discard all requests that come without the key
Go back to the API overview. Under resources, click “/ any” and go to the “method request”. Then, in settings, set “API key required” to true
Do the same for the “/{proxy+}” methods

Now you have restricted access to your API.
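
A client then has to send the key in the x-api-key header. The sketch below only builds such a request with the standard library and does not send it. The URL and the key are placeholders, and build_predict_request is a made-up helper; the form field country matches the input of the frontend.

```python
from urllib import parse, request

# Placeholders: replace with your API Gateway URL and your API key.
API_URL = "https://EXAMPLE.execute-api.eu-central-1.amazonaws.com/dev/predict"
API_KEY = "YOUR_API_KEY"


def build_predict_request(country, url=API_URL, api_key=API_KEY):
    """Build (but do not send) a POST request for the /predict route."""
    body = parse.urlencode({"country": country}).encode("utf-8")
    req = request.Request(url, data=body, method="POST")
    req.add_header("x-api-key", api_key)
    return req


req = build_predict_request("Austria")
# request.urlopen(req) would send it; without a valid key,
# API Gateway should reject the call.
```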

4. Set up Rapidapi

Add new API
Test endpoint with Rapidapi
Create code to consume API

I will not go into detail here. Again, check my previous article https://towardsdatascience.com/develop-and-sell-a-python-api-from-start-to-end-tutorial-9a038e433966 for setting everything up. There is no big difference for my new machine learning model.

End result

https://rapidapi.com/Createdd/api/covid_new_cases_prediction

Inspiration

My main motivation this time came from Moez Ali, who provides great articles on deploying machine learning systems. I also enjoy following him on social media. I can recommend his articles.

Also François Marceau with

https://towardsdatascience.com/how-to-deploy-a-machine-learning-model-on-aws-lambda-24c36dcaed20

Common issues

https://github.com/Miserlou/Zappa/issues/1927 Package Error: python-dateutil
https://stackabuse.com/file-management-with-aws-s3-python-and-flask/
https://ianwhitestone.work/zappa-zip-callbacks/ remove unnecessary files in zappa
https://stackoverflow.com/questions/62941174/how-to-write-load-machine-learning-model-to-from-s3-bucket-through-joblib

Additional reading

Final links

Open source code:

https://github.com/Createdd/ml_api_covid

On Rapidapi:

https://rapidapi.com/Createdd/api/covid_new_cases_prediction

About

Daniel is an entrepreneur, software developer, and lawyer. He has worked at various IT companies, in tax advisory and management consulting, and at the Austrian court.

His knowledge and interests currently revolve around programming machine learning applications and all related aspects. At his core, he considers himself a problem solver for complex environments, which is reflected in his various projects.

Don’t hesitate to get in touch if you have ideas, projects, or problems.