Basic Operations on an Amazon Machine Learning Example -- Custom ML Model

Latest recommended article published on 2024-04-22 09:58:57

XuYongshi02

Category: AWS Getting Started. Tags: machine learning, Amazon, examples, AWS ML

Article link: https://blog.csdn.net/xuyongshi02/article/details/53945129

Amazon Machine Learning

https://aws.amazon.com/machine-learning/

Steps

Step 1: Prepare Your Data
Step 2: Create a Training Datasource
Step 3: Create an ML Model
Step 4: Review the ML Model's Predictive Performance
Step 5: Use the ML Model to Generate Predictions
Step 6: Clean Up

Prepare the data (clean, transform, ...) → choose a model → review the results → predict on new data → clean up

1. Open the console

2. Select the dataset

3. Create the model: Custom

4. Adjust the recipe

Default recipe contents:

{
  "groups": {
    "NUMERIC_VARS_QB_50": "group('emp_var_rate','cons_price_idx')",
    "NUMERIC_VARS_QB_500": "group('campaign','age')",
    "NUMERIC_VARS_QB_10": "group('duration','cons_conf_idx','previous','nr_employed','euribor3m','pdays')"
  },
  "assignments": {},
  "outputs": [
    "ALL_BINARY",
    "ALL_CATEGORICAL",
    "quantile_bin(NUMERIC_VARS_QB_50,50)",
    "quantile_bin(NUMERIC_VARS_QB_500,500)",
    "quantile_bin(NUMERIC_VARS_QB_10,10)"
  ]
}

Changed to:

{
  "groups": {
    "NUMERIC_VARS_QB_10": "group('emp_var_rate','campaign')"
  },
  "assignments": {
    "myassign": "quantile_bin(NUMERIC_VARS_QB_10, 10)"
  },
  "outputs": [
    "ALL_BINARY",
    "ALL_CATEGORICAL",
    "myassign"
  ]
}
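A recipe with a missing brace or an undefined name in outputs is easy to produce when editing by hand, so a quick local check before pasting it into the console can help. The sketch below is only an illustration: check_recipe is a hypothetical helper, not part of any AWS SDK, and it assumes the recipe contains no // comments so that Python's json module can parse it. It reports bare names in outputs that are neither built-in groups, user-defined groups, assignments, nor field names you pass in.

```python
import json
import re

# Built-in group names listed in the appendix of this article.
BUILT_IN_GROUPS = {"ALL_TEXT", "ALL_NUMERIC", "ALL_CATEGORICAL", "ALL_BINARY", "ALL_INPUTS"}

RECIPE = """
{
  "groups": {
    "NUMERIC_VARS_QB_10": "group('emp_var_rate','campaign')"
  },
  "assignments": {
    "myassign": "quantile_bin(NUMERIC_VARS_QB_10, 10)"
  },
  "outputs": [
    "ALL_BINARY",
    "ALL_CATEGORICAL",
    "myassign"
  ]
}
"""


def check_recipe(text, field_names=()):
    """Return the bare names in "outputs" that are not defined anywhere.

    Hypothetical helper: json.loads raises ValueError on malformed input
    (for example a missing closing brace), which is itself a useful check.
    """
    recipe = json.loads(text)
    known = (
        BUILT_IN_GROUPS
        | set(recipe.get("groups", {}))
        | set(recipe.get("assignments", {}))
        | set(field_names)
    )
    unknown = []
    for output in recipe.get("outputs", []):
        # Only bare names are checked; function calls such as
        # quantile_bin(...) are skipped by this simple sketch.
        if re.fullmatch(r"\w+", output) and output not in known:
            unknown.append(output)
    return unknown


print(check_recipe(RECIPE))
```

An empty list means every bare name in outputs resolves to something defined in the recipe or supplied through field_names.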

5. Set the training parameters

Training Parameters

See http://docs.aws.amazon.com/machine-learning/latest/dg/training-parameters.html?icmpid=docs_machinelearning_console

The following parameters can be set:

Maximum model size

Maximum number of passes over training data

Shuffle type

Regularization type

Regularization amount
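The same settings can also be supplied programmatically. The sketch below assumes the boto3 machinelearning client and its create_ml_model call, whose Parameters argument is a map of string keys to string values; the sgd.* key names are written from memory of the CreateMLModel API and should be checked against the API reference. The function names, IDs, and numeric values are illustrative assumptions, not recommended settings.

```python
def build_training_parameters(max_model_size_bytes, max_passes, shuffle_type,
                              l1_amount=None, l2_amount=None):
    """Build the string-to-string parameter map (key names are assumptions)."""
    params = {
        "sgd.maxMLModelSizeInBytes": str(max_model_size_bytes),  # Maximum model size
        "sgd.maxPasses": str(max_passes),                        # Passes over training data
        "sgd.shuffleType": shuffle_type,                         # Shuffle type, e.g. "auto" or "none"
    }
    # Regularization type is chosen by which amount is supplied.
    if l1_amount is not None:
        params["sgd.l1RegularizationAmount"] = str(l1_amount)
    if l2_amount is not None:
        params["sgd.l2RegularizationAmount"] = str(l2_amount)
    return params


def create_custom_model(client, model_id, model_type, datasource_id, recipe_text, params):
    """Create the model; client is assumed to be boto3.client("machinelearning")."""
    return client.create_ml_model(
        MLModelId=model_id,
        MLModelType=model_type,            # "REGRESSION", "BINARY" or "MULTICLASS"
        TrainingDataSourceId=datasource_id,
        Recipe=recipe_text,
        Parameters=params,
    )


# Illustrative values only.
example_params = build_training_parameters(100000000, 10, "auto", l2_amount=0.0001)
print(example_params)
```

Passing the client in as an argument keeps the sketch free of credentials and makes it easy to try with a stub object before calling the real service.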

6. Set the evaluation parameters

7. Review

8. Switch to the Dashboard

9. Use a row of the original data

36,admin.,married,university.degree,no,no,no,cellular,jun,mon,174,1,3,1,success,-2.9,92.963,-40.8,1.266,5076.2

(Row 10 of the original data; the original value of the target column is 1.)

Regression analysis gives a predicted value of 0.38.
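To send such a row to a prediction API rather than paste it into the console, it first has to be turned into a map of field names to string values. The sketch below does only that conversion. The column names are an assumption: ten of them appear in the recipe above, and the rest are taken from the usual layout of the banking dataset used in the Amazon ML tutorial, so check them against your own schema.

```python
import csv
import io

# Assumed column order (target column excluded); verify against your schema.
COLUMNS = [
    "age", "job", "marital", "education", "default", "housing", "loan",
    "contact", "month", "day_of_week", "duration", "campaign", "pdays",
    "previous", "poutcome", "emp_var_rate", "cons_price_idx",
    "cons_conf_idx", "euribor3m", "nr_employed",
]

ROW = ("36,admin.,married,university.degree,no,no,no,cellular,jun,mon,"
       "174,1,3,1,success,-2.9,92.963,-40.8,1.266,5076.2")


def row_to_record(row, columns=COLUMNS):
    """Turn one CSV line into a {field name: string value} record."""
    values = next(csv.reader(io.StringIO(row)))
    if len(values) != len(columns):
        raise ValueError(f"expected {len(columns)} values, got {len(values)}")
    return dict(zip(columns, values))


record = row_to_record(ROW)
print(record["age"], record["duration"], record["nr_employed"])
```

A record in this shape is what the boto3 machinelearning client's predict call expects in its Record argument, alongside MLModelId and PredictEndpoint, assuming a real-time endpoint has been created for the model.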

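Appendix

1. What is a recipe

See http://docs.aws.amazon.com/machine-learning/latest/dg/feature-transformations-with-data-recipes.html

a) Feature variables need preprocessing. There are two ways to do this: transform the data yourself before uploading it to AWS, or use the data transformations that AWS predefines (that is, a recipe).

Example:

Take the time at which an event occurred. Viewed as a whole timestamp, each value occurs only once and carries no predictive meaning. If the hour or the weekday is split out, however, it may become possible to predict the periods in which events occur most frequently.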
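The timestamp example corresponds to the first approach, transforming the data yourself before upload. A minimal sketch, assuming timestamps in the format "YYYY-MM-DD HH:MM:SS"; the function name and the output field names are illustrative.

```python
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def split_timestamp(ts, fmt="%Y-%m-%d %H:%M:%S"):
    """Replace a unique timestamp with two low-cardinality features."""
    dt = datetime.strptime(ts, fmt)
    # datetime.weekday() returns 0 for Monday through 6 for Sunday.
    return {"hour": dt.hour, "weekday": WEEKDAYS[dt.weekday()]}


print(split_timestamp("2016-12-30 14:05:00"))
```

Each distinct timestamp now maps to one of 24 hours and one of 7 weekdays, which are values that repeat across records and can therefore be learned from.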

b) Three basic sections

See: http://docs.aws.amazon.com/machine-learning/latest/dg/recipe-format-reference.html

Comment delimiter: //

A recipe is not strictly JSON, and it has only the following three sections:

Groups
Assignments
Outputs

The built-in groups are:

ALL_TEXT, ALL_NUMERIC, ALL_CATEGORICAL, ALL_BINARY

ALL_INPUTS

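The outputs section determines which data the ML model can "see". Its entries can be

groups, variable names, or functions.

Groups are defined in the Groups section.

Variables can be original field names or temporary variables defined in the Assignments section.

For the available functions, see the relevant AWS documentation.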

c) Transformations

Syntax reference: http://docs.aws.amazon.com/machine-learning/latest/dg/data-transformations-reference.html

The more important ones are illustrated below:

quantile_bin(var1, 50)

var1 is a Numeric variable; its values are sorted into 50 categorical bins, that is, 50 groups.

The number of bins must be >= 5 and <= 1000.
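To make the idea of quantile binning concrete, here is a minimal pure-Python sketch. It is not Amazon ML's implementation: the way cut points are picked and ties are handled is an assumption, and the 5 to 1000 limit is simply the one stated above.

```python
from bisect import bisect_right
from collections import Counter


def quantile_bin(values, n_bins):
    """Map each numeric value to a bin index so bins hold roughly equal counts."""
    if not 5 <= n_bins <= 1000:
        raise ValueError("n_bins must be between 5 and 1000")
    ordered = sorted(values)
    # Interior cut points taken at the k/n_bins positions of the sorted data.
    cuts = [ordered[(len(ordered) * k) // n_bins] for k in range(1, n_bins)]
    # The bin index is the number of cut points less than or equal to the value.
    return [bisect_right(cuts, v) for v in values]


bins = quantile_bin(list(range(100)), 5)
print(Counter(bins))
```

With 100 evenly spread values and 5 bins, every bin receives 20 values; on skewed data the bins stay equally populated while their widths vary, which is the point of binning by quantile rather than by fixed width.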

normalize(var1)

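Normalizes the values of variable var1 (to zero mean and unit variance).

More commonly it is written as normalize(ALL_NUMERIC), which applies the normalization to every numeric variable.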
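As an illustration of what normalization does to one numeric column, here is a small sketch that rescales values to zero mean and unit variance. Whether Amazon ML uses the population or the sample standard deviation is not stated in this article; the population form is assumed here.

```python
from statistics import mean, pstdev


def normalize(values):
    """Rescale one numeric column to zero mean and unit variance."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


scaled = normalize([1.0, 2.0, 3.0])
print(scaled)
```

Applying this function to every numeric column in turn is the local equivalent of normalize(ALL_NUMERIC).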

cartesian(var1, var2)

Computes the Cartesian product of variables var1 and var2.
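A small sketch of the idea: every value (or token) of the first variable is paired with every value of the second, producing combined features. The underscore separator and the sample values are assumptions for illustration, not taken from the AWS documentation.

```python
from itertools import product


def cartesian(tokens1, tokens2, sep="_"):
    """Pair every token of the first variable with every token of the second."""
    return [f"{a}{sep}{b}" for a, b in product(tokens1, tokens2)]


# Two categorical values give a single combined category.
print(cartesian(["married"], ["cellular"]))
# Text variables contribute one token per word, so the result grows multiplicatively.
print(cartesian(["admin", "married"], ["jun", "mon"]))
```

The combined feature lets a linear model learn an effect for a specific pairing, such as a marital status together with a contact type, that neither variable expresses alone.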

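2. Binary classification model

The horizontal axis is the score. A sigmoid function is typically used to map the input value into the interval (0, 1).

The vertical axis is the frequency, that is, the number of times a given score occurs.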
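The two axes can be reproduced with a few lines of code: map each raw model output through a sigmoid to get a score, then count how many scores fall into each interval. This is a sketch with made-up raw values and a hypothetical score_histogram helper; it is not how the console builds its chart.

```python
import math
from collections import Counter


def sigmoid(x):
    """Map any real number to a score between 0 and 1 (numerically stable form)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def score_histogram(raw_values, n_bins=10):
    """Count how many scores fall into each of n_bins equal-width intervals."""
    counts = Counter()
    for x in raw_values:
        score = sigmoid(x)
        # Clamp so that a score of exactly 1.0 lands in the last bin.
        counts[min(int(score * n_bins), n_bins - 1)] += 1
    return [counts[b] for b in range(n_bins)]


print(score_histogram([0.0, 0.0, 5.0, -5.0]))
```

The list index plays the role of the horizontal axis (score interval) and each entry is the height on the vertical axis (frequency).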

ML Model Accuracy

Correct Predictions

True positive (TP): Amazon ML predicted the value as 1, and the true value is 1.
True negative (TN): Amazon ML predicted the value as 0, and the true value is 0.

Erroneous Predictions

False positive (FP): Amazon ML predicted the value as 1, but the true value is 0.
False negative (FN): Amazon ML predicted the value as 0, but the true value is 1.

Advanced Metrics

Accuracy, precision, recall, and false positive rate.

Accuracy

Accuracy (ACC) measures the fraction of correct predictions. The range is 0 to 1. A larger value indicates better predictive accuracy:

ACC = (TP + TN) / (TP + TN + FP + FN)

Precision

Precision measures the fraction of actual positives among those examples that are predicted as positive. The range is 0 to 1. A larger value indicates better predictive accuracy:

Precision = TP / (TP + FP)
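All four advanced metrics follow directly from the TP, TN, FP, and FN counts defined above. The sketch below computes them with the standard definitions; the counts in the example are made up for illustration and are not results from the model in this article.

```python
def ratio(numerator, denominator):
    """Return numerator / denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


def binary_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall and false positive rate from counts."""
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": ratio(tp, tp + fp),            # correct among predicted positives
        "recall": ratio(tp, tp + fn),               # found among actual positives
        "false_positive_rate": ratio(fp, fp + tn),  # false alarms among actual negatives
    }


# Made-up counts for illustration only.
metrics = binary_metrics(tp=40, tn=50, fp=5, fn=5)
print(metrics)
```

Moving the score threshold changes the four counts, so these metrics trade off against each other: a higher threshold usually raises precision and lowers recall.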