A Practical Introduction to Machine Learning

Following on from my earlier post on Data Science, here I will try to summarize and compile the major practical concepts of Machine Learning in a handy, easy-to-use, language-agnostic reference-guide format. Most of the information is presented as short and succinct bullet points. I expect this to be especially valuable to beginners, or as a quick look-up for those with a basic level of experience in data science and machine learning.

Introductory Concepts

Let us get some basic terminologies out of the way first:

  • Structured data refers to data stored in a predefined format, e.g., tables, spreadsheets or relational databases

  • Unstructured data, on the other hand, does not have a predefined format and therefore, cannot be saved in a tabular form. Unstructured data can come in a variety of types, e.g., blobs of text, images, videos, audio files

  • Categorical data is any data that can be labeled and usually comprises a range of fixed values, e.g., gender, nationality, risk grades. Categorical data can be either nominal (without any inherent ordering, e.g., gender) or ordinal (ordered or ranked data, e.g., risk grades). These fixed values are known as classes or categories

  • Features or Predictors: input data/variables used by an ML model, usually denoted with X, to predict the target variable

  • Target variable: the data point that we want to predict by an ML model, often represented with y

  • Classification problem involves predicting a discrete class of a categorical target variable, e.g., spam or not, default or non-default

  • Regression problem deals with predicting a continuous numeric value, e.g., sales, house price

  • Feature Engineering: transforming existing features or engineering new input features that can potentially be more useful during model training. E.g., calculating the number of months from today for a date variable

  • Training, Validation & Test data: Training data is used during initial model training/fitting. Validation data is used to evaluate the model, usually to fine-tune model parameters or identify the most suitable ML model among many. Test data is used for the final evaluation of a short-listed or fine-tuned model

  • Overfitting happens when a model performs well on the training data but poorly on the test/validation data, i.e., fails to generalize adequately on new and unseen data

  • Underfitting occurs when the model is not complex or robust enough to learn the variable relationships from the training data, and so has low accuracy even on the data on which it was trained

  • Model bias and variance: A model is said to be biased when it performs poorly on the training dataset as a result of underfitting. Variance is associated with how well or poorly the model performs on the test/validation set, with a high variance being usually caused by overfitting

  • Generalization, closely related to overfitting and model variance, refers to a model’s ability to make correct predictions on new, previously unseen data

  • Regularization techniques improve the generalizability of a model, e.g., through penalizing or shrinking regression coefficients towards zero

  • Ensemble Learning is a modeling technique that combines multiple models into one

  • Baseline Model is a naive model/heuristic used as a reference point to evaluate a conventional ML model

  • Hyperparameters are the model configuration settings that are set by the practitioner rather than learned from the data, and that can be tweaked during model training and tuning

Data Cleaning and Feature Engineering

Data cleaning transforms raw data into a form and format that can be effectively and efficiently processed by ML models. Despite the perceived intelligence and robustness of these models, the garbage-in, garbage-out (GIGO) principle remains valid in ML. Refer to my previous article for further details.

Deal with missing data (an imputation sketch follows this list):

  • Drop all records with missing features — not recommended
  • Heuristic-based imputation using domain knowledge
  • Mean/median/mode imputation of missing values
  • Use a random value or a constant to fill in missing data
  • Utilize k-Nearest Neighbors or a linear regression model to predict and impute missing values
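
To make these concrete, here is a minimal pandas sketch of the simpler imputation options; the DataFrame and its columns are hypothetical, and scikit-learn's SimpleImputer and KNNImputer offer pipeline-friendly equivalents:

```python
import numpy as np
import pandas as pd

# Hypothetical dataset with missing values
df = pd.DataFrame({
    "age": [25, np.nan, 40, 31, np.nan],
    "income": [50_000, 62_000, np.nan, 45_000, 58_000],
    "city": ["NY", "SF", None, "NY", "SF"],
})

# Drop all records with any missing value (not recommended)
dropped = df.dropna()

# Mean/median/mode imputation
df["age"] = df["age"].fillna(df["age"].median())
df["income"] = df["income"].fillna(df["income"].mean())
df["city"] = df["city"].fillna(df["city"].mode()[0])

# Constant-value imputation, e.g., a sentinel category for "unknown"
# df["city"] = df["city"].fillna("unknown")
```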

Some other typical data cleaning tasks include (a pandas sketch follows the list):

  • Identify and delete zero-variance features
  • Identify, and potentially drop, features that exhibit multicollinearity or a high degree of pairwise correlation
  • Evaluate features with low variance or near-zero variance utilizing domain knowledge; mostly applicable to numerical and nominal categorical data
  • Drop duplicate records, if applicable
  • Identify outliers and determine an appropriate strategy to deal with them — either drop them, trim them, or leave them as they are, since some ML models can effectively deal with outliers
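
A minimal sketch of a few of these checks; the toy DataFrame and the 0.9 correlation threshold are illustrative assumptions, not fixed rules:

```python
import pandas as pd

# Toy numerical feature table (hypothetical)
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.1, 4.0, 6.2, 8.1],   # almost perfectly correlated with "a"
    "c": [5.0, 5.0, 5.0, 5.0],   # zero variance
})

# Identify and drop zero-variance features
df = df.drop(columns=df.columns[df.var() == 0])

# Flag highly correlated feature pairs (threshold is a judgment call)
corr = df.corr().abs()
pairs = [(a, b) for i, a in enumerate(corr.columns)
         for b in corr.columns[i + 1:] if corr.loc[a, b] > 0.9]
print(pairs)  # candidates to drop or combine

# Drop exact duplicate records
df = df.drop_duplicates()
```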

Feature Engineering

Feature engineering is more of an art than a science and relies predominantly on one’s domain knowledge. Done correctly, it has the potential to increase a model’s predictive power.

Feature engineering techniques for numerical data (a transformation sketch follows the list):

  • Scale, normalize, or standardize using log scales, z-scores, or min-max scaling
  • Create new features using mathematical or statistical interaction(s) within raw numerical features, e.g., through addition, subtraction, or a statistical test
  • Utilize statistical transformations to convert skewed distributions to Gaussian-like ones, e.g., the log/power and Box-Cox transforms
  • Apply dimensionality reduction techniques, e.g., Principal Component Analysis (PCA)
  • Binning a numerical feature into categories is generally not recommended. However, there are certain use cases (e.g., credit risk scoring) where it is a proven and well-researched industry best practice
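
A brief sketch of these scaling and transformation options, assuming scikit-learn and a hypothetical right-skewed income column:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler, PowerTransformer, StandardScaler

x = pd.DataFrame({"income": [20_000.0, 35_000.0, 50_000.0, 120_000.0, 1_000_000.0]})

z = StandardScaler().fit_transform(x)         # z-scores: mean 0, std 1
mm = MinMaxScaler().fit_transform(x)          # min-max scaling to [0, 1]
logged = np.log1p(x["income"])                # log transform tames right skew
bc = PowerTransformer(method="box-cox").fit_transform(x)  # needs positive values
```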

Feature engineering techniques for categorical data (an encoding sketch follows the list):

  • Ordinal encoding: convert ordered categorical data into numerical values, e.g., Good, Bad, Worse converts to 1, 2, 3
  • One-hot encoding for nominal categorical data: each feature’s category is converted to a separate column, where its presence is denoted by 1 and its absence by 0. E.g., [USA, UK, Australia] is converted to [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
  • Certain specific techniques that are widely utilized in Natural Language Processing, e.g., feature hashing schemes and word embeddings
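
A minimal pandas sketch of the first two encodings, reusing the examples from the list above:

```python
import pandas as pd

df = pd.DataFrame({
    "grade": ["Good", "Bad", "Worse", "Good"],
    "country": ["USA", "UK", "Australia", "UK"],
})

# Ordinal encoding: map ordered categories to integers
df["grade_enc"] = df["grade"].map({"Good": 1, "Bad": 2, "Worse": 3})

# One-hot encoding: one 0/1 column per nominal category
df = pd.get_dummies(df, columns=["country"])
```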

Modeling Overview and Principles

So what exactly does a Machine Learning model do? Given a set of features X in the training data, an ML model tries to iteratively find an ideal statistical function (often called the training function) that maps X to the training data’s target variable, y, as accurately as possible. Finding the ideal training function usually involves making certain assumptions about the underlying data and its form. This training function is then used to predict y on any new data in the future, given X.

Some basic modeling principles

Here we will touch on some general modeling-related principles and philosophies:

Model accuracy and its interpretability tradeoff

Better model accuracy usually results in relatively lower model interpretability.

Complex models, like deep neural networks and ensembles of decision trees, usually perform better than simpler models. However, they are much less interpretable, in the sense that the training function is not easy for a layperson to understand. Simpler models, like linear regression, logistic regression, and a single decision tree, are easily interpretable, at the cost of accuracy.

Consider a simple logistic regression model. It provides us with the coefficients for each feature that, in turn, provide us with insights as to how useful that feature is for the prediction problem.

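For instance, a minimal scikit-learn sketch (on synthetic data) that exposes those coefficients:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=4, random_state=42)
model = LogisticRegression(max_iter=1000).fit(X, y)

# One coefficient per feature: sign and magnitude hint at each
# feature's contribution to the log-odds of the positive class
print(model.coef_, model.intercept_)
```
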
Therefore, model selection is often driven by the level of complexity and interpretability required. Some domains, like credit scoring, more or less mandate the use of an easily interpretable model. Hence, logistic regression has historically been widely used in credit scoring problems. However, for image detection, recognition, and natural language processing, interpretability is less of a concern. Therefore, complex, deep neural networks can be safely deployed in these domains.

Bias/Variance tradeoff

Generally speaking, a model’s bias can be improved either by parameter tuning or by selecting an altogether different model, while variance can be reduced with more training data, regularization techniques, or by preventing any data leakage between the training and test sets. Bias and variance can be simultaneously improved only to a certain extent, beyond which an improvement in one will usually result in a deterioration of the other.

This tradeoff is very common and requires a delicate balance in practice. Note that there will always be some unavoidable variance due to random noise in data, which makes it practically impossible to reduce variance to 0.

Occam’s Razor

Occam’s Razor is a general philosophical principle which states that if there are two explanations for an event or a fact, the simpler explanation, with the fewest assumptions, is the more likely to be correct. When applied to ML, Occam’s Razor implies that, when comparing two models with similar predictive power or accuracy, we should select the simpler model.

No Free Lunch Theorem

No single machine learning model works best for all possible problems. Wolpert and Macready¹ stated: “If an algorithm performs better than random search on some class of problems then it must perform worse than random search on the remaining problems”. It is common to try multiple relevant models and find one that works best for your particular problem.

Modeling Taxonomy

Machine learning models can be classified in several ways, some of which are:

Parametric vs. Non-Parametric

Parametric models make strong assumptions on training data to identify and simplify the training function to a known form, the parameters of which fully describe and capture the relationships between features and the target variable.

For example, a linear regression model assumes a linear relationship between the input features and the target variable and will try to find the best possible linear function for the same. However, if this assumption is invalid, then the model will produce poor predictions.

Examples of parametric models include linear regression, logistic regression, and Naive Bayes.

Non-Parametric models don’t make strong assumptions about either the training data or the form of the training function and, thereby, are generally more flexible but at the cost of potential overfitting.

Examples of non-parametric models include k-Nearest Neighbors, decision trees, and Support Vector Machines (SVM).

Supervised vs. Unsupervised

Supervised models try to predict the known target variable(s), while unsupervised models do not have any knowledge about the target variable(s) beforehand. The goal of unsupervised learning is to try and understand the relationships between the variables or observations.

Supervised models include linear regression, logistic regression, decision trees, k-Nearest Neighbors, and SVM, while unsupervised models include k-Means clustering, isolation forest, and PCA.

Blackbox vs. Descriptive

Blackbox models utilize multiple complex algorithms to make decisions, but we do not know how the decision/prediction is arrived at, e.g., deep learning and neural networks.

Descriptive models provide clear insight into why and how they make their decisions, e.g., linear regression, logistic regression, and decision trees.

Oft-used ML Models

Some of the widely used ML models, outside the deep learning domain, include the following (a model-fitting sketch follows the list):

  • Linear Regression is a simple and extensively used supervised learning model for regression problems. It predicts a numerical target variable on the assumption that there is a linear relationship between features and the target variable

  • Logistic Regression is extensively used to predict the class, or the probability of being assigned to it, given a set of features. Hence, it is a supervised model for classification problems. Logistic regression assumes that all features have a linear relationship with the log-odds (logit) of the target variable. One needs to be very careful when predicting imbalanced classes

  • k-Nearest Neighbors (KNN) is a supervised model that can be used for both classification and regression problems. It predicts through a simple majority vote among the k nearest neighbors of each observation, found using a distance metric (most commonly Euclidean distance)

  • k-Means Clustering is an unsupervised clustering algorithm that assigns observations to groups in a manner that minimizes variance between individual observations within each cluster

  • Decision Trees are supervised models that can be utilized for both regression and classification problems using a sequence of rules. A single decision tree is rarely used in practice, given its risk of overfitting. Instead, ensemble or bagging concepts are utilized to minimize the model variance

  • Random Forest is an ensemble model that combines multiple decision trees through the concept of bagging to reduce model error

  • Support Vector Machine (SVM) is a supervised classification model that aims to find an ideal hyperplane, or boundary, between all the possible classes that maximizes the margin between them. This hyperplane is then used for classification
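
To illustrate trying several of these models on the same problem (in the spirit of the No Free Lunch theorem), here is a sketch assuming scikit-learn and synthetic data; the printed scores are training-set accuracy only, so the evaluation strategies of the next section are still required:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=42)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "knn": KNeighborsClassifier(n_neighbors=5),
    "decision_tree": DecisionTreeClassifier(max_depth=5),
    "random_forest": RandomForestClassifier(n_estimators=100),
    "svm": SVC(kernel="rbf"),
}

for name, model in models.items():
    model.fit(X, y)
    print(name, model.score(X, y))  # training accuracy; optimistic by construction
```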

Model Evaluation

But how good is our model in making predictions? Model evaluation gives us the answer.

Evaluation Strategies

A brief overview of the various evaluation strategies follows:

Train/Test Split

Divide the complete dataset into two subsets, called training and test sets (usually in an 80/20, 75/25, or 70/30 split). Train the model on the training set and apply evaluation metrics to the model predictions made on the test set. This is not the ideal approach, as there is no separate dataset on which to test, evaluate, and compare model parameters (called hyperparameter optimization) or multiple models.

Conducting such an evaluation on the test set and using the results to tune the model will result in data leakage from the test set into the training process, and in unreliable final evaluation metrics. This is because we would be using information from the test set (which should be treated as the new, unseen data we will encounter in production) to train the model.
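
A minimal sketch of such a split, assuming scikit-learn and a synthetic classification dataset:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=42)

# 80/20 split; stratify preserves class proportions in both subsets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
```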

Train/Validation/Test Split

Divide the complete dataset into three subsets. Train single or multiple models on the training set and apply preliminary evaluation metrics to the model predictions made on the validation set. Use these results to fine-tune a single model or to select the optimal model. Once a final model has been selected, apply it to the still-untouched test dataset and evaluate it there. This prevents any data leakage and is a better, but still not the ideal, evaluation approach.

Cross-Validation (CV)

CV is applied to the training set after a train/test split. CV splits the training set into multiple subsets (called folds), fits the model on all but one of the folds, and evaluates it on the holdout fold. This results in multiple evaluation metrics (one per fold), whose average and standard deviation are used to select the final model.

Once the final model is selected, it is trained again on the whole training set and evaluated on the test set, which was left untouched during the entire process. CV is the ideal model evaluation approach.

Some of the standard CV techniques include (a stratified k-fold sketch follows the list):

  • Leave-One-Out CV (LOOCV): fit and train the model on all but one observation, evaluating on the one left out; repeated for every observation
  • k-Fold CV: fit and train the model on k-1 of the folds and evaluate on the holdout fold
  • Repeated k-Fold CV: similar to k-Fold CV, but the process is repeated a specified number of times
  • Stratified k-Fold CV: similar to k-Fold CV, but the folds are made by preserving the percentage of samples for each target class. Useful for imbalanced data
  • Repeated Stratified k-Fold CV: a combination of Repeated k-Fold CV and Stratified k-Fold CV
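
A sketch of stratified k-fold CV applied to the training set only, assuming scikit-learn; the test set stays untouched until the final evaluation:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import (StratifiedKFold, cross_val_score,
                                     train_test_split)

X, y = make_classification(n_samples=500, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Stratified 5-fold CV on the training set; mean and std guide model selection
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(RandomForestClassifier(random_state=42),
                         X_train, y_train, cv=cv)
print(scores.mean(), scores.std())
```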

Evaluation Metrics

There are tens of model evaluation metrics out there; some of the more widely used are described below.

Classification Metrics (computed in the sketch after this list):

  • Accuracy: the ratio of correct predictions to the total number of predictions. Not suitable for an imbalanced dataset
  • Precision: the ratio of true positives to the total number of predicted positives
  • Recall, also known as Sensitivity or True Positive Rate (TPR): the ratio of true positives to the actual number of positives
  • F-Score: a single score that combines Precision and Recall (the F1 score is their harmonic mean)
  • Area Under the Receiver Operating Characteristic Curve (AUROC): a single number that summarizes the information of a ROC curve
  • Brier Score, Cohen’s Kappa Statistic, etc.
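
A sketch computing these metrics with scikit-learn on hypothetical labels; note that AUROC expects predicted probabilities rather than hard class predictions:

```python
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)

# Hypothetical ground truth, hard predictions, and predicted probabilities
y_true = [0, 1, 1, 0, 1, 0, 1, 1]
y_pred = [0, 1, 0, 0, 1, 1, 1, 1]
y_prob = [0.2, 0.9, 0.4, 0.1, 0.8, 0.6, 0.7, 0.95]

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
print("f1       :", f1_score(y_true, y_pred))
print("auroc    :", roc_auc_score(y_true, y_prob))
```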

Refer to my previous article for further details on these metrics.

Regression Metrics (computed in the sketch after this list):

  • Mean Absolute Error (MAE): the average absolute difference between actual and predicted values
  • Median Absolute Error: the median of the absolute differences between actual and predicted values
  • Mean Squared Error (MSE): the average of the squared differences between actual and predicted values
  • Root Mean Squared Error (RMSE): the square root of MSE, expressed in the same units as the target variable
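
The same idea for the regression metrics, again assuming scikit-learn and hypothetical values:

```python
import numpy as np
from sklearn.metrics import (mean_absolute_error, mean_squared_error,
                             median_absolute_error)

y_true = [3.0, 5.5, 2.0, 7.0]
y_pred = [2.5, 5.0, 2.0, 8.0]

mae = mean_absolute_error(y_true, y_pred)
mdae = median_absolute_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)  # RMSE: square root of MSE, in the target's units
print(mae, mdae, mse, rmse)
```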

Conclusion

I hope the above pointers come in handy on your machine learning journey.

Feel free to reach out to me to discuss anything related to machine learning or data and financial analytics.

Learn on!

Translated from: https://towardsdatascience.com/a-practical-introduction-to-machine-learning-f43d8badc5a7
