Machine Learning Concepts

jiongjiongai

Article link: https://blog.csdn.net/phoenix198425/article/details/78873920

Sources: Machine Learning Concepts, and Zhou Zhihua's "watermelon book" 《机器学习》 (Machine Learning).

Machine learning (ML) can help you use historical data to make better business decisions. ML algorithms discover patterns in data, and construct mathematical models using these discoveries. Then you can use the models to make predictions on future data. For example, one possible application of a machine learning model would be to predict how likely a customer is to purchase a particular product based on their past behavior.

Building a Machine Learning Application

Building ML applications is an iterative process that involves a sequence of steps. To build an ML application, follow these general steps:

1. Frame the core ML problem(s) in terms of what is observed and what answer you want the model to predict.
2. Collect, clean, and prepare the data so that ML training algorithms can consume it. Visualize and analyze the data to run sanity checks, validate its quality, and understand it.
3. Often the raw data (input variables) and the answer (target) are not represented in a way that can train a highly predictive model, so you typically need to construct more predictive input representations, or features, from the raw variables.
4. Feed the resulting features to the learning algorithm to build models, and evaluate model quality on data that was held out from model building.
5. Use the model to generate predictions of the target answer for new data instances.
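The five steps above can be sketched end to end in plain Python. This is a minimal illustration under stated assumptions, not a prescribed method: the customer records are randomly generated (hypothetical, not real data), and the cleaning rule, the derived feature `purchases + visits / 2`, and the one-threshold classifier are illustrative choices. All function names are hypothetical.

```python
import random

# Step 1 (framing), assumed for this sketch: the observation is a customer's
# (past_purchases, visits_last_month); the answer to predict is bought (1) or not (0).

def make_toy_data(n, seed=0):
    """Generate hypothetical customer records (synthetic, not real data)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        purchases = rng.randint(0, 10)
        visits = rng.randint(0, 20)
        # Assumed hidden rule plus Gaussian noise, used only to create labels.
        bought = 1 if purchases + visits / 2 + rng.gauss(0, 1) > 8 else 0
        rows.append(((purchases, visits), bought))
    return rows

def clean(rows):
    """Step 2: drop records with missing or negative values (a stand-in for real cleaning)."""
    return [(x, y) for x, y in rows
            if all(v is not None and v >= 0 for v in x) and y in (0, 1)]

def featurize(x):
    """Step 3: combine the raw variables into one derived feature (an illustrative choice)."""
    purchases, visits = x
    return purchases + visits / 2

def train(rows):
    """Step 4: pick the feature threshold that minimises the number of training errors."""
    best_t, best_err = None, float("inf")
    for t in sorted({featurize(x) for x, _ in rows}):
        err = sum((featurize(x) >= t) != (y == 1) for x, y in rows)
        if err < best_err:
            best_t, best_err = t, err
    return best_t

def predict(threshold, x):
    """Step 5: predict the target for a new instance."""
    return 1 if featurize(x) >= threshold else 0

def held_out_error(threshold, rows):
    """Error rate of the model on the given rows."""
    return sum(predict(threshold, x) != y for x, y in rows) / len(rows)

data = clean(make_toy_data(200))
random.Random(1).shuffle(data)
train_rows, test_rows = data[:150], data[150:]   # hold out 50 records for evaluation
model = train(train_rows)
print("threshold:", model)
print("held-out error rate:", held_out_error(model, test_rows))
print("prediction for (6, 10):", predict(model, (6, 10)))
```

Swapping the threshold learner for any other algorithm keeps the same shape: prepare, featurize, fit on one part of the data, evaluate on the held-out part, then predict.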

Concept (English / Chinese)	Meaning
data set 数据集
instance 示例, sample 样本, feature vector 特征向量	a single record in a data set
attribute 属性， feature 特征
attribute space 属性空间, sample space 样本空间, input space 输入空间
dimensionality 维度
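learning 学习, training 训练	the process of obtaining a model from data by running a learning algorithm
training data 训练数据	the data used during training
training sample 训练样本	a single sample in the training data
training set 训练集	the set made up of the training samples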
hypothesis 假设
ground-truth 真相, 真实	the underlying pattern in the data itself
prediction 预测
label 标记
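example 样例	an instance that comes with label information
label space 标记空间, output space 输出空间
testing 测试	the process of using a learned model to make predictions
testing sample 测试样本, testing instance 测试示例	a sample used for testing
generalization 泛化	the ability of a learned model to apply to new samples
induction 归纳	generalization from the specific to the general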
deduction 演绎
specialization 特殊化
inductive learning 归纳学习
concept 概念
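concept learning 概念学习, concept formation 概念形成	inductive learning in the narrow sense
version space 版本空间	the set of hypotheses consistent with the training set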
inductive bias 归纳偏好
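Occam’s razor 奥卡姆剃刀	if several hypotheses are consistent with the observations, choose the simplest
error rate 错误率	the proportion of misclassified samples; the most commonly used performance measure for classification
accuracy 精度	= 1 - error rate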
error 误差
training error 训练误差, empirical error 经验误差	the learner's error on the training set
generalization error 泛化误差
overfitting 过拟合
underfitting 欠拟合
model selection 模型选择	the choice of learning algorithm and parameter settings
testing set 测试集
testing error 测试误差
hold-out 留出法
sampling 采样
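stratified sampling 分层采样	sampling that preserves the class proportions
fidelity 保真性	how closely the model trained on the training set agrees with the model that would be trained on the full data set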
cross validation 交叉验证法
leave-one-out (LOO) 留一法
bootstrapping 自助法
parameter tuning 调参
validation set 验证集
performance measure 性能度量
mean squared error (MSE) 均方误差	the most commonly used performance measure for regression
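The version space entry is easier to grasp with a tiny enumeration. The sketch below assumes a hypothesis is a conjunction in which each attribute is either a specific value or `*` (any value); the two attributes, their values, and the two labelled examples are hypothetical, loosely in the spirit of the watermelon examples, and the empty hypothesis that matches nothing is left out for brevity.

```python
from itertools import product

# Hypothetical attribute domains, for illustration only.
DOMAINS = {"color": ["green", "black"], "root": ["curled", "stiff"]}
ATTRS = list(DOMAINS)

def hypothesis_space():
    """All conjunctive hypotheses: each attribute is a specific value or '*' (any value)."""
    choices = [DOMAINS[a] + ["*"] for a in ATTRS]
    return list(product(*choices))

def covers(h, x):
    """True if hypothesis h classifies instance x as positive."""
    return all(hv == "*" or hv == x[a] for hv, a in zip(h, ATTRS))

def version_space(examples):
    """Keep only the hypotheses that agree with every labelled training example."""
    return [h for h in hypothesis_space()
            if all(covers(h, x) == label for x, label in examples)]

training_set = [
    ({"color": "green", "root": "curled"}, True),   # positive example
    ({"color": "black", "root": "stiff"}, False),   # negative example
]
print(version_space(training_set))
```

Three hypotheses survive, `("green", "curled")`, `("green", "*")` and `("*", "curled")`, so the training set alone cannot decide between them; the learner needs an inductive bias to pick one.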
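The error rate, accuracy, and MSE entries translate directly into formulas. For m samples, error rate = (number of misclassified samples) / m, accuracy = 1 - error rate, and MSE = (1/m) * sum of (prediction - true value)^2. A minimal sketch follows; the function names are our own choice, not taken from a particular library.

```python
def error_rate(y_true, y_pred):
    """Fraction of samples whose predicted label differs from the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)

def accuracy(y_true, y_pred):
    """Accuracy (精度) = 1 - error rate."""
    return 1.0 - error_rate(y_true, y_pred)

def mean_squared_error(y_true, y_pred):
    """Average squared difference between predictions and true values (regression)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

labels, predicted = [1, 0, 1, 1, 0], [1, 1, 1, 0, 0]   # two of five are wrong
print(error_rate(labels, predicted), accuracy(labels, predicted))
print(mean_squared_error([2.0, 4.0], [1.0, 6.0]))       # (1 + 4) / 2
```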
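Hold-out combined with stratified sampling can be sketched as follows: shuffle each class separately and move the same fraction of every class into the test set. The function name and the 1,000-sample data set (500 positive, 500 negative) are hypothetical; with a 30% test fraction the split keeps 350 positive and 350 negative samples for training and 150 of each for testing.

```python
import random
from collections import defaultdict

def stratified_hold_out(samples, labels, test_fraction=0.3, seed=0):
    """Hold-out split that keeps each class's proportion (stratified sampling)."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    rng = random.Random(seed)
    by_class = defaultdict(list)          # class label -> indices of its samples
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)                  # random sampling within each class
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])

    def pick(idx, seq):
        return [seq[i] for i in idx]

    return (pick(train_idx, samples), pick(train_idx, labels),
            pick(test_idx, samples), pick(test_idx, labels))

# Hypothetical data set: 1000 samples, 500 positive (1) and 500 negative (0).
X = list(range(1000))
y = [1] * 500 + [0] * 500
X_tr, y_tr, X_te, y_te = stratified_hold_out(X, y, test_fraction=0.3)
print(len(y_tr), y_tr.count(1), len(y_te), y_te.count(1))   # 700 350 300 150
```

Because a single split can be unrepresentative, hold-out is usually repeated with several random seeds and the results averaged.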
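Cross validation and leave-one-out differ only in the number of folds k: leave-one-out is k-fold cross validation with k equal to the number of samples. The sketch below assumes the caller supplies `fit` and `score` callables; all names, the constant "predict the training mean" baseline, and the four data points are hypothetical. It splits samples in their given order, whereas real use would shuffle or stratify first.

```python
def k_fold_indices(n, k):
    """Yield (train, test) index lists for k folds over n samples, in order.

    Fold sizes differ by at most one.
    """
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    start = 0
    for fold in range(k):
        size = n // k + (1 if fold < n % k else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size

def cross_validate(xs, ys, k, fit, score):
    """Average the test score over the k folds; k == len(xs) gives leave-one-out."""
    results = []
    for train, test in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        results.append(score(model, [xs[i] for i in test], [ys[i] for i in test]))
    return sum(results) / len(results)

# Hypothetical baseline model: always predict the mean target seen in training.
def fit_mean(xs, ys):
    return sum(ys) / len(ys)

def mse_of_constant(model, xs, ys):
    return sum((y - model) ** 2 for y in ys) / len(ys)

xs = [0, 1, 2, 3]
ys = [1.0, 2.0, 3.0, 4.0]
print(cross_validate(xs, ys, k=2, fit=fit_mean, score=mse_of_constant))        # 2-fold
print(cross_validate(xs, ys, k=len(xs), fit=fit_mean, score=mse_of_constant))  # leave-one-out
```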
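Bootstrapping draws m samples with replacement from a data set of m samples to form the training set. A given sample is missed by all m draws with probability (1 - 1/m)^m, which tends to 1/e ≈ 0.368, so roughly a third of the data never enters the training set and can serve as the test set. The simulation below checks this; the function name and the data-set size are hypothetical.

```python
import math
import random

def bootstrap_split(n, seed=0):
    """Draw n indices with replacement as the training set;
    indices never drawn form the out-of-bag test set."""
    rng = random.Random(seed)
    train = [rng.randrange(n) for _ in range(n)]
    drawn = set(train)
    test = [i for i in range(n) if i not in drawn]
    return train, test

n = 10_000
train, test = bootstrap_split(n)
print("out-of-bag fraction:", len(test) / n)
# Probability that one sample is never drawn: (1 - 1/n)^n, which tends to 1/e.
print("(1 - 1/n)^n =", (1 - 1 / n) ** n, " 1/e =", math.exp(-1))
```

Because the bootstrap training set changes the distribution of the original data, the method is mainly useful when the data set is small and hard to split effectively.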