Machine Learning Notes (1): Basic Concepts

kickss

Article link: https://blog.csdn.net/milkbusy/article/details/84190529

"My life has a limit, but knowledge has none." (Zhuangzi)

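Subject: types of problems

What kinds of problems can machine learning solve? They fall into two learning settings (supervised and unsupervised) that cover three types of problems.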

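Regression: supervised learning. Predicts and models a continuous numeric random variable, for example stock price prediction, changes in grades, or airfare prediction. What characterizes a regression task is that the data carry a numeric target variable: every sample has a numeric value that supervises the learning.
Classification: supervised learning. Models or predicts a discrete random variable, for example spam filtering or detecting abnormal credit card use. Many regression algorithms have a classification counterpart. A classifier typically returns a predicted class, often expressed as a probability for that class.
Clustering: unsupervised learning. Looks for inherent relationships among samples based on the internal structure of the data, as in the saying "birds of a feather flock together". Examples are article recommendation and news clustering. The data in these problems are unlabeled, and the task often resembles a question from the civil service aptitude test, such as: group a yellow banana, a red apple, and a red hat.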
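
The three problem types above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function names (`fit_line`, `nearest_centroid`, `two_means_1d`) and all the toy data are made up for this note, and the algorithms chosen (least squares, nearest centroid, two-cluster k-means on one feature) are just simple representatives of each category.

```python
from statistics import mean

def fit_line(xs, ys):
    """Regression: least-squares fit of y = a*x + b to numeric targets."""
    mx, my = mean(xs), mean(ys)
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return a, my - a * mx

def nearest_centroid(train, x):
    """Classification: return the label whose class mean is closest to x."""
    labels = {label for _, label in train}
    centroids = {lab: mean([v for v, l in train if l == lab]) for lab in labels}
    return min(centroids, key=lambda lab: abs(centroids[lab] - x))

def two_means_1d(points, iters=10):
    """Clustering: split unlabeled 1-D points into two groups (k-means, k=2)."""
    lo, hi = min(points), max(points)      # initial cluster centers
    left, right = list(points), []
    for _ in range(iters):
        left = [p for p in points if abs(p - lo) <= abs(p - hi)]
        right = [p for p in points if abs(p - lo) > abs(p - hi)]
        if not right:                      # all points identical: one cluster
            break
        lo, hi = mean(left), mean(right)   # move centers to the cluster means
    return left, right

# Hypothetical toy data, invented for illustration only.
hours, scores = [1, 2, 3, 4], [52, 54, 56, 58]            # numeric target
mails = [(1, "ham"), (2, "ham"), (8, "spam"), (9, "spam")]  # labeled samples
values = [1.0, 1.2, 0.8, 9.0, 9.5, 10.0]                   # no labels at all

slope, intercept = fit_line(hours, scores)
predicted_label = nearest_centroid(mails, 7)
group_a, group_b = two_means_1d(values)
```

Note how the first two functions need a target (`scores`, or the labels in `mails`), while the clustering function receives only the raw values; that is the supervised versus unsupervised distinction in practice.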

Relations: data mining, optimization theory, statistics

Relation to data mining

Machine learning and data mining often employ the same methods and overlap significantly, but while machine learning focuses on prediction, based on known properties learned from the training data, data mining focuses on the discovery of (previously) unknown properties in the data (this is the analysis step of knowledge discovery in databases). Data mining uses many machine learning methods, but with different goals; on the other hand, machine learning also employs data mining methods as “unsupervised learning” or as a preprocessing step to improve learner accuracy. Much of the confusion between these two research communities (which do often have separate conferences and separate journals, ECML PKDD being a major exception) comes from the basic assumptions they work with: in machine learning, performance is usually evaluated with respect to the ability to reproduce known knowledge, while in knowledge discovery and data mining (KDD) the key task is the discovery of previously unknown knowledge. Evaluated with respect to known knowledge, an uninformed (unsupervised) method will easily be outperformed by other supervised methods, while in a typical KDD task, supervised methods cannot be used due to the unavailability of training data.

Relation to optimization

Machine learning also has intimate ties to optimization: many learning problems are formulated as minimization of some loss function on a training set of examples. Loss functions express the discrepancy between the predictions of the model being trained and the actual problem instances (for example, in classification, one wants to assign a label to instances, and models are trained to correctly predict the pre-assigned labels of a set of examples). The difference between the two fields arises from the goal of generalization: while optimization algorithms can minimize the loss on a training set, machine learning is concerned with minimizing the loss on unseen samples.
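
A minimal sketch of this distinction, assuming a one-feature linear model trained by gradient descent on mean squared error. The data points, the learning rate, and the function names (`mse`, `train`) are assumptions made up for this note, not taken from any library. The optimizer only ever sees `train_set`; the quantity machine learning cares about is the loss on `test_set`, which it never touched.

```python
def mse(w, b, data):
    """Mean squared error of the model y = w*x + b on a list of (x, y) pairs."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

def train(data, lr=0.05, steps=2000):
    """Minimize the training loss with plain batch gradient descent."""
    w = b = 0.0
    n = len(data)
    for _ in range(steps):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Hypothetical samples, invented for illustration only.
train_set = [(0.0, 1.1), (1.0, 2.9), (2.0, 5.2), (3.0, 6.8)]
test_set = [(4.0, 9.1), (5.0, 10.9)]   # unseen during training

w, b = train(train_set)
train_loss = mse(w, b, train_set)      # what the optimizer minimized
test_loss = mse(w, b, test_set)        # what generalization is about
print(f"train loss: {train_loss:.4f}, test loss: {test_loss:.4f}")
```

On this toy data the loss on the held-out points comes out higher than the training loss, which is the gap that the goal of generalization refers to.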

Relation to statistics

Machine learning and statistics are closely related fields. According to Michael I. Jordan, the ideas of machine learning, from methodological principles to theoretical tools, have had a long pre-history in statistics.[16] He also suggested the term data science as a placeholder name for the overall field.
Leo Breiman distinguished two statistical modelling paradigms: data model and algorithmic model,[17] where “algorithmic model” refers, more or less, to machine learning algorithms such as random forest.
Some statisticians have adopted methods from machine learning, leading to a combined field that they call statistical learning.