Concepts of Statistical Machine Learning (Machine Learning)

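This article summarizes the knowledge points that I, after reading related books and materials, consider worth organizing and understanding. I am sharing it here; if anything is wrong, corrections are welcome.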

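Statistical (Machine) Learning

1. The Concept of Statistical Learning

Statistical learning, also called statistical machine learning, is an interdisciplinary field that draws on probability theory, statistics, information theory, the theory of computation, optimization theory, and computer science. Over the course of its development it has gradually formed its own theoretical framework and methodology. Today, "machine learning" usually refers to statistical machine learning.

Herbert A. Simon defined "learning" as follows: if a system can improve its performance by executing some process, that is learning.

Mitchell (1997) gave a formal definition of learning: suppose P measures a computer program's performance on a class of tasks T. If the program, by using experience E, improves its performance on the tasks in T as measured by P, then we say the program has learned from experience E with respect to T and P.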
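To make T, P and E concrete, here is a toy sketch that is not from the source. The task T is labeling numbers in [0, 1] according to a hypothetical hidden rule ("positive iff x > 0.5"). The experience E is a set of randomly drawn labeled samples, and the performance measure P is accuracy on a fixed grid of test points. The learner is a simple midpoint-threshold rule chosen only for illustration. With more experience, its accuracy typically improves.

```python
import random


def true_label(x: float) -> int:
    # Hypothetical target concept for task T: positive iff x > 0.5
    return 1 if x > 0.5 else 0


def learn_threshold(samples):
    """Learn a 1-D threshold classifier from experience E (labeled samples)."""
    negatives = [x for x, y in samples if y == 0]
    positives = [x for x, y in samples if y == 1]
    # Fall back to the domain bounds [0, 1] when a class has not been seen yet
    neg_max = max(negatives, default=0.0)
    pos_min = min(positives, default=1.0)
    return (neg_max + pos_min) / 2


def accuracy(threshold, test_xs):
    """Performance measure P: fraction of test points classified correctly."""
    correct = sum((1 if x > threshold else 0) == true_label(x) for x in test_xs)
    return correct / len(test_xs)


def experience_curve(sizes, seed=0):
    """Measure P after learning from training sets of different sizes (amounts of E)."""
    rng = random.Random(seed)
    test_xs = [(i + 0.5) / 1000 for i in range(1000)]  # fixed evaluation grid
    results = {}
    for n in sizes:
        xs = [rng.random() for _ in range(n)]
        samples = [(x, true_label(x)) for x in xs]
        results[n] = accuracy(learn_threshold(samples), test_xs)
    return results


if __name__ == "__main__":
    for n, acc in experience_curve([1, 10, 100, 1000]).items():
        print(f"experience E = {n:4d} samples -> P (accuracy) = {acc:.3f}")
```

The exact accuracies depend on the random seed. The point is only that, for the same T and P, the program's performance rises as E grows.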

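In 1959, Arthur Samuel, who developed a checkers-playing program at IBM, coined the term "machine learning" and defined it as "the field of study that gives computers the ability to learn without being explicitly programmed."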

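2. Basic Assumption

The basic assumption that statistical learning makes about data is that data of the same kind exhibit certain statistical regularities. "Data of the same kind" means data that share the same nature, that is, data belonging to "a certain class". Calling data "a certain class" already implies that they share common properties, and such data usually share the same statistical characteristics as well.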
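Here is a minimal sketch of what "statistical regularity" means in practice. It assumes that data "of the same kind" are modeled as i.i.d. draws from one fixed distribution, here a hypothetical Gaussian with mean 5 and standard deviation 2. Two independent batches yield nearly the same summary statistics, and those statistics settle down as the sample grows.

```python
import random
import statistics


def batch_stats(n, seed, mu=5.0, sigma=2.0):
    """Draw n i.i.d. samples from one fixed (hypothetical) Gaussian and summarize them."""
    rng = random.Random(seed)
    data = [rng.gauss(mu, sigma) for _ in range(n)]
    return statistics.mean(data), statistics.stdev(data)


if __name__ == "__main__":
    # Two independent batches of "same-kind" data at increasing sample sizes
    for n in (10, 1_000, 100_000):
        m1, s1 = batch_stats(n, seed=1)
        m2, s2 = batch_stats(n, seed=2)
        print(f"n={n:>6}: batch A mean={m1:.3f} sd={s1:.3f} | "
              f"batch B mean={m2:.3f} sd={s2:.3f}")
```

With small n the two batches can disagree noticeably. With large n both approach the underlying mean and standard deviation, which is the regularity that learning from data relies on.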

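3. Statistical Learning Methods

Statistical learning comprises supervised learning, semi-supervised learning, unsupervised learning, reinforcement learning, and more. This article focuses on supervised learning, because it is relatively mature, well studied, and representative. The textbook summarizes it as follows:

Start from a given, finite set of training data, and assume the data are generated independently and identically distributed (i.i.d.). Further assume that the model to be learned belongs to some set of functions, called the hypothesis space. Apply an evaluation criterion to select an optimal model from the hypothesis space, so that under this criterion it makes the best predictions on both the known training data and unseen test data. An algorithm carries out the selection of the optimal model. A statistical learning method therefore consists of the model's hypothesis space, the model selection criterion, and the learning algorithm. These are called the three elements of statistical learning methods, or simply model, strategy, and algorithm. More intuitively:

                 method = assumption + metric + procedure (model + strategy + algorithm)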
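Here is a minimal sketch of the three elements on a hypothetical one-dimensional regression problem. The model (hypothesis space) is assumed to be the set of linear functions f(x) = w*x + b. The strategy is minimizing the empirical risk under squared loss, and the algorithm is plain batch gradient descent. The learning rate, step count, and training data are illustrative choices, not taken from the text.

```python
from typing import Callable


# Element 1, the model: hypothesis space F = { f(x) = w*x + b } (assumed 1-D linear)
def make_model(w: float, b: float) -> Callable[[float], float]:
    return lambda x: w * x + b


# Element 2, the strategy: empirical risk with squared loss,
# R_emp(f) = (1/N) * sum_i (y_i - f(x_i))^2
def empirical_risk(f, data):
    return sum((y - f(x)) ** 2 for x, y in data) / len(data)


# Element 3, the algorithm: batch gradient descent on R_emp over (w, b)
def gradient_descent(data, lr=0.5, steps=2000):
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(steps):
        # Partial derivatives of the mean squared error with respect to w and b
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


if __name__ == "__main__":
    # Hypothetical noise-free training data generated by y = 2x + 1
    train = [(i / 10, 2 * (i / 10) + 1) for i in range(11)]
    w, b = gradient_descent(train)
    print(f"learned f(x) = {w:.4f} * x + {b:.4f}")
    print("empirical risk:", empirical_risk(make_model(w, b), train))
```

Each element can be swapped independently. A different hypothesis space (for example, polynomials), a different strategy (for example, absolute loss or a regularized risk), or a different algorithm (for example, a closed-form least-squares solution) yields a different statistical learning method.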

4. Basic Terminology

machine learning (机器学习)
model (模型)
data set (数据集)
instance (示例)
sample (样本)
attribute (属性)
feature (特征)
attribute value (属性值)
attribute space (属性空间)
sample space, also called input space (样本空间 / 输入空间)
feature vector (特征向量)
dimensionality (维数 / 维度)
learning (学习)
training (训练)
training sample (训练样本)
training set (训练集)
hypothesis (假设)
ground truth (真实 / 真相)
prediction (预测)
label (标记 / 标签)
example (样例)
label space, also called output space (标记空间 / 输出空间)
classification (分类)
regression (回归)
binary classification (二分类)
positive class (正类)
negative class (反类)
multi-class classification (多分类)
testing (测试)
testing sample (测试样本)
clustering (聚类)
cluster (簇)
supervised learning (有监督学习)
unsupervised learning (无监督学习)
generalization (泛化)
distribution (分布)
independent and identically distributed, i.i.d. (独立同分布)
induction (归纳)
deduction (演绎)
specialization (特化)
concept (概念)
version space (版本空间)
input space (输入空间)
output space (输出空间)
feature space (特征空间)
decision function (决策函数)
loss function (损失函数)
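The following sketch ties several of these terms together on a small, made-up data set. Instances are 2-dimensional feature vectors, and labels come from the label space {+1, -1} (positive and negative class). The data set is split into a training set and a testing set. Learning produces a hypothesis in the form of a decision function. Here that function is a nearest-centroid rule, chosen only for brevity and not because any of the cited books uses it. A 0-1 loss function then measures prediction errors during testing.

```python
from typing import Callable, List, Tuple

# Hypothetical data set: each instance is a feature vector of dimensionality 2,
# and each example pairs an instance with its label from the label space {+1, -1}.
FeatureVector = Tuple[float, float]
Example = Tuple[FeatureVector, int]

data_set: List[Example] = [
    ((1.0, 1.2), +1), ((0.9, 1.0), +1), ((1.1, 0.8), +1), ((1.3, 1.1), +1),
    ((-1.0, -0.9), -1), ((-1.2, -1.1), -1), ((-0.8, -1.0), -1), ((-1.1, -0.7), -1),
]

# Hold out every 4th example as the testing set; the rest form the training set.
training_set = [ex for i, ex in enumerate(data_set) if i % 4 != 3]
testing_set = [ex for i, ex in enumerate(data_set) if i % 4 == 3]


def centroid(points: List[FeatureVector]) -> FeatureVector:
    """Component-wise mean of a list of feature vectors."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def learn(examples: List[Example]) -> Callable[[FeatureVector], int]:
    """Learning: induce a hypothesis (a nearest-centroid rule) from the training set."""
    pos = centroid([x for x, y in examples if y == +1])  # positive class
    neg = centroid([x for x, y in examples if y == -1])  # negative class

    def decision_function(x: FeatureVector) -> int:
        # Prediction: label of the closer class centroid (squared Euclidean distance)
        d_pos = (x[0] - pos[0]) ** 2 + (x[1] - pos[1]) ** 2
        d_neg = (x[0] - neg[0]) ** 2 + (x[1] - neg[1]) ** 2
        return +1 if d_pos <= d_neg else -1

    return decision_function


def zero_one_loss(y_true: int, y_pred: int) -> int:
    """Loss function: 0 if the prediction matches the ground truth, otherwise 1."""
    return 0 if y_true == y_pred else 1


def evaluate(f: Callable[[FeatureVector], int], examples: List[Example]) -> float:
    """Testing: average 0-1 loss on held-out samples, a rough estimate of generalization error."""
    return sum(zero_one_loss(y, f(x)) for x, y in examples) / len(examples)


if __name__ == "__main__":
    f = learn(training_set)
    print("test error:", evaluate(f, testing_set))
```

Because the testing samples are never seen during learning, the test error serves as a (small-sample) estimate of how well the hypothesis generalizes.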

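Main references:

Li Hang, Statistical Learning Methods (《统计学习方法》)

Peter Harrington, Machine Learning in Action (Chinese edition 《机器学习实战》, translated by Li Rui et al.)

Zhou Zhihua, Machine Learning (《机器学习》)

Richard O. Duda et al., Pattern Classification (《模式分类》)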

Pratap Dangeti, Statistics for Machine Learning, 2017, ISBN 1788295757