statistical machine learning 01: introduction

Content List

  • statistical machine learning

    • 1.1. data as the object of learning

    • 1.2. main types of machine learning

    • 1.3. machine learning steps

    • 1.4. importance of machine learning

  • introduction to supervised learning

    • 2.1 basic concept

    • 2.2 formalization of problem

  • three factors of machine learning

    • 3.1. model

    • 3.2. strategy

    • 3.3. algorithm

statistical machine learning

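1.1 data as the object of learning

Statistical learning starts from data: it extracts features from the data, abstracts a model of the data, discovers the knowledge contained in the data, and then applies that knowledge back to the analysis and prediction of data.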

1.2 main types of machine learning

  1. supervised learning

  2. unsupervised learning

  3. semi-supervised learning

  4. reinforcement learning

1.3 machine learning steps

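  1. Obtain a finite training data set.

  2. Determine the hypothesis space, i.e. the set of all candidate models.

  3. Determine the criterion for model selection, i.e. the learning strategy.

  4. Implement an algorithm that solves for the optimal model, i.e. the learning algorithm.

  5. Select the best model using the learning method.

  6. Use the learned best model to predict or analyze new data.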
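The six steps can be sketched end to end on a toy problem. This is only an illustration under assumptions that are not in the original notes: the data are made up, the hypothesis space is restricted to lines through the origin, the strategy is mean squared error, and the algorithm is a plain grid search over slopes.

```python
# Step 1: a finite training set of (x, y) pairs (made-up numbers, roughly y = 2x).
train = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]

# Step 2: hypothesis space = all models f_w(x) = w * x, one per slope w.
def make_model(w):
    return lambda x: w * x

# Step 3: strategy = prefer the model with the smallest mean squared error.
def mean_squared_error(model, data):
    return sum((y - model(x)) ** 2 for x, y in data) / len(data)

# Step 4: algorithm = try every candidate slope on a grid 0.00, 0.01, ..., 4.00.
candidate_slopes = [i / 100 for i in range(401)]

# Step 5: select the best model under the strategy.
best_w = min(candidate_slopes,
             key=lambda w: mean_squared_error(make_model(w), train))
best_model = make_model(best_w)

# Step 6: use the selected model to predict on new data.
prediction = best_model(5.0)
print(best_w, prediction)
```

Grid search is used here only because it is easy to read; any other optimizer could play the role of step 4.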

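1.4 importance of machine learning

  1. It is an effective way to process massive amounts of data.

  2. It is an effective way to make computers intelligent.

  3. It is an important part of the development of computer science.

Application areas: artificial intelligence, pattern recognition, data mining, NLP, image recognition, information retrieval, bioinformatics, ...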

supervised learning

Supervised learning learns a model that can produce a predicted output for any given input.

2.1 basic concept

  • feature space

  • output space

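Each concrete input is an instance, usually represented by a feature vector. The space in which all feature vectors live is called the feature space, and each dimension of the feature space corresponds to one feature.

Inputs and outputs are treated as values taken by random variables defined on the input (feature) space and the output space.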

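(1) The feature vector of an input instance is written as:

$$
x = (x^{(1)}, x^{(2)}, \dots, x^{(i)}, \dots, x^{(n)})^T
$$

$x^{(i)}$ denotes the i-th feature of x. This differs from $x_i$, which denotes the i-th of several inputs: $x_i = (x_i^{(1)}, x_i^{(2)}, \dots, x_i^{(n)})^T$.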

Supervised learning learns a model from a training data set and then uses it to make predictions on test data. The training data consist of pairs of an input (feature vector) and an output.

The training set is written as:

$$
T = \{ (x_1, y_1), (x_2, y_2), \dots , (x_N, y_N) \}
$$

Test data also consist of input-output pairs. An input-output pair is called a sample, or sample point.
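As a rough illustration of this notation, a training set can be held as a list of (feature vector, output) pairs. The values and the type aliases `FeatureVector` and `Sample` below are hypothetical and chosen only for the example.

```python
from typing import List, Tuple

FeatureVector = Tuple[float, ...]      # x = (x^(1), ..., x^(n))
Sample = Tuple[FeatureVector, int]     # (x_i, y_i)

# Hypothetical data: each instance has n = 3 features, each output is 0 or 1.
T: List[Sample] = [
    ((5.1, 3.5, 1.4), 0),
    ((6.2, 2.9, 4.3), 1),
    ((5.9, 3.0, 5.1), 1),
]

N = len(T)          # number of sample points
n = len(T[0][0])    # dimension of the feature space

x_2, y_2 = T[1]                  # sample (x_2, y_2); Python indexes from 0
second_feature_of_x_2 = x_2[1]   # x_2^(2), the second feature of x_2
```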

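classification, regression, tagging

  • A prediction problem whose output variable takes a finite number of discrete values is called a classification problem.

  • A prediction problem whose input and output are both continuous variables is called a regression problem.

  • A prediction problem whose input and output are both sequences of variables is called a tagging problem.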

(2) joint probability distribution

Supervised learning assumes that the input and output random variables X and Y follow a joint probability distribution P(X, Y).

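During learning, P(X, Y) is assumed to exist, but its exact form is unknown to the learning system. The training data and test data are regarded as generated independently and identically distributed (i.i.d.) according to P(X, Y). This is the basic assumption that supervised learning makes about data.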
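The i.i.d. assumption can be made concrete with a small simulation. The joint distribution below is invented purely for illustration (X uniform on [0, 1), and Y = 1 with probability x). In a real problem P(X, Y) is unknown and only the drawn samples are available.

```python
import random

def draw_sample(rng):
    """Draw one (x, y) pair from a made-up joint distribution P(X, Y)."""
    x = rng.random()                    # X ~ Uniform[0, 1)
    y = 1 if rng.random() < x else 0    # P(Y = 1 | X = x) = x
    return x, y

rng = random.Random(0)  # fixed seed so the run is repeatable

# Training and test data are drawn independently from the same distribution.
train = [draw_sample(rng) for _ in range(1000)]
test = [draw_sample(rng) for _ in range(1000)]

# Since both sets come from the same P(X, Y), their summary statistics
# should be close to each other (here, the fraction of samples with y = 1).
train_rate = sum(y for _, y in train) / len(train)
test_rate = sum(y for _, y in test) / len(test)
print(train_rate, test_rate)
```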

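hypothesis space

In supervised learning, the model belongs to a set of mappings from the input space to the output space. This set is the hypothesis space.

Note that the hypothesis space is a set of mappings f, not a set of output values y.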

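A supervised learning model can be probabilistic or non-probabilistic. It is represented by a conditional probability distribution $P(Y|X)$ or by a decision function $Y = f(X)$. For a specific input x, the output prediction is given by $P(y|x)$ or $y = f(x)$ respectively.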
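The two kinds of model can be contrasted in a few lines. This is a sketch only: the threshold rule and the logistic form are assumptions picked for the example, not models from the notes.

```python
import math

def decision_function(x):
    """Non-probabilistic model: maps an input directly to an output, y = f(x)."""
    return 1 if x > 0.0 else 0

def conditional_probability(y, x):
    """Probabilistic model: P(y | x) for y in {0, 1}, using a logistic form."""
    p1 = 1.0 / (1.0 + math.exp(-x))
    return p1 if y == 1 else 1.0 - p1

def predict_from_probability(x):
    """Turn P(y | x) into a single prediction by taking the most probable y."""
    return max((0, 1), key=lambda y: conditional_probability(y, x))

print(decision_function(2.0), predict_from_probability(2.0))
```

The probabilistic model carries more information than the decision function, since it also says how confident the prediction is.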

2.2 formalization of problem

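Training set:

$$
T = \{ (x_1, y_1), (x_2, y_2), \dots , (x_N, y_N) \}
$$

$(x_i, y_i)$ is called a sample point.

$x_i \in \mathcal{X} \subseteq \mathbf{R}^n$ is an observed input, where $\mathcal{X}$ is the input space.

$y_i \in \mathcal{Y}$ is an observed output, where $\mathcal{Y}$ is the output space.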

three factors of machine learning

method = model + strategy + algorithm

3.1 model

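In supervised learning, the model is the conditional probability distribution or the decision function to be learned.

The hypothesis space can be defined as a set of decision functions:

$$
F = \{ f \mid Y = f(X) \}
$$

When the functions are determined by a parameter vector $\theta$, this becomes:

$$
F = \{ f \mid Y = f_\theta(X), \theta \in \mathbf{R}^n \}
$$

The hypothesis space can also be defined as a set of conditional probability distributions:

$$
F = \{ P \mid P(Y|X) \}
$$

and, in parameterized form:

$$
F = \{ P \mid P_\theta(Y|X), \theta \in \mathbf{R}^n \}
$$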
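To make the parameterized form concrete: each value of the parameter vector picks out one model from the hypothesis space. The sketch below assumes a hypothetical linear family with a two-dimensional parameter, $f_\theta(x) = \theta_0 + \theta_1 x$.

```python
def f(theta, x):
    """One member of the family: f_theta(x) = theta[0] + theta[1] * x."""
    return theta[0] + theta[1] * x

# Two different parameter vectors give two different models
# from the same hypothesis space.
theta_a = (0.0, 1.0)   # intercept 0, slope 1 (the identity function)
theta_b = (1.0, 2.0)   # intercept 1, slope 2

inputs = (0.0, 1.0, 2.0)
values_a = [f(theta_a, x) for x in inputs]
values_b = [f(theta_b, x) for x in inputs]
print(values_a, values_b)
```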

3.2 strategy

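The goal of machine learning is to select the best model from the hypothesis space.

loss function

In supervised learning, a model $f$ is selected from the hypothesis space F as the decision function. For a given input $X$, the predicted value $f(X)$ may differ from the true value $Y$. A loss function measures this prediction error and is written $L(Y, f(X))$.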

  • 0-1 loss function

$$
L(Y, f(X)) = \left\{
\begin{array}{ll}
1, & Y \ne f(X) \\
0, & Y = f(X)
\end{array}
\right.
$$

  • quadratic loss function

  • absolute loss function

  • logarithmic loss function / log-likelihood loss function
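The four loss functions listed above can be written out as follows. The formulas are the standard textbook definitions (quadratic: $(Y - f(X))^2$, absolute: $|Y - f(X)|$, logarithmic: $-\log P(Y|X)$); the function names are chosen here for illustration.

```python
import math

def zero_one_loss(y, fx):
    """0-1 loss: 1 if the prediction is wrong, 0 if it is right."""
    return 0 if y == fx else 1

def quadratic_loss(y, fx):
    """Quadratic loss: (Y - f(X))^2."""
    return (y - fx) ** 2

def absolute_loss(y, fx):
    """Absolute loss: |Y - f(X)|."""
    return abs(y - fx)

def logarithmic_loss(p_y_given_x):
    """Logarithmic loss: -log P(Y|X), where the argument is the probability
    that the model assigns to the true output y."""
    return -math.log(p_y_given_x)

print(zero_one_loss(1, 0), quadratic_loss(3.0, 1.0),
      absolute_loss(3.0, 1.0), logarithmic_loss(0.5))
```

The first three apply to a decision function $f(X)$, while the logarithmic loss applies to a probabilistic model $P(Y|X)$.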

3.3 algorithm

The algorithm is the concrete computational method used to learn the model. Given the training data set and the learning strategy, it selects the best model from the hypothesis space.
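As one possible example of such an algorithm, the sketch below uses gradient descent to minimize the average quadratic loss over the hypothetical model family $f_w(x) = w x$. Gradient descent, the toy data, the learning rate, and the iteration count are all assumptions made for illustration; the notes do not prescribe a particular algorithm.

```python
# Toy training set generated by y = 2x exactly (made-up data).
train = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

def average_quadratic_loss(w, data):
    """Strategy: mean of (y - w * x)^2 over the training set."""
    return sum((y - w * x) ** 2 for x, y in data) / len(data)

def gradient(w, data):
    """Derivative of the average quadratic loss with respect to w."""
    return sum(-2.0 * x * (y - w * x) for x, y in data) / len(data)

# Algorithm: start from an arbitrary w and repeatedly step against the gradient.
w = 0.0
learning_rate = 0.05
for _ in range(200):
    w -= learning_rate * gradient(w, train)

print(w, average_quadratic_loss(w, train))
```

Here the model (lines through the origin), the strategy (average quadratic loss), and the algorithm (gradient descent) are kept as separate pieces, mirroring method = model + strategy + algorithm.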
