Supervised Learning
- regression problems
- classification problems
Regression:
A set D of N points, with inputs x and outputs t:
D = {(x_n, t_n) : n = 1, ..., N}
Goal: predict the output t for a test, as of yet unobserved, input x.
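As an illustrative sketch, a training set of this form can be generated in Python (the sinusoidal target and the noise level are my own assumptions, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10                                                      # number of training points
x = rng.uniform(0.0, 1.0, size=N)                           # inputs x_n
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.1, size=N)   # noisy outputs t_n
D = list(zip(x, t))                                         # D = {(x_n, t_n) : n = 1, ..., N}
```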
Classification:
A set D of N points, with inputs x and outputs t:
D = {(x_n, t_n) : n = 1, ..., N}
Supervised Learning:
Memorizing vs. learning: memorizing means recording which t goes with which x; learning means using the (x, t) pairs to find a general relationship.
No Free Lunch Theorem:
If we make no prior assumptions about the feature space, all algorithms perform equally well on average.
Inductive Bias: in machine learning, many learning algorithms make assumptions about the problem being learned; these assumptions are called the inductive bias.
Frequentist Supervised Learning 频率监督学习:
Based on the training set D, learning aims at deriving a hard predictor t̂_D(x) or a soft predictor q_D(t|x), where the subscript is used to emphasize the dependence on D.
Inference vs Learning:
When the population distribution p(x,t) is known, we don’t need data D and we have a standard inference problem, as studied in the previous chapter.
When the distribution p(x,t) is not known or not tractable, we have a learning problem.
EXAMPLE:
Supervised Training of Deterministic Models: ERM (Empirical Risk Minimization)
Continuing the example above, we treat the output t as a sum of powers of x (a polynomial in x).
Assume a model class H of hard predictors t̂(·|θ), where θ is a parameter vector in a set Θ (these are just the coefficients!).
The hard predictors in H are written as t̂(x|θ) = θᵀu(x), where:
the feature vector is u(x) = [1, x, x², ..., x^M]ᵀ;
the model parameter vector is θ = [θ_0, θ_1, ..., θ_M]ᵀ.
The larger M is, the harder learning becomes and the easier it is to overfit.
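A minimal sketch of this polynomial parameterization (the function names are my own):

```python
import numpy as np

def feature_vector(x, M):
    """Feature vector u(x) = [1, x, x^2, ..., x^M]."""
    return np.array([x ** m for m in range(M + 1)])

def predict(x, theta):
    """Hard predictor t_hat(x | theta) = theta^T u(x)."""
    return theta @ feature_vector(x, len(theta) - 1)

# e.g. theta = [1, 0.5, 0.25] gives t_hat(2) = 1 + 0.5*2 + 0.25*4 = 3.0
```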
Loss Function
The loss function ℓ(t, t̂(·|θ)) is used to measure how good the hard predictor t̂(·|θ) is.
For regression, the quadratic loss ℓ(t, t̂) = (t − t̂)² is commonly used; for classification, the 0-1 (detection-error) loss ℓ(t, t̂) = 1(t ≠ t̂) is commonly used.
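As sketches, the quadratic loss (regression) and the 0-1 detection-error loss (classification) can be written as:

```python
def quadratic_loss(t, t_hat):
    """Quadratic loss for regression: (t - t_hat)^2."""
    return (t - t_hat) ** 2

def detection_error_loss(t, t_hat):
    """0-1 (detection-error) loss for classification: 1 if wrong, 0 if right."""
    return 0.0 if t == t_hat else 1.0
```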
Population Loss
The goal is to minimize the population loss L_p(θ) = E_{(x,t)∼p(x,t)}[ℓ(t, t̂(x|θ))], which depends on the model parameter vector θ.
In the context of learning, the population loss is also known as generalization or out-of-sample loss, since it can be interpreted as the average loss measured on an independently generated test pair (x,t) ∼ p(x,t). Unlike inference, however, we do not know p(x,t)!
Training Loss
L_D(θ) = (1/N) Σ_{n=1}^{N} ℓ(t_n, t̂(x_n|θ))
The training loss measures the empirical average of the loss accrued by the predictor t̂(·|θ) on the examples of the training set. As such, the training loss L_D(θ) is an estimate of the population loss L_p(θ) based on the training data set. Note that this estimate is just that, and hence we generally have L_D(θ) ≠ L_p(θ).
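Under the quadratic loss, the training loss of the polynomial predictor can be computed as follows (a sketch; `training_loss` is my own helper name):

```python
import numpy as np

def training_loss(theta, xs, ts):
    """L_D(theta): empirical average of the quadratic loss on the training set.

    The predictor is the polynomial t_hat(x|theta) = sum_m theta[m] * x**m.
    """
    preds = np.polyval(theta[::-1], xs)  # np.polyval expects highest degree first
    return np.mean((ts - preds) ** 2)
```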
Law of Large Numbers
By the law of large numbers, as N → ∞ the training loss L_D(θ) converges to the population loss L_p(θ).
![law of large numbers](https://i-blog.csdnimg.cn/blog_migrate/b079178ee61a6156bd016cf9dd489951.png)
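A quick numerical illustration (the distribution and predictor here are my own toy choices): for the constant predictor t̂ = 0 with t ∼ N(0, 1), the population quadratic loss is E[t²] = 1, and the empirical loss approaches it as N grows.

```python
import numpy as np

rng = np.random.default_rng(0)

def empirical_loss(n):
    """Training loss of the constant predictor t_hat = 0 on n samples t ~ N(0, 1)."""
    t = rng.normal(size=n)
    return np.mean(t ** 2)

for n in (10, 1_000, 100_000):
    print(n, empirical_loss(n))  # tends to the population loss E[t^2] = 1
```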
Empirical Risk Minimization (ERM)
Given the training loss L_D(θ), ERM selects the parameters that minimize it:
θ_ERM = argmin_θ L_D(θ)
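For the polynomial model under the quadratic loss, the ERM problem is an ordinary least-squares problem and has a closed-form solution; a sketch (function name is my own):

```python
import numpy as np

def erm_polynomial(xs, ts, M):
    """theta_ERM = argmin_theta (1/N) sum_n (t_n - theta^T u(x_n))^2."""
    U = np.vander(xs, M + 1, increasing=True)  # rows are u(x_n) = [1, x_n, ..., x_n^M]
    theta, *_ = np.linalg.lstsq(U, ts, rcond=None)
    return theta

# Recovers t = 1 + 2x from noiseless data with a degree-1 model:
xs = np.array([0.0, 1.0, 2.0])
theta = erm_polynomial(xs, 1.0 + 2.0 * xs, M=1)
```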
More familiar terms:
distribution
potentially
infinite
inference
[var] variance
[arg] argument
New terms:
Nouns:
hard/soft predictor: a soft predictor outputs probabilities; a hard predictor outputs the answer directly
quadratic loss: a kind of loss function (squared error)
joint distribution: the distribution of two variables taken together
true distribution
population distribution
empirical distribution
[EDF] Empirical Distribution Function
[CDF] Cumulative Distribution Function
i.i.d. (independent and identically distributed) random variables
prior/posterior (distribution)
empirical risk
ERM: empirical risk minimization
inductive bias (assumptions made about the learning problem)
population loss: the average loss under the true distribution p(x,t)
training loss: the empirical average loss on the training set D