Class study notes
Machine Learning — Basic Algorithms II
https://blog.csdn.net/fan2312/article/details/100854485
Regression

- Mean squared error: $MSE = \frac{1}{m}\sum^m_{i=1}(y_i-\hat{y_i})^2$
- Root mean squared error (standard error): $RMSE = \sqrt{MSE}$
- Total sum of squares: $TSS = \sum^m_{i=1}(y_i-\overline{y})^2$
- Pseudo-variance: $Var(Y) = TSS/m$
- Residual sum of squares: $RSS=\sum^m_{i=1}(\hat{y_i} - y_i)^2$
  - Also called the sum of squared errors, SSE
- $R^2 = \frac{TSS-RSS}{TSS}=1-\frac{RSS}{TSS}$
  - The larger $R^2$, the better the fit
  - The optimal value of $R^2$ is 1; if the model outputs random predictions, $R^2$ can be negative
  - If the prediction is always the sample mean, $R^2$ is 0
- Explained (regression) sum of squares: $ESS=\sum^m_{i=1}(\hat{y_i}-\overline{y})^2$
  - $TSS \geq ESS + RSS$
  - $TSS = ESS + RSS$ holds only for an unbiased estimate
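The metrics above translate directly into a few lines of NumPy. A minimal sketch (the helper name `regression_metrics` is illustrative, not from the lecture):

```python
import numpy as np

def regression_metrics(y, y_hat):
    """Compute MSE, RMSE, TSS, RSS, ESS, and R^2 as defined above.

    y, y_hat: 1-D arrays of true and predicted values.
    """
    mse = np.mean((y - y_hat) ** 2)           # mean squared error
    rmse = np.sqrt(mse)                       # root mean squared error
    tss = np.sum((y - y.mean()) ** 2)         # total sum of squares
    rss = np.sum((y_hat - y) ** 2)            # residual sum of squares (SSE)
    ess = np.sum((y_hat - y.mean()) ** 2)     # explained sum of squares
    r2 = 1 - rss / tss                        # coefficient of determination
    return dict(MSE=mse, RMSE=rmse, TSS=tss, RSS=rss, ESS=ess, R2=r2)
```

Predicting the sample mean for every point gives $RSS = TSS$ and hence $R^2 = 0$, matching the note above.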
Locally weighted linear regression (LWR)

- Objective function
- Setting the weights:
  - Gaussian kernel
    - $w^{(i)}=\exp(-\frac{(x^{(i)}-x)^2}{2\tau^2})$
    - $\tau$ is called the bandwidth; it controls how quickly a training sample's weight decays with its distance from the query point $x$
  - Polynomial kernel:
    - $\kappa(x_1,x_2) = (\langle x_1,x_2\rangle+R)^d$
  - Gaussian kernel
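A minimal sketch of LWR with the Gaussian-kernel weights above, assuming 1-D inputs and solving the weighted normal equations directly (the function name and the default bandwidth are illustrative):

```python
import numpy as np

def lwr_predict(X, y, x_query, tau=0.5):
    """Predict at x_query by fitting a weighted least-squares line,
    with Gaussian weights w_i = exp(-(x_i - x)^2 / (2 tau^2)).

    X: (m,) 1-D inputs, y: (m,) targets.
    """
    w = np.exp(-((X - x_query) ** 2) / (2 * tau ** 2))  # Gaussian weights
    A = np.column_stack([np.ones_like(X), X])           # design matrix with bias
    W = np.diag(w)
    # Solve the weighted normal equations (A^T W A) theta = A^T W y
    theta = np.linalg.solve(A.T @ W @ A, A.T @ W @ y)
    return theta[0] + theta[1] * x_query
```

Each prediction refits a local model, which is why LWR is a non-parametric method: the whole training set must be kept around at prediction time.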
Logistic regression (sigmoid)

- $h_\theta(x)=g(\theta^Tx)=\frac{1}{1+e^{-\theta^{T}x}}$
- $g'(x)=(\frac{1}{1+e^{-x}})'=g(x)\cdot(1-g(x))$
- Parameter estimation
  - Assume
    - $P(y=1|x;\theta)=h_\theta(x)$
    - $P(y=0|x;\theta)=1-h_\theta(x)$
    - $p(y|x;\theta)=(h_\theta(x))^y(1-h_\theta(x))^{1-y}$
  - Likelihood:
    - $L(\theta)=p(\vec{y}\,|X;\theta)=\prod^m_{i=1}p(y^{(i)}|x^{(i)};\theta)=\prod^m_{i=1}(h_\theta(x^{(i)}))^{y^{(i)}}(1-h_\theta(x^{(i)}))^{1-y^{(i)}}$
    - Log-likelihood: $l(\theta)=\log L(\theta)=\sum^m_{i=1}y^{(i)}\log h(x^{(i)})+(1-y^{(i)})\log(1-h(x^{(i)}))$
    - $\frac{\partial l(\theta)}{\partial\theta_j}=\sum^m_{i=1}(y^{(i)}-g(\theta^Tx^{(i)}))\cdot x^{(i)}_j$
  - Parameter update (per-sample gradient ascent)
    - $\theta_j:=\theta_j+\alpha(y^{(i)}-h_\theta(x^{(i)}))x_j^{(i)}$
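The per-sample update rule can be sketched as stochastic gradient ascent on the log-likelihood, assuming 0/1 labels and a bias column already included in `X` (helper names are hypothetical):

```python
import numpy as np

def sigmoid(z):
    """g(z) = 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + np.exp(-z))

def logistic_sgd(X, y, alpha=0.1, epochs=100):
    """Apply theta_j := theta_j + alpha * (y_i - h(x_i)) * x_ij,
    one training sample at a time.

    X: (m, n) design matrix including a bias column; y: (m,) labels in {0, 1}.
    """
    theta = np.zeros(X.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            theta += alpha * (yi - sigmoid(xi @ theta)) * xi
    return theta
```

Note the sign: because we ascend the log-likelihood, the residual $(y^{(i)} - h_\theta(x^{(i)}))$ is added, not subtracted.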
- Alternative assumption, for classification with $\{-1, 1\}$ labels
  - Loss
    - $y_i \in \{-1, 1\}$
    - $\hat{y}_i=\begin{cases} p_i & y_i=1\\ 1-p_i & y_i=-1\end{cases}$
    - Likelihood: $L(\theta)=\prod^m_{i=1}p^{(y_i+1)/2}_i(1-p_i)^{-(y_i-1)/2}$
    - $\ln L(\theta)\Rightarrow l(\theta)=\sum^m_{i=1}\ln[p^{(y_i+1)/2}_i(1-p_i)^{-(y_i-1)/2}]$
    - Substituting $p_i=\frac{1}{1+e^{-f_i}}$ (so that $1-p_i=\frac{1}{1+e^{f_i}}$): $l(\theta)=\sum^m_{i=1}\ln[(\frac{1}{1+e^{-f_i}})^{(y_i+1)/2}(\frac{1}{1+e^{f_i}})^{-(y_i-1)/2}]$
    - $loss(y_i,\hat y_i)=-l(\theta)=\sum^m_{i=1}[\frac{1}{2}(y_i+1)\ln(1+e^{-f_i})-\frac{1}{2}(y_i-1)\ln(1+e^{f_i})]$
    - $=\begin{cases}\sum^m_{i=1}\ln(1+e^{-f_i}) & y_i=1\\ \sum^m_{i=1}\ln(1+e^{f_i}) & y_i=-1\end{cases}\ \Rightarrow\ loss(y_i,\hat{y_i})=\sum^m_{i=1}\ln(1+e^{-y_i\cdot f_i})$
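The last step, collapsing the two cases into the single margin form $\ln(1+e^{-y_i f_i})$, can be checked numerically. A small sketch, writing both forms side by side (function names are illustrative):

```python
import numpy as np

def logistic_loss_piecewise(y, f):
    """Case-by-case loss for y in {-1, +1}:
    ln(1 + e^{-f}) when y = 1, ln(1 + e^{f}) when y = -1."""
    return np.where(y == 1, np.log1p(np.exp(-f)), np.log1p(np.exp(f))).sum()

def logistic_loss_margin(y, f):
    """Compact equivalent form: sum of ln(1 + e^{-y * f})."""
    return np.log1p(np.exp(-y * f)).sum()
```

Using `np.log1p` instead of `np.log(1 + ...)` keeps the computation accurate when $e^{-y_i f_i}$ is tiny.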
Log-linear models

- The odds of an event are the ratio of the probability that the event occurs to the probability that it does not
- Log-odds: the logit function
  - $P(y=1|x;\theta)=h_\theta(x)$
  - $P(y=0|x;\theta)=1-h_\theta(x)$
  - $logit(p)=\log\frac{p}{1-p}=\log\frac{h_\theta(x)}{1-h_\theta(x)}=\theta^Tx$
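The identity $logit(h_\theta(x)) = \theta^Tx$ says that logit and sigmoid are inverses of each other, which is easy to verify directly (a tiny sketch, helper names illustrative):

```python
import numpy as np

def logit(p):
    """Log-odds: logit(p) = log(p / (1 - p)), for p in (0, 1)."""
    return np.log(p / (1 - p))

def sigmoid(z):
    """Inverse of logit: sigmoid(z) = 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + np.exp(-z))
```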
Softmax regression

- K-class classification
  - The parameter vector of class $k$ is $\vec{\theta}_k$; together they form the matrix $\theta_{K\times n}$
- Probability
  - $p(c=k|x;\theta)=\frac{\exp(\theta^T_kx)}{\sum^K_{l=1}\exp(\theta^T_lx)},\ k=1,2,\dots,K$
- Likelihood
  - $L(\theta)=\prod^m_{i=1}\prod^K_{k=1}p(c=k|x^{(i)};\theta)^{y^{(i)}_k}=\prod^m_{i=1}\prod^K_{k=1}(\frac{\exp(\theta^T_kx^{(i)})}{\sum^K_{l=1}\exp(\theta^T_lx^{(i)})})^{y^{(i)}_k}$
- Log-likelihood (single sample)
  - $J(\theta)=\sum^K_{k=1}y_k\cdot(\theta^T_kx-\ln\sum^K_{l=1}\exp(\theta^T_lx))$
- Stochastic gradient
  - $\frac{\partial J(\theta)}{\partial \theta_k}=(y_k-p(c=k|x;\theta))\cdot x$
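The softmax probability and its per-sample gradient can be sketched as follows, assuming a $(K, n)$ parameter matrix and a one-hot label vector (names are illustrative; the max-shift is a standard stability trick, not part of the formula):

```python
import numpy as np

def softmax_prob(Theta, x):
    """p(c=k|x) = exp(theta_k^T x) / sum_l exp(theta_l^T x).

    Theta: (K, n) parameter matrix, x: (n,) input vector.
    """
    z = Theta @ x
    z = z - z.max()              # shift logits for numerical stability;
    e = np.exp(z)                # the probabilities are unchanged
    return e / e.sum()

def softmax_grad(Theta, x, y_onehot):
    """Per-sample gradient of J(theta): row k is (y_k - p(c=k|x)) * x."""
    p = softmax_prob(Theta, x)
    return np.outer(y_onehot - p, x)
```

Since $y$ is one-hot and the probabilities sum to 1, the rows of the gradient matrix sum to the zero vector, which is a quick sanity check on an implementation.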