Logistic Regression
- Logistic regression is a binary classifier.
- Sigmoid function: $s=\sigma(w^Tx+b)=\sigma(w_1x_1+w_2x_2+w_3x_3+\dots+w_nx_n+b)=\sigma(z)=\frac{1}{1+e^{-z}}$
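The sigmoid above can be sketched in a few lines of numpy. This is a minimal illustration (the function name and the stable piecewise form are my own choices, not from the notes); the piecewise form only ever exponentiates $-|z|$, so it never overflows for large $|z|$:

```python
import numpy as np

def sigmoid(z):
    """sigma(z) = 1 / (1 + e^{-z}), computed in a numerically stable way."""
    z = np.asarray(z, dtype=float)
    ez = np.exp(-np.abs(z))  # always in (0, 1], no overflow
    # For z >= 0: 1/(1+e^{-z}); for z < 0: e^{z}/(1+e^{z}) -- same value, stable.
    return np.where(z >= 0, 1.0 / (1.0 + ez), ez / (1.0 + ez))

print(sigmoid(np.array([-10.0, 0.0, 10.0])))  # ≈ [4.54e-05, 0.5, 0.99995]
```

Note how the output illustrates the next bullet: large $z$ pushes $\sigma(z)$ toward 1, small (very negative) $z$ pushes it toward 0.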
- For the sigmoid function, the larger $z$ is, the closer $\sigma(z)$ gets to 1; the smaller $z$ is, the closer it gets to 0.
- The squared loss $L(\hat{y},y)=\frac{1}{2}(\hat{y}-y)^2$ makes the objective non-convex, so optimization can get stuck in one of many local minima; logistic regression therefore does not use it.
- Logistic regression generally uses the cross-entropy loss $L(\hat{y},y)=-y\log\hat{y}-(1-y)\log(1-\hat{y})$
- If $y=1$, the loss is $-\log\hat{y}$; for the loss to be small, $\hat{y}$ must be large, i.e. approach (or equal) 1.
- If $y=0$, the loss is $-\log(1-\hat{y})$; for the loss to be small, $\hat{y}$ must be small, i.e. approach (or equal) 0.
- Since the training data generally has many samples, we take the average of all the per-sample losses; the cost over the whole training set is $J(w,b)=\frac{1}{m}\sum_{i=1}^m L(\hat{y}^{(i)},y^{(i)})$
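The cost $J(w,b)$ above is easy to compute vectorized. A small sketch (function name and the `eps` clipping are mine; clipping guards against $\log 0$ when a prediction saturates):

```python
import numpy as np

def cross_entropy_cost(y_hat, y):
    """J = (1/m) * sum_i [ -y_i*log(y_hat_i) - (1-y_i)*log(1-y_hat_i) ]."""
    eps = 1e-12  # keep predictions strictly inside (0, 1) so log() is finite
    y_hat = np.clip(y_hat, eps, 1 - eps)
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

# Confident correct predictions give a small cost; confident wrong ones a large cost.
y    = np.array([1.0, 0.0, 1.0])
good = np.array([0.9, 0.1, 0.8])
bad  = np.array([0.1, 0.9, 0.2])
print(cross_entropy_cost(good, y) < cross_entropy_cost(bad, y))  # True
```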
- Binary classification problem: $p(y=1|x,w)=\frac{1}{1+e^{-(w^Tx+b)}}$ and $p(y=0|x,w)=\frac{e^{-(w^Tx+b)}}{1+e^{-(w^Tx+b)}}$
- Merging the two formulas into one: $p(y|x,w)=p(y=1|x,w)^y\,[1-p(y=1|x,w)]^{1-y}$
- Proof that logistic regression is a linear classifier: the decision boundary is where $p(y=1|x,w)=p(y=0|x,w)$, i.e. $e^{-(w^Tx+b)}=1$; taking the log of both sides gives $w^Tx+b=0$, a linear equation in $x$, so the classifier is linear.
- Maximum likelihood estimation: maximize the likelihood to derive logistic regression's objective function.
$\hat{w}_{MLE},\hat{b}_{MLE}=\arg\max\prod_{i=1}^n p(y_i|x_i,w,b)$
$=\arg\max\log\left(\prod_{i=1}^n p(y_i|x_i,w,b)\right)$
$=\arg\max\sum_{i=1}^n\log p(y_i|x_i,w,b)$
$=\arg\min-\sum_{i=1}^n\log p(y_i|x_i,w,b)$
$=\arg\min-\sum_{i=1}^n\log\left\{p(y_i=1|x_i,w,b)^{y_i}\,[1-p(y_i=1|x_i,w,b)]^{1-y_i}\right\}$
$=\arg\min-\sum_{i=1}^n\left\{y_i\log p(y_i=1|x_i,w,b)+(1-y_i)\log[1-p(y_i=1|x_i,w,b)]\right\}$
$=\arg\min-\sum_{i=1}^n\left\{y_i\log\sigma(w^Tx_i+b)+(1-y_i)\log[1-\sigma(w^Tx_i+b)]\right\}$
- Take the derivative to derive gradient descent:
$\frac{\partial L(w,b)}{\partial w}=-\sum_{i=1}^n\left\{y_i\,\frac{\sigma(w^Tx_i+b)[1-\sigma(w^Tx_i+b)]}{\sigma(w^Tx_i+b)}\,x_i+(1-y_i)\,\frac{-\sigma(w^Tx_i+b)[1-\sigma(w^Tx_i+b)]}{1-\sigma(w^Tx_i+b)}\,x_i\right\}$
$=-\sum_{i=1}^n\left\{y_i[1-\sigma(w^Tx_i+b)]\,x_i+(y_i-1)\,\sigma(w^Tx_i+b)\,x_i\right\}$
$=-\sum_{i=1}^n[y_i-\sigma(w^Tx_i+b)]\,x_i$
$=\sum_{i=1}^n[\sigma(w^Tx_i+b)-y_i]\,x_i$
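The analytic gradient $\sum_i[\sigma(w^Tx_i+b)-y_i]\,x_i$ derived above can be verified against a finite-difference approximation of the negative log-likelihood. A sketch on random data (all names and the test setup are mine):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nll(w, b, X, y):
    """Negative log-likelihood: -sum_i [y_i*log(sigma_i) + (1-y_i)*log(1-sigma_i)]."""
    p = sigmoid(X @ w + b)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

def grad_w(w, b, X, y):
    # Analytic form derived above: sum_i [sigma(w^T x_i + b) - y_i] * x_i
    return X.T @ (sigmoid(X @ w + b) - y)

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = (rng.random(20) > 0.5).astype(float)
w, b = rng.normal(size=3), 0.3

analytic = grad_w(w, b, X, y)
h = 1e-6
numeric = np.zeros(3)
for j in range(3):  # central differences, coordinate by coordinate
    e = np.zeros(3); e[j] = h
    numeric[j] = (nll(w + e, b, X, y) - nll(w - e, b, X, y)) / (2 * h)

print(np.max(np.abs(analytic - numeric)))  # tiny: the derivation checks out
```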
$\frac{\partial L(w,b)}{\partial b}=-\sum_{i=1}^n\left\{y_i\,\frac{\sigma(w^Tx_i+b)[1-\sigma(w^Tx_i+b)]}{\sigma(w^Tx_i+b)}+(1-y_i)\,\frac{-\sigma(w^Tx_i+b)[1-\sigma(w^Tx_i+b)]}{1-\sigma(w^Tx_i+b)}\right\}$
$=-\sum_{i=1}^n\left\{y_i[1-\sigma(w^Tx_i+b)]+(y_i-1)\,\sigma(w^Tx_i+b)\right\}$
$=-\sum_{i=1}^n[y_i-\sigma(w^Tx_i+b)]$
$=\sum_{i=1}^n[\sigma(w^Tx_i+b)-y_i]$
- Gradient descent:
Initialize $w^0,b^0$
for t = 1, 2, 3, …
$w^{t+1}=w^t-\eta\sum_{i=1}^n[\sigma((w^t)^Tx_i+b^t)-y_i]\,x_i$
$b^{t+1}=b^t-\eta\sum_{i=1}^n[\sigma((w^t)^Tx_i+b^t)-y_i]$
where $\eta$ is the learning rate.
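The update rules can be put together into a complete training loop. A minimal sketch on a hypothetical, linearly separable toy set (the data, learning rate, and iteration count are my choices; the gradient is averaged over samples, a common variant that keeps the step size independent of the dataset size):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy separable data: the label is determined by the sign of the first feature.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = (X[:, 0] > 0).astype(float)

w = np.zeros(2)
b = 0.0
eta = 0.1  # learning rate

for t in range(200):
    err = sigmoid(X @ w + b) - y      # sigma(w^T x_i + b) - y_i for every sample
    w -= eta * (X.T @ err) / len(y)   # w update with the averaged gradient
    b -= eta * err.mean()             # b update: same residuals, no x_i factor

acc = np.mean((sigmoid(X @ w + b) > 0.5) == y)
print(acc)  # high on this separable toy set
```

Predicting with a 0.5 threshold on $\sigma(w^Tx+b)$ is exactly the linear decision boundary $w^Tx+b=0$ from the proof earlier.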