Logistic Regression
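- Purpose: classification or regression? Despite the name, it is a classic binary classification algorithm.
- Choosing a machine learning algorithm: try logistic regression first and move to more complex models only if needed; prefer the simplest model that works.
- Decision boundary of logistic regression: it can be nonlinear (for example, when polynomial feature terms are added).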
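To illustrate the last point, here is a minimal sketch of a logistic model that is linear in squared features and therefore has a circular decision boundary. The parameter values in `THETA` and the function names are hypothetical, chosen only for illustration; the logistic function used here is the sigmoid introduced in the next section.

```python
import math

def sigmoid(z):
    """Logistic function 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical parameters: theta = (-1, 1, 1) applied to features (1, x1^2, x2^2).
# The boundary theta^T phi(x) = 0 is then the unit circle x1^2 + x2^2 = 1.
THETA = (-1.0, 1.0, 1.0)

def predict_proba(x1, x2):
    """P(y=1 | x) for a model that is linear in the squared features."""
    z = THETA[0] + THETA[1] * x1 ** 2 + THETA[2] * x2 ** 2
    return sigmoid(z)

inside = predict_proba(0.0, 0.0)   # z = -1, so the probability is below 0.5
outside = predict_proba(2.0, 0.0)  # z = 3, so the probability is above 0.5
print(inside < 0.5, outside > 0.5)
```

The model is still linear in its parameters; only the feature map makes the boundary curved in the original input space.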
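Sigmoid function
- Formula: $g(z)=\frac{1}{1+e^{-z}}$
- The input can be any real number; the output lies in the open interval $(0,1)$.
- Interpretation: it maps any input into the interval $(0,1)$. Linear regression gives us a real-valued prediction; passing that value through the sigmoid function converts it into a probability, which is what turns the model into a classifier.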
- Prediction function:
$$h_\theta(x)=g(\theta^Tx)=\frac{1}{1+e^{-\theta^Tx}}$$
where $\theta_0+\theta_1x_1+\dots+\theta_nx_n=\sum_{i=0}^n\theta_ix_i=\theta^Tx$, with $x_0=1$.
- Classification task:
$$P(y=1|x;\theta)=h_\theta(x)$$
$$P(y=0|x;\theta)=1-h_\theta(x)$$
- Combined: $P(y|x;\theta)=(h_\theta(x))^y(1-h_\theta(x))^{1-y}$
- Interpretation: for a binary task with labels in $\{0,1\}$, when $y=0$ only the factor $(1-h_\theta(x))^{1-y}$ remains, and when $y=1$ only the factor $(h_\theta(x))^y$ remains.
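The sigmoid, the prediction function, and the combined probability above can be sketched in a few lines of plain Python. This is only an illustrative sketch: the function names (`sigmoid`, `hypothesis`, `bernoulli_prob`) and the sample values of `theta` and `x` are assumptions, not part of the original notes.

```python
import math

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z)), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # for negative z, exp(z) cannot overflow
    return ez / (1.0 + ez)

def hypothesis(theta, x):
    """h_theta(x) = g(theta^T x); x must already include x_0 = 1."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return sigmoid(z)

def bernoulli_prob(theta, x, y):
    """P(y | x; theta) = h^y * (1 - h)^(1 - y) for y in {0, 1}."""
    h = hypothesis(theta, x)
    return (h ** y) * ((1.0 - h) ** (1 - y))

theta = [0.0, 1.0]  # illustrative parameter values
x = [1.0, 2.0]      # x_0 = 1 (intercept term), x_1 = 2
p1 = bernoulli_prob(theta, x, 1)  # equals h_theta(x)
p0 = bernoulli_prob(theta, x, 0)  # equals 1 - h_theta(x)
print(p1, p0)
```

With `y = 1` the second factor has exponent 0 and drops out, and with `y = 0` the first factor does, which is exactly the case split described in the interpretation above.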
Logistic Regression
- Likelihood function: $L(\theta)=\prod_{i=1}^mP(y_i|x_i;\theta)=\prod_{i=1}^m(h_\theta(x_i))^{y_i}(1-h_\theta(x_i))^{1-y_i}$
- Log-likelihood: $l(\theta)=\log L(\theta)=\sum_{i=1}^m\left(y_i\log h_\theta(x_i)+(1-y_i)\log(1-h_\theta(x_i))\right)$
- Maximizing $l(\theta)$ calls for gradient ascent; introducing $J(\theta)=-\frac{1}{m}l(\theta)$ turns it into a gradient descent (minimization) task.
- Parameter update: $\theta_j:=\theta_j-\alpha\frac{1}{m}\sum_{i=1}^m(h_\theta(x_i)-y_i)x_i^j$, where $x_i^j$ is the $j$-th feature of sample $i$ and $\alpha$ is the learning rate.
- Softmax for multi-class classification:
$$h_\theta(x_i)=\left[\begin{matrix} p(y_i=1|x_i;\theta) \\ p(y_i=2|x_i;\theta) \\ \vdots \\ p(y_i=k|x_i;\theta) \end{matrix}\right]=\frac{1}{\sum_{j=1}^ke^{\theta_j^Tx_i}}\left[\begin{matrix} e^{\theta_1^Tx_i} \\ e^{\theta_2^Tx_i} \\ \vdots \\ e^{\theta_k^Tx_i} \end{matrix}\right]$$
- Summary: logistic regression really is very, very handy in practice!
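As a rough sketch of how the cost, the parameter update, and softmax fit together, the following plain-Python example runs batch gradient descent on a tiny made-up dataset and then evaluates a softmax. The toy data, the learning rate `alpha=0.1`, the iteration count, the `eps` guard inside the logarithm, and all function names are assumptions chosen for illustration, not values from the original notes.

```python
import math

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def cost(theta, X, y):
    """J(theta) = -(1/m) * log-likelihood; eps guards against log(0)."""
    eps = 1e-12
    total = 0.0
    for xi, yi in zip(X, y):
        h = sigmoid(sum(t * v for t, v in zip(theta, xi)))
        total += yi * math.log(h + eps) + (1 - yi) * math.log(1.0 - h + eps)
    return -total / len(X)

def gradient_step(theta, X, y, alpha):
    """One batch update: theta_j := theta_j - alpha/m * sum((h - y) * x_j)."""
    m = len(X)
    grad = [0.0] * len(theta)
    for xi, yi in zip(X, y):
        err = sigmoid(sum(t * v for t, v in zip(theta, xi))) - yi
        for j, v in enumerate(xi):
            grad[j] += err * v
    return [t - alpha * g / m for t, g in zip(theta, grad)]

def softmax(scores):
    """Map k class scores theta_j^T x to probabilities that sum to 1."""
    shift = max(scores)  # subtracting the max does not change the result
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Toy data (made up for illustration); x_0 = 1 is the intercept term.
X = [[1.0, 0.5], [1.0, 1.5], [1.0, 3.0], [1.0, 4.0]]
y = [0, 0, 1, 1]

theta = [0.0, 0.0]
initial_cost = cost(theta, X, y)
for _ in range(2000):
    theta = gradient_step(theta, X, y, alpha=0.1)
final_cost = cost(theta, X, y)

print(final_cost < initial_cost)
print(softmax([1.0, 2.0, 3.0]))
```

Subtracting the maximum score inside `softmax` is a common numerical-stability trick: it cancels between numerator and denominator, so the probabilities match the formula above while avoiding overflow in the exponentials.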