# 1. Logistic Regression

Logistic regression is a classification algorithm for binary classification problems. Suppose there are $m$ training samples $\left\{\left(\mathbf{x}^{(1)},y^{(1)}\right),\left(\mathbf{x}^{(2)},y^{(2)}\right),\cdots,\left(\mathbf{x}^{(m)},y^{(m)}\right)\right\}$. For logistic regression, the input features are $\mathbf{x}^{(i)}\in\mathbb{R}^{n+1}$, the class labels are $y^{(i)}\in\left\{0,1\right\}$, and the hypothesis is the sigmoid function:

$$h_\theta(\mathbf{x})=\frac{1}{1+e^{-\theta^T\mathbf{x}}}$$

The cost function is the negative log-likelihood (cross-entropy):

$$J(\theta)=-\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log h_\theta(\mathbf{x}^{(i)})+\left(1-y^{(i)}\right)\log\left(1-h_\theta(\mathbf{x}^{(i)})\right)\right]$$

Taking the gradient of the cost with respect to $\theta_j$:

$$\nabla_{\theta_j}J(\theta)=-\frac{1}{m}\sum_{i=1}^{m}\left[\frac{y^{(i)}}{h_\theta(\mathbf{x}^{(i)})}\cdot\nabla_{\theta_j}h_\theta(\mathbf{x}^{(i)})+\frac{1-y^{(i)}}{1-h_\theta(\mathbf{x}^{(i)})}\cdot\nabla_{\theta_j}\left(1-h_\theta(\mathbf{x}^{(i)})\right)\right]$$

$$=-\frac{1}{m}\sum_{i=1}^{m}\left[\frac{y^{(i)}}{h_\theta(\mathbf{x}^{(i)})}\cdot\nabla_{\theta_j}h_\theta(\mathbf{x}^{(i)})-\frac{1-y^{(i)}}{1-h_\theta(\mathbf{x}^{(i)})}\cdot\nabla_{\theta_j}h_\theta(\mathbf{x}^{(i)})\right]$$

$$=-\frac{1}{m}\sum_{i=1}^{m}\left[\left(\frac{y^{(i)}}{h_\theta(\mathbf{x}^{(i)})}-\frac{1-y^{(i)}}{1-h_\theta(\mathbf{x}^{(i)})}\right)\cdot\nabla_{\theta_j}h_\theta(\mathbf{x}^{(i)})\right]$$

$$=-\frac{1}{m}\sum_{i=1}^{m}\left[\frac{y^{(i)}-h_\theta(\mathbf{x}^{(i)})}{h_\theta(\mathbf{x}^{(i)})\left(1-h_\theta(\mathbf{x}^{(i)})\right)}\cdot\nabla_{\theta_j}h_\theta(\mathbf{x}^{(i)})\right]=-\frac{1}{m}\sum_{i=1}^{m}\left[\frac{y^{(i)}-h_\theta(\mathbf{x}^{(i)})}{h_\theta(\mathbf{x}^{(i)})\left(1-h_\theta(\mathbf{x}^{(i)})\right)}\cdot\nabla_{\theta^T\mathbf{x}^{(i)}}h_\theta(\mathbf{x}^{(i)})\cdot\nabla_{\theta_j}\left(\theta^T\mathbf{x}^{(i)}\right)\right]$$

Using the sigmoid derivative

$$\nabla_{\theta^T\mathbf{x}^{(i)}}h_\theta(\mathbf{x}^{(i)})=h_\theta(\mathbf{x}^{(i)})\left(1-h_\theta(\mathbf{x}^{(i)})\right)$$

and

$$\nabla_{\theta_j}\left(\theta^T\mathbf{x}^{(i)}\right)=x_j^{(i)}$$

the gradient simplifies to

$$\nabla_{\theta_j}J(\theta)=-\frac{1}{m}\sum_{i=1}^{m}\left[\left(y^{(i)}-h_\theta(\mathbf{x}^{(i)})\right)\cdot x_j^{(i)}\right]$$

The parameters are then updated by gradient descent:

$$\theta_j:=\theta_j-\alpha\nabla_{\theta_j}J(\theta)$$
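As a minimal illustrative sketch (not code from the original article), the derived gradient and update rule can be implemented with NumPy; the function names and learning-rate settings below are hypothetical:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic(X, y, alpha=0.1, iters=1000):
    """Batch gradient descent on the logistic-regression cross-entropy cost.
    X: (m, n+1) design matrix whose first column is all ones (intercept term).
    y: (m,) labels in {0, 1}.
    """
    m, n1 = X.shape
    theta = np.zeros(n1)
    for _ in range(iters):
        h = sigmoid(X @ theta)          # h_theta(x^(i)) for every sample
        grad = -(X.T @ (y - h)) / m     # -(1/m) * sum_i (y^(i) - h) * x_j^(i)
        theta -= alpha * grad           # theta_j := theta_j - alpha * grad_j
    return theta
```

Calling `train_logistic` on a small linearly separable dataset recovers a decision boundary that classifies all training points correctly.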

# 2. Softmax Regression

## 2.1 Introduction to Softmax Regression

Softmax regression is the generalization of logistic regression to multi-class classification, i.e. the class label $y$ can take $k\geq 2$ values. Suppose there are $m$ training samples $\left\{\left(\mathbf{x}^{(1)},y^{(1)}\right),\left(\mathbf{x}^{(2)},y^{(2)}\right),\cdots,\left(\mathbf{x}^{(m)},y^{(m)}\right)\right\}$. For softmax regression, the input features are $\mathbf{x}^{(i)}\in\mathbb{R}^{n+1}$ and the class labels are $y^{(i)}\in\left\{1,2,\cdots,k\right\}$. The hypothesis estimates, for each sample, the probability $p\left(y=j\mid\mathbf{x}\right)$ of belonging to each class:

$$h_\theta(\mathbf{x}^{(i)})=\begin{bmatrix}p\left(y^{(i)}=1\mid\mathbf{x}^{(i)};\theta\right)\\ p\left(y^{(i)}=2\mid\mathbf{x}^{(i)};\theta\right)\\ \vdots\\ p\left(y^{(i)}=k\mid\mathbf{x}^{(i)};\theta\right)\end{bmatrix}=\frac{1}{\sum_{j=1}^{k}e^{\theta_j^T\mathbf{x}^{(i)}}}\begin{bmatrix}e^{\theta_1^T\mathbf{x}^{(i)}}\\ e^{\theta_2^T\mathbf{x}^{(i)}}\\ \vdots\\ e^{\theta_k^T\mathbf{x}^{(i)}}\end{bmatrix}$$

That is, the probability of class $j$ is:

$$p\left(y^{(i)}=j\mid\mathbf{x}^{(i)};\theta\right)=\frac{e^{\theta_j^T\mathbf{x}^{(i)}}}{\sum_{l=1}^{k}e^{\theta_l^T\mathbf{x}^{(i)}}}$$
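The class probabilities above can be sketched in a few lines of NumPy (an illustrative helper, not from the original article). Subtracting a constant from every logit before exponentiating avoids overflow and leaves the probabilities unchanged:

```python
import numpy as np

def softmax_probs(Theta, x):
    """Class probabilities p(y=j | x; theta) for j = 1..k.
    Theta: (k, n+1) matrix whose rows are the theta_j; x: (n+1,) feature vector.
    """
    logits = Theta @ x          # theta_j^T x for every class j
    logits -= logits.max()      # shift for numerical stability; probabilities unchanged
    e = np.exp(logits)
    return e / e.sum()          # normalize by sum_l exp(theta_l^T x)
```

For instance, with all-zero parameters every class receives probability $1/k$.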

## 2.2 The Cost Function of Softmax Regression

Define the indicator function:

$$I\{\text{expression}\}=\begin{cases}0 & \text{if expression}=\text{false}\\ 1 & \text{if expression}=\text{true}\end{cases}$$

The cost function of softmax regression is then:

$$J(\theta)=-\frac{1}{m}\left[\sum_{i=1}^{m}\sum_{j=1}^{k}I\left\{y^{(i)}=j\right\}\log\frac{e^{\theta_j^T\mathbf{x}^{(i)}}}{\sum_{l=1}^{k}e^{\theta_l^T\mathbf{x}^{(i)}}}\right]$$
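A direct NumPy sketch of this cost (hypothetical helper name; a 0-based label convention is assumed for array indexing). The indicator sum simply picks out the log-probability of each sample's true class:

```python
import numpy as np

def softmax_cost(Theta, X, y):
    """J(theta) for softmax regression, without regularization.
    Theta: (k, n+1); X: (m, n+1); y: (m,) integer labels in 0..k-1.
    """
    logits = X @ Theta.T                                   # (m, k) matrix of theta_j^T x^(i)
    logits -= logits.max(axis=1, keepdims=True)            # stable log-sum-exp
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    m = X.shape[0]
    # I{y^(i)=j} selects log_probs[i, y[i]]; average and negate
    return -log_probs[np.arange(m), y].mean()
```

With all-zero parameters the predicted distribution is uniform, so the cost equals $\log k$.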

## 2.3 Solving Softmax Regression

The gradient of the cost with respect to $\theta_j$ is:

$$\nabla_{\theta_j}J(\theta)=-\frac{1}{m}\sum_{i=1}^{m}\left[\nabla_{\theta_j}\sum_{j=1}^{k}I\left\{y^{(i)}=j\right\}\log\frac{e^{\theta_j^T\mathbf{x}^{(i)}}}{\sum_{l=1}^{k}e^{\theta_l^T\mathbf{x}^{(i)}}}\right]$$

Consider the two possible cases for each sample:

- If $y^{(i)}=j$, then $I\left\{y^{(i)}=j\right\}=1$, and:

$$\nabla_{\theta_j}J(\theta)=-\frac{1}{m}\sum_{i=1}^{m}\left[\nabla_{\theta_j}\log\frac{e^{\theta_j^T\mathbf{x}^{(i)}}}{\sum_{l=1}^{k}e^{\theta_l^T\mathbf{x}^{(i)}}}\right]$$

$$=-\frac{1}{m}\sum_{i=1}^{m}\left[\frac{\sum_{l=1}^{k}e^{\theta_l^T\mathbf{x}^{(i)}}}{e^{\theta_j^T\mathbf{x}^{(i)}}}\cdot\frac{e^{\theta_j^T\mathbf{x}^{(i)}}\cdot\mathbf{x}^{(i)}\cdot\sum_{l=1}^{k}e^{\theta_l^T\mathbf{x}^{(i)}}-e^{\theta_j^T\mathbf{x}^{(i)}}\cdot\mathbf{x}^{(i)}\cdot e^{\theta_j^T\mathbf{x}^{(i)}}}{\left(\sum_{l=1}^{k}e^{\theta_l^T\mathbf{x}^{(i)}}\right)^{2}}\right]$$

$$=-\frac{1}{m}\sum_{i=1}^{m}\left[\frac{\sum_{l=1}^{k}e^{\theta_l^T\mathbf{x}^{(i)}}-e^{\theta_j^T\mathbf{x}^{(i)}}}{\sum_{l=1}^{k}e^{\theta_l^T\mathbf{x}^{(i)}}}\cdot\mathbf{x}^{(i)}\right]$$

- If $y^{(i)}\neq j$, say $y^{(i)}={j}'$, then $I\left\{y^{(i)}=j\right\}=0$ and $I\left\{y^{(i)}={j}'\right\}=1$, and:

$$\nabla_{\theta_j}J(\theta)=-\frac{1}{m}\sum_{i=1}^{m}\left[\nabla_{\theta_j}\log\frac{e^{\theta_{j'}^T\mathbf{x}^{(i)}}}{\sum_{l=1}^{k}e^{\theta_l^T\mathbf{x}^{(i)}}}\right]=-\frac{1}{m}\sum_{i=1}^{m}\left[\frac{\sum_{l=1}^{k}e^{\theta_l^T\mathbf{x}^{(i)}}}{e^{\theta_{j'}^T\mathbf{x}^{(i)}}}\cdot\frac{-e^{\theta_{j'}^T\mathbf{x}^{(i)}}\cdot\mathbf{x}^{(i)}\cdot e^{\theta_j^T\mathbf{x}^{(i)}}}{\left(\sum_{l=1}^{k}e^{\theta_l^T\mathbf{x}^{(i)}}\right)^{2}}\right]=-\frac{1}{m}\sum_{i=1}^{m}\left[-\frac{e^{\theta_j^T\mathbf{x}^{(i)}}}{\sum_{l=1}^{k}e^{\theta_l^T\mathbf{x}^{(i)}}}\cdot\mathbf{x}^{(i)}\right]$$

Combining the two cases, the gradient can be written compactly as:

$$\nabla_{\theta_j}J(\theta)=-\frac{1}{m}\sum_{i=1}^{m}\left[\mathbf{x}^{(i)}\left(I\left\{y^{(i)}=j\right\}-p\left(y^{(i)}=j\mid\mathbf{x}^{(i)};\theta\right)\right)\right]$$

The parameters are again updated by gradient descent:

$$\theta_j:=\theta_j-\alpha\nabla_{\theta_j}J(\theta)$$
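Putting the gradient and the update together, here is a minimal training sketch (illustrative code, not from the original article; 0-based labels are assumed so that one-hot rows encode $I\{y^{(i)}=j\}$):

```python
import numpy as np

def train_softmax(X, y, k, alpha=0.2, iters=10000):
    """Batch gradient descent using grad_j = -(1/m) * sum_i x^(i) (I{y=j} - p_j).
    X: (m, n+1) with an intercept column; y: (m,) integer labels in 0..k-1.
    """
    m, n1 = X.shape
    Theta = np.zeros((k, n1))
    Y = np.eye(k)[y]                          # one-hot: Y[i, j] = I{y^(i) = j}
    for _ in range(iters):
        logits = X @ Theta.T
        logits -= logits.max(axis=1, keepdims=True)
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)     # p(y^(i)=j | x^(i); theta)
        grad = -((Y - P).T @ X) / m           # (k, n+1): one gradient row per theta_j
        Theta -= alpha * grad                 # theta_j := theta_j - alpha * grad_j
    return Theta
```

On well-separated one-dimensional clusters, the learned parameters classify the training points correctly via `argmax` over the class scores.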

## 2.4 Parameter Redundancy in Softmax Regression

Softmax regression has a redundant set of parameters: subtracting an arbitrary fixed vector $\psi$ from every $\theta_j$ leaves the predicted probabilities unchanged:

$$p\left(y^{(i)}=j\mid\mathbf{x}^{(i)};\theta\right)=\frac{e^{(\theta_j-\psi)^T\mathbf{x}^{(i)}}}{\sum_{l=1}^{k}e^{(\theta_l-\psi)^T\mathbf{x}^{(i)}}}=\frac{e^{\theta_j^T\mathbf{x}^{(i)}}\cdot e^{-\psi^T\mathbf{x}^{(i)}}}{\sum_{l=1}^{k}e^{\theta_l^T\mathbf{x}^{(i)}}\cdot e^{-\psi^T\mathbf{x}^{(i)}}}=\frac{e^{\theta_j^T\mathbf{x}^{(i)}}}{\sum_{l=1}^{k}e^{\theta_l^T\mathbf{x}^{(i)}}}$$
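This invariance is easy to verify numerically; the following short script (illustrative, with randomly drawn parameters) checks that shifting every $\theta_j$ by the same $\psi$ produces identical probabilities:

```python
import numpy as np

rng = np.random.default_rng(0)
Theta = rng.normal(size=(4, 3))    # k = 4 classes, n+1 = 3 features
x = rng.normal(size=3)
psi = rng.normal(size=3)           # arbitrary shift vector

def probs(T):
    """Softmax class probabilities for parameter matrix T at input x."""
    e = np.exp(T @ x)
    return e / e.sum()

# Shifting every theta_j by the same psi leaves p(y=j | x) unchanged.
assert np.allclose(probs(Theta), probs(Theta - psi))
```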

To resolve this redundancy and penalize large parameter values, a weight-decay term is added:

$$\frac{\lambda}{2}\sum_{i=1}^{k}\sum_{j=0}^{n}\theta_{ij}^{2}$$

The regularized cost function is then:

$$J(\theta)=-\frac{1}{m}\left[\sum_{i=1}^{m}\sum_{j=1}^{k}I\left\{y^{(i)}=j\right\}\log\frac{e^{\theta_j^T\mathbf{x}^{(i)}}}{\sum_{l=1}^{k}e^{\theta_l^T\mathbf{x}^{(i)}}}\right]+\frac{\lambda}{2}\sum_{i=1}^{k}\sum_{j=0}^{n}\theta_{ij}^{2}$$

and its gradient is:

$$\nabla_{\theta_j}J(\theta)=-\frac{1}{m}\sum_{i=1}^{m}\left[\mathbf{x}^{(i)}\left(I\left\{y^{(i)}=j\right\}-p\left(y^{(i)}=j\mid\mathbf{x}^{(i)};\theta\right)\right)\right]+\lambda\theta_j$$
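The regularized cost and gradient can be sketched together and validated against finite differences (hypothetical helper; 0-based labels assumed, and the intercept weights are regularized as in the formula above):

```python
import numpy as np

def reg_cost_and_grad(Theta, X, y, lam):
    """Regularized softmax cost and gradient -(1/m) sum_i x^(i)(I{y=j} - p_j) + lam * theta_j.
    Theta: (k, n+1); X: (m, n+1); y: (m,) integer labels in 0..k-1.
    """
    m, k = X.shape[0], Theta.shape[0]
    logits = X @ Theta.T
    logits -= logits.max(axis=1, keepdims=True)            # stable softmax
    P = np.exp(logits)
    P /= P.sum(axis=1, keepdims=True)
    cost = -np.log(P[np.arange(m), y]).mean() + 0.5 * lam * (Theta ** 2).sum()
    Y = np.eye(k)[y]                                       # one-hot indicators
    grad = -((Y - P).T @ X) / m + lam * Theta              # analytic gradient
    return cost, grad
```

A central-difference check confirms the analytic gradient matches the numerical gradient of the cost.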

## 2.5 The Relationship Between Softmax and Logistic Regression

Logistic regression is a special case of softmax regression, namely the case $k=2$. When $k=2$, the softmax hypothesis is:

$$h_\theta(\mathbf{x})=\frac{1}{e^{\theta_1^T\mathbf{x}}+e^{\theta_2^T\mathbf{x}}}\begin{bmatrix}e^{\theta_1^T\mathbf{x}}\\ e^{\theta_2^T\mathbf{x}}\end{bmatrix}$$

Exploiting the parameter redundancy and setting $\psi=\theta_1$:

$$h_\theta(\mathbf{x})=\frac{1}{e^{(\theta_1-\psi)^T\mathbf{x}}+e^{(\theta_2-\psi)^T\mathbf{x}}}\begin{bmatrix}e^{(\theta_1-\psi)^T\mathbf{x}}\\ e^{(\theta_2-\psi)^T\mathbf{x}}\end{bmatrix}=\begin{bmatrix}\dfrac{1}{1+e^{(\theta_2-\theta_1)^T\mathbf{x}}}\\[2ex] \dfrac{e^{(\theta_2-\theta_1)^T\mathbf{x}}}{1+e^{(\theta_2-\theta_1)^T\mathbf{x}}}\end{bmatrix}=\begin{bmatrix}\dfrac{1}{1+e^{(\theta_2-\theta_1)^T\mathbf{x}}}\\[2ex] 1-\dfrac{1}{1+e^{(\theta_2-\theta_1)^T\mathbf{x}}}\end{bmatrix}$$

which is exactly the sigmoid hypothesis of logistic regression with parameter vector $-(\theta_2-\theta_1)$.
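The equivalence can be checked numerically in a few lines (illustrative script with randomly drawn parameters):

```python
import numpy as np

rng = np.random.default_rng(2)
theta1, theta2 = rng.normal(size=3), rng.normal(size=3)
x = rng.normal(size=3)

# Two-class softmax probability of class 1
e1, e2 = np.exp(theta1 @ x), np.exp(theta2 @ x)
p_softmax = e1 / (e1 + e2)

# Logistic regression with parameter vector -(theta2 - theta1)
p_logistic = 1.0 / (1.0 + np.exp((theta2 - theta1) @ x))

assert np.isclose(p_softmax, p_logistic)
```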

## 2.6 Choosing Between Multi-class and Binary Classifiers

Whether to use one softmax classifier or several binary classifiers depends on whether the candidate classes are mutually exclusive:

- Mutually exclusive classes → softmax regression
- Non-mutually-exclusive classes → multiple independent logistic regressions
