# 1. Introduction to Logistic Regression

Logistic regression is a classification algorithm for binary classification problems. Suppose we have $m$ training samples $\left\{\left(\mathbf{x}^{(1)},y^{(1)}\right),\left(\mathbf{x}^{(2)},y^{(2)}\right),\cdots,\left(\mathbf{x}^{(m)},y^{(m)}\right)\right\}$. For logistic regression, the input features are $\mathbf{x}^{(i)}\in\mathbb{R}^{n+1}$ and the class labels are $y^{(i)}\in\left\{0,1\right\}$. The hypothesis function is the sigmoid function:

$h_{\theta}\left(\mathbf{x}\right)=\frac{1}{1+e^{-\theta^{T}\mathbf{x}}}$

The corresponding cost function is the cross-entropy loss:

$J\left(\theta\right)=-\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log h_{\theta}\left(\mathbf{x}^{(i)}\right)+\left(1-y^{(i)}\right)\log\left(1-h_{\theta}\left(\mathbf{x}^{(i)}\right)\right)\right]$

Differentiating $J(\theta)$ with respect to $\theta_j$ and applying the chain rule:

$\begin{aligned}\nabla_{\theta_{j}}J\left(\theta\right)&=-\frac{1}{m}\sum_{i=1}^{m}\left[\frac{y^{(i)}}{h_{\theta}\left(\mathbf{x}^{(i)}\right)}\cdot\nabla_{\theta_{j}}h_{\theta}\left(\mathbf{x}^{(i)}\right)+\frac{1-y^{(i)}}{1-h_{\theta}\left(\mathbf{x}^{(i)}\right)}\cdot\nabla_{\theta_{j}}\left(1-h_{\theta}\left(\mathbf{x}^{(i)}\right)\right)\right]\\&=-\frac{1}{m}\sum_{i=1}^{m}\left[\frac{y^{(i)}}{h_{\theta}\left(\mathbf{x}^{(i)}\right)}\cdot\nabla_{\theta_{j}}h_{\theta}\left(\mathbf{x}^{(i)}\right)-\frac{1-y^{(i)}}{1-h_{\theta}\left(\mathbf{x}^{(i)}\right)}\cdot\nabla_{\theta_{j}}h_{\theta}\left(\mathbf{x}^{(i)}\right)\right]\\&=-\frac{1}{m}\sum_{i=1}^{m}\left[\left(\frac{y^{(i)}}{h_{\theta}\left(\mathbf{x}^{(i)}\right)}-\frac{1-y^{(i)}}{1-h_{\theta}\left(\mathbf{x}^{(i)}\right)}\right)\cdot\nabla_{\theta_{j}}h_{\theta}\left(\mathbf{x}^{(i)}\right)\right]\\&=-\frac{1}{m}\sum_{i=1}^{m}\left[\frac{y^{(i)}-h_{\theta}\left(\mathbf{x}^{(i)}\right)}{h_{\theta}\left(\mathbf{x}^{(i)}\right)\left(1-h_{\theta}\left(\mathbf{x}^{(i)}\right)\right)}\cdot\nabla_{\theta_{j}}h_{\theta}\left(\mathbf{x}^{(i)}\right)\right]\\&=-\frac{1}{m}\sum_{i=1}^{m}\left[\frac{y^{(i)}-h_{\theta}\left(\mathbf{x}^{(i)}\right)}{h_{\theta}\left(\mathbf{x}^{(i)}\right)\left(1-h_{\theta}\left(\mathbf{x}^{(i)}\right)\right)}\cdot\nabla_{\theta^{T}\mathbf{x}^{(i)}}h_{\theta}\left(\mathbf{x}^{(i)}\right)\cdot\nabla_{\theta_{j}}\left(\theta^{T}\mathbf{x}^{(i)}\right)\right]\end{aligned}$

Since the sigmoid satisfies

$\nabla_{\theta^{T}\mathbf{x}^{(i)}}h_{\theta}\left(\mathbf{x}^{(i)}\right)=h_{\theta}\left(\mathbf{x}^{(i)}\right)\left(1-h_{\theta}\left(\mathbf{x}^{(i)}\right)\right)$

and the linear term satisfies

$\nabla_{\theta_{j}}\left(\theta^{T}\mathbf{x}^{(i)}\right)=x_{j}^{(i)}$

the gradient simplifies to:

$\nabla_{\theta_{j}}J\left(\theta\right)=-\frac{1}{m}\sum_{i=1}^{m}\left[\left(y^{(i)}-h_{\theta}\left(\mathbf{x}^{(i)}\right)\right)\cdot x_{j}^{(i)}\right]$

Gradient descent then updates each parameter as:

$\theta_{j}:=\theta_{j}-\alpha\nabla_{\theta_{j}}J\left(\theta\right)$
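The derivation above can be sketched in a few lines of NumPy. The toy dataset, learning rate, and iteration count below are arbitrary choices for illustration, not part of the derivation:

```python
import numpy as np

def sigmoid(z):
    # h_theta(x) = 1 / (1 + exp(-theta^T x))
    return 1.0 / (1.0 + np.exp(-z))

def logistic_gd(X, y, alpha=0.1, iters=5000):
    """Batch gradient descent for logistic regression.

    X : (m, n+1) design matrix whose first column is all ones (bias term).
    y : (m,) vector of 0/1 labels.
    """
    m, n1 = X.shape
    theta = np.zeros(n1)
    for _ in range(iters):
        h = sigmoid(X @ theta)          # h_theta(x^(i)) for every sample
        grad = -(X.T @ (y - h)) / m     # gradient of J(theta), as derived above
        theta -= alpha * grad           # theta_j := theta_j - alpha * grad_j
    return theta

# Tiny separable problem: label is 1 exactly when the feature is positive
X = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
theta = logistic_gd(X, y)
preds = (sigmoid(X @ theta) >= 0.5).astype(int)
```

On this separable toy set the learned classifier recovers the labels exactly.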

# 2. Softmax Regression

## 2.1 Introduction to Softmax Regression

Softmax regression is the generalization of logistic regression to multi-class classification, i.e., to problems where the class label $y$ can take more than two values. Suppose we have $m$ training samples $\left\{\left(\mathbf{x}^{(1)},y^{(1)}\right),\left(\mathbf{x}^{(2)},y^{(2)}\right),\cdots,\left(\mathbf{x}^{(m)},y^{(m)}\right)\right\}$. For softmax regression, the input features are $\mathbf{x}^{(i)}\in\mathbb{R}^{n+1}$ and the class labels are $y^{(i)}\in\left\{1,2,\cdots,k\right\}$. The hypothesis function estimates, for every sample, the probability $p\left(y=j\mid\mathbf{x}\right)$ that it belongs to each class $j$:

$h_{\theta}\left(\mathbf{x}^{(i)}\right)=\left[\begin{array}{c}p\left(y^{(i)}=1\mid\mathbf{x}^{(i)};\theta\right)\\ p\left(y^{(i)}=2\mid\mathbf{x}^{(i)};\theta\right)\\ \vdots\\ p\left(y^{(i)}=k\mid\mathbf{x}^{(i)};\theta\right)\end{array}\right]=\frac{1}{\sum_{j=1}^{k}e^{\theta_{j}^{T}\mathbf{x}^{(i)}}}\left[\begin{array}{c}e^{\theta_{1}^{T}\mathbf{x}^{(i)}}\\ e^{\theta_{2}^{T}\mathbf{x}^{(i)}}\\ \vdots\\ e^{\theta_{k}^{T}\mathbf{x}^{(i)}}\end{array}\right]$

Each individual class probability is therefore:

$p\left(y^{(i)}=j\mid\mathbf{x}^{(i)};\theta\right)=\frac{e^{\theta_{j}^{T}\mathbf{x}^{(i)}}}{\sum_{l=1}^{k}e^{\theta_{l}^{T}\mathbf{x}^{(i)}}}$
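This probability can be computed directly. The max-subtraction below is a standard numerical-stability trick (an implementation detail, not part of the formula): it prevents `exp` from overflowing while leaving the result unchanged:

```python
import numpy as np

def softmax(scores):
    """Row-wise softmax.

    scores : (m, k) array whose entry (i, j) is theta_j^T x^(i).
    Returns an (m, k) array of probabilities p(y^(i)=j | x^(i)).
    """
    # Subtracting the row maximum does not change the output (softmax is
    # invariant to adding a constant to every score) but avoids overflow.
    z = scores - scores.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Each row of the output sums to 1; huge scores no longer overflow.
probs = softmax(np.array([[1.0, 2.0, 3.0],
                          [1000.0, 1000.0, 1000.0]]))
```

Without the shift, `np.exp(1000.0)` would overflow to infinity; with it, the second row cleanly evaluates to a uniform distribution.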

## 2.2 The Softmax Regression Cost Function

Using the indicator function $I\left\{\cdot\right\}$, which equals $1$ when its argument is true and $0$ otherwise, the cost function is:

$J\left(\theta\right)=-\frac{1}{m}\left[\sum_{i=1}^{m}\sum_{j=1}^{k}I\left\{y^{(i)}=j\right\}\log\frac{e^{\theta_{j}^{T}\mathbf{x}^{(i)}}}{\sum_{l=1}^{k}e^{\theta_{l}^{T}\mathbf{x}^{(i)}}}\right]$

## 2.3 Solving Softmax Regression

The gradient of the cost function with respect to $\theta_j$ is (writing the inner class index as $c$ to keep it distinct from $j$):

$\nabla_{\theta_{j}}J\left(\theta\right)=-\frac{1}{m}\sum_{i=1}^{m}\left[\nabla_{\theta_{j}}\sum_{c=1}^{k}I\left\{y^{(i)}=c\right\}\log\frac{e^{\theta_{c}^{T}\mathbf{x}^{(i)}}}{\sum_{l=1}^{k}e^{\theta_{l}^{T}\mathbf{x}^{(i)}}}\right]$

Because the indicator picks out exactly one term per sample, consider two cases.

- If $y^{(i)}=j$, then $I\left\{y^{(i)}=j\right\}=1$, and the surviving term gives:

$\begin{aligned}\nabla_{\theta_{j}}J\left(\theta\right)&=-\frac{1}{m}\sum_{i=1}^{m}\left[\nabla_{\theta_{j}}\log\frac{e^{\theta_{j}^{T}\mathbf{x}^{(i)}}}{\sum_{l=1}^{k}e^{\theta_{l}^{T}\mathbf{x}^{(i)}}}\right]\\&=-\frac{1}{m}\sum_{i=1}^{m}\left[\frac{\sum_{l=1}^{k}e^{\theta_{l}^{T}\mathbf{x}^{(i)}}}{e^{\theta_{j}^{T}\mathbf{x}^{(i)}}}\cdot\frac{e^{\theta_{j}^{T}\mathbf{x}^{(i)}}\cdot\mathbf{x}^{(i)}\cdot\sum_{l=1}^{k}e^{\theta_{l}^{T}\mathbf{x}^{(i)}}-e^{\theta_{j}^{T}\mathbf{x}^{(i)}}\cdot\mathbf{x}^{(i)}\cdot e^{\theta_{j}^{T}\mathbf{x}^{(i)}}}{\left(\sum_{l=1}^{k}e^{\theta_{l}^{T}\mathbf{x}^{(i)}}\right)^{2}}\right]\\&=-\frac{1}{m}\sum_{i=1}^{m}\left[\frac{\sum_{l=1}^{k}e^{\theta_{l}^{T}\mathbf{x}^{(i)}}-e^{\theta_{j}^{T}\mathbf{x}^{(i)}}}{\sum_{l=1}^{k}e^{\theta_{l}^{T}\mathbf{x}^{(i)}}}\cdot\mathbf{x}^{(i)}\right]\end{aligned}$

- If $y^{(i)}\ne j$, say $y^{(i)}=j'$ with $j'\ne j$, then $I\left\{y^{(i)}=j\right\}=0$ and $I\left\{y^{(i)}=j'\right\}=1$, and:

$\begin{aligned}\nabla_{\theta_{j}}J\left(\theta\right)&=-\frac{1}{m}\sum_{i=1}^{m}\left[\nabla_{\theta_{j}}\log\frac{e^{\theta_{j'}^{T}\mathbf{x}^{(i)}}}{\sum_{l=1}^{k}e^{\theta_{l}^{T}\mathbf{x}^{(i)}}}\right]\\&=-\frac{1}{m}\sum_{i=1}^{m}\left[\frac{\sum_{l=1}^{k}e^{\theta_{l}^{T}\mathbf{x}^{(i)}}}{e^{\theta_{j'}^{T}\mathbf{x}^{(i)}}}\cdot\frac{-e^{\theta_{j'}^{T}\mathbf{x}^{(i)}}\cdot\mathbf{x}^{(i)}\cdot e^{\theta_{j}^{T}\mathbf{x}^{(i)}}}{\left(\sum_{l=1}^{k}e^{\theta_{l}^{T}\mathbf{x}^{(i)}}\right)^{2}}\right]\\&=-\frac{1}{m}\sum_{i=1}^{m}\left[-\frac{e^{\theta_{j}^{T}\mathbf{x}^{(i)}}}{\sum_{l=1}^{k}e^{\theta_{l}^{T}\mathbf{x}^{(i)}}}\cdot\mathbf{x}^{(i)}\right]\end{aligned}$

The two cases combine into the compact form:

$\nabla_{\theta_{j}}J\left(\theta\right)=-\frac{1}{m}\sum_{i=1}^{m}\left[\mathbf{x}^{(i)}\left(I\left\{y^{(i)}=j\right\}-p\left(y^{(i)}=j\mid\mathbf{x}^{(i)};\theta\right)\right)\right]$

Each parameter vector is then updated by gradient descent:

$\theta_{j}:=\theta_{j}-\alpha\nabla_{\theta_{j}}J\left(\theta\right)$
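Putting the combined gradient and the update rule together, here is a minimal softmax-regression trainer in NumPy. The toy data and hyperparameters are arbitrary assumptions for the sketch:

```python
import numpy as np

def softmax(S):
    # Row-wise, numerically stable softmax over class scores.
    Z = S - S.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def softmax_gd(X, y, k, alpha=0.5, iters=5000):
    """Batch gradient descent for softmax regression.

    X : (m, n+1) design matrix with a leading bias column of ones.
    y : (m,) integer labels in {0, ..., k-1}.
    """
    m, n1 = X.shape
    Theta = np.zeros((k, n1))        # one parameter vector theta_j per class
    Y = np.eye(k)[y]                 # one-hot rows encode I{y^(i) = j}
    for _ in range(iters):
        P = softmax(X @ Theta.T)     # p(y^(i)=j | x^(i); Theta)
        grad = -(Y - P).T @ X / m    # gradient for every theta_j at once
        Theta -= alpha * grad        # theta_j := theta_j - alpha * grad_j
    return Theta

# Three classes separable by intervals on a single feature
X = np.array([[1.0, -2.0], [1.0, -1.5], [1.0, 0.0],
              [1.0, 0.2], [1.0, 1.5], [1.0, 2.0]])
y = np.array([0, 0, 1, 1, 2, 2])
Theta = softmax_gd(X, y, k=3)
pred = softmax(X @ Theta.T).argmax(axis=1)
```

Note that the matrix form `-(Y - P).T @ X / m` is exactly the per-class gradient $-\frac{1}{m}\sum_i \mathbf{x}^{(i)}(I\{y^{(i)}=j\}-p(y^{(i)}=j\mid\mathbf{x}^{(i)};\theta))$ stacked over all $j$.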

## 2.4 Parameter Redundancy in Softmax Regression

Softmax regression has a redundant parameterization: subtracting any fixed vector $\psi$ from every parameter vector $\theta_j$ leaves the hypothesis unchanged:

$\begin{aligned}p\left(y^{(i)}=j\mid\mathbf{x}^{(i)};\theta\right)&=\frac{e^{\left(\theta_{j}-\psi\right)^{T}\mathbf{x}^{(i)}}}{\sum_{l=1}^{k}e^{\left(\theta_{l}-\psi\right)^{T}\mathbf{x}^{(i)}}}\\&=\frac{e^{\theta_{j}^{T}\mathbf{x}^{(i)}}\cdot e^{-\psi^{T}\mathbf{x}^{(i)}}}{\sum_{l=1}^{k}e^{\theta_{l}^{T}\mathbf{x}^{(i)}}\cdot e^{-\psi^{T}\mathbf{x}^{(i)}}}\\&=\frac{e^{\theta_{j}^{T}\mathbf{x}^{(i)}}}{\sum_{l=1}^{k}e^{\theta_{l}^{T}\mathbf{x}^{(i)}}}\end{aligned}$
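This invariance is easy to verify numerically. The parameter vectors, shift $\psi$, and sample below are arbitrary values chosen only for the check:

```python
import numpy as np

theta = np.array([[0.5, -1.0],
                  [2.0, 0.3],
                  [-0.7, 1.2]])          # k = 3 class parameter vectors
psi = np.array([10.0, -4.0])             # arbitrary shift vector
x = np.array([1.0, 2.0])                 # one sample (bias term included)

def probs(T):
    # Softmax probabilities for a single sample x under parameters T.
    s = T @ x
    e = np.exp(s - s.max())
    return e / e.sum()

p1 = probs(theta)        # original parameters
p2 = probs(theta - psi)  # every theta_j shifted by the same psi
```

The two probability vectors agree, confirming that the model is over-parameterized.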

Because of this redundancy, the cost function has infinitely many minimizers. A common remedy is to add a weight-decay term

$\frac{\lambda}{2}\sum_{i=1}^{k}\sum_{j=0}^{n}\theta_{ij}^{2}$

to the cost function, which then becomes strictly convex for any $\lambda>0$:

$J\left(\theta\right)=-\frac{1}{m}\left[\sum_{i=1}^{m}\sum_{j=1}^{k}I\left\{y^{(i)}=j\right\}\log\frac{e^{\theta_{j}^{T}\mathbf{x}^{(i)}}}{\sum_{l=1}^{k}e^{\theta_{l}^{T}\mathbf{x}^{(i)}}}\right]+\frac{\lambda}{2}\sum_{i=1}^{k}\sum_{j=0}^{n}\theta_{ij}^{2}$

The regularized gradient becomes:

$\nabla_{\theta_{j}}J\left(\theta\right)=-\frac{1}{m}\sum_{i=1}^{m}\left[\mathbf{x}^{(i)}\left(I\left\{y^{(i)}=j\right\}-p\left(y^{(i)}=j\mid\mathbf{x}^{(i)};\theta\right)\right)\right]+\lambda\theta_{j}$

## 2.5 The Relationship Between Softmax and Logistic Regression

Logistic regression is a special case of softmax regression, namely the case $k=2$. When $k=2$, the softmax hypothesis is:

$h_{\theta}\left(\mathbf{x}\right)=\frac{1}{e^{\theta_{1}^{T}\mathbf{x}}+e^{\theta_{2}^{T}\mathbf{x}}}\left[\begin{array}{c}e^{\theta_{1}^{T}\mathbf{x}}\\ e^{\theta_{2}^{T}\mathbf{x}}\end{array}\right]$

Exploiting the parameter redundancy and subtracting $\psi=\theta_{1}$ from both parameter vectors gives:

$\begin{aligned}h_{\theta}\left(\mathbf{x}\right)&=\frac{1}{e^{\left(\theta_{1}-\psi\right)^{T}\mathbf{x}}+e^{\left(\theta_{2}-\psi\right)^{T}\mathbf{x}}}\left[\begin{array}{c}e^{\left(\theta_{1}-\psi\right)^{T}\mathbf{x}}\\ e^{\left(\theta_{2}-\psi\right)^{T}\mathbf{x}}\end{array}\right]\\&=\left[\begin{array}{c}\frac{1}{1+e^{\left(\theta_{2}-\theta_{1}\right)^{T}\mathbf{x}}}\\ \frac{e^{\left(\theta_{2}-\theta_{1}\right)^{T}\mathbf{x}}}{1+e^{\left(\theta_{2}-\theta_{1}\right)^{T}\mathbf{x}}}\end{array}\right]\\&=\left[\begin{array}{c}\frac{1}{1+e^{\left(\theta_{2}-\theta_{1}\right)^{T}\mathbf{x}}}\\ 1-\frac{1}{1+e^{\left(\theta_{2}-\theta_{1}\right)^{T}\mathbf{x}}}\end{array}\right]\end{aligned}$

which is exactly the logistic regression hypothesis with parameter vector $\theta_{1}-\theta_{2}$.
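A quick numerical check of this reduction: the two-class softmax probability of class 1 matches the sigmoid of $(\theta_1-\theta_2)^T\mathbf{x}$. The parameter values and sample below are arbitrary:

```python
import numpy as np

theta1 = np.array([0.4, -1.3])
theta2 = np.array([-0.2, 0.8])
x = np.array([1.0, 2.5])           # sample with a bias term

# Two-class softmax: probability assigned to class 1
e1, e2 = np.exp(theta1 @ x), np.exp(theta2 @ x)
p_softmax = e1 / (e1 + e2)

# Logistic regression with theta = theta1 - theta2
p_logistic = 1.0 / (1.0 + np.exp(-(theta1 - theta2) @ x))
```

The two probabilities coincide, as the algebra above predicts.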

## 2.6 Choosing Between Multi-class and Binary Classifiers

The choice depends on whether the candidate classes are mutually exclusive:

- Mutually exclusive classes –> softmax regression
- Non-mutually-exclusive classes –> multiple independent logistic regression classifiers

For example, assigning each sample exactly one of several disjoint categories calls for softmax regression, whereas predicting several independent binary attributes of a sample calls for one logistic classifier per attribute.