Softmax Regression Explained

Basic Model

Softmax regression is the general form of logistic regression: it extends the logistic activation function to $C$ classes (where $C$ is the number of outputs of the network) rather than just two, making it a multi-class classifier. When $C = 2$, softmax regression reduces to logistic regression.

Logistic regression uses the sigmoid function to map $\mathbf{w}x + b$ to the interval $(0, 1)$; the output is the probability that the sample's label equals 1. Softmax regression uses the softmax function to map the class scores $\mathbf{w}x + b$ to $[0, 1]$; the output is a vector whose entries are the probabilities that the sample belongs to each class.
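As a quick sanity check of the reduction claim above, here is a minimal NumPy sketch (the helper names are my own) showing that softmax over two classes depends only on the score difference and coincides with the sigmoid:

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def softmax(z):
    e = np.exp(z - np.max(z))   # shift by max for numerical stability
    return e / e.sum()

z1, z2 = 1.7, -0.3                    # two class scores z_i = w_i x + b_i
p = softmax(np.array([z1, z2]))
print(p[0], sigmoid(z1 - z2))         # identical: softmax with K = 2 is logistic regression
```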


Suppose the softmax model has $n$ inputs. Write $w_i = (w_{i1}, w_{i2}, \ldots, w_{in}, b)^T$ for $i = 1, 2, \ldots, K$ and $x^{(j)} = (x_{j1}, x_{j2}, \ldots, x_{jn}, 1)$ for $j = 1, 2, \ldots, m$, where there are $K$ classes and $m$ samples.

Let:

$$z_i = w_i x + b_i$$

$$h_w(x^{(j)}) = \begin{bmatrix} p_1 \\ p_2 \\ \vdots \\ p_K \end{bmatrix} = \frac{1}{\sum_{i=1}^{K} e^{z_i}} \begin{bmatrix} e^{z_1} \\ e^{z_2} \\ \vdots \\ e^{z_K} \end{bmatrix}$$

where there are $K$ classes in total. The class corresponding to the largest entry of this output vector is the predicted class.
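To make the model concrete, here is a minimal NumPy sketch of this forward pass (the function names and the shift-by-max trick are my own additions, not part of the derivation above):

```python
import numpy as np

def softmax(z):
    """Map scores z = (z_1, ..., z_K) to probabilities p_i = e^{z_i} / sum_j e^{z_j}."""
    z = z - np.max(z)          # shifting by a constant leaves the probabilities unchanged
    exp_z = np.exp(z)
    return exp_z / np.sum(exp_z)

def forward(W, b, x):
    """Compute h_w(x): z_i = w_i . x + b_i for each class, then softmax over the K scores."""
    z = W @ x + b              # W: (K, n), x: (n,), b: (K,)
    return softmax(z)

# Example: 3 classes, 4 inputs
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))
b = np.zeros(3)
x = rng.normal(size=4)
p = forward(W, b, x)
print(p, p.sum())              # probabilities sum to 1; argmax gives the predicted class
```

The shift by `np.max(z)` is safe because softmax is invariant to adding a constant to every score; it only prevents overflow in `np.exp`.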

Loss Function

The loss function for softmax classification is the negative log-likelihood, which we minimize:

$$\begin{aligned} L(w) &= -\log P(y^{(i)} \mid x^{(i)}; w) \\ &= -\log \prod_{k=1}^{K} \left(\frac{e^{z_k}}{\sum_{j=1}^{K} e^{z_j}}\right)^{y_k} \\ &= -\sum_{k=1}^{K} y_k \log\left(\frac{e^{z_k}}{\sum_{j=1}^{K} e^{z_j}}\right) \end{aligned}$$
Note: $y_k = I\{y^{(j)} = k\}$ is the indicator function, equal to 1 when $y^{(j)} = k$, i.e. when the $j$-th sample belongs to class $k$. Equivalently, the label $y$ of a sample $x$ is a one-hot vector $y = (y_1, y_2, \ldots, y_K)$ in which exactly one entry is 1, e.g. $y = (1, 0, \ldots, 0)$.

Our goal is:

$$\min_w L(w)$$
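As an illustration, here is a small sketch that evaluates this loss for a one-hot label (the log-sum-exp formulation is my own assumption for numerical safety, not part of the original text):

```python
import numpy as np

def cross_entropy(z, y):
    """L = -sum_k y_k * log(e^{z_k} / sum_j e^{z_j}), with y a one-hot label vector."""
    z_shift = z - np.max(z)                                 # for numerical stability
    log_probs = z_shift - np.log(np.sum(np.exp(z_shift)))   # log softmax
    return -np.sum(y * log_probs)

z = np.array([2.0, 1.0, 0.1])   # class scores z_i = w_i x + b_i
y = np.array([1.0, 0.0, 0.0])   # one-hot label: the sample belongs to class 0
print(cross_entropy(z, y))      # equals -log(softmax(z)[0])
```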

Solving for the Optimal Parameters

We find the optimal parameters by gradient descent.

Let the $i$-th output of the model for a given sample be

$$s_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}, \quad i = 1, 2, \ldots, K$$
For a single sample:

$$\begin{aligned} \frac{\partial L}{\partial w_i} &= \frac{\partial L}{\partial z_i} \frac{\partial z_i}{\partial w_i} \\ \frac{\partial L}{\partial b_i} &= \frac{\partial L}{\partial z_i} \frac{\partial z_i}{\partial b_i} \end{aligned}$$
Clearly:

$$\frac{\partial z_i}{\partial w_i} = x, \qquad \frac{\partial z_i}{\partial b_i} = 1$$
So the core problem is to compute $\frac{\partial L}{\partial z_i}$:

$$\frac{\partial L}{\partial z_i} = \sum_{k=1}^{K} \left[ \frac{\partial L}{\partial s_k} \frac{\partial s_k}{\partial z_i} \right]$$
First, compute $\frac{\partial L}{\partial s_k}$:

$$\frac{\partial L}{\partial s_k} = \frac{\partial \left(-\sum_{k=1}^{K} y_k \log s_k \right)}{\partial s_k} = -\frac{y_k}{s_k}$$

Next, compute $\frac{\partial s_k}{\partial z_i}$.

To do so, recall the quotient rule:

$$f(x) = \frac{g(x)}{h(x)}, \qquad f'(x) = \frac{g'(x)h(x) - g(x)h'(x)}{[h(x)]^2}$$
We now consider two cases.

(1) When $k \ne i$:

$$\begin{aligned} \frac{\partial s_k}{\partial z_i} &= \frac{\partial}{\partial z_i} \left( \frac{e^{z_k}}{\sum_{j=1}^{K} e^{z_j}} \right) \\ &= \frac{-e^{z_k} \cdot e^{z_i}}{\left(\sum_{j=1}^{K} e^{z_j}\right)^2} \\ &= -\frac{e^{z_k}}{\sum_{j=1}^{K} e^{z_j}} \cdot \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}} \\ &= -s_k s_i \end{aligned}$$
(2) When $k = i$:

$$\begin{aligned} \frac{\partial s_k}{\partial z_i} &= \frac{\partial s_i}{\partial z_i} = \frac{\partial}{\partial z_i} \left( \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}} \right) \\ &= \frac{e^{z_i} \sum_{j=1}^{K} e^{z_j} - (e^{z_i})^2}{\left(\sum_{j=1}^{K} e^{z_j}\right)^2} \\ &= \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}} \cdot \frac{\sum_{j=1}^{K} e^{z_j} - e^{z_i}}{\sum_{j=1}^{K} e^{z_j}} \\ &= s_i(1 - s_i) \end{aligned}$$
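The two cases together say that the Jacobian of the softmax is $\operatorname{diag}(s) - ss^T$: entries $s_i(1 - s_i)$ on the diagonal and $-s_k s_i$ off it. Here is a sketch checking this against finite differences (the helper names are mine):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def softmax_jacobian(z):
    """J[k, i] = ds_k/dz_i = s_i(1-s_i) if k == i, else -s_k s_i, i.e. diag(s) - s s^T."""
    s = softmax(z)
    return np.diag(s) - np.outer(s, s)

# Finite-difference check: column i approximates ds/dz_i
z = np.array([0.5, -1.2, 2.0])
eps = 1e-6
J_num = np.stack([(softmax(z + eps * np.eye(3)[i]) - softmax(z - eps * np.eye(3)[i])) / (2 * eps)
                  for i in range(3)], axis=1)
print(np.allclose(softmax_jacobian(z), J_num, atol=1e-6))   # True
```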
Therefore:

$$\begin{aligned} \frac{\partial L}{\partial z_i} &= \sum_{k=1}^{K} \left[ \frac{\partial L}{\partial s_k} \frac{\partial s_k}{\partial z_i} \right] = \sum_{k=1}^{K} \left[ -\frac{y_k}{s_k} \frac{\partial s_k}{\partial z_i} \right] \\ &= -\frac{y_i}{s_i} \frac{\partial s_i}{\partial z_i} + \sum_{k=1, k \ne i}^{K} \left[ -\frac{y_k}{s_k} \frac{\partial s_k}{\partial z_i} \right] \\ &= -\frac{y_i}{s_i} s_i (1 - s_i) + \sum_{k=1, k \ne i}^{K} \left[ -\frac{y_k}{s_k} \cdot (-s_k s_i) \right] \\ &= y_i (s_i - 1) + \sum_{k=1, k \ne i}^{K} y_k s_i \\ &= -y_i + y_i s_i + \sum_{k=1, k \ne i}^{K} y_k s_i \\ &= -y_i + s_i \sum_{k=1}^{K} y_k \end{aligned}$$
Since the label $y$ of a sample $x$ is a one-hot vector $y = (y_1, y_2, \ldots, y_K)$ with exactly one entry equal to 1, e.g. $y = (1, 0, \ldots, 0)$, we have $\sum_{k=1}^{K} y_k = 1$, and therefore:

$$\frac{\partial L}{\partial z_i} = s_i - y_i$$
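This remarkably clean result is easy to verify numerically; here is a minimal sketch under the same one-hot assumption (helper names are mine):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

z = np.array([2.0, 1.0, 0.1])
y = np.array([0.0, 1.0, 0.0])
analytic = softmax(z) - y                          # dL/dz = s - y

L = lambda v: -np.sum(y * np.log(softmax(v)))      # cross-entropy as a function of z
eps = 1e-6
numeric = np.array([(L(z + eps * np.eye(3)[i]) - L(z - eps * np.eye(3)[i])) / (2 * eps)
                    for i in range(3)])
print(np.allclose(analytic, numeric, atol=1e-6))   # True
```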
So the final result is:

$$\frac{\partial L}{\partial w_i} = (s_i - y_i)x, \qquad \frac{\partial L}{\partial b_i} = s_i - y_i$$
The update rule is therefore:

$$w_i \leftarrow w_i - \eta (s_i - y_i)x, \qquad b_i \leftarrow b_i - \eta (s_i - y_i)$$

repeated until convergence.
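Putting everything together, a minimal batch gradient-descent loop for softmax regression might look like the following (the synthetic two-class data, learning rate, and epoch count are illustrative assumptions):

```python
import numpy as np

def softmax_rows(Z):
    E = np.exp(Z - Z.max(axis=1, keepdims=True))
    return E / E.sum(axis=1, keepdims=True)

def train(X, Y, lr=0.1, epochs=500):
    """X: (m, n) samples; Y: (m, K) one-hot labels. Returns learned W (K, n) and b (K,)."""
    m, n = X.shape
    K = Y.shape[1]
    W = np.zeros((K, n))
    b = np.zeros(K)
    for _ in range(epochs):
        S = softmax_rows(X @ W.T + b)   # (m, K): s_i for every sample
        G = S - Y                       # (m, K): dL/dz = s - y, per sample
        W -= lr * (G.T @ X) / m         # average of (s_i - y_i) x over the batch
        b -= lr * G.mean(axis=0)
    return W, b

# Tiny synthetic example: two Gaussian blobs, 2 classes
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1, 1, (50, 2)), rng.normal(1, 1, (50, 2))])
Y = np.zeros((100, 2)); Y[:50, 0] = 1; Y[50:, 1] = 1
W, b = train(X, Y)
pred = np.argmax(softmax_rows(X @ W.T + b), axis=1)
print("accuracy:", (pred == np.argmax(Y, axis=1)).mean())
```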
