Initial approach
In softmax regression, we define
$$\hat{\mathbf{y}} = \mathrm{softmax}(\mathbf{o}), \quad \hat{y}_j = \frac{\exp(o_j)}{\sum_k \exp(o_k)} \quad (1) \qquad (i=1\ldots n,\; k=1\ldots q)$$
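To make (1) concrete, here is a minimal sketch of this initial approach in NumPy (the name `softmax_naive` is mine for illustration, not from the text): it exponentiates the logits directly and then normalizes.

```python
import numpy as np

def softmax_naive(o):
    """Direct implementation of (1): exponentiate each logit, then normalize."""
    exp_o = np.exp(o)
    return exp_o / exp_o.sum()

o = np.array([1.0, 2.0, 3.0])          # logits (arbitrary example values)
print(softmax_naive(o))                # [0.09003057 0.24472847 0.66524096]

# Weakness of the direct formula: np.exp overflows for large logits
# (e.g. exp(1000) -> inf in float64), so the quotient becomes nan.
```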
For any label $\mathbf{y}$ and model prediction $\hat{\mathbf{y}}$, the loss function is:
$$l(\mathbf{y}, \hat{\mathbf{y}}) = - \sum_{j=1}^q y_j \log \hat{y}_j \quad (2)$$
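A small sketch of (2) for a one-hot label (the values below are illustrative, not from the text); since only one $y_j$ is nonzero, the sum reduces to the negative log-probability of the true class.

```python
import numpy as np

def cross_entropy(y, y_hat):
    """Equation (2): -sum_j y_j * log(y_hat_j), with y a one-hot vector."""
    return -np.sum(y * np.log(y_hat))

y = np.array([0.0, 0.0, 1.0])          # true class is index 2
y_hat = np.array([0.09, 0.24, 0.67])   # model's predicted probabilities
print(cross_entropy(y, y_hat))          # ~0.4005, i.e. -log(0.67)
```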
Substituting $(1)$ into $(2)$:
$$\begin{aligned} l(\mathbf{y}, \hat{\mathbf{y}}) &= - \sum_{j=1}^q y_j \log \frac{\exp(o_j)}{\sum_{k=1}^q \exp(o_k)} \\ &= \sum_{j=1}^q y_j \log \sum_{k=1}^q \exp(o_k) - \sum_{j=1}^q y_j o_j \\ &= \log \sum_{k=1}^q \exp(o_k) - \sum_{j=1}^q y_j o_j. \end{aligned} \quad (3)$$

The last step uses the fact that $\mathbf{y}$ is a one-hot vector, so $\sum_{j=1}^q y_j = 1$ and the log-sum-exp term, which does not depend on $j$, factors out of the sum.
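The identity in (3) is easy to check numerically; a minimal sketch with arbitrary illustrative values, comparing the substituted form of (2) against the right-hand side of (3):

```python
import numpy as np

o = np.array([1.0, 2.0, 3.0])   # logits
y = np.array([0.0, 0.0, 1.0])   # one-hot label

# Left-hand side: substitute softmax(o) into (2)
y_hat = np.exp(o) / np.exp(o).sum()
lhs = -np.sum(y * np.log(y_hat))

# Right-hand side of (3): log-sum-exp of the logits minus y . o
rhs = np.log(np.sum(np.exp(o))) - np.dot(y, o)

print(lhs, rhs, np.isclose(lhs, rhs))  # same value, True
```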