Loss Function
To learn the parameters of a neural network, we first need to define a loss function for the model. Recall that logistic regression separates data into positive and negative classes, so its (regularized) loss function is:
$$J(\theta) = -\frac{1}{n}\sum_{i=1}^n \left[y^{(i)}\log\left(h_\theta(x^{(i)})\right) + \left(1-y^{(i)}\right)\log\left(1-h_\theta(x^{(i)})\right)\right] + \frac{\lambda}{2n}\sum_{j=1}^n \theta_j^2$$
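This loss can be computed directly with NumPy. The sketch below is illustrative (function and variable names are my own, not from the text); following the usual convention, the bias term $\theta_0$ is excluded from the regularization sum.

```python
import numpy as np

def logistic_loss(theta, X, y, lam):
    """Regularized cross-entropy loss for logistic regression (illustrative sketch).

    X: (n, d) design matrix, y: (n,) labels in {0, 1},
    theta: (d,) parameters, lam: regularization strength lambda.
    """
    n = X.shape[0]
    h = 1.0 / (1.0 + np.exp(-(X @ theta)))              # sigmoid hypothesis h_theta(x)
    ce = -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))
    reg = lam / (2 * n) * np.sum(theta[1:] ** 2)        # theta_0 excluded by convention
    return ce + reg
```

For example, with all-zero parameters the hypothesis outputs 0.5 everywhere, so the unregularized loss equals $\log 2 \approx 0.693$.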
A neural network, however, may have multiple neurons in its output layer, so we need to extend the logistic-regression loss to handle multiple classes. Suppose the model has $K$ output neurons; its loss function is then:
$$J(\Theta) = -\frac{1}{n}\sum_{i=1}^n \sum_{k=1}^K \left[y_k^{(i)}\log\left((h_\Theta(x^{(i)}))_k\right) + \left(1-y_k^{(i)}\right)\log\left(1-(h_\Theta(x^{(i)}))_k\right)\right] + \frac{\lambda}{2n}\sum_{l=1}^{L-1}\sum_{i=1}^{s_l}\sum_{j=1}^{s_{l+1}} \left(\Theta_{j,i}^{(l)}\right)^2$$
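Given the network's outputs and one-hot labels, this cost can be sketched as follows (names are illustrative; the network's forward pass is assumed to have been run already). The regularization term sums the squared weights of every layer's matrix $\Theta^{(l)}$; since the inner index $i$ starts at 1, the bias column is excluded, which the sketch mirrors by skipping the first column of each matrix.

```python
import numpy as np

def nn_loss(h, Y, thetas, lam):
    """Multi-class cross-entropy cost J(Theta) with weight-decay regularization
    (illustrative sketch).

    h: (n, K) network outputs (h_Theta(x^(i)))_k, each entry in (0, 1),
    Y: (n, K) one-hot label matrix with entries y_k^(i),
    thetas: list of weight matrices Theta^(l), bias column first,
    lam: regularization strength lambda.
    """
    n = h.shape[0]
    # Double sum over examples i and output units k
    ce = -np.sum(Y * np.log(h) + (1 - Y) * np.log(1 - h)) / n
    # Triple sum over layers l and weight entries; bias columns excluded
    reg = lam / (2 * n) * sum(np.sum(T[:, 1:] ** 2) for T in thetas)
    return ce + reg
```

With $\lambda = 0$ this reduces to the plain multi-class cross-entropy; for uniform outputs of 0.5 over $K = 2$ classes, the cost is $2\log 2$ regardless of the labels.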