# The Mathematics of 18 Loss Functions in PyTorch


Notation: $x$ denotes the output sequence, $y$ the target sequence, $\operatorname{rms}$ the root mean square, and $L=\{l_1\cdots l_n\}$ the vector of per-element losses.

1 L1Loss

$$L_1=|x-y|$$

2 MSELoss

$$L=\frac{1}{n}\sum_i(x_i-y_i)^2=\operatorname{mean}(L_1^2)$$

(Note there is no square root: MSELoss is the mean of the squared errors, not the $\operatorname{rms}$ of $L_1$.)
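A quick check of both definitions against PyTorch (a minimal sketch; the tensor values are made up for illustration, and both losses use their default `mean` reduction):

```python
import torch
import torch.nn as nn

x = torch.tensor([1.0, 2.0, 3.0])   # output
y = torch.tensor([1.5, 2.0, 0.0])   # target

l1 = nn.L1Loss()(x, y)              # mean of |x - y|
mse = nn.MSELoss()(x, y)            # mean of (x - y)^2

manual_l1 = (x - y).abs().mean()
manual_mse = ((x - y) ** 2).mean()
```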

3 CrossEntropyLoss

$$L=\sum_j \frac{y_j}{\sum_i y_i}\left(-\log\frac{e^{x_j}}{\sum_i e^{x_i}}\right)$$

For a binary event occurring with probability $p$, the entropy (expected information) is

$$\begin{aligned} H(x) &= P(x)I(x)+P(\bar x)I(\bar x)\\ &= -p\log p-(1-p)\log(1-p) \end{aligned}$$

Differentiating with respect to $p$ (logarithms taken to base $a$):

$$\begin{aligned} H'(p)&=-\ln p-p\frac{1}{p\ln a}+\ln(1-p)-(1-p)\frac{-1}{(1-p)\ln a}\\ &=-\ln p-\frac{1}{\ln a}+\ln(1-p)-\left(-\frac{1}{\ln a}\right)\\ &=\ln\frac{1-p}{p} \end{aligned}$$

For a general discrete distribution the entropy is

$$H(X)=-\sum_i p_i\log p_i,\qquad \sum_i p_i=1$$

The KL divergence between a target distribution $p$ and a model distribution $q$ is

$$D_{KL}(p\,\|\,q)=\sum_i p_i(\log p_i-\log q_i)=\sum_i p_i\log p_i-\sum_i p_i\log q_i$$

The second term is the cross-entropy

$$H(p,q)=-\sum_i p_i\log q_i$$

so $D_{KL}(p\,\|\,q)=H(p,q)-H(p)$.
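In PyTorch, `nn.CrossEntropyLoss` takes raw logits and fuses LogSoftmax with the negative log-likelihood, i.e. for a hard target class $y$ it computes $-\log\frac{e^{x_y}}{\sum_i e^{x_i}}$. A minimal sketch (logit values made up):

```python
import torch
import torch.nn as nn

logits = torch.tensor([[1.0, 2.0, 3.0]])  # raw scores, shape (N, C)
target = torch.tensor([2])                # class index

ce = nn.CrossEntropyLoss()(logits, target)

# manual: log-sum-exp minus the target logit
manual = torch.logsumexp(logits, dim=1)[0] - logits[0, 2]
```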

4 KLDivLoss
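`nn.KLDivLoss` computes the $D_{KL}$ defined above, with one catch worth remembering: the input is expected to already be log-probabilities, while the target is plain probabilities. A sketch with made-up distributions:

```python
import torch
import torch.nn as nn

p = torch.tensor([0.4, 0.6])   # target distribution
q = torch.tensor([0.5, 0.5])   # model distribution

kl = nn.KLDivLoss(reduction="sum")(q.log(), p)  # input must be log-probs
manual = (p * (p.log() - q.log())).sum()        # sum_i p_i (log p_i - log q_i)
```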

5 BCELoss

$$H(p,q)=-\sum_i p_i\log q_i$$

For a single binary label $y$ and prediction $\hat y$ this becomes

$$-y\ln\hat y-(1-y)\ln(1-\hat y)$$

$$L_{BCE}=\sum_i-y_i\ln x_i-(1-y_i)\ln(1-x_i)$$

| $i$ | $y_i$ | $x_i$ |
|:---:|:---:|:---:|
| 0 | 1 | 0.8 |
| 1 | 0 | 0.1 |
| 2 | 0 | 0.1 |
| 3 | 1 | 0.9 |
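Plugging the table above into `nn.BCELoss` reproduces the sum formula (using `reduction="sum"` to match it; the default is `"mean"`):

```python
import torch
import torch.nn as nn

x = torch.tensor([0.8, 0.1, 0.1, 0.9])  # predicted probabilities x_i
y = torch.tensor([1.0, 0.0, 0.0, 1.0])  # labels y_i

bce = nn.BCELoss(reduction="sum")(x, y)
manual = (-y * x.log() - (1 - y) * (1 - x).log()).sum()
```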

6 BCEWithLogits

$$L=\sum_i-y_i\ln\sigma(x_i)-(1-y_i)\ln(1-\sigma(x_i))$$
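`nn.BCEWithLogitsLoss` folds the sigmoid into the loss, which is numerically safer than applying `sigmoid` and then `BCELoss` (it uses the log-sum-exp trick internally). An equivalence check on made-up logits:

```python
import torch
import torch.nn as nn

z = torch.tensor([2.0, -1.0, 0.5])         # raw logits
y = torch.tensor([1.0, 0.0, 1.0])

fused = nn.BCEWithLogitsLoss()(z, y)
naive = nn.BCELoss()(torch.sigmoid(z), y)  # same value, less stable for large |z|
```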

7 MarginRanking

$$L(x_1,x_2,y)=\max(0,-y\cdot(x_1-x_2)+m)$$

$y$ takes the value $1$ or $-1$, so the expression above becomes

$$L_{1}(x_1,x_2)=\max(0,x_2-x_1+m)$$

$$L_{-1}(x_1,x_2)=\max(0,x_1-x_2+m)$$
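For `nn.MarginRankingLoss`, $y=1$ asserts that $x_1$ should outrank $x_2$ by at least the margin $m$, and $y=-1$ the reverse. A sketch with made-up scores and $m=0.5$:

```python
import torch
import torch.nn as nn

x1 = torch.tensor([0.8, 0.2])
x2 = torch.tensor([0.5, 0.9])
y = torch.tensor([1.0, -1.0])   # first pair: x1 should win; second: x2 should win

loss = nn.MarginRankingLoss(margin=0.5, reduction="none")(x1, x2, y)
manual = torch.clamp(-y * (x1 - x2) + 0.5, min=0)
# pair 0: max(0, -(0.8 - 0.5) + 0.5) = 0.2   (margin not yet satisfied)
# pair 1: max(0,  (0.2 - 0.9) + 0.5) = 0     (already separated by the margin)
```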

8 HingeEmbedding

The hinge loss also targets the binary case: for a label $y_n$ taking the value $1$ or $-1$, the loss is

$$l_n=\begin{cases} x_n, & y_n=1\\ \max(0,\Delta-x_n), & y_n=-1 \end{cases}$$
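`nn.HingeEmbeddingLoss` implements exactly this, with $\Delta$ being the `margin` argument (default 1.0); $x_n$ is typically a distance between a pair of samples:

```python
import torch
import torch.nn as nn

x = torch.tensor([0.5, 2.0])     # e.g. pairwise distances
y = torch.tensor([1.0, -1.0])    # 1: similar pair, -1: dissimilar pair

loss = nn.HingeEmbeddingLoss(margin=1.0, reduction="none")(x, y)
# y = 1  -> x_n                  = 0.5
# y = -1 -> max(0, 1.0 - x_n)    = max(0, 1.0 - 2.0) = 0
```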

9 MultiLabelMargin

$$L(x,y)=\sum_{i,j}\frac{\max(0,1-(x[y_j]-x[i]))}{x.\mathrm{size}(0)}$$
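Here $y_j$ runs over the target class indices and $i$ over the remaining classes. In `nn.MultiLabelMarginLoss` the target tensor holds class indices, with the list terminated by $-1$. A worked example (mirroring the one in the PyTorch docs):

```python
import torch
import torch.nn as nn

x = torch.tensor([[0.1, 0.2, 0.4, 0.8]])
y = torch.tensor([[3, 0, -1, 1]])   # target classes are 3 and 0; -1 ends the list

loss = nn.MultiLabelMarginLoss()(x, y)
# j in {3, 0}, i in {1, 2}:
# (0.4 + 0.6 + 1.1 + 1.3) / 4 = 0.85
```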

10 HuberLoss

The Huber loss combines the strengths of the L1 and MSE losses:

$$l_n=\begin{cases} 0.5(x_n-y_n)^2, & \text{if } |x_n-y_n|<\delta\\ \delta(|x_n-y_n|-0.5\delta), & \text{otherwise} \end{cases}$$

As $\delta\to\infty$, the quadratic branch applies everywhere and the loss degenerates to (half) the MSE loss.
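Checking both branches of `nn.HuberLoss` (available in PyTorch ≥ 1.9) with $\delta=1$:

```python
import torch
import torch.nn as nn

x = torch.tensor([0.5, 3.0])
y = torch.tensor([0.0, 0.0])    # errors: 0.5 (quadratic branch), 3.0 (linear branch)

loss = nn.HuberLoss(delta=1.0, reduction="none")(x, y)
# |0.5| < 1  -> 0.5 * 0.5^2              = 0.125
# |3.0| >= 1 -> 1.0 * (3.0 - 0.5 * 1.0)  = 2.5
```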

11 SmoothL1

$$l_n=\begin{cases} 0.5(x_n-y_n)^2/\beta, & \text{if } |x_n-y_n|<\beta\\ |x_n-y_n|-0.5\beta, & \text{otherwise} \end{cases}$$
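Comparing the two piecewise definitions shows SmoothL1 is just the Huber loss with $\delta=\beta$, scaled by $1/\beta$, which PyTorch confirms (assuming a version where `SmoothL1Loss` accepts `beta` and `HuberLoss` exists, i.e. ≥ 1.9):

```python
import torch
import torch.nn as nn

x = torch.tensor([0.3, 2.0])    # one error inside the beta region, one outside
y = torch.tensor([0.0, 0.0])
beta = 0.5

smooth = nn.SmoothL1Loss(beta=beta, reduction="none")(x, y)
huber = nn.HuberLoss(delta=beta, reduction="none")(x, y)
# smooth == huber / beta, elementwise
```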

12 SoftMargin

$$L(x,y)=\frac{1}{N}\sum_i\log[1+\exp(-x_iy_i)]$$
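Verifying the formula directly against `nn.SoftMarginLoss` (made-up scores, labels in $\{1,-1\}$):

```python
import torch
import torch.nn as nn

x = torch.tensor([1.5, -0.3])
y = torch.tensor([1.0, -1.0])

loss = nn.SoftMarginLoss()(x, y)
manual = torch.log(1 + torch.exp(-x * y)).mean()
```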

13 MultiLabelSoftMargin

$$L(x,y)=-\frac{1}{C}\sum_i \left[y_i\log\frac{1}{1+\exp(-x_i)}+(1-y_i)\log\frac{\exp(-x_i)}{1+\exp(-x_i)}\right]$$
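Per the formula, each class contributes an independent sigmoid cross-entropy term, so (without class weights) `nn.MultiLabelSoftMarginLoss` matches `nn.BCEWithLogitsLoss` averaged over all elements. A check on made-up logits:

```python
import torch
import torch.nn as nn

x = torch.tensor([[0.5, -1.0, 2.0]])   # logits for C = 3 labels
y = torch.tensor([[1.0, 0.0, 1.0]])

mlsm = nn.MultiLabelSoftMarginLoss()(x, y)
bce = nn.BCEWithLogitsLoss()(x, y)     # same per-element terms, same mean
```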

14 CosineEmbedding

$$L(x,y)=\begin{cases} 1-\cos(x_1,x_2), & \text{if } y=1\\ \max(0,\cos(x_1,x_2)-M), & \text{if } y=-1 \end{cases}$$

where $\cos(x,y)$ is the cosine similarity:

$$\cos(x,y)=\frac{\sum_i{x_iy_i}}{\sqrt{\sum_ix_i^2}\sqrt{\sum_iy_i^2}}$$
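Exercising both branches of `nn.CosineEmbeddingLoss` (default margin $M=0$) with simple unit vectors:

```python
import torch
import torch.nn as nn

x1 = torch.tensor([[1.0, 0.0]])
x2 = torch.tensor([[0.0, 1.0]])   # orthogonal to x1, so cos = 0

# similar pair (y = 1): loss = 1 - cos = 1 - 0 = 1
similar = nn.CosineEmbeddingLoss()(x1, x2, torch.tensor([1]))
# dissimilar pair (y = -1) on identical vectors: max(0, cos - 0) = 1
dissimilar = nn.CosineEmbeddingLoss()(x1, x1, torch.tensor([-1]))
```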

15 MultiMargin

$$L(x,y)=\frac{\sum_{i\neq y}\max(0,M-x_y+x_i)^p}{N}$$
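In `nn.MultiMarginLoss` the defaults are $p=1$ and margin $M=1$, and $N$ is the number of classes:

```python
import torch
import torch.nn as nn

x = torch.tensor([[0.1, 0.2, 0.7]])
y = torch.tensor([0])   # correct class has score 0.1

loss = nn.MultiMarginLoss()(x, y)
# (max(0, 1 - 0.1 + 0.2) + max(0, 1 - 0.1 + 0.7)) / 3 = (1.1 + 1.6) / 3 = 0.9
```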

16 TripletMargin

$$L(\alpha,\beta,\gamma)=\max\big(\Vert \alpha_i-\beta_i\Vert_p-\Vert\alpha_i-\gamma_i\Vert_p+m,\,0\big)$$
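In `nn.TripletMarginLoss`, $\alpha$ is the anchor, $\beta$ the positive sample, and $\gamma$ the negative sample; defaults are $m=1$ and $p=2$:

```python
import torch
import torch.nn as nn

anchor = torch.tensor([[0.0, 0.0]])
positive = torch.tensor([[1.0, 0.0]])   # distance 1 from anchor
negative = torch.tensor([[1.0, 0.0]])   # also distance 1: not yet separated

loss = nn.TripletMarginLoss(margin=1.0)(anchor, positive, negative)
# max(1 - 1 + 1, 0) = 1
```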

17 CTC

$$L(S)=-\ln\prod_{(x,z)\in S}P(z|x)=-\sum_{(x,z)\in S}\ln P(z|x)$$

Here $x$ is the input; $y_k^t$ denotes the probability of emitting symbol $k$ at time $t$, and $\pi_t$ is the value of path $\pi$ at time $t$. If the outputs at different time steps are mutually independent, the probability that input $x$ produces path $\pi$ is

$$p(\pi|x)=\prod^T_{t=1}y^t_{\pi_t},\quad\forall\pi\in L'^T$$

Letting $z$ denote the final label sequence and $B$ the collapsing map (removing blanks and repeats), we have

$$p(z|x)=\sum_{\pi\in B^{-1}(z)}p(\pi|x),\quad B(\pi)=z$$
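A tiny check of the path-summing formula with `nn.CTCLoss`: two time steps, two symbols (blank = 0), uniform predictions, target sequence `[1]`. The paths collapsing to `1` are (1,1), (0,1), and (1,0), each with probability 0.25, so $p(z|x)=0.75$ and the loss is $-\ln 0.75$:

```python
import math
import torch
import torch.nn as nn

T, N, C = 2, 1, 2                                 # time steps, batch, classes (0 = blank)
log_probs = torch.full((T, N, C), math.log(0.5))  # uniform log-probabilities
targets = torch.tensor([[1]])
input_lengths = torch.tensor([T])
target_lengths = torch.tensor([1])

loss = nn.CTCLoss(blank=0, reduction="sum")(log_probs, targets,
                                            input_lengths, target_lengths)
# -ln(0.75) ≈ 0.2877
```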

18 NLL

$$L=\sum-\log x[y]$$

(In `nn.NLLLoss` the input $x$ is expected to already contain log-probabilities, so the implementation simply returns $-x[y]$ without taking another log.)

Softmax is a probability-normalization method, defined as

S ( x i ) = exp ⁡ x i ∑ j exp ⁡ x j S(x_i)=\frac{\exp x_i}{\sum_j\exp x_j}

and LogSoftmax is its logarithm:

$$LS(x_i)=\log\frac{\exp x_i}{\sum_j\exp x_j}$$
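This is why `LogSoftmax` followed by `NLLLoss` is exactly `CrossEntropyLoss`:

```python
import torch
import torch.nn as nn

logits = torch.tensor([[1.0, 2.0, 3.0]])
target = torch.tensor([2])

nll = nn.NLLLoss()(nn.LogSoftmax(dim=1)(logits), target)
ce = nn.CrossEntropyLoss()(logits, target)   # same value, computed in one fused op
```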

