This paper proposes a way to learn multi-task loss weights, balancing the losses of three tasks during training: semantic segmentation, instance segmentation, and depth regression. The overall architecture is shown in the paper's figure.
The paper analyzes three cases:
- Regression tasks
Define a likelihood and assume it is Gaussian: let $f^W(x)$ be the output of a network with input $x$ and weights $W$, and treat $f^W(x)$ as the mean $\mu$. Then:
$$p(y \mid f^W(x)) = \mathcal{N}(f^W(x), \sigma^2)$$
Taking the log:
$$\begin{aligned} \log p(y \mid f^W(x)) &= -\frac{1}{2\sigma^2}\|y-f^W(x)\|^2 - \log\sqrt{2\pi}\,\sigma \\ &\propto -\frac{1}{2\sigma^2}\|y-f^W(x)\|^2 - \log\sigma \end{aligned}$$
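The proportionality above just drops the constant $\log\sqrt{2\pi}$, which does not depend on $W$ or $\sigma$. A quick numeric sanity check (function names here are mine, not from the paper):

```python
import math

def gaussian_nll(y, f, sigma):
    """Exact negative log-likelihood of y under N(f, sigma^2)."""
    return 0.5 * (y - f) ** 2 / sigma ** 2 + math.log(math.sqrt(2 * math.pi) * sigma)

def weighted_loss(y, f, sigma):
    """Same quantity with the constant log(sqrt(2*pi)) dropped:
    (1 / (2 sigma^2)) * ||y - f||^2 + log(sigma)."""
    return 0.5 * (y - f) ** 2 / sigma ** 2 + math.log(sigma)

# The gap between the two is a constant, independent of f, y, and sigma,
# so minimizing either over W and sigma yields the same optimum.
gap = gaussian_nll(2.0, 1.5, 0.8) - weighted_loss(2.0, 1.5, 0.8)
print(abs(gap - math.log(math.sqrt(2 * math.pi))))  # ~0
```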
With two outputs $y_1$ and $y_2$ (assumed conditionally independent):
$$\begin{aligned} p(y_1,y_2 \mid f^W(x)) &= p(y_1 \mid f^W(x)) \cdot p(y_2 \mid f^W(x)) \\ &= \mathcal{N}(y_1; f^W(x), \sigma_1^2) \cdot \mathcal{N}(y_2; f^W(x), \sigma_2^2) \end{aligned}$$
Define the loss as the negative log-likelihood:
$$\begin{aligned} L(W,\sigma_1,\sigma_2) &= -\log p(y_1,y_2 \mid f^W(x)) \\ &\propto \frac{1}{2\sigma_1^2}\|y_1-f^W(x)\|^2 + \frac{1}{2\sigma_2^2}\|y_2-f^W(x)\|^2 + \log\sigma_1\sigma_2 \\ &= \frac{1}{2\sigma_1^2}L_1(W) + \frac{1}{2\sigma_2^2}L_2(W) + \log\sigma_1\sigma_2 \end{aligned}$$

where $L_1(W)=\|y_1-f^W(x)\|^2$ and $L_2(W)=\|y_2-f^W(x)\|^2$.
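A useful consequence worth noting: for a fixed task loss $L_i(W)$, minimizing $\frac{1}{2\sigma_i^2}L_i(W) + \log\sigma_i$ over $\sigma_i$ gives $\sigma_i^2 = L_i(W)$, so each $\sigma_i$ adapts to the scale of its own task. A small numeric check of this claim (function names are mine):

```python
import math

def two_task_loss(L1, L2, s1, s2):
    """L(W, sigma1, sigma2) for fixed task losses L1, L2."""
    return L1 / (2 * s1 ** 2) + L2 / (2 * s2 ** 2) + math.log(s1 * s2)

# Setting d/d(sigma) [L / (2 sigma^2) + log(sigma)] = 0 gives sigma^2 = L,
# so sigma = sqrt(L) should beat nearby values of sigma.
L1, L2 = 4.0, 0.25
best = two_task_loss(L1, L2, math.sqrt(L1), math.sqrt(L2))
for eps in (-0.1, 0.1):
    assert best < two_task_loss(L1, L2, math.sqrt(L1) + eps, math.sqrt(L2))
    assert best < two_task_loss(L1, L2, math.sqrt(L1), math.sqrt(L2) + eps)
```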
- Classification tasks
Define the probability distribution as a scaled softmax:
$$P(y \mid f^W(x),\sigma) = \mathrm{Softmax}\!\left(\frac{1}{\sigma^2}f^W(x)\right)$$
Given that $\mathrm{Softmax}(x)_i = \frac{\exp(x_i)}{\sum_i \exp(x_i)}$, the log-likelihood simplifies to:
$$\log p(y=c \mid f^W(x),\sigma) = \frac{1}{\sigma^2}f_c^W(x) - \log\sum_{c'}\exp\!\left(\frac{1}{\sigma^2}f_{c'}^W(x)\right)$$

where $f_{c'}^W(x)$ is the $c'$-th element of the vector $f^W(x)$.
Next, define the new loss as the negative log-likelihood:
$$L(W,\sigma) = -\log p(y=c \mid f^W(x),\sigma)$$

Let $L_1(W) = -\log \mathrm{Softmax}(y, f^W(x))$, i.e. the ordinary cross-entropy loss of $y$ (the unscaled case $\sigma = 1$). The expression can then be rewritten:
$$\begin{aligned} L(W,\sigma) &= -\log p(y=c \mid f^W(x),\sigma) \\ &= -\log p(y=c \mid f^W(x),\sigma) + \frac{1}{\sigma^2}L_1(W) - \frac{1}{\sigma^2}L_1(W) \\ &= \frac{1}{\sigma^2}L_1(W) - \log p(y=c \mid f^W(x),\sigma) - \frac{1}{\sigma^2}L_1(W) \\ &= \frac{1}{\sigma^2}L_1(W) - \frac{1}{\sigma^2}f_c^W(x) + \log\sum_{c'}\exp\!\left(\frac{1}{\sigma^2}f_{c'}^W(x)\right) + \frac{1}{\sigma^2}\log \mathrm{Softmax}(y=c, f^W(x)) \\ &= \frac{1}{\sigma^2}L_1(W) - \frac{1}{\sigma^2}f_c^W(x) + \log\sum_{c'}\exp\!\left(\frac{1}{\sigma^2}f_{c'}^W(x)\right) + \frac{1}{\sigma^2}\left[f_c^W(x) - \log\sum_{c'}\exp\left(f_{c'}^W(x)\right)\right] \\ &= \frac{1}{\sigma^2}L_1(W) + \log\frac{\sum_{c'}\exp\!\left(\frac{1}{\sigma^2}f_{c'}^W(x)\right)}{\left[\sum_{c'}\exp\left(f_{c'}^W(x)\right)\right]^{\frac{1}{\sigma^2}}} \end{aligned}$$
(The derivation above includes the author's own interpretation; corrections are welcome.) As $\sigma \to 1$, $\frac{1}{\sigma}\sum_{c'}\exp\!\left(\frac{1}{\sigma^2}f_{c'}^W(x)\right) \approx \left[\sum_{c'}\exp\left(f_{c'}^W(x)\right)\right]^{\frac{1}{\sigma^2}}$, so the expression simplifies further:
$$\begin{aligned} L(W,\sigma) &= \frac{1}{\sigma^2}L_1(W) + \log\frac{\sum_{c'}\exp\!\left(\frac{1}{\sigma^2}f_{c'}^W(x)\right)}{\left[\sum_{c'}\exp\left(f_{c'}^W(x)\right)\right]^{\frac{1}{\sigma^2}}} \\ &= \frac{1}{\sigma^2}L_1(W) + \log\sigma\,\frac{\frac{1}{\sigma}\sum_{c'}\exp\!\left(\frac{1}{\sigma^2}f_{c'}^W(x)\right)}{\left[\sum_{c'}\exp\left(f_{c'}^W(x)\right)\right]^{\frac{1}{\sigma^2}}} \\ &\approx \frac{1}{\sigma^2}L_1(W) + \log\sigma \end{aligned}$$
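The quality of this approximation is easy to probe numerically: at $\sigma = 1$ the exact scaled-softmax loss and $\frac{1}{\sigma^2}L_1(W) + \log\sigma$ coincide, and they stay close for $\sigma$ near 1. A sketch with made-up logits (function names are mine):

```python
import math

def exact_loss(logits, c, sigma):
    """-log Softmax(f / sigma^2)_c, the exact scaled-softmax NLL."""
    scaled = [f / sigma ** 2 for f in logits]
    return -(scaled[c] - math.log(sum(math.exp(s) for s in scaled)))

def approx_loss(logits, c, sigma):
    """L1 / sigma^2 + log(sigma), with L1 the plain cross-entropy."""
    L1 = -(logits[c] - math.log(sum(math.exp(f) for f in logits)))
    return L1 / sigma ** 2 + math.log(sigma)

logits = [2.0, 0.5, -1.0]
print(abs(exact_loss(logits, 0, 1.0) - approx_loss(logits, 0, 1.0)))    # 0 at sigma = 1
print(abs(exact_loss(logits, 0, 1.05) - approx_loss(logits, 0, 1.05)))  # small near sigma = 1
```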
- Mixed tasks of both types, assuming $y_1$ is a continuous output and $y_2$ a discrete output
Combining the regression and classification derivations above, the joint loss $L(W,\sigma_1,\sigma_2)$ is:

$$L(W,\sigma_1,\sigma_2) = \frac{1}{2\sigma_1^2}L_1(W) + \frac{1}{\sigma_2^2}L_2(W) + \log\sigma_1 + \log\sigma_2$$
where $L_1(W)=\|y_1-f^W(x)\|^2$ and $L_2(W)=-\log \mathrm{Softmax}(y_2, f^W(x))$.
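In practice the $\sigma_i$ are usually trained through the reparameterization $\eta_i = \log\sigma_i^2$ for numerical stability, so that $1/\sigma_i^2 = e^{-\eta_i}$ and $\log\sigma_i = \eta_i/2$. A minimal gradient-descent sketch with the task losses held fixed (the variable names and the plain-Python optimizer are mine, not the paper's): the stationarity conditions of the joint loss give $\sigma_1^2 = L_1$ for the regression term and $\sigma_2^2 = 2L_2$ for the classification term.

```python
import math

def joint_loss(L1, L2, eta1, eta2):
    """Joint loss L(W, sigma1, sigma2) with eta_i = log sigma_i^2:
    exp(-eta1)/2 * L1 + exp(-eta2) * L2 + eta1/2 + eta2/2."""
    return (math.exp(-eta1) / 2) * L1 + math.exp(-eta2) * L2 + eta1 / 2 + eta2 / 2

L1, L2 = 4.0, 0.5          # pretend per-task losses at some training step
eta1, eta2 = 0.0, 0.0
lr = 0.1
for _ in range(2000):
    # analytic gradients of joint_loss w.r.t. eta1 and eta2
    g1 = -math.exp(-eta1) * L1 / 2 + 0.5
    g2 = -math.exp(-eta2) * L2 + 0.5
    eta1 -= lr * g1
    eta2 -= lr * g2

print(math.exp(eta1))  # sigma1^2 ~= L1 = 4.0
print(math.exp(eta2))  # sigma2^2 ~= 2 * L2 = 1.0
```

In a real model the $\eta_i$ would be learnable parameters updated jointly with $W$, and the $L_i$ would change every step; the fixed-loss setting here only illustrates where the uncertainty weights settle.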