A Summary of Common Loss Functions in Neural Networks
Suppose the training set contains $N$ data pairs. The inputs are $X$: $x_1, x_2, \cdots, x_N$; the predicted values are $Y_{predict}$: $y_{predict}^1, y_{predict}^2, \cdots, y_{predict}^N$; and the true values are $Y_{true}$: $y_{true}^1, y_{true}^2, \cdots, y_{true}^N$.
mean_squared_error: mean squared error
$$MSE(Y_{true}, Y_{predict})=\frac{1}{N} \sum_{i=1}^{N}(y_{true}^i-y_{predict}^i)^2$$
from keras import backend as K

def mean_squared_error(y_true, y_pred):
    # Mean of the squared differences along the last axis
    return K.mean(K.square(y_pred - y_true), axis=-1)
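As a quick sanity check of the formula, here is a plain-NumPy equivalent on made-up data (a sketch, not the Keras implementation):

```python
import numpy as np

def mse_np(y_true, y_pred):
    # Mean of the squared differences along the last axis
    return np.mean(np.square(y_pred - y_true), axis=-1)

y_true = np.array([[1.0, 2.0, 3.0]])
y_pred = np.array([[1.5, 2.0, 2.0]])
print(mse_np(y_true, y_pred))  # (0.25 + 0 + 1) / 3 ≈ 0.4167
```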
mean_absolute_error: mean absolute error
$$MAE(Y_{true}, Y_{predict})=\frac{1}{N} \sum_{i=1}^{N}|y_{true}^i-y_{predict}^i|$$
def mean_absolute_error(y_true, y_pred):
    # Mean of the absolute differences along the last axis
    return K.mean(K.abs(y_pred - y_true), axis=-1)
mean_absolute_percentage_error: mean absolute percentage error
$$MAPE(Y_{true}, Y_{predict})=\frac{1}{N} \sum_{i=1}^{N}\left|\frac{y_{true}^i-y_{predict}^i}{y_{true}^i}\right|$$
Note that the denominator must not be zero, so the implementation clips $|y_{true}|$ to at least $\epsilon$; it also multiplies by 100 to express the result as a percentage.
def mean_absolute_percentage_error(y_true, y_pred):
    # Clip |y_true| to at least epsilon to avoid division by zero
    diff = K.abs((y_true - y_pred) / K.clip(K.abs(y_true),
                                            K.epsilon(),
                                            None))
    return 100. * K.mean(diff, axis=-1)
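To see why the clipping matters, here is a NumPy sketch on hypothetical data where one true value is exactly zero; without the clip, the division would produce an infinity or NaN:

```python
import numpy as np

def mape_np(y_true, y_pred, eps=1e-7):
    # Clip |y_true| to at least eps so division by zero never occurs
    diff = np.abs((y_true - y_pred) / np.clip(np.abs(y_true), eps, None))
    return 100.0 * np.mean(diff, axis=-1)

y_true = np.array([[100.0, 0.0]])   # second target is exactly zero
y_pred = np.array([[110.0, 1.0]])
print(mape_np(y_true, y_pred))      # finite (though huge) instead of inf/NaN
```

The price of the clip is that zero-valued targets dominate the loss, which is why MAPE is usually avoided when targets can be near zero.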
hinge: hinge loss
$$hinge(Y_{true}, Y_{predict})=\frac{1}{N} \sum_{i=1}^{N}\max(1-y_{true}^i\,y_{predict}^i,\; 0)$$
def hinge(y_true, y_pred):
    # Labels are expected to be -1 or +1
    return K.mean(K.maximum(1. - y_true * y_pred, 0.), axis=-1)
squared_hinge
$$squared\_hinge(Y_{true}, Y_{predict})=\frac{1}{N} \sum_{i=1}^{N}\max(1-y_{true}^i\,y_{predict}^i,\; 0)^2$$
def squared_hinge(y_true, y_pred):
    # Square of the hinge term; penalizes margin violations more strongly
    return K.mean(K.square(K.maximum(1. - y_true * y_pred, 0.)), axis=-1)
categorical_hinge: the multi-class hinge loss. With a one-hot $y_{true}$, it compares the score of the true class against the highest score among the remaining classes, requiring a margin of at least 1: $\max(0,\; \max_{j \neq true}\hat{y}_j - \hat{y}_{true} + 1)$.
def categorical_hinge(y_true, y_pred):
    # Score of the true class (y_true is one-hot)
    pos = K.sum(y_true * y_pred, axis=-1)
    # Highest score among the other classes
    neg = K.max((1. - y_true) * y_pred, axis=-1)
    return K.maximum(0., neg - pos + 1.)
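On a concrete (made-up) example, a NumPy sketch of the same computation:

```python
import numpy as np

def categorical_hinge_np(y_true, y_pred):
    # Score assigned to the true class (y_true is one-hot)
    pos = np.sum(y_true * y_pred, axis=-1)
    # Highest score among the other classes
    neg = np.max((1.0 - y_true) * y_pred, axis=-1)
    return np.maximum(0.0, neg - pos + 1.0)

y_true = np.array([[0., 1., 0.]])        # true class is index 1
y_pred = np.array([[0.1, 0.7, 0.2]])     # model favors the true class
print(categorical_hinge_np(y_true, y_pred))  # pos=0.7, neg=0.2 → loss 0.5
```

The loss is zero only once the true-class score beats every other score by at least 1.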
categorical_crossentropy: multi-class cross-entropy
The multi-class cross-entropy loss targets multi-class classification. The ground truth is one-hot encoded; for example, with 3 classes in total, class 0 is encoded as (1, 0, 0). Suppose there are $n$ classes; the ground truth for the $i$-th sample is $y^i=(y_1^i, y_2^i, \cdots, y_n^i)$ and the prediction is $\hat{y}^i=(\hat{y}_1^i, \hat{y}_2^i, \cdots, \hat{y}_n^i)$. Then
$$categorical\_crossentropy(Y_{true}, Y_{predict})=-\frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{n}y^i_j \log \hat{y}^{i}_{j}$$
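A NumPy sketch of the formula (clipping the predictions, as Keras does internally, so the log stays finite):

```python
import numpy as np

def categorical_crossentropy_np(y_true, y_pred, eps=1e-7):
    # Clip predictions away from 0 so log(0) never occurs
    y_pred = np.clip(y_pred, eps, 1.0)
    # Per-sample cross-entropy: -sum_j y_j * log(y_hat_j)
    return -np.sum(y_true * np.log(y_pred), axis=-1)

y_true = np.array([[1., 0., 0.]])        # one-hot: class 0
y_pred = np.array([[0.8, 0.1, 0.1]])
print(categorical_crossentropy_np(y_true, y_pred))  # -log(0.8) ≈ 0.223
```

Since $y^i$ is one-hot, only the log-probability of the true class contributes to each sample's loss.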
sparse_categorical_crossentropy: sparse categorical cross-entropy
The principle is the same as categorical_crossentropy, except that the ground truth uses integer encoding: for example, class 0 is represented by the integer 0, and class 3 by the integer 3.
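A NumPy sketch showing that integer labels select the same log-probabilities as the corresponding one-hot vectors would:

```python
import numpy as np

def sparse_categorical_crossentropy_np(labels, y_pred, eps=1e-7):
    # labels holds integer class indices instead of one-hot vectors
    y_pred = np.clip(y_pred, eps, 1.0)
    # Pick out the predicted probability of the true class for each sample
    rows = np.arange(len(labels))
    return -np.log(y_pred[rows, labels])

y_pred = np.array([[0.8, 0.1, 0.1],
                   [0.2, 0.3, 0.5]])
labels = np.array([0, 2])  # same as one-hot (1,0,0) and (0,0,1)
print(sparse_categorical_crossentropy_np(labels, y_pred))
# [-log(0.8), -log(0.5)] ≈ [0.223, 0.693]
```

The integer form avoids materializing large one-hot matrices when the number of classes is big.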
binary_crossentropy: binary cross-entropy
$$binary\_crossentropy(Y_{true}, Y_{predict})=-\frac{1}{N}\sum_{i=1}^{N}\left[y^i_{true} \log (y^i_{pred}) + (1-y_{true}^i)\log (1-y^i_{pred})\right]$$
When $y_{pred}^i$ and $y_{true}^i$ agree (both 0 or both 1), the per-sample cross-entropy is 0; as the prediction moves toward the opposite label, the loss grows without bound.
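A NumPy sketch of the formula on made-up data; both ends of the prediction range are clipped, since the formula takes logs of both $y_{pred}$ and $1-y_{pred}$:

```python
import numpy as np

def binary_crossentropy_np(y_true, y_pred, eps=1e-7):
    # Clip both ends so neither log(y_pred) nor log(1 - y_pred) blows up
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return -np.mean(y_true * np.log(y_pred)
                    + (1.0 - y_true) * np.log(1.0 - y_pred))

y_true = np.array([1., 0., 1.])
y_pred = np.array([0.9, 0.1, 0.8])
print(binary_crossentropy_np(y_true, y_pred))  # ≈ 0.145
```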
That's all for now; more to come later.