Focal loss is designed to address the severe positive/negative sample imbalance in one-stage object detection. It down-weights the contribution of the large number of easy negatives during training, and can also be viewed as a form of hard example mining.
1. The focusing factor gamma > 0 scales down the loss of easily classified samples, so training concentrates on the hard, misclassified examples (as the short check after this list illustrates).
2. The balancing factor alpha corrects the inherent positive/negative imbalance. The paper uses alpha = 0.25, i.e., positives are weighted less than negatives, because the negatives are the easy examples.
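A quick numeric check of the modulating factor (a minimal sketch with gamma = 2; the p_t values are illustrative):

gamma = 2.
for p_t in (0.9, 0.6, 0.1):            # easy -> hard examples
    scale = (1 - p_t) ** gamma         # factor applied to the CE term -log(p_t)
    print(p_t, scale)                  # 0.9 -> 0.01, 0.6 -> 0.16, 0.1 -> 0.81

An easy example at p_t = 0.9 is thus down-weighted by a factor of 100 relative to plain cross entropy, while a hard one at p_t = 0.1 is barely touched.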
The authors argue that the accuracy gap between one-stage and two-stage detectors is caused mainly by the extreme foreground/background class imbalance, and they design a simple dense detector, RetinaNet, that trains with this loss to reach state-of-the-art accuracy while preserving one-stage speed. In two-stage methods, the proposal stage filters out most negatives via objectness scores and NMS, and the subsequent classification/regression stage then fixes the positive-to-negative sampling ratio, or applies OHEM (online hard example mining), to keep foreground and background roughly balanced. A one-stage detector, by contrast, must evaluate on the order of 100k candidate locations; even with similar sampling heuristics, training is still dominated by the mass of easy negatives.
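To make the imbalance concrete, here is a caricature of the one-stage setting with 100k easy background anchors and 100 hard foreground ones (the counts and probabilities are assumptions for illustration, loosely following the paper's ~100k figure):

import numpy as np

n_neg, n_pos = 100000, 100
ce_neg = -np.log(0.99)                     # easy negative, p_t = 0.99
ce_pos = -np.log(0.5)                      # hard positive, p_t = 0.5
print(n_neg * ce_neg, n_pos * ce_pos)      # plain CE: ~1005 vs ~69, negatives dominate
fl_neg = 0.75 * (1 - 0.99) ** 2 * ce_neg   # alpha_t = 1 - 0.25 for negatives
fl_pos = 0.25 * (1 - 0.5) ** 2 * ce_pos    # alpha_t = 0.25 for positives
print(n_neg * fl_neg, n_pos * fl_pos)      # focal loss: ~0.08 vs ~4.3, hard positives dominate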
import numpy as np
import tensorflow as tf
import keras
from keras import backend as K

# Note: these loss functions are methods of a model-wrapper class, hence the `self` parameter.
def binary_focal_loss(self, gamma=2., alpha=.25):
    """
    Binary form of focal loss.
      FL(p_t) = -alpha * (1 - p_t)**gamma * log(p_t)
    where p = sigmoid(x), and p_t = p or 1 - p depending on whether the label is 1 or 0, respectively.
    References:
        https://arxiv.org/pdf/1708.02002.pdf
    Usage:
        model.compile(loss=[binary_focal_loss(alpha=.25, gamma=2)], metrics=["accuracy"], optimizer=adam)
    """
    def binary_focal_loss_fixed(y_true, y_pred):
        """
        :param y_true: A tensor of the same shape as `y_pred`
        :param y_pred: A tensor resulting from a sigmoid
        :return: Output tensor.
        """
        y_true = tf.cast(y_true, tf.float32)
        # Clip the prediction away from 0 and 1 so that log() and
        # back-propagation do not produce NaN/Inf
        epsilon = K.epsilon()
        y_pred = K.clip(y_pred, epsilon, 1.0 - epsilon)
        # p_t = p for positives, 1 - p for negatives
        p_t = tf.where(K.equal(y_true, 1), y_pred, 1 - y_pred)
        # alpha_t = alpha for positives, 1 - alpha for negatives
        alpha_factor = K.ones_like(y_true) * alpha
        alpha_t = tf.where(K.equal(y_true, 1), alpha_factor, 1 - alpha_factor)
        # Cross entropy: -log(p_t)
        cross_entropy = -K.log(p_t)
        # Modulating factor (1 - p_t)^gamma, scaled by alpha_t
        weight = alpha_t * K.pow((1 - p_t), gamma)
        # Focal loss: sum over outputs, then average over the mini-batch
        loss = weight * cross_entropy
        loss = K.mean(K.sum(loss, axis=-1))
        return loss
    return binary_focal_loss_fixed
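As a sanity check, the same quantity can be reproduced by hand in NumPy for a toy batch (a sketch; the batch values are made up):

import numpy as np

y_true = np.array([[1.], [0.], [1.]])
y_pred = np.array([[0.9], [0.1], [0.4]])            # two easy examples, one hard positive
p_t = np.where(y_true == 1, y_pred, 1 - y_pred)
alpha_t = np.where(y_true == 1, 0.25, 0.75)
fl = alpha_t * (1 - p_t) ** 2 * (-np.log(p_t))
print(fl.ravel())                                   # ~[0.0003, 0.0008, 0.0825]
print(fl.sum(axis=-1).mean())                       # ~0.0278, matching binary_focal_loss_fixed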
def categorical_focal_loss(self, alpha=[[.25, .25, .25]], gamma=2.):
    """
    Softmax version of focal loss.
    When there is a skew between the different categories/labels in your data set,
    you can try to apply this function as a loss.
           m
      FL = ∑  -alpha * (1 - p_o,c)^gamma * y_o,c * log(p_o,c)
          c=1
    where m = number of classes, c = class and o = observation
    Parameters:
        alpha -- the same as the weighting factor in balanced cross entropy. Alpha is used to
                 specify the weight of the different categories/labels; the size of the array
                 needs to be consistent with the number of classes.
        gamma -- focusing parameter for the modulating factor (1 - p)
    Default values:
        gamma -- 2.0 as mentioned in the paper
        alpha -- 0.25 as mentioned in the paper
    References:
        Official paper: https://arxiv.org/pdf/1708.02002.pdf
        https://www.tensorflow.org/api_docs/python/tf/keras/backend/categorical_crossentropy
    Usage:
        model.compile(loss=[categorical_focal_loss(alpha=[[.25, .25, .25]], gamma=2)], metrics=["accuracy"], optimizer=adam)
    """
    alpha = np.array(alpha, dtype=np.float32)
    def categorical_focal_loss_fixed(y_true, y_pred):
        """
        :param y_true: A tensor of the same shape as `y_pred`
        :param y_pred: A tensor resulting from a softmax
        :return: Output tensor.
        """
        # Clip the prediction value to prevent NaN's and Inf's
        epsilon = K.epsilon()
        y_pred = K.clip(y_pred, epsilon, 1. - epsilon)
        # Cross entropy: zero everywhere except at the true class, where it is -log(p)
        cross_entropy = -y_true * K.log(y_pred)
        # Focal loss: per-class alpha times the modulating factor (1 - p)^gamma
        loss = alpha * K.pow(1 - y_pred, gamma) * cross_entropy
        # Sum over classes, then average over the mini-batch
        return K.mean(K.sum(loss, axis=-1))
    return categorical_focal_loss_fixed
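The categorical version can be checked the same way on a one-hot toy batch (a sketch; the numbers are illustrative only):

import numpy as np

alpha = np.array([[.25, .25, .25]], dtype=np.float32)
y_true = np.array([[0., 1., 0.], [1., 0., 0.]])
y_pred = np.array([[0.1, 0.8, 0.1], [0.3, 0.4, 0.3]])  # one confident row, one uncertain row
ce = -y_true * np.log(y_pred)
fl = alpha * (1 - y_pred) ** 2 * ce
print(fl.sum(axis=-1))                                 # ~[0.0022, 0.1475]
print(fl.sum(axis=-1).mean())                          # ~0.0749, matching categorical_focal_loss_fixed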
def model_compile(self, model):
    adam = keras.optimizers.Adam(lr=self.lr)
    if self.classNumber == 2:
        model.compile(loss=[self.categorical_focal_loss(alpha=[[.25, .25]], gamma=2)],
                      metrics=["accuracy"], optimizer=adam)
    elif self.classNumber == 4:
        model.compile(loss=[self.categorical_focal_loss(alpha=[[.25, .25, .25, .25]], gamma=2)],
                      metrics=["accuracy"], optimizer=adam)
    elif self.classNumber == 6:
        model.compile(loss=[self.categorical_focal_loss(alpha=[[.25, .25, .25, .25, .25, .25]], gamma=2)],
                      metrics=["accuracy"], optimizer=adam)
Reference: https://www.cnblogs.com/king-lps/p/9497836.html