The Most Detailed Walkthrough of the SSD Keras Source Code: SSDLoss Explained
Walkthrough of the loss function in keras_ssd_loss.py
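Going by the paper, the loss function is not hard to understand. The implementation is more involved, though, because the tensors have several dimensions and the data has to be brought into a uniform layout. Let's walk through it, starting with the smooth_L1_loss method: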
def smooth_L1_loss(self, y_true, y_pred):
    '''
    Compute smooth L1 loss, see references.

    Arguments:
        y_true (nD tensor): A TensorFlow tensor of any shape containing the ground truth data.
            In this context, the expected tensor has shape `(batch_size, #boxes, 4)` and
            contains the ground truth bounding box coordinates, where the last dimension
            contains `(xmin, xmax, ymin, ymax)`.
        y_pred (nD tensor): A TensorFlow tensor of identical structure to `y_true` containing
            the predicted data, in this context the predicted bounding box coordinates.

    Returns:
        The smooth L1 loss, a nD-1 Tensorflow tensor. In this context a 2D tensor
        of shape (batch, n_boxes_total).

    References:
        https://arxiv.org/abs/1504.08083
    '''
    # Absolute error |x|
    absolute_loss = tf.abs(y_true - y_pred)
    # Squared error term 0.5 * x^2
    square_loss = 0.5 * (y_true - y_pred)**2
    # Where absolute_loss < 1 take square_loss, elsewhere take absolute_loss - 0.5.
    # This is exactly the smooth L1 formula.
    l1_loss = tf.where(tf.less(absolute_loss, 1.0), square_loss, absolute_loss - 0.5)
    # Sum over the last axis, i.e. over the 4 box coordinates
    return tf.reduce_sum(l1_loss, axis=-1)
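This is the smooth L1 loss from the paper: smooth_L1(x) = 0.5 * x^2 if |x| < 1, and |x| - 0.5 otherwise. Once you are clear on the few TensorFlow functions involved (tf.abs, tf.less, tf.where, tf.reduce_sum), it is easy to follow.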
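To make the piecewise formula concrete, here is a minimal pure-Python sketch of the same computation for a single box. The function name `smooth_l1` and the sample numbers are made up for illustration; the real method works on whole TensorFlow tensors at once.

```python
def smooth_l1(y_true, y_pred):
    """Smooth L1 loss for one box, summed over its coordinates (illustrative helper)."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        x = abs(t - p)
        # Quadratic branch for small errors, linear branch for large ones
        total += 0.5 * x * x if x < 1.0 else x - 0.5
    return total


# One coordinate is off by 0.5: only the quadratic branch is used.
loss_small = smooth_l1([0.0, 0.0, 0.0, 0.0], [0.5, 0.0, 0.0, 0.0])
# One coordinate off by 0.5 and one off by 2.0: both branches are used.
loss_mixed = smooth_l1([0.0, 0.0, 0.0, 0.0], [0.5, 2.0, 0.0, 0.0])
print(loss_small, loss_mixed)  # 0.125 1.625
```

The large error of 2.0 contributes 1.5 rather than the 2.0 a pure squared term would give, which is why this loss is less sensitive to outliers.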
Next is the cross-entropy loss, which is also easy to understand:
# Cross-entropy
def log_loss(self, y_true, y_pred):
    '''
    Compute the softmax log loss.

    Arguments:
        y_true (nD tensor): A TensorFlow tensor of any shape containing the ground truth data.
            In this context, the expected tensor has shape (batch_size, #boxes, #classes)
            and contains the ground truth bounding box categories.
        y_pred (nD tensor): A TensorFlow tensor of identical structure to `y_true` containing
            the predicted data, in this context the predicted bounding box categories.

    Returns:
        The softmax log loss, a nD-1 Tensorflow tensor. In this context a 2D tensor
        of shape (batch, n_boxes_total).
    '''
    # Make sure that `y_pred` doesn't contain any zeros (which would break the log function)
    y_pred = tf.maximum(y_pred, 1e-15)
    # Compute the log loss
    log_loss = -tf.reduce_sum(y_true * tf.log(y_pred), axis=-1)
    return log_loss
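The following pure-Python sketch mirrors the cross-entropy above for a single box, including the clipping at 1e-15. The helper name and the sample probabilities are assumptions for illustration only.

```python
import math


def log_loss(y_true, y_pred, eps=1e-15):
    """Cross-entropy for one box: y_true is one-hot, y_pred holds softmax probabilities."""
    # Clip each probability from below so that log(0) can never occur
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))


confident = log_loss([0, 1, 0], [0.1, 0.8, 0.1])  # equals -log(0.8)
clipped = log_loss([0, 1, 0], [1.0, 0.0, 0.0])    # equals -log(1e-15): large but finite
ignored = log_loss([0, 0, 0], [0.2, 0.5, 0.3])    # all-zero class vector gives zero loss
```

The last case shows why a box with an all-zero class vector is ignored by the classification loss: every term of the sum is multiplied by zero.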
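The hardest part is the total loss. I have commented most of it. It uses quite a few TensorFlow functions, and once those are clear the rest is not hard to follow: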
# Total loss
def compute_loss(self, y_true, y_pred):
    '''
    Compute the loss of the SSD model prediction against the ground truth.

    Arguments:
        # Ground truth, shape (batch_size, #boxes, #classes + 12). Note that the classes are
        # already one-hot encoded, and that the last 8 entries are not used by this method;
        # they exist only so that the shape matches the predictions.
        y_true (array): A Numpy array of shape `(batch_size, #boxes, #classes + 12)`,
            where `#boxes` is the total number of boxes that the model predicts
            per image. Be careful to make sure that the index of each given
            box in `y_true` is the same as the index for the corresponding
            box in `y_pred`. The last axis must have length `#classes + 12` and contain
            `[classes one-hot encoded, 4 ground truth box coordinate offsets, 8 arbitrary entries]`
            in this order, including the background class. The last eight entries of the
            last axis are not used by this function and therefore their contents are
            irrelevant, they only exist so that `y_true` has the same shape as `y_pred`,
            where the last four entries of the last axis contain the anchor box
            coordinates, which are needed during inference. Important: Boxes that
            you want the cost function to ignore need to have a one-hot
            class vector of all zeros.
        # Predictions
        y_pred (Keras tensor): The model prediction. The shape is identical
            to that of `y_true`, i.e. `(batch_size, #boxes, #classes + 12)`.
            The last axis must contain entries in the format
            `[classes one-hot encoded, 4 predicted box coordinate offsets, 8 arbitrary entries]`.

    Returns:
        A scalar, the total multitask loss for classification and localization.
    '''
    self.neg_pos_ratio = tf.constant(self.neg_pos_ratio)
    self.n_neg_min = tf.constant(self.n_neg_min)
    self.alpha = tf.constant(self.alpha)
    # Batch size
    batch_size = tf.shape(y_pred)[0] # Output dtype: tf.int32
    # Number of predicted boxes
    n_boxes = tf.shape(y_pred)[1] # Output dtype: tf.int32, note that `n_boxes` in this context denotes the total number of boxes per image, not the number of boxes per cell.

    # 1: Compute the losses for class and box predictions for every box.
    # Classification and regression losses (here 21 classes and 4 box coordinates)
    classification_loss = tf.to_float(self.log_loss(y_true[:,:,:-12], y_pred[:,:,:-12])) # Output shape: (batch_size, n_boxes)
    localization_loss = tf.to_float(self.smooth_L1_loss(y_true[:,:,-12:-8], y_pred[:,:,-12:-8])) # Output shape: (batch_size, n_boxes)

    # 2: Compute the classification losses for the positive and negative targets.
    # Create masks for the positive and negative ground truth classes.
    # Index 0 is the background class; boxes with a 1 there are the negatives.
    negatives = y_true[:,:,0] # Tensor of shape (batch_size, n_boxes)
    # Find the positives. The classes are one-hot encoded, so the max over indices 1 onward
    # (i.e. skipping the background) is 1 for a positive box and 0 otherwise.
    positives = tf.to_float(tf.reduce_max(y_true[:,:,1:-12], axis=-1)) # Tensor of shape (batch_size, n_boxes)
    # Count the number of positive boxes (classes 1 to n) in y_true across the whole batch.
    # Because `positives` only holds 0s and 1s, summing it gives the count.
    n_positive = tf.reduce_sum(positives)
    # Now mask all negative boxes and sum up the losses for the positive boxes per batch item
    # (Keras loss functions must output one scalar loss value per batch item, rather than just
    # one scalar for the entire batch, that's why we're not summing across all axes).
    # Boxes whose mask value is 0 contribute nothing to the sum.
    pos_class_loss = tf.reduce_sum(classification_loss * positives, axis=-1) # Tensor of shape (batch_size,)
    # Compute the classification loss for the negative default boxes (if there are any).
    # First, compute the classification loss for all negative boxes.
    neg_class_loss_all = classification_loss * negatives # Tensor of shape (batch_size, n_boxes)
    # Number of negative boxes with a non-zero loss
    n_neg_losses = tf.count_nonzero(neg_class_loss_all, dtype=tf.int32) # The number of non-zero loss entries in `neg_class_loss_all`
    # What's the point of `n_neg_losses`? For the next step, which will be to compute which negative boxes enter the classification
    # loss, we don't just want to know how many negative ground truth boxes there are, but for how many of those there actually is
    # a positive (i.e. non-zero) loss. This is necessary because `tf.nn.top_k()` in the function below will pick the top k boxes with
    # the highest losses no matter what, even if it receives a vector where all losses are zero. In the unlikely event that all negative
    # classification losses are actually zero though, this behavior might lead to `tf.nn.top_k()` returning the indices of positive
    # boxes, leading to an incorrect negative classification loss computation, and hence an incorrect overall loss computation.
    # We therefore need to make sure that `n_negative_keep`, which assumes the role of the `k` argument in `tf.nn.top_k()`,
    # is at most the number of negative boxes for which there is a positive classification loss.
    # Compute the number of negative examples we want to account for in the loss.
    # We'll keep at most `self.neg_pos_ratio` times the number of positives in `y_true`, but at least `self.n_neg_min` (unless `n_neg_losses` is smaller).
    # With a ratio of 3, at most 3 times as many negatives as positives are kept.
    n_negative_keep = tf.minimum(tf.maximum(self.neg_pos_ratio * tf.to_int32(n_positive), self.n_neg_min), n_neg_losses)
    # In the unlikely case when either (1) there are no negative ground truth boxes at all
    # or (2) the classification loss for all negative boxes is zero, return zero as the `neg_class_loss`.
    def f1():
        # Return all zeros, shape (batch_size,)
        return tf.zeros([batch_size])
    # Otherwise compute the negative loss.
    def f2():
        # Return the negative classification loss, shape (batch_size,)
        # Now we'll identify the top-k (where k == `n_negative_keep`) boxes with the highest confidence loss that
        # belong to the background class in the ground truth data. Note that this doesn't necessarily mean that the model
        # predicted the wrong class for those boxes, it just means that the loss for those boxes is the highest.
        # To do this, we reshape `neg_class_loss_all` to 1D...
        neg_class_loss_all_1D = tf.reshape(neg_class_loss_all, [-1]) # Tensor of shape (batch_size * n_boxes,)
        # ...and then we get the indices for the `n_negative_keep` boxes with the highest loss out of those...
        # (top_k returns the k largest losses together with their indices)
        values, indices = tf.nn.top_k(neg_class_loss_all_1D,
                                      k=n_negative_keep,
                                      sorted=False) # We don't need them sorted.
        # ...and with these indices we'll create a mask...
        # (same shape as the flattened negative losses: 1 at the selected positions, 0 elsewhere)
        negatives_keep = tf.scatter_nd(indices=tf.expand_dims(indices, axis=1),
                                       updates=tf.ones_like(indices, dtype=tf.int32),
                                       shape=tf.shape(neg_class_loss_all_1D)) # Tensor of shape (batch_size * n_boxes,)
        # Reshape the mask back to (batch_size, n_boxes)
        negatives_keep = tf.to_float(tf.reshape(negatives_keep, [batch_size, n_boxes])) # Tensor of shape (batch_size, n_boxes)
        # ...and use it to keep only those boxes and mask all other classification losses
        # (selected negatives keep their loss, everything else becomes 0, then we sum over all boxes)
        neg_class_loss = tf.reduce_sum(classification_loss * negatives_keep, axis=-1) # Tensor of shape (batch_size,)
        return neg_class_loss

    # Pick the branch based on the number of non-zero negative losses:
    # f1 if it is 0, otherwise f2. The result has shape (batch_size,)
    neg_class_loss = tf.cond(tf.equal(n_neg_losses, tf.constant(0)), f1, f2)
    # Add the positive and negative classification losses
    class_loss = pos_class_loss + neg_class_loss # Tensor of shape (batch_size,)
    # 3: Compute the localization loss for the positive targets.
    # We don't compute a localization loss for negative predicted boxes (obviously: there are no ground truth boxes they would correspond to).
    # The regression loss is only needed for the positives.
    loc_loss = tf.reduce_sum(localization_loss * positives, axis=-1) # Tensor of shape (batch_size,)

    # 4: Compute the total loss.
    # Total of classification and regression loss; the denominator is clamped to at least 1.
    total_loss = (class_loss + self.alpha * loc_loss) / tf.maximum(1.0, n_positive) # In case `n_positive == 0`
    # Keras has the annoying habit of dividing the loss by the batch size, which sucks in our case
    # because the relevant criterion to average our loss over is the number of positive boxes in the batch
    # (by which we're dividing in the line above), not the batch size. So in order to revert Keras' averaging
    # over the batch size, we'll have to multiply by it.
    # In principle dividing by N should be enough, but since Keras averages over the batch,
    # the author undoes that by multiplying by the batch size.
    # Personally I think multiplying or not makes little difference to the optimization.
    total_loss = total_loss * tf.to_float(batch_size)
    return total_loss
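The negative-selection step (often called hard negative mining) is the trickiest part of compute_loss, so here is a minimal pure-Python sketch of it on a flat list of losses. The helper `hard_negative_mask` and the sample values are hypothetical; `sorted` stands in for the role `tf.nn.top_k` plays in the original, and the default ratio of 3 is the value used in this article.

```python
def hard_negative_mask(neg_losses, n_positive, neg_pos_ratio=3, n_neg_min=0):
    """Return a 0/1 mask selecting the highest-loss negatives (illustrative helper).

    neg_losses: classification loss per box, already zeroed for non-negative boxes.
    """
    # Only negatives with a non-zero loss may be selected
    n_neg_losses = sum(1 for v in neg_losses if v != 0)
    # At most ratio * positives, at least n_neg_min, never more than n_neg_losses
    k = min(max(neg_pos_ratio * n_positive, n_neg_min), n_neg_losses)
    # Indices of the k largest losses
    order = sorted(range(len(neg_losses)), key=lambda i: neg_losses[i], reverse=True)
    mask = [0] * len(neg_losses)
    for i in order[:k]:
        mask[i] = 1
    return mask


# Box 0 is a positive (its entry is masked to 0), boxes 1 to 5 are negatives.
neg_losses = [0.0, 0.2, 1.5, 0.05, 0.9, 0.7]
mask = hard_negative_mask(neg_losses, n_positive=1)
print(mask)  # [0, 0, 1, 0, 1, 1]
```

With one positive and a ratio of 3, only the three negatives with the largest loss enter the classification loss; the easy negatives (0.2 and 0.05) are dropped.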
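In the paper the loss looks like a single formula, but the implementation is not that simple. You have to take care of the data layout: the classes must be one-hot encoded, the number of negatives is capped at 3 times the number of positives, and the regression loss is computed only for positive boxes. Also note that the first class entry is the background class, which marks a box as a negative. There is a small trick as well: with one-hot encoding, counting the positives only requires summing the encoded values. Finally, the author multiplies by the batch size, explaining that Keras averages the loss over the batch internally. I think scaling the loss by a constant makes little difference (with a fixed batch size it only rescales the gradients), since what matters is that the loss goes down, but the author's approach is fine too.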
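A small worked example may help with the batch-size multiplication. It assumes, as the comment in the source code says, that Keras averages the per-image loss values over the batch; all numbers below are made up.

```python
# Hypothetical per-image sums of (class_loss + alpha * loc_loss) for a batch of 4
per_image = [2.0, 6.0, 0.0, 4.0]
n_positive = 6                      # positives counted across the whole batch
batch_size = len(per_image)

# What compute_loss returns: one value per image, divided by n_positive
# and multiplied by batch_size
returned = [v / max(1.0, n_positive) * batch_size for v in per_image]

# The averaging over the batch that the source comment attributes to Keras
keras_mean = sum(returned) / batch_size

# The normalization from the paper: total loss divided by the number of positives
paper_loss = sum(per_image) / n_positive
print(keras_mean, paper_loss)
```

Both values agree (up to floating-point rounding), so after the averaging step the multiplication leaves exactly the loss normalized by the number of positives.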
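That's it for today. I hope this helps with understanding the code. Experts, please go easy on me: these are only my own study notes and my knowledge is limited, so please bear with any mistakes.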