The Most Detailed Walkthrough of the SSD Keras Source Code: SSDLoss Explained

Parsing the loss function: keras_ssd_loss.py

Looking at the paper, the loss function itself is not hard to understand, but the actual implementation is somewhat involved: the tensors have quite a few dimensions and everything has to be brought into a consistent format. Let's walk through it; the overall loss from the paper is recalled right below for orientation.
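The SSD paper defines the total objective as a weighted sum of the confidence (classification) loss and the localization loss, averaged over the number N of matched (positive) default boxes:

    L(x, c, l, g) = (1 / N) * (L_conf(x, c) + alpha * L_loc(x, l, g))

L_loc is the smooth L1 loss over the positive boxes, L_conf is the softmax cross-entropy, and the loss is set to 0 when N = 0. The compute_loss method further down follows this formula. First up is the smooth_L1_loss method: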

    def smooth_L1_loss(self, y_true, y_pred):
        '''
        Compute smooth L1 loss, see references.

        Arguments:
            y_true (nD tensor): A TensorFlow tensor of any shape containing the ground truth data.
                In this context, the expected tensor has shape `(batch_size, #boxes, 4)` and
                contains the ground truth bounding box coordinates, where the last dimension
                contains `(xmin, xmax, ymin, ymax)`.
            y_pred (nD tensor): A TensorFlow tensor of identical structure to `y_true` containing
                the predicted data, in this context the predicted bounding box coordinates.

        Returns:
            The smooth L1 loss, a nD-1 Tensorflow tensor. In this context a 2D tensor
            of shape (batch, n_boxes_total).

        References:
            https://arxiv.org/abs/1504.08083
        '''
        # Absolute error |x|
        absolute_loss = tf.abs(y_true - y_pred)
        # Squared error 0.5 * x^2
        square_loss = 0.5 * (y_true - y_pred)**2
        # Where absolute_loss < 1 use square_loss, otherwise use absolute_loss - 0.5: this is exactly the smooth L1 formula
        l1_loss = tf.where(tf.less(absolute_loss, 1.0), square_loss, absolute_loss - 0.5)
        return tf.reduce_sum(l1_loss, axis=-1)

This is the smooth L1 loss from the paper; once the TensorFlow functions involved are clear, it is straightforward to follow.
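As a quick sanity check, here is a minimal NumPy sketch of the same piecewise definition (not part of the original file; the helper name smooth_l1 is made up):

    import numpy as np

    def smooth_l1(x):
        # 0.5 * x^2 where |x| < 1, |x| - 0.5 otherwise
        abs_x = np.abs(x)
        return np.where(abs_x < 1.0, 0.5 * x ** 2, abs_x - 0.5)

    # smooth_l1(0.5) == 0.125, smooth_l1(2.0) == 1.5
    print(smooth_l1(np.array([0.5, 2.0])))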
Next is the cross-entropy, which is also easy to understand:

    # Cross-entropy (softmax log loss)
    def log_loss(self, y_true, y_pred):
        '''
        Compute the softmax log loss.

        Arguments:
            y_true (nD tensor): A TensorFlow tensor of any shape containing the ground truth data.
                In this context, the expected tensor has shape (batch_size, #boxes, #classes)
                and contains the ground truth bounding box categories.
            y_pred (nD tensor): A TensorFlow tensor of identical structure to `y_true` containing
                the predicted data, in this context the predicted bounding box categories.

        Returns:
            The softmax log loss, a nD-1 Tensorflow tensor. In this context a 2D tensor
            of shape (batch, n_boxes_total).
        '''
        # Make sure that `y_pred` doesn't contain any zeros (which would break the log function)
        y_pred = tf.maximum(y_pred, 1e-15)
        # Compute the log loss
        log_loss = -tf.reduce_sum(y_true * tf.log(y_pred), axis=-1)
        return log_loss
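
For intuition, here is a tiny NumPy sketch (the numbers are made up): with a one-hot y_true only the term of the true class survives, so the result is simply -log of the predicted probability for that class.

    import numpy as np

    y_true = np.array([[0., 1., 0.]])     # one-hot ground truth, true class is 1
    y_pred = np.array([[0.2, 0.7, 0.1]])  # softmax probabilities from the model
    y_pred = np.maximum(y_pred, 1e-15)    # same clipping as in log_loss above
    loss = -np.sum(y_true * np.log(y_pred), axis=-1)
    print(loss)  # [0.35667494] == -log(0.7)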

The harder part is the total loss, but I have commented essentially all of it. It uses quite a few TensorFlow functions, and once those are clear the logic is not complicated:

    # Total loss
    def compute_loss(self, y_true, y_pred):
        '''
        Compute the loss of the SSD model prediction against the ground truth.

        Arguments:
            # Ground truth, shape (batch_size, #boxes, #classes + 12). Note that the class labels are already
            # one-hot encoded, and the last 8 entries are not used by this method; they only exist so that
            # y_true has the same shape as y_pred.

            y_true (array): A Numpy array of shape `(batch_size, #boxes, #classes + 12)`,
                where `#boxes` is the total number of boxes that the model predicts
                per image. Be careful to make sure that the index of each given
                box in `y_true` is the same as the index for the corresponding
                box in `y_pred`. The last axis must have length `#classes + 12` and contain
                `[classes one-hot encoded, 4 ground truth box coordinate offsets, 8 arbitrary entries]`
                in this order, including the background class. The last eight entries of the
                last axis are not used by this function and therefore their contents are
                irrelevant, they only exist so that `y_true` has the same shape as `y_pred`,
                where the last four entries of the last axis contain the anchor box
                coordinates, which are needed during inference. Important: Boxes that
                you want the cost function to ignore need to have a one-hot
                class vector of all zeros.
            # Predicted values

            y_pred (Keras tensor): The model prediction. The shape is identical
                to that of `y_true`, i.e. `(batch_size, #boxes, #classes + 12)`.
                The last axis must contain entries in the format
                `[classes one-hot encoded, 4 predicted box coordinate offsets, 8 arbitrary entries]`.

        Returns:
            A scalar, the total multitask loss for classification and localization.
        '''
        self.neg_pos_ratio = tf.constant(self.neg_pos_ratio)
        self.n_neg_min = tf.constant(self.n_neg_min)
        self.alpha = tf.constant(self.alpha)

        # Batch size
        batch_size = tf.shape(y_pred)[0] # Output dtype: tf.int32
        # Number of boxes the model predicts per image
        n_boxes = tf.shape(y_pred)[1] # Output dtype: tf.int32, note that `n_boxes` in this context denotes the total number of boxes per image, not the number of boxes per cell.

        # 1: Compute the losses for class and box predictions for every box.
        # Compute the classification and localization errors: 21 class entries and 4 coordinate offsets
        classification_loss = tf.to_float(self.log_loss(y_true[:,:,:-12], y_pred[:,:,:-12])) # Output shape: (batch_size, n_boxes)
        localization_loss = tf.to_float(self.smooth_L1_loss(y_true[:,:,-12:-8], y_pred[:,:,-12:-8])) # Output shape: (batch_size, n_boxes)

        # 2: Compute the classification losses for the positive and negative targets.
        # Compute the classification loss separately for positives and negatives
        # Create masks for the positive and negative ground truth classes.
        # Index 0 is the background class; boxes whose background confidence is 1 are the negatives
        negatives = y_true[:,:,0] # Tensor of shape (batch_size, n_boxes)
        # Find the positives, i.e. boxes whose class confidence is 1; since the labels are one-hot, we start at index 1, excluding the background class
        positives = tf.to_float(tf.reduce_max(y_true[:,:,1:-12], axis=-1)) # Tensor of shape (batch_size, n_boxes)

        # Count the number of positive boxes (classes 1 to n) in y_true across the whole batch.
        # Since the labels are one-hot, summing the entries of `positives` directly gives the count.
        n_positive = tf.reduce_sum(positives)

        # Now mask all negative boxes and sum up the losses for the positive boxes per batch item
        # (Keras loss functions must output one scalar loss value per batch item, rather than just
        # one scalar for the entire batch, that's why we're not summing across all axes).
        # Sum the classification loss over the positive boxes; with one-hot labels only the true class contributes, all other entries are 0
        pos_class_loss = tf.reduce_sum(classification_loss * positives, axis=-1) # Tensor of shape (batch_size,)

        # Compute the classification loss for the negative default boxes (if there are any).

        # First, compute the classification loss for all negative boxes.
        # Classification loss of all negative boxes
        neg_class_loss_all = classification_loss * negatives # Tensor of shape (batch_size, n_boxes)

        # Number of negative boxes with a non-zero loss
        n_neg_losses = tf.count_nonzero(neg_class_loss_all, dtype=tf.int32) # The number of non-zero loss entries in `neg_class_loss_all`
        # What's the point of `n_neg_losses`? For the next step, which will be to compute which negative boxes enter the classification
        # loss, we don't just want to know how many negative ground truth boxes there are, but for how many of those there actually is
        # a positive (i.e. non-zero) loss. This is necessary because `tf.nn.top_k()` in the function below will pick the top k boxes with
        # the highest losses no matter what, even if it receives a vector where all losses are zero. In the unlikely event that all negative
        # classification losses are actually zero though, this behavior might lead to `tf.nn.top_k()` returning the indices of positive
        # boxes, leading to an incorrect negative classification loss computation, and hence an incorrect overall loss computation.
        # We therefore need to make sure that `n_negative_keep`, which assumes the role of the `k` argument in `tf.nn.top_k()`,
        # is at most the number of negative boxes for which there is a positive classification loss.

        # Compute the number of negative examples we want to account for in the loss.
        # We'll keep at most `self.neg_pos_ratio` times the number of positives in `y_true`, but at least `self.n_neg_min` (unless `n_neg_losses` is smaller).

        # Keep at most `neg_pos_ratio` (3) times as many negatives as positives
        n_negative_keep = tf.minimum(tf.maximum(self.neg_pos_ratio * tf.to_int32(n_positive), self.n_neg_min), n_neg_losses)

        # In the unlikely case when either (1) there are no negative ground truth boxes at all
        # or (2) the classification loss for all negative boxes is zero, return zero as the `neg_class_loss`.
        def f1():
            # Return all zeros, shape (batch_size,)
            return tf.zeros([batch_size])
        # Otherwise compute the negative loss.
        def f2():
            # Return the classification loss of the kept negatives, shape (batch_size,)
            # Now we'll identify the top-k (where k == `n_negative_keep`) boxes with the highest confidence loss that
            # belong to the background class in the ground truth data. Note that this doesn't necessarily mean that the model
            # predicted the wrong class for those boxes, it just means that the loss for those boxes is the highest.

            # Reshape to 1D, shape (batch_size * n_boxes,)
            # To do this, we reshape `neg_class_loss_all` to 1D...
            neg_class_loss_all_1D = tf.reshape(neg_class_loss_all, [-1]) # Tensor of shape (batch_size * n_boxes,)

            # Get the k largest losses and their indices
            # ...and then we get the indices for the `n_negative_keep` boxes with the highest loss out of those...
            values, indices = tf.nn.top_k(neg_class_loss_all_1D,
                                          k=n_negative_keep,
                                          sorted=False) # We don't need them sorted.

            # Build a mask with the shape of the flattened negative losses: 1 at the kept positions, 0 everywhere else
            # ...and with these indices we'll create a mask...
            negatives_keep = tf.scatter_nd(indices=tf.expand_dims(indices, axis=1),
                                           updates=tf.ones_like(indices, dtype=tf.int32),
                                           shape=tf.shape(neg_class_loss_all_1D)) # Tensor of shape (batch_size * n_boxes,)

            # Reshape back to (batch_size, n_boxes)
            negatives_keep = tf.to_float(tf.reshape(negatives_keep, [batch_size, n_boxes])) # Tensor of shape (batch_size, n_boxes)

            # Multiplying by the mask keeps the loss where the mask is 1 and zeroes it out elsewhere; then sum over all boxes, shape (batch_size,)
            # ...and use it to keep only those boxes and mask all other classification losses
            neg_class_loss = tf.reduce_sum(classification_loss * negatives_keep, axis=-1) # Tensor of shape (batch_size,)
            return neg_class_loss

        # Decide which branch to run based on the number of negative losses: f1 if it is 0, otherwise f2; shape (batch_size,)
        neg_class_loss = tf.cond(tf.equal(n_neg_losses, tf.constant(0)), f1, f2)

        # Add up the positive and negative classification losses
        class_loss = pos_class_loss + neg_class_loss # Tensor of shape (batch_size,)

        # 3: Compute the localization loss for the positive targets.
        #    We don't compute a localization loss for negative predicted boxes (obviously: there are no ground truth boxes they would correspond to).

        # The localization loss is only needed for the positive boxes
        loc_loss = tf.reduce_sum(localization_loss * positives, axis=-1) # Tensor of shape (batch_size,)

        # 4: Compute the total loss.

        # Total classification and localization loss, divided by the number of positives (at least 1)
        total_loss = (class_loss + self.alpha * loc_loss) / tf.maximum(1.0, n_positive) # In case `n_positive == 0`
        # Keras has the annoying habit of dividing the loss by the batch size, which sucks in our case
        # because the relevant criterion to average our loss over is the number of positive boxes in the batch
        # (by which we're dividing in the line above), not the batch size. So in order to revert Keras' averaging
        # over the batch size, we'll have to multiply by it.
        # One might expect to simply divide by N here, but since Keras averages the loss over the batch size,
        # the author multiplies by the batch size to undo that averaging. In my view, whether or not you multiply
        # by this constant hardly matters for the optimization.
        total_loss = total_loss * tf.to_float(batch_size)

        return total_loss
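
To make the hard negative mining step more concrete, here is a minimal NumPy sketch of the same idea (the numbers are made up): keep only the n_negative_keep negative boxes with the highest classification loss and zero out the rest.

    import numpy as np

    # flattened per-box classification losses of the negative boxes
    neg_class_loss_all = np.array([0.2, 0.0, 1.3, 0.4, 0.0, 0.9])
    n_negative_keep = 2  # e.g. min(3 * n_positive, number of non-zero negative losses)

    # indices of the k largest losses, analogous to tf.nn.top_k
    top_indices = np.argsort(neg_class_loss_all)[-n_negative_keep:]

    # build a 0/1 mask, analogous to tf.scatter_nd
    negatives_keep = np.zeros_like(neg_class_loss_all)
    negatives_keep[top_indices] = 1.0

    print(negatives_keep)                                # [0. 0. 1. 0. 0. 1.]
    print(np.sum(neg_class_loss_all * negatives_keep))   # 2.2, only the two largest losses survive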

In the paper this looks like a single formula, but implementing it is not that simple: you have to handle the data format, e.g. the class labels must be one-hot encoded, the number of negatives must be limited to at most 3 times the number of positives, and the localization loss is only computed for the positive examples. Also note that the first entry of the class vector is the background class, which is treated as the negative class. There is a small trick as well: with one-hot encoding, counting the positives is just a matter of summing the encoded entries. Finally, the loss is multiplied by the batch size because Keras averages over the batch internally; personally I think multiplying the loss by a constant does not matter much, since what counts is that the loss decreases, but the author's approach works too.
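For completeness, this is roughly how the loss is wired up for training. It is only a sketch based on the usual ssd_keras workflow: the exact import path depends on where keras_ssd_loss.py lives in your project, `model` is assumed to be an already built SSD model, and the constructor arguments correspond to the self.neg_pos_ratio, self.n_neg_min and self.alpha attributes used in compute_loss above.

    from keras.optimizers import SGD
    from keras_ssd_loss import SSDLoss  # adjust the import path to your project layout

    # 3:1 negative-to-positive ratio and alpha = 1.0, as in the paper
    ssd_loss = SSDLoss(neg_pos_ratio=3, n_neg_min=0, alpha=1.0)

    # compute_loss takes (y_true, y_pred) and returns one loss value per batch item,
    # so it can be passed to model.compile() like any other Keras loss function
    model.compile(optimizer=SGD(lr=0.001, momentum=0.9), loss=ssd_loss.compute_loss)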

That's all for today. I hope it helps with learning and understanding. These are only my own notes and my abilities are limited, so please bear with any mistakes.
