monodepth2 code walkthrough: def process_batch(self, inputs):

Article link: https://blog.csdn.net/u011059143/article/details/133135686

Input: inputs

First, move every tensor in inputs to self.device (the GPU when one is in use):

for key, ipt in inputs.items():
    inputs[key] = ipt.to(self.device)
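As a minimal, self-contained sketch of this step: the tensors and their sizes below are made up for illustration, and the device falls back to the CPU when no GPU is present. The loop overwrites each value while the tuple keys stay the same.

```python
import torch

# Pick the GPU when available, otherwise stay on the CPU (assumption for this sketch).
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Toy stand-in for a batch: keys are tuples, values are tensors.
inputs = {
    ("color", 0, 0): torch.zeros(2, 3, 192, 640),
    ("color_aug", 0, 0): torch.zeros(2, 3, 192, 640),
    ("K", 0): torch.eye(4).unsqueeze(0).repeat(2, 1, 1),
}

# Reassigning the value of an existing key while iterating is safe,
# because the set of keys does not change.
for key, ipt in inputs.items():
    inputs[key] = ipt.to(device)

# Collect the device types to confirm that every tensor was moved.
device_types = {tensor.device.type for tensor in inputs.values()}
print(device_types)
```

Note that .to() returns a new tensor when the device changes, which is why the result has to be written back into the dictionary.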

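Second, pick the branch according to how the pose network is configured. The default pose_model_type is separate_resnet, so the else branch runs by default. The network input is the augmented original image (color_aug).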

if self.opt.pose_model_type == "shared":
    # If we are using a shared encoder for both depth and pose (as advocated
    # in monodepthv1), then all images are fed separately through the depth encoder.
    all_color_aug = torch.cat([inputs[("color_aug", i, 0)] for i in self.opt.frame_ids])
    all_features = self.models["encoder"](all_color_aug)
    all_features = [torch.split(f, self.opt.batch_size) for f in all_features]

    features = {}
    for i, k in enumerate(self.opt.frame_ids):
        features[k] = [f[i] for f in all_features]

    outputs = self.models["depth"](features[0])
else:
    # Only the current frame (frame_id 0) is fed in here, because depth is predicted for the current frame (note dated 2023-09-20).
    # Otherwise, we only feed the image with frame_id 0 through the depth encoder
    features = self.models["encoder"](inputs["color_aug", 0, 0])
    outputs = self.models["depth"](features)
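The regrouping in the shared branch can be hard to follow on a first read. The sketch below reproduces only the cat / split / regroup pattern. It assumes frame_ids = [0, -1, 1] and uses a hypothetical toy_encoder that just returns two pooled copies of its input, not the real ResNet encoder.

```python
import torch
import torch.nn.functional as F

batch_size = 2
frame_ids = [0, -1, 1]

# One toy image batch per frame; the fill value marks which frame it came from.
images = {
    fid: torch.full((batch_size, 3, 8, 16), float(i))
    for i, fid in enumerate(frame_ids)
}


def toy_encoder(x):
    # Hypothetical encoder: returns a list of feature maps at two scales.
    return [F.avg_pool2d(x, 2), F.avg_pool2d(x, 4)]


# Stack all frames along the batch dimension: (3 * batch_size, 3, 8, 16).
all_images = torch.cat([images[fid] for fid in frame_ids])
all_features = toy_encoder(all_images)

# Split every feature map back into per-frame chunks of batch_size samples.
all_features = [torch.split(f, batch_size) for f in all_features]

# Regroup so that features[fid] is the list of feature maps for that frame.
features = {}
for i, fid in enumerate(frame_ids):
    features[fid] = [f[i] for f in all_features]

print(features[-1][0].shape, features[1][1].shape)
```

The point of the design is that the encoder runs once on a larger batch instead of once per frame, and only features[0] is then passed on to the depth decoder.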

The shape of features is as follows (batch size 12, 64 channels, half of the 192 x 640 input resolution):

features[0] (12, 64, 96, 320)

The shape of outputs is as follows:

outputs[('disp', 0)] (12, 1, 192, 640)
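
These shapes follow from the strides of the network. The sketch below works them out in plain Python, assuming a ResNet-18 style encoder with channel counts 64, 64, 128, 256, 512 at strides 2 to 32, and a decoder that emits a one-channel disparity map at scales 0 to 3. The helper names are illustrative and are not part of monodepth2.

```python
def encoder_feature_shapes(batch, height, width):
    # Assumed channel counts of the five encoder stages (ResNet-18 style).
    channels = [64, 64, 128, 256, 512]
    # Stage i downsamples the input by a factor of 2 ** (i + 1).
    return [
        (batch, c, height // 2 ** (i + 1), width // 2 ** (i + 1))
        for i, c in enumerate(channels)
    ]


def disparity_shapes(batch, height, width, scales=(0, 1, 2, 3)):
    # Scale s is assumed to be the input resolution divided by 2 ** s.
    return {
        ("disp", s): (batch, 1, height // 2 ** s, width // 2 ** s)
        for s in scales
    }


feature_shapes = encoder_feature_shapes(12, 192, 640)
disp_shapes = disparity_shapes(12, 192, 640)

print(feature_shapes[0])
print(disp_shapes[("disp", 0)])
```

Under these assumptions the first feature map and the scale-0 disparity come out with the same shapes as listed above.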

Third, estimate the predictive mask. This option is False by default.

if self.opt.predictive_mask:
    outputs["predictive_mask"] = self.models["predictive_mask"](features)

Fourth, run the pose network and add the predicted extrinsics (relative camera poses) to outputs:

if self.use_pose_net:
    outputs.update(self.predict_poses(inputs, features))

outputs after the update:

There are two sets of camera parameters: one for frames 0 and -1, and one for frames 0 and 1.
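
To make the pairing concrete, here is a small sketch of how the frame pairs could be enumerated. It assumes frame_ids = [0, -1, 1], that each pair is handed to the pose network in temporal order, that a stereo frame id "s" is skipped, and that results are stored under keys of the form ("cam_T_cam", 0, f_i). The function name pose_pairs is hypothetical.

```python
def pose_pairs(frame_ids):
    """Map each assumed output key to the (earlier, later) frame pair."""
    pairs = {}
    for f_i in frame_ids[1:]:
        if f_i == "s":
            # Assumption: a stereo pair has a known pose and is not estimated.
            continue
        # Keep the two frames in temporal order before feeding the pose network.
        ordered = (f_i, 0) if f_i < 0 else (0, f_i)
        pairs[("cam_T_cam", 0, f_i)] = ordered
    return pairs


pairs = pose_pairs([0, -1, 1])
for key, pair in pairs.items():
    print(key, pair)
```

With three frames this yields exactly two entries, which matches the two sets of camera parameters described above.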

Fifth, generate the image predictions and compute the losses:

self.generate_images_pred(inputs, outputs)
losses = self.compute_losses(inputs, outputs)

The format of losses is as follows: