[6-PACK Code Annotations] The Evaluation Procedure in eval.py

momo_vv

Last modified 2022-08-06 13:17:08


First published 2022-08-04 10:28:34

Article link: https://blog.csdn.net/weixin_44695308/article/details/126073809


Contents

Preface
1. Parameter settings
2. Evaluation procedure
- 2.1 First-frame processing
- 2.2 Subsequent-frame processing

Preface

[6-PACK Complete Record] 6-PACK paper study and reproduction notes

1. Parameter settings

import argparse
import os

choose_cate_list = [1,2,3,4,5,6]
resume_models = ['model_112_0.184814792342484_bottle.pth',  # best model after training, one per category
                 'model_120_0.10268432162888348_bowl.pth',
                 'model_118_0.2008235973417759_camera.pth', 
                 'model_107_0.18291547849029302_can.pth',
                 'model_117_0.12762234719470145_laptop.pth',
                 'model_102_0.1468337191492319_mug.pth']

parser = argparse.ArgumentParser()
parser.add_argument('--dataset_root', type=str, default = 'My_NOCS', help='dataset root dir')
parser.add_argument('--eval_id', type=int, default = 1, help='the evaluation id')  # index of this evaluation run
parser.add_argument('--ite', type=int, default=10, help='first frame fix iteration')  # number of refinement iterations on the first frame
parser.add_argument('--num_kp', type=int, default = 8, help='num of kp')
parser.add_argument('--num_points', type=int, default = 500, help='num of input points')
parser.add_argument('--num_cates', type=int, default = 6, help='number of categories')
parser.add_argument('--outf', type=str, default = 'models/', help='load model dir')
opt = parser.parse_args()

if not os.path.exists('eval_results'):
    os.makedirs('eval_results')  # create the results directory

if not os.path.exists('eval_results/TEST_{0}'.format(opt.eval_id)):
    os.makedirs('eval_results/TEST_{0}'.format(opt.eval_id))
    for item in choose_cate_list:
        os.makedirs('eval_results/TEST_{0}/temp_{1}'.format(opt.eval_id, item))

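Here:

- resume_models: the best trained model for each category
- eval_id: the index of this evaluation run
- ite: how many iterations are run on the initial frame to refine R and T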
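
As a usage sketch, the snippet below shows how eval_id determines the layout of the result directories. It re-creates only two of the arguments and, as an assumption made for this example, writes under a temporary directory rather than the working directory. make_result_dirs is a hypothetical helper name, not a function in eval.py.

```python
import argparse
import os
import tempfile

def make_result_dirs(base_dir, eval_id, cate_list):
    """Create <base_dir>/eval_results/TEST_<eval_id>/temp_<cate> for every category."""
    created = []
    for cate in cate_list:
        path = os.path.join(base_dir, 'eval_results',
                            'TEST_{0}'.format(eval_id), 'temp_{0}'.format(cate))
        os.makedirs(path, exist_ok=True)  # also creates missing parent directories
        created.append(path)
    return created

parser = argparse.ArgumentParser()
parser.add_argument('--eval_id', type=int, default=1, help='the evaluation id')
parser.add_argument('--ite', type=int, default=10, help='first frame fix iteration')
# Equivalent to running: python eval.py --eval_id 2
opt = parser.parse_args(['--eval_id', '2'])

base_dir = tempfile.mkdtemp()  # stand-in for the working directory (assumption)
dirs = make_result_dirs(base_dir, opt.eval_id, [1, 2, 3, 4, 5, 6])
print(os.path.relpath(dirs[0], base_dir))
```

Unlike the original check, exist_ok=True also fills in missing temp_ directories when TEST_<eval_id> already exists, which may or may not be what you want when re-running an evaluation.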

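2. Evaluation procedure

Everything below runs inside the loop for choose_cate in choose_cate_list:, so every category is processed in the same way.

Definitions of some variables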

    model = KeyNet(num_points = opt.num_points, num_key = opt.num_kp, num_cates = opt.num_cates)
    model.cuda()
    model.eval()

    model.load_state_dict(torch.load('{0}/{1}'.format(opt.outf, resume_models[choose_cate-1])))  # load the best model for this category

    pconf = torch.ones(opt.num_kp) / opt.num_kp  # shape [8], every entry 0.125
    pconf = Variable(pconf).cuda()

    # unlike training, Dataset here comes from dataset.eval_dataset_nocs
    test_dataset = Dataset('val', opt.dataset_root, False, opt.num_points, choose_cate, 1000)
    criterion = Loss(opt.num_kp, opt.num_cates)

    # each line of the file, e.g. "1 bottle_red_stanford_norm scene_4", gives the evaluated category, the chosen instance, and the chosen video
    eval_list_file = open('dataset/eval_list/eval_list_{0}.txt'.format(choose_cate), 'r')
    while 1:
        input_line = eval_list_file.readline()
        if not input_line:
            break
        if input_line[-1:] == '\n':
            input_line = input_line[:-1]
        _, choose_obj, choose_video = input_line.split(' ')

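The contents of eval_list_file are shown in the figure below. Taking the first line as an example, choose_obj is bottle_red_stanford_norm and choose_video is scene_4.
[Figure: contents of eval_list_file]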
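
The line parsing above can also be written more compactly. The following is a minimal sketch, assuming every non-empty line has exactly three space-separated fields; parse_eval_line and read_eval_list are hypothetical helper names, and the only sample line used is the one quoted in the text.

```python
def parse_eval_line(line):
    """Split one eval-list line into (category id, instance name, video name)."""
    cate, choose_obj, choose_video = line.strip().split(' ')
    return int(cate), choose_obj, choose_video

def read_eval_list(lines):
    """Parse every non-empty line; `lines` may be an open file or a list of strings."""
    return [parse_eval_line(line) for line in lines if line.strip()]

sample = ["1 bottle_red_stanford_norm scene_4\n"]
print(read_eval_list(sample))  # [(1, 'bottle_red_stanford_norm', 'scene_4')]
```

With a real file, read_eval_list can be called inside a with open(...) block so the file is closed once parsing is done.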

2.1 First-frame processing

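        try:
            # pose of the first frame of the target video in which the target instance appears
            current_r, current_t = test_dataset.getfirst(choose_obj, choose_video)
            rad_t = np.array([random.uniform(-0.02, 0.02) for i in range(3)]) * 1000.0  # three random values in [-20, 20]
            current_t += rad_t  # for a fair comparison with other methods, add translation noise (a 4 cm range per axis) to the first-frame pose and let the model correct the error itself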

            if opt.ite != 0:
                min_dis = 1000.0
                for iterative in range(opt.ite):  
                    img_fr, choose_fr, cloud_fr, anchor, scale = test_dataset.getone(current_r, current_t)
                    img_fr, choose_fr, cloud_fr, anchor, scale = Variable(img_fr).cuda(), \
                                                         Variable(choose_fr).cuda(), \
                                                         Variable(cloud_fr).cuda(), \
                                                         Variable(anchor).cuda(), \
                                                         Variable(scale).cuda()
                    # img: color_crop
                    # choose: point-cloud indices   cloud: normalized point cloud
                    # anchor: anchor grid (world coordinates)   scale: normalization factor
                    Kp_fr, att_fr = model.eval_forward(img_fr, choose_fr, cloud_fr, anchor, scale, 0.0, True)
                    new_t, att, kp_dis = criterion.ev_zero(Kp_fr[0], att_fr[0])
                    # new_t: mean of the 8 keypoint coordinates, 3 values
                    # att: att_fr[0]
                    # kp_dis: length of new_t, i.e. the norm of the centroid of the 8 keypoints

                    if min_dis > kp_dis:
                        min_dis = kp_dis
                        best_current_r = copy.deepcopy(current_r)
                        best_current_t = copy.deepcopy(current_t)
                        best_att = copy.deepcopy(att)
                        print(min_dis)

                    current_t = current_t + np.dot(new_t, current_r.T)
                current_r, current_t, att = best_current_r, best_current_t, best_att

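In the code above, the for iterative in range(opt.ite) loop runs ite times, and every pass reads only the first frame of the target video in which the target instance appears. Each pass shifts the translation estimate by the predicted keypoint centroid (in the code shown only current_t is updated; current_r stays fixed). The pose whose keypoint centroid, i.e. the mean of the keypoint coordinates, lies closest to the origin (tracked in min_dis) is kept. This reduces the effect of the translation noise.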
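
To make the refinement loop concrete, here is a minimal NumPy sketch of the same update rule. The network and criterion.ev_zero are replaced by toy_keypoint_centroid, a hypothetical stand-in that returns only part of the true offset (the gain parameter is an assumption that mimics an imperfect prediction). It illustrates the update current_t + np.dot(new_t, current_r.T) only; it is not the model's behaviour.

```python
import copy
import numpy as np

def toy_keypoint_centroid(current_r, current_t, true_t, gain=0.5):
    # Hypothetical stand-in for model.eval_forward + criterion.ev_zero:
    # the keypoint centroid expressed in the object frame of (current_r, current_t).
    # gain < 1 means only part of the remaining offset is recovered per pass.
    return gain * np.dot(true_t - current_t, current_r)

def refine_first_frame(current_r, current_t, true_t, ite=10):
    min_dis = np.inf
    best_t = copy.deepcopy(current_t)
    for _ in range(ite):
        new_t = toy_keypoint_centroid(current_r, current_t, true_t)
        kp_dis = np.linalg.norm(new_t)           # distance of the centroid from the origin
        if kp_dis < min_dis:                     # keep the pose with the best-centred keypoints
            min_dis = kp_dis
            best_t = copy.deepcopy(current_t)
        # same update as in eval.py: rotate the offset into the camera frame and add it
        current_t = current_t + np.dot(new_t, current_r.T)
    return best_t, min_dis

angle = np.deg2rad(30.0)
rot = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                [np.sin(angle),  np.cos(angle), 0.0],
                [0.0, 0.0, 1.0]])
true_t = np.array([100.0, -50.0, 800.0])
noisy_t = true_t + np.array([20.0, -20.0, 10.0])   # perturbed initial translation
best_t, min_dis = refine_first_frame(rot, noisy_t, true_t, ite=10)
print(np.linalg.norm(true_t - noisy_t), np.linalg.norm(true_t - best_t))
```

In this toy setting the translation error halves on every pass, so the retained pose is the one evaluated on the last pass.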

            img_fr, choose_fr, cloud_fr, anchor, scale = test_dataset.getone(current_r, current_t)
            img_fr, choose_fr, cloud_fr, anchor, scale = Variable(img_fr).cuda(), \
                                                 Variable(choose_fr).cuda(), \
                                                 Variable(cloud_fr).cuda(), \
                                                 Variable(anchor).cuda(), \
                                                 Variable(scale).cuda()
            # projection has not been called yet, so this is still the first frame
            Kp_fr, att_fr = model.eval_forward(img_fr, choose_fr, cloud_fr, anchor, scale, 0.0, True)

            test_dataset.projection('eval_results/TEST_{0}/temp_{1}/{2}_{3}'.format(opt.eval_id, choose_cate, choose_obj, choose_video), Kp_fr[0], current_r, current_t, scale, att_fr[0], True, 0.0)
            # save the first-frame R, T and move on to the next frame
            min_dis = 0.0005

This part calls the projection() function in eval_dataset_nocs.py, which writes the initial-frame pose R, T to a file.

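2.2 Subsequent-frame processing

The code in this part is structured like Section 2.1; the difference lies in the arguments passed to the functions. The full code is given first: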

            while 1:
                img_fr, choose_fr, cloud_fr, anchor, scale = test_dataset.getone(current_r, current_t)
                img_fr, choose_fr, cloud_fr, anchor, scale = Variable(img_fr).cuda(), \
                                                     Variable(choose_fr).cuda(), \
                                                     Variable(cloud_fr).cuda(), \
                                                     Variable(anchor).cuda(), \
                                                     Variable(scale).cuda()
                Kp_to, att_to = model.eval_forward(img_fr, choose_fr, cloud_fr, anchor, scale, min_dis, False)
                # Kp_to: 27 sets of 8 keypoints, each obtained with a different translation offset
                # att_to: 27 sets of best-anchor indices

                min_dis = 1000.0
                lenggth = len(Kp_to)
                for idx in range(lenggth):
                    Kp_real, new_r, new_t, kp_dis, att = criterion.ev(Kp_fr[0], Kp_to[idx], att_to[idx])
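                    # Kp_real: Kp_fr[0], the keypoints of the initial frame
                    # new_r, new_t: pose R, T of the current frame inferred from the initial-frame and current-frame keypoints (relative between the two frames, not between the camera and world frames)
                    # kp_dis: prediction error (distance between the initial-frame coordinates reconstructed via R, T and the true coordinates)
                    # att: index of the target anchor in the current frame

                    # keep the keypoint set, R, T, and anchor with the smallest error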
                    if min_dis > kp_dis:
                        best_kp = Kp_to[idx]
                        min_dis = kp_dis
                        best_r = new_r
                        best_t = new_t
                        best_att = copy.deepcopy(att)
                print(min_dis)

                current_t = current_t + np.dot(best_t, current_r.T)  # convert the inter-frame pose into the actual object pose
                current_r = np.dot(current_r, best_r)

                test_dataset.projection('eval_results/TEST_{0}/temp_{1}/{2}_{3}'.format(opt.eval_id, choose_cate, choose_obj, choose_video), Kp_real, current_r, current_t, scale, best_att, True, min_dis)

                print("NEXT FRAME!!!")

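First, for frames after the first, the call Kp_to, att_to = model.eval_forward(img_fr, choose_fr, cloud_fr, anchor, scale, min_dis, False) passes a nonzero min_dis: the minimum error of the previous frame (0.0005 for the first tracked frame). It returns 27 sets of keypoints, Kp_to, and the 27 corresponding target-anchor indices, att_to.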
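
The count of 27 is consistent with a 3×3×3 grid of translation offsets, three choices per axis. The sketch below builds such a grid. The helper translation_candidates, the offset magnitude step, and its relation to min_dis are assumptions made for illustration, not values taken from the network code.

```python
import itertools
import numpy as np

def translation_candidates(step):
    """Return a (27, 3) array: every combination of {-step, 0, +step} per axis."""
    offsets = [-step, 0.0, step]
    return np.array(list(itertools.product(offsets, repeat=3)))

# Hypothetical choice: scale the search radius with the previous frame's error.
min_dis = 0.0005
candidates = translation_candidates(step=10.0 * min_dis)
print(candidates.shape)  # (27, 3)
```

One of the 27 candidates is the zero offset, so the unperturbed estimate is always among the options compared.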
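Inside the for loop, Kp_real, new_r, new_t, kp_dis, att = criterion.ev(Kp_fr[0], Kp_to[idx], att_to[idx]) calls the ev function in loss.py. This function predicts the pose change between the two frames, new_r and new_t, from the keypoint coordinates of the current and initial frames. It then uses R, T and the current frame to reconstruct the initial-frame point coordinates (the predicted values) and defines the distance between these and the true coordinates as the prediction error kp_dis. The pose change is then converted into the actual object pose current_r, current_t.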
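
One standard way to recover a rigid motion from two matched keypoint sets is the Kabsch (SVD) method. The sketch below uses it to illustrate the idea described above; whether loss.py uses exactly this formulation is not confirmed here, and rigid_fit and backprojection_error are hypothetical names.

```python
import numpy as np

def rigid_fit(src, dst):
    """Least-squares rotation and translation with dst ≈ src @ rot.T + trans (Kabsch)."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    cov = (src - src_c).T @ (dst - dst_c)          # 3x3 cross-covariance
    u, _, vt = np.linalg.svd(cov)
    d = np.sign(np.linalg.det(vt.T @ u.T))         # guard against reflections
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    trans = dst_c - src_c @ rot.T
    return rot, trans

def backprojection_error(kp_first, kp_now, rot, trans):
    """Map current keypoints back to the first frame and average the distances."""
    kp_first_pred = (kp_now - trans) @ rot         # inverse of p_now = p_first @ rot.T + trans
    return float(np.mean(np.linalg.norm(kp_first_pred - kp_first, axis=1)))

rng = np.random.default_rng(0)
kp_first = rng.normal(size=(8, 3))                 # 8 toy keypoints in the first frame
angle = np.deg2rad(20.0)
true_r = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle),  np.cos(angle), 0.0],
                   [0.0, 0.0, 1.0]])
true_t = np.array([0.1, -0.2, 0.05])
kp_now = kp_first @ true_r.T + true_t              # the same keypoints after the motion

est_r, est_t = rigid_fit(kp_first, kp_now)
kp_dis = backprojection_error(kp_first, kp_now, est_r, est_t)
print(kp_dis)
```

With noise-free keypoints the error is zero up to floating-point precision; with real predictions it measures how well a single rigid motion explains the two keypoint sets, which is why it is a reasonable score for choosing among the candidates.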
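After all 27 keypoint sets of a frame have been processed this way, projection is called to save the frame's best R, T and smallest error; the index is then updated and the next frame is processed.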
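
The pose update applied before each call to projection, current_t + np.dot(best_t, current_r.T) followed by np.dot(current_r, best_r), is the composition of two rigid transforms. The check below assumes the convention p_cam = R p_obj + t (an assumption about the frames, chosen because it makes the update consistent) and confirms that applying the inter-frame motion and then the old pose equals applying the updated pose once. compose and rot_z are hypothetical helper names.

```python
import numpy as np

def rot_z(deg):
    a = np.deg2rad(deg)
    return np.array([[np.cos(a), -np.sin(a), 0.0],
                     [np.sin(a),  np.cos(a), 0.0],
                     [0.0, 0.0, 1.0]])

def compose(current_r, current_t, best_r, best_t):
    """Same update as in the tracking loop: fold the inter-frame motion into the pose."""
    new_t = current_t + np.dot(best_t, current_r.T)
    new_r = np.dot(current_r, best_r)
    return new_r, new_t

current_r, current_t = rot_z(30.0), np.array([100.0, -50.0, 800.0])
best_r, best_t = rot_z(5.0), np.array([2.0, 1.0, -3.0])
new_r, new_t = compose(current_r, current_t, best_r, best_t)

p = np.array([10.0, 20.0, 30.0])                       # a point in the object frame
two_step = current_r @ (best_r @ p + best_t) + current_t
one_step = new_r @ p + new_t
print(np.allclose(two_step, one_step))  # True
```

Note that the translation must be updated before the rotation, as in the original code, because the update of current_t uses the old current_r.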