Paper notes: "Combining Events and Frames Using Recurrent Asynchronous Multimodal Networks for Monocular ..."

紫金山赵火龙

Link to this article: https://blog.csdn.net/qq_26751117/article/details/122325560

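I have recently been studying DVS-related algorithms. I came across this paper, which combines events and frames, and found the authors' open-source code on GitHub.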

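However, the open-source release of this work is incomplete, and so is that of the preceding work, "Learning Monocular Dense Depth from Events": the released code actually hides a lot. I originally wanted to reproduce that earlier paper, but all of its training details are hidden, and the code in the new repo does not match the weights it provides, so reproducing it would mean rewriting the entire training pipeline. I therefore chose their latest work, whose code is almost complete.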

Overall, the code in this repo is fairly complete. There are two main problems.

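First, in the provided EventScape dataset, the events folder contains no data in voxel format, so it has to be generated manually. The function for this already exists in the repo (utils/event_tensor_utils.py); you just have to call it yourself. Below is the code I used; it creates a corresponding voxels folder inside each events folder.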

import os
import shutil

import numpy as np
from utils.event_tensor_utils import events_to_voxel_grid  # helper provided in the repo

WIDTH, HEIGHT = 512, 256  # EventScape sensor resolution
NUM_BINS = 5              # number of temporal bins per voxel grid


def convert_from_event_to_voxel(path: str):
    """Convert every events .npz file under <path>/data into a voxel .npy file under <path>/voxels."""
    data_path = os.path.join(path, 'data')
    new_path = os.path.join(path, 'voxels')
    os.makedirs(new_path, exist_ok=True)
    for root, dirs, files in os.walk(data_path):
        for file in files:
            file_name = os.path.join(root, file)
            if file in ('boundary_timestamps.txt', 'timestamps.txt'):
                # Copy the timestamp files unchanged so they sit next to the voxel files
                shutil.copyfile(file_name, os.path.join(new_path, file))
            elif file.endswith('.npz'):
                with np.load(file_name) as events:
                    x, y, p, t = events['x'], events['y'], events['p'], events['t']
                t = t / 1e6  # divide timestamps by 1e6 before voxelization
                # events_to_voxel_grid expects an N x 4 array with columns [t, x, y, p]
                single_np_event = np.vstack([t, x, y, p]).T
                voxel = events_to_voxel_grid(single_np_event, NUM_BINS, WIDTH, HEIGHT)
                new_file = file.replace('events', 'voxel').replace('npz', 'npy')
                np.save(os.path.join(new_path, new_file), voxel)


if __name__ == '__main__':
    file_path = '/data/EventScape/Town05_val/'
    # Each sequence folder is expected to contain an events/ subfolder
    for seq in sorted(os.listdir(file_path)):
        events_dir = os.path.join(file_path, seq, 'events')
        if os.path.isdir(events_dir):
            convert_from_event_to_voxel(events_dir)
            print(events_dir)
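The repo's events_to_voxel_grid does the actual work in the script above. To make the output format concrete, here is a minimal NumPy sketch of the voxel-grid idea popularized by E2VID: timestamps are normalized to [0, B-1] and each event's polarity is split between its two nearest temporal bins by linear interpolation. The function name voxel_grid_sketch is hypothetical, and this is a simplified re-implementation under that assumption, not a copy of utils/event_tensor_utils.py, which may differ in details.

```python
import numpy as np


def voxel_grid_sketch(events: np.ndarray, num_bins: int, width: int, height: int) -> np.ndarray:
    """Hypothetical helper: accumulate events (N x 4 array, columns [t, x, y, p],
    sorted by t) into a (num_bins, height, width) grid, interpolating linearly in time."""
    grid = np.zeros(num_bins * height * width, dtype=np.float32)
    if len(events) == 0:
        return grid.reshape(num_bins, height, width)
    t = events[:, 0].astype(np.float64)
    x = events[:, 1].astype(np.int64)
    y = events[:, 2].astype(np.int64)
    # Map polarity {0, 1} or {-1, +1} to {-1, +1}
    p = np.where(events[:, 3] > 0, 1.0, -1.0)
    # Normalize timestamps to the range [0, num_bins - 1]
    duration = t[-1] - t[0]
    if duration == 0:
        duration = 1.0
    t_norm = (num_bins - 1) * (t - t[0]) / duration
    left = np.floor(t_norm).astype(np.int64)
    frac = t_norm - left
    pixel = x + y * width  # flat pixel index within one bin
    # Split each event's polarity between the bin at or below its time and the next bin
    for bins, weights in ((left, 1.0 - frac), (left + 1, frac)):
        valid = bins < num_bins
        values = (p * weights)[valid].astype(np.float32)
        np.add.at(grid, pixel[valid] + bins[valid] * width * height, values)
    return grid.reshape(num_bins, height, width)
```

Because the two interpolation weights of each event sum to 1, summing the grid over the bin axis recovers each pixel's net polarity, while the individual bins keep coarse timing information.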

The second problem is that the parameters in the repo's config file train_e2depth_si_grad_loss_statenet_ergb.json are set incorrectly. The paper states:

For EventScape we choose α = 5.7 and Dmax = 1000 m whereas for MVSEC we choose α = 3.7 and Dmax = 80 m.

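So the two parameters in the config file are the values for training on MVSEC, not on EventScape; my modification is shown in the figure. Presumably the authors changed them for MVSEC pretraining and forgot to change them back.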
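To see why using the MVSEC values on EventScape goes wrong, here is a minimal sketch of the normalized log-depth mapping described in the E2Depth/RAMNet papers, D_hat = 1 + log(D / Dmax) / α, which maps Dmax to 1 and Dmin = Dmax * e^(-α) to 0. The helper names are hypothetical and this is not code from the repo. Incidentally, the --reg_factor 5.70378 passed to evaluation.py in this post equals ln(300) to five decimals, which under this mapping corresponds to Dmin = 1000 / 300 ≈ 3.33 m.

```python
import math

# Hypothetical helpers; the mapping follows the normalized log depth described
# in the papers: D_hat = 1 + log(D / D_max) / alpha.


def to_log_depth(depth_m: float, d_max: float, alpha: float) -> float:
    """Map metric depth (meters) to normalized log depth; D_max maps to 1."""
    depth_m = min(depth_m, d_max)  # clip at D_max so the result stays <= 1
    return 1.0 + math.log(depth_m / d_max) / alpha


def from_log_depth(d_hat: float, d_max: float, alpha: float) -> float:
    """Invert the mapping: D = D_max * exp(alpha * (D_hat - 1))."""
    return d_max * math.exp(alpha * (d_hat - 1.0))


# Values quoted from the paper for each dataset
EVENTSCAPE = {"d_max": 1000.0, "alpha": 5.7}
MVSEC = {"d_max": 80.0, "alpha": 3.7}

if __name__ == "__main__":
    for name, cfg in (("EventScape", EVENTSCAPE), ("MVSEC", MVSEC)):
        d_min = from_log_depth(0.0, **cfg)  # depth that maps to D_hat = 0
        print(f"{name}: D_hat=0 <-> {d_min:.2f} m, D_hat=1 <-> {cfg['d_max']:.0f} m")
    # The same network output decodes to very different depths under the two settings
    print(from_log_depth(0.5, **EVENTSCAPE), from_log_depth(0.5, **MVSEC))
```

Under these assumptions, a network output of 0.5 decodes to roughly 57.8 m with the EventScape values but only about 12.6 m with the MVSEC values, so a mismatched α and Dmax distorts every predicted depth.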

After modifying the config file, run test.py to generate the predictions. The arguments are:

--path_to_model ./checkpoints/ramnet_sim.pth.tar
--output_path ./test_output
--data_folder D:/datasets/EventScape/Town05_test
--config configs/train_e2depth_si_grad_loss_statenet_ergb.json

Then run evaluation.py to evaluate the predictions. The arguments are:

--target_dataset D:/code/rpg_ramnet/RAM_Net/test_output/ground_truth/npy/depth_image
--predictions_dataset D:/code/rpg_ramnet/RAM_Net/test_output/npy/image
--clip_distance 1000
--reg_factor 5.70378

The final output is below. The absolute relative error of 0.197506 matches the 0.198 reported in the paper.

D:\ProgramData\Anaconda3\envs\torch1_7\python.exe D:/code/rpg_ramnet/RAM_Net/evaluation.py --target_dataset D:/code/rpg_ramnet/RAM_Net/test_output/ground_truth/npy/depth_image --predictions_dataset D:/code/rpg_ramnet/RAM_Net/test_output/npy/image --clip_distance 1000 --reg_factor 5.70378
len of prediction files 4986
len of target files 4986
D:/code/rpg_ramnet/RAM_Net/test_output/npy/image
D:/code/rpg_ramnet/RAM_Net/test_output/ground_truth/npy/depth_image
100%|██████████| 4986/4986 [09:10<00:00,  9.06it/s]
_abs_rel_diff : 0.197506
_squ_rel_diff : 2.963319
_RMS_linear : 69.756133
_RMS_log : 0.354825
_SILog : 0.116411
_mean_depth_error : 13.876676
_median_diff : 4.526404
_threshold_delta_1.25 : 0.790054
_threshold_delta_1.25^2 : 0.890809
_threshold_delta_1.25^3 : 0.945827
_10_abs_rel_diff : 0.109038
_10_squ_rel_diff : 5.124024
_10_RMS_linear : 3.889526
_10_RMS_log : 0.127677
_10_SILog : 0.024356
_10_mean_depth_error : 0.607854
_10_median_diff : 0.235457
_10_threshold_delta_1.25 : 0.955517
_10_threshold_delta_1.25^2 : 0.976612
_10_threshold_delta_1.25^3 : 0.988100
_20_abs_rel_diff : 0.152294
_20_squ_rel_diff : 5.031413
_20_RMS_linear : 11.796929
_20_RMS_log : 0.191990
_20_SILog : 0.037839
_20_mean_depth_error : 1.477377
_20_median_diff : 0.358616
_20_threshold_delta_1.25 : 0.903873
_20_threshold_delta_1.25^2 : 0.956314
_20_threshold_delta_1.25^3 : 0.981820
_30_abs_rel_diff : 0.184544
_30_squ_rel_diff : 4.910906
_30_RMS_linear : 18.877968
_30_RMS_log : 0.231542
_30_SILog : 0.048849
_30_mean_depth_error : 2.527397
_30_median_diff : 0.493752
_30_threshold_delta_1.25 : 0.864171
_30_threshold_delta_1.25^2 : 0.937585
_30_threshold_delta_1.25^3 : 0.976516
_80_abs_rel_diff : 0.237754
_80_squ_rel_diff : 4.390356
_80_RMS_linear : 32.252956
_80_RMS_log : 0.309250
_80_SILog : 0.073911
_80_mean_depth_error : 6.420871
_80_median_diff : 0.919114
_80_threshold_delta_1.25 : 0.774259
_80_threshold_delta_1.25^2 : 0.891655
_80_threshold_delta_1.25^3 : 0.953438
_250_abs_rel_diff : 0.246407
_250_squ_rel_diff : 4.181770
_250_RMS_linear : 36.837379
_250_RMS_log : 0.330677
_250_SILog : 0.081939
_250_mean_depth_error : 8.593698
_250_median_diff : 1.186443
_250_threshold_delta_1.25 : 0.750698
_250_threshold_delta_1.25^2 : 0.876096
_250_threshold_delta_1.25^3 : 0.944128
_500_abs_rel_diff : 0.248392
_500_squ_rel_diff : 4.145463
_500_RMS_linear : 40.929285
_500_RMS_log : 0.339089
_500_SILog : 0.085610
_500_mean_depth_error : 9.872512
_500_median_diff : 1.230972
_500_threshold_delta_1.25 : 0.745905
_500_threshold_delta_1.25^2 : 0.871389
_500_threshold_delta_1.25^3 : 0.940171
----------------------------------------------
total metrics:  [6.27290632e+03 1.97505700e-01 5.88659272e+03 3.79455030e-01
 1.38766762e+01 6.97561326e+01]

Process finished with exit code 0

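I still haven't fully figured out what some of these metrics mean, but the computed values match those in the paper. If anything is wrong with my approach, feel free to leave a comment.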
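For the metric names in the log, here is a minimal NumPy sketch of the standard monocular depth metrics (absolute and squared relative error, linear and log RMS, a scale-invariant log error, and δ < 1.25^k accuracies) that the names appear to correspond to. It is an illustration with a hypothetical function name, not the code in evaluation.py, and details such as masking or the exact SILog definition may differ there. The _10_, _20_, ... prefixes are assumed to mean the same metrics restricted to pixels with ground-truth depth up to 10 m, 20 m, and so on, which the cutoff argument models; mean_depth_error and median_diff are left out because their exact definitions are not clear from the log.

```python
from typing import Optional

import numpy as np


def depth_metrics(pred: np.ndarray, gt: np.ndarray, cutoff: Optional[float] = None) -> dict:
    """Hypothetical helper: common depth metrics on metric depths (meters).
    Pixels with gt <= 0 are ignored; cutoff keeps only pixels with gt <= cutoff."""
    mask = gt > 0
    if cutoff is not None:
        mask &= gt <= cutoff  # assumed meaning of the _10_, _20_, ... prefixes
    p = np.clip(pred[mask], 1e-3, None)  # avoid log(0) for non-positive predictions
    g = gt[mask]
    log_diff = np.log(p) - np.log(g)
    ratio = np.maximum(p / g, g / p)
    return {
        "abs_rel_diff": np.mean(np.abs(p - g) / g),
        "squ_rel_diff": np.mean((p - g) ** 2 / g),
        "RMS_linear": np.sqrt(np.mean((p - g) ** 2)),
        "RMS_log": np.sqrt(np.mean(log_diff ** 2)),
        # One common scale-invariant log error: the variance of the log difference
        "SILog": np.mean(log_diff ** 2) - np.mean(log_diff) ** 2,
        "threshold_delta_1.25": np.mean(ratio < 1.25),
        "threshold_delta_1.25^2": np.mean(ratio < 1.25 ** 2),
        "threshold_delta_1.25^3": np.mean(ratio < 1.25 ** 3),
    }
```

One design point worth noting: SILog subtracts the squared mean of the log difference, so it ignores a global scale error. Predicting exactly twice the true depth everywhere gives SILog = 0 but RMS_log = ln 2 ≈ 0.693.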

Addendum:

The code also has a few errors caused by Python package versions.

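Line 2 of model/metric.py fails with newer versions of scikit-image, where compare_ssim has been removed from skimage.measure: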

from skimage.measure import compare_ssim as ssim

Change it to

from skimage.metrics import structural_similarity as ssim

In utils/training_utils.py, change lines 99-101 to

ave_grads.append(lr*p.grad.abs().mean().cpu())
max_grads.append(lr*p.grad.abs().max().cpu())
min_grads.append(lr*p.grad.abs().min().cpu())

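That is, move the data to the CPU before plotting with matplotlib; matplotlib cannot convert CUDA tensors, so passing GPU tensors directly tends to raise an error.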