Event Camera Data Pre-training (Self-Supervised Learning)

Publisher: ICCV 2023 

MOTIVATION OF READING: self-supervised learning, sparse events = NILM

link: https://arxiv.org/pdf/2301.01928.pdf

Code: https://github.com/Yan98/Event-Camera-Data-Pre-training


1. Overview

Contributions are summarized as follows:

1. A self-supervised framework for event camera data pre-training. The pre-trained model can be transferred to diverse downstream tasks;

2. A family of event data augmentations, generating meaningful event images;

3. A conditional masking strategy, sampling informative event patches for network training;

4. An embedding projection loss, using paired RGB embeddings to regularize event embeddings to avoid model collapse;

5. A probability distribution alignment loss for aligning embeddings from the paired event and RGB images.

6. We achieve state-of-the-art performance on standard event benchmark datasets.

2. Related work

SSL frameworks can generally be divided into two categories: contrastive learning and masked modeling.

2.1 Contrastive learning

This approach generally assumes augmentation invariance of images. One notable drawback of contrastive learning is that it suffers from model collapse and training instability.

2.2 Masked modeling

Reconstructing masked inputs from the visible (i.e., unmasked) ones is a popular self-supervised learning objective motivated by the idea of autoencoding (e.g., BERT, GPT).

3. Methodology

For pre-training, our method takes event data E and its paired natural RGB image I as inputs, and outputs a pre-trained network fe.

First, we consecutively perform data augmentations, event image generation, and conditional masking to obtain two patch sets (xq, xk).

Secondly, fe extracts features from event patch set xq, and he_img and he_evt separately project features from fe to latent embeddings q_img and q_evt.

fm and hm_evt are the momentum versions of fe and he_evt, and are updated by the exponential moving average (EMA). (See the MoCo paper for the meaning of momentum.)
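For reference, a minimal sketch of the MoCo-style EMA update in PyTorch; the coefficient m = 0.999 is a typical value, not one taken from this paper:

```python
import torch

@torch.no_grad()
def ema_update(momentum_net, net, m=0.999):
    """MoCo-style exponential moving average: the momentum network's
    parameters slowly track the online network's parameters and are
    never updated by backpropagation."""
    for p_m, p in zip(momentum_net.parameters(), net.parameters()):
        p_m.data.mul_(m).add_(p.data, alpha=1.0 - m)
```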

The momentum network takes patch set xk as input and generates an embedding k_evt.

At the same time, the natural RGB image I is embedded into y = f1(h1(I)).

Finally, we perform event discrimination, and event and natural RGB image discrimination, to train the model. InfoNCE is not applied directly to the similarity between q_evt and k_evt, because doing so would cause embedding collapse, i.e., the embeddings become overly similar. The root cause is that event images are sparse and discrete; the projection from the RGB embeddings is therefore used instead.
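To make the data flow above concrete, here is a minimal PyTorch-style sketch of one pre-training forward pass. The function and its signature are illustrative, not the authors' code; the module names mirror the notation above:

```python
import torch

def pretrain_step(xq, xk, I, fe, he_img, he_evt, fm, hm_evt, f1, h1):
    """One pre-training forward pass, following the data flow described
    above. All modules are assumed to be torch.nn.Module instances."""
    # Online branch: encode the event patch set xq and project it into
    # two latent spaces.
    feat = fe(xq)
    q_img = he_img(feat)   # embedding matched against the RGB space
    q_evt = he_evt(feat)   # embedding used for event discrimination

    # Momentum branch: gradients are blocked; fm and hm_evt are updated
    # only through the EMA step, never by backprop.
    with torch.no_grad():
        k_evt = hm_evt(fm(xk))
        y = f1(h1(I))      # paired RGB embedding, as written above

    return q_img, q_evt, k_evt, y
```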

L_evt is an event embedding projection loss aiming to pull together the paired event embeddings q_evt and k_evt, for event discrimination.

L_RGB aims to pull together paired event and RGB embeddings q_evt and y, for event and natural RGB image discrimination.

L_kl aims to drive fe to learn discriminative event embeddings, towards the well-structured embedding space of natural RGB images.


InfoNCE loss

Contrastive learning aims to pull together the embeddings q and k+, and to push apart the embeddings q and {k−}.
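The notes omit the equation; for reference, the standard InfoNCE form with temperature τ (the common formulation, not a transcription from the paper) is:

```latex
\mathcal{L}_{\text{InfoNCE}} = -\log
\frac{\exp(q \cdot k^{+}/\tau)}
     {\exp(q \cdot k^{+}/\tau) + \sum_{k^{-}} \exp(q \cdot k^{-}/\tau)}
```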

Event embedding projection loss

ζ(v1, v2) is the projection function.
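The definition is not reproduced in these notes; assuming ζ denotes the standard vector projection of v1 onto v2 (an assumption, not confirmed by the source), it would read:

```latex
\zeta(v_1, v_2) = \frac{\langle v_1, v_2 \rangle}{\lVert v_2 \rVert^{2}}\, v_2
```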

Event and RGB image discrimination

Considering the sparsity of event images, a single event image is less informative than an RGB image, posing difficulty for self-supervised event network training.

We pull together embeddings of paired event and RGB images, xq and I.

We first compute the pairwise embedding similarities and then fit an exponential kernel to them to obtain probability scores. The probability score of the (i, j)-th pair is given by,
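The equation itself is missing here; a softmax over pairwise similarities with temperature τ is the usual instantiation of such an exponential kernel (an assumption, not a transcription from the paper):

```latex
p_{ij} = \frac{\exp\!\big(\mathrm{sim}(q^{evt}_i, y_j)/\tau\big)}
              {\sum_{k} \exp\!\big(\mathrm{sim}(q^{evt}_i, y_k)/\tau\big)}
```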

Our probability distribution alignment loss is given by,
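This equation is also absent from the notes; given the name and the L_kl notation above, a KL divergence between the RGB-side and event-side probability distributions is the natural form (assumed, up to the direction of the divergence):

```latex
\mathcal{L}_{kl} = \mathrm{KL}\!\left(p^{rgb} \,\Vert\, p^{evt}\right)
= \sum_{i,j} p^{rgb}_{ij} \log \frac{p^{rgb}_{ij}}{p^{evt}_{ij}}
```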

Total Loss
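The combined objective is not reproduced in these notes; assuming the three losses above are summed, with λ1 weighting the alignment term, it would take the form:

```latex
\mathcal{L} = \mathcal{L}_{evt} + \mathcal{L}_{RGB} + \lambda_1 \mathcal{L}_{kl}
```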

where λ1 is a hyper-parameter for balancing the losses.

4. Experiment

We evaluate our method on three downstream tasks: object recognition, optical flow estimation, and semantic segmentation.
