Paper reading notes on non-line-of-sight tracking: CVPR 2023, Propagate And Calibrate: Real-time Passive Non-line-of-sight Tracking

Post link: https://blog.csdn.net/qazwsxrx/article/details/131591182

Original link: https://mp.weixin.qq.com/s?__biz=Mzg4MjgxMjgyMg==&mid=2247486139&idx=1&sn=8420f3c7e95476f755ea8ebbcc13c86a&chksm=cf51b842f8263154377952245c4a26043fcbbd292fcaac68657669d9226bf83812073b14b32e#rd

0 Abstract

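Motivation
- Existing active NLOS tracking methods are impractical (due to oversimplified settings)
Research content
- A purely passive method is proposed: tracking a person in a room by observing a relay wall
Contributions:
- 1 PAC-Net is proposed
  
  ✅ It consists of alternating propagation and calibration, which lets it exploit both dynamic and static messages at frame-level granularity
- 2 The NLOS-Track dataset is built
  
  ✅ The first passive NLOS tracking dataset
  
  ✅ Video clips + corresponding trajectories
  
  ✅ Includes both real-shot and synthetic data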

Code and data: https://againstentropy.github.io/NLOS-Track/

1 Introduction

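Passive NLOS tracking:
- Tracking a target that is outside the camera's line of sight
Shortcomings of existing work:
- Active methods: the equipment is expensive and impractical
- At the algorithm level (applies to all existing methods):
  
  ❌ The object is localized independently in each frame (the positional relationship between adjacent time steps is ignored) $\Rightarrow$ this directly causes trajectory jitter and therefore inaccurate tracking
This paper:
- Recognizes the importance of exploiting motion information and the motion-continuity prior
  
  🚩 This helps produce more coherent and accurate tracking results
- Addresses the low signal-to-noise ratio (SNR)
  
  ❌ Previous methods: estimate the background as the temporal mean of the video and subtract it from every frame $\Rightarrow$ this amplifies inter-frame differences and thus raises the SNR
  
  ❌ Drawback of temporal-mean subtraction: it inevitably mixes in information from earlier periods of the video, reintroducing extra noise into an already low-SNR signal, which still hinders the extraction of subtle differences between frames
- The concrete solution in this paper:
  
  ✅ Use difference frames, which represent instantaneous motion information without introducing noise from other periods $\Rightarrow$ experiments show that difference frames do convey the essential dynamic messages
  
  ✅ Integrate the motion-continuity prior: PAC-Net is built from two dual modules, the Propagation-Cell and the Calibration-Cell $\Rightarrow$ it alternates between propagating with difference frames and calibrating with raw frames, which keeps the trajectory continuous
- Experimental results: centimeter-level accuracy for a walking person
The dataset: NLOS-Track
- The first passive NLOS tracking dataset
- Video clips + corresponding trajectories
- Includes 500 real-shot videos and more than 1000 synthetic videos
Summary of contributions:
- 1 Propose and formulate the purely passive NLOS tracking task
- 2 Propose PAC-Net, a passive NLOS tracking network that exploits both dynamic and static messages at the frame level
- 3 Build NLOS-Track, the first passive NLOS trajectory-tracking dataset (thousands of video clips with various scene settings)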

2 Related Work

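Passive NLOS
- Compared with other passive tracking methods, the method in this paper introduces no additional structures or special equipment
- This paper: a visible blank wall and a conventional RGB camera are enough for real-time tracking
Active NLOS localization and tracking
- Rely on active illumination and special equipment
NLOS datasets
- This paper presents the first passive NLOS tracking dataset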

3 Problem Formulation and Signal Extraction

3.1 NLOS Tracking Problem

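Passive imaging:
- $I=\mathcal{F}(\vec{x}, \Theta)$
- $\mathcal{F}$: imaging function; $\vec{x}$: position of the person; $\Theta$: scene configuration; $I$: photo of the relay wall
- $\mathcal{F}$ compresses the light field of the hidden region and projects it onto the relay wall
A change in the scene (such as the person's position) causes a corresponding change in the shadow on the relay wall:
- Mathematically, this can be expressed with a partial derivative:
- $\frac{\partial I}{\partial \vec{x}}=\frac{\partial \mathcal{F}(\vec{x}, \Theta)}{\partial \vec{x}}$
Goal of the network:
- Input: discrete observations of the relay wall, $\left\{I_0, \ldots, I_t, \ldots\right\}$ (raw frames of a video)
- Function: find an inverse imaging function $\mathcal{F}^{-1}$
- Output: reconstruct the causes (the person's trajectory) $\left\{\vec{x}_0, \ldots, \vec{x}_t, \ldots\right\}$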

3.2 Difference Frames

Shortcoming of existing work: it ignores the motion information in the tracking task, even though that information can play an important role in guiding the tracking process.
How can motion information be extracted? $\Rightarrow$ Use finite differences:

$\begin{aligned}\left.\frac{\Delta I}{\Delta \vec{x}}\right|_t & \approx\left.\frac{\partial \mathcal{F}(\vec{x}, \Theta)}{\partial \vec{x}}\right|_t \\ \Longrightarrow I_{t+1}-I_t=\Delta I_t & \approx\left.\frac{\partial \mathcal{F}(\vec{x}, \Theta)}{\partial \vec{x}}\right|_{\vec{x}=\vec{x}_t} \Delta \vec{x}_t \\ & =\mathcal{G}\left(\vec{x}_t, \Delta \vec{x}_t, \Theta\right)\end{aligned}$

$\Delta I_t$: the difference frame, obtained by subtracting consecutive raw frames, $\Delta I_t = I_{t+1}-I_t$
$\mathcal{G}$: the imaging function of the difference frame
Using difference frames therefore makes it possible to leverage dynamic motion information beyond static positions
- This significantly benefits the NLOS tracking task
Another benefit of difference frames: they are clean
- Background estimation and subtraction can raise the SNR, but they inevitably bring information from other times into every frame, which makes the static information "dirty"
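The contrast between difference frames and temporal-mean background subtraction can be illustrated with a small NumPy sketch. The function names and the toy random video are assumptions made for illustration, not the authors' code. It shows that $\Delta I_0$ depends only on frames 0 and 1, whereas the mean-subtracted frame 0 changes when a distant frame changes.

```python
import numpy as np

def difference_frames(frames: np.ndarray) -> np.ndarray:
    """Return Delta I_t = I_{t+1} - I_t for a video of shape (T, H, W)."""
    frames = frames.astype(np.float32)  # cast first: uint8 subtraction would wrap around
    return frames[1:] - frames[:-1]

def mean_subtracted(frames: np.ndarray) -> np.ndarray:
    """Subtract the temporal mean of the whole clip from every frame."""
    frames = frames.astype(np.float32)
    return frames - frames.mean(axis=0, keepdims=True)

# Toy video (hypothetical data): 5 frames of 4x4 pixels.
rng = np.random.default_rng(0)
video = rng.integers(0, 256, size=(5, 4, 4), dtype=np.uint8)
diff = difference_frames(video)

# Perturb only the last frame and recompute both representations.
perturbed = video.copy()
perturbed[4] = 255 - perturbed[4]

diff_unchanged = np.array_equal(difference_frames(perturbed)[0], diff[0])
mean_sub_changed = not np.allclose(mean_subtracted(perturbed)[0],
                                   mean_subtracted(video)[0])
print(diff.shape, diff_unchanged, mean_sub_changed)
```

A clip with $T$ raw frames yields $T-1$ difference frames, so the two streams are offset by one step.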

4 Tracking Method

PAC-Net: Propagation And Calibration Network
- Two streams:
  
  🚩 Raw frame stream $\{I_t\}$ $\Rightarrow$ static position information
  
  🚩 Difference frame stream $\{\Delta I_t\}$ $\Rightarrow$ dynamic motion information
- Combining the two streams: alternating propagation and calibration keeps the trajectory continuous
- PAC-Net: integrates the motion-continuity prior into its workflow with a specially designed alternating recurrent architecture.

4.1 PAC-Net

Network architecture (figure): http://qiniu.ruixu.top/b2bd6f477785bc7d508672bb466d368f4bcb9483d443c533cd4243cbedec871d.png

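Design idea:
- The difference stream and the raw stream are processed alternately, i.e., propagation and calibration
- Inside a PAC-Cell there are two symmetric units: the Propagation-Cell (P-Cell) and the Calibration-Cell (C-Cell)
P-Cell
- Uses the difference frame to propagate the hidden state between two observations
C-Cell
- The raw frame brings in absolute position information, which the C-Cell uses to calibrate the hidden state
Network structure:
- GRU cells serve as the recurrent units and ResNet-18 as the feature extractor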
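The alternation described above can be sketched roughly as follows. This is a toy NumPy illustration under several assumptions, not the authors' implementation: a random linear projection stands in for the ResNet-18 feature extractor, both streams share that stand-in encoder, a linear head decodes a 2D position after each calibration step, and all names (`TinyGRUCell`, `pac_track`, `encode`, `decode`) are hypothetical.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

class TinyGRUCell:
    """Minimal GRU cell with update gate, reset gate, and candidate state."""
    def __init__(self, in_dim, hid_dim, rng):
        self.W = rng.normal(0.0, 0.1, (3, hid_dim, in_dim))   # input weights
        self.U = rng.normal(0.0, 0.1, (3, hid_dim, hid_dim))  # recurrent weights
        self.b = np.zeros((3, hid_dim))

    def __call__(self, x, h):
        z = sigmoid(self.W[0] @ x + self.U[0] @ h + self.b[0])        # update gate
        r = sigmoid(self.W[1] @ x + self.U[1] @ h + self.b[1])        # reset gate
        n = np.tanh(self.W[2] @ x + r * (self.U[2] @ h) + self.b[2])  # candidate
        return (1.0 - z) * n + z * h

def pac_track(raw_frames, p_cell, c_cell, encode, decode, hid_dim):
    """Alternate propagation (difference frame) and calibration (raw frame)."""
    h = np.zeros(hid_dim)                  # zero-initialized hidden state
    h = c_cell(encode(raw_frames[0]), h)   # calibrate with the first raw frame
    trajectory = [decode(h)]
    for t in range(len(raw_frames) - 1):
        diff = raw_frames[t + 1] - raw_frames[t]
        h = p_cell(encode(diff), h)               # P-Cell: propagate with Delta I_t
        h = c_cell(encode(raw_frames[t + 1]), h)  # C-Cell: calibrate with I_{t+1}
        trajectory.append(decode(h))
    return np.stack(trajectory)

# Toy setup (all sizes are arbitrary assumptions).
rng = np.random.default_rng(0)
T, H, W_px, feat_dim, hid_dim = 6, 8, 8, 16, 32
frames = rng.random((T, H, W_px))
proj = rng.normal(0.0, 0.1, (feat_dim, H * W_px))  # stand-in feature extractor
head = rng.normal(0.0, 0.1, (2, hid_dim))          # stand-in position decoder
encode = lambda f: proj @ f.ravel()
decode = lambda h: head @ h
p_cell = TinyGRUCell(feat_dim, hid_dim, rng)
c_cell = TinyGRUCell(feat_dim, hid_dim, rng)
trajectory = pac_track(frames, p_cell, c_cell, encode, decode, hid_dim)
print(trajectory.shape)
```

The weights here are random, so the output trajectory is meaningless; the sketch only shows the order in which the hidden state passes through the two cells.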
Loss function:
- Composed of a position error $loss_x$ and a velocity (displacement) error $loss_v$
- Both are computed as mean squared error (MSE)
- A tuning parameter $\alpha_v$ controls the weight of $loss_v$
  
  $\begin{aligned} \text{Loss} & =\operatorname{loss}_x+\alpha_v \cdot \operatorname{loss}_v \\ & =\operatorname{MSE}\left(\left\{\tilde{\vec{x}}_t\right\},\left\{\vec{x}_t\right\}\right)+\alpha_v \cdot \operatorname{MSE}\left(\left\{\Delta \tilde{\vec{x}}_t\right\},\left\{\Delta \vec{x}_t\right\}\right)\end{aligned}$
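As a worked illustration of the loss above, here is a minimal NumPy sketch. It assumes that the predicted displacement $\Delta \tilde{\vec{x}}_t$ is obtained by differencing consecutive predicted positions; the network may instead predict displacements directly. The function name `pac_loss` and the toy trajectories are hypothetical.

```python
import numpy as np

def pac_loss(pred, target, alpha_v=1.0):
    """loss_x + alpha_v * loss_v for trajectories of shape (T, 2)."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    loss_x = np.mean((pred - target) ** 2)  # position MSE
    # Displacement between consecutive steps (assumed finite difference).
    loss_v = np.mean((np.diff(pred, axis=0) - np.diff(target, axis=0)) ** 2)
    return float(loss_x + alpha_v * loss_v)

target = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 1.0]])

# A constant offset has a position error but no displacement error.
offset_loss = pac_loss(target + 0.1, target)

# A single jittering point is penalized by both terms.
jitter = target.copy()
jitter[1, 0] += 0.2
jitter_loss = pac_loss(jitter, target)
print(offset_loss, jitter_loss)
```

The displacement term is what distinguishes a smooth but shifted trajectory from a jittery one with a similar position error.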

4.2 Warm-up

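Motivation for the warm-up stage:
- Before the video stream starts, the hidden state $\mathbf{h}$ of the GRU cells is zero-initialized
- Without warm-up, the tracked trajectory deviates from the ground truth in the early steps and only gradually converges over time
Warm-up stage:
- The warm-up stage is separated from the original tracking process, and two independent PAC-Cells are set up in PAC-Net
- The first, the "warm-up PAC-Cell", pulls the hidden state $\mathbf{h}$ from its zero initialization into a reasonable range of the distribution
- The second, the tracking PAC-Cell, focuses on accurate tracking by encoding each subsequent frame into a more precise embedding
Note:
- If there is a Warm-up Stage, i.e., $W > 0$, only the trajectory inferred in the Tracking Stage is used to supervise model training.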
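The supervision rule in the note can be sketched as a simple slice over time. This assumes $W$ counts warm-up time steps at the start of the clip and shows only the position term; the function name `tracking_stage_mse` and the toy data are hypothetical.

```python
import numpy as np

def tracking_stage_mse(pred, target, warmup_steps):
    """Position MSE over the Tracking Stage only (steps >= warmup_steps)."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    if not 0 <= warmup_steps < len(pred):
        raise ValueError("warmup_steps must leave at least one tracking step")
    err = pred[warmup_steps:] - target[warmup_steps:]
    return float(np.mean(err ** 2))

target = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]])
pred = target.copy()
pred[:2] += 5.0  # large error confined to the first two (warm-up) steps

loss_all = tracking_stage_mse(pred, target, warmup_steps=0)
loss_tracking = tracking_stage_mse(pred, target, warmup_steps=2)
print(loss_all, loss_tracking)
```

With the warm-up steps excluded, the early deviation caused by the zero-initialized hidden state no longer contributes to the training signal.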

5 Experiments

5.1 NLOS-Track dataset

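NLOS-Track focuses on realistic dynamic scenes and includes real-shot data
Real data:
- Captured with two cameras
- One films the relay surface (the measurement data)
- The other films from above to obtain the trajectory
Synthetic data
- Based on Mixamo, Adobe's free animation platform
- Rendered on an A100

Other settings
- Random characters
- Different clothing
- Varied light-source position, brightness, and camera position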

5.2 Metrics

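RMS
- Root-mean-square error

Three metrics that measure the similarity between two curves:

Area
- The area between the two curves
DTW
- Dynamic Time Warping
PCM
- Partial Curve Mapping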
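Two of these metrics are easy to sketch in NumPy. The definitions below are the textbook ones and are assumptions here: RMS is taken over per-step Euclidean distances, and DTW is the classic dynamic-programming form with Euclidean point cost. The paper's exact normalization may differ, and the function names are hypothetical.

```python
import numpy as np

def rms_error(pred, gt):
    """Root-mean-square of per-step Euclidean distances; inputs have shape (T, 2)."""
    sq_dist = np.sum((np.asarray(pred, float) - np.asarray(gt, float)) ** 2, axis=1)
    return float(np.sqrt(sq_dist.mean()))

def dtw_distance(a, b):
    """Classic O(n*m) dynamic time warping cost between two point sequences."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])

gt = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
pred = gt + np.array([0.0, 0.5])  # same path shifted by 0.5 along y
slow = np.array([[0.0, 0.0], [0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])  # same path, one repeated point

rms = rms_error(pred, gt)
dtw_shifted = dtw_distance(pred, gt)
dtw_slow = dtw_distance(slow, gt)
print(rms, dtw_shifted, dtw_slow)
```

The `slow` example shows why DTW is a curve-similarity measure rather than a per-step error: a trajectory that follows the same path with different timing still gets zero cost.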

5.3 Results

Compared with baseline methods:
- vanilla CNN
- C-Net (without dynamic motion information)
- P-Net (without static position information)

Validating the effectiveness of warm-up:

Quantitative comparison

6 Limitations and Future work

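Limited to 2D indoor tracking
- Not applicable in the wild
- Not applicable to 3D tracking
Limited to single-target tracking
- Not applicable to multi-target tracking
Semi-supervised / self-supervised NLOS tracking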

7 Conclusion

propose and formulate the task of real-time passive NLOS tracking
- difference frame
PAC-Net
- maintaining good continuity and stability via processing raw frames and difference frames alternately
- warm-up
NLOS-Track dataset
- first passive NLOS tracking dataset
- 500 real videos and 1000+ synthetic videos