[Point Cloud Reading Notes] Learned Point Cloud Geometry Compression

Jonathan_Paul

Last modified 2022-03-19 11:24:29


First published 2022-03-18 11:56:24

Article link: https://blog.csdn.net/weixin_43444175/article/details/123570884


Reading Report: Learned Point Cloud Geometry Compression

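The paper proposes an end-to-end learned point cloud geometry compression (PCGC) model with hyperpriors. The resulting model has few parameters, performs well, and offers better parallelism.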

Worth noticing in Intro

Hyperpriors are used to improve the conditional probability modeling of latent features. The paper uses a variational autoencoder (VAE) structure to exploit this hyperprior information.
End-to-end learning has been shown to give better rate-distortion performance.

Model & Structure


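The overall framework is roughly the same as the one in yesterday's post, [Point Cloud Reading Notes] Point Cloud Coding: Adopting a Deep Learning-based Approach, but this paper describes it in more detail.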

Pre-processing


Pre-processing consists of three steps: voxelization, scaling, and partitioning.

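Voxelization simply divides space into small cubes (voxels), so each point is represented by the integer coordinates $(i, j, k)$ of the voxel it occupies.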

We also need the notion of precision: for voxel coordinates $(i, j, k)$, the point cloud precision sets the maximum achievable value in each dimension. For example, with 10-bit precision, $0 \leq i, j, k \leq 2^{10}-1 = 1023$.
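A minimal sketch of voxelizing raw float coordinates onto a $b$-bit grid, so every coordinate lands in $[0, 2^b - 1]$. It assumes the points are normalized by their bounding box first, which is a common choice and not necessarily how the paper's datasets were prepared; `voxelize` is a hypothetical helper name.

```python
def voxelize(points, bits):
    """Map float (x, y, z) points onto a 2**bits grid; duplicates merge into one voxel.

    Hypothetical helper: the bounding-box normalization is an assumption.
    """
    max_val = 2 ** bits - 1  # largest coordinate allowed at this precision
    mins = [min(p[d] for p in points) for d in range(3)]
    maxs = [max(p[d] for p in points) for d in range(3)]
    # One common extent for all axes keeps the aspect ratio intact.
    extent = max(hi - lo for lo, hi in zip(mins, maxs)) or 1.0
    voxels = set()
    for p in points:
        # Scale each axis into [0, max_val] and round to an integer voxel index.
        voxels.add(tuple(round((p[d] - mins[d]) / extent * max_val) for d in range(3)))
    return voxels
```

For example, `voxelize([(0.0, 0.0, 0.0), (1.0, 0.5, 0.25), (1.0, 0.5, 0.2501)], bits=10)` yields two voxels, because the last two points fall into the same cell at 10-bit precision.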

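Scaling simply multiplies the coordinates by a factor $s$ and rounds the result. Here $s < 1$, so scaling is a form of downsampling. (The paper also suggests, as future work, learning an adaptive scaling factor inside the neural network.)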

$\begin{aligned}\hat{\mathbf{X}}_{n} &=\operatorname{ROUND}\left(\mathbf{X}_{n} \times s\right) \\&=\operatorname{ROUND}\left(i_{n} \times s, j_{n} \times s, k_{n} \times s\right)\end{aligned}$
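The scaling equation can be sketched directly. This is an illustration, not the paper's code: `downscale` is a hypothetical name, and since this is occupancy-only geometry, voxels that round onto the same coordinate are merged by storing them in a set.

```python
def downscale(voxels, s):
    """Apply X_hat = ROUND(X * s) to every coordinate; s < 1 downsamples.

    Voxels that collapse onto the same coordinate are merged (a set keeps one copy).
    Note: Python's round() rounds halves to even; the paper's ROUND may treat
    halves differently.
    """
    if not 0 < s <= 1:
        raise ValueError("scaling factor s is expected to be in (0, 1]")
    return {tuple(round(c * s) for c in v) for v in voxels}
```

With $s = 0.5$, the four voxels $(0,0,0)$ to $(3,3,3)$ on the diagonal shrink to three, which shows how downscaling reduces the number of points to code.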

Partitioning splits the point cloud into non-overlapping cubes of size $W\times W\times W$. Note that only non-empty cubes (those containing at least one point) are encoded.
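Partitioning amounts to integer division of each coordinate by $W$. The sketch below (hypothetical `partition` helper) groups voxels by cube index and stores each voxel's local position inside its cube; empty cubes never appear in the result, matching the rule that only non-empty cubes are coded.

```python
from collections import defaultdict


def partition(voxels, W):
    """Split occupied voxels into non-overlapping W x W x W cubes.

    Returns {cube_index: [local coordinates]}; only non-empty cubes appear.
    """
    cubes = defaultdict(list)
    for v in voxels:
        cube_idx = tuple(c // W for c in v)  # which cube the voxel falls in
        local = tuple(c % W for c in v)      # position inside that cube
        cubes[cube_idx].append(local)
    return dict(cubes)
```

Storing local coordinates means every cube can be fed to the same network with a fixed $W \times W \times W$ input size.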

Cube-based Learned-PCGC

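This is the paper's clearest idea: in image compression, learned 2D transforms built from stacked CNN-based autoencoders have shown strong coding performance by efficiently exploiting local and global spatial correlations. The authors therefore use 3D CNNs to carry these image-compression ideas over to point clouds.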

First, note that the analysis transform and the synthesis transform form an encoder-decoder pair, and both the main codec and the hyperprior codec use such a pair.


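In the hyperprior codec, the transforms mainly serve to model the entropy of the latent features. The hyper codec therefore uses three consecutive lightweight 3D convolutions, which also downsample further. (The paper says: "Given that hyperpriors are mainly used for latent feature entropy modeling, we apply three consecutive lightweight 3D convolutions (with further downsampling mechanism embedded) instead of in hyper codec." This seems to contain a typo; it should probably read "instead of in main codec", i.e., the hyper codec uses lightweight convolutions in place of the heavier blocks of the main codec.)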

The main codec, in addition to its three convolutional stages, adds a Voxception-ResNet (VRN) module. The goal is to use the strengths of residual networks for feature extraction.
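To see what "further downsampling" means for the latent size, the sketch below traces the spatial resolution of one $W \times W \times W$ cube through a stack of 3D convolutions using the standard output-size formula. The kernel, stride, and padding values are hypothetical placeholders, not the paper's exact configuration.

```python
def conv_out(size, kernel, stride, padding):
    """Standard convolution output-size formula, applied per spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_shapes(W, layers):
    """Return the spatial size after each (kernel, stride, padding) layer."""
    sizes = [W]
    for kernel, stride, padding in layers:
        sizes.append(conv_out(sizes[-1], kernel, stride, padding))
    return sizes


# Hypothetical settings: three stride-2 downsampling convs in the main analysis
# transform; each halves the size when kernel=3 and padding=1.
main_layers = [(3, 2, 1)] * 3
```

Under these assumptions `trace_shapes(64, main_layers)` gives `[64, 32, 16, 8]`; any extra stride-2 layers in the hyper codec would shrink the hyperprior further, which is why it is cheap to transmit.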

Quantization

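Because direct rounding is not differentiable (its gradient is zero almost everywhere), quantization during training is approximated by adding noise to the original latent feature vector $y$:

$\hat y = y + u$

where $u \sim \mathcal{U}(-\frac{1}{2}, \frac{1}{2})$, i.e., noise drawn uniformly from $(-\frac{1}{2}, \frac{1}{2})$. At inference time, actual rounding is applied.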
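A minimal sketch of the two quantization modes on plain Python floats. In a real model this would operate on tensors inside an autodiff framework; `quantize` and its `training` flag are hypothetical names used only for illustration.

```python
import random


def quantize(y, training, rng=random):
    """Uniform-noise proxy during training, hard rounding at inference.

    y: list of latent values. Adding u ~ U(-1/2, 1/2) keeps the operation
    differentiable in an autodiff framework; plain floats just show the values.
    """
    if training:
        return [v + rng.uniform(-0.5, 0.5) for v in y]
    return [float(round(v)) for v in y]
```

The noisy values stay within half a step of the originals, which is the same error range that rounding introduces, so the training-time proxy matches the inference-time quantizer in distribution of error.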

Entropy Rate Modeling

The paper compresses the quantized latent features with arithmetic coding. It also makes the following point:

In theory, the entropy bound of a source symbol (e.g., a feature element) is closely related to its probability distribution; more importantly, accurate rate estimation plays a key role in lossy compression for rate-distortion optimization.

The actual bit rate is:

$R_{\hat{y}}=E_{\hat{y}}\left[-\log _{2} p_{\hat{y}}(\hat{y})\right]$
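As a concrete illustration of $R_{\hat{y}}$, the sketch below uses the empirical frequency of each quantized value as a stand-in for $p_{\hat{y}}$. This is an idealized bound that an arithmetic coder approaches; the paper instead uses a learned probability model. `empirical_rate` is a hypothetical name.

```python
import math
from collections import Counter


def empirical_rate(symbols):
    """Average bits per symbol, R = E[-log2 p(y_hat)], with p taken as the
    empirical frequency of each quantized value (an idealized entropy coder)."""
    counts = Counter(symbols)
    n = len(symbols)
    return sum(-math.log2(counts[s] / n) for s in symbols) / n
```

Four equally likely values cost 2 bits each, while a constant sequence costs 0 bits; the better the model predicts $\hat{y}$, the lower the rate.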

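Rate modeling can be improved with prior information: if enough prior information $\hat z$ is available, it can be used to estimate the distribution of $\hat y$ more accurately. Here the prior $\hat z$ is obtained by passing the latent features through the hyperprior codec's analysis transform, which downsamples them further, and quantizing the result. (This part is hard for me to follow; corrections are welcome, thanks!)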

I honestly can't follow the equations below; any pointers from readers would be much appreciated!

$p_{\hat{z} \mid \psi}(\hat{z} \mid \psi)=\prod_{i}\left(p_{\hat{z}_{i} \mid \psi^{(i)}}\left(\psi^{(i)}\right) * \mathcal{U}\left(-\frac{1}{2}, \frac{1}{2}\right)\right)\left(\hat{z}_{i}\right)$

$p_{\hat{y} \mid \hat{z}}(\hat{y} \mid \hat{z})=\prod_{i}\left(\mathcal{L}\left(\mu_{i}, \sigma_{i}\right) * \mathcal{U}\left(-\frac{1}{2}, \frac{1}{2}\right)\right)\left(\hat{y}_{i}\right)$
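In words, the second equation models each element $\hat{y}_i$ as a Laplacian with mean $\mu_i$ and scale $\sigma_i$, which are presumably predicted from $\hat{z}$ by the hyper synthesis transform, convolved with $\mathcal{U}(-\frac{1}{2}, \frac{1}{2})$ to match the uniform-noise quantization. The first equation does the same for $\hat{z}$, but with a learned, factorized density whose per-element parameters are $\psi^{(i)}$ instead of a Laplacian. Convolving a density with the unit uniform and evaluating at $\hat{y}_i$ equals integrating the density over $[\hat{y}_i - \frac{1}{2}, \hat{y}_i + \frac{1}{2}]$, i.e., a difference of CDFs. The sketch below computes this for the Laplacian case, treating $\sigma_i$ as the Laplace scale parameter (an assumption); the function names are hypothetical.

```python
import math


def laplace_cdf(x, mu, b):
    """CDF of a Laplace distribution with location mu and scale b."""
    if x < mu:
        return 0.5 * math.exp((x - mu) / b)
    return 1.0 - 0.5 * math.exp(-(x - mu) / b)


def likelihood(y_hat, mu, b):
    """(L(mu, b) * U(-1/2, 1/2))(y_hat): the Laplace density integrated over
    [y_hat - 1/2, y_hat + 1/2], written as a difference of CDFs."""
    return laplace_cdf(y_hat + 0.5, mu, b) - laplace_cdf(y_hat - 0.5, mu, b)


def bits(y_hat, mu, b, floor=1e-9):
    """Estimated cost in bits of one element; the small floor avoids log2(0)
    for values far in the tails."""
    return -math.log2(max(likelihood(y_hat, mu, b), floor))
```

A value at the predicted mean is cheap, a value far from it is expensive, and a smaller scale makes values near the mean cheaper still. Summed over all integers, the likelihoods add up to 1, so they form a valid distribution for the arithmetic coder.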

Rate-distortion Optimization

It has two parts: rate estimation and distortion measurement. For rate estimation, the paper uses:
$\begin{aligned} &R_{\hat{y}}=\sum_{i}-\log _{2}\left(p_{\hat{y}_{i} \mid \hat{z}_{i}}\left(\hat{y}_{i} \mid \hat{z}_{i}\right)\right) \\ &R_{\hat{z}}=\sum_{i}-\log _{2}\left(p_{\hat{z}_{i} \mid \psi^{(i)}}\left(\hat{z}_{i} \mid \psi^{(i)}\right)\right) \end{aligned}$
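Given per-element likelihoods (for example from a conditional model like the Laplacian one above), $R_{\hat{y}}$ and $R_{\hat{z}}$ are plain sums of $-\log_2 p$. The sketch also divides the total by the number of input points to get bits per point (bpp), the usual way geometry bitrates are reported; that normalization is not stated in these notes, and both function names are hypothetical.

```python
import math


def rate_bits(likelihoods):
    """R = sum_i -log2 p_i over all elements of one latent tensor."""
    return sum(-math.log2(p) for p in likelihoods)


def bits_per_point(y_likelihoods, z_likelihoods, num_points):
    """Total estimated bits (R_y_hat + R_z_hat) normalized by the point count."""
    return (rate_bits(y_likelihoods) + rate_bits(z_likelihoods)) / num_points
```

The hyperprior bits $R_{\hat{z}}$ are side information, so they must be counted in the total even though they only help code $\hat{y}$.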
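For distortion measurement, the paper uses a weighted binary cross-entropy (WBCE):
$D_{\mathrm{WBCE}}=\frac{1}{N_{o}} \sum^{N_{o}}-\log p_{\tilde{x}_{o}}+\alpha \frac{1}{N_{n}} \sum^{N_{n}}-\log \left(1-p_{\tilde{x}_{n}}\right)$
where $N_o$ and $N_n$ are the numbers of occupied and empty voxels, $p_{\tilde{x}_o}$ and $p_{\tilde{x}_n}$ are the predicted occupancy probabilities of an occupied voxel $\tilde{x}_o$ and an empty voxel $\tilde{x}_n$ in the reconstruction, and $\alpha$ weights the empty-voxel term.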
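A minimal sketch of $D_{\mathrm{WBCE}}$ on flat lists of voxel probabilities and ground-truth labels. The occupied and empty terms are averaged separately and then combined with weight $\alpha$, presumably because empty voxels vastly outnumber occupied ones in a cube. The natural logarithm and the `eps` clamp are assumptions of this sketch, and `wbce` is a hypothetical name.

```python
import math


def wbce(probs, occupied, alpha=1.0, eps=1e-9):
    """Weighted binary cross-entropy D_WBCE.

    probs: predicted occupancy probability of each voxel in the reconstruction.
    occupied: matching list of bools (ground truth).
    alpha: weight of the empty-voxel term.
    eps: clamp that avoids log(0) for fully confident wrong predictions.
    """
    occ = [-math.log(max(p, eps)) for p, o in zip(probs, occupied) if o]
    emp = [-math.log(max(1.0 - p, eps)) for p, o in zip(probs, occupied) if not o]
    d_occ = sum(occ) / len(occ) if occ else 0.0  # mean over the N_o occupied voxels
    d_emp = sum(emp) / len(emp) if emp else 0.0  # mean over the N_n empty voxels
    return d_occ + alpha * d_emp
```

Perfect predictions give zero distortion, and setting `alpha=0.0` ignores errors on empty voxels entirely, which shows how $\alpha$ trades off missing points against spurious ones.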