Ⅰ 3D-LMNet: Latent Embedding Matching for Accurate and Diverse 3D Point Cloud Reconstruction from a Single Image [1]
1 Task
3D reconstruction from single-view images.
2 Method
- Stage 1: train a point cloud auto-encoder $(E_p, D_p)$ to learn a latent space $\mathcal{Z} \in \mathcal{R}^k$ of 3D point clouds.
– Loss function: Chamfer distance
- Stage 2: train an image encoder $E_I$ to map 2D images into this learnt latent space $\mathcal{Z}$.
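The Chamfer distance used to train the Stage-1 auto-encoder can be sketched as follows (a minimal NumPy version for illustration; real implementations batch this on the GPU):

```python
import numpy as np

def chamfer_distance(p, q):
    """Symmetric Chamfer distance between two point sets.

    p: (N, 3) array, q: (M, 3) array. Each point in one set is matched to
    its nearest neighbour in the other set (squared distance), and the two
    directions are averaged and summed.
    """
    # Pairwise squared distances, shape (N, M).
    d2 = np.sum((p[:, None, :] - q[None, :, :]) ** 2, axis=-1)
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()
```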
($b$): Match vectors in the latent space $\mathcal{Z}$. The latent vector of a 2D image should equal the latent vector of its corresponding 3D point cloud, so that at inference time the 3D point cloud $\hat{X}_I$ can be decoded from $z_I$.
– Latent matching loss: $\mathcal{L}_1(z_I-z_p)=|z_I-z_p|$ or $\mathcal{L}_2(z_I-z_p)=\|z_I-z_p\|^2$
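A minimal sketch of the two latent matching variants (the function name and NumPy setting are mine, not the paper's):

```python
import numpy as np

def latent_matching_loss(z_img, z_pc, variant="l1"):
    """Match the image latent z_I to the point-cloud latent z_p.

    'l1' uses the mean absolute difference, 'l2' the mean squared
    difference.
    """
    diff = z_img - z_pc
    if variant == "l1":
        return np.mean(np.abs(diff))
    return np.mean(diff ** 2)
```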
($c$): Generate multiple plausible outputs by learning a probabilistic distribution in the latent space.
– Reparameterization trick
– Formulate the latent representation $z_1$ of a specific input image $I_1$ as a Gaussian random variable, i.e. $z_1 \sim \mathcal{N}(\mu,\sigma^2)$. The image encoder predicts the mean $\mu$ and standard deviation $\sigma$ of the distribution, and $\epsilon\sim\mathcal{N}(0,1)$ is sampled to obtain the latent vector as $z_1=\mu+\epsilon\sigma$.
– Diversity loss: penalizes $\sigma$ for straying far from zero for unambiguous views, while giving it the liberty to explore the latent space for ambiguous views: $\mathcal{L}_{div}=\left(\sigma-\eta e^{-\frac{(\phi_i-\phi_0)^2}{\delta^2}}\right)^2$, where $\phi_i$ is the azimuth of the input image $I$ and $\phi_0$ is the azimuth of the maximally occluded view.
– For ($c$), the total loss is $\mathcal{L}=\mathcal{L}_{lm}+\lambda\mathcal{L}_{div}$.
– At inference time, varying $\epsilon$ produces different 3D point cloud predictions.
– In the paper's qualitative results, an ambiguous 2D input image yields noticeably different predicted point clouds as $\epsilon$ varies, whereas for an unambiguous input, different values of $\epsilon$ barely change the prediction.
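The sampling step and the diversity loss above can be sketched together (a toy NumPy version; $\eta$ and $\delta$ are hyperparameters, and the values in the test are illustrative, not the paper's):

```python
import numpy as np

def sample_latent(mu, sigma, rng):
    """Reparameterization trick: z = mu + eps * sigma, eps ~ N(0, 1)."""
    return mu + rng.standard_normal(mu.shape) * sigma

def diversity_loss(sigma, phi_i, phi_0, eta, delta):
    """L_div = (sigma - eta * exp(-(phi_i - phi_0)^2 / delta^2))^2.

    Near the maximally occluded azimuth phi_0 the target for sigma is
    eta > 0 (exploration is allowed); far from it the target decays to 0.
    """
    target = eta * np.exp(-((phi_i - phi_0) ** 2) / delta ** 2)
    return (sigma - target) ** 2
```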
3 Result
Ⅱ PointFlow: 3D Point Cloud Generation with Continuous Normalizing Flows [2]
1 Task
Generate 3D point clouds
2 Method
Model 3D point clouds as a distribution of distributions: learn a two-level hierarchy in which the first level is the distribution of shapes, and the second, given a shape, is the distribution of points.
- Training: the 3D point cloud is passed through the encoder $Q_\phi$ to obtain the shape representation $z=\mu+\sigma\epsilon$. Because $F_\phi$ and $G_\theta$ are both continuous normalizing flows (and hence invertible), the prior over shape representations $P_\phi(z)$ and the likelihood $P_\theta(X|z)$ can be computed during training, yielding the losses $\mathcal{L}_{prior}$ and $\mathcal{L}_{recon}$.
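The invertibility that makes these likelihoods tractable is the change-of-variables formula. A toy affine flow (standing in for a trained CNF) illustrates it:

```python
import numpy as np

def gaussian_logpdf(x):
    # Standard-normal log-density, summed over dimensions.
    return np.sum(-0.5 * x ** 2 - 0.5 * np.log(2 * np.pi))

def flow_logpdf(z, scale, shift):
    """Log-density of z under the affine flow z = scale * w + shift, w ~ N(0, I).

    Change of variables: log p(z) = log p(w) - sum(log|scale|), with w
    recovered by inverting the flow. A CNF replaces this closed-form
    Jacobian with an ODE-based trace integral, but the principle is the same.
    """
    w = (z - shift) / scale  # invert the flow
    return gaussian_logpdf(w) - np.sum(np.log(np.abs(scale)))
```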
- Test: sample $\hat{\omega}\sim\mathcal{N}(0,I)$, then pass it through $F_\phi$ to obtain the shape representation $\hat{z}=F_\phi(\hat\omega)$.
Repeat $\tilde M$ times:
Sample a point $\tilde y\in \mathcal{R}^3$ from $\mathcal{N}(0,I)$, then feed $\tilde y$ into $G_\theta$ to obtain a point on shape $z$: $\tilde x=G_\theta(\tilde y;z)$.
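The two-level sampling procedure can be sketched as follows, with plain callables standing in for the trained flows $F_\phi$ and $G_\theta$:

```python
import numpy as np

def sample_point_cloud(prior_flow, point_flow, n_points, dim_w, rng):
    """Two-level sampling: first a shape, then points on that shape.

    prior_flow and point_flow are placeholders for trained CNFs.
    """
    w = rng.standard_normal(dim_w)            # w ~ N(0, I)
    z = prior_flow(w)                         # shape representation z = F_phi(w)
    ys = rng.standard_normal((n_points, 3))   # y ~ N(0, I) per point
    return np.array([point_flow(y, z) for y in ys])
```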
Ⅲ End-to-End Differentiable Learning of Protein Structure [3]
1 Task
Prediction of protein structure from sequence.
2 Highlights
- Predicts protein structure from the amino-acid sequence with a neural network, without requiring co-evolution information.
- Learns a low-dimensional representation of protein sequence space.
3 Method
(1) Encode the protein sequence with a recurrent neural network.
(2) Parameterize local protein structure by torsion angles, so the model can reason over different conformations.
(3) Couple local protein structure to its global representation via recurrent geometric units.
(4) Use a differentiable loss function to capture the deviation between the predicted and true structures.
Three-stage recurrent geometric network (RGN)
1) computation
Input: each computational unit receives one residue at a time (as position-specific scoring matrices (PSSMs)?).
Output: each computational unit outputs three numbers, representing the torsion angles (three-dimensional) of the input residue.
Computational units are based on LSTMs.
2) geometry
Input: torsion angles + the partially assembled backbone from the upstream geometric unit.
Output: the backbone extended by this residue.
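Per atom, a geometric unit performs the standard internal-to-Cartesian conversion: placing the next backbone atom from a bond length, bond angle, and torsion angle relative to the three preceding atoms. A sketch (my own minimal version, applied per backbone atom):

```python
import numpy as np

def extend(a, b, c, bond_length, bond_angle, torsion):
    """Place atom d after the three preceding atoms a, b, c.

    bond_angle and torsion are in radians; the displacement is expressed
    in a local orthonormal frame built from the previous two bonds.
    """
    bc = c - b
    bc /= np.linalg.norm(bc)
    n = np.cross(b - a, bc)
    n /= np.linalg.norm(n)
    m = np.cross(n, bc)
    # Local displacement in the (bc, m, n) frame.
    d_local = bond_length * np.array([
        -np.cos(bond_angle),
        np.sin(bond_angle) * np.cos(torsion),
        np.sin(bond_angle) * np.sin(torsion),
    ])
    return c + d_local[0] * bc + d_local[1] * m + d_local[2] * n
```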
3) assessment
Computes the deviation between the predicted structure and the experimental structure.
Metric: Distance-based root mean square deviation (dRMSD)
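The dRMSD metric compares the two structures' internal pairwise-distance matrices, which makes it invariant to rotation and translation (and differentiable). A minimal sketch:

```python
import numpy as np

def drmsd(x, y):
    """Distance-based RMSD between two (N, 3) coordinate arrays.

    Root-mean-square difference between the pairwise-distance matrices
    of the two structures, over unique atom pairs.
    """
    dx = np.linalg.norm(x[:, None] - x[None, :], axis=-1)
    dy = np.linalg.norm(y[:, None] - y[None, :], axis=-1)
    iu = np.triu_indices(len(x), k=1)  # unique pairs only
    return np.sqrt(np.mean((dx[iu] - dy[iu]) ** 2))
```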
[1] Mandikal P, Murthy N, Agarwal M, et al. 3D-LMNet: Latent Embedding Matching for Accurate and Diverse 3D Point Cloud Reconstruction from a Single Image[J]. arXiv preprint arXiv:1807.07796, 2018.
[2] Yang G, Huang X, Hao Z, et al. PointFlow: 3D Point Cloud Generation with Continuous Normalizing Flows[J]. arXiv preprint arXiv:1906.12320, 2019.
[3] AlQuraishi M. End-to-end differentiable learning of protein structure[J]. Cell Systems, 2019, 8(4): 292-301.e3.