What are the coronal and sagittal views of the images?
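Encoder-Decoder framework
Encoding converts the input sequence into a fixed-length vector; decoding converts that fixed-length vector back into an output sequence.
Encode, store, decode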
Input:
A sequence of 2D projections denoted as $\left\{X_{1}, X_{2}, \cdots, X_{N}\right\}$, where $X_{i} \in \mathbb{R}^{H_{2\mathrm{D}} \times W_{2\mathrm{D}}}$ for all $1 \leq i \leq N$ and $N$ is the number of given 2D projections. Each entry is a pixel-wise intensity value.
Output:
The goal is to generate a volumetric 3D image $Y$ describing the corresponding 3D physical scene. The prediction is $Y_{\text{pred}} \in \mathbb{R}^{C_{3\mathrm{D}} \times H_{3\mathrm{D}} \times W_{3\mathrm{D}}}$, while $Y_{\text{truth}} \in \mathbb{R}^{C_{3\mathrm{D}} \times H_{3\mathrm{D}} \times W_{3\mathrm{D}}}$ is the ground-truth 3D image serving as the reconstruction target. Each entry is a voxel-wise intensity value.
In order to use a sequence of 2D projections as model input, we stack all the 2D projections together as a single 3D tensor. In other words, a set of 2D projections $\left\{X_{1}, X_{2}, \cdots, X_{N}\right\}$ $\left(X_{i} \in \mathbb{R}^{H_{2\mathrm{D}} \times W_{2\mathrm{D}}}\right)$ are stacked as a 3D volume $Z \in \mathbb{R}^{N \times H_{2\mathrm{D}} \times W_{2\mathrm{D}}}$, where $N$ is the number of 2D projections. In what follows, we introduce the model
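
The stacking step described above can be sketched in NumPy as follows. This is only an illustration under stated assumptions, not the original implementation: the function name `stack_projections`, the sizes `N = 3` and `H_2D = W_2D = 128`, and the random pixel values are all hypothetical.

```python
import numpy as np


def stack_projections(projections):
    """Stack N 2D projections of shape (H_2D, W_2D) into one (N, H_2D, W_2D) array.

    `stack_projections` is a hypothetical helper name used for illustration.
    """
    if len(projections) == 0:
        raise ValueError("need at least one 2D projection")
    expected = projections[0].shape
    for i, x in enumerate(projections):
        # Every X_i must be 2D and share the same height and width.
        if x.ndim != 2 or x.shape != expected:
            raise ValueError(
                f"projection {i} has shape {x.shape}, expected 2D shape {expected}"
            )
    # The new leading axis indexes the projections, so Z[i] is X_{i+1}.
    return np.stack(projections, axis=0)


# Hypothetical sizes and random intensities, chosen only for the demo.
N, H_2D, W_2D = 3, 128, 128
rng = np.random.default_rng(0)
projections = [rng.random((H_2D, W_2D), dtype=np.float32) for _ in range(N)]

Z = stack_projections(projections)
print(Z.shape)  # (3, 128, 128)
```

With a single projection (N = 1), the result has shape `(1, H_2D, W_2D)`, so the same input layout covers both the single-view and the multi-view case.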