QN-Mixer: A Quasi-Newton MLP-Mixer Model for Sparse-View CT Reconstruction
Ishak Ayad1,2*†
1 ETIS (UMR 8051), CY Cergy Paris University, ENSEA, CNRS, France
2 AGM (UMR 8088), CY Cergy Paris University, CNRS, France
3 University of Ljubljana, Slovenia
ishak.ayad@cyu.fr
Figure 1. CT Reconstruction with 32 views of State-of-the-Art Methods. Comparative analysis with post-processing and first-order unrolling networks highlights QN-Mixer’s superiority in artifact removal, training time, and data efficiency.
Abstract
Inverse problems span across diverse fields. In medical contexts, computed tomography (CT) plays a crucial role in reconstructing a patient’s internal structure, presenting challenges due to artifacts caused by inherently ill-posed inverse problems. Previous research advanced image quality via post-processing and deep unrolling algorithms but faces challenges, such as extended convergence times with ultra-sparse data. Despite enhancements, resulting images often show significant artifacts, limiting their effectiveness for real-world diagnostic applications. We aim to explore deep second-order unrolling algorithms for solving imaging inverse problems, emphasizing their faster convergence and lower time complexity compared to common first-order methods like gradient descent. In this paper, we introduce QN-Mixer, an algorithm based on the quasi-Newton approach. We use learned parameters through the BFGS algorithm and introduce Incept-Mixer, an efficient neural architecture that serves as a non-local regularization term, capturing long-range dependencies within images. To address the computational demands typically associated with quasi-Newton algorithms that require full Hessian matrix computations, we present a memory-efficient alternative. Our approach intelligently downsamples gradient information, significantly reducing computational requirements while maintaining performance. The approach is validated through experiments on the sparse-view CT problem, involving various datasets and scanning protocols, and is compared with post-processing and deep unrolling state-of-the-art approaches. Our method outperforms existing approaches and achieves state-of-the-art performance in terms of SSIM and PSNR, all while reducing the number of unrolling iterations required.
1. Introduction
Computed tomography (CT) is a widely used imaging modality in medical diagnosis and treatment planning, delivering intricate anatomical details of the human body with precision. Despite its success, CT is associated with high radiation doses, which can increase the risk of cancer induction [50]. Adhering to the ALARA principle (As Low As Reasonably Achievable) [37], the medical community emphasizes minimizing radiation exposure to the lowest level necessary for accurate diagnosis. Numerous approaches have been proposed to reduce radiation doses while maintaining image quality. Among these, sparse-view CT emerges as a promising solution, effectively lowering radiation doses by subsampling the projection data, often referred to as the sinogram. Nonetheless, images reconstructed with the well-known Filtered Back Projection (FBP) algorithm [34] suffer from pronounced streaking artifacts (see Fig. 1), which can lead to misdiagnosis. The challenge of effectively reconstructing high-quality CT images from sparse-view data is gaining increasing attention in both the computer vision and medical imaging communities.
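To make the idea of subsampling the projection data concrete, the following is a minimal sketch, not the paper's data pipeline. It assumes a sinogram stored as one row of detector readings per projection angle and keeps evenly spaced rows. The function name `subsample_views`, the 512-angle full scan, and the 4-bin detector are hypothetical choices for illustration; only the 32-view setting comes from Fig. 1.

```python
def subsample_views(sinogram, num_views):
    """Keep `num_views` evenly spaced projection rows from a full sinogram.

    `sinogram` is a list of rows, one row of detector readings per angle.
    Returns the kept angle indices and the corresponding rows.
    """
    total = len(sinogram)
    if not 0 < num_views <= total:
        raise ValueError("num_views must be between 1 and the number of rows")
    # Evenly spaced angle indices: 0, total/num_views, 2*total/num_views, ...
    indices = [(i * total) // num_views for i in range(num_views)]
    return indices, [sinogram[i] for i in indices]


# Toy full-view sinogram (assumed sizes): 512 angles, 4 detector bins each.
# Each row simply stores its own angle index as a placeholder value.
full = [[float(angle)] * 4 for angle in range(512)]
kept_indices, sparse = subsample_views(full, 32)
```

With these assumed sizes, every 16th projection is retained, so the sparse sinogram carries 1/16 of the measurements that FBP or a learned method would then have to reconstruct from.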
*Corresponding author. †Equal contribution.
With the success of deep learning spanning diverse domains, initial image-domain techniques [6, 19, 25, 28, 59] have been introduced as post-processing tasks on the FBP reconstructed images, exhibiting notable accomplishments in artifact removal and structure preservation. However, the inherent limitations of these methods arise from their constrained receptive fields, leading to challenges in effectively capturing global information and, consequently, suboptimal results.
To address this limitation, recent advances have seen a shift toward a dual-domain approach [18, 27, 29, 49], where post-processing methods turn to the sinogram domain. In this dual-domain paradigm, deep neural networks are employed to perform interpolation tasks on the sinogram data [15, 24], facilitating more accurate image reconstruction. Despite the significant achievements of post-processing and dual-domain methods, they confront issues of interpretability and performance limitations, especially when working with small datasets and ultra-sparse-view data, as shown in Fig. 1. To tackle these challenges, deep unrolling networks have been introduced [1, 7, 8, 11, 16, 20, 51, 54]. Unrolling networks treat the sparse-view CT reconstruction problem as an optimization task, resulting in a first-order iterative algorithm like gradient descent, which is subsequently unrolled into a deep recurrent neural network in order to learn the optimization parameters and the regularization term. Like post-processing techniques, unrolling networks have been extended to the sinogram domain [52, 56] to perform interpolation tasks.
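The unrolling idea can be illustrated with a minimal sketch, which is not the architecture of any cited method. It assumes a small linear forward operator `A`, a data term of the form 0.5 * ||Ax - y||^2, and a fixed number of gradient steps, one per unrolled "layer". In an actual unrolling network the per-layer step sizes and the regularizer gradient would be learned; here the step sizes are fixed constants and a Tikhonov term `reg_weight * x` stands in for the learned regularizer. All names and values are hypothetical.

```python
def matvec(A, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def transpose(A):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def unrolled_gradient_descent(A, y, step_sizes, reg_weight=0.0):
    """Run one gradient step per entry of `step_sizes` (one per unrolled layer).

    Each step computes x <- x - eta * (A^T (A x - y) + reg_weight * x).
    """
    At = transpose(A)
    x = [0.0] * len(A[0])  # start from a zero image
    for eta in step_sizes:
        residual = [p - q for p, q in zip(matvec(A, x), y)]
        data_grad = matvec(At, residual)  # gradient of the data-fidelity term
        # Stand-in for the learned regularizer gradient (assumption: Tikhonov).
        grad = [g + reg_weight * xi for g, xi in zip(data_grad, x)]
        x = [xi - eta * gi for xi, gi in zip(x, grad)]
    return x


# Toy problem (assumed values): 3 measurements of a 2-pixel "image".
A = [[2.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
x_true = [1.0, -2.0]
y = matvec(A, x_true)
x_hat = unrolled_gradient_descent(A, y, step_sizes=[0.15] * 50)
```

Even on this well-conditioned toy problem, the first-order scheme needs dozens of steps to approach `x_true`, which is the kind of iteration count that second-order updates aim to reduce.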
Unrolling networks, as referenced in [12, 36, 44], exhibit remarkable performance across diverse domains. However, they suffer from slow convergence and high computational costs, as illustrated in Fig. 1, necessitating the development of more efficient alternatives [14]. More specifically, they confront two main issues: Firstly, they frequently grapple with capturing long-range dependencies due to their dependence on locally-focused regularization terms using CNNs. This limitation results in suboptimal outcomes, particularly evident in tasks such as image reconstruction. Secondly, the escalating computational costs of unrolling methods align with the general trend of increased complexity in modern neural networks. This escalation not only amplifies the required number of iterations due to the algorithm’s iterative nature but also contributes to their high computational demand.
To tackle the aforementioned issues, we introduce a novel second-order unrolling network for sparse-view CT reconstruction. In particular, to enable the learnable regularization term to apprehend long-range interactions within the image, we propose a non-local regularization block termed Incept-Mixer. Drawing inspiration from the multilayer perceptron mixer [46] and the inception architecture [45], it is created to combine the best features from both sides: capturing long-range interactions from the attention-like mechanism of MLP-Mixer and extracting local invaria