EMNLP 2023 - Oral Long Paper - Granularity Matters: Pathological Graph-driven Cross-modal Alignment for Brain CT Report Generation
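Preface
Since joining my research group, I have been looking for new breakthroughs in Medical Report Generation and have achieved some results. Below, I share a work published at EMNLP 2023.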
[Oral Long Paper] “Granularity Matters: Pathological Graph-driven Cross-modal Alignment for Brain CT Report Generation”
Paper download: https://aclanthology.org/2023.emnlp-main.408/
For more information, see my video on Bilibili.
Link: My_Oral_Video
Introduction
Brain CT examination is widely used in the diagnosis of cranial diseases. However, writing reports can be time-consuming and error-prone for radiologists.
Automatic Brain CT report generation can improve the efficiency and accuracy of diagnosing brain diseases.
Current methods employ various Cross-modal alignment mechanisms to enforce the consistency of salient pathological features between the visual and textual modalities.
For instance, the Cross-modal Attention Mechanism tends to concentrate on specific visual regions to mimic clinical observations during report generation.
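To make the attention idea concrete, here is a minimal sketch of generic scaled dot-product attention, in which one textual query (for example, the decoder state for the next word) attends over visual region features. This is the textbook mechanism, not the exact formulation of any particular report-generation model, and the feature values are made up for illustration.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_modal_attention(query: Vector, regions: List[Vector]) -> Tuple[List[float], Vector]:
    """Scaled dot-product attention of one textual query over visual region features."""
    d = len(query)
    scores = [sum(q * r for q, r in zip(query, region)) / math.sqrt(d) for region in regions]
    weights = softmax(scores)
    # Context vector: attention-weighted sum of the region features.
    context = [sum(w * region[i] for w, region in zip(weights, regions)) for i in range(d)]
    return weights, context


# Hypothetical 2-d features for three image regions and one word query.
weights, context = cross_modal_attention([5.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]])
print([round(w, 3) for w in weights])
```

Almost all of the weight falls on the first and third regions, whose features point in the same direction as the query. This is the "concentrate on specific visual regions" behaviour in miniature.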
In addition, the Cross-modal Memory Mechanism employs a memory matrix to record recurring visual-textual patterns.
Moreover, Cross-modal Contrastive Learning facilitates unsupervised feature alignment and has proven effective on our small-scale medical dataset.
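The distinguishing idea of a memory mechanism is that the matrix is a set of learned slots shared across samples and across both modalities, rather than the regions of the current image. The sketch below assumes one common formulation, in which a feature reads the memory by softmax attention over its slots. The memory values and features are hypothetical, and `read_memory` is an illustrative helper, not a published API.

```python
import math
from typing import List

Vector = List[float]

# Hypothetical memory matrix shared by both modalities: each row is one learned slot.
MEMORY: List[Vector] = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]


def read_memory(query: Vector, memory: List[Vector]) -> Vector:
    """Return the softmax-weighted mixture of memory slots most similar to the query."""
    d = len(query)
    scores = [sum(q * m for q, m in zip(query, slot)) / math.sqrt(d) for slot in memory]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    return [sum(w * slot[i] for w, slot in zip(weights, memory)) for i in range(d)]


# A visual feature and a textual feature (made-up values) query the same memory.
visual_response = read_memory([4.0, 0.5, 0.0], MEMORY)
textual_response = read_memory([3.0, 0.0, 0.2], MEMORY)
```

Because both modalities are re-expressed as mixtures of the same slots, related visual and textual features end up dominated by the same slot. Here that is slot 0 for both.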
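Cross-modal contrastive learning is commonly implemented with an InfoNCE-style objective. Within a batch, each scan's embedding should be closer to its own report's embedding than to the other reports' embeddings. The sketch below is the generic symmetric variant (image-to-text plus text-to-image). It is not necessarily the exact loss used in prior report-generation work, and the temperature of 0.1 and the toy embeddings are assumptions.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def info_nce(images: List[Vector], texts: List[Vector], temperature: float = 0.1) -> float:
    """Symmetric InfoNCE: the i-th image and the i-th report form the positive pair."""
    n = len(images)
    sim = [[cosine(images[i], texts[j]) / temperature for j in range(n)] for i in range(n)]

    def cross_entropy(logits: List[float], target: int) -> float:
        # Stable log-sum-exp minus the positive logit.
        top = max(logits)
        log_z = top + math.log(sum(math.exp(s - top) for s in logits))
        return log_z - logits[target]

    img_to_txt = sum(cross_entropy(sim[i], i) for i in range(n)) / n
    txt_to_img = sum(cross_entropy([sim[i][j] for i in range(n)], j) for j in range(n)) / n
    return (img_to_txt + txt_to_img) / 2


# Hypothetical embeddings: matched pairs point in similar directions.
images = [[1.0, 0.0], [0.0, 1.0]]
texts = [[1.0, 0.1], [0.1, 1.0]]
print(info_nce(images, texts), info_nce(images, texts[::-1]))
```

Correctly paired scans and reports give a much lower loss than the shuffled pairing. No labels beyond the pairing itself are needed, which is why this counts as unsupervised alignment.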
Challenges:
- The first is Coarse-grained Supervision: training data in image-text format lacks the detailed supervision needed to recognize subtle abnormalities.
- The second is Coupled Cross-modal Alignment: visual-textual alignment may be coupled in a coarse-grained manner, which can lead to tangled feature representations for report generation.
Contribution:
Method
Now, let’s delve into our framework.
At its core, the framework takes a series of Brain CT scans as input and generates a medical report.
Our model contains two parallel branches.
First off, there’s the Brain CT report generation branch.
Alongside it, we have the Pathological Graph-driven Cross-modal Alignment (PGCA) branch, designed to learn the cross-modal consistency of pathologies and thereby improve report generation.
These two branches collaborate through shared visual and textual embedding layers.
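The text above does not specify how the two branches are trained jointly. Purely as an illustration of why shared layers let the branches collaborate, the toy below makes three assumptions: a single shared scalar weight stands in for the shared embedding layers, each branch has a one-sample squared-error loss, and the objective is a weighted sum L = L_gen + λ·L_align. The value of λ (`lam`), the learning rate, and the data are hypothetical. This is not the paper's training objective.

```python
from typing import Tuple

Example = Tuple[float, float]  # (input x, target y) for a one-weight linear "branch"


def branch_grads(w: float, gen: Example, align: Example) -> Tuple[float, float]:
    """Gradients of the two toy squared-error losses w.r.t. the shared weight w."""
    (xg, yg), (xa, ya) = gen, align
    grad_gen = 2 * (w * xg - yg) * xg      # from L_gen = (w*xg - yg)^2
    grad_align = 2 * (w * xa - ya) * xa    # from L_align = (w*xa - ya)^2
    return grad_gen, grad_align


def train_shared_weight(gen: Example, align: Example, lam: float = 1.0,
                        lr: float = 0.05, steps: int = 200) -> float:
    """Gradient descent on L = L_gen + lam * L_align over one shared weight."""
    w = 0.0
    for _ in range(steps):
        g_gen, g_align = branch_grads(w, gen, align)
        # Both branches act on the same parameter, so their gradients are summed.
        w -= lr * (g_gen + lam * g_align)
    return w


# Each branch alone would prefer w = 2.0 or w = 4.0; jointly, w settles in between.
w_joint = train_shared_weight(gen=(1.0, 2.0), align=(1.0, 4.0), lam=1.0)
w_gen_only = train_shared_weight(gen=(1.0, 2.0), align=(1.0, 4.0), lam=0.0)
```

With `lam=0.0`, the alignment branch has no influence. With any positive `lam`, the alignment signal reshapes the shared parameter that the generation branch also uses.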
Now, let’s dive deeper into the details of each of these branches.
Here is our PGCA branch.
First, we organize a Pathological Graph to encompass clinically significant attributes, such as tissues represented in green and lesions in purple.
The intuition behind dividing attributes into tissues and lesions is that together they form the backbone of a medical report. As the figure shows, the fundamental structure of a diagnostic sentence revolves around the relationship between a tissue and a lesion.
To capture this, we select key tissue and lesion entities as graph nodes and connect them through intra-attribute edges, which are fixed by expert knowledge and convey common medical understanding. Additionally, we establish inter-attribute edges that adapt dynamically to the actual tissue-lesion relations in reports, reflecting specific clinical observations. These edges partition the graph into three distinct sub-graphs: the tissue graph, the lesion graph, and the tissue-lesion graph.
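The graph construction can be sketched as follows, under clearly stated assumptions. The tissue and lesion vocabularies and the expert edges are hypothetical placeholders, not the paper's entity lists. Inter-attribute edges are assumed to be recomputed for each report by pairing every tissue and lesion mentioned in the same sentence. The helper names `tissue_lesion_pairs` and `build_pathological_graph` are illustrative only.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]

# Hypothetical vocabularies and expert edges (not the paper's actual lists).
TISSUES: List[str] = ["ventricle", "basal ganglia", "midline"]
LESIONS: List[str] = ["hemorrhage", "edema", "shift"]
EXPERT_EDGES: List[Edge] = [("ventricle", "midline"), ("hemorrhage", "edema")]


def tissue_lesion_pairs(report: str) -> Set[Edge]:
    """Pair every tissue with every lesion mentioned in the same sentence (naive keyword match)."""
    pairs: Set[Edge] = set()
    for sentence in report.lower().split("."):
        tissues = [t for t in TISSUES if t in sentence]
        lesions = [les for les in LESIONS if les in sentence]
        pairs.update((t, les) for t in tissues for les in lesions)
    return pairs


def build_pathological_graph(report: str) -> Dict[str, Set[Edge]]:
    """Split the graph's edges into the tissue, lesion and tissue-lesion sub-graphs."""
    graph: Dict[str, Set[Edge]] = {"tissue": set(), "lesion": set(), "tissue-lesion": set()}
    # Intra-attribute edges: fixed prior knowledge, identical for every report.
    for a, b in EXPERT_EDGES:
        edge = (min(a, b), max(a, b))  # undirected edge in a canonical order
        if a in TISSUES and b in TISSUES:
            graph["tissue"].add(edge)
        elif a in LESIONS and b in LESIONS:
            graph["lesion"].add(edge)
        else:
            raise ValueError(f"expert edge {a}-{b} must join nodes of the same attribute")
    # Inter-attribute edges: derived from the tissue-lesion relations in this report.
    graph["tissue-lesion"] = tissue_lesion_pairs(report)
    return graph


g = build_pathological_graph("Hemorrhage in the left basal ganglia. Midline shift is seen.")
print(g["tissue-lesion"])
```

The tissue and lesion sub-graphs stay the same from report to report, while the tissue-lesion sub-graph changes with each report. Plain keyword matching ignores negation (for example, "no hemorrhage"), so a practical pipeline would need negation handling before building these edges.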
Experiment
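Some conference photos
Conference registration
Conference presentation
Picture with Ronan Le Bras, Allen Institute for AI
Social dinner
At the poster session
Conference outing at Universal Studios
Best paper award selection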
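Summary
Overall, the conference experience was excellent. I met many friends in the field, and the exchanges broadened my knowledge; I came away with a great deal. Thank you to EMNLP 2023 for this valuable opportunity to exchange ideas and learn. I will keep following this conference and strive to produce even better work in the future!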