Summary of Hybrid Retrieval-Generation Reinforced Agent for Medical Image Report Generation (NIPS 2018)

Hybrid Retrieval-Generation Reinforced Agent for Medical Image Report Generation

Paper link: https://proceedings.neurips.cc/paper/2018/file/e07413354875be01a996dc560274708e-Paper.pdf
This post is intended for readers who already have some background in image captioning and reinforcement learning.

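Abstract

Generating long, coherent reports that describe medical images is challenging, because it requires bridging visual patterns with informative human-language descriptions. We propose a novel Hybrid Retrieval-Generation Reinforced Agent (HRGR-Agent), which reconciles traditional retrieval-based approaches, built on human prior knowledge, with modern learning-based approaches to achieve structured, robust, and diverse report generation. HRGR-Agent uses a hierarchical decision-making procedure. For each sentence, a high-level retrieval policy module chooses either to retrieve a template sentence from an off-the-shelf template database or to invoke a low-level generation module to generate a new sentence. HRGR-Agent is updated via reinforcement learning, guided by sentence-level and word-level rewards. Experiments show that our approach achieves state-of-the-art results on two datasets, generating well-balanced, structured, and robust medical report content. In addition, our model achieves the highest detection precision on medical abnormality terminology and improved human evaluation performance.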

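Contributions

The main contribution is to bridge rule-based (retrieval) and learning-based generation through reinforcement learning, which yields plausible, correct, and diverse medical reports. In addition, compared with existing retrieval-generation models, our HRGR-Agent makes three technical contributions:
(1) The retrieval and generation modules are updated together and benefit from each other through policy learning.
(2) The retrieval action is treated as part of generation: the choice of template directly affects the final generated result.
(3) The generation module is encouraged to learn diverse and complex sentences while the retrieval policy module learns template-like sentences, driven by distinct word-level and sentence-level rewards. Other work still forces the generation model to predict template-like sentences.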

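Overall pipeline: a set of images is passed through a CNN to extract visual features, which the image encoder then converts into a context vector. A sentence decoder then recurrently generates a sequence of hidden states q, where each q represents a sentence topic. Given each topic state qi, a retrieval policy module decides either to trigger the generation module to generate a new sentence automatically or to retrieve an existing template from the template database. Both the retrieval policy module and the generation module make discrete decisions and are updated with the REINFORCE algorithm, using sentence-level and word-level rewards designed for the two modules.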

Method

Hybrid Retrieval-Generation Reinforced Agent

Image Encoder

Given the images I, a DenseNet or VGG-19 extracts the visual features v, and the image encoder converts v into a context vector h^v.

Sentence Decoder

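The visual features and the hidden state of the previous time step are passed through the attention mechanism to obtain an attentive context vector.

The context vector and the previous hidden state are fed to the stacked RNN to produce the current hidden state.

A projection of the current hidden state gives the topic state q.

A second projection followed by a sigmoid gives the stop-control probability z in [0, 1], which decides when generation ends.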
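The four sentence-decoder steps above can be written compactly as follows. This is a sketch based on the description in this post: the function names F^s_attn and F^s_RNN, the weights W_q, b_q, W_z, b_z, and the use of h^v for the encoded visual features are notation assumed here for illustration, with i indexing sentences and sigma denoting a non-linear activation.

```latex
\begin{aligned}
c^s_i &= F^s_{\mathrm{attn}}\left(h^v,\; h^s_{i-1}\right) \\
h^s_i &= F^s_{\mathrm{RNN}}\left(c^s_i,\; h^s_{i-1}\right) \\
q_i   &= \sigma\left(W_q h^s_i + b_q\right) \\
z_i   &= \mathrm{Sigmoid}\left(W_z h^s_i + b_z\right)
\end{aligned}
```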

Retrieval Policy Module

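Given each topic state qi, the retrieval policy module takes two steps. First, it predicts a probability distribution ui over the actions of generating a new sentence and retrieving one of the templates in T. Second, based on that prediction, it triggers the corresponding action. If automatic generation receives the highest probability, the generation module is activated to generate a sequence of words conditioned on the current topic state. If a template in T receives the highest probability, that template is retrieved from the off-the-shelf template database and used as the generation result for the current sentence topic. Index 0 denotes the probability of choosing automatic generation, and the positive integers in {1, ..., |T|} index the probabilities of choosing the templates in T. The first step is a fully connected layer with softmax activation applied to qi, and mi is the index of the highest probability in ui.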
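The two-step decision described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`softmax`, `retrieval_policy`, `act`), the weights `W` and `b`, the toy topic state, and the two template sentences are all hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def retrieval_policy(q, W, b):
    """Step 1: fully connected layer + softmax over |T| + 1 actions.

    Returns (u, m): the action distribution and the index of its maximum.
    """
    logits = [sum(w * x for w, x in zip(row, q)) + bias
              for row, bias in zip(W, b)]
    u = softmax(logits)
    m = max(range(len(u)), key=u.__getitem__)
    return u, m

def act(m, templates, generate_fn):
    """Step 2: index 0 triggers generation, index k >= 1 retrieves template k."""
    if m == 0:
        return generate_fn()
    return templates[m - 1]

# Toy example (hypothetical values): 2 templates, so 3 actions in total.
templates = ["No pleural effusion.", "The heart size is normal."]
q = [0.2, -0.1, 0.5]
W = [[0.1, 0.0, 0.2],   # action 0: generate a new sentence
     [0.0, 1.0, 0.0],   # action 1: retrieve templates[0]
     [1.0, 0.0, 2.0]]   # action 2: retrieve templates[1]
b = [0.0, 0.0, 0.0]

u, m = retrieval_policy(q, W, b)
sentence = act(m, templates, lambda: "<generated sentence>")
```

During training the action would be sampled from ui rather than taken greedily, so that the policy gradient can explore; the greedy argmax shown here corresponds to inference.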

Generation Module

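For each sentence, this module generates a sequence of words conditioned on the current topic state qi and the image context vector h^v. It consists of RNNs that take an environment parameter and the previous hidden state as input and produce a new hidden state ht, which is further projected to a probability distribution at over all words in the vocabulary V, where t indexes the t-th word. The environment parameter is the concatenation of the current topic state qi, a context vector ct encoded with the same attention scheme as in the sentence decoder, and the embedding of the previous word. Each word is generated with one attentive decoding step, in which O denotes the one-hot encoding of a word.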
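One way to write the attentive decoding step described above is shown below. It is a reconstruction from the prose, so treat the details as assumptions: F^g_attn and F^g_RNN stand for the attention and RNN functions of the generation module, W_y, b_y, and W_e are assumed weight names, and [ ; ] denotes concatenation.

```latex
\begin{aligned}
c^g_t &= F^g_{\mathrm{attn}}\left(h^v,\; [e_{t-1};\, q_i],\; h^g_{t-1}\right) \\
h^g_t &= F^g_{\mathrm{RNN}}\left([c^g_t;\, e_{t-1};\, q_i],\; h^g_{t-1}\right) \\
a_t   &= \mathrm{Softmax}\left(W_y h^g_t + b_y\right) \\
y_t   &= \arg\max\left(a_t\right) \\
e_t   &= W_e\, O(y_t)
\end{aligned}
```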

Reward Module

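We use the automatic metric CIDEr to compute rewards, since recent work on image captioning has shown that CIDEr performs better than many traditional automatic metrics such as BLEU, METEOR, and ROUGE. We consider two reward functions: a sentence-level reward and a word-level reward. For the i-th sentence yi, whether retrieved or generated, the sentence-level reward is a delta CIDEr score: the CIDEr score f of the first i sentences against the ground-truth report gt, minus the score of the first i-1 sentences.

For a single word, the reward is likewise a delta CIDEr score, computed against the ground-truth sentence instead of the whole report.

The sentence-level and word-level rewards are used to compute the discounted rewards for the retrieval policy module and the generation module.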
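The delta-reward idea can be illustrated with a short sketch. To stay self-contained it uses a unigram F1 overlap as a stand-in for CIDEr, which is an assumption made only for this example; the function names, the toy report, and the discount factor `gamma=0.9` are likewise hypothetical and not taken from the paper.

```python
from collections import Counter

def overlap_f1(candidate_tokens, reference_tokens):
    """Stand-in metric (unigram F1). This is NOT CIDEr; it only keeps the sketch runnable."""
    if not candidate_tokens or not reference_tokens:
        return 0.0
    matched = sum((Counter(candidate_tokens) & Counter(reference_tokens)).values())
    if matched == 0:
        return 0.0
    precision = matched / len(candidate_tokens)
    recall = matched / len(reference_tokens)
    return 2 * precision * recall / (precision + recall)

def delta_rewards(units, reference, metric=overlap_f1):
    """Reward of unit i = metric(first i units) - metric(first i-1 units).

    `units` is a list of token lists: sentences for the sentence-level
    reward, or single-word lists for the word-level reward.
    """
    rewards, prefix, previous = [], [], 0.0
    for unit in units:
        prefix = prefix + unit
        current = metric(prefix, reference)
        rewards.append(current - previous)
        previous = current
    return rewards

def discounted_returns(rewards, gamma=0.9):
    """Discounted sum of future rewards for every step (gamma is an assumed value)."""
    returns, running = [], 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    return returns[::-1]

# Toy example with a hypothetical two-sentence report.
reference = "the heart is normal . no pleural effusion .".split()
sentences = ["the heart is normal .".split(), "no pneumothorax .".split()]
sentence_rewards = delta_rewards(sentences, reference)
returns = discounted_returns(sentence_rewards, gamma=0.9)
```

Because each reward is a difference of consecutive prefix scores, the rewards sum to the score of the full output, so every sentence or word is credited only for what it adds.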

Hierarchical Reinforcement Learning

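Our goal is to maximize the reward of the generated report Y with respect to the ground-truth report Y*. The loss function is therefore the negative expected reward, L(θ) = -E[R(Y, Y*)], where the expectation is taken over the discrete decisions made by the agent.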

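Policy Update for Retrieval Policy Module

[Equation: policy-gradient update for the retrieval policy module]

Policy Update for Generation Module

[Equation: policy-gradient update for the generation module]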
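Both policy updates follow the REINFORCE pattern: scale the gradient of the log-probability of the chosen action by the return that followed it. The sketch below shows that pattern for a single softmax decision and is a simplification under stated assumptions: it updates the logits directly instead of backpropagating through a network, and the names `reinforce_logit_grads` and `sgd_step` and the learning rate are hypothetical. For the retrieval policy module the return would be the discounted sentence-level reward, and for the generation module the discounted word-level reward.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def reinforce_logit_grads(logits, action, ret):
    """Gradient of the loss -ret * log pi(action) with respect to the logits.

    For a softmax policy, d(-log pi(a)) / d logit_k = pi_k - 1[k == a].
    """
    probs = softmax(logits)
    return [ret * (p - (1.0 if k == action else 0.0))
            for k, p in enumerate(probs)]

def sgd_step(logits, grads, lr=0.1):
    """One plain gradient-descent step on the logits (lr is an assumed value)."""
    return [x - lr * g for x, g in zip(logits, grads)]

# Toy example: three actions, action 1 was taken and earned a positive return.
logits = [0.0, 0.0, 0.0]
grads = reinforce_logit_grads(logits, action=1, ret=2.0)
new_logits = sgd_step(logits, grads)

# A negative return pushes the probability of the taken action down instead.
bad_logits = sgd_step(logits, reinforce_logit_grads(logits, action=1, ret=-2.0))
```

In the toy example the probability of action 1 rises after the positive-return update and falls after the negative-return one, which is the behavior the reward design relies on.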

Experimental results

[Figure: experimental results]

[Figure: experimental results]
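Personal view:
To my knowledge, this paper is the first to introduce reinforcement learning into the medical report generation task. However, relying on a template database amounts to adding some supervision in the form of prior knowledge. Compared with the metrics reported in On the Automatic Generation of Medical Imaging Reports, proposed in the same year, this paper's CIDEr is slightly higher and all its other metrics are lower. I also have a working reproduction; for details see my blog post: On the Automatic Generation of Medical Imaging Reports, GitHub source code reproduction. I am currently modifying the algorithm, and peers with ideas are welcome to discuss with me.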
