1) Dataset source: https://iuhealth.org/find-medical-services/x-rays (IU, Indiana University Health; the dataset is not provided directly on this site)
Dataset overview:
The Indiana University Chest X-Ray Collection (IU X-Ray) is a set of chest X-ray images paired with their corresponding diagnostic reports. The dataset contains 7,470 image-report pairs, split 6,470 / 500 / 500 for training, validation, and testing.
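A minimal sketch of producing a split of these sizes, assuming a simple seeded random shuffle over the 7,470 pairs. The helper `split_pairs`, the seed, and the placeholder file names are hypothetical and are not taken from the original authors' code.

```python
import random


def split_pairs(pairs, n_val=500, n_test=500, seed=0):
    """Shuffle (image, report) pairs and split them into train/val/test.

    Hypothetical helper: the 500/500 held-out sizes follow the split above,
    but the seed and shuffling procedure are assumptions.
    """
    if n_val + n_test > len(pairs):
        raise ValueError("not enough pairs for the requested split")
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)  # deterministic for a fixed seed
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


# Placeholder pairs standing in for the real 7,470 image/report pairs.
pairs = [(f"image_{i}.png", f"report_{i}") for i in range(7470)]
train, val, test = split_pairs(pairs)
print(len(train), len(val), len(test))  # 6470 500 500
```

In this collection one report is often paired with more than one image (the source paper below reports 3,996 reports and 8,121 images), so splitting by report rather than by pair would keep the same report from appearing in more than one split. The sketch splits by pair only for simplicity.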
Each report consists of the following sections: impression, findings, tags, comparison, and indication. On average, each image is associated with 2.2 tags and 5.7 sentences, and each sentence contains 6.5 words.
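The per-report quantities behind these averages can be extracted roughly as below. This is a sketch assuming the reports are XML files in which each section is an `AbstractText` element with a `Label` attribute and the tags sit under `MeSH/major`. That layout, the sample report, the helpers `parse_report` and `split_sentences`, and the choice to count sentences over findings plus impression are all assumptions to check against the actual download. Averaging these per-report counts over every report would give dataset-level figures comparable to the ones above.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical report in an assumed OpenI-style XML layout
# (AbstractText/@Label for sections, MeSH/major for tags).
SAMPLE_XML = """<eCitation>
  <MedlineCitation>
    <Article>
      <Abstract>
        <AbstractText Label="COMPARISON">None.</AbstractText>
        <AbstractText Label="INDICATION">Chest pain.</AbstractText>
        <AbstractText Label="FINDINGS">The heart is normal in size. The lungs are clear.</AbstractText>
        <AbstractText Label="IMPRESSION">No acute disease.</AbstractText>
      </Abstract>
    </Article>
  </MedlineCitation>
  <MeSH>
    <major>normal</major>
  </MeSH>
</eCitation>"""


def parse_report(xml_text):
    """Return ({section_name: text}, [tags]) for one report."""
    root = ET.fromstring(xml_text)
    sections = {
        el.get("Label", "").lower(): (el.text or "").strip()
        for el in root.iter("AbstractText")
    }
    tags = [el.text.strip() for el in root.iter("major") if el.text]
    return sections, tags


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


sections, tags = parse_report(SAMPLE_XML)
# Assumption: sentence statistics are computed over findings + impression.
sentences = split_sentences(sections.get("findings", "")) + split_sentences(
    sections.get("impression", "")
)
words_per_sentence = [len(s.split()) for s in sentences]
print(sorted(sections))  # ['comparison', 'findings', 'impression', 'indication']
print(tags)  # ['normal']
print(len(sentences), words_per_sentence)  # 3 [6, 4, 3]
```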
In addition, the top 1,000 words cover 99.0% of word occurrences in the dataset, so only the top 1,000 words are included in the dictionary.
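The top-1,000 vocabulary rule can be sketched as follows, assuming the reports are already lower-cased and tokenized into words. The helpers `build_vocab` and `encode`, and the `<unk>` token for out-of-vocabulary words, are illustrative choices, not the original authors' code. The toy example uses `top_k=4` so the coverage can be checked by hand: the kept words account for 9 of the 12 tokens.

```python
from collections import Counter


def build_vocab(tokenized_reports, top_k=1000, unk="<unk>"):
    """Build a word->id map from the top_k most frequent words.

    Returns the mapping (with `unk` at id 0 for out-of-vocabulary words) and
    the fraction of all word occurrences covered by the kept words.
    """
    counts = Counter(w for tokens in tokenized_reports for w in tokens)
    kept = counts.most_common(top_k)
    coverage = sum(c for _, c in kept) / sum(counts.values())
    word_to_id = {unk: 0}
    for word, _ in kept:
        word_to_id[word] = len(word_to_id)
    return word_to_id, coverage


def encode(tokens, word_to_id, unk="<unk>"):
    """Map tokens to ids, sending words outside the vocabulary to `unk`."""
    return [word_to_id.get(w, word_to_id[unk]) for w in tokens]


# Toy, already-tokenized reports (lower-cased, punctuation removed).
reports = [
    ["the", "lungs", "are", "clear"],
    ["the", "heart", "is", "normal"],
    ["the", "lungs", "are", "normal"],
]
word_to_id, coverage = build_vocab(reports, top_k=4)
print(coverage)  # 0.75
print(encode(["the", "heart", "is", "clear"], word_to_id))  # [1, 0, 0, 0]
```

On the real reports, calling `build_vocab(reports)` with the default `top_k=1000` would be the way to check the 99.0% coverage figure quoted above.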
2) Obtaining the dataset:
Paper: Preparing a collection of radiology examinations for distribution and retrieval
Link: https://scholarworks.iupui.edu/bitstream/handle/1805/13649/ocv080.pdf?sequence=1&isAllowed=y
This paper presents an approach to developing a collection of radiology examinations, including both the images and radiologist narrative reports, and making them publicly available in a searchable database.
The authors collected 3,996 radiology reports from the Indiana Network for Patient Care and 8,121 associated images from the hospitals’ picture archiving systems.
The de-identified Indiana chest X-ray collection is available for searching and downloading from the National Library of Medicine (http://openi.nlm.nih.gov/).
Dataset: https://openi.nlm.nih.gov/gridquery.php?q=Indiana%20chest%20X-ray%20collection&it=xg
3) Papers that use this dataset:
On the Automatic Generation of Medical Imaging Reports
Paper link: https://arxiv.org/pdf/1711.08195.pdf