Preface
Following MIMIC-III, MIT has released a new version of the database, MIMIC-IV. The current version is v0.4. This article gives a brief introduction to MIMIC-IV.
Citation requirements
When using this resource, please cite:
Johnson, A., Bulgarelli, L., Pollard, T., Horng, S., Celi, L. A., & Mark, R. (2020). MIMIC-IV (version 0.4). PhysioNet. https://doi.org/10.13026/a3wn-hq05.
Please include the standard citation for PhysioNet:
Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., … & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.
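Methods
MIMIC-IV is sourced from two in-hospital database systems: a custom hospital-wide electronic health record (EHR) system and the ICU's own clinical information system.
MIMIC-IV was created in three steps: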
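- Acquisition
  - Data for patients admitted to the BIDMC emergency department or one of its intensive care units were extracted from the respective hospital databases
  - A master patient list was created, containing all medical record numbers of patients admitted to an ICU or the emergency department between 2008 and 2019
  - All source tables were filtered to keep only rows whose medical record number appears in the master patient list
- Preparation
  - The data were reorganized, including denormalization, removal of audit trails, and merging of some tables
  - No data cleaning was performed, so the data still reflect a real-world clinical dataset
- De-identification
  - Patient identifiers, hospitalization identifiers, and ICU stay identifiers were replaced with random codes
  - All dates for a given patient were shifted by the same random number of days; this removes any temporal relationship between patients while preserving the relative timing of events within one patient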
After these three steps, the database was exported as comma-separated value (*.csv) files.
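The date-shifting part of de-identification can be illustrated with a small sketch. This is not the actual MIMIC-IV pipeline: the function name `shift_dates`, the `max_shift_days` bound, and the toy events are all assumptions made for illustration. It only shows the idea described above, one random offset in whole days per patient, applied to every timestamp of that patient.

```python
import random
from datetime import datetime, timedelta


def shift_dates(events, max_shift_days=36500, seed=None):
    """Shift each timestamp by a per-patient random number of whole days.

    events: list of (patient_id, datetime) tuples.
    max_shift_days is a hypothetical bound, not a value from MIMIC-IV.
    """
    rng = random.Random(seed)
    offsets = {}  # one offset per patient, reused for all of that patient's rows
    shifted = []
    for patient_id, when in events:
        if patient_id not in offsets:
            offsets[patient_id] = timedelta(days=rng.randint(1, max_shift_days))
        shifted.append((patient_id, when + offsets[patient_id]))
    return shifted


# Toy, made-up events for two patients.
events = [
    ("A", datetime(2012, 3, 1, 8, 0)),
    ("A", datetime(2012, 3, 4, 20, 30)),
    ("B", datetime(2012, 3, 1, 8, 0)),
]
shifted = shift_dates(events, seed=0)

# The gap between patient A's two events is unchanged by the shift.
print(shifted[1][1] - shifted[0][1] == events[1][1] - events[0][1])  # True
```

Because the offset is a whole number of days, the time of day is also preserved. Patients A and B receive independent offsets, so their shifted timestamps should not be compared with each other.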
Data description
The MIMIC-IV database is organized into three modules:
- core
- hosp
- icu
This organization makes both the intended use and the provenance of the data clear. Further updates can be found here.
core
This module contains the information essential for any data analysis.
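- It contains three tables
  - patients: demographics
  - admissions: one record per hospitalization
  - transfers: one record per ward stay within a hospitalization
- In particular, the patients table gives the approximate year in which each patient was hospitalized. The dataset description explains: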
Notably, the patients table provides timing information for each patient through the anchor_year and anchor_year_group columns. The anchor_year is a deidentified year occurring sometime between 2100 - 2200, and the anchor_year_group is a three year long date ranges between 2008 - 2019. These pieces of information allow researchers to infer the approximate year a patient received care. For example, if a patient’s anchor_year is 2158, and their anchor_year_group is 2011 - 2013, then any hospitalizations for the patient occurring in the year 2158 actually occurred sometime between 2011 - 2013. Finally, the anchor_age provides the patient age in the given anchor_year. If the patient was over 89 in the anchor_year, this anchor_age has been set to 91 (i.e. all patients over 89 have been grouped together into a single group with value 91, regardless of what their real age was).
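The anchor columns can be used as in the following sketch. The helper names are hypothetical, and the code assumes that `anchor_year_group` is a string such as "2011 - 2013" and that the same year offset applies to all of a patient's events (which follows from the per-patient date shift). It is an illustration, not part of any official MIMIC tooling.

```python
def parse_year_group(anchor_year_group):
    """Parse a string such as '2011 - 2013' into (2011, 2013)."""
    start, end = (int(part) for part in anchor_year_group.split("-"))
    return start, end


def approximate_real_years(event_year, anchor_year, anchor_year_group):
    """Return the (earliest, latest) real year for a de-identified event year."""
    start, end = parse_year_group(anchor_year_group)
    delta = event_year - anchor_year  # years elapsed since the anchor year
    return start + delta, end + delta


def approximate_age(event_year, anchor_year, anchor_age):
    """Return the patient's age in event_year, or None if the age is censored.

    anchor_age 91 marks patients over 89, whose real age is not known.
    """
    if anchor_age >= 91:
        return None
    return anchor_age + (event_year - anchor_year)


# The example from the text: anchor_year 2158, anchor_year_group 2011 - 2013.
print(approximate_real_years(2158, 2158, "2011 - 2013"))  # (2011, 2013)
# A hospitalization two de-identified years later shifts the range by two.
print(approximate_real_years(2160, 2158, "2011 - 2013"))  # (2013, 2015)
print(approximate_age(2160, 2158, 60))                    # 62
```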
hosp
This module contains information extracted from the hospital-wide EHR, and it also includes some outpatient data. Its tables cover:
- laboratory measurements (labevents, d_labitems)
- microbiology cultures (microbiologyevents, d_micro)
- provider orders (poe, poe_detail)
- medication administration (emar, emar_detail)
- medication prescription (prescriptions, pharmacy)
- hospital billing information (diagnoses_icd, d_icd_diagnoses, procedures_icd, d_icd_procedures, hcpcsevents, d_hcpcs, drgcodes)
- service related information (services)
icu
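This module contains data from the BIDMC MetaVision database.
The MetaVision tables were denormalized to create a star schema, in which the icustays and d_items tables link to a set of data tables that all carry the suffix "events". Its tables cover: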
- intravenous and fluid inputs (inputevents)
- patient outputs (outputevents)
- procedures (procedureevents)
- information documented as a date or time (datetimeevents)
- other charted information (chartevents)
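All of the tables above:
- contain a stay_id column, which identifies the corresponding ICU stay (and therefore the patient) in the icustays table
- contain an itemid column, which identifies the recorded concept in d_items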
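The star schema can be illustrated with a small in-memory SQLite sketch that joins one events table to icustays through stay_id and to d_items through itemid. The rows are made up, and every column other than stay_id and itemid (subject_id, hadm_id, label, charttime, valuenum) is an assumption about the schema, so check the actual tables before reusing the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE icustays (subject_id INTEGER, hadm_id INTEGER, stay_id INTEGER PRIMARY KEY);
CREATE TABLE d_items (itemid INTEGER PRIMARY KEY, label TEXT);
CREATE TABLE chartevents (stay_id INTEGER, itemid INTEGER, charttime TEXT, valuenum REAL);

-- Fabricated toy rows, for illustration only.
INSERT INTO icustays VALUES (1, 100, 9001);
INSERT INTO d_items VALUES (1, 'Heart Rate'), (2, 'Respiratory Rate');
INSERT INTO chartevents VALUES
    (9001, 1, '2158-01-01 08:00:00', 82.0),
    (9001, 2, '2158-01-01 08:00:00', 18.0);
""")

# Each event row is resolved to its ICU stay via stay_id
# and to a human-readable label via itemid.
rows = conn.execute("""
SELECT s.subject_id, s.stay_id, d.label, c.charttime, c.valuenum
FROM chartevents AS c
JOIN icustays AS s ON s.stay_id = c.stay_id
JOIN d_items  AS d ON d.itemid  = c.itemid
ORDER BY c.itemid
""").fetchall()

for row in rows:
    print(row)
```

The same two joins apply to the other events tables, since they all share the stay_id and itemid columns.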