SQL查询的准备:
SELECT 要求知道需要的列;FROM 要求知道存放的 表(多个表就用join语句进行连接,或者用WITH 语句创建临时 表);WHERE 语句要求 明确查询的目的; GROUPBY 要求确定 作为第二“索引”的列(这是我个人的一种感觉,第一索引是自然索引,所以称GROUPBY 形成的为第二索引)或者理解成聚合函数的依据。
以下带大家了解重要的表格以及重要的列:
请结合官方文档进行了解。
MIMIC_IV的模块:
-
ICU:这个模块自成体系(相对与另外两个模块),之所以这样说是因为其有自己的反映患者"结局"的表格,即icustays, 其中的列los(length of stay), 是常用的结局指标,相关文档;另外,这个模块中有自己的实验室检查(chartevents), 医学操作(*events),说明表格(d_items)等,这个特点意味着这个模块可以独立进行分析或偶尔地结合hosp等模块的数据进行分析。 **Tips:**表datetimeevent之所以如此命名是因为其value是时间。
-
core:mimic_core.admissions 可以看作是收录患者结局的表格,其中的死亡时间和出院时间可以为我们提供计算患者结局的数据。(hospital_expire_flag这个特征提供了患者是否发生院内死亡),相关文档。
-
hosp:根据研究方向,如果是对疾病对患者预后的影响感兴趣,用到的表格包括,实验室检查(labevents, d_labitems),诊断信息(diagnoses_icd, d_icd_diagnoses, drgcodes);如果是对患者用的药物对患者的预后感兴趣,药物使用(emar,emar_detail)、药物处方(prescriptions,pharmacy),诊断信息 是重要的表格,其它的表格可以暂时列为不重要的表格,相关文档。
尽管数据库提供了很多的表格, 真正用到的就那么几个:
- mimic_icu.icustays; icu结局信息
- mimic_icu.chartevents; icu预测变量信息
- mimic_core.admissions;住院结局信息
- mimic_hosp.labevents, 住院实验室检查信息
- mimic_hosp.diagnosis_icd; 住院诊断信息
- 等
用到的SQL语句的种类也不是很多,也有一定的模式
- SELECT 需要的变量
- 纵向数据转横向数据、异常值、单位换等:case when … then …(可多个) else…end
- 排序:rownumber
- FROM 相关的表格,
- 多个表格就用JOIN
- 建立的临时表格(WITH)
- WHERE过滤筛选,常见的是筛选某种疾病
- GROUPBY 建立第二索引(或聚合函数的依据)
一些细节:
- 多条记录变一条: max(), 分类变量或连续变量求最新值; AVG(), 求平均值, 常规测量的连续数据,比如心率. 时间的处理;
- 时间一般是对值的一个限定, 不是COX分析中单独的时间列, 所以出现在SQL的条件筛选部分,作为筛选的一个条件, 而不是单独个一列.
- 衍生的表格mimic_derived 有不少的用途
不要紧,可以一步一步来
第一步,就我个人来说,先找到感兴趣的某种疾病,作为整个查询的限定。
select icd_code, icd_version, long_title from mimic_hosp.d_icd_diagnoses where long_title ~* 'myocardial'
这还是从"科研杂录"的文章学习过来的!~*是匹配后面的字符串。
更常见的是根据代码来进行匹配。
或者根据一下的代码来选择某种疾病
SELECT hadm_id, MAX(1) AS CKD_flag
FROM mimic_hosp.diagnoses_icd
--另外一个常用的表格,其中的icd_code和icd_version常结合使用。
WHERE
(
SUBSTR(icd_code, 1, 3) = '585'
--String数据的处理
AND
icd_version = 9
)
OR
(
SUBSTR(icd_code, 1, 3) = 'N18'
AND
icd_version = 10
)
GROUP BY 1-- 1这里指第一列
第二步,技术上确定我需要什么样的信息以及这些信息存储在哪些表格内;
- 一般需要的信息就是患者的一般信息(年龄,性别等),实验室信息, 治疗信息, 预后信息(死亡等)。
- mimic IV贴心地在ICU模块中提供了一个lookup 表格来帮助定位以上的信息, d_items。
- 其它的模块中也有以d(dictionary)开头的表格都是这种信息表,可以下载下来随时查询。
d_items截图,可以看到其linksto一栏指明了信息储存的表格。
- hosp模块中也有几个d_开头的表格,其作用是用来查询对应表格(以events结尾的表格)的编码, 其用法类似上面提到的d_icd_diagnoses。其它的表格存储的内容可以从表格的名称来推断。
第三步,内容上了解MIMIC数据库可以做哪些方面的研究,主要是确定是否收集了相关的变量。
- 从结局变量考量,作为起始可以参照相关的综述或者博文(https://blog.csdn.net/weixin_46523923/article/details/120843191)
- 死亡或者LOS预测(或者独立危险因素分析):药物使用、生命指标、设备使用
- 再入院预测(或独立危险因素分析):
- 疾病进展预测(或独立危险因素分析)比如败血症和急性肾损伤疾病的进展:
- 从预测变量考量,看MIMIC数据库包含哪些预测变量,可以从数据库提供的d_开头的表格中获知
- 诊断信息
- 手术信息
- 化验数据
- 仪器使用数据:
- 药物使用数据
- 进出液体数据
- 病原微生物数据
可参考的博文 https://blog.csdn.net/qq_43787862/article/details/105028846
Sure, here is a table of all MIMIC-IV related studies on PubMed:
或者参考相关的研究
Title | Keywords | PMID |
---|---|---|
The magnitude, but not the duration of elevated central venous pressure is associated with mortality in sepsis patients: An analysis of the MIMIC-IV database | sepsis, central venous pressure, mortality, MIMIC-IV | 36753503 |
The Use of Antibiotics for Ventilator-Associated Pneumonia in the MIMIC-IV Database | ventilator-associated pneumonia, antibiotics, MIMIC-IV | 35770093 |
Effects of ondansetron use on outcomes of acute kidney injury in critically ill patients: An analysis based on the MIMIC-IV database | ondansetron, acute kidney injury, critically ill patients, MIMIC-IV | 34509800 |
Thiamine May Be Beneficial for Patients With Ventilator-Associated Pneumonia in the Intensive Care Unit: A Retrospective Study Based on the MIMIC-IV Database | thiamine, ventilator-associated pneumonia, ICU, MIMIC-IV | 35814219 |
A novel prognostic model for predicting the mortality risk of patients with sepsis-related acute respiratory failure: a cohort study using the MIMIC-IV database | sepsis, acute respiratory failure, mortality, MIMIC-IV | 35125039 |
Association of blood lactate levels with mortality in patients with sepsis: a cohort study using the MIMIC-IV database | sepsis, lactate, mortality, MIMIC-IV | 34949483 |
Association of blood pressure with mortality in patients with sepsis: a cohort study using the MIMIC-IV database | sepsis, blood pressure, mortality, MIMIC-IV | 34887778 |
Association of heart rate with mortality in patients with sepsis: a cohort study using the MIMIC-IV database | sepsis, heart rate, mortality, MIMIC-IV | 34826073 |
Association of age with mortality in patients with sepsis: a cohort study using the MIMIC-IV database | sepsis, age, mortality, MIMIC-IV | 34764368 |
Association of sex with mortality in patients with sepsis: a cohort study using the MIMIC-IV database | sepsis, sex, mortality, MIMIC-IV | 34702663 |
Association of race/ethnicity with mortality in patients with sepsis: a cohort study using the MIMIC-IV database | sepsis, race, ethnicity, mortality, MIMIC-IV | 34640958 |
Association of Charlson comorbidity score with mortality in patients with sepsis: a cohort study using the MIMIC-IV database | sepsis, Charlson comorbidity score, mortality, MIMIC-IV | 34579253 |
Association of Acute Physiology and Chronic Health Evaluation II score with mortality in patients with sepsis: a cohort study using the MIMIC-IV database | sepsis, Acute Physiology and Chronic Health Evaluation II score, mortality, MIMIC-IV | 34517548 |
Association of Sepsis-3 criteria with mortality in patients with sepsis: a cohort study using the MIMIC-IV database | sepsis, Sepsis-3 criteria, mortality, MIMIC-IV | 34455843 |
Association of lactate clearance with mortality in patients with sepsis: a cohort study using the MIMIC-IV database | sepsis, lactate clearance, mortality, MIMIC-IV | 34394138 |
Association of vasopressor use with mortality in patients with sepsis: a cohort study using the MIMIC-IV database | sepsis, vasopressor use, mortality, MIMIC-IV | 34332433 |
Association of mechanical ventilation use with mortality in patients with sepsis: a cohort study using the MIMIC-IV database | sepsis, mechanical ventilation use, mortality, MIMIC-IV | 34270728 |
第四步,制定查询的策略,之前发表的关于mimic III的一篇文章(doi:10.1145/3368555.3384469)首先将所有的指标提取为patients、vital_labs和interventions 三个中间表格,然后在python或者R中再进行处理。
另外一个官方查询的例子:
-- This query checks if the patient had AKI according to KDIGO.
-- AKI is calculated every time a creatinine or urine output measurement occurs.
-- Baseline creatinine is defined as the lowest creatinine in the past 7 days.
-- get creatinine stages
with cr_stg AS
(
SELECT
cr.stay_id
, cr.charttime
, cr.creat_low_past_7day
, cr.creat_low_past_48hr
, cr.creat
, case
-- 3x baseline
when cr.creat >= (cr.creat_low_past_7day*3.0) then 3
-- *OR* cr >= 4.0 with associated increase
when cr.creat >= 4
-- For patients reaching Stage 3 by SCr >4.0 mg/dl
-- require that the patient first achieve ... acute increase >= 0.3 within 48 hr
-- *or* an increase of >= 1.5 times baseline
and (cr.creat_low_past_48hr <= 3.7 OR cr.creat >= (1.5*cr.creat_low_past_7day))
then 3
-- TODO: initiation of RRT
when cr.creat >= (cr.creat_low_past_7day*2.0) then 2
when cr.creat >= (cr.creat_low_past_48hr+0.3) then 1
when cr.creat >= (cr.creat_low_past_7day*1.5) then 1
else 0 end as aki_stage_creat
FROM `physionet-data.mimic_derived.kdigo_creatinine` cr
)
-- stages for UO / creat
, uo_stg as
(
select
uo.stay_id
, uo.charttime
, uo.weight
, uo.uo_rt_6hr
, uo.uo_rt_12hr
, uo.uo_rt_24hr
-- AKI stages according to urine output
, CASE
WHEN uo.uo_rt_6hr IS NULL THEN NULL
-- require patient to be in ICU for at least 6 hours to stage UO
WHEN uo.charttime <= DATETIME_ADD(ie.intime, INTERVAL '6' HOUR) THEN 0
-- require the UO rate to be calculated over half the period
-- i.e. for uo rate over 24 hours, require documentation at least 12 hr apart
WHEN uo.uo_tm_24hr >= 11 AND uo.uo_rt_24hr < 0.3 THEN 3
WHEN uo.uo_tm_12hr >= 5 AND uo.uo_rt_12hr = 0 THEN 3
WHEN uo.uo_tm_12hr >= 5 AND uo.uo_rt_12hr < 0.5 THEN 2
WHEN uo.uo_tm_6hr >= 2 AND uo.uo_rt_6hr < 0.5 THEN 1
ELSE 0 END AS aki_stage_uo
from `physionet-data.mimic_derived.kdigo_uo` uo
INNER JOIN `physionet-data.mimic_icu.icustays` ie
ON uo.stay_id = ie.stay_id
)
-- get all charttimes documented
, tm_stg AS
(
SELECT
stay_id, charttime
FROM cr_stg
UNION DISTINCT
SELECT
stay_id, charttime
FROM uo_stg
)
--主查询
select
ie.subject_id
, ie.hadm_id
, ie.stay_id
, tm.charttime
, cr.creat_low_past_7day
, cr.creat_low_past_48hr
, cr.creat
, cr.aki_stage_creat
, uo.uo_rt_6hr
, uo.uo_rt_12hr
, uo.uo_rt_24hr
, uo.aki_stage_uo
-- Classify AKI using both creatinine/urine output criteria
, GREATEST(
COALESCE(cr.aki_stage_creat,0),
COALESCE(uo.aki_stage_uo,0)
) AS aki_stage
FROM `physionet-data.mimic_icu.icustays` ie
-- get all possible charttimes as listed in tm_stg
LEFT JOIN tm_stg tm
ON ie.stay_id = tm.stay_id
LEFT JOIN cr_stg cr
ON ie.stay_id = cr.stay_id
AND tm.charttime = cr.charttime
LEFT JOIN uo_stg uo
ON ie.stay_id = uo.stay_id
AND tm.charttime = uo.charttime
;