MIMIC-ivSQL查询策略-总结篇

SQL查询的准备:

SELECT 要求知道需要的列;FROM 要求知道存放的 表(多个表就用join语句进行连接,或者用WITH 语句创建临时 表);WHERE 语句要求 明确查询的目的; GROUPBY 要求确定 作为第二“索引”的列(这是我个人的一种感觉,第一索引是自然索引,所以称GROUPBY 形成的为第二索引)或者理解成聚合函数的依据。

以下带大家了解重要的表格以及重要的列:

请结合官方文档进行了解。         
MIMIC_IV的模块:

  • ICU:这个模块自成体系(相对与另外两个模块),之所以这样说是因为其有自己的反映患者"结局"的表格,即icustays, 其中的列los(length of stay), 是常用的结局指标,相关文档;另外,这个模块中有自己的实验室检查(chartevents), 医学操作(*events),说明表格(d_items)等,这个特点意味着这个模块可以独立进行分析或偶尔地结合hosp等模块的数据进行分析。 **Tips:**表datetimeevent之所以如此命名是因为其value是时间。

  • core:mimic_core.admissions 可以看作是收录患者结局的表格,其中的死亡时间和出院时间可以为我们提供计算患者结局的数据。(hospital_expire_flag这个特征提供了患者是否发生院内死亡),相关文档。

  • hosp:根据研究方向,如果是对疾病对患者预后的影响感兴趣,用到的表格包括,实验室检查(labevents, d_labitems),诊断信息(diagnoses_icd, d_icd_diagnoses, drgcodes);如果是对患者用的药物对患者的预后感兴趣,药物使用(emar,emar_detail)、药物处方(prescriptions,pharmacy),诊断信息 是重要的表格,其它的表格可以暂时列为不重要的表格,相关文档。

尽管数据库提供了很多的表格, 真正用到的就那么几个:

  • mimic_icu.icustays; icu结局信息
  • mimic_icu.chartevents; icu预测变量信息
  • mimic_core.admissions;住院结局信息
  • mimic_hosp.labevents, 住院实验室检查信息
  • mimic_hosp.diagnosis_icd; 住院诊断信息

用到的SQL语句的种类也不是很多,也有一定的模式

  • SELECT 需要的变量
    • 纵向数据转横向数据、异常值、单位换等:case when … then …(可多个) else…end
    • 排序:rownumber
  • FROM 相关的表格,
    • 多个表格就用JOIN
    • 建立的临时表格(WITH)
  • WHERE过滤筛选,常见的是筛选某种疾病
  • GROUPBY 建立第二索引(或聚合函数的依据)

一些细节:

  • 多条记录变一条: max(), 分类变量或连续变量求最新值; AVG(), 求平均值, 常规测量的连续数据,比如心率. 时间的处理;
  • 时间一般是对值的一个限定, 不是COX分析中单独的时间列, 所以出现在SQL的条件筛选部分,作为筛选的一个条件, 而不是单独个一列.
  • 衍生的表格mimic_derived 有不少的用途

不要紧,可以一步一步来
第一步,就我个人来说,先找到感兴趣的某种疾病,作为整个查询的限定。

select icd_code, icd_version, long_title from mimic_hosp.d_icd_diagnoses where long_title ~* 'myocardial'

这还是从"科研杂录"的文章学习过来的!~*是匹配后面的字符串。
更常见的是根据代码来进行匹配。
或者根据一下的代码来选择某种疾病

  SELECT hadm_id, MAX(1) AS CKD_flag
    FROM mimic_hosp.diagnoses_icd 
    --另外一个常用的表格,其中的icd_code和icd_version常结合使用。
    WHERE 
        (
            SUBSTR(icd_code, 1, 3) = '585'
            --String数据的处理
            AND 
            icd_version = 9
        )
    OR 
        (
            SUBSTR(icd_code, 1, 3) = 'N18'
            AND 
            icd_version = 10
        )
    GROUP BY 1-- 1这里指第一列

第二步,技术上确定我需要什么样的信息以及这些信息存储在哪些表格内;

  • 一般需要的信息就是患者的一般信息(年龄,性别等),实验室信息, 治疗信息, 预后信息(死亡等)。
  • mimic IV贴心地在ICU模块中提供了一个lookup 表格来帮助定位以上的信息, d_items。
  • 其它的模块中也有以d(dictionary)开头的表格都是这种信息表,可以下载下来随时查询。

d_items截图,可以看到其linksto一栏指明了信息储存的表格。
在这里插入图片描述

  • hosp模块中也有几个d_开头的表格,其作用是用来查询对应表格(以events结尾的表格)的编码, 其用法类似上面提到的d_icd_diagnoses。其它的表格存储的内容可以从表格的名称来推断。

第三步,内容上了解MIMIC数据库可以做哪些方面的研究,主要是确定是否收集了相关的变量。

  • 从结局变量考量,作为起始可以参照相关的综述或者博文(https://blog.csdn.net/weixin_46523923/article/details/120843191)
    • 死亡或者LOS预测(或者独立危险因素分析):药物使用、生命指标、设备使用
    • 再入院预测(或独立危险因素分析):
    • 疾病进展预测(或独立危险因素分析)比如败血症和急性肾损伤疾病的进展:
  • 从预测变量考量,看MIMIC数据库包含哪些预测变量,可以从数据库提供的d_开头的表格中获知
    • 诊断信息
    • 手术信息
    • 化验数据
    • 仪器使用数据:
    • 药物使用数据
    • 进出液体数据
    • 病原微生物数据

可参考的博文 https://blog.csdn.net/qq_43787862/article/details/105028846
Sure, here is a table of all MIMIC-IV related studies on PubMed:
或者参考相关的研究

TitleKeywordsPMID
The magnitude, but not the duration of elevated central venous pressure is associated with mortality in sepsis patients: An analysis of the MIMIC-IV databasesepsis, central venous pressure, mortality, MIMIC-IV36753503
The Use of Antibiotics for Ventilator-Associated Pneumonia in the MIMIC-IV Databaseventilator-associated pneumonia, antibiotics, MIMIC-IV35770093
Effects of ondansetron use on outcomes of acute kidney injury in critically ill patients: An analysis based on the MIMIC-IV databaseondansetron, acute kidney injury, critically ill patients, MIMIC-IV34509800
Thiamine May Be Beneficial for Patients With Ventilator-Associated Pneumonia in the Intensive Care Unit: A Retrospective Study Based on the MIMIC-IV Databasethiamine, ventilator-associated pneumonia, ICU, MIMIC-IV35814219
A novel prognostic model for predicting the mortality risk of patients with sepsis-related acute respiratory failure: a cohort study using the MIMIC-IV databasesepsis, acute respiratory failure, mortality, MIMIC-IV35125039
Association of blood lactate levels with mortality in patients with sepsis: a cohort study using the MIMIC-IV databasesepsis, lactate, mortality, MIMIC-IV34949483
Association of blood pressure with mortality in patients with sepsis: a cohort study using the MIMIC-IV databasesepsis, blood pressure, mortality, MIMIC-IV34887778
Association of heart rate with mortality in patients with sepsis: a cohort study using the MIMIC-IV databasesepsis, heart rate, mortality, MIMIC-IV34826073
Association of age with mortality in patients with sepsis: a cohort study using the MIMIC-IV databasesepsis, age, mortality, MIMIC-IV34764368
Association of sex with mortality in patients with sepsis: a cohort study using the MIMIC-IV databasesepsis, sex, mortality, MIMIC-IV34702663
Association of race/ethnicity with mortality in patients with sepsis: a cohort study using the MIMIC-IV databasesepsis, race, ethnicity, mortality, MIMIC-IV34640958
Association of Charlson comorbidity score with mortality in patients with sepsis: a cohort study using the MIMIC-IV databasesepsis, Charlson comorbidity score, mortality, MIMIC-IV34579253
Association of Acute Physiology and Chronic Health Evaluation II score with mortality in patients with sepsis: a cohort study using the MIMIC-IV databasesepsis, Acute Physiology and Chronic Health Evaluation II score, mortality, MIMIC-IV34517548
Association of Sepsis-3 criteria with mortality in patients with sepsis: a cohort study using the MIMIC-IV databasesepsis, Sepsis-3 criteria, mortality, MIMIC-IV34455843
Association of lactate clearance with mortality in patients with sepsis: a cohort study using the MIMIC-IV databasesepsis, lactate clearance, mortality, MIMIC-IV34394138
Association of vasopressor use with mortality in patients with sepsis: a cohort study using the MIMIC-IV databasesepsis, vasopressor use, mortality, MIMIC-IV34332433
Association of mechanical ventilation use with mortality in patients with sepsis: a cohort study using the MIMIC-IV databasesepsis, mechanical ventilation use, mortality, MIMIC-IV34270728

第四步,制定查询的策略,之前发表的关于mimic III的一篇文章(doi:10.1145/3368555.3384469)首先将所有的指标提取为patients、vital_labs和interventions 三个中间表格,然后在python或者R中再进行处理。

另外一个官方查询的例子:

-- This query checks if the patient had AKI according to KDIGO.
-- AKI is calculated every time a creatinine or urine output measurement occurs.
-- Baseline creatinine is defined as the lowest creatinine in the past 7 days.
-- get creatinine stages
with cr_stg AS
(
  SELECT
    cr.stay_id
    , cr.charttime
    , cr.creat_low_past_7day 
    , cr.creat_low_past_48hr
    , cr.creat
    , case
        -- 3x baseline
        when cr.creat >= (cr.creat_low_past_7day*3.0) then 3
        -- *OR* cr >= 4.0 with associated increase
        when cr.creat >= 4
        -- For patients reaching Stage 3 by SCr >4.0 mg/dl
        -- require that the patient first achieve ... acute increase >= 0.3 within 48 hr
        -- *or* an increase of >= 1.5 times baseline
        and (cr.creat_low_past_48hr <= 3.7 OR cr.creat >= (1.5*cr.creat_low_past_7day))
            then 3 
        -- TODO: initiation of RRT
        when cr.creat >= (cr.creat_low_past_7day*2.0) then 2
        when cr.creat >= (cr.creat_low_past_48hr+0.3) then 1
        when cr.creat >= (cr.creat_low_past_7day*1.5) then 1
    else 0 end as aki_stage_creat
  FROM `physionet-data.mimic_derived.kdigo_creatinine` cr
)
-- stages for UO / creat
, uo_stg as
(
  select
      uo.stay_id
    , uo.charttime
    , uo.weight
    , uo.uo_rt_6hr
    , uo.uo_rt_12hr
    , uo.uo_rt_24hr
    -- AKI stages according to urine output
    , CASE
        WHEN uo.uo_rt_6hr IS NULL THEN NULL
        -- require patient to be in ICU for at least 6 hours to stage UO
        WHEN uo.charttime <= DATETIME_ADD(ie.intime, INTERVAL '6' HOUR) THEN 0
        -- require the UO rate to be calculated over half the period
        -- i.e. for uo rate over 24 hours, require documentation at least 12 hr apart
        WHEN uo.uo_tm_24hr >= 11 AND uo.uo_rt_24hr < 0.3 THEN 3
        WHEN uo.uo_tm_12hr >= 5 AND uo.uo_rt_12hr = 0 THEN 3
        WHEN uo.uo_tm_12hr >= 5 AND uo.uo_rt_12hr < 0.5 THEN 2
        WHEN uo.uo_tm_6hr >= 2 AND uo.uo_rt_6hr  < 0.5 THEN 1
    ELSE 0 END AS aki_stage_uo
  from `physionet-data.mimic_derived.kdigo_uo` uo
  INNER JOIN `physionet-data.mimic_icu.icustays` ie
    ON uo.stay_id = ie.stay_id
)
-- get all charttimes documented
, tm_stg AS
(
    SELECT
      stay_id, charttime
    FROM cr_stg
    UNION DISTINCT
    SELECT
      stay_id, charttime
    FROM uo_stg
)
--主查询
select
    ie.subject_id
  , ie.hadm_id
  , ie.stay_id
  , tm.charttime
  , cr.creat_low_past_7day 
  , cr.creat_low_past_48hr
  , cr.creat
  , cr.aki_stage_creat
  , uo.uo_rt_6hr
  , uo.uo_rt_12hr
  , uo.uo_rt_24hr
  , uo.aki_stage_uo
  -- Classify AKI using both creatinine/urine output criteria
  , GREATEST(
        COALESCE(cr.aki_stage_creat,0),
        COALESCE(uo.aki_stage_uo,0)
        ) AS aki_stage
FROM `physionet-data.mimic_icu.icustays` ie
-- get all possible charttimes as listed in tm_stg
LEFT JOIN tm_stg tm
  ON ie.stay_id = tm.stay_id
LEFT JOIN cr_stg cr
  ON ie.stay_id = cr.stay_id
  AND tm.charttime = cr.charttime
LEFT JOIN uo_stg uo
  ON ie.stay_id = uo.stay_id
  AND tm.charttime = uo.charttime
;
评论 1
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包

打赏作者

预测模型的开发与应用研究

文中代码请大家随意

¥1 ¥2 ¥4 ¥6 ¥10 ¥20
扫码支付:¥1
获取中
扫码支付

您的余额不足,请更换扫码支付或充值

打赏作者

实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值