MIMIC数据库官方SQL查询标注和初步分析--sofa评分(2-19)

说明:
'SOFA’的意思是序贯器官衰竭评分表。
本SQL的目的就是查出计算Sepsis3的相关的资料。
展示了MIMIC查询的一种策略,即在官方SQL的基础上进一步进行查询,本SQL是在SOFA.sql查询和suspicion_of_infection.sql查询结果上的基础上进行的查询。
多条记录变为一条记录的另一种方式。rownumber()over的用法。
在这里插入图片描述


-- Creates a table with "onset" time of Sepsis-3 in the ICU.目的:Sepsis-3形成的时间
-- That is, the **earliest time** at which a patient had SOFA >= 2 and suspicion of infection.
-- As many variables used in SOFA are only collected in the ICU, this query can only
-- define sepsis-3 onset within the ICU.仅限于ICU患者

-- extract rows with SOFA >= 2
-- implicitly this assumes baseline SOFA was 0 before ICU admission.
WITH sofa AS--临时表,是从一个叫做mimic_derived的模块的sofa表格中获取的,算是二次获取
(
  SELECT stay_id
    , starttime, endtime
    , respiration_24hours as respiration--呼吸
    , coagulation_24hours as coagulation--血小板
    , liver_24hours as liver--肝
    , cardiovascular_24hours as cardiovascular--心
    , cns_24hours as cns--神经
    , renal_24hours as renal--肾脏
    , sofa_24hours as sofa_score
  FROM `physionet-data.mimic_derived.sofa`
  WHERE sofa_24hours >= 2
)
, s1 as
(
  SELECT
    soi.subject_id
    , soi.stay_id
    -- suspicion columns
    , soi.ab_id
    , soi.antibiotic
    , soi.antibiotic_time
    , soi.culture_time
    , soi.suspected_infection
    , soi.suspected_infection_time
    , soi.specimen
    , soi.positive_culture
    -- sofa columns
    , starttime, endtime
    , respiration, coagulation, liver, cardiovascular, cns, renal
    , sofa_score
    -- All rows have an associated suspicion of infection event
    -- Therefore, Sepsis-3 is defined as SOFA >= 2.
    -- Implicitly, the baseline SOFA score is assumed to be zero, as we do not know
    -- if the patient has preexisting (acute or chronic) organ dysfunction
    -- before the onset of infection.
    , sofa_score >= 2 and suspected_infection = 1 as sepsis3
    -- subselect to the earliest suspicion/antibiotic/SOFA row
    , ROW_NUMBER() OVER--实现分组功能
    (
        PARTITION BY soi.stay_id--ICU的id,分组的根据, 有多条记录,分组为1,2,3,后面只取了=1的记录
        ORDER BY suspected_infection_time, antibiotic_time, culture_time, endtime--排序
    ) AS rn_sus
  FROM `physionet-data.mimic_derived.suspicion_of_infection` as soi
  INNER JOIN sofa
    ON soi.stay_id = sofa.stay_id
    AND sofa.endtime >= DATETIME_SUB(soi.suspected_infection_time, INTERVAL '48' HOUR)
    AND sofa.endtime <= DATETIME_ADD(soi.suspected_infection_time, INTERVAL '24' HOUR)
  -- only include in-ICU rows
  WHERE soi.stay_id is not null
)
--以下是主查询
SELECT
subject_id, stay_id
-- note: there may be more than one antibiotic given at this time
, antibiotic_time
-- culture times may be dates, rather than times
, culture_time
, suspected_infection_time
-- endtime is latest time at which the SOFA score is valid
, endtime as sofa_time
, sofa_score
, respiration, coagulation, liver, cardiovascular, cns, renal
, sepsis3
FROM s1
WHERE rn_sus = 1--

简单分析

df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 34789 entries, 0 to 34788
Data columns (total 14 columns):
 #   Column                    Non-Null Count  Dtype 
---  ------                    --------------  ----- 
 0   subject_id                34789 non-null  int64 
 1   stay_id                   34789 non-null  int64 
 2   antibiotic_time           34789 non-null  object
 3   culture_time              34789 non-null  object
 4   suspected_infection_time  34789 non-null  object
 5   sofa_time                 34789 non-null  object
 6   sofa_score                34789 non-null  int64 
 7   respiration               34789 non-null  int64 
 8   coagulation               34789 non-null  int64 
 9   liver                     34789 non-null  int64 
 10  cardiovascular            34789 non-null  int64 
 11  cns                       34789 non-null  int64 
 12  renal                     34789 non-null  int64 
 13  sepsis3                   34789 non-null  bool  
dtypes: bool(1), int64(9), object(4)
memory usage: 3.5+ MB
None
df.head()
subject_id	stay_id	antibiotic_time	culture_time	suspected_infection_time	sofa_time	sofa_score	respiration	coagulation	liver	cardiovascular	cns	renal	sepsis3
0	13184298	30099685	2127-02-15 18:00:00	2127-02-15 17:19:00	2127-02-15 17:19:00	2127-02-15 18:00:00	12	0	2	2	4	0	4	True
1	12688844	34925336	2158-06-12 08:00:00	2158-06-11 01:00:00	2158-06-11 01:00:00	2158-06-11 23:00:00	12	4	0	0	4	0	4	True
2	14755611	33075809	2179-02-11 10:00:00	2179-02-10 04:06:00	2179-02-10 04:06:00	2179-02-08 05:00:00	13	2	2	1	4	0	4	True
3	10723086	38690919	2191-10-01 09:00:00	2191-10-01 07:55:00	2191-10-01 07:55:00	2191-09-29 08:00:00	12	3	1	2	0	2	4	True
4	12453404	31892948	2117-03-03 18:00:00	2117-02-28 00:00:00	2117-02-28 00:00:00	2117-02-28 10:00:00	12	0	2	2	4	3	1	True
#sofa得分的count
2     13002
3      7728
4      5769
5      3157
6      1980
7      1193
8       762
9       500
10      315
11      175
12       87
13       50
14       32
15       20
16       11
17        4
20        2
18        1
19        1
Name: sofa_score, dtype: int64

在这里插入图片描述
附: 以上是在查询SOFA的基础上进行的二次查询, 我们也来对比一下SOFA是如何查询的。

-- ------------------------------------------------------------------
-- Title: Sequential Organ Failure Assessment (SOFA)
-- This query extracts the sequential organ failure assessment (formally: sepsis-related organ failure assessment).
-- This score is a measure of organ failure for patients in the ICU.
-- The score is calculated on the **first day** of each ICU patients' stay.--注意是firstday
-- ------------------------------------------------------------------

-- Reference for SOFA:
--    Jean-Louis Vincent, Rui Moreno, Jukka Takala, Sheila Willatts, Arnaldo De Mendonça,
--    Hajo Bruining, C. K. Reinhart, Peter M Suter, and L. G. Thijs.
--    "The SOFA (Sepsis-related Organ Failure Assessment) score to describe organ dysfunction/failure."
--    Intensive care medicine 22, no. 7 (1996): 707-710.

-- Variables used in SOFA:--事先摸清变量储存的位置
--  GCS, MAP, FiO2, Ventilation status (sourced from CHARTEVENTS)
--  Creatinine, Bilirubin, FiO2, PaO2, Platelets (sourced from LABEVENTS)
--  Dopamine, Dobutamine, Epinephrine, Norepinephrine (sourced from INPUTEVENTS)
--  Urine output (sourced from OUTPUTEVENTS)

-- The following views required to run this query:
--  1) first_day_urine_output
--  2) first_day_vitalsign
--  3) first_day_gcs
--  4) first_day_lab
--  5) first_day_bg_art
--  6) ventdurations

-- extract drug rates from derived vasopressor tables
with vaso_stg as
(
  select ie.stay_id, 'norepinephrine' AS treatment, vaso_rate as rate
  FROM `physionet-data.mimic_icu.icustays` ie
  INNER JOIN `physionet-data.mimic_derived.norepinephrine` mv--mimic_derived 是一个很常用的中间库
    ON ie.stay_id = mv.stay_id
    AND mv.starttime >= DATETIME_SUB(ie.intime, INTERVAL '6' HOUR)
    AND mv.starttime <= DATETIME_ADD(ie.intime, INTERVAL '1' DAY)
  UNION ALL
  select ie.stay_id, 'epinephrine' AS treatment, vaso_rate as rate
  FROM `physionet-data.mimic_icu.icustays` ie
  INNER JOIN `physionet-data.mimic_derived.epinephrine` mv
    ON ie.stay_id = mv.stay_id
    AND mv.starttime >= DATETIME_SUB(ie.intime, INTERVAL '6' HOUR)
    AND mv.starttime <= DATETIME_ADD(ie.intime, INTERVAL '1' DAY)
  UNION ALL
  select ie.stay_id, 'dobutamine' AS treatment, vaso_rate as rate
  FROM `physionet-data.mimic_icu.icustays` ie
  INNER JOIN `physionet-data.mimic_derived.dobutamine` mv
    ON ie.stay_id = mv.stay_id
    AND mv.starttime >= DATETIME_SUB(ie.intime, INTERVAL '6' HOUR)
    AND mv.starttime <= DATETIME_ADD(ie.intime, INTERVAL '1' DAY)
  UNION ALL
  select ie.stay_id, 'dopamine' AS treatment, vaso_rate as rate
  FROM `physionet-data.mimic_icu.icustays` ie
  INNER JOIN `physionet-data.mimic_derived.dopamine` mv
    ON ie.stay_id = mv.stay_id
    AND mv.starttime >= DATETIME_SUB(ie.intime, INTERVAL '6' HOUR)
    AND mv.starttime <= DATETIME_ADD(ie.intime, INTERVAL '1' DAY)
)
, vaso_mv AS
(
    SELECT
    ie.stay_id
    , max(CASE WHEN treatment = 'norepinephrine' THEN rate ELSE NULL END) as rate_norepinephrine
    , max(CASE WHEN treatment = 'epinephrine' THEN rate ELSE NULL END) as rate_epinephrine
    , max(CASE WHEN treatment = 'dopamine' THEN rate ELSE NULL END) as rate_dopamine
    , max(CASE WHEN treatment = 'dobutamine' THEN rate ELSE NULL END) as rate_dobutamine
  from `physionet-data.mimic_icu.icustays` ie
  LEFT JOIN vaso_stg v
      ON ie.stay_id = v.stay_id
  GROUP BY ie.stay_id
)
, pafi1 as
(
  -- join blood gas to ventilation durations to determine if patient was vent
  select ie.stay_id, bg.charttime
  , bg.pao2fio2ratio
  , case when vd.stay_id is not null then 1 else 0 end as IsVent
  from `physionet-data.mimic_icu.icustays` ie
  LEFT JOIN `physionet-data.mimic_derived.bg` bg
      ON ie.subject_id = bg.subject_id
      AND bg.charttime >= DATETIME_SUB(ie.intime, INTERVAL '6' HOUR)
      AND bg.charttime <= DATETIME_ADD(ie.intime, INTERVAL '1' DAY)
  LEFT JOIN `physionet-data.mimic_derived.ventilation` vd
    ON ie.stay_id = vd.stay_id
    AND bg.charttime >= vd.starttime
    AND bg.charttime <= vd.endtime
    AND vd.ventilation_status = 'InvasiveVent'
)
, pafi2 as
(
  -- because pafi has an interaction between vent/PaO2:FiO2, we need two columns for the score
  -- it can happen that the lowest unventilated PaO2/FiO2 is 68, but the lowest ventilated PaO2/FiO2 is 120
  -- in this case, the SOFA score is 3, *not* 4.
  select stay_id
  , min(case when IsVent = 0 then pao2fio2ratio else null end) as PaO2FiO2_novent_min
  , min(case when IsVent = 1 then pao2fio2ratio else null end) as PaO2FiO2_vent_min
  from pafi1
  group by stay_id
)
-- Aggregate the components for the score
, scorecomp as
(
select ie.stay_id
  , v.mbp_min
  , mv.rate_norepinephrine
  , mv.rate_epinephrine
  , mv.rate_dopamine
  , mv.rate_dobutamine

  , l.creatinine_max
  , l.bilirubin_total_max as bilirubin_max
  , l.platelets_min as platelet_min

  , pf.PaO2FiO2_novent_min
  , pf.PaO2FiO2_vent_min

  , uo.UrineOutput

  , gcs.gcs_min
from `physionet-data.mimic_icu.icustays` ie
left join vaso_mv mv
  on ie.stay_id = mv.stay_id
left join pafi2 pf
 on ie.stay_id = pf.stay_id
left join `physionet-data.mimic_derived.first_day_vitalsign` v
  on ie.stay_id = v.stay_id
left join `physionet-data.mimic_derived.first_day_lab` l
  on ie.stay_id = l.stay_id
left join `physionet-data.mimic_derived.first_day_urine_output` uo
  on ie.stay_id = uo.stay_id
left join `physionet-data.mimic_derived.first_day_gcs` gcs
  on ie.stay_id = gcs.stay_id
)
, scorecalc as
(
  -- Calculate the final score
  -- note that if the underlying data is missing, the component is null
  -- eventually these are treated as 0 (normal), but knowing when data is missing is useful for debugging
  select stay_id
  -- Respiration
  , case
      when PaO2FiO2_vent_min   < 100 then 4
      when PaO2FiO2_vent_min   < 200 then 3
      when PaO2FiO2_novent_min < 300 then 2
      when PaO2FiO2_novent_min < 400 then 1
      when coalesce(PaO2FiO2_vent_min, PaO2FiO2_novent_min) is null then null
      else 0
    end as respiration

  -- Coagulation
  , case
      when platelet_min < 20  then 4
      when platelet_min < 50  then 3
      when platelet_min < 100 then 2
      when platelet_min < 150 then 1
      when platelet_min is null then null
      else 0
    end as coagulation

  -- Liver
  , case
      -- Bilirubin checks in mg/dL
        when bilirubin_max >= 12.0 then 4
        when bilirubin_max >= 6.0  then 3
        when bilirubin_max >= 2.0  then 2
        when bilirubin_max >= 1.2  then 1
        when bilirubin_max is null then null
        else 0
      end as liver

  -- Cardiovascular
  , case
      when rate_dopamine > 15 or rate_epinephrine >  0.1 or rate_norepinephrine >  0.1 then 4
      when rate_dopamine >  5 or rate_epinephrine <= 0.1 or rate_norepinephrine <= 0.1 then 3
      when rate_dopamine >  0 or rate_dobutamine > 0 then 2
      when mbp_min < 70 then 1
      when coalesce(mbp_min, rate_dopamine, rate_dobutamine, rate_epinephrine, rate_norepinephrine) is null then null
      else 0
    end as cardiovascular

  -- Neurological failure (GCS)
  , case
      when (gcs_min >= 13 and gcs_min <= 14) then 1
      when (gcs_min >= 10 and gcs_min <= 12) then 2
      when (gcs_min >=  6 and gcs_min <=  9) then 3
      when  gcs_min <   6 then 4
      when  gcs_min is null then null
  else 0 end
    as cns

  -- Renal failure - high creatinine or low urine output
  , case
    when (creatinine_max >= 5.0) then 4
    when  UrineOutput < 200 then 4
    when (creatinine_max >= 3.5 and creatinine_max < 5.0) then 3
    when  UrineOutput < 500 then 3
    when (creatinine_max >= 2.0 and creatinine_max < 3.5) then 2
    when (creatinine_max >= 1.2 and creatinine_max < 2.0) then 1
    when coalesce(UrineOutput, creatinine_max) is null then null
  else 0 end
    as renal
  from scorecomp
)
select ie.subject_id, ie.hadm_id, ie.stay_id
  -- Combine all the scores to get SOFA
  -- Impute 0 if the score is missing
  , coalesce(respiration,0)
  + coalesce(coagulation,0)
  + coalesce(liver,0)
  + coalesce(cardiovascular,0)
  + coalesce(cns,0)
  + coalesce(renal,0)
  as SOFA
, respiration
, coagulation
, liver
, cardiovascular
, cns
, renal
from `physionet-data.mimic_icu.icustays` ie
left join scorecalc s
  on ie.stay_id = s.stay_id
;
评论 8
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包

打赏作者

预测模型的开发与应用研究

文中代码请大家随意

¥1 ¥2 ¥4 ¥6 ¥10 ¥20
扫码支付:¥1
获取中
扫码支付

您的余额不足,请更换扫码支付或充值

打赏作者

实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值