说明:
'SOFA’的意思是序贯器官衰竭评分表。
本SQL的目的就是查出计算Sepsis3的相关的资料。
展示了MIMIC查询的一种策略,即在官方SQL的基础上进一步进行查询,本SQL是在SOFA.sql查询和suspicion_of_infection.sql查询结果上的基础上进行的查询。
多条记录变为一条记录的另一种方式。rownumber()over的用法。
-- Creates a table with "onset" time of Sepsis-3 in the ICU.目的:Sepsis-3形成的时间
-- That is, the **earliest time** at which a patient had SOFA >= 2 and suspicion of infection.
-- As many variables used in SOFA are only collected in the ICU, this query can only
-- define sepsis-3 onset within the ICU.仅限于ICU患者
-- extract rows with SOFA >= 2
-- implicitly this assumes baseline SOFA was 0 before ICU admission.
WITH sofa AS--临时表,是从一个叫做mimic_derived的模块的sofa表格中获取的,算是二次获取
(
SELECT stay_id
, starttime, endtime
, respiration_24hours as respiration--呼吸
, coagulation_24hours as coagulation--血小板
, liver_24hours as liver--肝
, cardiovascular_24hours as cardiovascular--心
, cns_24hours as cns--神经
, renal_24hours as renal--肾脏
, sofa_24hours as sofa_score
FROM `physionet-data.mimic_derived.sofa`
WHERE sofa_24hours >= 2
)
, s1 as
(
SELECT
soi.subject_id
, soi.stay_id
-- suspicion columns
, soi.ab_id
, soi.antibiotic
, soi.antibiotic_time
, soi.culture_time
, soi.suspected_infection
, soi.suspected_infection_time
, soi.specimen
, soi.positive_culture
-- sofa columns
, starttime, endtime
, respiration, coagulation, liver, cardiovascular, cns, renal
, sofa_score
-- All rows have an associated suspicion of infection event
-- Therefore, Sepsis-3 is defined as SOFA >= 2.
-- Implicitly, the baseline SOFA score is assumed to be zero, as we do not know
-- if the patient has preexisting (acute or chronic) organ dysfunction
-- before the onset of infection.
, sofa_score >= 2 and suspected_infection = 1 as sepsis3
-- subselect to the earliest suspicion/antibiotic/SOFA row
, ROW_NUMBER() OVER--实现分组功能
(
PARTITION BY soi.stay_id--ICU的id,分组的根据, 有多条记录,分组为1,2,3,后面只取了=1的记录
ORDER BY suspected_infection_time, antibiotic_time, culture_time, endtime--排序
) AS rn_sus
FROM `physionet-data.mimic_derived.suspicion_of_infection` as soi
INNER JOIN sofa
ON soi.stay_id = sofa.stay_id
AND sofa.endtime >= DATETIME_SUB(soi.suspected_infection_time, INTERVAL '48' HOUR)
AND sofa.endtime <= DATETIME_ADD(soi.suspected_infection_time, INTERVAL '24' HOUR)
-- only include in-ICU rows
WHERE soi.stay_id is not null
)
--以下是主查询
SELECT
subject_id, stay_id
-- note: there may be more than one antibiotic given at this time
, antibiotic_time
-- culture times may be dates, rather than times
, culture_time
, suspected_infection_time
-- endtime is latest time at which the SOFA score is valid
, endtime as sofa_time
, sofa_score
, respiration, coagulation, liver, cardiovascular, cns, renal
, sepsis3
FROM s1
WHERE rn_sus = 1--
简单分析
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 34789 entries, 0 to 34788
Data columns (total 14 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 subject_id 34789 non-null int64
1 stay_id 34789 non-null int64
2 antibiotic_time 34789 non-null object
3 culture_time 34789 non-null object
4 suspected_infection_time 34789 non-null object
5 sofa_time 34789 non-null object
6 sofa_score 34789 non-null int64
7 respiration 34789 non-null int64
8 coagulation 34789 non-null int64
9 liver 34789 non-null int64
10 cardiovascular 34789 non-null int64
11 cns 34789 non-null int64
12 renal 34789 non-null int64
13 sepsis3 34789 non-null bool
dtypes: bool(1), int64(9), object(4)
memory usage: 3.5+ MB
None
df.head()
subject_id stay_id antibiotic_time culture_time suspected_infection_time sofa_time sofa_score respiration coagulation liver cardiovascular cns renal sepsis3
0 13184298 30099685 2127-02-15 18:00:00 2127-02-15 17:19:00 2127-02-15 17:19:00 2127-02-15 18:00:00 12 0 2 2 4 0 4 True
1 12688844 34925336 2158-06-12 08:00:00 2158-06-11 01:00:00 2158-06-11 01:00:00 2158-06-11 23:00:00 12 4 0 0 4 0 4 True
2 14755611 33075809 2179-02-11 10:00:00 2179-02-10 04:06:00 2179-02-10 04:06:00 2179-02-08 05:00:00 13 2 2 1 4 0 4 True
3 10723086 38690919 2191-10-01 09:00:00 2191-10-01 07:55:00 2191-10-01 07:55:00 2191-09-29 08:00:00 12 3 1 2 0 2 4 True
4 12453404 31892948 2117-03-03 18:00:00 2117-02-28 00:00:00 2117-02-28 00:00:00 2117-02-28 10:00:00 12 0 2 2 4 3 1 True
#sofa得分的count
2 13002
3 7728
4 5769
5 3157
6 1980
7 1193
8 762
9 500
10 315
11 175
12 87
13 50
14 32
15 20
16 11
17 4
20 2
18 1
19 1
Name: sofa_score, dtype: int64
附: 以上是在查询SOFA的基础上进行的二次查询, 我们也来对比一下SOFA是如何查询的。
-- ------------------------------------------------------------------
-- Title: Sequential Organ Failure Assessment (SOFA)
-- This query extracts the sequential organ failure assessment (formally: sepsis-related organ failure assessment).
-- This score is a measure of organ failure for patients in the ICU.
-- The score is calculated on the **first day** of each ICU patients' stay.--注意是firstday
-- ------------------------------------------------------------------
-- Reference for SOFA:
-- Jean-Louis Vincent, Rui Moreno, Jukka Takala, Sheila Willatts, Arnaldo De Mendonça,
-- Hajo Bruining, C. K. Reinhart, Peter M Suter, and L. G. Thijs.
-- "The SOFA (Sepsis-related Organ Failure Assessment) score to describe organ dysfunction/failure."
-- Intensive care medicine 22, no. 7 (1996): 707-710.
-- Variables used in SOFA:--事先摸清变量储存的位置
-- GCS, MAP, FiO2, Ventilation status (sourced from CHARTEVENTS)
-- Creatinine, Bilirubin, FiO2, PaO2, Platelets (sourced from LABEVENTS)
-- Dopamine, Dobutamine, Epinephrine, Norepinephrine (sourced from INPUTEVENTS)
-- Urine output (sourced from OUTPUTEVENTS)
-- The following views required to run this query:
-- 1) first_day_urine_output
-- 2) first_day_vitalsign
-- 3) first_day_gcs
-- 4) first_day_lab
-- 5) first_day_bg_art
-- 6) ventdurations
-- extract drug rates from derived vasopressor tables
with vaso_stg as
(
select ie.stay_id, 'norepinephrine' AS treatment, vaso_rate as rate
FROM `physionet-data.mimic_icu.icustays` ie
INNER JOIN `physionet-data.mimic_derived.norepinephrine` mv--mimic_derived 是一个很常用的中间库
ON ie.stay_id = mv.stay_id
AND mv.starttime >= DATETIME_SUB(ie.intime, INTERVAL '6' HOUR)
AND mv.starttime <= DATETIME_ADD(ie.intime, INTERVAL '1' DAY)
UNION ALL
select ie.stay_id, 'epinephrine' AS treatment, vaso_rate as rate
FROM `physionet-data.mimic_icu.icustays` ie
INNER JOIN `physionet-data.mimic_derived.epinephrine` mv
ON ie.stay_id = mv.stay_id
AND mv.starttime >= DATETIME_SUB(ie.intime, INTERVAL '6' HOUR)
AND mv.starttime <= DATETIME_ADD(ie.intime, INTERVAL '1' DAY)
UNION ALL
select ie.stay_id, 'dobutamine' AS treatment, vaso_rate as rate
FROM `physionet-data.mimic_icu.icustays` ie
INNER JOIN `physionet-data.mimic_derived.dobutamine` mv
ON ie.stay_id = mv.stay_id
AND mv.starttime >= DATETIME_SUB(ie.intime, INTERVAL '6' HOUR)
AND mv.starttime <= DATETIME_ADD(ie.intime, INTERVAL '1' DAY)
UNION ALL
select ie.stay_id, 'dopamine' AS treatment, vaso_rate as rate
FROM `physionet-data.mimic_icu.icustays` ie
INNER JOIN `physionet-data.mimic_derived.dopamine` mv
ON ie.stay_id = mv.stay_id
AND mv.starttime >= DATETIME_SUB(ie.intime, INTERVAL '6' HOUR)
AND mv.starttime <= DATETIME_ADD(ie.intime, INTERVAL '1' DAY)
)
, vaso_mv AS
(
SELECT
ie.stay_id
, max(CASE WHEN treatment = 'norepinephrine' THEN rate ELSE NULL END) as rate_norepinephrine
, max(CASE WHEN treatment = 'epinephrine' THEN rate ELSE NULL END) as rate_epinephrine
, max(CASE WHEN treatment = 'dopamine' THEN rate ELSE NULL END) as rate_dopamine
, max(CASE WHEN treatment = 'dobutamine' THEN rate ELSE NULL END) as rate_dobutamine
from `physionet-data.mimic_icu.icustays` ie
LEFT JOIN vaso_stg v
ON ie.stay_id = v.stay_id
GROUP BY ie.stay_id
)
, pafi1 as
(
-- join blood gas to ventilation durations to determine if patient was vent
select ie.stay_id, bg.charttime
, bg.pao2fio2ratio
, case when vd.stay_id is not null then 1 else 0 end as IsVent
from `physionet-data.mimic_icu.icustays` ie
LEFT JOIN `physionet-data.mimic_derived.bg` bg
ON ie.subject_id = bg.subject_id
AND bg.charttime >= DATETIME_SUB(ie.intime, INTERVAL '6' HOUR)
AND bg.charttime <= DATETIME_ADD(ie.intime, INTERVAL '1' DAY)
LEFT JOIN `physionet-data.mimic_derived.ventilation` vd
ON ie.stay_id = vd.stay_id
AND bg.charttime >= vd.starttime
AND bg.charttime <= vd.endtime
AND vd.ventilation_status = 'InvasiveVent'
)
, pafi2 as
(
-- because pafi has an interaction between vent/PaO2:FiO2, we need two columns for the score
-- it can happen that the lowest unventilated PaO2/FiO2 is 68, but the lowest ventilated PaO2/FiO2 is 120
-- in this case, the SOFA score is 3, *not* 4.
select stay_id
, min(case when IsVent = 0 then pao2fio2ratio else null end) as PaO2FiO2_novent_min
, min(case when IsVent = 1 then pao2fio2ratio else null end) as PaO2FiO2_vent_min
from pafi1
group by stay_id
)
-- Aggregate the components for the score
, scorecomp as
(
select ie.stay_id
, v.mbp_min
, mv.rate_norepinephrine
, mv.rate_epinephrine
, mv.rate_dopamine
, mv.rate_dobutamine
, l.creatinine_max
, l.bilirubin_total_max as bilirubin_max
, l.platelets_min as platelet_min
, pf.PaO2FiO2_novent_min
, pf.PaO2FiO2_vent_min
, uo.UrineOutput
, gcs.gcs_min
from `physionet-data.mimic_icu.icustays` ie
left join vaso_mv mv
on ie.stay_id = mv.stay_id
left join pafi2 pf
on ie.stay_id = pf.stay_id
left join `physionet-data.mimic_derived.first_day_vitalsign` v
on ie.stay_id = v.stay_id
left join `physionet-data.mimic_derived.first_day_lab` l
on ie.stay_id = l.stay_id
left join `physionet-data.mimic_derived.first_day_urine_output` uo
on ie.stay_id = uo.stay_id
left join `physionet-data.mimic_derived.first_day_gcs` gcs
on ie.stay_id = gcs.stay_id
)
, scorecalc as
(
-- Calculate the final score
-- note that if the underlying data is missing, the component is null
-- eventually these are treated as 0 (normal), but knowing when data is missing is useful for debugging
select stay_id
-- Respiration
, case
when PaO2FiO2_vent_min < 100 then 4
when PaO2FiO2_vent_min < 200 then 3
when PaO2FiO2_novent_min < 300 then 2
when PaO2FiO2_novent_min < 400 then 1
when coalesce(PaO2FiO2_vent_min, PaO2FiO2_novent_min) is null then null
else 0
end as respiration
-- Coagulation
, case
when platelet_min < 20 then 4
when platelet_min < 50 then 3
when platelet_min < 100 then 2
when platelet_min < 150 then 1
when platelet_min is null then null
else 0
end as coagulation
-- Liver
, case
-- Bilirubin checks in mg/dL
when bilirubin_max >= 12.0 then 4
when bilirubin_max >= 6.0 then 3
when bilirubin_max >= 2.0 then 2
when bilirubin_max >= 1.2 then 1
when bilirubin_max is null then null
else 0
end as liver
-- Cardiovascular
, case
when rate_dopamine > 15 or rate_epinephrine > 0.1 or rate_norepinephrine > 0.1 then 4
when rate_dopamine > 5 or rate_epinephrine <= 0.1 or rate_norepinephrine <= 0.1 then 3
when rate_dopamine > 0 or rate_dobutamine > 0 then 2
when mbp_min < 70 then 1
when coalesce(mbp_min, rate_dopamine, rate_dobutamine, rate_epinephrine, rate_norepinephrine) is null then null
else 0
end as cardiovascular
-- Neurological failure (GCS)
, case
when (gcs_min >= 13 and gcs_min <= 14) then 1
when (gcs_min >= 10 and gcs_min <= 12) then 2
when (gcs_min >= 6 and gcs_min <= 9) then 3
when gcs_min < 6 then 4
when gcs_min is null then null
else 0 end
as cns
-- Renal failure - high creatinine or low urine output
, case
when (creatinine_max >= 5.0) then 4
when UrineOutput < 200 then 4
when (creatinine_max >= 3.5 and creatinine_max < 5.0) then 3
when UrineOutput < 500 then 3
when (creatinine_max >= 2.0 and creatinine_max < 3.5) then 2
when (creatinine_max >= 1.2 and creatinine_max < 2.0) then 1
when coalesce(UrineOutput, creatinine_max) is null then null
else 0 end
as renal
from scorecomp
)
select ie.subject_id, ie.hadm_id, ie.stay_id
-- Combine all the scores to get SOFA
-- Impute 0 if the score is missing
, coalesce(respiration,0)
+ coalesce(coagulation,0)
+ coalesce(liver,0)
+ coalesce(cardiovascular,0)
+ coalesce(cns,0)
+ coalesce(renal,0)
as SOFA
, respiration
, coagulation
, liver
, cardiovascular
, cns
, renal
from `physionet-data.mimic_icu.icustays` ie
left join scorecalc s
on ie.stay_id = s.stay_id
;