MIMIC-IV数据库衍生表格mimic_derived配置和使用

本文详细介绍了如何配置和使用MIMIC-IV数据库的衍生表格mimic_derived,包括数据示例、衍生表格的生成步骤、实例应用(如计算患者真实年龄和筛选特定条件的首次入院记录),以及官方函数的安装和使用方法。

摘要生成于 C知道 ,由 DeepSeek-R1 满血版支持, 前往体验 >

图片

有很多同学反馈上一篇文章  

手把手教会你认识跟安装MIMIC-IV物化视图

没有完全看懂,不知道如何配置物化视图或衍生表格 和使用衍生表格

本篇文章旨在系统介绍下衍生表格mimic_derived的配置, 并演示如何使用。

使用以下实例您还需要用到官方函数, 可以看往期这篇

MIMIC数据库官方函数的安装与使用

一、MIMIC-IV数据库衍生表格(mimic_derived)简介

MIMIC IV数据库衍生表格实际上就是对数据库数据进一步归纳整理后的数据。由于

mimic_derived 参与了很多内容的查询,熟悉这个模块的内容还是很有必要的,但是官方

对这个模块缺乏详细的介绍。模块mimic_derived, 而且这个模块仅存在与谷歌云的mimic

数据库中, 本地建立的数据库中虽然有这个模块,但是没有内容,需要自己根据官方提供的

代码生成。

二、衍生表格示例

这些示例每一个后面都会单章一一讲解

1.age,为患者入院(admission)时的年龄,存储在age列。

图片

SELECT     ad.subject_id  , ad.hadm_id  , ad.admittime  , pa.anchor_age  , pa.anchor_year  , mimiciv_derived.DATETIME_DIFF(ad.admittime, mimiciv_derived.DATETIME(pa.anchor_year, 1, 1, 0, 0,0),'YEAR') + pa.anchor_age AS ageFROM mimiciv_hosp.admissions  AS adINNER JOIN mimiciv_hosp.patients  AS paON ad.subject_id = pa.subject_id;

图片

这里就使用了官方函数 "mimiciv_derived.DATETIME_DIFF" 和 “mimiciv_derived.DATETIME”, 如果您没有安装官方函数就会报错!

安装好的官方函数长这样:

图片

2.weight_duration, ICU期间体重的变化,体重是反应患者营养状况的重要因素。

-- THIS SCRIPT IS AUTOMATICALLY GENERATED. DO NOT EDIT IT DIRECTLY.DROP TABLE IF EXISTS weight_durations; CREATE TABLE weight_durations AS -- This query extracts weights for adult ICU patients with start/stop times-- if an admission weight is given, then this is assigned from intime to outtimeWITH wt_stg as(    SELECT        c.stay_id      , c.charttime      , case when c.itemid = 226512 then 'admit'          else 'daily' end as weight_type      -- TODO: eliminate obvious outliers if there is a reasonable weight      , c.valuenum as weight    FROM mimic_icu.chartevents c    WHERE c.valuenum IS NOT NULL      AND c.itemid in      (          226512 -- Admit Wt          , 224639 -- Daily Weight      )      AND c.valuenum > 0)-- assign ascending row number, wt_stg1 as(  select      stay_id    , charttime    , weight_type    , weight    , ROW_NUMBER() OVER (partition by stay_id, weight_type order by charttime) as rn  from wt_stg  WHERE weight IS NOT NULL)-- change charttime to intime for the first admission weight recorded, wt_stg2 AS(  SELECT       wt_stg1.stay_id    , ie.intime, ie.outtime    , wt_stg1.weight_type    , case when wt_stg1.weight_type = 'admit' and wt_stg1.rn = 1        then DATETIME_SUB(ie.intime, INTERVAL '2' HOUR)      else wt_stg1.charttime end as starttime    , wt_stg1.weight  from wt_stg1  INNER JOIN mimic_icu.icustays ie    on ie.stay_id = wt_stg1.stay_id), wt_stg3 as(  select    stay_id    , intime, outtime    , starttime    , coalesce(        LEAD(starttime) OVER (PARTITION BY stay_id ORDER BY starttime),        DATETIME_ADD(outtime, INTERVAL '2' HOUR)      ) as endtime    , weight    , weight_type  from wt_stg2)-- this table is the start/stop times from admit/daily weight in charted data, wt1 as(  select      stay_id    , starttime    , coalesce(endtime,      LEAD(starttime) OVER (partition by stay_id order by starttime),      -- impute ICU discharge as the end of the final weight measurement      -- plus a 2 hour "fuzziness" window      DATETIME_ADD(outtime, INTERVAL '2' HOUR)    ) as endtime    , weight    , weight_type  from wt_stg3)-- if the intime for the patient is < the first charted daily weight-- then we will have a "gap" at the start of their stay-- to prevent this, we look for these gaps and backfill the first weight-- this adds (153255-149657)=3598 rows, meaning this fix helps for up to 3598 stay_id, wt_fix as(  select ie.stay_id    -- we add a 2 hour "fuzziness" window    , DATETIME_SUB(ie.intime, INTERVAL '2' HOUR) as starttime    , wt.starttime as endtime    , wt.weight    , wt.weight_type  from mimic_icu.icustays ie  inner join  -- the below subquery returns one row for each unique stay_id  -- the row contains: the first starttime and the corresponding weight  (    SELECT wt1.stay_id, wt1.starttime, wt1.weight    , weight_type    , ROW_NUMBER() OVER (PARTITION BY wt1.stay_id ORDER BY wt1.starttime) as rn    FROM wt1  ) wt    ON  ie.stay_id = wt.stay_id    AND wt.rn = 1    and ie.intime < wt.starttime)-- add the backfill rows to the main weight tableSELECTwt1.stay_id, wt1.starttime, wt1.endtime, wt1.weight, wt1.weight_typeFROM wt1UNION ALLSELECTwt_fix.stay_id, wt_fix.starttime, wt_fix.endtime, wt_fix.weight, wt_fix.weight_typeFROM wt_fix;

3.GCS评分, 神经系统功能评分。

with base as(  select    subject_id  , ce.stay_id, ce.charttime  -- pivot each value into its own column  , max(case when ce.ITEMID = 223901 then ce.valuenum else null end) as GCSMotor  , max(case      when ce.ITEMID = 223900 and ce.VALUE = 'No Response-ETT' then 0      when ce.ITEMID = 223900 then ce.valuenum      else null    end) as GCSVerbal  , max(case when ce.ITEMID = 220739 then ce.valuenum else null end) as GCSEyes  -- convert the data into a number, reserving a value of 0 for ET/Trach  , max(case      -- endotrach/vent is assigned a value of 0      -- flag it here to later parse specially      when ce.ITEMID = 223900 and ce.VALUE = 'No Response-ETT' then 1 -- metavision    else 0 end)    as endotrachflag  , ROW_NUMBER ()          OVER (PARTITION BY ce.stay_id ORDER BY ce.charttime ASC) as rn  from mimic_icu.chartevents ce  -- Isolate the desired GCS variables  where ce.ITEMID in  (    -- GCS components, Metavision    223900, 223901, 220739  )  group by ce.subject_id, ce.stay_id, ce.charttime), gcs as (  select b.*  , b2.GCSVerbal as GCSVerbalPrev  , b2.GCSMotor as GCSMotorPrev  , b2.GCSEyes as GCSEyesPrev  -- Calculate GCS, factoring in special case when they are intubated and prev vals  -- note that the coalesce are used to implement the following if:  --  if current value exists, use it  --  if previous value exists, use it  --  otherwise, default to normal  , case      -- replace GCS during sedation with 15      when b.GCSVerbal = 0        then 15      when b.GCSVerbal is null and b2.GCSVerbal = 0        then 15      -- if previously they were intub, but they aren't now, do not use previous GCS values      when b2.GCSVerbal = 0        then            coalesce(b.GCSMotor,6)          + coalesce(b.GCSVerbal,5)          + coalesce(b.GCSEyes,4)      -- otherwise, add up score normally, imputing previous value if none available at current time      else            coalesce(b.GCSMotor,coalesce(b2.GCSMotor,6))          + coalesce(b.GCSVerbal,coalesce(b2.GCSVerbal,5))          + coalesce(b.GCSEyes,coalesce(b2.GCSEyes,4))      end as GCS
  from base b  -- join to itself within 6 hours to get previous value  left join base b2    on b.stay_id = b2.stay_id    and b.rn = b2.rn+1    and b2.charttime > DATETIME_ADD(b.charttime, INTERVAL '6' HOUR))-- combine components with previous within 6 hours-- filter down to cohort which is not excluded-- truncate charttime to the hour, gcs_stg as(  select    subject_id  , gs.stay_id, gs.charttime  , GCS  , coalesce(GCSMotor,GCSMotorPrev) as GCSMotor  , coalesce(GCSVerbal,GCSVerbalPrev) as GCSVerbal  , coalesce(GCSEyes,GCSEyesPrev) as GCSEyes  , case when coalesce(GCSMotor,GCSMotorPrev) is null then 0 else 1 end  + case when coalesce(GCSVerbal,GCSVerbalPrev) is null then 0 else 1 end  + case when coalesce(GCSEyes,GCSEyesPrev) is null then 0 else 1 end    as components_measured  , EndoTrachFlag  from gcs gs)-- priority is:--  (i) complete data, (ii) non-sedated GCS, (iii) lowest GCS, (iv) charttime, gcs_priority as(  select      subject_id    , stay_id    , charttime    , gcs    , gcsmotor    , gcsverbal    , gcseyes    , EndoTrachFlag    , ROW_NUMBER() over      (        PARTITION BY stay_id, charttime        ORDER BY components_measured DESC, endotrachflag, gcs, charttime DESC      ) as rn  from gcs_stg)select  gs.subject_id  , gs.stay_id  , gs.charttime  , GCS AS gcs  , GCSMotor AS gcs_motor  , GCSVerbal AS gcs_verbal  , GCSEyes AS gcs_eyes  , EndoTrachFlag AS gcs_unablefrom gcs_priority gswhere rn = 1;

只是简单列举了几个常用表格,具体的表格说明可以查询github上官方源代码中的concept文件夹内的代码和注释。

图片

之前未下载的同学也可以后台回复 【mimiccode】领取

三、衍生表格配置

1.参考前面数据安装步骤,打开命令界面,进入mimiciv数据库,配置数据库函数

这里具体看这篇文章  MIMIC数据库官方函数的安装与使用

set search_path to mimic_derived;

图片

2.运行shell脚本生成衍生表格数据文件夹postgres(windows运行shell脚本可通过安装git运行,可自行百度安装)。

图片

3.衍生表格数据。

图片

4.修改postgres-make-concepts.sql文件。

图片

5.运行postgres-make-concepts.sql文件(注意替换为自己的文件路径)

运行时间有点长,耐心等待。

 set search_path to mimic_derived; \encoding 'UTF-8';  \i F:/mimic/mimic-code-main/mimic-iv/concepts/postgres/postgres-make-concepts.sql;

6.表格生成成功,一共58个表格。

图片

图片

四、实例使用演示

我们分析的是一篇利用MIMIC数据库发表的SCI论文,这是一篇2022年12月份发表在《Inflammation》期刊上的论文。

论文名为《Association between lactate-to-albumin ratio and 28-days all-cause mortality in patients with acute pancreatitis: A retrospective analysis of the MIMIC-IV database.》

中文名为《急性胰腺炎患者乳酸白蛋白比值与28天内全因死亡率的关系:对MIMIC-IV数据库的回顾性分析》

图片

关注公众号“科研收录”,后台回复“急性胰腺炎患者死亡率”,即可获得该SCI论文

该论文在使用MIMIC数据库时候,筛选患者的标准如下:

图片

所以我们这里讲围绕【如何提取年龄大于18岁的患者的首次入院记录】来教大家提取数据

4.1 年龄过滤:首次入院时年龄小于18岁的患者

mimiciv数据库中并没有直接记录患者的真实年龄,mimiciv_hosp.patients表中有一个关于关于年龄的字段anchor_age,注意这并不是患者的真实年龄,是是经过mimic偏移的年龄。

图片

4.1.1 计算患者的真实年龄

患者的真实年龄 = anchor_age + admittime - anchor_year

  • anchor_age,anchor_year可以在mimiciv_hosp.patients表中找到

  • 患者的入院年龄可以在mimiciv_hosp.admissions表中找到。

以下为患者的真实年龄的计算SQL:

pa.anchor_age + mimiciv_derived.DATETIME_DIFF(ad.admittime, 
    mimiciv_derived.DATETIME(pa.anchor_year, 1, 1, 0, 0, 0)

注意:如果你将官方函数DATETIME_DIFF与DATETIME安装在了public模块中(小编这里安装在的mimiciv_derived),请修改成如下

pa.anchor_age + public.DATETIME_DIFF(ad.admittime, 
    public.DATETIME(pa.anchor_year, 1, 1, 0, 0, 0)

这个算法算出来的年龄是小数,我们可以使用ROUND函数转成整数,完整SQL如下:

图片

SELECT  ad.subject_id,  MAX (    ROUND(      pa.anchor_age + mimiciv_derived.datetime_diff ( ad.admittime, mimiciv_derived.DATETIME ( pa.anchor_year, 1, 1, 0, 0, 0 ), 'YEAR' ),      0     )) AS 患者真实年龄 FROM  mimiciv_hosp.admissions AS ad,  mimiciv_hosp.patients AS pa WHERE  ad.subject_id = pa.subject_id GROUP BY  ad.subject_id   LIMIT 200;

4.1.2 排除小于18岁的患者

我们可以先使用子查询先查询出来患者的真实年龄,再排除掉小于18岁的患者

关于子查询语句我们将在下一篇文章 《MIMIC数据库,常用查询指令SQL基础(二)》中详细讲解,这里大家可以先照着用就行

​​​​​​​

with base as (SELECT    ad.subject_id,MAX(ROUND(pa.anchor_age + mimiciv_derived.DATETIME_DIFF(ad.admittime,     mimiciv_derived.DATETIME(pa.anchor_year, 1, 1, 0, 0, 0), 'YEAR'),0)) AS age
FROM mimiciv_hosp.admissions ad, mimiciv_hosp.patients paWHERE ad.subject_id = pa.subject_idGROUP BY    ad.subject_id)SELECT * FROM base WHERE base.age >=18 limit 100;

图片

以上我们就筛选出大于等于18岁的患者,大家可以根据代码修改筛选出来其他年龄段的患者。

4.2 首次入院记录:因急性胰腺反复入院的患者,仅保留首次入院数据

mimiciv数据库记录了29万个病人的43万条入院记录,但是我们在做数据分析和提取的时候,通常只需要提取某个病人的首次入院记录。

4.2.1 入院时间排序

要获取患者的首次入院记录,我们需要使用到postgres数据库的内置函数ROW_NUMBER函数,该函数会将患者进行分组,并可以按照入院时间排序。

关于SQL语句ORDER BY 使用, 看这里  

SELECT ADM.SUBJECT_ID,  ADM.HADM_ID,  ADM.ADMITTIME,  ROW_NUMBER () OVER(PARTITION BY ADM.SUBJECT_ID    ORDER BY ADM.ADMITTIME) AS ADMITTIME_RANKFROM MIMICIV_HOSP.ADMISSIONS AS ADMLIMIT 100;

图片

4.2.2 获取首次入院记录

利用postgres数据库的子查询,过滤出患者的首次入院记录

​​​​​​​

WITH base AS (SELECT ADM.SUBJECT_ID,  ADM.HADM_ID,  ADM.ADMITTIME,  ROW_NUMBER () OVER(PARTITION BY ADM.SUBJECT_ID    ORDER BY ADM.ADMITTIME) AS ADMITTIME_RANKFROM MIMICIV_HOSP.ADMISSIONS AS ADM) SELECT * FROM base WHERE base.ADMITTIME_RANK = 1 LIMIT 200;

图片

4.3 合并数据

根据年龄过滤出符合条件的患者,以及查出了患者的首次入院记录之后,我们需要把这两部分数据进行合并,也就是取出符合年龄条件的首次入院记录。

我们可以使用postgres数据库的子查询把这两部分数据合并在一起。

WITH BASE0 AS (  SELECT AD.hadm_id,  MAX(ROUND(pa.anchor_age + mimiciv_derived.datetime_diff(ad.admittime, mimiciv_derived.datetime(pa.anchor_year, 1, 1, 0, 0, 0), 'YEAR'), 0)) AS age    FROM mimiciv_hosp.admissions AS AD, mimiciv_hosp.patients AS PA    WHERE ad.subject_id = pa.subject_id    GROUP BY ad.hadm_id LIMIT 1000),  BASE1 AS    (      SELECT ADM.subject_id, ADM.hadm_id, ADM.admittime,      ROW_NUMBER() OVER( PARTITION BY ADM.subject_id ORDER BY ADM.admittime) AS admittime_rank      FROM mimiciv_hosp.admissions AS ADM  )
SELECT * FROM BASE0, BASE1WHERE BASE0.age >= 18 AND BASE1.admittime_rank = 1 AND BASE0.hadm_id = base1.hadm_idLIMIT 200;

图片

想获取mimiciv数据或者有问题的小伙伴可关注【科研收录】后台回复,谢谢!

阅读论文的同学会发现原文对提取的数据做了可视化图表展示和预测回归分析这部分我们会在往后的R语言和Python数据分析教程中涉及,敬请期待。

图片

图片

Mimic-IV数据库是一个用于医学研究的大规模合成人工历史数据库。它包含了来自美国医疗机构的庞大数量的医疗数据,能够帮助医学研究人员更好地了解分析疾病、治疗医疗实践。 要导入Mimic-IV数据库,我们需要按照以下步骤进行操作: 1. 获得Mimic-IV数据库:首先需要从官方网站或授权机构处获得Mimic-IV数据库。下载并解压缩数据库文件,确保电脑有足够的存储空间。 2. 安装数据库管理系统:Mimic-IV数据库使用PostgreSQL管理系统,因此需要先下载安装最新版本的PostgreSQL。 3. 创建数据库:在PostgreSQL中创建一个新的数据库,用于存储Mimic-IV数据。可以使用命令行或图形界面工具来创建。 4. 导入数据:将Mimic-IV数据库中的数据导入到新创建的数据库中。可以使用PostgreSQL的命令行工具或图形界面工具来执行此操作。 5. 确认导入成功:导入数据需要一定的时间,取决于数据的大小计算机的性能。在导入完成后,可以查询数据库中的一些表格记录,以确保导入成功。 需要注意的是,导入Mimic-IV数据库需要一定的计算机技术知识经验,以确保操作的准确性成功率。在处理敏感医疗数据时,务必要保护隐私数据安全。因此,在导入过程中,我们应遵循合适的安全措施,并与相关机构沟通,确保操作的合规性合法性。
评论 1
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值