Multitask learning and benchmarking with clinical time series data: reading notes

0. Summary

1. Research Objective

  • To propose a public benchmark suite that includes four different clinical prediction tasks: in-hospital mortality, physiologic decompensation, length of stay (LOS), and phenotype classification. This set of benchmarks makes it possible to formulate a heterogeneous multitask learning problem in which all four prediction tasks are learned jointly.

2. Background and Problems

Background

  • A large volume of digital clinical data is now available.

Problems

  • The absence of widely accepted benchmarks to evaluate competing models. Practical progress in clinical machine learning has been difficult to measure due to variability in data sets and task definitions.
  • Most researchers develop new methods for one clinical prediction task at a time.

The heterogeneous nature of multitask requires a modeling solution that can not only handle sequence data but also model correlations between tasks distributed in time.

3. Method

Benchmark tasks

  • Some terminology
    In MIMIC-III: subjects, admissions, ICU stays/episodes, events
    In this paper: samples
    For tasks like phenotyping, a sample consists of an entire ICU stay. For tasks requiring hourly predictions, e.g., LOS, a sample includes all events that occur before a specific time, and so a single ICU stay yields multiple samples.
  • Benchmark preparation workflow

The paper describes its data processing pipeline in detail and provides source code, so it can serve as a reference later.

  • Introduction to different benchmark tasks

In-hospital mortality
Risk of mortality is most often formulated as binary classification using observations recorded from a limited window of time following admission. The target label indicates whether the patient died before hospital discharge.
Scoring systems: SAPS, APS-III, OASIS, SAPS-II

Physiologic Decompensation
This paper defines a binary label that indicates whether the patient’s date of death falls within the next 24 hours of the current time point. The authors then assign these labels to each hour, starting at four hours after admission to the ICU (in order to avoid having samples that are too short) and ending when the patient dies or is discharged.
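The hourly labeling rule above can be sketched as follows. This is a minimal illustration, not the benchmark's code: the function name, its parameters, and the choice to treat the 24-hour window as inclusive at both ends are assumptions made for the example.

```python
from typing import List, Optional, Tuple


def decompensation_labels(
    stay_hours: float,
    death_hour: Optional[float] = None,
    first_hour: int = 4,   # labels start four hours after ICU admission
    horizon: int = 24,     # "within the next 24 hours"
) -> List[Tuple[int, int]]:
    """Return (prediction_hour, label) pairs for one ICU stay.

    stay_hours: length of the stay in hours (it ends at death or discharge).
    death_hour: hours from ICU admission to death, or None for survivors.
    """
    samples = []
    for hour in range(first_hour, int(stay_hours) + 1):
        # Assumed convention: the window [hour, hour + horizon] is inclusive.
        dies_soon = death_hour is not None and 0 <= death_hour - hour <= horizon
        samples.append((hour, int(dies_soon)))
    return samples
```

A stay shorter than four hours yields no samples, which is the effect of the four-hour starting point described above.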

Forecasting length of stay
① In this benchmark the researchers predict the remaining LOS once per hour for every hour after admission, similar to decompensation.
② They divide the range of values into ten buckets: one bucket for extremely short visits (less than one day), seven day-long buckets for each day of the first week, and two “outlier” buckets, one for stays of over one week but less than two, and one for stays of over two weeks. This converts length-of-stay prediction into an ordinal multiclass classification problem.
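The bucketing can be sketched as a lookup over sorted bucket edges. The exact edges below are an assumption that is consistent with the description: the seven day-long buckets are taken to span days 1 to 8, so the first outlier bucket runs from day 8 to day 14. The names `EDGES_DAYS` and `los_bucket` are hypothetical.

```python
import bisect

# Assumed bucket edges in days: <1, seven day-long buckets (1-2, ..., 7-8),
# 8-14, and 14+. Nine edges give ten ordinal classes 0..9.
EDGES_DAYS = [1, 2, 3, 4, 5, 6, 7, 8, 14]


def los_bucket(remaining_hours: float) -> int:
    """Map a remaining length of stay in hours to an ordinal class 0..9."""
    days = remaining_hours / 24.0
    # bisect_right counts how many edges are <= days, which is the bucket index.
    return bisect.bisect_right(EDGES_DAYS, days)
```

For example, `los_bucket(12)` falls in class 0 (less than one day) and `los_bucket(20 * 24)` in class 9 (over two weeks).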

Acute care phenotype classification
The final benchmark task is phenotyping, i.e., classifying which acute care conditions are present in a given patient record. In this task they classify 25 conditions that are common in adult ICUs and formulate phenotyping as a multi-label classification problem.

Baselines

  • Logistic regression
    For each variable, we compute six different sample statistic features on seven different subsequences of a given time series.

In total, 17×7×6=714 features per time series are obtained.
We train a separate logistic regression classifier for each of mortality, decompensation, and the 25 phenotypes. For LOS, we trained a softmax regression model to solve the 10-class bucketed LOS problem.
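The 17×7×6 feature layout can be sketched as below. The notes do not list the six statistics or the seven subsequences, so the choices here (minimum, maximum, mean, standard deviation, skewness, and number of measurements, over the full series and the first/last 10%, 25%, and 50% of the stay) are assumptions, as are all function names.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Assumed subsequences, as (start, end) fractions of the stay.
WINDOWS = [(0.0, 1.0), (0.0, 0.1), (0.0, 0.25), (0.0, 0.5),
           (0.5, 1.0), (0.75, 1.0), (0.9, 1.0)]


def six_stats(values: Sequence[float]) -> List[float]:
    """Assumed statistics: min, max, mean, std, skewness, count."""
    n = len(values)
    if n == 0:
        # No measurement in this window: leave NaN for later imputation.
        return [float("nan")] * 5 + [0.0]
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    skew = sum((v - mean) ** 3 for v in values) / n / std ** 3 if std > 0 else 0.0
    return [min(values), max(values), mean, std, skew, float(n)]


def variable_features(times: Sequence[float], values: Sequence[float],
                      stay_length: float) -> List[float]:
    """7 windows x 6 statistics = 42 features for one variable."""
    feats: List[float] = []
    for lo, hi in WINDOWS:
        start, end = lo * stay_length, hi * stay_length
        window = [v for t, v in zip(times, values) if start <= t <= end]
        feats.extend(six_stats(window))
    return feats


def stay_features(series: Dict[str, Tuple[Sequence[float], Sequence[float]]],
                  stay_length: float) -> List[float]:
    """Concatenate per-variable features; 17 variables give 714 features."""
    feats: List[float] = []
    for name in sorted(series):
        times, values = series[name]
        feats.extend(variable_features(times, values, stay_length))
    return feats
```

The resulting fixed-length vector is what a logistic or softmax regression model would take as input.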

  • LSTM-based models

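The LSTM referred to in this part is the so-called standard LSTM.
For LSTM-based models we re-sample the time series into regularly spaced intervals.
This part of the paper explains in detail how the data are turned into feature vectors that can be fed to an LSTM; it is worth consulting later.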

  • Channel-wise LSTM

While the standard LSTM network works directly on the concatenation $\{x_t\}_{t \ge 1}^{T}$ of the time series, the channel-wise LSTM pre-processes the data $(\{\mu_t^{(i)}\}_{t \ge 1}^{T}, \{c_t^{(i)}\}_{t \ge 1}^{T})$ of different variables independently using a bidirectional LSTM layer.
They use different LSTM layers for different variables. The outputs of these LSTM layers are then concatenated and fed to another LSTM layer.
The intuition behind the channel-wise module:
① First, it helps to pre-process the data of a single variable before mixing it with the data of other variables.
② Second, this channel-wise module facilitates incorporation of missing data information by explicitly showing which mask variables relate to which variables.
Note that this channel-wise module can be used as a replacement for the input layer in any neural architecture that takes the concatenation of time series of different variables as its input.
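The data flow of the channel-wise module can be sketched without a deep learning library. The sketch only shows the routing: each variable's values and mask go through their own encoder, and the per-step outputs are concatenated before a shared layer. The encoder used in the example is a toy stand-in (forward and backward running means), not a trainable bidirectional LSTM, and all names are hypothetical.

```python
from typing import Callable, List, Sequence

Step = List[float]
Encoder = Callable[[List[Step]], List[Step]]  # sequence in, sequence out


def channel_wise_forward(x: Sequence[Sequence[float]],
                         mask: Sequence[Sequence[float]],
                         channel_encoders: Sequence[Encoder],
                         shared_layer: Encoder) -> List[Step]:
    """x, mask: T time steps, each holding one entry per variable."""
    n_vars = len(channel_encoders)
    encoded = []
    for i, encoder in enumerate(channel_encoders):
        # Pair each variable with its own mask before any mixing happens.
        channel_seq = [[x_t[i], m_t[i]] for x_t, m_t in zip(x, mask)]
        encoded.append(encoder(channel_seq))
    # Concatenate the per-variable outputs at every time step.
    merged = [sum((encoded[i][t] for i in range(n_vars)), [])
              for t in range(len(x))]
    return shared_layer(merged)


def toy_bidirectional_encoder(seq: List[Step]) -> List[Step]:
    """Stand-in for a bidirectional LSTM: forward and backward running means."""
    vals = [s[0] for s in seq]
    T = len(vals)
    fwd = [sum(vals[: t + 1]) / (t + 1) for t in range(T)]
    bwd = [sum(vals[t:]) / (T - t) for t in range(T)]
    return [[f, b] for f, b in zip(fwd, bwd)]
```

In a real model each entry of `channel_encoders` would be a separate bidirectional LSTM layer and `shared_layer` the LSTM that consumes the concatenated outputs.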

  • Deep supervision
    For the in-hospital mortality and phenotype prediction tasks they replicate the target in all time steps and, by changing the loss function, require the model to predict the replicated target at every step as well.
    This yields a new form of the total loss function (see the original paper).
    For the decompensation and length-of-stay prediction tasks, they create multiple prediction instances from a single ICU stay, group these samples, and predict them in a single pass.
  • Multitask learning LSTM
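    This part mainly describes how all the tasks are combined into one multitask model. It is relatively heavy on mathematical notation, so go back to the original paper when needed.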
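The two ideas above can be sketched as loss functions. The exact formulas are in the paper and are not reproduced in these notes, so the forms below are assumptions: deep supervision is written as a convex combination (weight `alpha`) of the loss at the final step and the mean loss over all steps with the replicated target, and the multitask loss as a weighted sum of per-task losses. All names and default values are hypothetical.

```python
import math
from typing import Dict, Sequence


def binary_cross_entropy(target: int, prob: float, eps: float = 1e-12) -> float:
    prob = min(max(prob, eps), 1.0 - eps)  # avoid log(0)
    return -(target * math.log(prob) + (1 - target) * math.log(1.0 - prob))


def deep_supervision_loss(target: int, step_probs: Sequence[float],
                          alpha: float = 0.5) -> float:
    """Assumed form: (1 - alpha) * final-step loss + alpha * mean per-step loss."""
    final_loss = binary_cross_entropy(target, step_probs[-1])
    # The same target is replicated at every time step.
    all_steps = sum(binary_cross_entropy(target, p) for p in step_probs)
    return (1.0 - alpha) * final_loss + alpha * all_steps / len(step_probs)


def multitask_loss(task_losses: Dict[str, float],
                   task_weights: Dict[str, float]) -> float:
    """Assumed form: weighted sum of the individual task losses."""
    return sum(task_weights[name] * loss for name, loss in task_losses.items())
```

With `alpha=0` the first function reduces to the ordinary loss on the final prediction, which makes the effect of the extra supervision easy to isolate.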

Experiments, Model selection and Evaluation

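This part mainly covers the selection of model hyperparameters, done mostly with methods such as grid search.
The evaluation procedure used in the paper is unusual and I did not fully understand it. It appears that the models were trained and tested over many rounds and confidence intervals were then computed.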
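One common way to obtain confidence intervals for a test metric is the bootstrap: resample the test set with replacement many times, recompute the metric on each resample, and take percentiles of the resulting scores. Whether this matches the paper's exact procedure is an assumption here and should be checked against the original. The function and parameter names are hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple


def bootstrap_ci(y_true: Sequence[int], y_pred: Sequence[int],
                 metric: Callable[[List[int], List[int]], float],
                 n_resamples: int = 1000, level: float = 0.95,
                 seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap interval for a metric on a fixed test set."""
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_resamples):
        # Draw n test examples with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(metric([y_true[i] for i in idx],
                             [y_pred[i] for i in idx]))
    scores.sort()
    lo = scores[int((1.0 - level) / 2.0 * n_resamples)]
    hi = scores[min(n_resamples - 1, int((1.0 + level) / 2.0 * n_resamples))]
    return lo, hi


def accuracy(y_true: List[int], y_pred: List[int]) -> float:
    """Simple example metric; any function of (labels, predictions) works."""
    return sum(int(t == p) for t, p in zip(y_true, y_pred)) / len(y_true)
```

Note that the model is not retrained in this scheme; only the test set is resampled.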

4. Evaluation

In-hospital mortality
AUC-ROC and AUC-PR
Physiologic Decompensation
AUC-ROC and AUC-PR
Note: Because we care about per-instance (vs. per-patient) accuracy in this task, overall performance is computed as the micro-average over all predictions, regardless of patient.
Forecasting length of stay
① A standard regression metric: mean absolute difference (MAD).
② Cohen’s linear weighted kappa, which measures correlation between ordered items.
Acute care phenotype classification
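Linear weighted kappa can be computed from the confusion matrix of bucket labels. The sketch below uses the standard definition with disagreement weights proportional to the distance between classes; the function name is hypothetical, and the case where the expected disagreement is zero (all labels and predictions in a single class) is not handled.

```python
from typing import Sequence


def linear_weighted_kappa(y_true: Sequence[int], y_pred: Sequence[int],
                          n_classes: int) -> float:
    """Cohen's kappa with linear weights |i - j| / (n_classes - 1)."""
    n = len(y_true)
    observed = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1
    row = [sum(r) for r in observed]
    col = [sum(observed[i][j] for i in range(n_classes))
           for j in range(n_classes)]

    num = 0.0  # weighted observed disagreement
    den = 0.0  # weighted disagreement expected by chance
    for i in range(n_classes):
        for j in range(n_classes):
            w = abs(i - j) / (n_classes - 1)
            num += w * observed[i][j]
            den += w * row[i] * col[j] / n
    return 1.0 - num / den
```

Because the weights grow with the distance between buckets, predicting a neighbouring bucket is penalized less than predicting a distant one, which suits the ordinal LOS classes.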
macro- and micro-averaged AUC-ROC with the macro-averaged score being the main score
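The difference between the two averages can be sketched with a small pairwise AUC-ROC implementation. Macro-averaging computes one AUC per phenotype and averages them; micro-averaging pools all (label, score) pairs first. The function names are hypothetical, and the quadratic pairwise AUC is chosen for clarity rather than speed.

```python
from typing import List, Sequence, Tuple


def auc_roc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive is scored above a random negative."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def macro_micro_auc(label_matrix: Sequence[Sequence[int]],
                    score_matrix: Sequence[Sequence[float]]
                    ) -> Tuple[float, float]:
    """Rows are samples, columns are labels (e.g. the 25 phenotypes)."""
    n_labels = len(label_matrix[0])
    per_label: List[float] = [
        auc_roc([row[k] for row in label_matrix],
                [row[k] for row in score_matrix])
        for k in range(n_labels)
    ]
    macro = sum(per_label) / n_labels
    # Micro: pool every (sample, label) pair into one binary problem.
    micro = auc_roc([v for row in label_matrix for v in row],
                    [v for row in score_matrix for v in row])
    return macro, micro
```

The two scores can differ when labels are scored on different scales: each label may be ranked perfectly on its own while the pooled ranking is not.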

5. Conclusion

6. Notes

  • This paper demonstrates that carefully designed recurrent neural networks are able to exploit the correlations between tasks to improve performance on several of them.

References

Bates, D. W., Saria, S., Ohno-Machado, L., Shah, A. & Escobar, G. Big data in health care: using analytics to identify and manage high-risk and high-cost patients. Health Affairs 33, 1123–1131 (2014).

This paper discusses big clinical data.

Harutyunyan, H. et al. MIMIC-III benchmark repository. Zenodo, https://doi.org/10.5281/zenodo.1306527 (2018).

This is another publication by the authors of the paper; it contains the code.

Rajkomar, A. et al. Scalable and accurate deep learning with electronic health records. NPJ Digital Med 1, 18 (2018).

The authors say this paper includes a study of deep learning networks with different architectures.
Recently, it was shown that novel neural architectures (including ones based on LSTM) perform well for predicting inpatient mortality, 30-day unplanned readmission, long length-of-stay (binary classification) and diagnoses on general EHR data (not limited to ICU).
