Multitask learning and benchmarking with clinical time series data: reading notes

Latest recommended article published on 2022-10-14 10:25:18

Worldora-

Article link: https://blog.csdn.net/qq_43787862/article/details/105092726

0. Summary

1. Research Objective

To propose a public benchmark suite that includes four different clinical prediction tasks: in-hospital mortality, physiologic decompensation, length of stay (LOS), and phenotype classification. This set of benchmarks makes it possible to formulate a heterogeneous multitask learning problem in which all four prediction tasks are learned jointly.

2. Background and Problems

Background

Large volumes of digital clinical data are now available.

Problems

There are no widely accepted benchmarks for evaluating competing models. Practical progress in clinical machine learning has been difficult to measure because of variability in data sets and task definitions.
Most researchers develop new methods for one clinical prediction task at a time.

The heterogeneous nature of the multitask setup requires a modeling solution that can not only handle sequence data but also model correlations between tasks distributed in time.

3. Method

Benchmark tasks

Some terminology
In MIMIC-III: subjects, admissions, ICU stays/episodes, events
In this paper: sample
For tasks like phenotyping, a sample consists of an entire ICU stay. For tasks requiring hourly predictions, e.g., LOS, a sample includes all events that occur before a specific time, so a single ICU stay yields multiple samples.
Benchmark preparation workflow

The paper describes its data processing pipeline in detail and provides source code, so it can serve as a reference later.

Introduction to different benchmark tasks

In-hospital mortality
Risk of mortality is most often formulated as binary classification using observations recorded from a limited window of time following admission. The target label indicates whether the patient died before hospital discharge.
Scoring systems: SAPS, APS-III, OASIS, SAPS-II

Physiologic Decompensation
This paper defines a binary label that indicates whether the patient's date of death falls within the next 24 hours of the current time point. These labels are assigned to each hour, starting at four hours after admission to the ICU (to avoid samples that are too short) and ending when the patient dies or is discharged.
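
The labeling rule above can be sketched in a few lines. This is only an illustration of the rule as summarized in these notes, not the benchmark's actual code: the function name, its parameters, and the choice to include the final hour of the stay are my own assumptions.

```python
from typing import List, Optional, Tuple


def decompensation_labels(
    stay_hours: float,
    death_hour: Optional[float],
    start_hour: int = 4,
    horizon_hours: float = 24.0,
) -> List[Tuple[int, int]]:
    """Return (prediction_hour, label) pairs for one ICU stay.

    stay_hours: hours from ICU admission to discharge or death.
    death_hour: hours from ICU admission to death, or None for survivors.
    ASSUMPTION: the last hour of the stay is still a prediction point.
    """
    samples = []
    hour = start_hour
    while hour <= stay_hours:
        # Label 1 when the time of death lies within the next `horizon_hours`.
        died_soon = (
            death_hour is not None
            and 0 <= death_hour - hour <= horizon_hours
        )
        samples.append((hour, int(died_soon)))
        hour += 1
    return samples


# A patient who dies 30 hours after admission: hours 4 and 5 are negative,
# every later hour is within 24 hours of death.
labels = decompensation_labels(stay_hours=30, death_hour=30)
print(labels[:3])  # [(4, 0), (5, 0), (6, 1)]
```

A stay shorter than four hours produces no samples at all, which matches the stated motivation of avoiding samples that are too short.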

Forecasting length of stay
① In this benchmark the researchers predict the remaining LOS once per hour for every hour after admission, similar to decompensation.
② They divide the range of values into ten buckets: one bucket for extremely short visits (less than one day), seven day-long buckets for each day of the first week, and two "outlier" buckets, one for stays of over one week but less than two, and one for stays of over two weeks. This converts length-of-stay prediction into an ordinal multiclass classification problem.
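
The bucketing can be written as a small lookup function. The notes do not give exact boundaries, so the cut points below (day-long buckets from day 1 up to day 8, then 8 to 14 days, then 14 days and more) are an assumption chosen so that the count comes out to ten; `los_bucket` is a hypothetical name.

```python
def los_bucket(remaining_hours: float) -> int:
    """Map remaining length of stay (in hours) to one of ten ordinal buckets.

    ASSUMED boundaries, in days:
      bucket 0     : less than 1 day
      buckets 1..7 : [1, 2), [2, 3), ..., [7, 8)
      bucket 8     : [8, 14)
      bucket 9     : 14 days or more
    """
    days = remaining_hours / 24.0
    if days < 1:
        return 0
    if days < 8:
        return int(days)  # truncation maps [k, k+1) days to bucket k
    if days < 14:
        return 8
    return 9


for hours in (5, 24, 7.5 * 24, 8 * 24, 14 * 24):
    print(hours, los_bucket(hours))
```

Because the buckets are ordered, a prediction that is off by one bucket is a smaller error than one that is off by five, which is why an ordinal metric is used for this task.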

Acute care phenotype classification
The final benchmark task is phenotyping, i.e., classifying which acute care conditions are present in a given patient record. In this task they classify 25 conditions that are common in adult ICUs and formulate phenotyping as a multi-label classification problem.

Baselines

Logistic regression
For each variable, we compute six different sample statistic features on seven different subsequences of a given time series.

In total, 17×7×6=714 features per time series are obtained.
We train a separate logistic regression classifier for each of mortality, decompensation, and the 25 phenotypes. For LOS, we train a softmax regression model to solve the 10-class bucketed LOS problem.
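
To make the 17×7×6 count concrete, here is a rough sketch of such a feature extractor. The notes only say "six statistics" and "seven subsequences", so the specific statistics (min, max, mean, standard deviation, skew, count) and the specific windows (full series, first and last 10%, 25%, 50%) are assumptions for illustration, as are all names in the code.

```python
import statistics
from typing import Dict, List, Tuple

# ASSUMPTION: seven subsequences as (start, end) fractions of the window.
SUBSEQUENCES = [
    (0.0, 1.0), (0.0, 0.1), (0.0, 0.25), (0.0, 0.5),
    (0.5, 1.0), (0.75, 1.0), (0.9, 1.0),
]


def six_stats(values: List[float]) -> List[float]:
    """ASSUMPTION: min, max, mean, std, skew and count as the six statistics."""
    n = len(values)
    if n == 0:
        return [0.0] * 6  # placeholder for a window with no measurements
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    skew = 0.0 if std == 0 else sum((v - mean) ** 3 for v in values) / (n * std ** 3)
    return [min(values), max(values), mean, std, skew, float(n)]


def featurize(series: Dict[str, List[Tuple[float, float]]],
              window_hours: float) -> List[float]:
    """Flatten per-variable (hour, value) events into one feature vector."""
    features: List[float] = []
    for name in sorted(series):  # fixed variable order
        for lo, hi in SUBSEQUENCES:
            window = [v for t, v in series[name]
                      if lo * window_hours <= t <= hi * window_hours]
            features.extend(six_stats(window))
    return features


# 17 hypothetical variables, each with three (hour, value) measurements.
demo = {f"var_{i:02d}": [(0.0, 1.0), (24.0, 2.0), (47.0, 3.0)] for i in range(17)}
print(len(featurize(demo, window_hours=48.0)))  # 17 * 7 * 6 = 714
```

The resulting fixed-length vector is what makes a plain logistic or softmax regression applicable to irregular time series.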

LSTM-based models

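The LSTM referred to in this part is the so-called standard LSTM.
For LSTM-based models we re-sample the time series into regularly spaced intervals.
This part explains in detail how the data are turned into feature vectors that can be fed to an LSTM, which is worth coming back to later.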
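
As a reminder of what re-sampling into regular intervals can look like, here is a minimal sketch. The details are my assumptions rather than something stated in these notes: the last measurement in an interval wins, gaps are filled with the previous value (or a pre-specified "normal" value if there is none yet), and a binary mask marks which steps hold a real measurement. The function name and parameters are hypothetical.

```python
from typing import List, Optional, Tuple


def resample_regular(
    events: List[Tuple[float, float]],
    n_steps: int,
    normal_value: float,
    step_hours: float = 1.0,
) -> Tuple[List[float], List[int]]:
    """Turn irregular (hour, value) events into a fixed grid plus a mask."""
    buckets: List[Optional[float]] = [None] * n_steps
    for t, v in sorted(events):
        idx = int(t // step_hours)
        if 0 <= idx < n_steps:
            buckets[idx] = v  # later events overwrite earlier ones

    values: List[float] = []
    mask: List[int] = []
    last_seen: Optional[float] = None
    for b in buckets:
        if b is not None:
            last_seen = b
            values.append(b)
            mask.append(1)  # real measurement
        else:
            # Forward-fill, falling back to the assumed "normal" value.
            values.append(last_seen if last_seen is not None else normal_value)
            mask.append(0)  # imputed
    return values, mask


heart_rate = [(0.2, 80.0), (0.7, 82.0), (3.1, 90.0)]
print(resample_regular(heart_rate, n_steps=5, normal_value=75.0))
# ([82.0, 82.0, 82.0, 90.0, 90.0], [1, 0, 0, 1, 0])
```

The mask is what lets a model distinguish an observed value from an imputed one.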

Channel-wise LSTM

While the standard LSTM network works directly on the concatenation $\{x_t\}_{t \geq 1}^{T}$ of the time series, the channel-wise LSTM pre-processes the data $(\{\mu_t^{(i)}\}_{t \geq 1}^{T}, \{c_t^{(i)}\}_{t \geq 1}^{T})$ of different variables independently using a bidirectional LSTM layer.
They use different LSTM layers for different variables. The outputs of these LSTM layers are then concatenated and fed to another LSTM layer.
The intuition behind the channel-wise module:
① First, it helps to pre-process the data of a single variable before mixing it with the data of other variables.
② Second, the channel-wise module facilitates incorporation of missing-data information by explicitly showing which mask variables relate to which variables.
Note that this channel-wise module can be used as a replacement for the input layer in any neural architecture that takes the concatenation of time series of different variables as its input.

Deep supervision
For the in-hospital mortality and phenotype prediction tasks they replicate the target at all time steps and, by changing the loss function, require the model to predict the replicated target as well.
This gives a new form of the total loss function (see the original paper).
For the decompensation and length of stay prediction tasks, they create multiple prediction instances from a single ICU stay, group these samples, and predict them in a single pass.
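
One plausible shape for a target-replication loss is a weighted sum of the final-step loss and the average loss over all steps. The sketch below is my own illustration for a single binary target; the weighting scheme, the `alpha` parameter, and the function names are assumptions, and the exact formula should be taken from the original paper.

```python
import math
from typing import List


def binary_cross_entropy(y: int, p: float, eps: float = 1e-7) -> float:
    """Cross-entropy of one binary label y against predicted probability p."""
    p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def deep_supervision_loss(y: int, step_probs: List[float], alpha: float = 0.5) -> float:
    """ASSUMED form: (1 - alpha) * final-step loss + alpha * mean per-step loss.

    y: the single target of the stay, replicated at every time step.
    step_probs: the model's predicted probability at each time step.
    """
    final_loss = binary_cross_entropy(y, step_probs[-1])
    replicated_loss = sum(binary_cross_entropy(y, p) for p in step_probs) / len(step_probs)
    return (1.0 - alpha) * final_loss + alpha * replicated_loss


# Same final prediction, but the second model is already right early on.
late = deep_supervision_loss(1, [0.1, 0.2, 0.9])
early = deep_supervision_loss(1, [0.8, 0.9, 0.9])
print(early < late)  # True
```

With `alpha = 0` this reduces to ordinary supervision on the last step only, so `alpha` controls how strongly intermediate predictions are supervised.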
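
Multitask learning LSTM
This part mainly describes how all the tasks are combined into a single multitask model. It is fairly heavy on mathematical notation, so remember to go back to the original paper.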

Experiments, Model selection and Evaluation

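This part mainly covers hyperparameter selection, which mostly relies on grid search and similar methods.
The evaluation procedure used in the paper is unusual and I did not fully understand it. My impression is that the models are scored over many repeated rounds and confidence intervals are then derived from those scores.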

4. Evaluation

In-hospital mortality
AUC-ROC and AUC-PR
Physiologic Decompensation
AUC-ROC and AUC-PR
Note: Because we care about per-instance (vs. per-patient) accuracy in this task, overall performance is computed as the micro-average over all predictions, regardless of patient.
Forecasting length of stay
① A standard regression metric: mean absolute difference (MAD)
② Cohen's linear weighted kappa, which measures correlation between ordered items.
Acute care phenotype classification
Macro- and micro-averaged AUC-ROC, with the macro-averaged score being the main score
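
For reference, the less common metrics in this list can be computed with a few lines of plain Python. This is a sketch using the textbook definitions, not the benchmark's evaluation code; all function names are my own, and the pairwise AUC below is quadratic in the number of samples, so it is only suitable for small examples.

```python
from typing import List, Sequence, Tuple


def mean_absolute_difference(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def linear_weighted_kappa(y_true: Sequence[int], y_pred: Sequence[int], n_classes: int) -> float:
    """Cohen's kappa with disagreement weights |i - j| / (n_classes - 1)."""
    n = len(y_true)
    observed = [[0.0] * n_classes for _ in range(n_classes)]
    for a, b in zip(y_true, y_pred):
        observed[a][b] += 1.0 / n
    row = [sum(observed[i]) for i in range(n_classes)]
    col = [sum(observed[i][j] for i in range(n_classes)) for j in range(n_classes)]
    num = den = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            w = abs(i - j) / (n_classes - 1)
            num += w * observed[i][j]   # observed weighted disagreement
            den += w * row[i] * col[j]  # disagreement expected by chance
    return 1.0 - num / den


def auc_roc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive is scored above a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def macro_micro_auc(y: List[List[int]], s: List[List[float]]) -> Tuple[float, float]:
    """Macro: average of per-label AUCs. Micro: one AUC over all pooled predictions."""
    n_labels = len(y[0])
    per_label = [auc_roc([r[k] for r in y], [r[k] for r in s]) for k in range(n_labels)]
    macro = sum(per_label) / n_labels
    micro = auc_roc([v for r in y for v in r], [v for r in s for v in r])
    return macro, micro


# Each label is ranked perfectly on its own, but the scores of the two labels
# are on different scales, so pooling them hurts the micro-average.
print(macro_micro_auc([[1, 0], [0, 1]], [[0.9, 0.1], [0.8, 0.2]]))  # (1.0, 0.75)
```

The small example shows why the two averages can disagree: macro-averaging treats every phenotype separately, while micro-averaging compares scores across phenotypes.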

5. Conclusion

6. Notes

This paper demonstrates that carefully designed recurrent neural networks are able to exploit the correlations between tasks to improve performance on several of them.

References

Bates, D. W., Saria, S., Ohno-Machado, L., Shah, A. & Escobar, G. Big data in health care: using analytics to identify and manage high-risk and high-cost patients. Health Affairs 33, 1123–1131 (2014).

This article discusses big clinical data.

Harutyunyan, H. et al. MIMIC-III benchmark repository. Zenodo, https://doi.org/10.5281/zenodo.1306527 (2018).

This is another piece of work by the authors of this paper, and it contains the code.

Rajkomar, A. et al. Scalable and accurate deep learning with electronic health records. NPJ Digital Med 1, 18 (2018).

The authors say this article studies deep learning networks with different architectures.
Recently, it was shown that novel neural architectures (including ones based on LSTM) perform well for predicting inpatient mortality, 30-day unplanned readmission, long length-of-stay (binary classification) and diagnoses on general EHR data (not limited to ICU).