Paper notes: ECG classifier (Inter- and intra- patient ECG heartbeat classification)

sinat_18131557

Article link: https://blog.csdn.net/sinat_18131557/article/details/103106949


Date	Version	Comments
2019/11/1	V0.1	Init
2019/12/7	V0.2	Added the LSTM section

References:

Paper and citation: Inter- and intra- patient ECG heartbeat classification for arrhythmia detection: a sequence to sequence deep learning approach

@Article{Mousavi2018,
 author        = {Sajad Mousavi and Fatemeh Afghah and U. Rajendra Acharya},
 title         = {Inter- and intra- patient ECG heartbeat classification for arrhythmia detection: a sequence to sequence deep learning approach},
 date          = {2018-12-09},
 eprint        = {http://arxiv.org/abs/1812.07421v2},
 eprintclass   = {q-bio.QM},
 eprinttype    = {arXiv},
 keywords      = {q-bio.QM, eess.SP, physics.med-ph},
}

Source code: https://github.com/SajadMo/ECG-Heartbeat-Classification-seq2seq-model

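Installing WFDB

The author uses the WFDB toolbox installed in MATLAB. Open MATLAB, change to the directory where you want to install it, and run the following code to install it directly¹. The download may be slow, but the file is small, only about 8 MB: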

[old_path]=which('rdsamp');if(~isempty(old_path)) rmpath(old_path(1:end-8)); end
wfdb_url='http://physionet.org/physiotools/matlab/wfdb-app-matlab/wfdb-app-toolbox-0-10-0.zip';
[filestr,status] = urlwrite(wfdb_url,'wfdb-app-toolbox-0-10-0.zip');
unzip('wfdb-app-toolbox-0-10-0.zip');
cd mcode
addpath(pwd);savepath

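The .zip file name at the end of the URL contains the version number, so you can change the file name to download a different version. After installation, run wfdbdemo; if a figure appears, the installation succeeded. Install the latest version, otherwise the data download may fail, presumably because the file locations on the website have changed. The error looks like this: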

java.io.FileNotFoundException: http://physionet.org/physiobank/database/pbi/mitdb

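Downloading the data

Once WFDB is installed, download the data: open ./data preprocessing_Matlab/download_MITBIHDB.m and change the two configuration settings in it, then start the download. It may take some time.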

path_to_exes = 'D:\MATLAB_ECG_TOOL\mcode\nativelibs\windows\bin';
path_to_save_records = 'E:\03personal\DeepLearning\ECG-Heartbeat-seq2seq\data';

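path_to_exes is the WFDB installation directory. Mine is installed in D:\MATLAB_ECG_TOOL, so I wrote it as above; the part of the path after the installation directory is fixed. Note: the installation path must not contain spaces, otherwise the command line will fail to parse it. path_to_save_records is the target directory for the downloaded data; here it is set to the data folder inside the project directory.
Also, if you download the data from the website yourself, the files are in binary format and cannot be handled by the author's processing scripts, so you would have to preprocess them yourself.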


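Data preprocessing

After downloading the data, run ./data preprocessing_Matlab/seq2seq_mitbih_AAMI.m and ./data preprocessing_Matlab/seq2seq_mitbih_AAMI_DS1DS2.m to preprocess the signals.
On my machine the normalize function was not recognized. According to the paper, the data should be normalized to the range 0~1, so I replaced signal = normalize(signal); with signal = (mapminmax(signal)+1)/2;. Here mapminmax(s) maps the data to [-1,1], and the arithmetic then converts it to [0,1]. In addition, the double quotes in throw("No label! :(") need to be changed to single quotes: throw('No label! :(').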
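As a rough illustration of the rescaling described above, the following Python sketch checks that mapping a signal to [-1, 1] and then applying (x + 1) / 2 gives the same result as ordinary min-max scaling to [0, 1]. The function names and the sample values are made up for illustration and are not part of the author's MATLAB scripts.

```python
def scale_to_minus1_1(signal):
    """Map values linearly to [-1, 1], as mapminmax does for a single row
    with default settings. Assumes the signal is not constant."""
    lo, hi = min(signal), max(signal)
    return [2.0 * (x - lo) / (hi - lo) - 1.0 for x in signal]


def scale_to_0_1(signal):
    """Plain min-max scaling to [0, 1]. Assumes the signal is not constant."""
    lo, hi = min(signal), max(signal)
    return [(x - lo) / (hi - lo) for x in signal]


signal = [-0.35, 0.10, 1.20, 0.05, -0.20]  # made-up ECG samples

# Route used in the notes: mapminmax, then shift and halve.
via_mapminmax = [(x + 1.0) / 2.0 for x in scale_to_minus1_1(signal)]
# Direct min-max scaling.
direct = scale_to_0_1(signal)

print(via_mapminmax)
print(direct)
```

Both routes agree up to floating-point rounding, so either form can be used when porting the preprocessing.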

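Model

The author's model code has a few small issues. In several places, for example lines 255-256 of seq_seq_annot_DS1DS2.py, slice indices must be integers, so the code needs to be changed to: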

    data = _data[:int((len(_data) / max_time)) * max_time, :]
    _labels = _labels[:int((len(_data) / max_time)) * max_time]

The first line of the build_network function needs the same fix:

    _inputs = tf.reshape(inputs, [-1, n_channels, int(input_depth / n_channels)])

Lines 325-326:

    X_train = X_train[:int((X_train.shape[0] / max_time)) * max_time, :]
    y_train = y_train[:int((X_train.shape[0] / max_time)) * max_time]

Similar places in seq_seq_annot_aami.py need the same change. With these fixes the code runs. I trained it for a short while and the results were decent.
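The reason for these fixes can be reproduced in a few lines. The sketch below uses a made-up list of 47 items in place of the heartbeat array: in Python 3, "/" always returns a float, which cannot be used as a slice index, while "//" gives an integer and makes the int() cast unnecessary.

```python
max_time = 10
data = list(range(47))  # stand-in for 47 heartbeats

# "/" yields a float (47.0 here), and slicing with a float raises TypeError.
try:
    data[:(len(data) / max_time) * max_time]
    float_index_ok = True
except TypeError:
    float_index_ok = False

# "//" is integer division, so the result can be used as an index directly.
usable = (len(data) // max_time) * max_time
trimmed = data[:usable]

print(float_index_ok, usable, len(trimmed))
```

Trimming to a multiple of max_time is what lets the data be reshaped into whole sequences of max_time beats later on.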


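Class imbalance

Although the MIT-BIH database contains many classes, most heartbeats are normal (N), and beats labeled S and F are comparatively rare, so training directly on this data is bound to cause problems. The imbalance is handled with the Synthetic Minority Over-sampling Technique (SMOTE); see the paper². The paper is quite old (2002), and in practice the processing is not complicated: install the imblearn library with pip and it can apply SMOTE to the data. nums[1] is the class that is not oversampled, and n_oversampling is the target count that the minority classes are raised to, set to 10000 in the code. Before processing: F 802, N 90502, S 2777, V 7219; after processing: F 10000, N 72438, S 10000, V 9992. The data is much more balanced this way.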

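from imblearn.over_sampling import SMOTE
......
    # Target number of samples per class; class 1 keeps its original count.
    # Note: newer imbalanced-learn releases rename `ratio` to `sampling_strategy`
    # and `fit_sample` to `fit_resample`.
    ratio = {0: n_oversampling, 1: nums[1], 2: n_oversampling, 3: n_oversampling}
    sm = SMOTE(random_state=12, ratio=ratio)
    X_train, y_train = sm.fit_sample(X_train, y_train)
......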
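To make the idea behind SMOTE concrete, here is a minimal pure-Python sketch of its core step: a synthetic sample is placed at a random position on the line segment between a minority sample and one of its nearest minority neighbours. This is a simplified illustration, not the imblearn implementation; the function name smote_sketch, the parameter k, and the toy data are all assumptions.

```python
import random


def smote_sketch(minority, n_new, k=3, seed=12):
    """Create n_new synthetic samples by interpolating between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Sort the other minority samples by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Toy minority class: the four corners of the unit square.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_samples = smote_sketch(minority, n_new=6)
print(new_samples)
```

Because every synthetic point lies between two real minority samples, the new data stays inside the region the minority class already occupies instead of simply duplicating existing beats.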

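Inter-patient and intra-patient

Discussions of ECG classification usually distinguish between inter-patient and intra-patient evaluation.

Inter-patient: the training set and the test set come from different patients, so the model is tested on patients it has never seen;
Intra-patient: all heartbeats are pooled and split at random into a training set and a test set, so beats from the same patient can appear in both.

In practice, inter-patient evaluation is more meaningful, but most papers (especially Chinese-language ones) still use the intra-patient scheme: it yields higher accuracy and makes the results look better, but it does not reflect real use. For the MIT-BIH ECG signals, there is an inter-patient split recommended by AAMI³.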

DS1 = {101, 106, 108, 109, 112, 114, 115, 116, 118, 119, 122, 124, 201, 203, 205, 207, 208, 209, 215, 220, 223, 230}
DS2 = {100, 103, 105, 111, 113, 117, 121, 123, 200, 202, 210, 212, 213, 214, 219, 221, 222, 228, 231, 232, 233, 234}

DS1 is used to train the model and DS2 is used to test it.
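A minimal sketch of how such a record-level split could be applied, assuming each beat is stored as a (record_id, label) pair; that data layout and the function name inter_patient_split are hypothetical and are not taken from the author's code.

```python
DS1 = {101, 106, 108, 109, 112, 114, 115, 116, 118, 119, 122, 124,
       201, 203, 205, 207, 208, 209, 215, 220, 223, 230}
DS2 = {100, 103, 105, 111, 113, 117, 121, 123, 200, 202, 210, 212,
       213, 214, 219, 221, 222, 228, 231, 232, 233, 234}


def inter_patient_split(beats):
    """Split (record_id, label) pairs by record, so that no record
    contributes beats to both the training and the test set."""
    train = [b for b in beats if b[0] in DS1]
    test = [b for b in beats if b[0] in DS2]
    return train, test


# Made-up beats for illustration only.
beats = [(101, "N"), (101, "S"), (100, "N"), (233, "V"), (208, "F")]
train, test = inter_patient_split(beats)
print(train)
print(test)
```

An intra-patient split would instead shuffle the list of beats and cut it at some ratio, which is why beats from one record can end up on both sides.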

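Preprocessing and model

The samples carry many different annotation labels, so similar classes are grouped together. The data is then cut into individual heartbeat segments according to the annotations, and each segment is resized to a length of 280 points so that it can be fed to the model. Overall, the preprocessing does not change the characteristics of the signal, and no extra steps such as filtering are introduced.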
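The resizing step can be sketched as follows. The notes do not say which resampling method the author's scripts use, so linear interpolation is assumed here purely for illustration; the function name resample_linear and the sample segment are hypothetical.

```python
def resample_linear(segment, target_len=280):
    """Resample a 1-D list of samples to target_len points
    using linear interpolation (an assumed method)."""
    n = len(segment)
    if n == 1:
        return [float(segment[0])] * target_len
    out = []
    for i in range(target_len):
        pos = i * (n - 1) / (target_len - 1)  # fractional index into segment
        left = int(pos)
        right = min(left + 1, n - 1)
        frac = pos - left
        out.append(segment[left] * (1 - frac) + segment[right] * frac)
    return out


beat = [0.0, 0.2, 1.0, 0.3, 0.1]  # made-up short heartbeat segment
fixed = resample_linear(beat)
print(len(fixed), fixed[0], fixed[-1])
```

Every beat ends up with the same length regardless of the original interval between annotations, which is what the fixed-size network input requires.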

The structure of the model is shown in the figure below:
[Figure: model structure]

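The CNN part has the three-layer structure shown below. The 280x1 input is reshaped to 10x28. I do not fully understand this step, because it seems that the salient feature points of the ECG would be lost.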

layer	size	activation
CNN1	2x1,32	ReLU
MAXPOOL1	2x1,stride 1	-
CNN2	2x1,64	ReLU
MAXPOOL2	2x1,stride 1	-
CNN3	2x1,128	ReLU
MAXPOOL3	2x1,stride 1	-

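The sequence length is the number of beats fed to the model at once (max_time=10 in the code), and the CNN output is the input to the encoder. The encoder is built from LSTM units arranged as a bidirectional recurrent neural network (BiRNN) rather than a plain unidirectional LSTM, so information propagates both forward and backward. The decoder generates the predicted targets. Because the decoder satisfies $y^{<i>}=x^{<i+1>},i>0$ , there is no input for $x^{<0>}$ ; so an extra <GO> label is inserted before the first label of each training sequence to serve as $x^{<0>}$ .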
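The role of the <GO> label can be illustrated with a small sketch. The mapping char2numY below and the helper make_decoder_io are hypothetical and only show the shifting idea: the decoder input at each step is the target of the previous step, with <GO> filling the first position.

```python
# Hypothetical label-to-index mapping; the real one is built in the author's code.
char2numY = {"<GO>": 0, "N": 1, "S": 2, "V": 3, "F": 4}


def make_decoder_io(label_seq):
    """Return (decoder inputs, targets) for one sequence of beat labels."""
    targets = [char2numY[c] for c in label_seq]
    # Shift right by one step and put <GO> in the first position.
    dec_inputs = [char2numY["<GO>"]] + targets[:-1]
    return dec_inputs, targets


labels = ["N", "N", "V", "N", "S", "N", "N", "F", "N", "N"]  # 10 beats, made up
dec_inputs, targets = make_decoder_io(labels)
print(dec_inputs)
print(targets)
```

During training the decoder thus sees the true previous label at every step, while the very first step sees only <GO>.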

Python code for building the model:

_input = tf.reshape(inputs, [-1, n_channels, input_depth // n_channels])  # reshape each input (280x1 -> 10x28); -1 keeps the batch dimension
conv1 = tf.layers.conv1d(inputs=_input, filters=32, kernel_size=2, strides=1, padding='same', activation=tf.nn.relu)  # CNN1
max_pool_1 = tf.layers.max_pooling1d(inputs=conv1, pool_size=2, strides=2, padding='same')  # POOL1
conv2 = tf.layers.conv1d(inputs=max_pool_1, filters=64, kernel_size=2, strides=1, padding='same', activation=tf.nn.relu)  # CNN2
max_pool_2 = tf.layers.max_pooling1d(inputs=conv2, pool_size=2, strides=2, padding='same')  # POOL2
conv3 = tf.layers.conv1d(inputs=max_pool_2, filters=128, kernel_size=2, strides=1, padding='same', activation=tf.nn.relu)  # CNN3

shape = conv3.get_shape().as_list()  # output shape of conv3
data_input_embed = tf.reshape(conv3, (-1, max_time, shape[1] * shape[2]))  # flatten the conv3 output into one vector per beat
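To see what size of vector each beat is turned into, the shapes can be traced by hand. The sketch below follows the code listing above (pooling with strides=2, although the table lists stride 1) and uses the rule that a 1-D layer with padding='same' outputs ceil(length / stride) positions. The helper same_out_len is a hypothetical name.

```python
import math


def same_out_len(length, stride):
    """Output length of a 1-D conv or pooling layer with padding='same'."""
    return math.ceil(length / stride)


n_channels, input_depth = 10, 280
# After the reshape each beat is 10 positions with 28 values per position.
length, channels = n_channels, input_depth // n_channels

trace = [("input", length, channels)]
layers = [
    ("conv1", 1, 32), ("max_pool_1", 2, None),
    ("conv2", 1, 64), ("max_pool_2", 2, None),
    ("conv3", 1, 128),
]
for name, stride, filters in layers:
    length = same_out_len(length, stride)
    if filters is not None:  # pooling keeps the number of channels
        channels = filters
    trace.append((name, length, channels))

flat_features = length * channels  # shape[1] * shape[2] in the listing

for row in trace:
    print(row)
print(flat_features)
```

Under these assumptions each beat is represented by a 384-dimensional vector, and max_time such vectors form one encoder input sequence.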

Model details:
The paper specifies 300 epochs, RMSProp for updating the weights, batch_size=20, and learning_rate=0.001. In my own training, 100 epochs with batch_size=512 also worked quite well.

Evaluation metrics

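Evaluation metrics commonly used for biomedical signals are sensitivity (SEN), positive predictive value (PPV), specificity (SPEC), and accuracy (Acc). I had never fully understood them before:
$$SEN=\frac{TP}{TP+FN}\tag{sensitivity}$$
$$PPV=\frac{TP}{TP+FP}\tag{positive predictive value}$$
$$SPEC=\frac{TN}{TN+FP}\tag{specificity}$$
$$Acc=\frac{TP+TN}{TP+TN+FP+FN}\tag{accuracy}$$
Sensitivity (SEN) is the fraction of truly positive samples that are classified correctly. Taking toxic/non-toxic as an example, TP is the number of samples predicted toxic that really are toxic, and FN is the number predicted non-toxic that are wrong (that is, actually toxic). SEN is therefore the accuracy over all samples that are actually toxic.
Positive predictive value (PPV) is the probability that a predicted positive is really positive. Again TP is the number predicted toxic and correct, and FP is the number predicted toxic but wrong. PPV is the accuracy over all samples predicted toxic.
Specificity (SPEC) is the fraction of truly negative samples that are classified correctly. TN is the number predicted non-toxic and correct, and FP is the number predicted toxic but wrong (actually non-toxic). SPEC is therefore the accuracy over all samples that are actually non-toxic.
Accuracy (Acc) is the simple one: the fraction of all samples that are classified correctly.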
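For a multi-class problem such as N/S/V/F, these metrics are usually computed per class by treating one class as positive and all others as negative. The following sketch does this for a made-up set of predictions; the function names and the data are illustrative only.

```python
def one_vs_rest_counts(y_true, y_pred, positive):
    """Count TP, FP, TN, FN with `positive` as the positive class."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive and t != positive:
            fp += 1
        elif p != positive and t != positive:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def binary_metrics(tp, fp, tn, fn):
    """SEN, PPV, SPEC and Acc as defined by the formulas above."""
    return {
        "SEN": tp / (tp + fn),
        "PPV": tp / (tp + fp),
        "SPEC": tn / (tn + fp),
        "Acc": (tp + tn) / (tp + tn + fp + fn),
    }


# Made-up labels for eight beats.
y_true = ["N", "N", "V", "S", "N", "V", "N", "N"]
y_pred = ["N", "V", "V", "N", "N", "V", "N", "N"]

tp, fp, tn, fn = one_vs_rest_counts(y_true, y_pred, positive="V")
m = binary_metrics(tp, fp, tn, fn)
print(tp, fp, tn, fn)
print(m)
```

In this toy case every true V beat is found (SEN = 1), but one N beat is also called V, so PPV drops to 2/3 while SPEC and Acc stay high; this is why SEN and PPV are reported alongside accuracy on imbalanced ECG data.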

LSTM

The LSTM part of the code:

    embed_size = 10

    # Embedding layers
    output_embedding = tf.Variable(tf.random_uniform((len(char2numY), embed_size), -1.0, 1.0), name='dec_embedding')
    data_output_embed = tf.nn.embedding_lookup(output_embedding, dec_inputs)

    with tf.variable_scope("encoding") as encoding_scope:
        if not bidirectional:
            # Regular approach with LSTM units
            lstm_enc = tf.contrib.rnn.LSTMCell(num_units)
            _, last_state = tf.nn.dynamic_rnn(lstm_enc, inputs=data_input_embed, dtype=tf.float32)

        else:
            # Using a bidirectional LSTM architecture instead
            enc_fw_cell = tf.contrib.rnn.LSTMCell(num_units)
            enc_bw_cell = tf.contrib.rnn.LSTMCell(num_units)

            ((enc_fw_out, enc_bw_out), (enc_fw_final, enc_bw_final)) = tf.nn.bidirectional_dynamic_rnn(
                cell_fw=enc_fw_cell,
                cell_bw=enc_bw_cell,
                inputs=data_input_embed,
                dtype=tf.float32)
            enc_fin_c = tf.concat((enc_fw_final.c, enc_bw_final.c), 1)
            enc_fin_h = tf.concat((enc_fw_final.h, enc_bw_final.h), 1)
            last_state = tf.contrib.rnn.LSTMStateTuple(c=enc_fin_c, h=enc_fin_h)

    with tf.variable_scope("decoding") as decoding_scope:
        if not bidirectional:
            lstm_dec = tf.contrib.rnn.LSTMCell(num_units)
        else:
            lstm_dec = tf.contrib.rnn.LSTMCell(2 * num_units)

        dec_outputs, _ = tf.nn.dynamic_rnn(lstm_dec, inputs=data_output_embed, initial_state=last_state)

    logits = tf.layers.dense(dec_outputs, units=len(char2numY), use_bias=True)

¹ https://blog.csdn.net/qq_37663564/article/details/80602059
² https://arxiv.org/pdf/1106.1813.pdf
³ ANSI-AAMI, “Testing and reporting performance results of cardiac rhythm and st segment measurement algorithms,” American National Standards Institute, Inc. (ANSI), Association for the Advancement of Medical Instrumentation (AAMI), ANSI/AAMI/ISO, 1998-2008.
