[Paper] Cardiologist-Level Arrhythmia Detection with Convolutional Neural Networks

Abstract

We develop an algorithm which exceeds the performance of board-certified cardiologists in detecting a wide range of heart arrhythmias from electrocardiograms recorded with a single-lead wearable monitor. We build a dataset with more than 500 times as many unique patients as previously studied corpora. On this dataset, we train a 34-layer convolutional neural network which maps a sequence of ECG samples to a sequence of rhythm classes. Committees of board-certified cardiologists annotate a gold standard test set on which we compare the performance of our model to that of 6 other individual cardiologists. We exceed the average cardiologist performance in both recall (sensitivity) and precision (positive predictive value).

1. Introduction

We develop a model which can diagnose irregular heart rhythms, also known as arrhythmias, from single-lead ECG signals better than a cardiologist. Key to exceeding expert performance is a deep convolutional network which can map a sequence of ECG samples to a sequence of arrhythmia annotations along with a novel dataset two orders of magnitude larger than previous datasets of its kind.
Many heart diseases, including Myocardial Infarction, AV Block, Ventricular Tachycardia and Atrial Fibrillation, can be diagnosed from ECG signals, with an estimated 300 million ECGs recorded annually (Heden et al., 1996). We investigate the task of arrhythmia detection from the ECG record. This is known to be a challenging task for computers but can usually be determined by an expert from a single, well-placed lead.

Figure 1. Our trained convolutional neural network correctly detecting the sinus rhythm (SINUS) and Atrial Fibrillation (AFIB) from this ECG recorded with a single-lead wearable heart monitor.

Arrhythmia detection from ECG recordings is usually performed by expert technicians and cardiologists given the high error rates of computerized interpretation. One study found that of all the computer predictions for non-sinus rhythms, only about 50% were correct (Shah & Rubin, 2007); in another study, only 1 out of every 7 presentations of second degree AV block was correctly recognized by the algorithm (Guglin & Thatai, 2006). To automatically detect heart arrhythmias in an ECG, an algorithm must implicitly recognize the distinct wave types and discern the complex relationships between them over time. This is difficult due to the variability in wave morphology between patients as well as the presence of noise.

We train a 34-layer convolutional neural network (CNN) to detect arrhythmias in arbitrary length ECG time-series. Figure 1 shows an example of an input to the model. In addition to classifying noise and the sinus rhythm, the network learns to classify and segment twelve arrhythmia types present in the time-series. The model is trained end-to-end on a single-lead ECG signal sampled at 200 Hz and a sequence of annotations for every second of the ECG as supervision. To make the optimization of such a deep model tractable, we use residual connections and batch normalization (He et al., 2016b; Ioffe & Szegedy, 2015). The depth increases both the non-linearity of the computation and the size of the context window for each classification decision.

We construct a dataset 500 times larger than other datasets of its kind (Moody & Mark, 2001; Goldberger et al., 2000). One of the most popular previous datasets, the MIT-BIH corpus, contains ECG recordings from 47 unique patients. In contrast, we collect and annotate a dataset of about 30,000 unique patients from a pool of nearly 300,000 patients who have used the Zio Patch monitor (Turakhia et al., 2013). We intentionally select patients exhibiting abnormal rhythms in order to make the class balance of the dataset more even and thus the likelihood of observing unusual heart-activity high.

We test our model against board-certified cardiologists. A committee of three cardiologists serves as gold-standard annotators for the 336 examples in the test set. Our model exceeds the individual expert performance on both recall (sensitivity) and precision (positive predictive value) on this test set.

2. Model

Problem Formulation

The ECG arrhythmia detection task is a sequence-to-sequence task which takes as input an ECG signal $X = [x_1, \ldots, x_k]$, and outputs a sequence of labels $r = [r_1, \ldots, r_n]$, such that each $r_i$ can take on one of $m$ different rhythm classes. Each output label corresponds to a segment of the input. Together the output labels cover the full sequence. For a single example in the training set, we optimize the cross-entropy objective function
$\frac{1}{n}\sum_{i=1}^{n}\log p(R = r_i \mid X)$
where $p(\cdot)$ is the probability the network assigns to the i-th output taking on the value $r_i$. Maximizing this average log-likelihood is equivalent to minimizing the cross-entropy.
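
As a rough illustration of the objective above, the following sketch evaluates the average log-probability for one record. The function name `sequence_log_likelihood` and the toy probabilities are hypothetical and are not taken from the paper; the sketch only assumes that the network emits one softmax distribution per output step.

```python
import math

def sequence_log_likelihood(probs, labels):
    """Average log-probability of the true label over the n output steps.

    probs:  list of n rows, each a softmax distribution over the m classes.
    labels: list of n integer class indices (the r_i in the text).
    """
    if len(probs) != len(labels):
        raise ValueError("need one distribution per output label")
    # log p(R = r_i | X) for each output step i, then the mean over i
    total = sum(math.log(row[r]) for row, r in zip(probs, labels))
    return total / len(labels)

# Toy example (made-up values): n = 3 output steps, m = 2 rhythm classes.
toy_probs = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
toy_labels = [0, 1, 0]
objective = sequence_log_likelihood(toy_probs, toy_labels)
```

The value is always non-positive and approaches 0 as the network puts more probability on the annotated classes; a training loop would typically minimize its negative.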

Model Architecture and Training

We use a convolutional neural network for the sequence-to-sequence learning task. The high-level architecture of the network is shown in Figure 2. The network takes as input a time-series of raw ECG signal, and outputs a sequence of label predictions. The 30 second long ECG signal is sampled at 200 Hz, and the model outputs a new prediction once every second. We arrive at an architecture which is 33 layers of convolution followed by a fully connected layer and a softmax.

Figure 2. The architecture of the network. The first and last layers are special-cased due to the pre-activation residual blocks. Overall, the network contains 33 layers of convolution followed by a fully-connected layer and a softmax.

To make the optimization of such a network tractable, we employ shortcut connections in a similar manner to those found in the Residual Network architecture (He et al., 2015b). The shortcut connections between neural network layers optimize training by allowing information to propagate well in very deep neural networks. Before the input is fed into the network, it is normalized using a robust normalization strategy. The network consists of 16 residual blocks with 2 convolutional layers per block. The convolutional layers all have a filter length of 16 and have $64k$ filters, where $k$ starts out as 1 and is incremented every fourth residual block. Every alternate residual block subsamples its inputs by a factor of 2, thus the original input is ultimately subsampled by a factor of $2^8$. When a residual block subsamples the input, the corresponding shortcut connections also subsample their input using a Max Pooling operation with the same subsample factor.
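
The filter and subsampling arithmetic in this paragraph can be checked with a small sketch. The text does not say which of the alternating blocks subsample, so the choice of odd-indexed blocks below is an assumption, and all names (`block_schedule`, `BASE_FILTERS`, and so on) are hypothetical.

```python
NUM_BLOCKS = 16        # residual blocks, 2 convolutional layers each
BASE_FILTERS = 64      # filters per layer are 64 * k
SAMPLE_RATE_HZ = 200   # ECG sampling rate stated in the text

def block_schedule(num_blocks=NUM_BLOCKS):
    """Return (block index, filters, subsample factor) for every block."""
    schedule = []
    for b in range(num_blocks):
        k = 1 + b // 4                      # k starts at 1, +1 every fourth block
        subsample = 2 if b % 2 == 1 else 1  # assumption: odd-indexed blocks subsample
        schedule.append((b, BASE_FILTERS * k, subsample))
    return schedule

schedule = block_schedule()

# Multiply the per-block factors to get the overall subsampling of the input.
total_subsample = 1
for _, _, factor in schedule:
    total_subsample *= factor

# Duration of input signal covered by one output prediction.
seconds_per_output = total_subsample / SAMPLE_RATE_HZ
```

Under these assumptions the eight subsampling blocks give an overall factor of 2^8 = 256, so each prediction covers 256 samples, or 1.28 seconds at 200 Hz. That agrees with the evaluation section's description of a prediction "approximately once per second (every 256 samples)".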

Figure 3. Evaluated on the test set, the model outperforms the average cardiologist score on both the Sequence and the Set F1 metrics.

Before each convolutional layer we apply Batch Normalization (Ioffe & Szegedy, 2015) and a rectified linear activation, adopting the pre-activation block design (He et al., 2016a). The first and last layers of the network are special-cased due to this pre-activation block structure. We also apply Dropout (Srivastava et al., 2014) between the convolutional layers and after the non-linearity. The final fully connected layer and softmax activation produce a distribution over the 14 output classes for each time-step.

We train the networks from scratch, initializing the weights of the convolutional layers as in (He et al., 2015a). We use the Adam (Kingma & Ba, 2014) optimizer with the default parameters and reduce the learning rate by a factor of 10 when the validation loss stops improving. We save the best model as evaluated on the validation set during the optimization process.
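
The learning-rate rule and best-model selection described here can be sketched in plain Python. The paper does not state how many epochs without improvement count as "stops improving", so the `patience` value is an assumption, as are the function name and the toy loss values; the initial rate of 1e-3 is the default step size of Adam.

```python
def reduce_on_plateau(val_losses, initial_lr=1e-3, factor=0.1, patience=2):
    """Simulate the learning rate per epoch and track the best epoch.

    val_losses: validation loss observed at the end of each epoch.
    patience:   hypothetical number of non-improving epochs tolerated
                before the learning rate is divided by 10.
    """
    lr = initial_lr
    best_loss = float("inf")
    best_epoch = None
    stale = 0
    lrs = []
    for epoch, loss in enumerate(val_losses):
        lrs.append(lr)               # learning rate used during this epoch
        if loss < best_loss:
            best_loss, best_epoch, stale = loss, epoch, 0  # checkpoint here
        else:
            stale += 1
            if stale >= patience:
                lr *= factor         # reduce by a factor of 10
                stale = 0
    return lrs, best_epoch

# Made-up validation losses: a plateau at epochs 3-4 triggers one reduction.
lrs, best_epoch = reduce_on_plateau([0.90, 0.70, 0.65, 0.66, 0.67, 0.60, 0.61])
```

In this toy run the rate drops from 1e-3 to 1e-4 after epoch 4, and the model from epoch 5 would be the one kept.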

3. Data

Training

We collect and annotate a dataset of 64,121 ECG records from 29,163 patients. The ECG data is sampled at a frequency of 200 Hz and is collected from a single-lead, noninvasive and continuous monitoring device called the Zio Patch which has a wear period up to 14 days (Turakhia et al., 2013). Each ECG record in the training set is 30 seconds long and can contain more than one rhythm type. Each record is annotated by a clinical ECG expert: the expert highlights segments of the signal and marks each as corresponding to one of the 14 rhythm classes.

The 30 second records were annotated using a web-based ECG annotation tool designed for this work. Label annotations were done by a group of Certified Cardiographic Technicians who have completed extensive training in arrhythmia detection and a cardiographic certification examination by Cardiovascular Credentialing International. The technicians were guided through the interface before they could annotate records. All rhythms present in a strip were labeled from their corresponding onset to offset, resulting in full segmentation of the input ECG data. To improve labeling consistency among different annotators, specific rules were devised regarding each rhythm transition.

Testing

We collect a test set of 336 records from 328 unique patients. For the test set, ground truth annotations for each record were obtained by a committee of three board-certified cardiologists; there are three committees responsible for different splits of the test set. The cardiologists discussed each individual record as a group and came to a consensus labeling. For each record in the test set we also collect 6 individual annotations from cardiologists not participating in the group. These are used to assess performance of the model compared to an individual cardiologist.

Rhythm Classes

We identify 12 heart arrhythmias, sinus rhythm and noise for a total of 14 output classes. The arrhythmias are characterized by a variety of features. Table 2 in the Appendix shows an example of each rhythm type we classify. The noise label is assigned when the device is disconnected from the skin or when the baseline noise in the ECG makes identification of the underlying rhythm impossible.

The morphology of the ECG during a single heart-beat as well as the pattern of the activity of the heart over time determine the underlying rhythm. In some cases the distinction between the rhythms can be subtle yet critical for treatment. For example, two forms of second degree AV Block, Mobitz I (Wenckebach) and Mobitz II (here referred to as AVB TYPE2), can be difficult to distinguish. Wenckebach is considered benign and Mobitz II is considered pathological, requiring immediate attention (Dubin, 1996).

Table 1. The top part of the table gives a class-level comparison of the expert to the model F1 score for both the Sequence and the Set metrics. The bottom part of the table shows aggregate results over the full test set for precision, recall and F1 for both the Sequence and Set metrics.

Table 2 in the Appendix also shows the number of unique patients in the training (including validation) set and test set for each rhythm type.

4. Results

Evaluation Metrics

We use two metrics to measure model accuracy, using the cardiologist committee annotations as the ground truth.

Sequence Level Accuracy (F1): We measure the average overlap between the prediction and the ground truth sequence labels. For every record, a model is required to make a prediction approximately once per second (every 256 samples). These predictions are compared against the ground truth annotation.

Set Level Accuracy (F1): Instead of treating the labels for a record as a sequence, we consider the set of unique arrhythmias present in each 30 second record as the ground truth annotation. Set Level Accuracy, unlike Sequence Level Accuracy, does not penalize for time-misalignment within a record. We report the F1 score between the unique class labels from the ground truth and those from the model prediction.

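
A minimal sketch of the set-level idea follows, assuming that a class counts as present in a record if it appears anywhere in that record's label sequence and that counts are pooled over records before computing F1. The function name and the toy records are hypothetical.

```python
def set_level_f1(truth_seqs, pred_seqs, cls):
    """F1 for one class, comparing the SET of labels per record."""
    tp = fp = fn = 0
    for truth, pred in zip(truth_seqs, pred_seqs):
        in_truth = cls in set(truth)   # class present in ground truth record
        in_pred = cls in set(pred)     # class present in predicted record
        if in_truth and in_pred:
            tp += 1
        elif in_pred:
            fp += 1
        elif in_truth:
            fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Record 1: the AFIB onset is predicted one step early (time-misaligned).
# Record 2: a spurious AFIB label appears in an all-sinus record.
truth_seqs = [["SINUS", "SINUS", "AFIB", "AFIB"], ["SINUS"] * 4]
pred_seqs = [["SINUS", "AFIB", "AFIB", "AFIB"], ["SINUS", "SINUS", "AFIB", "SINUS"]]

afib_f1 = set_level_f1(truth_seqs, pred_seqs, "AFIB")
sinus_f1 = set_level_f1(truth_seqs, pred_seqs, "SINUS")
```

The misaligned onset in the first record costs nothing at the set level, while the spurious AFIB in the second record still counts as a false positive.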

In both the Sequence and the Set case, we compute the F1 score for each class separately. We then compute the overall F1 (and precision and recall) as the class-frequency weighted mean.

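
The per-class computation and the class-frequency weighted mean can be illustrated as below for the sequence case. Weighting by each class's frequency in the ground truth is an assumption about what "class-frequency weighted" means here, and the function names and toy labels are hypothetical.

```python
from collections import Counter

def per_class_f1(truth, pred, cls):
    """F1 for one class over aligned per-step labels."""
    tp = sum(1 for t, p in zip(truth, pred) if t == cls and p == cls)
    fp = sum(1 for t, p in zip(truth, pred) if t != cls and p == cls)
    fn = sum(1 for t, p in zip(truth, pred) if t == cls and p != cls)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def weighted_f1(truth, pred):
    """Mean of per-class F1, weighted by ground-truth class frequency."""
    counts = Counter(truth)
    total = sum(counts.values())
    return sum(counts[c] / total * per_class_f1(truth, pred, c) for c in counts)

# Toy labels: 10 output steps, one AFIB onset predicted a step too early.
truth = ["SINUS"] * 6 + ["AFIB"] * 4
pred = ["SINUS"] * 5 + ["AFIB"] * 5
overall = weighted_f1(truth, pred)
```

Here SINUS scores 10/11 and AFIB scores 8/9, and the weights 0.6 and 0.4 combine them into the overall figure.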

Model vs. Cardiologist Performance

We assess the cardiologist performance on the test set. Recall that each of the records in the test set has a ground truth label from a committee of three cardiologists as well as individual labels from a disjoint set of 6 other cardiologists. To assess cardiologist performance for each class, we take the average of all the individual cardiologist F1 scores using the group label as the ground truth annotation.

Table 1 shows the breakdown of both cardiologist and model scores across the different rhythm classes. The model outperforms the average cardiologist performance on most rhythms, noticeably outperforming the cardiologists in the AV Block set of arrhythmias which includes Mobitz I (Wenckebach), Mobitz II (AVB Type2) and complete heart block (CHB). This is especially useful given the severity of Mobitz II and complete heart block and the importance of distinguishing these two from Wenckebach which is usually considered benign.

Table 1 also compares the aggregate precision, recall and F1 of the model and the cardiologists against the ground truth annotations. The aggregate scores for the cardiologist are computed by taking the mean of the individual cardiologist scores. The model outperforms the cardiologist average in both precision and recall.

5. Analysis

The model outperforms the average cardiologist score on both the sequence and the set F1 metrics. Figure 4 shows a confusion matrix of the model predictions on the test set. Many arrhythmias are confused with the sinus rhythm. We expect that part of this is due to the sometimes ambiguous location of the exact onset and offset of the arrhythmia in the ECG record.

Often the mistakes made by the model are understandable. For example, confusing Wenckebach and AVB Type2 makes sense given that the two rhythms in general have very similar ECG morphologies. Similarly, Supraventricular Tachycardia (SVT) and Atrial Fibrillation (AFIB) are often confused with Atrial Flutter (AFL), which is understandable given that they are all atrial arrhythmias. We also note that Idioventricular Rhythm (IVR) is sometimes mistaken for Ventricular Tachycardia (VT), which again makes sense given that the two only differ in heart-rate and are difficult to distinguish close to the 100 beats per minute delineation.

Figure 4. A confusion matrix for the model predictions on the test set. Many of the mistakes the model makes are not surprising. For example, confusing second degree AV Block (Type 2) with Wenckebach makes sense given the often similar expression of the two arrhythmias in the ECG record.

One of the most common confusions is between Ectopic Atrial Rhythm (EAR) and sinus rhythm. The main distinguishing criterion for this rhythm is an irregular P wave. This can be subtle to detect, especially when the P wave has a small amplitude or when noise is present in the signal.

6. Related Work

Automatic high-accuracy methods for R-peak extraction have existed at least since the mid-1980s (Pan & Tompkins, 1985). Current algorithms for R-peak extraction tend to use wavelet transformations to compute features from the raw ECG followed by finely-tuned threshold based classifiers (Li et al., 1995; Martínez et al., 2004). Because accurate estimates of heart rate and heart rate variability can be extracted from R-peak features, feature-engineered algorithms are often used for coarse-grained heart rhythm classification, including detecting tachycardias (fast heart rate), bradycardias (slow heart rate), and irregular rhythms. However, such features alone are not sufficient to distinguish between most heart arrhythmias since features based on the atrial activity of the heart as well as other features pertaining to the QRS morphology are needed.
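
To make the coarse-grained approach concrete, here is a small sketch that derives a mean heart rate from R-peak times and applies rate thresholds. It is not any of the cited algorithms: the R-peak times are assumed to be given already, and the cut-offs of 60 and 100 beats per minute are the conventional adult thresholds, used here as an assumption. All names are hypothetical.

```python
def mean_heart_rate_bpm(r_peak_times_s):
    """Mean heart rate from R-peak times given in seconds."""
    if len(r_peak_times_s) < 2:
        raise ValueError("need at least two R-peaks")
    # R-R intervals between consecutive peaks
    intervals = [b - a for a, b in zip(r_peak_times_s, r_peak_times_s[1:])]
    mean_rr = sum(intervals) / len(intervals)
    return 60.0 / mean_rr

def coarse_rhythm(bpm, brady_below=60.0, tachy_above=100.0):
    """Rate-only label; thresholds are assumed conventional cut-offs."""
    if bpm < brady_below:
        return "bradycardia"
    if bpm > tachy_above:
        return "tachycardia"
    return "normal rate"

# Toy input: one R-peak every 0.5 seconds.
bpm = mean_heart_rate_bpm([0.0, 0.5, 1.0, 1.5, 2.0])
label = coarse_rhythm(bpm)
```

A rate-only rule like this cannot separate rhythms that share a heart rate, which is the limitation the paragraph points out.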

Much work has been done to automate the extraction of other features from the ECG. For example, beat classification is a common sub-problem of heart-arrhythmia classification. Drawing inspiration from automatic speech recognition, Hidden Markov models with Gaussian observation probability distributions have been applied to the task of beat detection (Coast et al., 1990). Artificial neural networks have also been used for the task of beat detection (Melo et al., 2000). While these models have achieved high accuracy for some beat types, they are not yet sufficient for high-accuracy heart arrhythmia classification and segmentation. For example, Artis et al. (1991) train a neural network to distinguish between Atrial Fibrillation and Sinus Rhythm on the MIT-BIH dataset. While the network can distinguish between these two classes with high accuracy, it does not generalize to noisier single-lead recordings or classify among the full range of 15 rhythms available in MIT-BIH. This is in part due to insufficient training data, and because the model also discards critical information in the feature extraction stage.

The most common dataset used to design and evaluate ECG algorithms is the MIT-BIH arrhythmia database (Moody & Mark, 2001) which consists of 48 half-hour strips of ECG data. Other commonly used datasets include the MIT-BIH Atrial Fibrillation dataset (Moody & Mark, 1983) and the QT dataset (Laguna et al., 1997). While useful benchmarks for R-peak extraction and beat-level annotations, these datasets are too small for fine-grained arrhythmia classification. The number of unique patients is in the single digit hundreds or fewer for these benchmarks. A recently released dataset captured from the AliveCor ECG monitor contains about 7000 records (Clifford et al., 2017). These records only have annotations for Atrial Fibrillation; all other arrhythmias are grouped into a single bucket. The dataset we develop contains 29,163 unique patients and 14 classes with hundreds of unique examples for the rarest arrhythmias.

Machine learning models based on deep neural networks have consistently been able to approach and often exceed human agreement rates when large annotated datasets are available (Amodei et al., 2016; Xiong et al., 2016; He et al., 2015c). These approaches have also proven to be effective in healthcare applications, particularly in medical imaging where pretrained ImageNet models can be applied (Esteva et al., 2017; Gulshan et al., 2016). We draw on work in automatic speech recognition for processing time-series with deep convolutional neural networks and recurrent neural networks (Hannun et al., 2014; Sainath et al., 2013), and techniques in deep learning to make the optimization of these models tractable (He et al., 2016b;c; Ioffe & Szegedy, 2015).

7. Conclusion

We develop a model which exceeds the cardiologist performance in detecting a wide range of heart arrhythmias from single-lead ECG records. Key to the performance of the model is a large annotated dataset and a very deep convolutional network which can map a sequence of ECG samples to a sequence of arrhythmia annotations.

On the clinical side, future work should investigate extending the set of arrhythmias and other forms of heart disease which can be automatically detected with high accuracy from single or multiple lead ECG records. For example, we do not detect Ventricular Flutter or Fibrillation. We also do not detect Left or Right Ventricular Hypertrophy, Myocardial Infarction or a number of other heart diseases which do not necessarily exhibit as arrhythmias. Some of these may be difficult or even impossible to detect on a single-lead ECG but can often be seen on a multiple-lead ECG.

Given that more than 300 million ECGs are recorded annually, high-accuracy diagnosis from ECG can save expert clinicians and cardiologists considerable time and decrease the number of misdiagnoses. Furthermore, we hope that this technology coupled with low-cost ECG devices enables more widespread use of the ECG as a diagnostic tool in places where access to a cardiologist is difficult.

Acknowledgements

We thank Geoffrey H. Tison MD, MPH of UCSF for helpful feedback on the experiments and references.

Table 2. A list of all of the rhythm types which the model classifies. For each rhythm we give the label name, a more descriptive name and an example chosen from the training set. We also give the total number of patients with each rhythm for both the training and test sets.

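Article summary:

Publication date: 6 Jul 2017
Data source: a single-lead wearable ECG monitor (Zio Patch)
Dataset size: more than 500 times larger than the datasets used in previous comparable studies
Goal: detect a wide range of arrhythmias with 14-way classification (sinus rhythm / noise / 12 arrhythmia types)
Conclusion: the CNN-based model exceeds the average cardiologist in both recall (sensitivity) and precision (positive predictive value)
Evaluation metrics: Sequence Level Accuracy (F1), Set Level Accuracy (F1)

End-to-end training
Every second of the ECG signal has a label
Residual connections, batch normalization and shortcut connections make the model easier to optimize
Patients were selected to even out the class balance
3 cardiologists annotate the test data jointly as a committee; 6 cardiologists annotate independently for comparison
Sequence-to-sequence learning
Raw ECG data as input
30-second ECG signals sampled at 200 Hz
One prediction output per second
64,121 ECG records from 29,163 patients
A test set of 336 records collected from 328 patients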

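Vocabulary

board certified: holding a professional specialty certification
tractable: easy to handle
from scratch: starting from nothing
strip: a band (here, a segment of ECG trace)
underlying: latent, lying beneath the surface
subtle: delicate, hard to notice
penalize: to punish
time-misalignment: an offset in time
noticeably: clearly, visibly
coarse-grained: at a rough level of detail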