Turbo Autoencoder: Deep learning based channel code for point-to-point communication channels
Abstract
Designing codes that combat the noise in a communication medium has remained a significant area of research in information theory as well as wireless communications. Asymptotically optimal channel codes have been developed by mathematicians for communication under canonical models after over 60 years of research. On the other hand, in many non-canonical channel settings, optimal codes do not exist and the codes designed for canonical models are adapted via heuristics to these channels and are thus not guaranteed to be optimal. In this work, we make significant progress on this problem by designing a fully end-to-end jointly trained neural encoder and decoder, namely, Turbo Autoencoder (TurboAE), with the following contributions: (a) under moderate block length, TurboAE approaches state-of-the-art performance under canonical channels; (b) moreover, TurboAE outperforms state-of-the-art codes under non-canonical settings in terms of reliability. TurboAE shows that the development of channel coding design can be automated via deep learning, with near-optimal performance.
1. Introduction
An autoencoder is a powerful unsupervised learning framework that learns latent representations by minimizing the reconstruction loss of the input data. Autoencoders have been widely used in unsupervised learning tasks such as representation learning, denoising, and generative modeling. Most autoencoders are under-complete: the latent space is smaller than the input data. Over-complete autoencoders have a latent space larger than the input data. While the goal of an under-complete autoencoder is to find a low-dimensional representation of the input data, the goal of an over-complete autoencoder is to find a higher-dimensional representation such that the original data can be reliably recovered from a noisy version of that representation. Over-complete autoencoders are used in sparse representation learning and robust representation learning.
Channel coding aims at communicating a message over a noisy random channel. As shown in Figure 1 left, the transmitter maps a message to a codeword via adding redundancy (this mapping is called encoding). A channel between the transmitter and the receiver randomly corrupts the codeword so that the receiver observes a noisy version which is used by the receiver to estimate the transmitted message (this process is called decoding). The encoder and the decoder together can be naturally viewed as an over-complete autoencoder, where the noisy channel in the middle corrupts the hidden representation (codeword). This coding and decoding process can thus be naturally modeled by over-complete autoencoders. Therefore, designing a reliable autoencoder can have a strong bearing on alternative ways of designing new encoding and decoding schemes for wireless communication systems.
Traditionally, communication algorithm design first involves designing a ‘code’ (i.e., the encoder) via optimizing certain mathematical properties of encoder such as minimum code distance. The associated decoder that minimizes the bit-error-rate is simply given by the maximum a posteriori (MAP) principle. While the optimal MAP decoder is computationally simple for some simple codes, such as convolutional codes, for known capacity-achieving codes, the MAP decoder is not computationally efficient and alternative decoding principles such as Turbo decoding and belief propagation are employed. The progress on the design of optimal channel codes with computationally efficient decoders has been quite sporadic due to its reliance on human ingenuity. Since Shannon’s seminal work in 1948, it took several decades of research to finally discover the current state-of-the-art codes.
Near-optimal channel codes such as Turbo, Low Density Parity Check (LDPC), and Polar codes show Shannon capacity-approaching performance on AWGN channels, and they have had a tremendous impact on the Long Term Evolution (LTE) and 5G standards. The traditional approach has the following caveats: (a) Decoder design relies on handcrafted optimal decoding algorithms for the canonical Additive White Gaussian Noise (AWGN) channel, where the signal is corrupted by i.i.d. Gaussian noise. On non-AWGN channels without closed-form representations, canonical codes layered with heuristics are sub-optimal at best. (b) Channel coding transmits a code block carrying K message bits, where K is referred to as the block length. Channel codes are guaranteed to be optimal only as the block length approaches infinity, and thus are near-optimal in practice only when the block length is large. Hence there is room for improvement in the short and moderate block length regimes. (c) The encoder designed for the AWGN channel is reused across a large family of channels, and only the decoder is adapted. This design methodology fails to exploit the flexibility of the encoder.
Related work. Deep learning has pushed the state-of-the-art performance of computer vision and natural language processing to a new level, far beyond handcrafted algorithms, in a data-driven fashion. There is also a recent movement to apply deep learning to wireless communications. Deep learning has been applied to channel decoder design, where the encoder is fixed to a near-optimal code. It has been shown that belief propagation decoders for LDPC and Polar codes can be imitated by neural networks. It has also been shown that convolutional and turbo codes can be decoded optimally via Recurrent Neural Networks (RNN) and Convolutional Neural Networks (CNN). Equipping a decoder with a learnable neural network also allows fast adaptation via meta-learning. Recent works also extend deep learning to multiple-input multiple-output (MIMO) settings. While neural decoders show improved performance on various communication channels, there has been limited success in inventing novel codes using this paradigm. Training methods for improving both modulation and channel coding have been introduced, where a (7,4) neural code mapping a 4-bit message to a length-7 codeword can match the performance of the (7,4) Hamming code. Current research includes training an encoder and a decoder with noisy feedback, improving modulation gain, and extensions to multi-terminal settings. Joint source-channel coding shows improved results by combining source coding (compression) with channel coding (noise mitigation). Neural codes have been shown to outperform existing state-of-the-art codes on the feedback channel. However, in the canonical setting of the AWGN channel, neural codes are still far from capacity-approaching performance due to the following challenges.
(Challenge A): Encoding with randomness is critical to harvest coding gain at long block lengths. However, existing sequential neural models, both CNNs and RNNs, can only learn limited local dependencies. Hence, a neural encoder cannot sufficiently exploit the benefits of even a moderate block length.
(Challenge B): Training a neural encoder and decoder jointly (with a random channel in between) introduces optimization issues in which the algorithm gets stuck at local optima. Hence, a novel training algorithm is needed.
Contributions: In this paper, we confront the above challenges by introducing Turbo Autoencoder (henceforth, TurboAE), the first channel coding scheme with both encoder and decoder powered by neural networks that achieves reliability close to state-of-the-art channel codes on AWGN channels at a moderate block length. We demonstrate that channel coding, which has been a focus of study by mathematicians for several decades, can be learned in an end-to-end fashion. Our major contributions are:
① We introduce TurboAE, a neural network based over-complete autoencoder parameterized as Convolutional Neural Networks (CNN) along with interleavers (permutation) and de-interleavers (de-permutation) inspired by turbo codes (Section 3.1). We introduce TurboAE-binary, which binarizes the codewords via straight-through estimator (Section 3.2).
② We propose techniques that are critical for training TurboAE, including alternating training of the encoder and the decoder as well as strategies for choosing the right training examples. Our training methodology ensures stable training of TurboAE without getting trapped at locally optimal encoder-decoder solutions (Section 3.3).
③ Compared to multiple capacity-approaching codes on the AWGN channel, TurboAE shows superior performance in the low to middle SNR range when the block length is of moderate size (K ∼ 100). To the best of our knowledge, this is the first result demonstrating that neural codes discovered by deep learning can outperform traditional codes in the canonical AWGN setting (Section 4.1).
④ On non-AWGN channels, fine-tuned TurboAE shows significant improvements over state-of-the-art coding schemes thanks to the flexibility of encoder design, which shows that TurboAE has advantages in designing codes where handcrafted solutions fail (Section 4.2).
2. Problem Formulation
The channel coding problem, illustrated in Figure 1 left, consists of three blocks: an encoder fθ(·), a channel c(·), and a decoder gφ(·). The channel c(·) randomly corrupts an input x and is represented by a transition probability p(y|x). A canonical example is the independent and identically distributed (i.i.d.) AWGN channel, which generates yi = xi + zi with zi ∼ N(0, σ^2) for i = 1, ..., N. The encoder x = fθ(u) maps a random binary message sequence u = (u1, ..., uK) ∈ {0, 1}^K of block length K to a codeword x = (x1, ..., xN) of length N, where x must satisfy either a soft power constraint, E(x) = 0 and E(x^2) = 1, or a hard power constraint, x ∈ {−1, +1}^N. The code rate is defined as R = K/N, where N > K. The decoder gφ(y) maps a real-valued received sequence y = (y1, ..., yN) ∈ R^N to an estimate of the transmitted message sequence û = (û1, ..., ûK) ∈ {0, 1}^K.
The AWGN channel allows closed-form mathematical analysis and has remained the main playground for channel coding researchers. The noise level is expressed through the signal-to-noise ratio in dB, SNR = −10 log10(σ^2), which assumes unit signal power. The decoder recovers the original message as û = gφ(y) from the received signal y.
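As a minimal sketch of the channel model above, the following NumPy snippet converts an SNR in dB to the noise standard deviation via SNR = −10 log10(σ^2) and applies i.i.d. Gaussian noise to unit-power codewords. The function names, the batch size, and the codeword length N = 300 are illustrative assumptions, not part of the paper.

```python
import numpy as np

def snr_db_to_sigma(snr_db):
    """Noise std for unit-power codewords, from SNR = -10*log10(sigma^2)."""
    return 10.0 ** (-snr_db / 20.0)

def awgn_channel(x, snr_db, rng):
    """Return y = x + z with z ~ N(0, sigma^2), i.i.d. per symbol."""
    sigma = snr_db_to_sigma(snr_db)
    return x + sigma * rng.standard_normal(x.shape)

rng = np.random.default_rng(0)
# Hypothetical batch of hard-power-constrained codewords (values in {-1, +1}).
x = rng.choice([-1.0, 1.0], size=(1000, 300))
y = awgn_channel(x, snr_db=0.0, rng=rng)

# At 0 dB the empirical noise power should be close to sigma^2 = 1.
noise_power = np.mean((y - x) ** 2)
```

Because the codewords have unit power, the noise variance alone determines the SNR, which is why the conversion needs no signal-power term.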
Channel coding aims to minimize the error rate of the recovered message û. The standard metrics are the bit error rate (BER), defined as BER = (1/K) Σ_{i=1}^{K} Pr(ûi ≠ ui), and the block error rate (BLER), defined as BLER = Pr(û ≠ u).
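The two metrics can be estimated empirically from a batch of transmitted and decoded messages. The sketch below is one straightforward way to do so; the function names and the toy data are assumptions chosen for illustration.

```python
import numpy as np

def bit_error_rate(u, u_hat):
    """Fraction of message bits decoded incorrectly (empirical BER)."""
    return float(np.mean(u != u_hat))

def block_error_rate(u, u_hat):
    """Fraction of blocks with at least one bit error (empirical BLER)."""
    return float(np.mean(np.any(u != u_hat, axis=1)))

# Toy batch: 3 blocks of K = 4 bits each.
u = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 1, 0, 0]])
u_hat = np.array([[0, 1, 1, 0],
                  [1, 0, 1, 1],
                  [0, 1, 0, 1]])

ber = bit_error_rate(u, u_hat)     # 3 wrong bits out of 12
bler = block_error_rate(u, u_hat)  # 2 wrong blocks out of 3
```

A single bit error is enough to make a block wrong, so BLER is always at least as large as BER.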
While canonical capacity-approaching channel codes work well as the block length goes to infinity, they are not guaranteed to be optimal when the block length is short. Figure 1 right shows benchmarks at block length 100 and code rate 1/3 for the widely used LDPC, Turbo, Polar, and Tail-biting Convolutional Code (TBCC), generated via the Vienna 5G simulator.
Naively applying deep learning by replacing the encoder and decoder with general-purpose neural networks does not perform well. A direct application of fully connected neural networks (FCNN) cannot scale to longer block lengths; the performance of FCNN-AE is even worse than that of a repetition code. A direct application in which both the encoder and the decoder are convolutional (termed CNN-AE) performs better than TBCC, but is far from capacity-approaching codes such as LDPC, Polar, and Turbo. Bidirectional RNN and LSTM autoencoders perform similarly to CNN-AE and are omitted from the figure for clarity. Thus neither CNN-based nor RNN-based autoencoders can directly approach state-of-the-art performance. A key reason for this shortcoming is that they have only local memory: the encoder only remembers information locally. Strong protection against channel noise requires long-term memory.
We propose TurboAE, whose interleaved encoding and iterative decoding create long-term memory in the code and yield a significant improvement over CNN-AE. TurboAE has two versions: TurboAE-continuous, which is subject to a soft power constraint (i.e., the total power across a codeword is bounded), and TurboAE-binary, which is subject to a hard power constraint (i.e., each transmitted symbol has a power constraint and is thus forced to be binary). Both TurboAE-binary and TurboAE-continuous perform comparably to or better than all other capacity-approaching codes at low SNR, while at high SNR (over 2 dB, with BER < 10^−5) they are worse only than LDPC and Polar codes.
3. TurboAE: Architecture Design and Training
3.1 Design of TurboAE
Turbo code and turbo principle: Turbo code is the first capacity-approaching code ever designed. Two novel components of Turbo code led to its success: an interleaved encoder and an iterative decoder. The starting point of the Turbo code is a recursive systematic convolutional (RSC) code, which has an optimal decoding algorithm (BCJR). A key disadvantage of the RSC code is that it lacks long-range memory, since a convolutional code operates on a sliding window. The key insight of Berrou was to introduce long-range memory by creating two copies of the input bits: the first copy goes through the RSC code, and the second copy goes through an interleaver before going through the same code. Such a code can be decoded by iteratively alternating between soft-decoding based on the signal received from the first copy and then using the de-interleaved result as a prior to decode the second copy. The ‘Turbo principle’ refers to this iterative decoding, which successively refines the posterior distribution on the transmitted bits across decoding stages in original and interleaved order. This code is known to have excellent performance, and inspired by it, we design TurboAE featuring both a learnable interleaved encoder and an iterative decoder.
Interleaved Encoding Structure: Interleaving is widely used in communication systems to mitigate bursty noise. Formally, the interleaver x^π = π(x) shuffles the input sequence x, and the de-interleaver x = π^(−1)(x^π) shuffles it back, using a pseudo-random interleaving array known to both the encoder and the decoder, as shown in Figure 2 left. In the context of Turbo code and TurboAE, interleaving is not used to mitigate bursty errors but rather to add long-range memory to the structure of the code.
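The interleaver and de-interleaver can be sketched as a fixed pseudo-random permutation and its inverse. The helper names and the seed below are hypothetical; the only requirement taken from the text is that the same permutation array is known to both the encoder and the decoder.

```python
import numpy as np

def make_interleaver(K, seed):
    """Build a pseudo-random permutation of length K and its inverse."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(K)
    inv_perm = np.argsort(perm)  # satisfies perm[inv_perm] == arange(K)
    return perm, inv_perm

def interleave(x, perm):
    """x^pi = pi(x): reorder positions along the last axis."""
    return x[..., perm]

def deinterleave(x_pi, inv_perm):
    """x = pi^(-1)(x^pi): undo the reordering along the last axis."""
    return x_pi[..., inv_perm]

K = 10
perm, inv_perm = make_interleaver(K, seed=0)  # seed shared by both ends
x = np.arange(K)
x_pi = interleave(x, perm)
x_back = deinterleave(x_pi, inv_perm)
```

Using `np.argsort` to invert the permutation avoids storing a second, independently generated array that could drift out of sync with the first.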
We take code rate 1/3 as an example for the interleaved encoder fθ, which consists of three learnable encoding blocks fi,θ(·) for i ∈ {1, 2, 3}. The first two blocks encode the message directly, bi = fi,θ(u) for i ∈ {1, 2}, and the third encodes the interleaved message, b3 = f3,θ(π(u)), where each bi is continuous-valued. The power constraint of channel coding is enforced via the power constraint block xi = h(bi).
Iterative Decoding Structure: Since the received codewords are encoded from both the original message u and the interleaved message π(u), decoding the interleaved code requires iterative decoding in both interleaved and de-interleaved order, as shown in Figure 3. Let y1, y2, y3 denote the noisy versions of x1, x2, x3, respectively. The decoder runs multiple iterations; the i-th iteration contains two decoders, gφi,1 and gφi,2, operating in de-interleaved and interleaved order, respectively.
The first decoder gφi,1 takes the received signals y1, y2 and a de-interleaved prior p of shape (K, F), where F is the information feature size for each code bit, and produces a posterior q of the same shape (K, F). The second decoder gφi,2 takes the interleaved signal π(y1), the signal y3, and the interleaved prior p, and produces a posterior q. The posterior q of each stage serves as the prior p of the next stage. The first iteration takes 0 as the prior, and at the last iteration the posterior has shape (K, 1) and is decoded by a sigmoid function as û = sigmoid(q).
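The data flow of the iterative decoder can be sketched as follows. This is only a structural illustration: `toy_decoder` is a hypothetical hand-written stand-in for the learned CNN blocks, the values of `num_iters` and `F` are arbitrary, and de-interleaving the final posterior before the sigmoid is an assumption about the bookkeeping rather than something stated in the text.

```python
import numpy as np

def toy_decoder(y_a, y_b, prior, out_features):
    """Hypothetical stand-in for a learned decoder block g_phi.
    Maps two received streams of shape (K,) and a prior of shape (K, F)
    to a posterior of shape (K, out_features)."""
    evidence = (y_a + y_b)[:, None] + prior.mean(axis=1, keepdims=True)
    return np.repeat(evidence, out_features, axis=1)

def iterative_decode(y1, y2, y3, perm, num_iters=6, F=5):
    K = y1.shape[0]
    inv_perm = np.argsort(perm)
    prior = np.zeros((K, F))  # the first iteration takes 0 as the prior
    for it in range(num_iters):
        last = it == num_iters - 1
        # Decoder 1: de-interleaved (natural) order, uses y1 and y2.
        q = toy_decoder(y1, y2, prior, F)
        prior = q[perm]  # interleave the posterior for decoder 2
        # Decoder 2: interleaved order, uses pi(y1) and y3.
        q = toy_decoder(y1[perm], y3, prior, 1 if last else F)
        prior = q[inv_perm]  # de-interleave for the next stage
    return 1.0 / (1.0 + np.exp(-prior[:, 0]))  # sigmoid of the (K, 1) posterior

# Noise-free sanity check with a repetition-style toy code (an assumption).
rng = np.random.default_rng(0)
K = 16
perm = rng.permutation(K)
u = rng.integers(0, 2, size=K)
x = 2.0 * u - 1.0
u_hat = iterative_decode(x, x, x[perm], perm)
```

The check confirms that the interleave and de-interleave steps keep bit positions aligned across stages, which is the part of the structure that is easiest to get wrong.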
Both encoder and decoder structure can be considered as a parametrization of Turbo code. Once we parametrize the encoder and the decoder, since the encoder, channel, and decoder are differentiable, TurboAE can be trained end-to-end via gradient descent and its variants.
Encoder and Decoder Design: The spaces of messages and codewords are exponentially large (for a length-K binary sequence, there are 2^K distinct messages). Hence, the encoder and decoder must have some structural restrictions to ensure generalization to messages unseen during training. Parameter-sharing sequential neural models such as CNN and RNN are natural parametrizations for both the encoding and the decoding blocks.
RNN models such as the Gated Recurrent Unit (GRU) and Long Short-Term Memory (LSTM) are commonly used for sequential modeling problems. RNNs are widely used in deep learning based communication systems, as they have a natural connection to sequential encoding and decoding algorithms such as convolutional codes and the BCJR algorithm.
However, RNN models are: (1) of higher complexity than CNN models, (2) harder to train due to gradient explosion, and (3) harder to run in parallel. In this paper, we use a one-dimensional CNN (1D-CNN) as the alternative encoding and decoding model. Although its longest dependency length is fixed, a 1D-CNN has lower complexity, better trainability, and is easier to parallelize on AI chips. The learning curve comparison between CNN and RNN is shown in Figure 4 left. The CNN-based model converges faster and more stably than the RNN-based GRU model.
Power Constraint Block: The operation of power constraint blocks depends on the requirement of power constraint.
The soft power constraint normalizes the power of the code so that E(x) = 0 and E(x^2) = 1. TurboAE-continuous, which uses the soft power constraint, allows the code x to be continuous. To address the statistical estimation issue that arises with a limited batch size, we normalize as xi = (bi − µ(b))/σ(b). During training, µ(b) and σ(b) are estimated from the whole batch; during testing, µ(b) and σ(b) are pre-computed from multiple batches. The normalization layer can also be viewed as BatchNorm without the affine projection, which is critical for stabilizing the training of the encoder.
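A minimal sketch of the soft power constraint block is shown below, assuming the statistics are taken over all entries of the batch. The function name and the optional `mean`/`std` arguments (used to pass in pre-computed test-time statistics) are illustrative choices, not the paper's interface.

```python
import numpy as np

def soft_power_constraint(b, mean=None, std=None):
    """Normalize encoder outputs so that E(x) = 0 and E(x^2) = 1.
    If mean/std are omitted they are estimated from the batch (training);
    at test time pre-computed statistics can be passed in instead."""
    if mean is None:
        mean = b.mean()
    if std is None:
        std = b.std()
    return (b - mean) / std

rng = np.random.default_rng(0)
# Hypothetical raw encoder outputs: batch of 500 codewords of length 300.
b_train = 3.0 + 2.0 * rng.standard_normal((500, 300))
x_train = soft_power_constraint(b_train)

# Test phase: reuse statistics computed beforehand from training batches.
mu_hat, sigma_hat = b_train.mean(), b_train.std()
b_test = 3.0 + 2.0 * rng.standard_normal((10, 300))
x_test = soft_power_constraint(b_test, mean=mu_hat, std=sigma_hat)
```

With batch statistics the constraint holds exactly on the training batch, while at test time it holds only approximately, which is why the pre-computed estimates should come from enough data.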
3.2 Design of TurboAE-binary: Binarization via Straight-Through Estimator
Some wireless communication systems require a hard power constraint, where the encoder output is binary, x ∈ {−1, +1}, so that every symbol has exactly the same power and the information is conveyed in the sign. The hard power constraint is not differentiable in a useful way, since restricting x ∈ {−1, +1} via x = sign(b) has zero gradient almost everywhere. We combine normalization with the Straight-Through Estimator (STE) to bypass this differentiability issue. STE passes the gradient of x = sign(b) as ∂x/∂b = 1, which enables training the encoder by passing estimated gradients to it while still enforcing the hard power constraint.
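The straight-through idea can be illustrated without an autograd framework by writing the forward and backward passes by hand. The toy objective, learning rate, and variable names below are assumptions made for this sketch; in a real model the same rule would be implemented as a custom gradient in the training framework.

```python
import numpy as np

def ste_forward(b):
    """Forward pass: hard binarization x = sign(b), with sign(0) taken as +1."""
    return np.where(b >= 0, 1.0, -1.0)

def ste_backward(grad_x):
    """Backward pass: STE treats dx/db as 1, so the gradient passes through."""
    return grad_x

# Toy problem: adjust b so that sign(b) matches a target codeword,
# minimizing 0.5 * (x - target)^2 with x = sign(b).
b = np.array([0.3, -0.2, 0.1])
target = np.array([-1.0, -1.0, 1.0])
learning_rate = 0.1
for _ in range(100):
    x = ste_forward(b)
    grad_x = x - target              # d(loss)/dx
    grad_b = ste_backward(grad_x)    # estimated d(loss)/db
    b = b - learning_rate * grad_b

x_final = ste_forward(b)
```

With the true gradient of sign(b), which is zero almost everywhere, `b` would never move; the pass-through estimate is what lets the first entry cross zero and flip sign.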
Simply training with STE cannot learn a good encoder, as shown in Figure 4 right. To mitigate this trainability issue, we apply pre-training: we first pre-train TurboAE-continuous, and then add the hard power constraint on top of the soft power constraint as x = sign((b − µ(b))/σ(b)), with the gradient estimated via STE. Figure 4 right shows that with pre-training, TurboAE-binary reaches Turbo performance within 100 epochs of fine-tuning.
TurboAE-binary is slightly worse than TurboAE-continuous, as shown in Figure 1, especially at high SNR, since: (a) TurboAE-continuous can be considered a joint coding and high-order modulation scheme, which has a larger capacity than binary coding at high SNR, and (b) STE provides only an estimated gradient, which makes training the encoder noisier and less stable.
3.3 Neural Trainability Design
The training procedure for TurboAE is shown in Algorithm 1. Compared to conventional deep learning model training, training TurboAE differs in the following ways:
Very Large Batch Size. A large batch size is critical for averaging out the effects of channel noise. Empirically, TurboAE reaches Turbo performance only when the batch size is greater than 500.
Train Encoder and Decoder Separately. We train the encoder and the decoder separately, as shown in Algorithm 1, to avoid getting stuck in a local optimum.
Different Training Noise Level for Encoder and Decoder. Empirically, while it is best to train a decoder at a low training SNR as discussed in [25], it is best to train an encoder at a training SNR that matches the testing SNR; e.g., training the encoder at 2 dB results in a good encoder when testing at 2 dB. In this work, we train the decoder at an SNR selected randomly between 1.5 and 2 dB, and train the encoder at the same SNR at which it is tested.
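The three training differences above can be combined into a simple schedule that alternates encoder and decoder updates and picks the training SNR per component. The sketch below only generates the schedule; the step counts per epoch, the uniform sampling of the decoder SNR, and all names are assumptions, since Algorithm 1 is not reproduced in this text.

```python
import numpy as np

def sample_training_snr_db(component, test_snr_db, rng, dec_range=(1.5, 2.0)):
    """Pick the training SNR: the encoder matches the test SNR, while the
    decoder draws from a range (uniform sampling is an assumption)."""
    if component == "encoder":
        return float(test_snr_db)
    if component == "decoder":
        return float(rng.uniform(*dec_range))
    raise ValueError(f"unknown component: {component}")

def alternating_schedule(num_epochs, enc_steps, dec_steps, test_snr_db, rng):
    """Yield (epoch, component, snr_db) tuples: encoder-only updates first,
    then decoder-only updates, repeated every epoch."""
    for epoch in range(num_epochs):
        for _ in range(enc_steps):
            yield epoch, "encoder", sample_training_snr_db("encoder", test_snr_db, rng)
        for _ in range(dec_steps):
            yield epoch, "decoder", sample_training_snr_db("decoder", test_snr_db, rng)

rng = np.random.default_rng(0)
# Hypothetical step counts; a real run would update only the named
# component's parameters at each step, on a batch of more than 500 messages.
schedule = list(alternating_schedule(num_epochs=2, enc_steps=1, dec_steps=3,
                                     test_snr_db=2.0, rng=rng))
```

Separating the schedule from the update step keeps the alternation logic testable without a neural network in the loop.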
4. Experimental Results
4.1 Block Length Coding Gain of TurboAE
As block length increases, better reliability can be achieved via channel coding, which is referred to as coding gain. We compare the BER of TurboAE, the Turbo code, and CNN-AE at 2 dB on different block lengths, shown in Figure 5 left. Both CNN-AE and TurboAE are trained with block length 100 and tested on other block lengths. As block length increases, CNN-AE shows saturating coding gain, while TurboAE and the Turbo code continue to reduce the error rate. Naively applying a general-purpose neural network such as a CNN to the channel coding problem does not yield gains at long block lengths.
Note that TurboAE is still worse than Turbo when the block length is large, since long block length requires large memory usage and more complicated structure to train. Improving TurboAE on very long block length remains open as an interesting future direction.
The BER improvement due to the neural architecture design is shown in Figure 5 right. We compare the fine-tuned performance of CNN-AE, TurboAE, and TurboAE without interleaving (i.e., with the interleaver replaced by the identity, x^π = x). TurboAE with interleaving significantly outperforms both TurboAE without interleaving and CNN-AE.
4.2 Performance on Non-AWGN Channels
Typically there are no closed-form solutions for non-AWGN and non-iid channels. We compare against two benchmarks: (1) the canonical Turbo code, and (2) the DeepTurbo decoder, a neural decoder fine-tuned on the given channel. We test performance on both an iid channel and a non-iid channel, in the following settings:
(a) iid Additive T-distribution Noise (ATN) channel, with yi = xi + zi, where iid zi ∼ T(ν, σ^2) is heavy-tailed T-distribution noise with variance σ^2. The performance is shown in Figure 6 left.
(b) Non-iid Markovian-AWGN channel: an AWGN channel with a good state and a bad state. In the bad state the effective SNR is 1 dB worse than the nominal SNR, and in the good state it is 1 dB better. The transition probabilities between the good and bad states are symmetric, pbg = pgb = 0.8. The performance is shown in Figure 6 right.
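Both channels above can be simulated in a few lines. The sketch below is one possible reading of the two settings: the t noise is rescaled so that its variance equals σ^2 (which requires ν > 2), the Markov chain starts in a uniformly random state, and the value ν = 5 in the demo is an arbitrary choice. These details and all names are assumptions.

```python
import numpy as np

def atn_channel(x, sigma, nu, rng):
    """Additive t-distributed noise rescaled to variance sigma^2 (needs nu > 2).
    A standard t variable with nu degrees of freedom has variance nu/(nu - 2)."""
    scale = sigma * np.sqrt((nu - 2.0) / nu)
    return x + scale * rng.standard_t(nu, size=x.shape)

def markov_awgn_channel(x, snr_db, rng, p_switch=0.8, offset_db=1.0):
    """Two-state Markovian AWGN over a 1-D codeword x. The state flips with
    probability p_switch per symbol; the bad state is offset_db worse and the
    good state offset_db better than the nominal SNR."""
    n = x.shape[0]
    switches = rng.random(n) < p_switch
    switches[0] = False                 # no transition before the first symbol
    start_bad = rng.random() < 0.5      # assumed uniform initial state
    bad = np.logical_xor(start_bad, np.cumsum(switches) % 2 == 1)
    state_snr_db = np.where(bad, snr_db - offset_db, snr_db + offset_db)
    sigma = 10.0 ** (-state_snr_db / 20.0)
    return x + sigma * rng.standard_normal(n), bad

rng = np.random.default_rng(0)
x = rng.choice([-1.0, 1.0], size=200_000)
y_atn = atn_channel(x, sigma=1.0, nu=5.0, rng=rng)
y_markov, bad = markov_awgn_channel(x[:20_000], snr_db=2.0, rng=rng)

atn_noise_var = np.var(y_atn - x)
switch_rate = np.mean(bad[1:] != bad[:-1])
```

Returning the state sequence alongside the output makes it easy to verify the transition statistics or to condition an analysis on the channel state.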
For both the ATN and Markovian-AWGN channels, DeepTurbo outperforms the canonical Turbo code. TurboAE-continuous, with its learnable encoder, outperforms DeepTurbo in both cases. TurboAE-binary outperforms DeepTurbo on the ATN channel; on the Markovian-AWGN channel, TurboAE-binary does not outperform DeepTurbo in the high-SNR regime but still outperforms the canonical Turbo code. With the flexibility of a learnable encoder, TurboAE designs better codes than the handcrafted Turbo code for channels without a closed-form mathematical representation.
5. Conclusion
In this paper, we propose TurboAE, an end-to-end learnt channel coding scheme with a novel neural structure and training algorithms. By building upon the ‘turbo principle’, TurboAE learns capacity-approaching codes on various channels at moderate block length, and thus demonstrates that codes can be discovered for channels where a closed-form representation may not exist. TurboAE hence opens an interesting research direction: designing channel coding algorithms via joint encoder and decoder design.