A quick look at the different neural network architectures and their advantages and disadvantages.
Experimental machine learning is turning out to be so much fun! After my investigations into replacing some signal processing algorithms with deep neural networks, which the interested reader can find documented in the article “Machine Learning and Signal Processing”, I got around to trying the other two famous neural network architectures: LSTM and CNN.
Introducing CNN and LSTM
Before we get into the details of my comparison, here is an introduction to, or rather my understanding of, the other neural network architectures. We all understand deep neural networks, which are simply layers of neurons, with each layer connected sequentially to the neurons in the next layer, and so on. Each layer implements the equation y = f(Wx + b) for input x and output y, where f is the non-linear activation function, W is the weight matrix and b is the bias; a single neuron corresponds to one row of W and one entry of b. Here is a picture from https://playground.tensorflow.org/
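The layer equation y = f(Wx + b) can be sketched in a few lines of NumPy. This is only a minimal illustration: the sizes, the weight values and the choice of tanh as the activation f are assumptions made for the example, not something taken from the experiments in this article.

```python
import numpy as np

def dense_layer(x, W, b, f=np.tanh):
    """Compute y = f(Wx + b) for one fully connected layer."""
    return f(W @ x + b)

# Hypothetical sizes: 3 inputs feeding a layer of 2 neurons.
x = np.array([1.0, 2.0, 3.0])
W = np.array([[0.1, 0.2, 0.3],
              [-0.3, 0.0, 0.1]])  # one row of weights per neuron
b = np.array([0.0, 0.5])          # one bias per neuron

y = dense_layer(x, W, b)
print(y.shape)  # (2,)
```

Stacking several such layers, each feeding its output to the next, gives the deep neural network described above.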
CNN
A convolutional neural network (CNN) adds “filtering” layers whose filter weights (or convolution kernels, if you prefer fancier words :) are learned in addition to the weights and biases of each neuron. It is still backpropagation that does this job for us, but we shall not make it too easy for the trusty workhorse that is backprop!
Here is a picture I made in PowerPoint to explain the CNN. There are better pictures on the web with cool graphics, but I don’t want to copy someone else’s hard work. When I am creating my content, I have to create my own illustrations too! Which is why content creation is a hard job. Despite that, the internet today is built by people who have created awesome content because they had fun doing so!
As you can see in the above picture, a CNN has several parallel filters which can be tuned to extract different features of interest. But of course, we won’t design the filters by hand as we do in Signal Processing; instead, we let backpropagation compute the filter weights.
Those readers who are familiar with Signal Processing can make the connection to filter banks, which separate high and low frequencies. This idea plays an important role in image compression, where filter banks are used to separate low and high frequencies and only the low frequencies need to be kept. Let us not digress, however.
The input vector is filtered by each of these “convolutional” layers. They “convolve” the input vector with a kernel (the filter impulse response). Convolution is one of the fundamental operations in linear systems, as fundamental as multiplication is to numbers. In fact, the convolution operation is exactly the same as polynomial multiplication: convolving two coefficient sequences gives the coefficients of the product polynomial. If you multiply two polynomials whose coefficients are the digits of two numbers and evaluate the result at x = 10, you get the product of those numbers, just as in regular long multiplication. I digress again.
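The link between convolution, polynomial multiplication and long multiplication can be checked with a small sketch. The helper names `convolve` and `evaluate` and the numbers 123 and 45 are made up for this illustration.

```python
def convolve(a, b):
    """Full discrete convolution of two coefficient sequences."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def evaluate(coeffs, x):
    """Evaluate a polynomial whose coefficients are listed lowest power first."""
    return sum(c * x**k for k, c in enumerate(coeffs))

# 123 and 45 written as digit polynomials, lowest power first:
# 123 = 3 + 2x + 1x^2 and 45 = 5 + 4x, both evaluated at x = 10.
p = [3, 2, 1]
q = [5, 4]

product = convolve(p, q)
print(product)                # [15, 22, 13, 4]
print(evaluate(product, 10))  # 5535, which equals 123 * 45
```

The coefficients 15, 22, 13 and 4 are the column sums of long multiplication before any carrying; evaluating at x = 10 performs the carries.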
Each convolutional layer then generates its own output vector, so the dimension increases by a factor of K if we have K convolutional layers in parallel. To reduce the dimensionality, we use a “pooling” layer, which computes either the MAX/MIN or the average of a certain number of samples. The outputs of all the pooling layers are then concatenated and passed through a dense layer to generate the output.
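The filter, pool, concatenate and dense steps described above can be sketched as a forward pass in NumPy. This is a rough illustration under assumed sizes (a 16-sample input, K = 3 random kernels of length 5, max pooling over 2 samples, one output); the weights are random rather than trained, and the helper names are hypothetical.

```python
import numpy as np

def conv1d(x, kernel):
    """Convolve a 1-D input with one kernel, keeping only fully overlapping positions."""
    return np.convolve(x, kernel, mode="valid")

def max_pool(v, size):
    """Non-overlapping max pooling; samples that do not fill a window are dropped."""
    n = (len(v) // size) * size
    return v[:n].reshape(-1, size).max(axis=1)

def cnn_features(x, kernels, pool_size=2):
    """Run every kernel over x in parallel, pool each result, then concatenate."""
    return np.concatenate([max_pool(conv1d(x, k), pool_size) for k in kernels])

rng = np.random.default_rng(0)
x = rng.standard_normal(16)             # hypothetical input vector
kernels = rng.standard_normal((3, 5))   # K = 3 untrained kernels of length 5

features = cnn_features(x, kernels)     # 3 branches of 6 pooled samples each
W = rng.standard_normal((1, features.size))
b = np.zeros(1)
y = W @ features + b                    # final dense layer (no activation here)

print(features.shape, y.shape)  # (18,) (1,)
```

Without pooling, the three branches would produce 3 × 12 = 36 values from a 16-sample input; pooling over pairs halves that to 18 before the dense layer.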
RNN and LSTM
An LSTM (Long Short Term Memory) is a type of Recurrent Neural Network (RNN), where the same network is trained through a sequence of inputs across “time”. I say “time” in quotes because this is just a way of splitting the input vector into time sequences and then looping through the sequences to train the network.
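The idea of splitting the input vector into “time” steps and looping the same network over them can be sketched as follows. Note the assumptions: this is a plain recurrent cell rather than an LSTM (it has no gates), it shows only the forward pass and no training, and all sizes and weights are made up for the example.

```python
import numpy as np

def rnn_forward(x, step_size, W_x, W_h, b):
    """Run one plain recurrent cell over consecutive chunks of x."""
    steps = x.reshape(-1, step_size)  # split the input vector into "time" steps
    h = np.zeros(W_h.shape[0])        # hidden state carried from step to step
    for x_t in steps:
        # The same weights are reused at every step.
        h = np.tanh(W_x @ x_t + W_h @ h + b)
    return h

rng = np.random.default_rng(0)
x = rng.standard_normal(12)           # hypothetical input vector of 12 samples
hidden = 4
W_x = rng.standard_normal((hidden, 3)) * 0.1
W_h = rng.standard_normal((hidden, hidden)) * 0.1
b = np.zeros(hidden)

h_final = rnn_forward(x, step_size=3, W_x=W_x, W_h=W_h, b=b)
print(h_final.shape)  # (4,)
```

Here the 12-sample input is treated as 4 time steps of 3 samples each, and the hidden state h is the only thing that carries information from one step to the next.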