PyTorch Text Classification
Hello Readers,
I am a Data Scientist working with a major bank in Australia in the machine learning automation space. For a project I was working on, I needed to build a text classification model, and having recently shifted my focus from TensorFlow to PyTorch (for no reason other than learning a new framework), I started exploring PyTorch's Conv1d architecture for my model.
I personally found it a bit confusing after seeing examples on the internet that show the word embedding dimension as the input to a Conv1d layer. This may be easy to understand and obvious for many, but I like to understand things visually, so I started exploring how I could best make sense of it. Reading a few articles gave me a rough idea, but I couldn't say with 100% confidence that I completely understood it, so I decided to explore it myself using some dummy variables and compare the Conv1d layer output with a manual calculation, which I am going to share in this article.
Before we move on, I am assuming that the reader understands the basic concepts of deep learning, ANNs, CNNs, word embeddings, etc. If not, this may be a bit difficult to grasp. But let's get started.
For simplicity, I am going to use a small example with a sentence length of 5 and a word embedding dimension of 3, so:
n = 1 (number of samples / batch size)
d = 3 (word embedding dimension)
l = 5 (sentence length)
So the shape of one sample should be (1, 5, 3), hence I am going to create a random PyTorch tensor with the same shape.
>> import torch
>> import torch.nn as nn
>> n = 1  # batch size
>> l = 5  # sentence length
>> d = 3  # embedding dimension
>> rand_arr = torch.rand(n, l, d)
As mentioned earlier, the embedding dimension can serve as the input channel size of a Conv1d layer, and just for demonstration purposes we will ask the Conv1d layer to output 1 channel. Let's define the Conv1d layer with:
input channels = 3
output channels = 1
kernel (filter) size = 2
stride = 1 (default)
>> conv1 = nn.Conv1d(d, 1, 2)
Now, let's look at the random tensor we generated and its shape.
>> print(rand_arr)
tensor([[[0.9527, 0.9451, 0.2209],
         [0.0332, 0.8993, 0.8718],
         [0.7281, 0.4627, 0.7274],
         [0.1490, 0.4004, 0.3260],
         [0.3055, 0.7935, 0.1360]]])
>> print(rand_arr.shape)
torch.Size([1, 5, 3])
Assume the 5-word sentence for which we created the 3-dimensional word embeddings in the steps above was "Word embedding are so cool".
We can view this sentence-embedding relationship horizontally as:
Word — [0.9527, 0.9451, 0.2209]
embedding — [0.0332, 0.8993, 0.8718]
are — [0.7281, 0.4627, 0.7274]
so — [0.1490, 0.4004, 0.3260]
cool — [0.3055, 0.7935, 0.1360]
To make the random tensor fit the conv1 layer we defined, we have to permute it so that the channel dimension (which in this case is the embedding dimension) comes before the sentence length dimension.
>> rand_arr_permute = rand_arr.permute(0, 2, 1)
>> print(rand_arr_permute)
tensor([[[0.9527, 0.0332, 0.7281, 0.1490, 0.3055],
         [0.9451, 0.8993, 0.4627, 0.4004, 0.7935],
         [0.2209, 0.8718, 0.7274, 0.3260, 0.1360]]])
Now, we can view the sentence-embedding relationship vertically as —
Word embedding are so cool
[0.9527, 0.0332, 0.7281, 0.1490, 0.3055],
[0.9451, 0.8993, 0.4627, 0.4004, 0.7935],
[0.2209, 0.8718, 0.7274, 0.3260, 0.1360]
In the case of Conv1d, we do not have a filter that strides through a two-dimensional matrix both horizontally and vertically, as a 2D convolution does.
Image source: https://towardsdatascience.com/types-of-convolutions-in-deep-learning-717013397f4d
Instead, the filter strides along only one dimension, i.e. horizontally. So if our kernel size is 2, the filter slides over pairs of 2 words (bi-grams), which are represented as columns after the permutation step we performed above.
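To see what one position of the filter covers, we can slice out the first two columns of the permuted tensor, i.e. the bi-gram ("Word", "embedding"); this is just an illustrative check using the tensor from above:
>> print(rand_arr_permute[:, :, :2])  # first bi-gram: columns for "Word" and "embedding"
tensor([[[0.9527, 0.0332],
         [0.9451, 0.8993],
         [0.2209, 0.8718]]])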
Can you now guess what would be the shape of our conv1d filter weight matrix?
It is (1, 3, 2), where shape[0] = 1 is the number of output channels, shape[1] = 3 is the input embedding size (input channels) and shape[2] = 2 is the kernel size.
Since we set the input channel size equal to the embedding dimension, shape[1] will always match the embedding size, which lets the filter stride over a full word (or pair of full words) at a time.
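You can confirm this by printing the weight shape of the conv1 layer we defined above (the weight values themselves will differ from run to run, since they are randomly initialised):
>> print(conv1.weight.shape)
torch.Size([1, 3, 2])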
I hope this gives you some clarity in imagining how 1d convolution takes place on text data along the channel dimension.
If you do not care about the maths behind how the results are obtained, you can skip the section below.
But for those who are interested in the calculation, for the sake of simplicity let's begin by taking only the first two words of our sentence, i.e. "Word" and "embedding".
So our new array would look like:
Word      embedding
0.9527    0.0332
0.9451    0.8993
0.2209    0.8718
Let's now look at the initial weights of the conv1 layer we defined earlier.
>> print(conv1.weight)
Parameter containing:
tensor([[[ 0.1952, -0.1954],
         [ 0.3689, -0.2420],
         [-0.1060,  0.0735]]], requires_grad=True)
To perform the convolution calculation on the above weights and embedding values, we do an element-wise multiplication of the weights (a 3x2 matrix) with the embeddings (also 3x2), sum the products into a single value and add the bias term, which is exactly what we are going to do in the next step.
But before that, I should mention that the shape of the bias term in Conv1d equals the number of output channels, i.e. one bias term per channel. Since we defined the output channels as 1, the bias has shape 1, and its randomly assigned value here is 0.2665.
>> print(conv1.bias)
Parameter containing:
tensor([0.2665], requires_grad=True)
>> conv1.bias.shape
torch.Size([1])
Let's now do the convolution calculation.
>> print(torch.sum(rand_arr_permute[:, :, :2] * conv1.weight) + conv1.bias)
tensor([0.6177], grad_fn=<AddBackward0>)
which is equivalent to —
((0.9527*0.1952)+(0.0332*-0.1954)+(0.9451*0.3689)+(0.8993*-0.242)+(0.2209*-0.106)+(0.8718*0.0735)) + (0.2665) = 0.6177
After this calculation, the filter shifts 1 step to the right in the matrix and performs the same calculation on the next pair of words, i.e. ("embedding", "are"). So after one full pass we will have 4 outputs, one per pair (reproduced in the sketch after this list):
- (Word, embedding) — 0.6177
- (embedding, are) — 0.3115
- (are, so) — 0.4002
- (so, cool) — 0.1670
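As a sanity check, here is a minimal sketch (assuming the same conv1 and rand_arr_permute from above) that slides the 2-wide window manually and reproduces all four values:
>> for i in range(4):  # 4 window positions: sentence length 5, kernel size 2
>>     window = rand_arr_permute[:, :, i:i+2]  # current pair of words (bi-gram)
>>     print(torch.sum(window * conv1.weight) + conv1.bias)
tensor([0.6177], grad_fn=<AddBackward0>)
tensor([0.3115], grad_fn=<AddBackward0>)
tensor([0.4002], grad_fn=<AddBackward0>)
tensor([0.1670], grad_fn=<AddBackward0>)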
>> print(conv1(rand_arr_permute))
tensor([[[0.6177, 0.3115, 0.4002, 0.1670]]], grad_fn=<SqueezeBackward1>)
So your output has shape torch.Size([1, 1, 4]), where shape[0] = 1 is the batch size, shape[1] = 1 is the number of output channels and shape[2] = 4 is the output length, i.e. the sentence length reduced by the convolution (5 - 2 + 1 = 4). We can now apply max pooling to it, which will reduce that dimension even further.
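For instance, a minimal sketch of "max over time" pooling (the usual pooling step in text CNNs) applied to the conv output from above; the pooled value would be 0.6177, the largest of the four outputs:
>> pooled = nn.functional.max_pool1d(conv1(rand_arr_permute), kernel_size=4)  # max over all 4 positions
>> print(pooled.shape)
torch.Size([1, 1, 1])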
Hope you found this article helpful in understanding how 1d convolution takes place in PyTorch, and in visualizing how the kernel strides over pairs of words in a sentence.
Thanks & Regards,
Sam
For feedback, please email sumanshusamarora@gmail.com