TextCNN Architecture
Below is the model structure from the original paper:
- embedding
- convolution layer
- MaxPooling
- Flatten
- fully connected layer
Convolution Operation
- Convolutional neural networks involve only the discrete case of convolution.
- The convolution operation acts much like filtering, which is why a convolution kernel is also called a filter.
- The core idea of a CNN is to capture local features (n-grams). Its strength is that it can automatically combine and select these n-gram features, obtaining semantic information at different levels of abstraction.
- The figure below describes the TextCNN architecture for text classification (it explains the architecture in detail, including how the word-vector matrix is convolved):
- Input layer: an $n \times k$ matrix, where n is the number of words in the sentence and k is embedding_size. (Sentences are padded so that all inputs have the same length.)
- Convolution layer: in NLP the input layer is a word matrix formed by stacking word vectors, and the width of each convolution kernel equals the width of this matrix, i.e. the word-vector dimension, so the kernel moves only along the height direction. The input matrix is convolved with each filter and then passed through an activation function to obtain a feature map. Three filter heights are used here: (3, 4, 5) (see the shape sketch after this list).
- Pooling layer: max-pooling.
- Softmax layer: outputs the classification result.
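To make the convolution shapes concrete, here is a minimal sketch (not from the original post; n = 7, k = 5 and the channel count are illustrative assumptions):

```python
import torch
import torch.nn as nn

n, k = 7, 5                  # assumed: 7 words per sentence, embedding_size = 5
x = torch.randn(1, 1, n, k)  # [batch, channel, n, k] word matrix

for h in (3, 4, 5):          # the three filter heights used in the paper
    conv = nn.Conv2d(1, 2, kernel_size=(h, k))  # kernel width = k, so it slides only vertically
    fmap = conv(x)                               # feature map: [1, 2, n - h + 1, 1]
    pooled = fmap.max(dim=2).values              # max-pool over the height -> [1, 2, 1]
    print(h, fmap.shape, pooled.shape)
```

Concatenating the pooled values from all filter sizes yields a fixed-length vector that feeds the fully connected (softmax) layer.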
Simple Code Implementation

```python
import torch
import numpy as np
import torch.nn as nn
import torch.optim as optim
import torch.utils.data as Data
import torch.nn.functional as F
dtype = torch.FloatTensor
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# 3-word sentences (sequence_length = 3)
sentences = ["i love you", "he loves me", "she likes baseball", "i hate you", "sorry for that", "this is awful"]
labels = [1, 1, 1, 0, 0, 0] # 1 is good, 0 is not good.
embedding_size = 2
sequence_length = len(sentences[0].split()) # number of words per sentence (= 3)
num_classes = len(set(labels))
batch_size = 3
word_list = " ".join(sentences).split()
vocab = list(set(word_list))
word2idx = {w: i for i, w in enumerate(vocab)}
vocab_size = len(vocab)
def make_data(sentences, labels):
    inputs = []
    for sen in sentences:
        inputs.append([word2idx[n] for n in sen.split()])
    targets = []
    for out in labels:
        targets.append(out)
    return inputs, targets
input_batch, target_batch = make_data(sentences, labels)
input_batch, target_batch = torch.LongTensor(input_batch), torch.LongTensor(target_batch)
dataset = Data.TensorDataset(input_batch,target_batch)
loader = Data.DataLoader(dataset, batch_size, True)
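# (Added sanity check, not in the original post.) Each batch drawn from the
# loader has inputs of shape [batch_size, sequence_length] and targets of
# shape [batch_size]:
for x, y in loader:
    print(x.shape, y.shape)  # torch.Size([3, 3]) torch.Size([3])
    break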
class TextCNN(nn.Module):
    def __init__(self):
        super(TextCNN, self).__init__()
        self.W = nn.Embedding(vocab_size, embedding_size)
        output_channel = 3
        self.conv = nn.Sequential(
            # input_channel, output_channel, kernel (height, width) = (n-gram size, embedding_size)
            nn.Conv2d(1, output_channel, (2, embedding_size)),
            nn.ReLU(),
            # pool the remaining height (sequence_length - 1 = 2) down to 1
            nn.MaxPool2d((2, 1)))
        self.fc = nn.Linear(output_channel, num_classes)

    def forward(self, X):
        '''
        X: [batch_size, sequence_length]
        '''
        batch_size = X.shape[0]
        embedding_X = self.W(X)  # [batch_size, sequence_length, embedding_size]
        embedding_X = embedding_X.unsqueeze(1)  # add channel(=1): [batch_size, channel(=1), sequence_length, embedding_size]
        conved = self.conv(embedding_X)  # [batch_size, output_channel, 1, 1]
        flatten = conved.view(batch_size, -1)  # [batch_size, output_channel*1*1]
        output = self.fc(flatten)
        return output
```
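The code above only defines the data pipeline and the model (note that optim is imported but never used). A minimal training loop, sketched here with illustrative hyperparameters (the learning rate and epoch count are assumptions, not from the original post), could look like this:

```python
model = TextCNN().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=1e-3)  # lr is an illustrative choice

for epoch in range(1000):
    for batch_x, batch_y in loader:
        batch_x, batch_y = batch_x.to(device), batch_y.to(device)
        pred = model(batch_x)            # [batch_size, num_classes]
        loss = criterion(pred, batch_y)  # cross-entropy over the two classes
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    if (epoch + 1) % 200 == 0:
        print('Epoch:', epoch + 1, 'loss =', '{:.6f}'.format(loss.item()))
```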
Dimension Transformations
- Input X: [batch_size, sequence_length]
- embedding: effectively adds a dimension to each word: [batch_size, sequence_length, embedding_size]. We then apply unsqueeze(1) because Conv2d expects a channel dimension, giving [batch_size, channel(=1), sequence_length, embedding_size].
- conved: a 2-D convolution with input_channel = 1, output_channel = 3, and filter_size = (2, embedding_size), which amounts to a bi-gram. Its output shape is [batch_size, output_channel, sequence_length-1, 1].
The output height and width of a convolution are given by:

$$\text{height}_{out} = \left\lfloor \frac{\text{height}_{in} + 2 \times \text{padding} - \text{kernel\_height}}{\text{stride}} \right\rfloor + 1$$

$$\text{width}_{out} = \left\lfloor \frac{\text{width}_{in} + 2 \times \text{padding} - \text{kernel\_width}}{\text{stride}} \right\rfloor + 1$$
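As a quick check against the code above (a worked example; padding = 0 and stride = 1 are the Conv2d defaults used by the model):

$$\text{height}_{out} = \frac{3 + 2 \times 0 - 2}{1} + 1 = 2$$

so conved has shape [batch_size, 3, 2, 1] before pooling, and MaxPool2d((2, 1)) collapses the height of 2 down to 1, giving the [batch_size, 3, 1, 1] noted in the forward comments.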