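"There is no greater calamity than not knowing contentment, and no greater fault than the desire to acquire; thus the contentment of knowing contentment is lasting contentment."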
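This is the second case study of Chapter 4: it replaces the perceptron with a convolutional network to solve the surname classification problem.
Code notes: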
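1. What is the structure of the data after vectorization?
A: [batch_size, vocab_size, max_surname_length]. Each surname becomes a one-hot matrix with one column per position, and vocab_size plays the role of the input channels (initial_num_channels) of the first Conv1d layer.
2. Text is sequential, one-dimensional data, so the network uses 1D convolution: the kernel slides only along the sequence (word-level) axis. After embedding, text is represented as a 2D matrix, but a 2D convolution that also slides along the embedding dimension is not meaningful. One consequence of 1D convolution is that filters with different kernel_size values have to be designed to capture receptive fields of different widths.
3. How are the number of layers, kernel size, dilation, and stride of a convolutional network chosen?
A: Is it really just guesswork and experimentation? In this network, at least, the kernel sizes and strides appear to be chosen so that the sequence dimension shrinks to length 1 by the last layer, which is what lets forward() squeeze it away. Beyond that constraint, tuning seems to be largely empirical.
4. As the number of output channels grows (num_channels defaults to 256), the computational cost grows with it. Because the middle layers use num_channels for both in_channels and out_channels, their weight count grows quadratically with num_channels.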
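The arithmetic behind notes 3 and 4 can be sketched in plain Python. This is a minimal sketch, not part of the original code: the helper names are hypothetical, the output-length formula is the one given in the PyTorch `Conv1d` documentation, and a maximum surname length of 17 is an assumption chosen because it makes the sequence dimension end at 1.

```python
def conv1d_out_length(length, kernel_size, stride=1, padding=0, dilation=1):
    """Output length of one Conv1d layer (formula from the PyTorch docs)."""
    return (length + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


def conv1d_num_params(in_channels, out_channels, kernel_size):
    """Weights plus biases of one Conv1d layer (groups=1, bias=True)."""
    return in_channels * out_channels * kernel_size + out_channels


# (kernel_size, stride) of the four Conv1d layers in the network below.
LAYERS = [(3, 1), (3, 2), (3, 2), (3, 1)]


def trace_lengths(max_surname_length):
    """Sequence length before the first layer and after each layer."""
    lengths = [max_surname_length]
    for kernel_size, stride in LAYERS:
        lengths.append(conv1d_out_length(lengths[-1], kernel_size, stride))
    return lengths


# Assumption: surnames are padded to length 17.
print(trace_lengths(17))               # [17, 15, 7, 3, 1]

# Doubling num_channels roughly quadruples the size of a middle layer.
print(conv1d_num_params(256, 256, 3))  # 196864
print(conv1d_num_params(512, 512, 3))  # 786944
```

Under these assumptions the four layers reduce the length 17 → 15 → 7 → 3 → 1, so the layer settings are at least partly constrained by the input length rather than purely guessed.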
The network is built as follows:
import torch.nn as nn
import torch.nn.functional as F


class SurnameClassifier(nn.Module):
    """A 1D convolutional network for classifying surnames"""

    def __init__(self, initial_num_channels, num_classes, num_channels):
        """
        Args:
            initial_num_channels (int): size of the incoming feature vector
            num_classes (int): size of the output prediction vector
            num_channels (int): constant channel size to use throughout network
        """
        super(SurnameClassifier, self).__init__()
        # Four Conv1d layers; the two stride-2 layers shorten the sequence
        # so that its length is 1 after the last layer.
        self.convnet = nn.Sequential(
            nn.Conv1d(in_channels=initial_num_channels,
                      out_channels=num_channels, kernel_size=3),
            nn.ELU(),
            nn.Conv1d(in_channels=num_channels, out_channels=num_channels,
                      kernel_size=3, stride=2),
            nn.ELU(),
            nn.Conv1d(in_channels=num_channels, out_channels=num_channels,
                      kernel_size=3, stride=2),
            nn.ELU(),
            nn.Conv1d(in_channels=num_channels, out_channels=num_channels,
                      kernel_size=3),
            nn.ELU()
        )
        self.fc = nn.Linear(num_channels, num_classes)

    def forward(self, x_surname, apply_softmax=False):
        """The forward pass of the classifier

        Args:
            x_surname (torch.Tensor): an input data tensor
                x_surname.shape should be (batch, initial_num_channels,
                max_surname_length)
            apply_softmax (bool): a flag for the softmax activation
                should be false if used with the Cross Entropy losses
        Returns:
            the resulting tensor. tensor.shape should be (batch, num_classes)
        """
        # [batch, num_channels, 1], e.g. [128, 256, 1]
        features1 = self.convnet(x_surname)
        # Drop the length-1 sequence dimension: [batch, num_channels]
        features = features1.squeeze(dim=2)
        prediction_vector = self.fc(features)
        if apply_softmax:
            prediction_vector = F.softmax(prediction_vector, dim=1)
        return prediction_vector
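To check the shapes noted in the comments, the same layer stack can be run on a random batch. This is a minimal sketch under stated assumptions: the vocabulary size (77), number of classes (18), maximum surname length (17), and batch size (4) are placeholder values picked for illustration, not values confirmed by the notes above, and the stack is rebuilt inline so that the snippet is self-contained.

```python
import torch
import torch.nn as nn

# Assumed sizes, for illustration only.
BATCH_SIZE, VOCAB_SIZE, MAX_LEN = 4, 77, 17
NUM_CHANNELS, NUM_CLASSES = 256, 18

# Same layer settings as SurnameClassifier.convnet.
convnet = nn.Sequential(
    nn.Conv1d(VOCAB_SIZE, NUM_CHANNELS, kernel_size=3), nn.ELU(),
    nn.Conv1d(NUM_CHANNELS, NUM_CHANNELS, kernel_size=3, stride=2), nn.ELU(),
    nn.Conv1d(NUM_CHANNELS, NUM_CHANNELS, kernel_size=3, stride=2), nn.ELU(),
    nn.Conv1d(NUM_CHANNELS, NUM_CHANNELS, kernel_size=3), nn.ELU(),
)
fc = nn.Linear(NUM_CHANNELS, NUM_CLASSES)

# Random stand-in for a batch of one-hot encoded surnames.
x = torch.rand(BATCH_SIZE, VOCAB_SIZE, MAX_LEN)

with torch.no_grad():
    features = convnet(x)                  # [batch, num_channels, 1]
    logits = fc(features.squeeze(dim=2))   # [batch, num_classes]
    probs = torch.softmax(logits, dim=1)   # each row sums to 1

print(features.shape)  # torch.Size([4, 256, 1])
print(logits.shape)    # torch.Size([4, 18])
```

Note that squeeze(dim=2) only removes the last dimension when its size is exactly 1. With a longer padded length (25, say) the convolution output would keep a sequence dimension larger than 1, the squeeze would leave it in place, and the linear layer would fail with a shape mismatch.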