Ch4. Example 4-4 Classify surnames with Convolutional Neural Network

Article link: https://blog.csdn.net/w295286543/article/details/100837190

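There is no greater misfortune than not knowing contentment, and no greater fault than the desire for gain; thus the contentment of knowing what is enough is lasting contentment.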

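This is the second example in Chapter 4. It replaces the multilayer perceptron with a convolutional network to solve the surname classification problem.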

Code notes:

1. What is the structure of the data after vectorization?

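A: [batch_size, vocab_size, max_surname_length]. Each surname becomes a one-hot matrix with one row per vocabulary character and one column per character position.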
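
To make that shape concrete, here is a minimal sketch of how a single surname could be turned into a one-hot matrix of shape [vocab_size, max_surname_length]. The toy vocabulary, the `one_hot_matrix` helper, and the length of 6 are illustrative assumptions, not the vectorizer used in the book's code.

```python
def one_hot_matrix(surname, char_to_index, max_surname_length):
    """Return a vocab_size x max_surname_length matrix of 0/1 values.

    Rows are characters (the channel dimension for Conv1d) and columns are
    positions in the surname; columns past the end of the name stay all-zero.
    """
    vocab_size = len(char_to_index)
    matrix = [[0] * max_surname_length for _ in range(vocab_size)]
    for position, char in enumerate(surname[:max_surname_length]):
        matrix[char_to_index[char]][position] = 1
    return matrix


# Hypothetical toy vocabulary; a real one is built from the training data.
char_to_index = {"a": 0, "e": 1, "l": 2, "n": 3}
matrix = one_hot_matrix("lena", char_to_index, max_surname_length=6)

for row in matrix:
    print(row)
```

Stacking one such matrix per surname along a new leading axis gives the [batch_size, vocab_size, max_surname_length] batch described above.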

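2. Text is one-dimensional (sequential) data, so a 1D convolution is used. The convolution slides only along the sequence of tokens (words, or characters in this example); although text becomes 2D data once each token is represented as a vector, a 2D convolution that also slides across the embedding dimension would be meaningless. The drawback of 1D convolution is that filters with different kernel_size values have to be designed to obtain receptive fields of different widths.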
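
As a rough illustration of what "different widths" means, the sketch below lists the character windows that a filter of a given kernel_size would cover as it slides over a surname. The `windows` helper and the sample name are hypothetical; a real Conv1d computes a weighted sum over each window rather than returning the substring.

```python
def windows(sequence, kernel_size, stride=1):
    """List the slices a 1D filter of the given kernel_size would cover."""
    last_start = len(sequence) - kernel_size
    return [sequence[i:i + kernel_size] for i in range(0, last_start + 1, stride)]


surname = "nakamura"  # illustrative input
for kernel_size in (2, 3, 4):
    print(kernel_size, windows(surname, kernel_size))
```

A filter with kernel_size=2 only ever sees character bigrams, while kernel_size=4 sees four characters at once, which is why several kernel sizes (or stacked layers) are needed to cover patterns of different lengths.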

3. How are the number of layers, the kernel size, the dilation, and the stride of a convolutional network chosen?

A: Is it really just guesswork and experimentation?
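
Part of the answer is arithmetic rather than guesswork: in this example the convolution stack has to shrink the sequence dimension to exactly 1, because forward() squeezes that dimension before the linear layer. The sketch below applies the output-length formula from the nn.Conv1d documentation to the four layers of the SurnameClassifier. The input length of 17 is an assumption; it is consistent with the [128, 256, 1] shape noted in the code comments, but this post does not state it.

```python
def conv1d_out_length(length, kernel_size, stride=1, padding=0, dilation=1):
    """Output length of a 1D convolution (formula from the nn.Conv1d docs)."""
    return (length + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


# (kernel_size, stride) of the four Conv1d layers in SurnameClassifier.
layers = [(3, 1), (3, 2), (3, 2), (3, 1)]

length = 17  # assumed max_surname_length
trace = [length]
for kernel_size, stride in layers:
    length = conv1d_out_length(length, kernel_size, stride)
    trace.append(length)

print(trace)  # [17, 15, 7, 3, 1]
```

With a different maximum surname length the last value would generally not be 1, so the kernel sizes, strides, or number of layers would have to be adjusted, or a pooling step added, before squeezing.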

4. As the number of output channels increases (num_channels defaults to 256), the amount of computation increases with it.
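
A back-of-the-envelope sketch of how fast the cost grows: for the hidden layers both in_channels and out_channels equal num_channels, so parameters and multiply-accumulate operations scale roughly with its square. The helper names and the output length of 7 are illustrative assumptions.

```python
def conv1d_params(in_channels, out_channels, kernel_size):
    """Number of weights plus biases in one Conv1d layer."""
    return out_channels * in_channels * kernel_size + out_channels


def conv1d_macs(in_channels, out_channels, kernel_size, out_length):
    """Multiply-accumulate operations for one input sample (bias ignored)."""
    return out_channels * in_channels * kernel_size * out_length


for num_channels in (64, 128, 256, 512):
    params = conv1d_params(num_channels, num_channels, kernel_size=3)
    macs = conv1d_macs(num_channels, num_channels, kernel_size=3, out_length=7)
    print(num_channels, params, macs)
```

Doubling num_channels from 256 to 512 quadruples the multiply-accumulate count of each hidden layer.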

The network is built as follows:

import torch.nn as nn
import torch.nn.functional as F


class SurnameClassifier(nn.Module):
    """A convolutional network (four Conv1d layers plus a linear layer) for classifying surnames"""
    def __init__(self,initial_num_channels,num_classes,num_channels):
        """
        Args:
            initial_num_channels (int): number of input channels (the character vocabulary size)
            num_classes (int): size of the output prediction vector
            num_channels (int): constant channel size to use throughout network        
        """
        super(SurnameClassifier,self).__init__()
        self.convnet=nn.Sequential(
            nn.Conv1d(in_channels=initial_num_channels,out_channels=num_channels,kernel_size=3),
            nn.ELU(),
            nn.Conv1d(in_channels=num_channels,out_channels=num_channels,kernel_size=3,stride=2),
            nn.ELU(),
            nn.Conv1d(in_channels=num_channels,out_channels=num_channels,kernel_size=3,stride=2),
            nn.ELU(),
            nn.Conv1d(in_channels=num_channels,out_channels=num_channels,kernel_size=3),
            nn.ELU()
            )
        self.fc=nn.Linear(num_channels,num_classes)
    
    def forward(self,x_surname,apply_softmax=False):
        """The forward pass of the classifier
        
        Args:
            x_surname (torch.Tensor): an input data tensor
            x_surname.shape should be (batch, initial_num_channels,
                                        max_surname_length)
            apply_softmax (bool): a flag for the softmax activation
                should be false if used with the Cross Entropy losses
        Returns:
            the resulting tensor. tensor.shape should be (batch, num_classes)
        """
        features1=self.convnet(x_surname) # shape (batch, num_channels, 1), e.g. [128,256,1]
        features=features1.squeeze(dim=2) # shape (batch, num_channels), e.g. [128,256]
        prediction_vector=self.fc(features)
    
        if apply_softmax:
            prediction_vector=F.softmax(prediction_vector,dim=1)
        
        return prediction_vector
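
The docstring says apply_softmax should stay False when the output is fed to a cross-entropy loss. The reason is that nn.CrossEntropyLoss expects raw scores (logits) and applies log-softmax internally. The pure-Python sketch below mimics that computation for one sample to show what happens when softmax is applied twice; the helper functions and the scores are made up for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def cross_entropy_from_logits(logits, target):
    """Mimics nn.CrossEntropyLoss for one sample:
    log-softmax of the raw scores followed by negative log-likelihood."""
    return -math.log(softmax(logits)[target])


logits = [2.0, 0.5, -1.0]  # made-up raw scores for three classes
target = 0

correct = cross_entropy_from_logits(logits, target)
# Passing probabilities instead of logits applies softmax twice.
double = cross_entropy_from_logits(softmax(logits), target)

print(round(correct, 4), round(double, 4))
```

The second value differs from the first because the probabilities are squashed a second time, so the loss no longer reflects how confident the model actually was. Setting apply_softmax=True is therefore best reserved for inference, when probabilities are wanted.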