1. Building a CNN model for MNIST data
To get a better handle on this task, I first studied the classic CNN model for MNIST handwritten digit recognition:
import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = torch.nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = torch.nn.Conv2d(10, 20, kernel_size=5)  # convolution
        self.pooling = torch.nn.MaxPool2d(2)  # downsampling
        self.fc = nn.Linear(320, 10)

    def forward(self, x):
        batch_size = x.size(0)
        x = F.relu(self.pooling(self.conv1(x)))
        x = F.relu(self.pooling(self.conv2(x)))
        x = x.view(batch_size, -1)  # flatten to (batch_size, 320)
        x = self.fc(x)
        return x

model = Net()
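Here torch.nn.Conv2d is the convolution layer, torch.nn.MaxPool2d(2) is the pooling step, and nn.Linear(320,10) is the fully connected layer. Its input size, 320, is the flattened size of the feature map left after the second pooling step, and its output size is 10 because the digits are classified into 10 classes.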
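The 320 can be checked by tracing the feature-map size through the layers. The sketch below is plain Python rather than PyTorch. It assumes the standard 28x28 MNIST input, and the helper name out_size is made up for illustration.

```python
def out_size(n, kernel, stride=1, padding=0):
    """Output length along one axis of a convolution or pooling window."""
    return (n + 2 * padding - kernel) // stride + 1

size = 28                                   # assumed MNIST image side
size = out_size(size, kernel=5)             # conv1: 28 -> 24
size = out_size(size, kernel=2, stride=2)   # MaxPool2d(2): 24 -> 12
size = out_size(size, kernel=5)             # conv2: 12 -> 8
size = out_size(size, kernel=2, stride=2)   # MaxPool2d(2): 8 -> 4

flat_features = 20 * size * size            # conv2 has 20 output channels
print(size, flat_features)                  # 4 320
```

If the input size or a kernel size changes, the first argument of nn.Linear has to change with it.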
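forward applies the classic F.relu activation after each pooling step. ReLU computes max(0, x); it is a piecewise-linear activation and not a sigmoid function.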
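To make the difference concrete, here is a small plain-Python comparison of the two activations. The function names relu and sigmoid are written out for illustration and are not the PyTorch implementations.

```python
import math

def relu(x):
    # zero for negative inputs, identity for positive inputs
    return max(0.0, x)

def sigmoid(x):
    # squashes any input into the open interval (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

for x in (-2.0, 0.0, 2.0):
    print(x, relu(x), round(sigmoid(x), 4))
```

ReLU passes positive values through unchanged, while sigmoid never reaches 1 however large the input.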
2. An important CNN parameter, padding: enlarging the output size
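import torch
import torch.nn as nn

input = [3, 4, 5, 6, 7,
         4, 7, 2, 5, 8,
         1, 5, 8, 3, 6,
         2, 4, 5, 7, 9,
         3, 4, 5, 6, 8]
# shape: (batch_size, channels, height, width)
input = torch.Tensor(input).view(1, 1, 5, 5)
conv_layer = nn.Conv2d(1, 1, kernel_size=3, padding=1, bias=False)
# fixed 3x3 kernel so the result can be checked by hand
kernel = torch.Tensor([1, 2, 3, 4, 5, 6, 7, 8, 9]).view(1, 1, 3, 3)
conv_layer.weight.data = kernel.data
output = conv_layer(input)
print(output)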
The output is (tensor values only; the trailing grad_fn field is omitted):
tensor([[[[134., 164., 187., 218., 158.],
          [133., 208., 226., 253., 149.],
          [116., 192., 235., 291., 184.],
          [111., 191., 235., 296., 194.],
          [ 55.,  87., 112., 144.,  89.]]]]
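Without padding=1, the output would be a 3x3 matrix, because a 3x3 kernel fits at only 3 positions along each side of a 5x5 input.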
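For stride 1, the output side length is n + 2p - k + 1, where n is the input side, p the padding and k the kernel side: 5 + 2 - 3 + 1 = 5 with padding=1, and 5 - 3 + 1 = 3 without. The sketch below redoes the computation in plain Python so both cases can be checked by hand. The function conv2d_single is a hypothetical helper, not a PyTorch API.

```python
def conv2d_single(image, kernel, padding=0):
    """Cross-correlate one 2-D image with one 2-D kernel (stride 1)."""
    h, w = len(image), len(image[0])
    k = len(kernel)
    # surround the image with `padding` rings of zeros
    padded = [[0.0] * (w + 2 * padding) for _ in range(h + 2 * padding)]
    for r in range(h):
        for c in range(w):
            padded[r + padding][c + padding] = float(image[r][c])
    out_h = h + 2 * padding - k + 1
    out_w = w + 2 * padding - k + 1
    return [[sum(padded[r + i][c + j] * kernel[i][j]
                 for i in range(k) for j in range(k))
             for c in range(out_w)]
            for r in range(out_h)]

image = [[3, 4, 5, 6, 7],
         [4, 7, 2, 5, 8],
         [1, 5, 8, 3, 6],
         [2, 4, 5, 7, 9],
         [3, 4, 5, 6, 8]]
kernel = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

same = conv2d_single(image, kernel, padding=1)   # 5x5, same size as the input
valid = conv2d_single(image, kernel, padding=0)  # 3x3, shrinks by k - 1
print(len(same), len(same[0]), same[0][0], same[2][2])
print(len(valid), len(valid[0]), valid[0][0])
```

The unpadded result equals the inner 3x3 block of the padded one, since those positions never touch the zero border.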
3. Building a CNN model with this task's data
import torch.nn as nn

# Define the model
class SVHN_Model1(nn.Module):
    def __init__(self):
        super(SVHN_Model1, self).__init__()
        # CNN feature-extraction module
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=(3, 3), stride=(2, 2)),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=(3, 3), stride=(2, 2)),
            nn.ReLU(),
            nn.MaxPool2d(2)
        )
        # five parallel fully connected heads, one per character position
        self.fc1 = nn.Linear(32*3*7, 11)
        self.fc2 = nn.Linear(32*3*7, 11)
        self.fc3 = nn.Linear(32*3*7, 11)
        self.fc4 = nn.Linear(32*3*7, 11)
        self.fc5 = nn.Linear(32*3*7, 11)

    def forward(self, img):
        feat = self.cnn(img)
        feat = feat.view(feat.shape[0], -1)  # flatten to (batch_size, 32*3*7)
        c1 = self.fc1(feat)
        c2 = self.fc2(feat)
        c3 = self.fc3(feat)
        c4 = self.fc4(feat)
        c5 = self.fc5(feat)
        return c1, c2, c3, c4, c5

model = SVHN_Model1()
This CNN model contains two convolutional layers, each followed by ReLU and 2x2 max pooling.
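The 32*3*7 input size of the fully connected layers depends on the image size fed to the network, which the text does not state. The plain-Python sketch below assumes 64x128 (height x width) inputs, a size that happens to yield 3x7 feature maps. The helpers out_size and cnn_out are hypothetical names.

```python
def out_size(n, kernel, stride=1, padding=0):
    """Output length along one axis of a convolution or pooling window."""
    return (n + 2 * padding - kernel) // stride + 1

def cnn_out(n):
    """Trace one spatial axis through the self.cnn stack."""
    n = out_size(n, kernel=3, stride=2)  # Conv2d(3, 16, 3, stride=2)
    n = out_size(n, kernel=2, stride=2)  # MaxPool2d(2)
    n = out_size(n, kernel=3, stride=2)  # Conv2d(16, 32, 3, stride=2)
    n = out_size(n, kernel=2, stride=2)  # MaxPool2d(2)
    return n

height, width = 64, 128   # ASSUMED input size, not stated in the text
h, w = cnn_out(height), cnn_out(width)
print(h, w, 32 * h * w)   # 3 7 672
```

With a different input size, the nn.Linear input dimension would need to be recomputed the same way.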
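Unlike the MNIST model, each image in this task contains more than one character, so we use a fixed-length approach: assume every image holds 5 characters, and attach 5 fully connected layers in parallel, one per character position, each producing its own classification.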
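One way to turn the five outputs into a string is to take the highest-scoring class of each head. The sketch below is plain Python on made-up score lists. It assumes that class indices 0 to 9 are digits and that index 10 means "no character at this position", which the text does not state. BLANK, decode and one_hot are hypothetical names.

```python
BLANK = 10  # ASSUMPTION: the 11th class marks an empty position

def decode(head_scores):
    """head_scores: five lists of 11 scores, one list per position."""
    chars = []
    for scores in head_scores:
        best = max(range(len(scores)), key=lambda i: scores[i])  # argmax
        if best != BLANK:
            chars.append(str(best))
    return "".join(chars)

def one_hot(idx, n=11):
    """Made-up score vector that puts all weight on class idx."""
    return [1.0 if i == idx else 0.0 for i in range(n)]

# made-up scores for an image showing "42": two digits, three blanks
scores = [one_hot(4), one_hot(2), one_hot(BLANK), one_hot(BLANK), one_hot(BLANK)]
print(decode(scores))  # 42
```

In the real model each element of head_scores would come from one of c1 to c5 for a single image.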