Datawhale Zero-Basics Introduction to CV Competition, Task 3: Character Recognition Model

jxwnj_1210

Article link: https://blog.csdn.net/weixin_46167928/article/details/106362863

1. Building a CNN model for MNIST data
To get better prepared for this task, I first studied the classic CNN model for MNIST character recognition:

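import torch
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # Two convolution layers: 1 -> 10 -> 20 channels, 5x5 kernels
        self.conv1 = torch.nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = torch.nn.Conv2d(10, 20, kernel_size=5)
        self.pooling = torch.nn.MaxPool2d(2)  # downsampling
        self.fc = nn.Linear(320, 10)  # 320 flattened features -> 10 classes

    def forward(self, x):
        batch_size = x.size(0)
        x = F.relu(self.pooling(self.conv1(x)))
        x = F.relu(self.pooling(self.conv2(x)))
        x = x.view(batch_size, -1)  # flatten to (batch_size, 320)
        x = self.fc(x)
        return x


model = Net()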

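Here torch.nn.Conv2d is a convolution layer, torch.nn.MaxPool2d(2) is the pooling step, and nn.Linear(320,10) is the fully connected layer. Its 320 input features are the flattened output of the second pooling step, and because the images are classified into 10 classes, its output size must be 10.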
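To see where the 320 comes from, the layer sizes can be traced by hand. The sketch below is plain Python and assumes the standard 28x28 MNIST input; the helper names `conv2d_out` and `mnist_feature_count` are made up for this illustration and are not part of PyTorch.

```python
def conv2d_out(size, kernel, stride=1, padding=0):
    """Output length of one spatial dimension after a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


def mnist_feature_count(side=28):
    """Trace one spatial dimension through the Net defined in section 1."""
    s = conv2d_out(side, kernel=5)            # conv1: 28 -> 24
    s = conv2d_out(s, kernel=2, stride=2)     # MaxPool2d(2): 24 -> 12
    s = conv2d_out(s, kernel=5)               # conv2: 12 -> 8
    s = conv2d_out(s, kernel=2, stride=2)     # MaxPool2d(2): 8 -> 4
    return 20 * s * s                         # 20 channels of 4x4 feature maps


print(mnist_feature_count())  # 320
```

If the input size or a kernel size changes, the first argument of nn.Linear has to change with it.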

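The forward pass uses the classic F.relu activation, ReLU(x) = max(0, x). Note that ReLU is a piecewise-linear activation function, not a kind of sigmoid function.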
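For comparison, here is a small plain-Python sketch of the two activation functions. The names `relu` and `sigmoid` are defined here only for illustration; in the model itself the PyTorch call is F.relu.

```python
import math


def relu(x):
    """ReLU: passes positive values through unchanged and clips negatives to 0."""
    return max(0.0, x)


def sigmoid(x):
    """Logistic sigmoid: squashes any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


for x in (-2.0, 0.0, 3.0):
    print(x, relu(x), round(sigmoid(x), 4))
```

ReLU is unbounded above and exactly zero for negative inputs, while the sigmoid is smooth and always stays strictly between 0 and 1.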

2. An important CNN parameter, padding: enlarging the output size

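import torch
import torch.nn as nn

input = [3, 4, 5, 6, 7,
         4, 7, 2, 5, 8,
         1, 5, 8, 3, 6,
         2, 4, 5, 7, 9,
         3, 4, 5, 6, 8]
# shape: (batch_size, channels, height, width)
input = torch.Tensor(input).view(1, 1, 5, 5)

# 3x3 convolution with one pixel of zero padding on every side, no bias
conv_layer = nn.Conv2d(1, 1, kernel_size=3, padding=1, bias=False)
# Set the kernel weights by hand; shape: (out_channels, in_channels, height, width)
kernel = torch.Tensor([1, 2, 3, 4, 5, 6, 7, 8, 9]).view(1, 1, 3, 3)
conv_layer.weight.data = kernel.data

output = conv_layer(input)
print(output)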

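The output is:
tensor([[[[134., 164., 187., 218., 158.],
          [133., 208., 226., 253., 149.],
          [116., 192., 235., 291., 184.],
          [111., 191., 235., 296., 194.],
          [ 55.,  87., 112., 144.,  89.]]]]
With a 3x3 kernel and stride 1, the output side length is 5 + 2*padding - 3 + 1, so padding=1 keeps the 5x5 size. Without padding=1, the output would be a 3x3 matrix.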
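The effect of padding can also be checked without PyTorch. The following is a minimal plain-Python sketch of the same stride-1 operation on nested lists; the function `conv2d` is a hypothetical helper written for this example, not a library call.

```python
def conv2d(image, kernel, padding=0):
    """Stride-1 2D cross-correlation with zero padding, on nested lists."""
    k = len(kernel)
    rows, cols = len(image), len(image[0])
    # Build a zero-filled canvas and copy the image into its centre.
    padded = [[0] * (cols + 2 * padding) for _ in range(rows + 2 * padding)]
    for i in range(rows):
        for j in range(cols):
            padded[i + padding][j + padding] = image[i][j]
    out_h = len(padded) - k + 1
    out_w = len(padded[0]) - k + 1
    # Slide the kernel over every valid position and sum the products.
    return [
        [
            sum(padded[i + di][j + dj] * kernel[di][dj]
                for di in range(k) for dj in range(k))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


image = [[3, 4, 5, 6, 7],
         [4, 7, 2, 5, 8],
         [1, 5, 8, 3, 6],
         [2, 4, 5, 7, 9],
         [3, 4, 5, 6, 8]]
kernel = [[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]

same = conv2d(image, kernel, padding=1)   # 5x5 output
valid = conv2d(image, kernel, padding=0)  # 3x3 output
print(len(same), len(same[0]))    # 5 5
print(len(valid), len(valid[0]))  # 3 3
print(same[0])                    # [134, 164, 187, 218, 158]
```

The unpadded 3x3 result is exactly the interior of the padded 5x5 result; padding only adds the border rows and columns.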

3. Building a CNN model with this task's data

import torch.nn as nn


# Define the model
class SVHN_Model1(nn.Module):
    def __init__(self):
        super(SVHN_Model1, self).__init__()
        # CNN feature-extraction module
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=(3, 3), stride=(2, 2)),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=(3, 3), stride=(2, 2)),
            nn.ReLU(),
            nn.MaxPool2d(2)
        )
        # Five parallel fully connected heads, one per character position
        self.fc1 = nn.Linear(32*3*7, 11)
        self.fc2 = nn.Linear(32*3*7, 11)
        self.fc3 = nn.Linear(32*3*7, 11)
        self.fc4 = nn.Linear(32*3*7, 11)
        self.fc5 = nn.Linear(32*3*7, 11)

    def forward(self, img):
        feat = self.cnn(img)
        # Flatten the feature maps to (batch_size, 32*3*7)
        feat = feat.view(feat.shape[0], -1)
        c1 = self.fc1(feat)
        c2 = self.fc2(feat)
        c3 = self.fc3(feat)
        c4 = self.fc4(feat)
        c5 = self.fc5(feat)
        return c1, c2, c3, c4, c5


model = SVHN_Model1()

This CNN model contains two convolutional layers, each followed by a ReLU activation and 2x2 max pooling.
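The 32*3*7 input size of the fully connected layers fixes the image size the model accepts. The post does not state that size; the plain-Python sketch below assumes images resized to 64x128 (height x width), which is one input size consistent with 32*3*7. The helpers `out_size` and `svhn_feature_shape` are hypothetical names used only here.

```python
def out_size(size, kernel, stride):
    """Output length of one spatial dimension for an unpadded conv or pooling layer."""
    return (size - kernel) // stride + 1


def svhn_feature_shape(height=64, width=128):
    """Trace an assumed 64x128 input through the CNN block of SVHN_Model1."""
    dims = []
    for size in (height, width):
        size = out_size(size, kernel=3, stride=2)  # Conv2d(3, 16, 3x3, stride 2)
        size = out_size(size, kernel=2, stride=2)  # MaxPool2d(2)
        size = out_size(size, kernel=3, stride=2)  # Conv2d(16, 32, 3x3, stride 2)
        size = out_size(size, kernel=2, stride=2)  # MaxPool2d(2)
        dims.append(size)
    return (32, dims[0], dims[1])  # (channels, height, width)


print(svhn_feature_shape())  # (32, 3, 7)
```

With a different input size the flattened feature count changes, and the nn.Linear layers would need to be adjusted to match.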
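Unlike the MNIST model, each image in this task contains more than one character, so here we use a fixed-length approach: we assume every image contains 5 characters, which means 5 parallel fully connected layers are needed for classification.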
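Each head outputs 11 values rather than 10. The post does not explain why; a common convention for fixed-length recognition, assumed in the sketch below, is to use the digits 0-9 plus one extra "no character" class (index 10 here) to pad shorter numbers up to 5 positions. The names `MAX_LEN`, `BLANK`, `encode_label`, and `decode_prediction` are illustrative, not taken from the competition code.

```python
MAX_LEN = 5   # fixed number of character positions, one per head
BLANK = 10    # assumed placeholder class meaning "no character here"


def encode_label(digits):
    """Pad (or truncate) a list of digits to exactly MAX_LEN class indices."""
    digits = list(digits)[:MAX_LEN]
    return digits + [BLANK] * (MAX_LEN - len(digits))


def decode_prediction(classes):
    """Turn the five predicted class indices back into a digit string."""
    return "".join(str(c) for c in classes if c != BLANK)


print(encode_label([1, 9]))                   # [1, 9, 10, 10, 10]
print(decode_prediction([1, 9, 10, 10, 10]))  # 19
```

Under this convention, the i-th entry of the encoded label is the training target for the i-th output head (c1 to c5).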
