Datawhale Beginner CV Competition - Task 1: Understanding the Problem

weixin_43248815

Article link: https://blog.csdn.net/weixin_43248815/article/details/106246394

I. Competition Background

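The competition data consists of street-view character images. It is drawn from the SVHN (Street View House Numbers) collection and has been anonymized and resampled.
Each submitted result is compared with the true code of the image, and the evaluation metric is whole-code recognition accuracy: a prediction counts as correct only if the entire code matches. The formula is:
Score = number of images whose code is recognized correctly / number of images in the test set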
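The metric above is an exact-match accuracy, so a code with one wrong digit scores the same as a completely wrong one. A minimal sketch, assuming predictions and ground-truth codes are available as two lists of strings in the same order (the function name `exact_match_score` is hypothetical and not part of any competition toolkit):

```python
def exact_match_score(predictions, targets):
    """Fraction of images whose full predicted code equals the true code."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        raise ValueError("targets must not be empty")
    # A prediction counts only when the whole string matches
    correct = sum(p == t for p, t in zip(predictions, targets))
    return correct / len(targets)


# Hypothetical example: "231" vs "237" is wrong even though two digits match
print(exact_match_score(["23", "231", "5"], ["23", "237", "5"]))
```

Because partial matches earn nothing, errors in any single character position directly lower the score.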
Data example: (image)

II. Approach

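A simple starting point: fixed-length character recognition
The task can be framed as a fixed-length character recognition problem. Most images in the dataset contain 2 to 4 characters, and the largest number of characters in one image is 6.
Every image can therefore be treated as a 6-character recognition problem, with shorter codes padded: 23 becomes 23XXXX and 231 becomes 231XXX.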
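The padding scheme above can be sketched as follows. The helper names and the choice of class index 10 for the padding symbol are assumptions made for illustration; the original text only specifies the `X` padding and the length of 6.

```python
MAX_LEN = 6     # longest code in the dataset, per the text above
PAD_CHAR = "X"  # padding symbol used in the text
PAD_INDEX = 10  # assumed class index for padding (digits use 0-9)


def pad_code(code, max_len=MAX_LEN, pad_char=PAD_CHAR):
    """Right-pad a digit string to a fixed length, e.g. '23' -> '23XXXX'."""
    if len(code) > max_len:
        raise ValueError(f"code {code!r} is longer than {max_len} characters")
    return code.ljust(max_len, pad_char)


def encode(code):
    """Turn a digit string into MAX_LEN class indices (padding -> PAD_INDEX)."""
    return [PAD_INDEX if c == PAD_CHAR else int(c) for c in pad_code(code)]


def decode(indices):
    """Drop padding indices and join the remaining digits into a string."""
    return "".join(str(i) for i in indices if i != PAD_INDEX)


print(pad_code("23"), pad_code("231"))
print(encode("23"))
```

With this encoding, a model can predict six positions with eleven classes each, and `decode` recovers the variable-length code for submission.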
2. Data loading. The annotations are stored in a JSON file, and the labels can be read as follows:

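import json

import cv2
import matplotlib.pyplot as plt
import numpy as np

# Load the annotation file (keyed by image file name)
with open('../input/train.json') as f:
    train_json = json.load(f)

# Convert one image's annotation into a 5 x N integer array,
# one column per character; rows: top, height, left, width, label
def parse_json(d):
    arr = np.array([
        d['top'], d['height'], d['left'], d['width'], d['label']
    ])
    arr = arr.astype(int)
    return arr

# cv2.imread returns BGR; convert to RGB so matplotlib shows the true colors
img = cv2.cvtColor(cv2.imread('../input/train/000000.png'), cv2.COLOR_BGR2RGB)
arr = parse_json(train_json['000000.png'])

# First panel: the full image
plt.figure(figsize=(10, 10))
plt.subplot(1, arr.shape[1]+1, 1)
plt.imshow(img)
plt.xticks([]); plt.yticks([])

# Remaining panels: each character cropped by its bounding box
for idx in range(arr.shape[1]):
    plt.subplot(1, arr.shape[1]+1, idx+2)
    plt.imshow(img[arr[0, idx]:arr[0, idx]+arr[1, idx], arr[2, idx]:arr[2, idx]+arr[3, idx]])
    plt.title(arr[4, idx])
    plt.xticks([]); plt.yticks([])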
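Beyond plotting, the same annotations can be turned into training targets and used to check how many characters each image holds. The sketch below uses a small in-memory dictionary with made-up values in the layout that `parse_json` reads (`top`, `height`, `left`, `width`, `label`, one list entry per character); the helper names are hypothetical.

```python
from collections import Counter

# Hypothetical annotations mimicking the structure of train.json
sample_json = {
    "000000.png": {"top": [77, 81], "height": [219, 219],
                   "left": [246, 323], "width": [81, 96], "label": [1, 9]},
    "000001.png": {"top": [29, 25], "height": [32, 32],
                   "left": [77, 98], "width": [23, 26], "label": [2, 3]},
    "000002.png": {"top": [5, 5, 6], "height": [15, 15, 15],
                   "left": [17, 25, 33], "width": [8, 9, 8], "label": [2, 3, 1]},
}


def label_string(entry):
    """Join the per-character labels of one image into a code string."""
    return "".join(str(digit) for digit in entry["label"])


def length_histogram(annotations):
    """Count how many images contain 1, 2, 3, ... characters."""
    return Counter(len(entry["label"]) for entry in annotations.values())


for name, entry in sample_json.items():
    print(name, label_string(entry))
print(length_histogram(sample_json))
```

Running `length_histogram` on the real `train_json` is one way to confirm the character-count distribution mentioned earlier before fixing the padded length.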

III. Learning Colab and PyTorch from Scratch

# Check the torch version
import torch
print(torch.__version__)  # note the double underscores

# In Colab you can install a torch version yourself

!pip3 install torch

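# Build a 1-D tensor from a Python list
v = torch.tensor([1,2,3])
print(v)

# view() reinterprets the same 18 elements as 3 blocks of shape 2 x 3
x = torch.arange(18).view(3,2,3)
x

# The same 18 elements as a 3 x 6 matrix
x = torch.arange(18).view(3,6)
x

# Row 0, column 2 -> tensor(2)
x[0][2]

x = torch.arange(18).view(3,2,3)
x

# Block 0, row 1 -> tensor([3, 4, 5])
x[0][1]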

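# Autograd with one input: backward() stores dy/dx at x = 2 in x.grad
x = torch.tensor(2.0,requires_grad = True)
y = 9*x**4 + 2*x**3 + 3*x**2 + 6*x + 1
y.backward()
x.grad  # tensor(330.)

# Autograd with two inputs: each leaf tensor receives its own partial derivative
x = torch.tensor(1.0,requires_grad = True)
z = torch.tensor(2.0, requires_grad = True)
y = x**2 + z**3
y.backward()
x.grad  # tensor(2.)

z.grad  # tensor(12.)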
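
The values that autograd returns in the two examples above can be checked by hand. Differentiating the first polynomial and evaluating at x = 2, then taking the partial derivatives of the second expression at (x, z) = (1, 2):

```latex
\begin{aligned}
y &= 9x^4 + 2x^3 + 3x^2 + 6x + 1 \\
\frac{dy}{dx} &= 36x^3 + 6x^2 + 6x + 6 \\
\left.\frac{dy}{dx}\right|_{x=2} &= 288 + 24 + 12 + 6 = 330 \\[6pt]
y &= x^2 + z^3 \\
\left.\frac{\partial y}{\partial x}\right|_{x=1} &= 2x = 2 \\
\left.\frac{\partial y}{\partial z}\right|_{z=2} &= 3z^2 = 12
\end{aligned}
```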