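Task: classify food images with a CNN
There are 11 food categories in total.
Training set: 9,866 labeled images (the label is contained in each image's title, i.e. its file name)
Validation set: 3,430 labeled images
Testing set: 3,347 unlabeled images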
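Since the label lives in the file name, it can be recovered with a small helper. The sketch below assumes names of the form `<label>_<index>.jpg` (for example `3_100.jpg`). That pattern and the helper name `parse_label` are assumptions made for illustration; they are not taken from the starter code.

```python
import os

NUM_CLASSES = 11  # the task has 11 food categories


def parse_label(file_path):
    """Return the class index encoded in the file name, or -1 if there is none.

    Assumes names like "<label>_<index>.jpg"; this pattern is an assumption.
    """
    name = os.path.basename(file_path)  # drop the directory part
    prefix = name.split("_")[0]         # text before the first underscore
    if prefix.isdigit() and int(prefix) < NUM_CLASSES:
        return int(prefix)
    return -1  # unlabeled (e.g. test images) or an unexpected name


print(parse_label(os.path.join("training", "3_100.jpg")))
print(parse_label(os.path.join("test", "0001.jpg")))
```

Using `os.path.basename` rather than splitting on a hard-coded separator keeps the helper independent of the path separator used by the operating system it runs on.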
Baseline
Simple: 0.50099 (just run the starter code)
Medium: 0.73207, Training Augmentation + Train Longer
Apply data augmentation and train for more epochs.
Strong: 0.81872, Training Augmentation + Model Design + Train Longer (Cross Validation + Ensemble)
Boss: 0.88446, Training Augmentation + Model Design + Test Time Augmentation + Train Longer (Cross Validation + Ensemble)
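The Boss baseline adds Test Time Augmentation on top of an ensemble. One common way to combine such predictions is to average the logits of several augmented views and blend them with the logits of the plain test image. The sketch below uses plain Python lists in place of tensors; the function names and the 0.5 blending weight are illustrative assumptions, not values given by the assignment.

```python
def average_logits(logit_lists):
    """Element-wise mean of several logit vectors of equal length."""
    n = len(logit_lists)
    return [sum(values) / n for values in zip(*logit_lists)]


def tta_predict(test_logits, aug_logits_list, test_weight=0.5):
    """Blend logits from the plain test transform with augmented views.

    test_weight is a hypothetical blending factor; tune it on the validation set.
    """
    aug_mean = average_logits(aug_logits_list)
    blended = [test_weight * t + (1 - test_weight) * a
               for t, a in zip(test_logits, aug_mean)]
    # Predicted class = index of the largest blended logit.
    return max(range(len(blended)), key=blended.__getitem__)


plain = [0.2, 1.0, 0.9]                      # logits from the un-augmented image
views = [[0.1, 0.4, 1.5], [0.3, 0.6, 1.3]]   # logits from two augmented views
print(tta_predict(plain, views))
```

The same `average_logits` helper can be applied across models instead of across views, which gives a simple ensemble.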
Simple
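The starter code hit a bug at run time, located in the Dataset module.
Training on the GPU raised an error whose location could not be pinpointed. After switching to CPU training, the error could be located.
Resolving a bug
Error when using the GPU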
RuntimeError Traceback (most recent call last)
<ipython-input-10-b55e170576c6> in <module>()
49
50 # Compute the gradients for parameters.
---> 51 loss.backward()
52
53 # Clip the gradient norms for stable training.
D:\Python_resource\ANACONDA\anaconda\envs\pytorch\lib\site-packages\torch\_tensor.py in backward(self, gradient, retain_graph, create_graph, inputs)
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
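The message above suggests `CUDA_LAUNCH_BLOCKING=1`, which makes CUDA kernels run synchronously so that the traceback points at the call that actually failed. A minimal sketch, assuming it is placed in the first notebook cell so that it runs before PyTorch makes any CUDA call:

```python
import os

# Set this before the first CUDA call. In a notebook, put it at the very top
# and restart the kernel if torch has already touched the GPU.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

print(os.environ["CUDA_LAUNCH_BLOCKING"])
```

Synchronous launches slow training down, so this setting is best limited to debugging runs.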
Error when using the CPU
Note: switching to CPU training requires restarting the project.
IndexError Traceback (most recent call last)
<ipython-input-9-8535fa60da04> in <module>()
43 # Calculate the cross-entropy loss.
44 # We don't need to apply softmax before computing cross-entropy as it is done automatically.
---> 45 loss = criterion(logits, labels.to(device))
46
47 # Gradients stored in the parameters in the previous step should be cleared out first.
IndexError: Target -1 is out of bounds.
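The error shows that a target of -1 went out of bounds while computing the loss on the training set. Inspecting the loaded training data through the training dataset object showed that every training sample was read in with the label -1, even though no training image in the dataset itself carries a label of -1.
# Every training sample is read in with label -1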
print(train_set.__getitem__(1))
'''
(tensor([[[0.0118, 0.0196, 0.0235, ..., 0.0078, 0.0000, 0.0039],
[0.1843, 0.3843, 0.4392, ..., 0.1647, 0.0118, 0.0039],
[0.5804, 0.6314, 0.6353, ..., 0.2863, 0.0157, 0.0000],
...,
[0.3804, 0.3686, 0.3647, ..., 0.7294, 0.2314, 0.0078],
[0.4078, 0.4039, 0.3922, ..., 0.6549, 0.1412, 0.0039],
[0.2039, 0.3294, 0.3765, ..., 0.2980, 0.0314, 0.0039]],
[[0.0118, 0.0235, 0.0314, ..., 0.0118, 0.0039, 0.0000],
[0.2118, 0.4157, 0.4784, ..., 0.1608, 0.0118, 0.0000],
[0.6275, 0.6824, 0.6902, ..., 0.2745, 0.0157, 0.0000],
...,
[0.0157, 0.0157, 0.0157, ..., 0.7216, 0.2235, 0.0039],
[0.0196, 0.0118, 0.0157, ..., 0.6471, 0.1333, 0.0039],
[0.0196, 0.0314, 0.0196, ..., 0.2863, 0.0235, 0.0039]],
[[0.0196, 0.0275, 0.0353, ..., 0.0275, 0.0039, 0.0157],
[0.1804, 0.3765, 0.4275, ..., 0.1451, 0.0078, 0.0157],
[0.5608, 0.6118, 0.6118, ..., 0.2902, 0.0314, 0.0118],
...,
[0.1176, 0.1137, 0.1137, ..., 0.5882, 0.1882, 0.0118],
[0.1216, 0.1216, 0.1216, ..., 0.5451, 0.1176, 0.0078],
[0.0549, 0.1020, 0.1176, ..., 0.2431, 0.0353, 0.0118]]]), -1)
'''
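This kind of inspection can be automated before training starts. The sketch below assumes the dataset yields `(image, label)` pairs, as the output above shows, and that valid labels are 0 to 10 for the 11 classes. `find_bad_labels` is a hypothetical helper, and a plain list of tuples stands in for `train_set` so the example runs on its own.

```python
def find_bad_labels(dataset, num_classes=11):
    """Return (index, label) for every sample whose label is outside [0, num_classes)."""
    bad = []
    for idx in range(len(dataset)):
        _, label = dataset[idx]  # each item is an (image, label) pair
        if not 0 <= label < num_classes:
            bad.append((idx, label))
    return bad


# Stand-in for train_set: the image slot is irrelevant for this check.
fake_set = [(None, 0), (None, -1), (None, 10), (None, 11)]
print(find_bad_labels(fake_set))
```

On the real dataset each lookup loads and transforms an image, so checking only the first few hundred indices is usually enough to expose a systematic labeling problem like the one here.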
Datasets
Next, look at the Dataset code that reads the data and locate the part that produces the label.
class FoodDataset(Dataset):
    def __init__(self, path, tfm=test_tfm, files=None):
        super().__init__()
        self.path = path
        # Collect every .jpg file under `path` in a deterministic order.
        self.files = sorted([os.path.join(path, x) for x in os.listdir(path) if x.endswith(".jpg")])
        if files is not None:
            self.files = files
        print(f"One {path} sample", self.files[0])
        self.transform = tfm

    def __len__(self):
        return len(self.files)

    def __getitem__(self, idx):
        fname = self.files