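This project comes from a Kaggle competition.
Project overview:
Who's a good dog? Who likes ear scratches? Well, it seems those fancy deep neural networks don't have all the answers. However, maybe they can answer the question we all ask when meeting a four-legged stranger: what kind of good pup is that?
In this playground competition, you are given a strictly canine subset of ImageNet in order to practice fine-grained image classification. How well can you tell your Norfolk Terriers from your Norwich Terriers?
Dataset overview:
You are given a training set and a test set of dog images. Each image has a filename that is its unique id. The dataset covers 120 dog breeds. The goal of the competition is to build a classifier that can determine a dog's breed from a single photo. The list of breeds is as follows: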
affenpinscher
afghan_hound
african_hunting_dog
airedale
american_staffordshire_terrier
appenzeller
australian_terrier
basenji
basset
beagle
bedlington_terrier
bernese_mountain_dog
black-and-tan_coonhound
blenheim_spaniel
bloodhound
bluetick
border_collie
border_terrier
borzoi
boston_bull
bouvier_des_flandres
boxer
brabancon_griffon
briard
brittany_spaniel
bull_mastiff
cairn
cardigan
chesapeake_bay_retriever
chihuahua
chow
clumber
cocker_spaniel
collie
curly-coated_retriever
dandie_dinmont
dhole
dingo
doberman
english_foxhound
english_setter
english_springer
entlebucher
eskimo_dog
flat-coated_retriever
french_bulldog
german_shepherd
german_short-haired_pointer
giant_schnauzer
golden_retriever
gordon_setter
great_dane
great_pyrenees
greater_swiss_mountain_dog
groenendael
ibizan_hound
irish_setter
irish_terrier
irish_water_spaniel
irish_wolfhound
italian_greyhound
japanese_spaniel
keeshond
kelpie
kerry_blue_terrier
komondor
kuvasz
labrador_retriever
lakeland_terrier
leonberg
lhasa
malamute
malinois
maltese_dog
mexican_hairless
miniature_pinscher
miniature_poodle
miniature_schnauzer
newfoundland
norfolk_terrier
norwegian_elkhound
norwich_terrier
old_english_sheepdog
otterhound
papillon
pekinese
pembroke
pomeranian
pug
redbone
rhodesian_ridgeback
rottweiler
saint_bernard
saluki
samoyed
schipperke
scotch_terrier
scottish_deerhound
sealyham_terrier
shetland_sheepdog
shih-tzu
siberian_husky
silky_terrier
soft-coated_wheaten_terrier
staffordshire_bullterrier
standard_poodle
standard_schnauzer
sussex_spaniel
tibetan_mastiff
tibetan_terrier
toy_poodle
toy_terrier
vizsla
walker_hound
weimaraner
welsh_springer_spaniel
west_highland_white_terrier
whippet
wire-haired_fox_terrier
yorkshire_terrier
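A classifier predicts integer class indices rather than strings, so the breed names above have to be mapped to indices at some point. Below is a minimal sketch of one way to do that, assuming alphabetical ordering. The helper name build_breed_index is hypothetical, and only a handful of breeds are used for illustration.

```python
def build_breed_index(breeds):
    """Map each distinct breed name to an integer index, in sorted order."""
    unique = sorted(set(breeds))
    return {name: idx for idx, name in enumerate(unique)}


# Illustrative subset; in practice all 120 breed names would be passed in.
sample = ["beagle", "affenpinscher", "norwich_terrier", "norfolk_terrier", "beagle"]
breed_to_idx = build_breed_index(sample)

# Reverse mapping, useful for turning a predicted index back into a name.
idx_to_breed = {idx: name for name, idx in breed_to_idx.items()}
print(breed_to_idx)
```

Sorting before enumerating keeps the mapping stable across runs, which matters when the same indices must be reused at prediction time.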
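File structure
As shown in the figure:
The train folder contains 10,222 training images.
The test folder contains 10,357 test images.
As shown in the figure:
The labels.csv file maps each image in the train folder to its label.
As shown in the figure: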
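Preparing the dataset
For many beginners this has always been a headache: there are so many ways to do it that it is hard to know which one to learn, and reading about all of them gets overwhelming. My suggestion is:
pandas (Kaggle),
lxml (object detection),
PIL (images),
opencv (images and video),
Learning these four is enough for most cases.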
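As a small illustration of the pandas item in the list above, here is a sketch that turns a labels file into a text file with one "image path, integer label" pair per line, separated by whitespace. The column names id and breed, the .jpg extension, the output location, and the helper name write_path_label_file are all assumptions made for this example, not details given in this post.

```python
import io
import os
import tempfile

import pandas as pd


def write_path_label_file(labels_csv, img_dir, out_path):
    """Write 'image_path label_index' lines from a labels CSV.

    Assumes columns `id` and `breed`, and images stored as <id>.jpg.
    """
    df = pd.read_csv(labels_csv)
    # Assign indices in alphabetical order of breed name.
    breeds = sorted(df["breed"].unique())
    breed_to_idx = {name: idx for idx, name in enumerate(breeds)}
    with open(out_path, "w") as f:
        for img_id, breed in zip(df["id"], df["breed"]):
            img_path = os.path.join(img_dir, str(img_id) + ".jpg")
            f.write("{0} {1}\n".format(img_path, breed_to_idx[breed]))
    return breed_to_idx


# Tiny in-memory stand-in for labels.csv (the ids are made up).
demo_csv = io.StringIO("id,breed\nimg_a,beagle\nimg_b,pug\nimg_c,beagle\n")
out_file = os.path.join(tempfile.gettempdir(), "demo_train.txt")
mapping = write_path_label_file(demo_csv, "train", out_file)
```

Because each line is later split on whitespace, this layout only works when the image paths contain no spaces.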
For this project, since I am using the PyTorch framework, I read the data in by subclassing torch.utils.data.Dataset (the __getitem__ and __len__ methods must be overridden). Here is the code:
from torch.utils.data import Dataset
from PIL import Image


class MyData(Dataset):
    def __init__(self, txt_path, transform=None):
        super(MyData, self).__init__()
        self.txt_path = txt_path
        self.transform = transform
        self.imgs = []  # list of (image path, label) pairs
        with open(txt_path, 'r') as f:
            print("Reading file {0}".format(txt_path))
            line = f.readline()
            count = 0
            # readline() returns an empty string at end of file
            while len(line) != 0:
                # each line holds "<image path> <label>" separated by whitespace
                line = line.strip().split()
                self.imgs.append((line[0], line[1]))
                line = f.readline()
                count += 1
            print("Read {0} lines, done".format(count))

    def __getitem__(self, item):
        img_path, label = self.imgs[item][0], self.imgs[item][1]
        # force three channels so grayscale or RGBA images are handled uniformly
        img = Image.open(img_path).convert("RGB")
        if self.transform is not None:
            img = self.transform(img)
        label = int(label