Getting Started with Kaggle (Part 2): Dogs vs. Cats

0 Preface

Competition page:

https://www.kaggle.com/c/dogs-vs-cats-redux-kernels-edition

Reference solutions:

https://www.kaggle.com/jeffd23/catdognet-keras-convnet-starter

https://www.kaggle.com/sentdex/full-classification-example-with-convnet

1 Overview

  • Convolutional network model: a modified VGG16
  • Built with Keras
  • Test loss: 0.23535
  • Trained on a Titan V for 73 epochs; training took a little over 1 h
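The Dogs vs. Cats Redux competition scores submissions by binary log loss on the predicted probability that each test image is a dog. The test loss above is presumably on that scale, where lower is better and always predicting 0.5 gives ln 2 ≈ 0.693. The metric is LogLoss = −(1/n) Σᵢ [yᵢ log ŷᵢ + (1 − yᵢ) log(1 − ŷᵢ)], with yᵢ = 1 for a dog. Below is a minimal NumPy sketch of it. The function name `log_loss` and the clipping epsilon `eps` are illustrative choices, not Kaggle's scorer.

```python
import numpy as np

def log_loss(y_true, p_dog, eps=1e-15):
    """Binary log loss: y_true is 1 for dog, 0 for cat; p_dog is the predicted P(dog)."""
    y = np.asarray(y_true, dtype=np.float64)
    # Clip so log(0) never occurs; one fully confident mistake would otherwise give inf
    p = np.clip(np.asarray(p_dog, dtype=np.float64), eps, 1 - eps)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

if __name__ == "__main__":
    print(log_loss([1, 0, 1, 0], [0.5, 0.5, 0.5, 0.5]))  # ≈ 0.6931 (= ln 2)
    print(log_loss([1, 0], [0.9, 0.1]))                  # ≈ 0.1054 (= -ln 0.9)
```

Because each wrong but confident answer costs roughly −ln(eps), a few such mistakes can dominate the average. This is why probability calibration matters for this metric.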

Imports:

import cv2                 # working with, mainly resizing, images
import numpy as np         # dealing with arrays
import os                  # dealing with directories
from random import shuffle # mixing up our currently ordered data, which might otherwise lead our network astray in training
from tqdm import tqdm      # a nice pretty percentage bar for tasks. Thanks to viewer Daniel Bühler for this suggestion
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split

from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Input, Dropout, Flatten, Conv2D, MaxPooling2D, Dense, Activation
from keras.optimizers import RMSprop
from keras.callbacks import ModelCheckpoint, Callback, EarlyStopping, ReduceLROnPlateau
from keras.utils import np_utils
from keras import backend

%matplotlib inline

2 Data Preparation

2.1 Loading the Data

TRAIN_DIR = './data/train'
TEST_DIR = './data/test'
IMG_SIZE = 128

LR = 1e-3  # assumed learning rate: LR is used below but not defined in this post
MODEL_NAME = 'dogsvscats-{}-{}.model'.format(LR, '2conv-basic')
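A quick back-of-the-envelope estimate shows why grayscale images at IMG_SIZE = 128 keep the whole training set comfortably in memory. The estimate assumes the 25,000 labeled images of the competition's train folder and ignores Python/NumPy container overhead. The helper name `dataset_bytes` is illustrative.

```python
N_TRAIN = 25_000  # labeled images in the competition's train/ folder
IMG_SIZE = 128    # same value as above

def dataset_bytes(n, size, channels=1, bytes_per_value=1):
    """Raw pixel storage for n square images (no container overhead)."""
    return n * size * size * channels * bytes_per_value

gray_uint8 = dataset_bytes(N_TRAIN, IMG_SIZE)                        # 409,600,000 bytes (as read by cv2)
gray_float32 = dataset_bytes(N_TRAIN, IMG_SIZE, bytes_per_value=4)   # 1,638,400,000 bytes (float32 for training)
rgb_float32 = dataset_bytes(N_TRAIN, IMG_SIZE, channels=3, bytes_per_value=4)  # 4,915,200,000 bytes

for name, b in [("grayscale uint8", gray_uint8),
                ("grayscale float32", gray_float32),
                ("RGB float32", rgb_float32)]:
    print(f"{name:18s} {b / 1e9:.2f} GB")
```

Keeping one channel cuts the float32 footprint to a third of the RGB version. The model, however, then sees no color information.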
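  • Convert the labels to one-hot encoding. Using plain 0/1 labels caused problems: the model performed extremely poorly on the test set. Switching to one-hot labels fixed it, though I have not yet worked out why.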
# One-hot encoding: training files are named like 'cat.0.jpg' / 'dog.0.jpg'
def label_img(img):
    word_label = img.split('.')[-3]  # 'cat.0.jpg' -> 'cat'
    if word_label == 'cat':
        return [1, 0]
    elif word_label == 'dog':
        return [0, 1]
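The same label scheme can be built from integer class indices with an identity matrix. Keras's `np_utils.to_categorical` (imported above) performs the equivalent integer-to-one-hot conversion. The helper names `label_from_filename`, `to_one_hot` and `CLASSES` are illustrative. The order cat = index 0, dog = index 1 is chosen to match `label_img`.

```python
import numpy as np

CLASSES = ["cat", "dog"]  # index 0 = cat, index 1 = dog, matching label_img

def label_from_filename(fname):
    """'cat.123.jpg' -> 'cat' (training files are named <class>.<id>.jpg)."""
    return fname.split(".")[-3]

def to_one_hot(names):
    """Map class names to integer indices, then pick the matching rows of an identity matrix."""
    idx = np.array([CLASSES.index(n) for n in names])
    return np.eye(len(CLASSES), dtype=np.float32)[idx]

if __name__ == "__main__":
    files = ["cat.0.jpg", "dog.1.jpg", "dog.2.jpg"]
    print(to_one_hot([label_from_filename(f) for f in files]))
    # [[1. 0.]
    #  [0. 1.]
    #  [0. 1.]]
```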
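  • Load the training and test data:
    • convert each image to grayscale
    • resize it to IMG_SIZE × IMG_SIZE
    • save the processed images in .npy format so they load quickly next time
    • use the tqdm library to show the processing as a progress bar. Awesome!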
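# Process the training data
def create_train_data():
    training_data = []
    for img in tqdm(os.listdir(TRAIN_DIR)):
        label = label_img(img)
        path = os.path.join(TRAIN_DIR, img)
        img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)  # read as a single-channel image
        img = cv2.resize(img, (IMG_SIZE, IMG_SIZE))
        training_data.append([np.array(img), label])
    shuffle(training_data)
    # Entries are [image, label] pairs, so store them as an object array (NumPy >= 1.24 rejects
    # the ragged list); reload with np.load('train_data.npy', allow_pickle=True)
    np.save('train_data.npy', np.array(training_data, dtype=object))
    return training_data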
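Before training, the saved pairs have to be turned back into arrays Keras can consume. The sketch below is minimal and rests on four assumptions. First, the file was saved as an object array as above. Second, there is a single grayscale channel, hence the trailing 1 in the shape. Third, pixels are scaled to [0, 1]. Fourth, 10 % of the data is held out for validation. `load_xy`, `val_fraction` and `seed` are hypothetical names. The post imports scikit-learn's `train_test_split`, which could replace the manual split.

```python
import numpy as np

IMG_SIZE = 128  # same value as in the post

def load_xy(path="train_data.npy", val_fraction=0.1, seed=0):
    """Reload saved [image, one-hot label] pairs and split them into train/validation arrays."""
    data = np.load(path, allow_pickle=True)  # object array, so allow_pickle=True is required
    # Stack images into (N, H, W, 1) and scale pixel values from 0..255 to 0..1
    X = np.stack([row[0] for row in data]).reshape(-1, IMG_SIZE, IMG_SIZE, 1)
    X = X.astype(np.float32) / 255.0
    y = np.array([row[1] for row in data], dtype=np.float32)  # (N, 2) one-hot labels
    # Shuffle the indices; the first n_val of them become the validation set
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    n_val = int(len(X) * val_fraction)
    val_idx, tr_idx = order[:n_val], order[n_val:]
    return X[tr_idx], y[tr_idx], X[val_idx], y[val_idx]
```

The result is in the shapes a Keras model with input shape (IMG_SIZE, IMG_SIZE, 1) and a 2-unit output layer would expect. For example: `X_train, y_train, X_val, y_val = load_xy()`.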

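# Process the test data
def process_test_data():
    testing_data = []
    for img in tqdm(os.listdir(TEST_DIR)):
        path = os.path.join(TEST_DIR, img)
        img_num = img.split('.')[0]  # test files are named '<id>.jpg'
        img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        img = cv2.resize(img, (IMG_SIZE, IMG_SIZE))
        testing_data.append([np.array(img), img_num])
    # Filename assumed by analogy with train_data.npy
    np.save('test_data.npy', np.array(testing_data, dtype=object))
    return testing_data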
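After the model has predicted on the test data, the competition expects a CSV with the header `id,label`, where `label` is the probability that the image is a dog. The sketch below assumes the model outputs a 2-column softmax in the [cat, dog] order of `label_img`. It also assumes `ids` are the `img_num` strings collected by `process_test_data`. Clipping the probabilities is a common log-loss trick that limits the penalty for confident mistakes. The bounds 0.005/0.995 and the helper name `write_submission` are assumptions, not from the post.

```python
import csv
import numpy as np

def write_submission(ids, probs, path="submission.csv", clip=(0.005, 0.995)):
    """Write 'id,label' rows, where label = P(dog) taken from column 1 of probs (N, 2)."""
    p_dog = np.clip(np.asarray(probs)[:, 1], *clip)  # clipping bounds are an assumption
    # Sort by numeric id so the file follows the 1, 2, 3, ... order of the test set
    rows = sorted(zip((int(i) for i in ids), p_dog), key=lambda r: r[0])
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "label"])
        for img_id, p in rows:
            writer.writerow([img_id, f"{p:.6f}"])
```

Clipping trades a slightly worse score on confident correct predictions for a much smaller penalty on confident wrong ones.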