Deep Learning for Computer Vision with Python

ywq9696

Category: 1234. Tags: deep learning, computer vision, Python

Original link: https://blog.csdn.net/bashendixie5/article/details/121741980

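This chapter walks through the Kaggle Dogs vs. Cats image classification competition using Python and deep learning. It covers image preprocessing (mean subtraction, patch extraction, and cropping), the implementation of an HDF5 dataset generator, and feature extraction with ResNet50 followed by a logistic regression classifier to raise classification accuracy. With these techniques, the final accuracy on the test set reaches 98.69%.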

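Volume 2, Chapter 9: Kaggle Competition: Dogs vs. Cats
In this chapter, we extend our earlier work and learn how to define an image generator for HDF5 datasets that is suitable for training convolutional neural networks with Keras. The generator opens the HDF5 dataset, yields batches of images and their training labels to the network, and keeps doing so until our model reaches sufficiently low loss and high accuracy.

To get there, we first try three new image preprocessors designed to improve classification accuracy: mean subtraction, patch extraction, and cropping. Once these preprocessors are defined, we move on to the HDF5 dataset generator itself.

We then implement the AlexNet architecture and train it on the Kaggle Dogs vs. Cats challenge. Given the trained model, we evaluate its performance on the test set and then use an oversampling method to push classification accuracy further.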

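1. Additional image preprocessors
(1) A mean subtraction preprocessor, which subtracts the dataset-wide mean red, green, and blue pixel intensities from an input image (a form of data normalization).

(2) A patch preprocessor, which randomly extracts M×N pixel regions from an image during training.

(3) An oversampling preprocessor, used at test time, which samples five regions of an input image (the four corners plus the center) along with their horizontal flips, for 10 crops in total. With oversampling we pass the 10 crops through our CNN and average the 10 predictions, which improves classification accuracy.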

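(1) Mean preprocessing
In the previous chapter, we learned how to convert an image dataset to HDF5 format. Part of that conversion involved computing the mean red, green, and blue pixel intensities across all images in the dataset. Now that we have these means, we subtract them pixel by pixel from our input images as a form of data normalization. Given an input image I with channels R, G, and B, mean subtraction is:

· R = R - μR

· G = G - μG

· B = B - μB

where μR, μG, and μB were computed when the image dataset was converted to HDF5 format. The figure below visualizes subtracting the mean RGB values from an input image; note that the subtraction is done per pixel.

Figure: an example of applying mean subtraction to an input image (left) by subtracting R=124.96, G=115.97, B=106.13 from every pixel, producing the output image (right). Mean subtraction reduces the effect of lighting variation during classification.
The code for meanpreprocessor.py is as follows: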

# import the necessary packages
import cv2

class MeanPreprocessor:
    def __init__(self, rMean, gMean, bMean):
        # store the Red, Green, and Blue channel averages across a
        # training set
        self.rMean = rMean
        self.gMean = gMean
        self.bMean = bMean

    def preprocess(self, image):
        # split the image into its respective Red, Green, and Blue
        # channels (OpenCV stores images in BGR order)
        (B, G, R) = cv2.split(image.astype("float32"))

        # subtract the means for each channel
        R -= self.rMean
        G -= self.gMean
        B -= self.bMean

        # merge the channels back together and return the image
        return cv2.merge([B, G, R])
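
The per-channel subtraction can also be checked with NumPy alone. The following is a minimal sketch, not the class above: the function name `subtract_mean_bgr` and the tiny test image are made up for illustration, and the means are the ones quoted in the figure caption.

```python
import numpy as np

# Per-channel means quoted in the figure caption (given in R, G, B order).
R_MEAN, G_MEAN, B_MEAN = 124.96, 115.97, 106.13

def subtract_mean_bgr(image):
    """Subtract the channel means from an H x W x 3 image stored in BGR order."""
    # Build the mean vector in BGR order so it lines up with the last axis.
    means = np.array([B_MEAN, G_MEAN, R_MEAN], dtype="float32")
    # Broadcasting subtracts the matching mean from every pixel of each channel.
    return image.astype("float32") - means

# A tiny 2 x 2 image whose pixels are all (B=110, G=120, R=130).
image = np.full((2, 2, 3), (110, 120, 130), dtype="uint8")
result = subtract_mean_bgr(image)
print(result[0, 0])
```

Casting to float32 before subtracting matters: subtracting from a uint8 array directly would wrap around instead of producing negative values.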
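(2) Patch preprocessing
PatchPreprocessor randomly samples M×N regions of an image during training. We apply patch preprocessing when the spatial dimensions of the input images are larger than the CNN expects. It is a common technique for reducing overfitting, and therefore a form of regularization. Instead of using the whole image during training, we crop a random portion of it and pass that to the network (see the figure below for an example of crop preprocessing).

Left: our original 256×256 input image. Right: a random 227×227 region cropped from the image.
Applying this cropping means the network never sees exactly the same image twice (except by chance), much like data augmentation. As you know from the previous chapter, we built an HDF5 dataset of the Kaggle Dogs vs. Cats images in which every image is 256×256 pixels. However, the AlexNet architecture we implement later in this chapter only accepts 227×227 pixel images.

So should we apply SimplePreprocessor to resize each 256×256 image to 227×227? No. That would be wasteful, because this is an ideal opportunity to perform data augmentation during training by randomly cropping 227×227 regions from the 256×256 images. In fact, this is how Krizhevsky et al. trained AlexNet on the ImageNet dataset.

The code for patchpreprocessor.py is as follows: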

# import the necessary packages
from sklearn.feature_extraction.image import extract_patches_2d

class PatchPreprocessor:
    def __init__(self, width, height):
        # store the target width and height of the image
        self.width = width
        self.height = height

    def preprocess(self, image):
        # extract a random crop from the image with the target width
        # and height
        return extract_patches_2d(image, (self.height, self.width), max_patches=1)[0]
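The extract_patches_2d function from scikit-learn makes it easy to extract a random patch of size self.width × self.height. Given an input image, the function randomly extracts patches from it; here we pass max_patches=1 because we only need a single random patch from the input image.

The PatchPreprocessor class doesn't look like much, but it is a very effective way to avoid overfitting by applying another layer of data augmentation. We use PatchPreprocessor when training AlexNet. The next preprocessor, CropPreprocessor, is used when evaluating the trained network.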
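
To make the random cropping concrete, here is a minimal NumPy-only sketch of the same idea. It is a stand-in for illustration, not the scikit-learn implementation; the helper name `random_crop` and the seeded generator are assumptions.

```python
import numpy as np

def random_crop(image, width, height, rng):
    """Return one random height x width patch from an H x W x C image."""
    (h, w) = image.shape[:2]
    # Pick the top-left corner so the patch stays fully inside the image.
    top = rng.integers(0, h - height + 1)
    left = rng.integers(0, w - width + 1)
    return image[top:top + height, left:left + width]

rng = np.random.default_rng(42)
image = np.zeros((256, 256, 3), dtype="uint8")
patch = random_crop(image, 227, 227, rng)
print(patch.shape)  # (227, 227, 3)
```

For a 256×256 input and a 227×227 patch there are 30 × 30 possible top-left corners, which is why the network rarely sees the same crop twice.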

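(3) Crop preprocessing
Next, we define a CropPreprocessor that computes the 10 crops used for oversampling. During the evaluation phase of our CNN, we crop the four corners and the center of the input image and then take their horizontal flips, giving 10 samples per input image (figure below).

Left: the original 256×256 input image. Right: applying the 10-crop preprocessor to extract ten 227×227 crops of the image: the center, the four corners, and their horizontal mirrors.
These ten samples are passed through the CNN and the resulting probabilities are averaged. Applying this oversampling method tends to raise classification accuracy by 1-2% (and in some cases more).

The code for croppreprocessor.py is as follows: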

# import the necessary packages
import numpy as np
import cv2

class CropPreprocessor:
    def __init__(self, width, height, horiz=True, inter=cv2.INTER_AREA):
        # store the target image width, height, whether or not
        # horizontal flips should be included, along with the
        # interpolation method used when resizing
        self.width = width
        self.height = height
        self.horiz = horiz
        self.inter = inter

    def preprocess(self, image):
        # initialize the list of crops
        crops = []

        # grab the width and height of the image then use these
        # dimensions to define the four corner crops of the image
        (h, w) = image.shape[:2]
        coords = [
            [0, 0, self.width, self.height],
            [w - self.width, 0, w, self.height],
            [w - self.width, h - self.height, w, h],
            [0, h - self.height, self.width, h]]

        # compute the center crop of the image as well
        dW = int(0.5 * (w - self.width))
        dH = int(0.5 * (h - self.height))
        coords.append([dW, dH, w - dW, h - dH])

        # loop over the coordinates, extract each of the crops,
        # and resize each of them to a fixed size
        for (startX, startY, endX, endY) in coords:
            crop = image[startY:endY, startX:endX]
            crop = cv2.resize(crop, (self.width, self.height), interpolation=self.inter)
            crops.append(crop)

        # check to see if the horizontal flips should be taken
        if self.horiz:
            # compute the horizontal mirror flips for each crop
            mirrors = [cv2.flip(c, 1) for c in crops]
            crops.extend(mirrors)

        # return the set of crops
        return np.array(crops)
Using MeanPreprocessor for normalization and CropPreprocessor for oversampling, we can obtain higher classification accuracy than we otherwise would.
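
The oversampling step, ten crops followed by averaging the predictions, can be sketched without OpenCV or a trained network. In the sketch below, `ten_crop` is a simplified NumPy version that slices every crop at exactly the target size (so no resize is needed), and `fake_predict` is a hypothetical stand-in for `model.predict` that only exists to make the example runnable.

```python
import numpy as np

def ten_crop(image, width, height):
    """Return the four corner crops, the center crop, and their mirrors."""
    (h, w) = image.shape[:2]
    dW, dH = (w - width) // 2, (h - height) // 2
    # (startX, startY) of the four corners followed by the center.
    starts = [(0, 0), (w - width, 0), (w - width, h - height),
              (0, h - height), (dW, dH)]
    crops = [image[y:y + height, x:x + width] for (x, y) in starts]
    # Horizontal mirror: reverse the column axis of each crop.
    crops += [c[:, ::-1] for c in crops]
    return np.array(crops)

def fake_predict(batch):
    """Hypothetical stand-in for model.predict: two-class probabilities."""
    p = batch.reshape(len(batch), -1).mean(axis=1) / 255.0
    return np.stack([p, 1.0 - p], axis=1)

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(256, 256, 3)).astype("uint8")
crops = ten_crop(image, 227, 227)
# Average the 10 per-crop probability vectors into one prediction.
probs = fake_predict(crops).mean(axis=0)
print(crops.shape, probs)
```

Averaging probability vectors that each sum to 1 yields a vector that also sums to 1, so the result can be treated as an ordinary prediction.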

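2. HDF5 dataset generator
Before we implement the AlexNet architecture and train it on the Kaggle Dogs vs. Cats dataset, we first need a class that yields batches of images and labels from the HDF5 dataset generated in the previous chapter.

Previously, all of our image datasets fit in memory, so we could rely on the Keras generator utilities to produce batches of images and their labels. Now that our dataset is too large to fit in memory, we need to implement this generator ourselves.

The code for hdf5datasetgenerator.py is as follows: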

# import the necessary packages
from tensorflow.keras.utils import to_categorical
import numpy as np
import h5py

class HDF5DatasetGenerator:
    def __init__(self, dbPath, batchSize, preprocessors=None, aug=None, binarize=True, classes=2):
        # store the batch size, preprocessors, and data augmentor,
        # whether or not the labels should be binarized, along with
        # the total number of classes
        self.batchSize = batchSize
        self.preprocessors = preprocessors
        self.aug = aug
        self.binarize = binarize
        self.classes = classes

        # open the HDF5 database for reading and determine the total
        # number of entries in the database
        self.db = h5py.File(dbPath, "r")
        self.numImages = self.db["labels"].shape[0]

    def generator(self, passes=np.inf):
        # initialize the epoch count
        epochs = 0

        # keep looping infinitely -- the model will stop once we have
        # reached the desired number of epochs
        while epochs < passes:
            # loop over the HDF5 dataset
            for i in np.arange(0, self.numImages, self.batchSize):
                # extract the images and labels from the HDF5 dataset
                images = self.db["images"][i: i + self.batchSize]
                labels = self.db["labels"][i: i + self.batchSize]

                # check to see if the labels should be binarized
                if self.binarize:
                    labels = to_categorical(labels, self.classes)

                # check to see if our preprocessors are not None
                if self.preprocessors is not None:
                    # initialize the list of processed images
                    procImages = []

                    # loop over the images
                    for image in images:
                        # loop over the preprocessors and apply each
                        # to the image
                        for p in self.preprocessors:
                            image = p.preprocess(image)

                        # update the list of processed images
                        procImages.append(image)

                    # update the images array to be the processed
                    # images
                    images = np.array(procImages)

                # if the data augmentor exists, apply it
                if self.aug is not None:
                    (images, labels) = next(self.aug.flow(images, labels, batch_size=self.batchSize))

                # yield a tuple of images and labels
                yield (images, labels)

            # increment the total number of epochs
            epochs += 1

    def close(self):
        # close the database
        self.db.close()
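
The batching logic of the generator can be exercised on its own, without an HDF5 file. The sketch below uses in-memory NumPy arrays as a stand-in for the `images` and `labels` datasets; the function name `batch_generator` and the toy shapes are assumptions for illustration only.

```python
import numpy as np

def batch_generator(images, labels, batch_size, passes=np.inf):
    """Yield (images, labels) slices in order, for `passes` epochs."""
    epochs = 0
    while epochs < passes:
        for i in range(0, len(labels), batch_size):
            # Slicing past the end is safe: the last batch is just shorter.
            yield images[i:i + batch_size], labels[i:i + batch_size]
        epochs += 1

# 10 fake "images" of shape 4 x 4 x 3 and alternating 0/1 labels.
images = np.zeros((10, 4, 4, 3), dtype="float32")
labels = np.arange(10) % 2
sizes = [len(y) for (_, y) in batch_generator(images, labels, 4, passes=1)]
print(sizes)  # [4, 4, 2]
```

With 10 samples and a batch size of 4, one pass yields batches of 4, 4, and 2, whereas the integer division 10 // 4 counts only 2 full steps. The same relationship holds between `numImages // batchSize` and the number of batches the generator produces per pass.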
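3. Implementing AlexNet
Now let's implement the AlexNet architecture. A table summarizing the architecture is given below.

Table: a summary of the AlexNet architecture. Each layer lists its output volume size and, where relevant, the convolutional filter size or pool size.
In our implementation we include batch normalization after each activation, which is standard for most image classification tasks that use convolutional neural networks. We also include a very small amount of dropout after every POOL operation to further reduce overfitting.

Implementing AlexNet is fairly straightforward, especially when you have an architecture "blueprint" like the table above. Whenever you implement an architecture from a publication, check whether the authors provide such a table, because it makes the implementation much easier. For your own network architectures, use Chapter 19 of Volume 1, on visualizing network architectures, to confirm that the input and output volume sizes are what you expect.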
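
One quick way to check output volume sizes before writing any Keras code is to apply the standard formula (W - F + 2P) / S + 1 layer by layer. The sketch below does this for the convolution and pooling stack, assuming the commonly cited AlexNet hyperparameters from Krizhevsky et al. with a 227×227×3 input; the layer names and the `out_size` helper are illustrative, and the numbers may differ from an implementation that uses different padding.

```python
def out_size(size, kernel, stride=1, pad=0):
    """Spatial output size of a conv/pool layer: (W - F + 2P) / S + 1."""
    return (size - kernel + 2 * pad) // stride + 1

# (name, kernel, stride, pad, filters); filters=None marks a pooling layer.
# These hyperparameters are an assumption based on the commonly cited
# AlexNet layout, not values taken from this text.
layers = [
    ("CONV1", 11, 4, 0, 96), ("POOL1", 3, 2, 0, None),
    ("CONV2", 5, 1, 2, 256), ("POOL2", 3, 2, 0, None),
    ("CONV3", 3, 1, 1, 384), ("CONV4", 3, 1, 1, 384),
    ("CONV5", 3, 1, 1, 256), ("POOL3", 3, 2, 0, None),
]

size, depth = 227, 3
shapes = []
for (name, kernel, stride, pad, filters) in layers:
    size = out_size(size, kernel, stride, pad)
    # Pooling keeps the depth; convolution sets it to the filter count.
    depth = filters if filters is not None else depth
    shapes.append((name, size, size, depth))
    print("{}: {} x {} x {}".format(name, size, size, depth))
```

Under these assumptions the final pooling layer produces a 6 × 6 × 256 volume, so the first fully connected layer receives 9,216 inputs.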

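4. Training AlexNet on Kaggle Dogs vs. Cats
Now that the AlexNet architecture is defined, let's apply it to the Kaggle Dogs vs. Cats challenge. Open a new file, name it train_alexnet.py, and insert the following code: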

# set the matplotlib backend so figures can be saved in the background
import matplotlib
matplotlib.use("Agg")

import sys, os
# __file__ is the path of the running script; BASE_DIR resolves to the
# parent of the directory that contains this script
BASE_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
sys.path.append(BASE_DIR)

# import the necessary packages
import dogs_vs_cats.dogs_vs_cats_config as config
from customize.tools.imagetoarraypreprocessor import ImageToArrayPreprocessor
from customize.tools.simplepreprocessor import SimplePreprocessor
from customize.tools.patchpreprocessor import PatchPreprocessor
from customize.tools.meanpreprocessor import MeanPreprocessor
from customize.tools.trainingmonitor import TrainingMonitor
from customize.tools.hdf5datasetgenerator import HDF5DatasetGenerator
from models.alexnet import AlexNet
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.optimizers import Adam
import json

# construct the training image generator for data augmentation
aug = ImageDataGenerator(rotation_range=20, zoom_range=0.15,
    width_shift_range=0.2, height_shift_range=0.2, shear_range=0.15,
    horizontal_flip=True, fill_mode="nearest")

# load the RGB means for the training set
means = json.loads(open(config.DATASET_MEAN).read())

# initialize the image preprocessors
sp = SimplePreprocessor(227, 227)
pp = PatchPreprocessor(227, 227)
mp = MeanPreprocessor(means["R"], means["G"], means["B"])
iap = ImageToArrayPreprocessor()

# initialize the training and validation dataset generators
trainGen = HDF5DatasetGenerator(config.TRAIN_HDF5, 128, aug=aug,
    preprocessors=[pp, mp, iap], classes=2)
valGen = HDF5DatasetGenerator(config.VAL_HDF5, 128,
    preprocessors=[sp, mp, iap], classes=2)

# initialize the optimizer
print("[INFO] compiling model...")
opt = Adam(learning_rate=1e-3)
model = AlexNet.build(width=227, height=227, depth=3, classes=2, reg=0.0002)
model.compile(loss="binary_crossentropy", optimizer=opt, metrics=["accuracy"])

# construct the set of callbacks
path = os.path.sep.join([config.OUTPUT_PATH, "{}.png".format(os.getpid())])
callbacks = [TrainingMonitor(path)]

# train the network
# note: fit_generator is deprecated in TensorFlow 2.x, where model.fit
# accepts generators directly
model.fit_generator(
    trainGen.generator(),
    steps_per_epoch=trainGen.numImages // 128,
    validation_data=valGen.generator(),
    validation_steps=valGen.numImages // 128,
    epochs=75,
    max_queue_size=128 * 2,
    callbacks=callbacks, verbose=1)

# save the model to file
print("[INFO] serializing model...")
model.save(config.MODEL_PATH, overwrite=True)

# close the HDF5 datasets
trainGen.close()
valGen.close()

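Training AlexNet on the Kaggle Dogs vs. Cats competition, we obtained 92.97% classification accuracy on the validation set. Our learning curves are stable.

5. Evaluating AlexNet
To evaluate AlexNet on the test set using both our standard method and the oversampling technique, let's create a new file named crop_accuracy.py: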

# import the necessary packages
import sys, os
# __file__ is the path of the running script; BASE_DIR resolves to the
# parent of the directory that contains this script
BASE_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
sys.path.append(BASE_DIR)

import dogs_vs_cats.dogs_vs_cats_config as config
from customize.tools.imagetoarraypreprocessor import ImageToArrayPreprocessor
from customize.tools.simplepreprocessor import SimplePreprocessor
from customize.tools.meanpreprocessor import MeanPreprocessor
from customize.tools.croppreprocessor import CropPreprocessor
from customize.tools.hdf5datasetgenerator import HDF5DatasetGenerator
from customize.tools.ranked import rank5_accuracy
from tensorflow.keras.models import load_model
import numpy as np
import progressbar
import json

# load the RGB means for the training set
means = json.loads(open(config.DATASET_MEAN).read())

# initialize the image preprocessors
sp = SimplePreprocessor(227, 227)
mp = MeanPreprocessor(means["R"], means["G"], means["B"])
cp = CropPreprocessor(227, 227)
iap = ImageToArrayPreprocessor()

# load the pretrained network
print("[INFO] loading model...")
model = load_model(config.MODEL_PATH)

# initialize the testing dataset generator, then make predictions on
# the testing data
# note: predict_generator is deprecated in TensorFlow 2.x, where
# model.predict accepts generators directly
print("[INFO] predicting on test data (no crops)...")
testGen = HDF5DatasetGenerator(config.TEST_HDF5, 64,
    preprocessors=[sp, mp, iap], classes=2)
predictions = model.predict_generator(testGen.generator(),
    steps=testGen.numImages // 64, max_queue_size=64 * 2)

# compute the rank-1 accuracy (the rank-5 value is discarded)
(rank1, _) = rank5_accuracy(predictions, testGen.db["labels"])
print("[INFO] rank-1: {:.2f}%".format(rank1 * 100))
testGen.close()

# re-initialize the testing set generator, this time excluding the
# `SimplePreprocessor`
print("[INFO] predicting on test data (with crops)...")
testGen = HDF5DatasetGenerator(config.TEST_HDF5, 64, preprocessors=[mp], classes=2)
predictions = []

# initialize the progress bar
widgets = ["Evaluating: ", progressbar.Percentage(), " ", progressbar.Bar(), " ", progressbar.ETA()]
pbar = progressbar.ProgressBar(maxval=testGen.numImages // 64, widgets=widgets).start()

# loop over a single pass of the test data
for (i, (images, labels)) in enumerate(testGen.generator(passes=1)):
    # loop over each of the individual images
    for image in images:
        # apply the crop preprocessor to the image to generate 10
        # separate crops, then convert them from images to arrays
        crops = cp.preprocess(image)
        crops = np.array([iap.preprocess(c) for c in crops], dtype="float32")

        # make predictions on the crops and then average them
        # together to obtain the final prediction
        pred = model.predict(crops)
        predictions.append(pred.mean(axis=0))

    # update the progress bar
    pbar.update(i)

# compute the rank-1 accuracy
pbar.finish()
(rank1, _) = rank5_accuracy(predictions, testGen.db["labels"])
print("[INFO] rank-1: {:.2f}%".format(rank1 * 100))
testGen.close()
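To evaluate AlexNet on the Kaggle Dogs vs. Cats dataset, simply run the script above.

As our results show, we reached 92.60% accuracy on the test set. By applying the 10-crop oversampling method, however, we raised classification accuracy to 94.00%, an increase of 1.4 percentage points, simply by taking multiple crops of each input image and averaging the results. When evaluating your networks, this simple trick is an easy way to gain a few extra percentage points.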
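
The rank-1 accuracy reported by the script comes from the `rank5_accuracy` helper, whose code is not shown in this chapter. As a rough illustration of what a rank-1 score measures, here is a minimal NumPy sketch; `rank1_accuracy` and the four made-up predictions are hypothetical and are not the helper from `customize.tools.ranked`.

```python
import numpy as np

def rank1_accuracy(preds, labels):
    """Fraction of samples whose highest-probability class matches the label."""
    preds = np.asarray(preds)
    labels = np.asarray(labels)
    return float(np.mean(preds.argmax(axis=1) == labels))

# Averaged two-class probabilities for four hypothetical test images.
preds = [[0.9, 0.1], [0.4, 0.6], [0.7, 0.3], [0.2, 0.8]]
labels = [0, 1, 1, 1]
acc = rank1_accuracy(preds, labels)
print("rank-1: {:.2f}%".format(acc * 100))  # rank-1: 75.00%
```

Three of the four predicted classes (0, 1, 0, 1) match the labels (0, 1, 1, 1), which gives 75%.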

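6. Getting onto the Kaggle leaderboard
If you look at the Kaggle Dogs vs. Cats leaderboard, you will notice that even breaking into the top 25 requires 96.69% accuracy, which our current method cannot reach. So what is the solution?

The answer is transfer learning, specifically transfer learning via feature extraction. Although the ImageNet dataset contains 1,000 object categories, a large portion of them are dog and cat breeds. A network trained on ImageNet can therefore tell you not only whether an image shows a dog or a cat, but also which specific breed the animal is. Given that a network trained on ImageNet must be able to distinguish between such fine-grained animal classes, it is natural to hypothesize that features extracted from a pretrained network would help us claim a top spot on the Kaggle Dogs vs. Cats leaderboard.

To test this hypothesis, let's first extract features from a pretrained ResNet architecture and then train a logistic regression classifier on top of those features.

(1) Extracting features with ResNet
The transfer learning via feature extraction technique used in this section is based heavily on Chapter 3. For completeness, I review the full contents of extract_features.py here; if you need more background, refer to Chapter 3 on feature extraction with CNNs.

First, open a new file, name it extract_features.py, and insert the following code: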

# import the necessary packages
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.applications import imagenet_utils
from tensorflow.keras.preprocessing.image import img_to_array
from tensorflow.keras.preprocessing.image import load_img
from sklearn.preprocessing import LabelEncoder
from imutils import paths
import numpy as np
import progressbar
import argparse
import random
import os
from hdf5DatasetWriter import HDF5DatasetWriter

# construct the argument parse and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-d", "--dataset", required=True, help="path to input dataset")
ap.add_argument("-o", "--output", required=True, help="path to output HDF5 file")
ap.add_argument("-b", "--batch-size", type=int, default=32,
    help="batch size of images to be passed through network")
ap.add_argument("-s", "--buffer-size", type=int, default=1000,
    help="size of feature extraction buffer")
args = vars(ap.parse_args())

# store the batch size in a convenience variable
bs = args["batch_size"]

# grab the list of images that we'll be describing then randomly
# shuffle them to allow for easy training and testing splits via
# array slicing during training time
print("[INFO] loading images...")
imagePaths = list(paths.list_images(args["dataset"]))
random.shuffle(imagePaths)

# extract the class labels from the image paths then encode the
# labels
labels = [p.split(os.path.sep)[-2] for p in imagePaths]
le = LabelEncoder()
labels = le.fit_transform(labels)

# load the ResNet50 network; pooling="avg" applies global average
# pooling so each image yields a single 2048-d vector (without it,
# recent Keras versions return a 7x7x2048 volume for 224x224 inputs
# and the reshape below fails)
print("[INFO] loading network...")
model = ResNet50(weights="imagenet", include_top=False, pooling="avg")

# initialize the HDF5 dataset writer, then store the class label
# names in the dataset
dataset = HDF5DatasetWriter((len(imagePaths), 2048), args["output"],
    dataKey="features", bufSize=args["buffer_size"])
dataset.storeClassLabels(le.classes_)

# initialize the progress bar
widgets = ["Extracting Features: ", progressbar.Percentage(), " ",
    progressbar.Bar(), " ", progressbar.ETA()]
pbar = progressbar.ProgressBar(maxval=len(imagePaths),
    widgets=widgets).start()

# loop over the images in batches
for i in np.arange(0, len(imagePaths), bs):
    # extract the batch of images and labels, then initialize the
    # list of actual images that will be passed through the network
    # for feature extraction
    batchPaths = imagePaths[i:i + bs]
    batchLabels = labels[i:i + bs]
    batchImages = []

    # loop over the images and labels in the current batch
    for (j, imagePath) in enumerate(batchPaths):
        # load the input image using the Keras helper utility
        # while ensuring the image is resized to 224x224 pixels
        image = load_img(imagePath, target_size=(224, 224))
        image = img_to_array(image)

        # preprocess the image by (1) expanding the dimensions and
        # (2) subtracting the mean RGB pixel intensity from the
        # ImageNet dataset
        image = np.expand_dims(image, axis=0)
        image = imagenet_utils.preprocess_input(image)

        # add the image to the batch
        batchImages.append(image)

    # pass the images through the network and use the outputs as
    # our actual features
    batchImages = np.vstack(batchImages)
    features = model.predict(batchImages, batch_size=bs)

    # reshape the features so that each image is represented by
    # a flattened 2048-d feature vector of the pooled outputs
    features = features.reshape((features.shape[0], 2048))

    # add the features and labels to our HDF5 dataset
    dataset.add(features, batchLabels)
    pbar.update(i)

# close the dataset
dataset.close()
pbar.finish()
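Run the code:

python extract_features.py --dataset ../datasets/kaggle_dogs_vs_cats/train \
    --output ../datasets/kaggle_dogs_vs_cats/hdf5/features.hdf5

Once the command finishes, the output directory should contain the HDF5 file passed to --output (features.hdf5 in the command above). Given these features, we can train a logistic regression classifier on top of them to (ideally) obtain a top-5 position on the Kaggle Dogs vs. Cats leaderboard.

(2) Training a logistic regression classifier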
# import the necessary packages
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import classification_report
from sklearn.metrics import accuracy_score

import argparse
import pickle
import h5py

# construct the argument parse and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-d", "--db", required=True, help="path HDF5 database")
ap.add_argument("-m", "--model", required=True, help="path to output model")
ap.add_argument("-j", "--jobs", type=int, default=-1, help="# of jobs to run when tuning hyperparameters")
args = vars(ap.parse_args())

# open the HDF5 database for reading then determine the index of
# the training and testing split, provided that this data was
# already shuffled *prior* to writing it to disk
db = h5py.File(args["db"], "r")
i = int(db["labels"].shape[0] * 0.75)

# define the set of parameters that we want to tune then start a
# grid search where we evaluate our model for each value of C
print("[INFO] tuning hyperparameters...")
# params = {"C": [0.0001, 0.001, 0.01, 0.1, 1.0]}
params = {"C": [0.001]}
model = GridSearchCV(LogisticRegression(), params, cv=3, n_jobs=args["jobs"])
model.fit(db["features"][:i], db["labels"][:i])
print("[INFO] best hyperparameters: {}".format(model.best_params_))

# generate a classification report for the model
print("[INFO] evaluating...")
preds = model.predict(db["features"][i:])
print(classification_report(db["labels"][i:], preds, target_names=db["label_names"]))

# compute the raw accuracy with extra precision
acc = accuracy_score(db["labels"][i:], preds)
print("[INFO] score: {}".format(acc))

# serialize the model to disk
print("[INFO] saving model...")
f = open(args["model"], "wb")
f.write(pickle.dumps(model.best_estimator_))
f.close()

# close the database
db.close()
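To train our model on the ResNet50 features, simply run the script above, passing the HDF5 feature database with --db and the output model path with --model.

As the output shows, our approach of transfer learning via feature extraction produces an impressive accuracy of 98.69%.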
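
The training script depends on the extracted HDF5 features, but its core steps (a 75/25 split by slicing, a grid search over C, and a held-out accuracy score) can be tried on synthetic data. The sketch below is only an illustration under stated assumptions: the two Gaussian blobs stand in for the 2048-d ResNet50 features, the `max_iter=1000` setting is an added choice, and the resulting accuracy says nothing about the 98.69% reported above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import accuracy_score

# Synthetic stand-in for the extracted features: two well-separated
# Gaussian blobs in 16 dimensions (the real vectors have 2048 entries).
rng = np.random.default_rng(0)
n = 200
features = np.vstack([rng.normal(-2.0, 1.0, (n, 16)),
                      rng.normal(2.0, 1.0, (n, 16))])
labels = np.array([0] * n + [1] * n)

# Shuffle once, then split 75/25 by slicing, as the script above does.
order = rng.permutation(len(labels))
features, labels = features[order], labels[order]
i = int(len(labels) * 0.75)

# Search over the regularization strength C with 3-fold cross-validation.
params = {"C": [0.0001, 0.001, 0.01, 0.1, 1.0]}
search = GridSearchCV(LogisticRegression(max_iter=1000), params, cv=3)
search.fit(features[:i], labels[:i])

# Score the best estimator on the held-out 25%.
acc = accuracy_score(labels[i:], search.predict(features[i:]))
print("best C: {}, accuracy: {:.4f}".format(search.best_params_["C"], acc))
```

Splitting by slicing is only valid because the samples are shuffled beforehand, which is why extract_features.py shuffles the image paths before writing the features to disk.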

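7. Summary
In this chapter we took a deep dive into the Kaggle Dogs vs. Cats dataset and studied two ways of obtaining > 90% classification accuracy on it:

1. Training AlexNet from scratch.

2. Applying transfer learning via ResNet.

Using our AlexNet implementation, we reached 94% classification accuracy. That is a very respectable accuracy, especially for a network trained from scratch. Further accuracy can likely be obtained by:

1. Gathering more training data.

2. Applying more aggressive data augmentation.

3. Deepening the network.

However, the 94% we obtained is not even enough to break into the top 25 of the leaderboard, let alone the top 5. To claim our top-5 position, we therefore relied on transfer learning via feature extraction, specifically the ResNet50 architecture trained on the ImageNet dataset. Because ImageNet contains many examples of dog and cat breeds, applying a pretrained network to this task is a natural and simple way to obtain higher accuracy with less effort. As our results demonstrate, we obtained 98.69% classification accuracy, high enough to claim a high position on the Kaggle Dogs vs. Cats leaderboard.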