How to classify butterflies with deep learning in Keras

This article shows how to use Keras for deep learning, applied to a butterfly classification task: butterfly images are downloaded via the Flickr API, data augmentation is used to counter overfitting, and a model with convolutional, pooling and fully connected layers is built. The results show that, although the model still overfits on the training and validation sets, deep learning can already help distinguish butterfly species.

A while ago I read an interesting blog post on the website of the Dutch organization Vlinderstichting. Every year they organize a count of butterflies. Volunteers help in determining the different butterfly species in their garden. The Vlinderstichting gathers and analyses the results.

As the determination of the butterfly species is done by volunteers, this process is inevitably prone to errors. As a result, the Vlinderstichting has to manually check the submissions, which is time-consuming.

Specifically, there are three butterfly species for which the Vlinderstichting receives many wrong determinations.

In this article, I will describe the steps to fit a deep learning model that helps to make the distinction between two of them: the meadow brown (Maniola jurtina) and the gatekeeper (Pyronia tithonus).

Downloading images with the Flickr API

To train a convolutional neural network I need to find images of butterflies with the correct label. Surely I could take pictures myself of the butterflies that I want to classify. They sometimes fly around in my garden…

Just kidding, that would take ages. For this, I need an automated way to get the images. To do that I use the Flickr API via Python.

Setting up the Flickr API

Firstly, I install the flickrapi package with pip. Then I create the necessary API keys on the Flickr website to connect to the Flickr API.

Besides the flickrapi package, I import the os and urllib packages for downloading the images and setting up the directories.

from flickrapi import FlickrAPI
import urllib.request  # in Python 3, urlretrieve lives in urllib.request
import os
import config

In the config module, I define the public and secret keys for the Flickr API. So this is simply a Python script (config.py) with the code below:

API_KEY = 'XXXXXXXXXXXXXXXXX'     # replace with your key
API_SECRET = 'XXXXXXXXXXXXXXXXX'  # replace with your secret
IMG_FOLDER = 'XXXXXXXXXXXXXXXXX'  # replace with your folder to store the images

I keep these keys in a separate file for security reasons. As a result, you can save the code in a public repository like GitHub or BitBucket and put config.py in .gitignore. Consequently, you can share your code with others without having to worry about someone getting access to your credentials.

To extract images of different butterfly species, I wrote a function download_flickr_photos. I will explain this function step by step. In addition, I’ve made the full code available on GitHub.

Input parameters

First of all, I check if the input parameters are of the correct type or values. If not, I raise an error. The explanation of the parameters can be found in the docstring of the function.

if not (isinstance(keywords, str) or isinstance(keywords, list)):
    raise AttributeError('keywords must be a string or a list of strings')
if not (size in ['thumbnail', 'square', 'medium', 'original']):
    raise AttributeError('size must be "thumbnail", "square", "medium" or "original"')
if not (max_nb_img == -1 or (max_nb_img > 0 and isinstance(max_nb_img, int))):
    raise AttributeError('max_nb_img must be an integer greater than zero or equal to -1')

Secondly, I define some of the parameters that will be used in the walk method later on. I create a list for the keywords and determine from which URL the images need to be downloaded.

if isinstance(keywords, str):
    keywords_list = []
    keywords_list.append(keywords)
else:
    keywords_list = keywords
if size == 'thumbnail':
    size_url = 'url_t'
elif size == 'square':
    size_url = 'url_q'
elif size == 'medium':
    size_url = 'url_c'
elif size == 'original':
    size_url = 'url_o'

Connecting to the Flickr API

Next, I connect to the Flickr API. In the FlickrAPI call I use the API keys defined in the config module.

flickr = FlickrAPI(config.API_KEY, config.API_SECRET)

Creating subfolders per butterfly species

I save the images of each butterfly species in a separate subfolder. The name of each subfolder is the butterfly species’ name, given by the keyword. If the subfolder does not exist yet, I create it.

results_folder = config.IMG_FOLDER + keyword.replace(" ", "_") + "/"
if not os.path.exists(results_folder):
    os.makedirs(results_folder)

Walking around in the Flickr library

photos = flickr.walk(
    text=keyword,
    extras='url_m',
    license='1,2,4,5',
    per_page=50)

I use the walk method of the Flickr API to search for images for the specified keyword. This walk method has the same parameters as the search method in the Flickr API.

In the text parameter, I use the keyword to search for images related to this keyword. Secondly, in the extras parameter, I specify url_m to get a small-to-medium size of the images. More explanation of the image sizes and their respective URLs is given in the Flickcurl C library.

Thirdly, in the license parameter, I select images with a non-commercial license. More on the license codes and their meaning can be found on the Flickr API platform. Finally, the per_page parameter specifies how many images I allow per page.

As a result, I have a generator called photos to download the images.

Downloading Flickr images

With the photos generator, I can download all the images found for the search query. First I get the specific URL at which I will download the image. Then I increment the count variable and use this counter to create the image filenames.

With the urlretrieve method, I download the image and save it in the folder for the butterfly species. If an error occurs I print out the error message.

count = 0  # counter used to name the image files
for photo in photos:
    try:
        url = photo.get('url_m')
        print(url)
        count += 1
        urllib.request.urlretrieve(url, results_folder + str(count) + ".jpg")
    except Exception as e:
        print(e, 'Download failure')
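
Putting the pieces together, the function looks roughly like the sketch below. It is assembled from the snippets above; the default parameter values and the max_nb_img check are my own assumptions, so the version on GitHub may differ in its details.

def download_flickr_photos(keywords, size='medium', max_nb_img=-1):
    # input validation as shown above
    if not (isinstance(keywords, str) or isinstance(keywords, list)):
        raise AttributeError('keywords must be a string or a list of strings')
    if not (size in ['thumbnail', 'square', 'medium', 'original']):
        raise AttributeError('size must be "thumbnail", "square", "medium" or "original"')
    if not (max_nb_img == -1 or (max_nb_img > 0 and isinstance(max_nb_img, int))):
        raise AttributeError('max_nb_img must be an integer greater than zero or equal to -1')

    # keyword list and size-to-URL mapping
    if isinstance(keywords, str):
        keywords_list = [keywords]
    else:
        keywords_list = keywords
    size_url = {'thumbnail': 'url_t', 'square': 'url_q',
                'medium': 'url_c', 'original': 'url_o'}[size]

    flickr = FlickrAPI(config.API_KEY, config.API_SECRET)

    for keyword in keywords_list:
        # one subfolder per butterfly species
        results_folder = config.IMG_FOLDER + keyword.replace(" ", "_") + "/"
        if not os.path.exists(results_folder):
            os.makedirs(results_folder)

        # walk through the Flickr search results
        photos = flickr.walk(
            text=keyword,
            extras='url_m',  # swap in size_url here to honor the size argument
            license='1,2,4,5',
            per_page=50)

        count = 0
        for photo in photos:
            if max_nb_img != -1 and count >= max_nb_img:
                break
            try:
                url = photo.get('url_m')
                print(url)
                count += 1
                urllib.request.urlretrieve(url, results_folder + str(count) + ".jpg")
            except Exception as e:
                print(e, 'Download failure')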

To download multiple butterfly species, I create a list and call the function download_flickr_photos in a for loop. For simplicity, I only download two butterfly species of the three mentioned above.

butterflies = ['meadow brown butterfly', 'gatekeeper butterfly']
for butterfly in butterflies:
    download_flickr_photos(butterfly)

Data augmentation of images

Training a convnet on a small number of images will result in overfitting. Consequently, the model will make errors in classifying new, unseen images. Data augmentation can help to avoid this. Luckily Keras has some nice tools to transform images easily.

I’d like to compare it with how my son classifies cars on the road. At the moment he’s only 2 years old and hasn’t seen as many cars as an adult. So you could say his training set of images is rather small. Therefore he’s more likely to misclassify cars. For instance, he sometimes mistakes an ambulance for a police van.

As he grows older, he will see more ambulances and police vans, with the corresponding labels that I give him. So his training set will become larger and he will classify them more correctly.

For that reason, we need to provide the convnet with more butterfly images than we have at the moment. An easy solution for that is data augmentation. In short, this means applying a set of transformations to the Flickr images.

Keras provides a wide range of image transformations. But first, we’ll have to convert the images so that Keras can work with them.

Converting an image to numbers

We start by importing the Keras module. We will demonstrate the image transformations with one example image. For that purpose, we use the load_img method.

from keras.preprocessing.image import ImageDataGenerator, array_to_img, img_to_array, load_img
i = load_img('data/train/maniola_jurtina/1.jpg')  # PIL image
x = img_to_array(i)            # numpy array of shape (75, 75, 3)
x = x.reshape((1,) + x.shape)  # add a batch dimension: (1, 75, 75, 3)

The load_img method creates a Python Imaging Library (PIL) image. We’ll need to convert this to a Numpy array to use it in the ImageDataGenerator later on. That’s done with the handy img_to_array method. As a result, we have an array of shape 75x75x3. These dimensions reflect the width, height and RGB values.

In fact, each pixel of the image has 3 RGB values. These range between 0 and 255 and represent the intensity of Red, Green and Blue. A higher value stands for higher intensity, a lower value for lower intensity. For instance, one pixel can be represented as a list of these three values [78, 136, 60]. Black would be represented as [0, 0, 0].

Finally, we need to add an extra dimension to avoid a ValueError when applying the transformations. This is done with the reshape function.

Alright, now we have something to work with. Let’s continue with the transformations.

Rotation

By specifying a value between 0 and 180, Keras will randomly choose an angle to rotate the image, clockwise or counter-clockwise. In our example, the image will be rotated by a maximum of 90 degrees.

ImageDataGenerator also has a parameter fill_mode. The default value is ‘nearest’. By rotating the image within the width and height of the original image we end up with “empty” pixels. The fill_mode then uses the nearest pixels to fill this empty space.

imgGen = ImageDataGenerator(rotation_range = 90)
i = 1
for batch in imgGen.flow(x, batch_size=1, save_to_dir='example_transformations', save_format='jpeg', save_prefix='trsf'):
    i += 1
    if i > 3:
        break

In the flow method, we specify where to save the transformed images. Make sure this directory exists! We also prefix the newly created images for convenience. The flow method would run infinitely, but for this example, we only generate three images. So when our counter reaches this value, we break the for loop. You can see the result below.

Width shift

In the width_shift_range parameter, you specify the ratio of the original width by which the image can be shifted to the left or right. Again, the fill_mode will fill up the newly created empty pixels. For the remaining examples, I will only show how to instantiate the ImageDataGenerator with the respective parameter. The code to generate the images is the same as in the rotation example.

imgGen = ImageDataGenerator(width_shift_range = 0.2)  # a ratio of the original width

In the transformed images we see that the image is shifted to the right. The empty pixels are filled which gives it a bit of a stretched look.

The same can be done for shifting up or down by specifying a value for the height_shift_range parameter.

Rescale

Rescaling an image will multiply the RGB values of each pixel by a chosen value before any other preprocessing. In our example, we apply min-max scaling to the values. As a result, these values will range between 0 and 1. This makes the values smaller and easier for the model to process.

imgGen = ImageDataGenerator(rescale = 1./255)

Shear

With the shear_range parameter, we can specify how the shearing transformations must be applied. This transformation can produce rather weird images when the value is set too high. So don’t set it too high.

imgGen = ImageDataGenerator(shear_range = 0.2)

Zoom

This transformation will zoom inside the picture. Just like the shearing parameter, this value should not be exaggerated to keep the images realistic.

imgGen = ImageDataGenerator(zoom_range = 0.2)

Horizontal flip

This transformation flips an image horizontally. Life can be simple sometimes…

imgGen = ImageDataGenerator(horizontal_flip = True)

All transformations combined

Now that we have seen the effect of each transformation separately, we apply all the combinations together.

imgGen = ImageDataGenerator(
    rotation_range = 40,
    width_shift_range = 0.2,
    height_shift_range = 0.2,
    rescale = 1./255,
    shear_range = 0.2,
    zoom_range = 0.2,
    horizontal_flip = True)
i = 1
for batch in imgGen.flow(x, batch_size=1, save_to_dir='example_transformations', save_format='jpeg', save_prefix='all'):
    i += 1
    if i > 3:
        break

Setting up the folder structure

We need to store these images in a specific folder structure. As such we can use the method flow_from_directory to augment the images and create the corresponding labels. This folder structure needs to look like this:

train/
    maniola_jurtina/
        0.jpg
        1.jpg
    pyronia_tithonus/
        0.jpg
        1.jpg
validation/
    maniola_jurtina/
        0.jpg
        1.jpg
    pyronia_tithonus/
        0.jpg
        1.jpg

To create this folder structure I created a gist img_train_test_split.py. Feel free to use it in your projects.

Creating the generators

Just as before, we specify the configuration parameters for the training generator. The validation images will not be transformed like the training images; we only rescale their RGB values to make them smaller.

The flow_from_directory method takes the images from the train or validation folder and generates batches of 32 transformed images. By setting the class_mode to ‘binary’ a one-dimensional label is created based on the image’s folder name.

train_datagen = ImageDataGenerator(
    rotation_range = 40,
    width_shift_range = 0.2,
    height_shift_range = 0.2,
    rescale = 1./255,
    shear_range = 0.2,
    zoom_range = 0.2,
    horizontal_flip = True)
validation_datagen = ImageDataGenerator(rescale=1./255)
train_generator = train_datagen.flow_from_directory(
    'data/train',
    batch_size=32,
    class_mode='binary')
validation_generator = validation_datagen.flow_from_directory(
    'data/validation',
    batch_size=32,
    class_mode='binary')
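
Note that no target_size is passed to flow_from_directory here, so Keras resizes the images to its default of 256×256 pixels. A quick way to verify what the generators produce is to pull one batch and inspect its shape:

x_batch, y_batch = next(train_generator)
print(x_batch.shape)  # (32, 256, 256, 3): a batch of 32 RGB images, resized to the 256x256 default
print(y_batch.shape)  # (32,): one binary label per image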

What about different image sizes?

The Flickr API lets you download images of specific sizes. However, in real-world applications the image sizes are not always constant. If the aspect ratio of the images is the same, we can simply resize the images. Otherwise, we can crop the images. Unfortunately, it is difficult to crop the image while keeping the object we want to classify intact.

Keras can deal with different image sizes. When configuring the model you can specify None for the width and height in input_shape.

input_shape=(3, None, None)  # Theano
input_shape=(None, None, 3)  # Tensorflow

I wanted to show that it is possible to work with different image sizes; however, this approach has some drawbacks:

  • not all layers (e.g. Flatten) will work with None as an input dimension
  • it can be computationally heavy to run

Building the deep learning model

For the remainder of this article, I will discuss the structure of a convolutional neural network, illustrated with some examples for our butterfly project. At the end of this article, we’ll have our first classification results.

What layers does a convolutional neural network consist of?

Of course, you can choose how many layers and their type to add to your convolutional neural network (also called CNN or convnet). In this project we will start with the following structure: two convolutional layers, each followed by a ReLu activation and a max pooling layer, then a fully connected layer with dropout and a sigmoid output layer.

Let’s understand what each layer does and how we create them with Keras.

Input layer

The different versions of the images produced by the transformations above are converted into a numerical representation, a matrix.

The dimensions of this matrix will be width x height x number of (color) channels. For RGB images the number of channels is three; for grayscale images it is one. A 7×7 RGB image, for instance, is represented by a 7x7x3 matrix of values.

As our images are of size 75×75, we need to specify that in the input_shape parameter when adding the first convolutional layer.

cnn = Sequential()
cnn.add(Conv2D(32, (3, 3), input_shape=(3, 75, 75)))  # channels-first (Theano) ordering; for TensorFlow use (75, 75, 3)

Convolutional layer

In the first layers, the convolutional neural network will look for lower-level features, like horizontal or vertical edges. The deeper we go in the network, the more it looks for higher-level features, such as the wing of a butterfly. But how does it detect features when it gets only numbers as input? That’s where filters come in.

Filters (or kernels)

You can think of a filter as a searchlight of a specific size that scans over the image. A filter of dimensions 3x3x3, for example, can contain weights that detect a vertical edge. For a grayscale image, the dimensions would be 3x3x1. Usually, a filter has smaller dimensions than the image we want to classify: 3×3, 5×5 or 7×7 are typically used. The third dimension should always be equal to the number of channels.

While scanning the image, the filter transforms the RGB values by multiplying them with its weights. The multiplied values are then summed over all channels. For our 7x7x3 image and a 3x3x3 filter, this results in a 5x5x1 outcome.

To keep the convolutional operation easy to follow, suppose we only look for a vertical edge in the Red channel. The weights for the Green and Blue channels are then all equal to zero. But you should keep in mind that the multiplication results for these channels are still added to the result of the Red channel.

The convolutional layer thus produces numerical outcomes. Higher numbers mean that the filter came across the feature it was looking for, in our example a vertical edge.
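
To make the operation concrete, here is a minimal numpy sketch of the convolution (strictly speaking a cross-correlation, which is what convnets compute) for a single channel. The 7×7 image and the vertical-edge weights are toy values of my own choosing:

import numpy as np

def conv2d_single_channel(img, kernel):
    # valid convolution of a 2-D image with a 2-D kernel, stride 1, no padding
    h = img.shape[0] - kernel.shape[0] + 1
    w = img.shape[1] - kernel.shape[1] + 1
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(img[i:i+kernel.shape[0], j:j+kernel.shape[1]] * kernel)
    return out

# a 7x7 single-channel image with a vertical edge in the middle
img = np.zeros((7, 7))
img[:, 4:] = 255.

# a 3x3 vertical-edge filter
kernel = np.array([[1., 0., -1.],
                   [1., 0., -1.],
                   [1., 0., -1.]])

print(conv2d_single_channel(img, kernel))  # 5x5 outcome; large magnitudes mark the vertical edge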

We can specify that we want more than one filter. These filters could have their own feature to look for in an image. Suppose we use 32 filters of size 3x3x3. The result of all filters is stacked and we end up with a 5x5x32 volume in our example. In the code snippet above we added 32 filters of size 3x3x3.

Stride

In the example above, the filter moves one pixel at a time. This is the so-called stride. We could increase the number of pixels the filter moves. Increasing the stride reduces the dimensions of the original image much faster. With a stride of 2, for example, a 3x3x3 filter applied to a 7x7x3 image results in a 3x3x1 outcome.

Padding

By applying a filter, the dimensions of the original image are quickly reduced. Especially the pixels at the edges of the image are only used once in the convolutional operation. This results in a loss of information. If you want to avoid that, you can specify padding. Padding adds “extra pixels” around the image.

Suppose we add padding of one pixel around the 7x7x3 image. This results in a 9x9x3 image. If we apply a 3x3x3 filter and a stride of 1, we end up with a 7x7x1 outcome. So, in that case, we preserve the dimensions of the original image and the outer pixels are used more than once.

You can calculate the resulting outcome of the convolutional operation with specific padding and stride as follows:

1 + [(original dimension + 2 × padding − filter dimension) / stride]

For example, suppose we have this set-up of our conv layer:

  • 7x7x3 image
  • 3x3x3 filter
  • padding of 1 pixel
  • stride of 2 pixels

That will give 1 + [(7 + 2 × 1 − 3) / 2] = 4
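
This rule is easy to turn into a small helper for sanity-checking layer dimensions. A minimal sketch, using floor division since fractional positions are dropped:

def conv_output_dim(dim, filter_dim, padding=0, stride=1):
    # output size of a convolution along one axis
    return 1 + (dim + 2 * padding - filter_dim) // stride

print(conv_output_dim(7, 3))                        # 5: 7x7 image, 3x3 filter, no padding, stride 1
print(conv_output_dim(7, 3, stride=2))              # 3: same filter with a stride of 2
print(conv_output_dim(7, 3, padding=1, stride=2))   # 4: the worked example above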

Why do we need convolutional layers?

A benefit of using conv layers is that the number of parameters to estimate is much lower than for a normal hidden layer. Suppose we continue with our example image of 7x7x3 and one filter of 3x3x3 with no padding and a stride of 1. Because the filter weights are shared across the whole image, the convolutional layer only has 3x3x3 weights + 1 bias = 28 parameters to estimate. A fully connected layer mapping the 7x7x3 = 147 inputs to the 5x5x1 = 25 neurons of the hidden layer would need 147 × 25 = 3,675 weights. Imagine what this number is when you have larger images…

ReLu layer

Or Rectified Linear unit layer. This layer adds nonlinearity to the network. The convolutional layer is a linear layer as it sums up the multiplications of the filter weights and RGB values.

The outcome of a ReLu function is equal to zero for all values of x <= 0. Otherwise, it is equal to the value of x. The code in Keras to add a ReLu layer is:

cnn.add(Activation('relu'))

Pooling

Pooling aggregates the input volume in order to reduce the dimensions further. This speeds up computation time, as the number of parameters to be estimated is reduced. Besides that, it helps to avoid overfitting by making the network more robust. Max pooling with a size of 2×2 and a stride of 2, for example, keeps only the maximum value of each 2×2 block.

The code in Keras to add pooling with a size of 2×2 is:

cnn.add(MaxPooling2D(pool_size = (2 ,2)))
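
As an illustration, a minimal numpy sketch of 2×2 max pooling with a stride of 2 on a toy 4×4 input (the values are arbitrary, and the input dimensions are assumed to be divisible by the pool size):

import numpy as np

def max_pool_2x2(x):
    # keep the maximum of each non-overlapping 2x2 block
    h, w = x.shape[0] // 2, x.shape[1] // 2
    return x.reshape(h, 2, w, 2).max(axis=(1, 3))

x = np.array([[1, 3, 2, 4],
              [5, 6, 1, 2],
              [7, 2, 9, 1],
              [3, 4, 5, 6]])
print(max_pool_2x2(x))
# [[6 4]
#  [7 9]]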

Fully connected layer

At the end, the convnet is able to detect higher-level features in the input images. These can then serve as input for a fully connected layer. Before we can do that, we flatten the output of the last max pooling layer. Flattening means we convert it to a vector. The vector values are then connected to all neurons in the fully connected layer. To do that in Python we use the following Keras functions:

cnn.add(Flatten())        
cnn.add(Dense(64))

Dropout

Just like pooling, dropout can help to avoid overfitting. During training, it randomly sets a specified fraction of the inputs to zero. A dropout rate between 20% and 50% is considered to work well.

cnn.add(Dropout(0.2))

Sigmoid activation

Because we want to produce the probability that an image shows one of two butterfly species (i.e. binary classification), we use a sigmoid activation layer: the sigmoid function σ(x) = 1 / (1 + exp(−x)) squashes its input to a value between 0 and 1, which we can read as a probability.

cnn.add(Activation('relu'))
cnn.add(Dense(1))
cnn.add(Activation('sigmoid'))

Applying the convolutional neural network on the butterfly images

Now we can define the complete convolutional neural network structure as displayed at the beginning of this post. First, we need to import the necessary Keras modules. Then we can start adding the layers that we explained above.

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D
from keras.layers import Activation, Flatten, Dense, Dropout
from keras.preprocessing.image import ImageDataGenerator
import time
IMG_SIZE = 75        # replace with the size of your images
NB_CHANNELS = 3      # 3 for RGB images or 1 for grayscale images
BATCH_SIZE = 32      # typical values are 8, 16 or 32
NB_TRAIN_IMG = 1000  # replace with the total number of training images
NB_VALID_IMG = 200   # replace with the total number of validation images

I made some additional parameters explicit for the conv layers. Here is a short explanation:

  • kernel_size specifies the filter size. So for the first conv layer this is size 2×2.
  • padding = 'same' means applying zero padding, so the original image size is preserved.
  • padding = 'valid' means we do not apply any padding.
  • data_format = 'channels_last' specifies that the color channels come last in the input_shape argument.

cnn = Sequential()
cnn.add(Conv2D(filters=32, 
               kernel_size=(2,2), 
               strides=(1,1),
               padding='same',
               input_shape=(IMG_SIZE,IMG_SIZE,NB_CHANNELS),
               data_format='channels_last'))
cnn.add(Activation('relu'))
cnn.add(MaxPooling2D(pool_size=(2,2),
                     strides=2))
cnn.add(Conv2D(filters=64,
               kernel_size=(2,2),
               strides=(1,1),
               padding='valid'))
cnn.add(Activation('relu'))
cnn.add(MaxPooling2D(pool_size=(2,2),
                     strides=2))
cnn.add(Flatten())        
cnn.add(Dense(64))
cnn.add(Activation('relu'))
cnn.add(Dropout(0.25))
cnn.add(Dense(1))
cnn.add(Activation('sigmoid'))
cnn.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
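
Before training, it can be useful to print an overview of the layers and the number of parameters to estimate per layer. Keras provides this out of the box:

cnn.summary()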

Finally, we compile this network structure: we set the loss parameter to binary_crossentropy, which is suited to binary targets, and use accuracy as the evaluation metric.

After having specified the network structure, we create the generators for the training and validation samples. On the training samples, we apply data augmentation as explained above. On the validation samples, we do not apply any augmentation as they are just used to evaluate the model performance.

train_datagen = ImageDataGenerator(
    rotation_range = 40,                  
    width_shift_range = 0.2,                  
    height_shift_range = 0.2,                  
    rescale = 1./255,                  
    shear_range = 0.2,                  
    zoom_range = 0.2,                     
    horizontal_flip = True)
validation_datagen = ImageDataGenerator(rescale = 1./255)
train_generator = train_datagen.flow_from_directory(
    '../flickr/img/train',
    target_size=(IMG_SIZE,IMG_SIZE),
    class_mode='binary',
    batch_size = BATCH_SIZE)
validation_generator = validation_datagen.flow_from_directory(
    '../flickr/img/validation',
    target_size=(IMG_SIZE,IMG_SIZE),
    class_mode='binary',
    batch_size = BATCH_SIZE)

With the flow_from_directory method on the generators we can easily go through all the images in the specified directories.

Lastly, we can fit the convolutional neural network on the training data and evaluate with the validation data. The resulting weights of the model can be saved and reused later on.

start = time.time()
cnn.fit_generator(
    train_generator,
    steps_per_epoch=NB_TRAIN_IMG//BATCH_SIZE,
    epochs=50,
    validation_data=validation_generator,
    validation_steps=NB_VALID_IMG//BATCH_SIZE)
end = time.time()
print('Processing time:',(end - start)/60)
cnn.save_weights('cnn_baseline.h5')
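
To reuse the saved weights later, rebuild the same network structure and load them back in:

cnn.load_weights('cnn_baseline.h5')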

The number of epochs is arbitrarily set to 50. An epoch is the cycle of forward propagation, checking the error and then adjusting the weights during backpropagation.

The steps_per_epoch is set to the number of training images divided by the batch size (the double division symbol makes sure the result is an integer, not a float; for example, 1,000 training images with a batch size of 32 give 1000 // 32 = 31 steps per epoch). Specifying a batch size greater than 1 speeds up the process. The same goes for the validation_steps parameter.

Results

After running 50 epochs, we have a training accuracy of 0.8091 and validation accuracy of 0.7359. So the convolutional neural network still suffers from quite some overfitting. We also see that the validation accuracy varies quite a lot. This is because we have a small set of validation samples. It would be better to do k-fold cross-validation for each evaluation round. But that would take quite some time.

To address the overfitting we could:

  • increase the dropout rate
  • apply dropout at each layer
  • find more training data

We’ll look into the first two options and monitor the result. The results of our first model will serve as a baseline. After applying an extra dropout layer and increasing the dropout rates, the model is a bit less overfitted.
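
As a sketch of what that adjustment could look like, dropout can be added after each pooling layer and the rate of the existing dropout layer increased. The exact rates and placement below are my own choices to experiment with, not necessarily the configuration behind the reported results:

cnn = Sequential()
cnn.add(Conv2D(filters=32, kernel_size=(2,2), strides=(1,1), padding='same',
               input_shape=(IMG_SIZE,IMG_SIZE,NB_CHANNELS), data_format='channels_last'))
cnn.add(Activation('relu'))
cnn.add(MaxPooling2D(pool_size=(2,2), strides=2))
cnn.add(Dropout(0.3))  # extra dropout after the first pooling layer
cnn.add(Conv2D(filters=64, kernel_size=(2,2), strides=(1,1), padding='valid'))
cnn.add(Activation('relu'))
cnn.add(MaxPooling2D(pool_size=(2,2), strides=2))
cnn.add(Dropout(0.3))  # extra dropout after the second pooling layer
cnn.add(Flatten())
cnn.add(Dense(64))
cnn.add(Activation('relu'))
cnn.add(Dropout(0.5))  # increased from 0.25
cnn.add(Dense(1))
cnn.add(Activation('sigmoid'))
cnn.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])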

I hope you’ve all enjoyed reading this post and learned something new. The full code is available on Github. Cheers!

Translated from: https://www.freecodecamp.org/news/classify-butterfly-images-deep-learning-keras/
