# Build any deep learning image classifier in under 15 lines of code with fastai v2


Fastai is a high-level deep learning library built on top of PyTorch. Jeremy Howard has recently introduced a newer version of the library along with a very handy, beginner-friendly book and a course. I was pretty surprised at its abstraction level, which helps you create state-of-the-art models in minutes, without having to worry about the math behind it.


This article was written with total beginners in mind, and you can follow it even if you have little coding experience. After reading it you will be able to write a piece of code from scratch that can identify your favorite superhero, or even recognize an animal by the sound it makes.

Here’s a snapshot of the output of the mask classifier I trained using little training data, a few lines of code, and a few minutes of training on a GCP cluster. Click here to set up your own fastai GPU VM for free.

In order to achieve this result, we first need to find the faces in the image (also known as localization), then classify each face and draw a colored bounding box according to the category it belongs to (green: with_mask, red: no_mask, yellow: mask_worn_improperly). Today let’s understand the second part of the problem: multi-class image classification.

Project code is available here

I will explain some basic concepts of deep learning and computer vision while we write some code. I highly recommend running the code line by line in a Jupyter notebook as we work through the ideas behind the abstract functions.

# Processing the Data

The library is split up into modules; the primary ones are tabular, text, and vision. Our problem for today involves vision, so let’s import all the functions we are going to need from the vision library

In [1]: from fastai.vision.all import *

Just like how we learn to identify objects by observing images, our computer too needs data to recognize images. For mask detection I’ve curated a labeled dataset collected from Kaggle and other sources; you may download it from here

We store the path where our dataset resides. Path returns a pathlib object, which can be used to perform file operations very easily

In [2]: DATASET_PATH = Path('/home/kavi/datasets')

Before we train our model (teach our algorithm to recognize images) we first need to tell it a few things

• What is the expected input and output? What is the problem domain?

• Where is the data located and how is it labeled?

• How much data do we want to keep aside to evaluate the performance of the model?

• Do we need to transform the data? If so, how?

Fastai has a super flexible function called DataBlock, which takes the answers to the above questions and prepares a template

In [3]: mask_datablock = DataBlock(
            get_items=get_image_files,
            get_y=parent_label,
            blocks=(ImageBlock, CategoryBlock),
            item_tfms=RandomResizedCrop(224, min_scale=0.3),
            splitter=RandomSplitter(valid_pct=0.2, seed=100),
            batch_tfms=aug_transforms(mult=2)
        )
• The get_image_files function recursively grabs all the image file locations under the given path and returns them; this is how we tell fastai where to get_items

• In our dataset I’ve placed the images in separate folders named according to their category; the parent_label function returns the name of a file’s parent directory from its path

You may write your own function according to how your data is labeled
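To make this concrete, here is a minimal pure-Python sketch of what a parent_label-style function does (the function name and paths below are illustrative, not fastai’s actual implementation):

```python
from pathlib import Path

def label_from_parent(fname):
    """Return the name of the file's parent folder, which serves as its label."""
    return Path(fname).parent.name

# An image stored under a folder named after its category:
label_from_parent('/home/kavi/datasets/with_mask/img001.jpg')  # 'with_mask'
```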

• Knowing the file paths of the input images and the target labels, we next need to preprocess the data based on the type of problem. Sample pre-processing steps for images include creating images from the file paths using Pillow and converting them into tensors

## How is an image represented in our PC?

Every image is a matrix of pixel intensities. Each value ranges from 0 to 255, with 0 being the darkest and 255 the brightest intensity of the respective channel

A color image is a 3-layered matrix (a 3rd-order tensor), with one layer each for the Red, Green, and Blue intensities, whereas a black-and-white picture is a single-layer (2-D) matrix
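As a toy illustration (pure Python, no libraries), a color image can be thought of as height × width pixels, each holding three channel intensities:

```python
# A 2x2 colour "image": each pixel is an (R, G, B) tuple of 0-255 intensities.
red   = (255, 0, 0)
green = (0, 255, 0)
image = [
    [red, green],
    [green, red],
]

height   = len(image)        # number of rows
width    = len(image[0])     # number of pixels per row
channels = len(image[0][0])  # R, G, B -> 3 layers
```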

We pass a tuple (input_transformation, output_transformation) of type TransformBlock to blocks. In our problem we need to predict a category for an image, hence we pass (ImageBlock, CategoryBlock). If you instead wanted to predict the age of a person from their picture, you would pass (ImageBlock, RegressionBlock)

• Deep learning models work best when all the images are of the same size, and they learn quicker when the resolution is lower. We tell fastai how to rescale the images by passing a suitable resize function to item_tfms; this function is applied to each image individually

Fastai provides various resizing methods: Crop, Pad, or Squish. Each of them is somewhat problematic. Sometimes we might even end up losing critical information, just like in the third image after the center crop

We can use RandomResizedCrop to overcome this issue. Here we randomly select a part of the picture each time, so training the model for a couple of epochs (an epoch is one complete pass through all the images in the dataset) covers most regions of every picture. min_scale determines the minimum fraction of the image to select each time.

• Assessing model accuracy on the training dataset would produce a biased score and may lead to poor performance on unseen data. We need to tell our DataBlock API to set aside some of the pre-processed data to evaluate the performance of the model. The data the algorithm sees is called training data and the data kept aside is called validation data. Often datasets come with a validation set defined, but in our case we don’t have one, so we need to pass a split function to splitter

Fastai has a couple of split functions; let’s use RandomSplitter for today’s problem. valid_pct determines what fraction of the training data to set aside, and seed ensures that the same random images are always set aside
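Conceptually, RandomSplitter does something like the following pure-Python sketch (a simplification for illustration, not fastai’s actual implementation):

```python
import random

def random_split(n_items, valid_pct=0.2, seed=100):
    """Shuffle indices deterministically, then reserve valid_pct of them for validation."""
    idxs = list(range(n_items))
    random.Random(seed).shuffle(idxs)   # fixed seed -> same shuffle every run
    n_valid = int(n_items * valid_pct)
    return idxs[n_valid:], idxs[:n_valid]   # (train_idxs, valid_idxs)

train_idxs, valid_idxs = random_split(1000)
# 20% of 1000 items -> 200 validation indices, 800 training indices
```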

• Having a diverse dataset is crucial to the performance of any deep learning model. So what do you do if you don’t have enough data? We generate new data based on existing data; this process is called data augmentation

Data augmentation refers to creating random variations of our input data, such that they appear different, but do not actually change the meaning of the data. — Fastbook

The above images are generated from a single picture of a teddy. In the real world we often have to make predictions on unseen data; the model will perform poorly if it just memorizes the training data (overfitting), rather than understanding it. Augmenting the data has proven helpful in improving model performance in many cases

In fastai we have a predefined function aug_transforms which performs some default image transforms such as flipping, altering the brightness, skewing, and a few others. We pass this function to batch_tfms, and the interesting thing to note is that these transforms are performed on the GPU (if one is available)
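One of the simplest augmentations, a horizontal flip, can be sketched in pure Python on the matrix representation of an image described earlier (aug_transforms does this and much more, on the GPU):

```python
def horizontal_flip(image):
    """Mirror an image (a list of pixel rows) left to right."""
    return [list(reversed(row)) for row in image]

image   = [[1, 2, 3],
           [4, 5, 6]]
flipped = horizontal_flip(image)   # [[3, 2, 1], [6, 5, 4]]
```

Flipping twice restores the original image, which is why such transforms don’t change the meaning of the data.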

DataBlock holds the list of instructions to be performed on our dataset. It acts as a blueprint for creating a DataLoader, which takes our dataset path, applies the pre-processing transforms to the images as defined by the DataBlock object, and loads them onto the GPU. After loading, batch_tfms is applied to each batch. The default batch size is 64; you may increase or decrease it to suit your GPU memory by passing bs=n to the dataloaders function
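The batching idea itself is simple; here is a pure-Python sketch (the real DataLoader also shuffles, applies transforms, and moves batches to the GPU):

```python
def make_batches(items, bs=64):
    """Split a sequence into consecutive batches of at most bs items each."""
    return [items[i:i + bs] for i in range(0, len(items), bs)]

batches = make_batches(list(range(150)), bs=64)
# 150 items with bs=64 -> three batches of sizes 64, 64, and 22
```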

Tip: the !nvidia-smi command can be executed in your notebook at any time to check your GPU usage details. You may restart the kernel to free the memory

dls is a dataloader object which holds training and validation data. You may have a look at our transformed data using dls.show_batch()


A model is a set of values, also known as weights, that can be used to identify patterns. Pattern recognition is everywhere; in living beings it’s a cognitive process that happens in the brain without us being consciously aware of it, much like how we learned to recognize colors, objects, and letters as children by staring at them. Training a model means determining the right set of weights to solve a particular problem, in our case classifying an image into 3 categories: with_mask, without_mask, mask_weared_incorrect

## How to train models quickly?

As adults we can learn to recognize objects almost instantly, because we have been learning patterns ever since we were born. Initially we learned to recognize colors, then simple objects like balls and flowers. A few years later, we were able to recognize people and other complex objects. Similarly, in machine learning we have pre-trained models that have been trained to solve a similar problem and can be modified to solve our problem

Using a pretrained model for a task different to what it was originally trained for is known as transfer learning — Fastbook

resnet34 is one such model, trained on the ImageNet dataset, which contains around 1.3 million images, and it can classify images into 1,000 categories. In our problem we have only 3 categories, so we keep the initial layers, which are useful for recognizing basic patterns such as lines, corners, and simple shapes, and retrain the final layers

In[11]: learn = cnn_learner(dls, resnet34, metrics=error_rate)

Fastai offers a cnn_learner function that is particularly useful for training computer vision models. It takes the DataLoaders object dls, a pre-trained model resnet34 (here 34 means it has 34 layers), and a metric error_rate, which calculates the percentage of images classified incorrectly on the validation data

A metric is a function that measures the quality of the model’s predictions using the validation set — Fastbook
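In plain Python, an error_rate-style metric is just the fraction of mismatched predictions (a simplified sketch of fastai’s metric, which operates on tensors):

```python
def error_rate(preds, targets):
    """Fraction of predictions that do not match their target labels."""
    wrong = sum(p != t for p, t in zip(preds, targets))
    return wrong / len(targets)

preds   = ['with_mask', 'no_mask',   'with_mask', 'with_mask']
targets = ['with_mask', 'with_mask', 'with_mask', 'no_mask']
error_rate(preds, targets)   # 2 wrong out of 4 -> 0.5
```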

## How does transfer learning work?

Initially, we replace the final layer of our pretrained model with one or more new layers with randomized weights; this part is known as the head. We update the weights of the head using the back-propagation algorithm, which we will cover in a different article

Fastai provides a method fine_tune which performs the task of tuning the pre-trained model to solve our specific problem using the data we have curated

In[12]: learn.fine_tune(4)

We pass a number to fine_tune that tells it how many epochs (the number of times we fully go through the dataset) to train for. This is something you need to experiment with; there is no hard and fast rule. It depends on your problem, your dataset, and the time you want to spend training. You may run the function multiple times with different epoch counts

Tips from Jeremy and my learnings:

• Training for a large number of epochs can lead to overfitting, which may hurt performance on unseen data. If the validation loss keeps increasing over consecutive epochs, it means our model is memorizing the training data and we need to stop training

• We can improve performance by training the model at different resolutions, e.g. train with 224x224 images and later with 112x112 pixel images

• Data augmentation helps prevent overfitting to some extent

# Using the Model for Inference

Now we have a trained mask classifier which can tell, from a picture of a person’s face, whether they are wearing a mask properly, improperly, or not at all. We can export this model and use it to predict the class elsewhere

In[13]: learn.export()
In[14]: learn_inf = load_learner('export.pkl')
In[15]: learn_inf.predict('path/to/your/image.jpg')

I went on to build a REST API exposing this model to the internet and, with the help of my friends Vaishnavi and Jaswanth, made a web application which takes an input image and draws bounding boxes according to the category each face belongs to, along with a count per category. Please feel free to drop your feedback there; it will help improve the model. The web app is live at https://findmask.ml

An article on how I built the deep learning REST API is coming soon :)

# Conclusion

Now you are in a position to build your own image classifier. You may also use this technique to classify sounds by converting them into spectrograms or any suitable image form. If you are starting to learn deep learning, I highly recommend taking the fastai v2 course and joining the study group organized by Sanyam Bhutani and MLT, where we read and discuss a chapter of Fastbook every week.

Feel free to reach out to me with any feedback or if you face any issues; happy to help. I’m active on Twitter and LinkedIn
