Deep Residual Networks for Image Classification with Python + NumPy

https://dnlcrl.github.io/projects/2016/06/22/Deep-Residual-Networks-for-Image-Classification-with-Python+NumPy.html

TL;DR

I wanted to implement “Deep Residual Learning for Image Recognition” from scratch with Python for my master’s thesis in computer engineering. I ended up implementing a simple (CPU-only) deep learning framework along with the residual model, and trained it on CIFAR-10, MNIST and SFDDD. Results speak for themselves.


Convolutional Neural Networks for Computer Vision

On Monday, June 13th, I graduated with a master’s degree in computer engineering, presenting a thesis on deep convolutional neural networks for computer vision. For now it is available only in Italian; I am working on the English translation, but I don’t know if and when I’ll have the time to finish it, so I’ll briefly describe each chapter here.

The document is composed as follows:

  • Introduction

    An introduction to the topic, a description of the thesis’ structure, and a quick overview of the history of neural networks, from perceptrons to the NeoCognitron.

  • Neural Networks fundamentals

    A description of the fundamental mathematical concepts behind deep learning.

  • State of the Art

    A description of the main concepts that enabled the achievements of the last decade, an introduction to the image classification and object localization problems, ILSVRC, and the models that obtained the best results in both tasks from 2012 to 2015.

  • Implementing a Deep Learning Framework

    This chapter explains how to implement both the forward and backward steps for each of the layers used by the residual model, the residual model’s implementation, and some methods to test a network before training.

  • Experimental Results

    After developing the model and a solver to train it, I conducted several experiments with the residual model on CIFAR-10. In this chapter I show how I tested the model and how the behavior of the network changes when one removes the residual paths, applies data-augmentation functions to reduce overfitting, or increases the number of layers. I then show how to fool a trained network using randomly generated images or images from the dataset.

  • Conclusions

    Here I describe further results obtained by training the same model on MNIST and SFDDD (see below for more info), an overview of the project, and possible future work.

Thesis links:

Presentation links:

Below I briefly describe how I got all of this, the sources I used, the structure of the residual model I trained, and the results I obtained. Please keep in mind that my first objective was to develop and train the model, so I didn’t spend much time on the design of the framework, but I’m working on it (and pull requests are welcome)!

Sources

When I started thinking about implementing “Deep Residual Learning for Image Recognition”, the only implementation on GitHub was this project by gcr, based on Lua + Torch; that code really helped me when I had to implement the residual model.

Neural Networks and Deep Learning by Michael Nielsen contains a really well-organized, exhaustive introduction to the subject and a lot of code to help the reader understand what is going on in each part of the process.

colah.github.io by Christopher Olah has a lot of very well-written posts about deep learning and neural networks; for example, I found this post about convolution layers really illuminating.

Stanford’s CS231n by Andrej Karpathy et al. is a really interesting course about CNNs for visual recognition; I mainly used the course material and my assignment solutions to build PyFunt.

Arxiv, a repository of e-prints of scientific papers in the fields of mathematics, physics, astronomy, computer science, quantitative biology, statistics, and quantitative finance, which can be accessed online. Check also Arxiv Sanity Preserver by Karpathy.

Many other awesome resources are listed here: https://github.com/ChristosChristofidis/awesome-deep-learning.

When I started studying deep learning I kept track of the best papers, collecting titles, authors, years and links in this Google Sheet, which I try to update frequently.

PyFunt, PyDatSet and Deep Residual Networks

PyFunt is a simple, pythonic, imperative deep learning framework: it mainly provides implementations of the forward and backward steps for the most well-known neural layers, some useful initialization functions, and a solver, which is essentially a class that you instantiate, passing it the model to be trained and the data loaded with pydatset. PyDatSet contains functions to load some datasets and a set of functions to artificially augment the training set. Just to clarify, PyFunt and PyDatSet are the names of the repos, while pyfunt and pydatset are the names of the packages (so you import them with from pydatset import ...).

The residual model implementation resides in deep-residual-networks-pyfunt, which also contains the train.py file.
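
To make the workflow concrete, here is a minimal sketch of how a script like train.py might wire the pieces together. This is my own illustration under assumed names: get_cifar10_data, ResNet, Solver and the solver arguments are stand-ins, not the exact pyfunt/pydatset API.

    # Hypothetical wiring of pydatset + model + solver. Every name below
    # (get_cifar10_data, ResNet, Solver and its arguments) is an illustrative
    # assumption, not the exact pyfunt/pydatset API.
    from pydatset import get_cifar10_data   # assumed loader name
    from pyfunt import ResNet, Solver       # assumed class names

    data = get_cifar10_data()                 # dict of train/val ndarrays
    model = ResNet(n_size=3, num_filters=16)  # (6*3)+2 = 20 layers
    solver = Solver(model, data,
                    update_rule='sgd',        # plain SGD, for illustration
                    optim_config={'learning_rate': 0.1},
                    batch_size=128,
                    num_epochs=80)
    solver.train()                            # run the optimization loop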

The residual model proposed in the reference paper is derived from the VGG model: 3x3 convolution filters are applied with a stride of 1 when the number of channels is constant, and with a stride of 2 when the number of feature maps is doubled (this is done to preserve the computational complexity of each convolutional layer). The residual model is thus composed of a cascade of many residual blocks (or residual layers), which are groups of convolutional layers in series where the output of the last layer is added to the original input of the block; the authors suggest that a couple of conv layers per residual block should work well.

              Input
                 |
         ,-------+-----.
   Downsampling      3x3 convolution+dimensionality reduction
        |               |
        v               v
   Zero-padding      3x3 convolution
        |               |
        `-----( Add )---'
                 |
              Output

Each residual block is composed as above: if dimensionality reduction is applied (using a convolution stride of 2 instead of 1), downsampling and zero-padding must be applied to the input before the addition, so that the two ndarrays (skip_path + conv_out) can be summed.
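
As a minimal NumPy sketch of this skip path (my own illustration, assuming NCHW ndarrays; the zero-padding behavior matches the description above, while the helper name is an assumption):

    import numpy as np

    def skip_path(x, out_channels, stride):
        """Identity shortcut with optional downsampling and zero-padding.

        x has shape (N, C, H, W). With stride 2 the spatial dimensions are
        halved by subsampling; the channel axis is then zero-padded up to
        out_channels, so the result can be added to the conv branch output.
        """
        if stride > 1:
            x = x[:, :, ::stride, ::stride]        # spatial downsampling
        n, c, h, w = x.shape
        if out_channels > c:
            pad = np.zeros((n, out_channels - c, h, w), dtype=x.dtype)
            x = np.concatenate([x, pad], axis=1)   # zero-pad new channels
        return x

    # out = conv_branch(x) + skip_path(x, out_channels, stride)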

A parametric residual network has (6*n)+2 layers in total, composed as below (the values on the right represent the dimensions of a [3,32,32] sample, like a CIFAR image); for example, n = 3 yields a 20-layer network:

                                            (image_dim: 3, 32, 32; F=16)
                                            (input_dim: N, *image_dim)
     INPUT
        |
        v
   +-----------------------+
   |conv[F, image_ch, 3, 3]|                    (out_shape: N, 16, 32, 32)
   +-----------------------+
        |
        v
   +-------------------------+
   |n * res_block[F, F, 3, 3]|              (out_shape: N, 16, 32, 32)
   +-------------------------+
        |
        v
   +-------------------------+
   |res_block[2*F, F, 3, 3]  |              (out_shape: N, 32, 16, 16)
   +-------------------------+
        |
        v
   +---------------------------------+
   |(n-1) * res_block[2*F, 2*F, 3, 3]|      (out_shape: N, 32, 16, 16)
   +---------------------------------+
        |
        v
   +-------------------------+
   |res_block[4*F, 2*F, 3, 3]|              (out_shape: N, 64, 8, 8)
   +-------------------------+
        |
        v
   +---------------------------------+
   |(n-1) * res_block[4*F, 4*F, 3, 3]|      (out_shape: N, 64, 8, 8)
   +---------------------------------+
        |
        v
   +-------------+
   |pool[8, 8, 8]|                          (out_shape: N, 64, 1, 1)
   +-------------+
        |
        v
   +- - - - - - - - -+
   |(opt) m * affine |                      (out_shape: N, 64, 1, 1)
   +- - - - - - - - -+
        |
        v
   +-------+
   |softmax|                                (out_shape: N, num_classes)
   +-------+
        |
        v
     OUTPUT
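
As a quick sanity check of the (6*n)+2 formula (my own arithmetic, not code from the repo): each of the three stages contains n residual blocks of two conv layers each, plus the first conv layer and the final classifier:

    def resnet_depth(n):
        # 3 stages * n residual blocks * 2 conv layers each,
        # plus the first conv layer and the final classifier
        return 6 * n + 2

    for n in (1, 3, 5, 7, 9):
        print(n, resnet_depth(n))   # -> 8, 20, 32, 44, 56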

You can see below a sort of package diagram that shows how train.py uses the other components to train the residual model.

Package Diagram

After I had every piece in place, I started experimenting with what happens when you remove the residual paths, when you apply (or not) data-augmentation functions to the training set, and when you increase the number of layers or the number of filters per layer. Below you can find some images of the results, but I suggest taking a look at the respective Jupyter notebooks (in addition to the thesis and presentation linked above) for a deeper understanding, as they contain a more exhaustive description of the results on all the datasets shown below.

Results

I trained the residual model on CIFAR-10, MNIST and SFDDD, and the results are really exciting, at least for me. The networks learned well in nearly every test I’ve done; obviously the limit was the capacity of my desktop PC.

CIFAR-10

CIFAR-10

One of the experiments on CIFAR-10 involved training a simple 20-layer resnet; applying data-augmentation regularization functions, I obtained results similar to those shown in the reference paper, as you can see below.

Results on CIFAR-10

Results on CIFAR-10 from MSRA

The training for this model took approximately 10 hours. More info is available in this Jupyter notebook in the repo’s docs folder.

MNIST

MNIST

MNIST is a much simpler dataset than CIFAR-10, so training times are relatively shorter, and I also tried using half the number of filters in each conv layer.

MNIST results

More info on the experiments with residual networks on MNIST is available here.

MNIST wrong classification from the best model

In the image above you can see all the wrongly classified validation samples from the 32-layer network, trained for just 30 epochs(!). In the upper left is the ground-truth class, in the lower left the network’s wrong prediction, and in the lower right its second most confident prediction.

SFDDD

SFDDD

State Farm Distracted Driver Detection is a dataset from State Farm on kaggle.com; it contains 640x480 images of drivers in 10 classes of distraction. For this dataset I decided to resize all the images to 64x48, use random 32x32 crops for training, and use the center 32x32 crop for testing. I also tried directly scaling all images to 32x32, but results were worse (confirming that scaling the images doesn’t help conv nets much in learning more general features).
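
A minimal sketch of this cropping scheme in NumPy (my own illustration; pydatset ships augmentation helpers, but these exact functions are assumptions):

    import numpy as np

    def random_crop(img, size=32):
        """Random crop for training; img has shape (C, H, W)."""
        _, h, w = img.shape
        top = np.random.randint(0, h - size + 1)
        left = np.random.randint(0, w - size + 1)
        return img[:, top:top + size, left:left + size]

    def center_crop(img, size=32):
        """Deterministic center crop for testing."""
        _, h, w = img.shape
        top, left = (h - size) // 2, (w - size) // 2
        return img[:, top:top + size, left:left + size]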

Below you can see the learning curves for two models, of 32 and 44 layers respectively. It looks like both models reach a low error after 80 epochs, but the problem here is that for the validation set I used 2k images randomly extracted from the training set, so my validation set is more strongly correlated with the training set than the original training set is with the validation set proposed by State Farm (on which I got an error of circa 3%).

SFDDD

Below you can see the saliency maps for six images of the class “talking on the phone with the right hand”, where the lighter zones represent the portions of the images that contributed most to a correct classification by the network.

SFDDD
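
For reference, a gradient-based saliency map of this kind can be sketched as follows; the model.forward/model.backward API is an assumption for illustration, not the exact code used for these figures:

    import numpy as np

    def saliency_map(model, x, y):
        """Gradient-based saliency for one image x of shape (1, C, H, W)
        with ground-truth label y. model.forward/model.backward are assumed
        APIs, used here only to illustrate the technique."""
        scores = model.forward(x)         # class scores, shape (1, num_classes)
        dscores = np.zeros_like(scores)
        dscores[0, y] = 1.0               # backprop only the true-class score
        dx = model.backward(dscores)      # gradient w.r.t. the input pixels
        return np.abs(dx).max(axis=1)[0]  # max over channels -> (H, W) map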

More info will be available here after the competition ends.

Final Words

I hope my projects could help you learn something new. If not, maybe you can teach me something new, comments and pull requests are welcome as always!
