Title
GhostNet: More Features from Cheap Operations
Year / Authors / Venue
2020 / Han, Kai and Wang, Yunhe and Tian, Qi and Guo, Jianyuan and Xu, Chunjing and Xu, Chang / CVPR 2020
Citation
@inproceedings{han2020ghostnet,
  title={Ghostnet: More features from cheap operations},
  author={Han, Kai and Wang, Yunhe and Tian, Qi and Guo, Jianyuan and Xu, Chunjing and Xu, Chang},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={1580--1589},
  year={2020}
}
Summary
- There is some similarity between the Encoding Layer and the attention block: both aim to retain the more meaningful features while ignoring disturbing ones.
- There may be room for innovation in building a more efficient layer for this idea while also bringing multi-size training into consideration.
- Enlarging its width may be an interesting direction.
- The Ghost bottleneck stacks two Ghost modules: the first increases the channel dimension (expansion), and the second decreases it to match the channel dimension required by the shortcut.
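As a sketch of that channel flow through a Ghost bottleneck (the layer sizes below are illustrative, not taken from the paper):

```python
def ghost_bottleneck_dims(in_c, exp_c, out_c, stride=1):
    """Channel counts after each stage of a Ghost bottleneck (sketch).

    The first Ghost module expands in_c -> exp_c; the second projects
    exp_c -> out_c so the result can be added to the shortcut branch.
    A stride-2 variant inserts a depthwise conv between the two modules.
    """
    stages = [("input", in_c),
              ("ghost module 1 (expansion)", exp_c)]
    if stride == 2:
        stages.append(("depthwise conv, stride 2", exp_c))
    stages.append(("ghost module 2 (projection)", out_c))
    return stages

# stride-1 block: output channels match the input, enabling the residual add
stages = ghost_bottleneck_dims(in_c=16, exp_c=48, out_c=16)
```

For a stride-1 block with `in_c == out_c`, the identity shortcut can be added directly; otherwise the shortcut path itself must adjust the channel count.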
My Points
- The Ghost bottleneck uses the squeeze-and-excitation (SE) module as a submodule, so fusing the SE module and the Ghost module into one block could be considered for performance improvement.
- The Ghost module uses $s \in [1, 2]$, so more values of $s$ could be considered.
- The placement of the SE module is not guided by any stated principle.
- Other low-cost linear operations could be used to construct the Ghost module, such as affine transformations and wavelet transformations.
- Multi-size convolutions could be adopted for the ghost operation.
Research Objective(s)
Fig. 1. Visualization of some feature maps generated by the first residual group in ResNet-50, where three similar feature map pair examples are annotated with boxes of the same color. One feature map in the pair can be approximately obtained by transforming the other one through cheap operations (denoted by spanners).
Contributions
In this paper, we introduce a novel Ghost module to generate more features by using fewer parameters. Specifically, an ordinary convolutional layer in deep neural networks will be split into two parts. The first part involves ordinary convolutions but their total number will be rigorously controlled. Given the intrinsic feature maps from the first part, a series of simple linear operations are then applied for generating more feature maps. Without changing the size of output feature map, the overall required number of parameters and computational complexities in this Ghost module have been decreased, compared with those in vanilla convolutional neural networks. Based on Ghost module, we establish an efficient neural architecture, namely, GhostNet.
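The saving can be made concrete with a rough parameter count, comparing an ordinary convolution with a Ghost module (a sketch with illustrative layer sizes; `d` denotes the kernel size of the cheap depthwise operation, following the paper's notation):

```python
def conv_params(c_in, c_out, k):
    # parameters of an ordinary k x k convolution (bias omitted)
    return c_out * c_in * k * k

def ghost_params(c_in, c_out, k, s, d):
    # primary conv produces c_out / s intrinsic maps; then s - 1 cheap
    # d x d per-map (depthwise) operations generate the remaining maps
    m = c_out // s
    primary = conv_params(c_in, m, k)
    cheap = (s - 1) * m * d * d
    return primary + cheap

# illustrative sizes: 256 -> 256 channels, 3x3 kernels, s = 2, d = 3
ordinary = conv_params(256, 256, 3)            # 589,824 parameters
ghost = ghost_params(256, 256, 3, s=2, d=3)    # 296,064 parameters
ratio = ordinary / ghost                       # close to s when c*k*k >> d*d
```

The compression ratio approaches $s$ because the cheap per-map operations contribute a negligible number of parameters compared with the primary convolution.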
Interesting Point(s)
The Ghost module uses cheap linear transformations to produce additional (redundant) feature maps at low computational cost.
- Ghost module.
- Linear transformations.
Background / Problem Statement
- Traditional hand-crafted methods, such as FV and SIFT, have the advantage of accepting arbitrary input image sizes, and they pose no issues when transferring features across different domains, since low-level features are generic. However, these methods are composed of stacked self-contained algorithmic components (feature extraction, dictionary learning, encoding, classifier training) as visualized in Figure 1 (left, center). Consequently, the features and the encoders are fixed once built, so feature learning (CNNs and dictionaries) does not benefit from labeled data. The authors present a new approach (Figure 1, right) where the entire pipeline is learned in an end-to-end manner.
- Deep learning is well known for end-to-end learning of hierarchical features. This paper transfers that method to texture recognition, which needs a spatially invariant representation describing feature distributions instead of concatenation.
- The challenge is to make the loss function differentiable with respect to the inputs and layer parameters.
In one sentence: the goal is to better transfer deep-learning methods to texture recognition, since models in this field are usually pretrained on a large dataset (such as ImageNet).
Method(s)
Figure 2: An illustration of the convolutional layer and the proposed Ghost module outputting the same number of feature maps. $\Phi$ represents the cheap operation.
Ghost Module
We point out that it is unnecessary to generate these redundant feature maps one by one with a large number of FLOPs and parameters.
Suppose that the output feature maps are "ghosts" of a handful of intrinsic feature maps obtained with some cheap transformations. These intrinsic feature maps are often of smaller size and produced by ordinary convolution filters. Specifically, $m$ intrinsic feature maps $Y' \in \mathbb{R}^{h' \times w' \times m}$ are generated using a primary convolution:

$$Y' = X * f', \qquad (2)$$

where $f' \in \mathbb{R}^{c \times k \times k \times m}$ is the utilized filter set, $m \le n$, and the bias term is omitted for simplicity. The hyper-parameters such as filter size, stride, and padding are the same as those in the ordinary convolution (Eq. 1) to keep the spatial size (i.e. $h'$ and $w'$) of the output feature maps consistent. To further obtain the desired $n$ feature maps, we propose to apply a series of cheap linear operations to each intrinsic feature map in $Y'$ to generate $s$ ghost features according to the following function:
$$y_{ij} = \Phi_{i,j}(y'_i), \qquad \forall\, i = 1, \ldots, m,\quad j = 1, \ldots, s, \qquad (3)$$
where $y'_i$ is the $i$-th intrinsic feature map in $Y'$.
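Eqs. (2)-(3) can be sketched in NumPy. This is a minimal illustration, not the paper's implementation: the primary convolution is reduced to a 1×1 (pointwise) convolution, and a per-map affine transform stands in for the cheap operation $\Phi$ (the paper typically uses small depthwise convolutions, but names affine transforms among the low-cost alternatives):

```python
import numpy as np

def ghost_module(X, W_primary, s, rng):
    """Sketch of Eqs. (2)-(3).

    X: input of shape (h, w, c); W_primary: (c, m) pointwise filters.
    Returns n = m * s output feature maps of shape (h, w, m * s).
    """
    # Eq. (2): primary convolution producing m intrinsic feature maps
    Y_prime = np.tensordot(X, W_primary, axes=([2], [0]))  # (h, w, m)
    outputs = []
    for i in range(Y_prime.shape[2]):
        y_i = Y_prime[:, :, i]
        outputs.append(y_i)  # identity mapping keeps the intrinsic map itself
        # Eq. (3): s - 1 cheap linear operations Phi_{i,j} per intrinsic map
        for _ in range(s - 1):
            a, b = rng.standard_normal(2)
            outputs.append(a * y_i + b)  # illustrative affine "ghost"
    return np.stack(outputs, axis=2)

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 8, 4))    # h = w = 8, c = 4
W = rng.standard_normal((4, 3))       # m = 3 intrinsic maps
Y = ghost_module(X, W, s=2, rng=rng)  # n = m * s = 6 output maps
```

With $s = 2$, each intrinsic map yields itself plus one cheap ghost, doubling the channel count at negligible extra cost.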