Spindle Net: Person Re-identification with Human Body Region Guided Feature Decomposition and Fusion

Notes on the paper *Spindle Net: Person Re-identification with Human Body Region Guided Feature Decomposition and Fusion*.

1 Abstract

In this paper, the authors propose a new CNN, called Spindle Net, which consists of two parts: the Feature Extraction Net (FEN) and the Feature Fusion Net (FFN). The unique advantages of this network are: 1) it separately extracts semantic features from different body regions to capture both macro and micro body features; 2) it fuses these region features with a competitive scheme, so that discriminative features are well preserved.

2 Related work

Figure 1. The challenges of ReID. (a-b) show that we cannot judge two images by position information alone, which may lead to ambiguities. (c) shows the importance of detail information: these two images are similar but belong to different identities. (d) The occlusion problem.
The contributions of this paper include:

  1. It is the first time human body structure information is considered in a ReID pipeline, which allows describing many more features about different body regions and their detail information
  2. It designs a Spindle Net to handle the ReID task
  3. It proposes a new ReID evaluation dataset, i.e. SenseReID

3 Body Region Proposal Network

Here, the authors use a Region Proposal Network (RPN) together with a ReID pipeline to extract human body features. If you are not familiar with RPNs, you can refer to this link: RPN detail (Faster RCNN). Given an input image, the RPN generates seven rectangular region proposals representing seven sub-regions of the person's body: the head-shoulder region, the upper body region, the lower body region, two arm regions, and two leg regions.

  1. Body joint localization:
    A CNN outputs 14 response maps $F_i \in \mathbb{R}^{X \times Y}$, where $i \in [1, 14]$ and $X$, $Y$ denote the size of the feature map. The position of the largest value in each response map is taken as the final joint position:
    $$p_i = [x_i, y_i] = \arg\max_{x \in [1, X],\, y \in [1, Y]} F_i(x, y) \tag{1}$$
  2. Body region generation:
    The RPN produces 7 body sub-regions. Each sub-region corresponds to a body joint set $S \in \{S_1^A, S_2^A, S_3^A, S_1^B, S_2^B, S_3^B, S_4^B\}$ (14 joints in total) and a bounding box $\beta \in \{\beta_1^A, \beta_2^A, \beta_3^A, \beta_1^B, \beta_2^B, \beta_3^B, \beta_4^B\}$. Each bounding box $\beta$ is computed from its joint set as:
    $$\beta = [x_{min}, x_{max}, y_{min}, y_{max}] = \left[\min_{i \in S}(x_i), \max_{i \in S}(x_i), \min_{i \in S}(y_i), \max_{i \in S}(y_i)\right] \tag{2}$$
    Figure 2. Illustration of the Region Proposal Network. (a) One sample image and the fourteen body joints. (b) The fourteen body joints are assigned to seven sets. (c) The seven body sub-regions proposed by the RPN from the corresponding body joint sets.
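The two equations above can be sketched in a few lines of NumPy. This is a toy illustration, not the paper's implementation: the response maps are random, and the joint indices used for the example joint set are hypothetical.

```python
import numpy as np

def localize_joints(response_maps):
    """Eq. (1): take the argmax position of each response map
    as the predicted joint location p_i = [x_i, y_i]."""
    joints = []
    for F_i in response_maps:  # each F_i has shape (X, Y)
        x, y = np.unravel_index(np.argmax(F_i), F_i.shape)
        joints.append((x, y))
    return joints

def region_bbox(joints, joint_set):
    """Eq. (2): bounding box of one sub-region as the min/max of
    the x and y coordinates over its joint set S."""
    xs = [joints[i][0] for i in joint_set]
    ys = [joints[i][1] for i in joint_set]
    return min(xs), max(xs), min(ys), max(ys)

# toy example: 14 random response maps on a 24x24 grid
rng = np.random.default_rng(0)
maps = [rng.random((24, 24)) for _ in range(14)]
joints = localize_joints(maps)
# hypothetical joint set for one sub-region (indices are illustrative)
bbox = region_bbox(joints, [0, 1, 2])
```

In the real network the response maps come from the pose-estimation CNN rather than random noise, but the argmax and min/max logic is the same.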

4 Body Region Guided Spindle Net

Figure 3. The Spindle Net architecture. The left part is the Feature Extraction Net (FEN) and the right part is the Feature Fusion Net (FFN).

4.1 Feature Extraction Network (FEN)

As the left part of the Spindle Net pipeline above shows, the FEN is a multi-stage network composed of two ROI pooling stages and three CNN stages. Sub-region features are extracted at different stages: three macro region features after FEN-C1, and four micro region features after FEN-C2. Specifically, given an input image (96×96), FEN-C1 produces a full-body feature map $F_0^{c1}$; the three macro sub-regions proposed by the RPN are then pooled from $F_0^{c1}$ to obtain the corresponding feature maps. At the FEN-C2 stage, the same method generates the four micro sub-region feature maps. The FEN-C3 stage outputs eight 256-dimensional feature vectors.
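The ROI pooling step in this stage can be sketched as follows. This is a minimal stand-in, not the paper's operator: it crops a bounding box from a feature map and average-pools it into a fixed grid, and all shapes are toy assumptions.

```python
import numpy as np

def roi_pool(fmap, bbox, size=4):
    """Toy ROI pooling: crop bbox = (xmin, xmax, ymin, ymax) from a
    (C, H, W) feature map and average-pool it to (C, size, size)
    by uniform binning of the cropped region."""
    xmin, xmax, ymin, ymax = bbox
    region = fmap[:, ymin:ymax + 1, xmin:xmax + 1]
    c, h, w = region.shape
    ys = np.linspace(0, h, size + 1).astype(int)
    xs = np.linspace(0, w, size + 1).astype(int)
    out = np.zeros((c, size, size))
    for i in range(size):
        for j in range(size):
            cell = region[:, ys[i]:max(ys[i + 1], ys[i] + 1),
                             xs[j]:max(xs[j + 1], xs[j] + 1)]
            out[:, i, j] = cell.mean(axis=(1, 2))
    return out

fmap = np.random.rand(32, 24, 24)        # stand-in for a full-body map after FEN-C1
macro = roi_pool(fmap, (0, 11, 0, 23))   # e.g. one macro sub-region bounding box
```

Each sub-region feature map produced this way is then fed through the remaining shared CNN stages to yield its 256-dimensional feature vector.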
Figure 4. Two examples to demonstrate the effectiveness of the proposed sub-region features. (a) Two images of the same person. (b) Corresponding feature maps after FEN-C1. (c) Feature maps after FEN-P1. (d) Two similar persons. (e) Corresponding feature maps after FEN-C1. (f) Feature maps after FEN-P1. The average L2 distances between feature maps are also listed.

4.2 Feature Fusion Network (FFN)

FFN is shown in the right part of the Spindle Net architecture above. It is also a multi-stage network. A fusion unit has two steps: 1) it first merges the sub-region feature maps with an element-wise maximization operation; 2) it then applies a feature transformation through an inner-product (fully connected) layer. A tree-structured fusion strategy is used to fuse the different sub-region feature maps, which can be seen clearly in the network pipeline.
Figure 5. Illustration of feature fusion. Feature entries are sorted for better visualization. (a) Input image. (b-d) Three input feature vectors of the body fusion unit. The features of the head-shoulder region, the upper body region, and lower body region are marked in red, green and blue, respectively. (f) Result of the max operation. The head-shoulder features win 46.1% of the competition, much more than the other two region features in green and blue.
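The two steps of a fusion unit can be sketched as below. This is a minimal NumPy sketch under assumed shapes (256-d features, random weights), not the trained network; the "win" fraction it computes mirrors the competition statistic illustrated in Figure 5.

```python
import numpy as np

def fusion_unit(features, W, b):
    """One fusion unit as described above:
    1) element-wise max across the input sub-region feature vectors
       (the competitive scheme),
    2) feature transformation by an inner-product (fully connected) layer."""
    merged = np.maximum.reduce(features)  # element-wise max, shape (256,)
    return W @ merged + b                 # linear transformation

rng = np.random.default_rng(1)
head, upper, lower = (rng.standard_normal(256) for _ in range(3))
W, b = rng.standard_normal((256, 256)), np.zeros(256)
body = fusion_unit([head, upper, lower], W, b)

# fraction of entries "won" by the head-shoulder feature in the max competition
wins = np.mean(np.maximum.reduce([head, upper, lower]) == head)
```

Because the max operation keeps, for each entry, only the strongest response among the competing regions, discriminative entries from any one region survive the fusion, which is the property the paper relies on.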
