[Pang, 2011] Y. Pang, Y. Yuan, X. Li, et al. Efficient HOG human detection [J]. Signal Processing, 2011, 91: 773-781. Reading notes
We call this method cell-based trilinear interpolation.
To reduce the computational complexity, we develop a sub-cell-based trilinear interpolation.
The proposed method avoids unimportant interpolation by omitting the gradients whose locations are far from the cell (sub-cell) in question. Specifically, each cell (8×8 pixels) in a block (16×16 pixels) is divided into 4 sub-cells (each sub-cell has 4×4 pixels). We classify the sub-cells into 3 types: corner sub-cell, inner sub-cell, and semi-inner sub-cell. As illustrated in Fig. 4, a block contains 4 cells: C1, C2, C3, and C4 (see Fig. 4(a)), and each cell Ci contains 4 sub-cells: Ci1, Ci2, Ci3, and Ci4 (see Fig. 4(b)). Note from Fig. 4(c) that C11, C22, C33, and C44 are called corner sub-cells; C14, C23, C41, and C32 are called inner sub-cells; and C12, C21, C24, C42, C43, C34, C31, and C13 are called semi-inner sub-cells. In our algorithm the 3 types of sub-cells play different roles in computing the HOG features:
(1) Because the corner sub-cells are at the corners of a block and are far from the other 3 cells, the gradients in the corner sub-cells are used only to compute the histograms of their own cells. That is, the gradients in C11, C22, C33, and C44 contribute only to the histograms of C1, C2, C3, and C4, respectively.
(2) Because the inner sub-cells are near all 4 cells, we let the gradients in the inner sub-cells contribute to the histograms of all 4 cells.
(3) Each semi-inner sub-cell in a cell is a neighbor of exactly one other cell. Therefore, the gradients in each semi-inner sub-cell are involved in computing the histograms of its own cell and its neighbor cell. Take the semi-inner sub-cell C13 as an example: C13 is contained in C1 and is a neighbor of C3, so the gradients in C13 are used for computing the histograms of C1 and C3, but they do not affect the histograms of C2 and C4.
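The three roles above can be sketched as a small lookup. This is a minimal sketch, not the authors' code. It assumes that cells and sub-cells are both numbered row-major (1 = top-left, 2 = top-right, 3 = bottom-left, 4 = bottom-right), which is consistent with the examples in the text but is inferred from the description of Fig. 4 rather than stated outright. The names SubCellType, classify, and contributingCells are hypothetical.

```cpp
#include <cassert>
#include <vector>

enum class SubCellType { Corner, SemiInner, Inner };

// A block is 16x16 pixels = 4x4 sub-cells of 4x4 pixels each.
// Sub-cell rows/columns 1 and 2 border the block's centre lines.
static bool bordersCentre(int s) { return s == 1 || s == 2; }

// Classify a sub-cell by its (row, column) position in the 4x4 sub-cell grid.
SubCellType classify(int subRow, int subCol) {
    assert(subRow >= 0 && subRow < 4 && subCol >= 0 && subCol < 4);
    const bool r = bordersCentre(subRow);
    const bool c = bordersCentre(subCol);
    if (r && c) return SubCellType::Inner;      // near all 4 cells
    if (r || c) return SubCellType::SemiInner;  // near exactly one other cell
    return SubCellType::Corner;                 // far from the other 3 cells
}

// For a pixel (px, py) inside the block (px = column, py = row, both 0..15),
// return the 1-based indices of the cells whose histograms receive its gradient.
// Assumed numbering: 1 = C1 (top-left), 2 = C2 (top-right),
//                    3 = C3 (bottom-left), 4 = C4 (bottom-right).
std::vector<int> contributingCells(int px, int py) {
    assert(px >= 0 && px < 16 && py >= 0 && py < 16);
    const int cellRow = py / 8, cellCol = px / 8;
    const bool r = bordersCentre(py / 4);
    const bool c = bordersCentre(px / 4);
    std::vector<int> cells;
    cells.push_back(cellRow * 2 + cellCol + 1);                         // own cell
    if (c) cells.push_back(cellRow * 2 + (1 - cellCol) + 1);            // horizontal neighbor
    if (r) cells.push_back((1 - cellRow) * 2 + cellCol + 1);            // vertical neighbor
    if (r && c) cells.push_back((1 - cellRow) * 2 + (1 - cellCol) + 1); // diagonal cell
    return cells;
}
```

For instance, a pixel at column 0, row 4 lies in C13 under this numbering, and contributingCells(0, 4) lists C1 and C3, matching the example in item (3).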
3.3. Efficient implementation of pixel based interpolation
Now the question is how to compute the term hist(Cij) of Eq. (6). A straightforward way is to use Eqs. (1) and (2) to calculate each bin of hist(Cij). According to Eqs. (1) and (2), the contribution of the gradient at location (x,y) to orientation θ2 is given by
[Eqs. (8)-(11): equations not preserved in these notes]
Generally, it is time-consuming to compute Eq. (8) because the coefficients have to be calculated.
To speed up the computation, we propose to compute the coefficients a and b offline and save them in a look-up table. Given (x,y), the coefficients a and b can then be read from the look-up table. Note that the coefficient c cannot be obtained with the look-up-table trick because the value of θ(x,y) is continuous rather than discrete. Using the look-up table, computing Eq. (8) requires only 4 multiplications and 1 addition.
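The look-up-table idea can be illustrated as follows. This is a sketch under explicit assumptions, not the paper's Eq. (8), which is not reproduced in these notes. It assumes that a and b are linear spatial weights that depend only on the integer pixel position inside a 16×16 block, and that cell centres sit at pixel coordinates 3.5 and 11.5. The orientation weight c is computed at run time because θ is continuous, and orientation wrap-around is ignored for brevity. All names (SpatialLut, buildLut, orientationWeight, vote) are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cmath>

constexpr int kBlock = 16;  // block side in pixels (from the text)
constexpr int kCell = 8;    // cell side in pixels (from the text)

// Assumed weight form: linear falloff with distance from a centre, clamped at 0.
inline float linearWeight(float pos, float centre, float width) {
    const float w = 1.0f - std::fabs(pos - centre) / width;
    return w > 0.0f ? w : 0.0f;
}

struct SpatialLut {
    // a[cellCol][x] and b[cellRow][y]: spatial weights for every pixel offset.
    std::array<std::array<float, kBlock>, 2> a;
    std::array<std::array<float, kBlock>, 2> b;
};

// Offline step: tabulate a and b once for all 16 positions and both cells per axis.
SpatialLut buildLut() {
    SpatialLut lut{};
    const float centres[2] = {3.5f, 11.5f};  // assumed cell centres
    for (int k = 0; k < 2; ++k) {
        for (int p = 0; p < kBlock; ++p) {
            lut.a[k][p] = linearWeight(static_cast<float>(p), centres[k], kCell);
            lut.b[k][p] = linearWeight(static_cast<float>(p), centres[k], kCell);
        }
    }
    return lut;
}

// Online step: c depends on the continuous angle, so it cannot be tabulated.
inline float orientationWeight(float theta, float binCentre, float binWidth) {
    return linearWeight(theta, binCentre, binWidth);
}

// Contribution of one gradient to one orientation bin of one cell:
// two table reads replace the two spatial-weight computations.
float vote(const SpatialLut& lut, int x, int y, int cellCol, int cellRow,
           float magnitude, float theta, float binCentre, float binWidth) {
    assert(x >= 0 && x < kBlock && y >= 0 && y < kBlock);
    assert(cellCol >= 0 && cellCol < 2 && cellRow >= 0 && cellRow < 2);
    return magnitude * lut.a[cellCol][x] * lut.b[cellRow][y] *
           orientationWeight(theta, binCentre, binWidth);
}
```

With these assumed centres, the two weights along one axis sum to 1 for any pixel lying between the centres, which is the usual property of linear interpolation.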
It is worth noting that Wang et al. [27] proposed to estimate Eq. (8) by convolving the gradients at (x,y) with the following 7 by 7 kernel:
[Eq. (12): the 7 by 7 kernel is not preserved in these notes]
Throughout this paper we call this method the “convolution method” [27]. The convolution takes 50 multiplications and 49 additions. Even with the fast Fourier transform (FFT), the process takes more operations than our proposed look-up-table method. But the convolution method can be integrated with the integral-image trick [16], so it is faster than the traditional HOG method [10].
Trilinear interpolation implementation code:
// Trilinear interpolation
// Fast-implementation variant
// Relies on OpenCV's Mat and Point, plus parm, dirs, DIR, PI, getGaussMask()
// and normL2() defined elsewhere in the original project.
// votes layout: [0, DIR) cell (0,0); [DIR, 2*DIR) cell (0,1);
// [parm.block*DIR, ...) cell (1,0); [parm.block*DIR+DIR, ...) cell (1,1).
// The hard-coded 8 and 9 below assume DIR == 9.
void calHogBlock(Mat& grad_ang, Point btl, float *votes) // btl is the top-left coordinate of the block
{
    static double *gaussmask = getGaussMask();
    int x = btl.x;
    int y = btl.y;
    int index_x, index_y, index_z;
    for (int i = 0; i < parm.block * parm.cell; i++) // side length of one block
    {
        for (index_x = 0; index_x < 2; index_x++) // cell index along the first axis
        {
            if ((i - parm.mid[index_x]) < 0) // stop at the first midpoint beyond i
                break;
        }
        float *gpd = grad_ang.ptr<float>(x+i); // row pointer, offset from the block origin
        double *spd = gaussmask + i*16; // i*parm.block*parm.cell; pointer to the first mask entry of this row
        for (int j = 0; j < parm.block * parm.cell; j++)
        {
            float temp = gpd[(j + y) * 2]; // gradient magnitude
            float angle = gpd[(j + y) * 2 + 1]; // *2 is the number of channels
            for (index_y = 0; index_y < 2; index_y++) // cell index along the second axis
            {
                if ((j - parm.mid[index_y]) < 0)
                    break;
            }
            for (index_z = 0; index_z < DIR; index_z++) // orientation dimension
            {
                if ((angle - dirs[index_z]) < 0) // find the interval the angle falls into, then vote
                    break; // dirs[index_z] is the first boundary above angle, i.e. the upper bound of its interval
            }
            float weight = float(spd[j] * temp);
            float x_interpolate = (float)(i-parm.mid[0]) / parm.cell; // interpolation weight
            float y_interpolate = (float)(j-parm.mid[0]) / parm.cell; // interpolation weight
            // Guard the read: when index_z == DIR the value is unused and dirs[DIR] may be out of range.
            float z_interpolate = (index_z < DIR) ? (float)(dirs[index_z] - angle) * DIR/PI : 0.0f;
            if (index_x == 0)
            {
                if (index_y == 0)
                {
                    if (index_z == 0)
                    {
                        votes[0] += weight;
                    }
                    else if (index_z == 9)
                    {
                        votes[8] += weight;
                    }
                    else
                    {
                        votes[index_z-1] += weight * (1-z_interpolate);
                        votes[index_z] += weight * z_interpolate;
                    }
                }
                else if (index_y == 1)
                {
                    if (index_z == 0)
                    {
                        votes[0] += weight * y_interpolate;
                        votes[DIR] += weight * (1 - y_interpolate);
                    }
                    else if (index_z == 9)
                    {
                        votes[8] += weight * y_interpolate;
                        votes[DIR+8] += weight * (1 - y_interpolate);
                    }
                    else
                    {
                        votes[index_z-1] += weight * y_interpolate * (1-z_interpolate);
                        votes[index_z] += weight * y_interpolate * z_interpolate;
                        votes[DIR+index_z-1] += weight * (1 - y_interpolate) * (1 - z_interpolate);
                        votes[DIR+index_z] += weight * (1 - y_interpolate) * z_interpolate;
                    }
                }
                else // index_y == 2
                {
                    if (index_z == 0)
                    {
                        votes[DIR] += weight;
                    }
                    else if (index_z == 9)
                    {
                        votes[DIR+8] += weight;
                    }
                    else
                    {
                        votes[DIR+index_z-1] += weight * (1 - z_interpolate);
                        votes[DIR+index_z] += weight * z_interpolate;
                    }
                }
            }
            else if (index_x == 1)
            {
                if (index_y == 0)
                {
                    if (index_z == 0)
                    {
                        votes[0] += weight * x_interpolate;
                        votes[parm.block * DIR] += weight * (1 - x_interpolate);
                    }
                    else if (index_z == 9)
                    {
                        votes[8] += weight * x_interpolate;
                        votes[parm.block * DIR + 8] += weight * (1 - x_interpolate);
                    }
                    else
                    {
                        votes[index_z-1] += weight * x_interpolate * (1-z_interpolate);
                        votes[index_z] += weight * x_interpolate * z_interpolate;
                        votes[parm.block * DIR + index_z-1] += weight * (1-x_interpolate) * (1-z_interpolate);
                        votes[parm.block * DIR + index_z] += weight * (1-x_interpolate) * z_interpolate;
                    }
                }
                else if (index_y == 1)
                {
                    if (index_z == 0)
                    {
                        votes[0] += weight * x_interpolate * y_interpolate;
                        votes[DIR] += weight * x_interpolate * (1-y_interpolate);
                        votes[parm.block * DIR] += weight * (1-x_interpolate) * y_interpolate;
                        votes[parm.block * DIR + DIR] += weight * (1-x_interpolate) * (1-y_interpolate);
                    }
                    else if (index_z == 9)
                    {
                        votes[8] += weight * x_interpolate * y_interpolate;
                        votes[DIR + 8] += weight * x_interpolate * (1-y_interpolate);
                        votes[parm.block * DIR + 8] += weight * (1-x_interpolate) * y_interpolate;
                        votes[parm.block * DIR + DIR + 8] += weight * (1-x_interpolate) * (1-y_interpolate);
                    }
                    else
                    {
                        votes[index_z-1] += weight * x_interpolate * y_interpolate * (1-z_interpolate);
                        votes[index_z] += weight * x_interpolate * y_interpolate * z_interpolate;
                        votes[DIR+index_z-1] += weight * x_interpolate * (1-y_interpolate) * (1-z_interpolate);
                        votes[DIR+index_z] += weight * x_interpolate * (1-y_interpolate) * z_interpolate;
                        votes[parm.block * DIR+index_z-1] += weight * (1-x_interpolate) * y_interpolate * (1-z_interpolate);
                        votes[parm.block * DIR+index_z] += weight * (1-x_interpolate) * y_interpolate * z_interpolate;
                        votes[parm.block * DIR+DIR+index_z-1] += weight * (1-x_interpolate) * (1-y_interpolate) * (1-z_interpolate);
                        votes[parm.block * DIR+DIR+index_z] += weight * (1-x_interpolate) * (1-y_interpolate) * z_interpolate;
                    }
                }
                else
                {
                    if (index_z == 0)
                    {
                        votes[DIR] += weight * x_interpolate;
                        votes[parm.block * DIR + DIR] += weight * (1-x_interpolate);
                    }
                    else if (index_z == 9)
                    {
                        votes[8+DIR] += weight * x_interpolate;
                        votes[parm.block * DIR + DIR + 8] += weight * (1-x_interpolate);
                    }
                    else
                    {
                        votes[DIR+index_z-1] += weight * x_interpolate * (1-z_interpolate);
                        votes[DIR+index_z] += weight * x_interpolate * z_interpolate;
                        votes[parm.block * DIR+DIR+index_z-1] += weight * (1-x_interpolate) * (1-z_interpolate);
                        votes[parm.block * DIR+DIR+index_z] += weight * (1-x_interpolate) * z_interpolate;
                    }
                }
            }
            else
            {
                if (index_y == 0)
                {
                    if (index_z == 0)
                    {
                        votes[parm.block * DIR] += weight;
                    }
                    else if (index_z == 9)
                    {
                        votes[parm.block * DIR+8] += weight;
                    }
                    else
                    {
                        votes[parm.block * DIR+index_z-1] += weight * (1-z_interpolate);
                        votes[parm.block * DIR+index_z] += weight * z_interpolate;
                    }
                }
                else if (index_y == 1)
                {
                    if (index_z == 0)
                    {
                        votes[parm.block * DIR] += weight * y_interpolate;
                        votes[parm.block * DIR + DIR] += weight * (1-y_interpolate);
                    }
                    else if (index_z == 9)
                    {
                        votes[parm.block*DIR+8] += weight * y_interpolate;
                        votes[parm.block*DIR+DIR+8] += weight * (1-y_interpolate);
                    }
                    else
                    {
                        votes[parm.block*DIR+index_z-1] += weight * y_interpolate * (1-z_interpolate);
                        votes[parm.block*DIR+index_z] += weight * y_interpolate * z_interpolate;
                        votes[parm.block*DIR+DIR+index_z-1] += weight * (1-y_interpolate) * (1-z_interpolate);
                        votes[parm.block*DIR+DIR+index_z] += weight * (1-y_interpolate) * z_interpolate;
                    }
                }
                else
                {
                    if (index_z == 0)
                    {
                        votes[parm.block*DIR+DIR] += weight;
                    }
                    else if (index_z == 9)
                    {
                        // Fixed: the original wrote to +9, one past the last bin of the last cell.
                        votes[parm.block*DIR+DIR+8] += weight;
                    }
                    else
                    {
                        votes[parm.block*DIR+DIR+index_z-1] += weight * (1-z_interpolate);
                        votes[parm.block*DIR+DIR+index_z] += weight * z_interpolate;
                    }
                }
            }
        }
    }
    normL2(votes);
}