6D Pose Estimation, Solo from Zero (A Rookie's Paper-Reading Notes): Hashmod: A Hashing Method for Scalable 3D Object Detection...

To make 3D object detection scalable, we rely on an efficient representation of object views and employ hashing techniques to match these views against the input frame.
Our approach to 3D object detection is based on 2D view-specific templates which cover the appearance of the objects over multiple viewpoints. Since viewpoints include the whole object, they can generally handle objects with poor visual features; however, they have not so far been shown to scale well with the number of images. We apply hash functions to image descriptors computed over bounding boxes centered at each image location of the scene, so as to match them efficiently against a large descriptor database of model views. In our work, we rely on the LineMOD descriptor.
Hashing for Object Recognition and 3D Pose Estimation
Given \(M\) objects, we create \(N\) views for each object from poses regularly sampled on a hemisphere of a given radius. From these views, we compute a set \(D\) of \(d\)-dimensional binary descriptors: \(D=\{x_{1,1},...,x_{M,N}\}\), where \(x_{i,j}\in B^d\), with \(B=\{0,1\}\), is the descriptor for the \(i\)-th object seen under the \(j\)-th pose. We use LineMOD in practice to compute these descriptors.
As usually done in template-based approaches, we parse the image with a sliding window looking for the objects of interest. We extract at each image location the corresponding descriptor \(x\). If the distance between \(x\) and its nearest neighbor \(x_{i,j}\) in \(D\) is small enough, it is very likely that the image location contains object \(i\) under pose \(j\). We tackle the issue of object scale and views of different 2D sizes by dividing the views up into clusters \(D_s \subset D\) of similar scale \(s\).
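
The nearest-neighbor test described above can be sketched as a brute-force search, which is the baseline that hashing is meant to speed up. This is only an illustration under stated assumptions: descriptors are stored as Python integers, the distance is taken to be the Hamming distance, and the names `hamming`, `nearest_view`, and `max_dist` as well as the toy descriptor values are hypothetical, not taken from the paper.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two descriptors stored as ints."""
    return bin(a ^ b).count("1")


def nearest_view(x, database, max_dist):
    """Return ((i, j), dist) for the closest descriptor, or None if too far.

    `database` maps (object index i, pose index j) to a descriptor x_{i,j}.
    `max_dist` is an assumed acceptance threshold on the Hamming distance.
    """
    best_view, best_dist = None, None
    for view, desc in database.items():
        dist = hamming(x, desc)
        if best_dist is None or dist < best_dist:
            best_view, best_dist = view, dist
    if best_dist is not None and best_dist <= max_dist:
        return best_view, best_dist
    return None


# Toy database: 2 objects, 2 poses each, d = 8 bits (made-up values).
D = {
    (0, 0): 0b11110000,
    (0, 1): 0b11001100,
    (1, 0): 0b00001111,
    (1, 1): 0b10101010,
}

match = nearest_view(0b11110001, D, max_dist=2)  # one bit away from x_{0,0}
miss = nearest_view(0b01010101, D, max_dist=1)   # no stored view is close enough
```

The loop visits every descriptor in the database for every image location, so its cost grows linearly with the number of objects and views; this is the scaling problem the hashing scheme addresses.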

  1. Selecting the Hashing Keys
    Since the descriptors \(x\) are already binary strings, we design our hashing functions \(h(x)\) to return a short binary string made of \(b\) bits directly extracted from \(x\).
    Randomness-based selection: Given a set of descriptors, we select the \(b\) bits randomly among all \(d\) possible bits. Because some bits are more discriminant than others in our template representation, a random choice ignores this difference.
    Probability-based selection: We focus on the bits for which the probabilities of being 0 and 1 are close to 0.5 for a given set of descriptors. This strategy provides a high accuracy since it focuses on the most discriminant bits. However, it results in a high variance in the number of elements per bucket.
    Tree-based selection: Starting with a set of descriptors at the root, we determine the bit that splits this set into two subsets with sizes as equal as possible, and use it as the first bit of the key. For the second bit, we choose the one that splits those two subsets further into four equally sized subsets, and so forth. We stop if \(b\) bits have been selected or one subset becomes empty. The \(j\)-th bit \(B\) of the key is selected by solving: \({\arg\min}_B\sum_i\left||S^B_L(N_i)|-|S^B_R(N_i)|\right|\), where \(N_i \subset D\) is the set of descriptors contained in the \(i\)-th node at level \(j\), and \(S^B_L(N_i)\) and \(S^B_R(N_i)\) are the two subsets into which bit \(B\) splits \(N_i\).
    Tree-based selection with view scattering: To improve detection rates, we favor sending similar views of the same object into different branches. The idea behind this strategy is to reduce misdetections due to noise or clutter in the descriptor. We optimize the previous criterion with an additional term: \({\arg\min}_B\sum_i\left[\frac{1}{|N_i|}\left||S^B_L(N_i)|-|S^B_R(N_i)|\right|+\frac{1}{|N_i|^2}\left(P(S_L^B(N_i))+P(S_R^B(N_i))\right)\right]\), where the penalty \(P\) is defined by an equation given only as an image in the original post (1492311-20190309153148559-574466009.png). In that definition, \(\mathbb{I}(x,y)\) indicates whether descriptors \(x\) and \(y\) encode views of the same object, and \(q_x,q_y\) are the quaternions associated with the rotational part of the descriptors' poses.
  2. Remarks on the Implementation
    We disallowed all bits closer than \(T\) from being selected for the same LineMOD value. This forces the bit selection to take different values and positions into account.
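
The key-selection strategies from item 1 can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: descriptors are assumed to be Python integers, all function names (`bit`, `hash_key`, `probability_based`, `tree_based`, `build_table`) and the toy descriptor values are hypothetical, ties are broken by the lowest bit index, and the view-scattering penalty and the spacing constraint from item 2 are left out.

```python
from collections import defaultdict


def bit(x: int, pos: int) -> int:
    """Value (0 or 1) of bit `pos` in descriptor `x`."""
    return (x >> pos) & 1


def hash_key(x: int, positions) -> tuple:
    """h(x): the selected bits of x, in selection order."""
    return tuple(bit(x, p) for p in positions)


def probability_based(descriptors, d, b):
    """Pick the b bits whose empirical P(bit = 1) is closest to 0.5."""
    n = len(descriptors)

    def skew(p):
        ones = sum(bit(x, p) for x in descriptors)
        return abs(ones / n - 0.5)

    return sorted(range(d), key=skew)[:b]


def tree_based(descriptors, d, b):
    """Greedily pick bits that split every current node as evenly as possible."""
    nodes = [list(descriptors)]  # nodes at the current tree level
    chosen = []
    # Stop once b bits are chosen or some subset has become empty.
    while len(chosen) < b and all(nodes):
        def imbalance(p):
            # sum_i | |S_L(N_i)| - |S_R(N_i)| | for candidate bit p
            total = 0
            for node in nodes:
                ones = sum(bit(x, p) for x in node)
                total += abs((len(node) - ones) - ones)
            return total

        candidates = [p for p in range(d) if p not in chosen]
        if not candidates:
            break
        best = min(candidates, key=imbalance)
        chosen.append(best)
        # Split every node into its 0-branch and 1-branch for the next level.
        nodes = [[x for x in node if bit(x, best) == v]
                 for node in nodes for v in (0, 1)]
    return chosen


def build_table(database, positions):
    """Group views (i, j) into buckets keyed by h(x_{i,j})."""
    table = defaultdict(list)
    for view, desc in database.items():
        table[hash_key(desc, positions)].append(view)
    return dict(table)


# Toy database with d = 6 bits; the two highest bits are constant (made-up values).
database = {
    (0, 0): 0b110011,
    (0, 1): 0b110101,
    (1, 0): 0b111010,
    (1, 1): 0b111100,
}
descs = list(database.values())
positions = tree_based(descs, d=6, b=2)
table = build_table(database, positions)
hits = table.get(hash_key(0b110011, positions), [])
```

On this toy data both strategies skip the two constant bits, and the resulting two-bit key places each of the four views in its own bucket, so a lookup inspects one candidate instead of the whole database.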

Reposted from: https://www.cnblogs.com/LeeGoHigh/p/10501073.html
