Optimization

http://answers.opencv.org/question/755/object-detection-slow/#760


Haar features are inherently slow - they make extensive use of floating point operations, which are a bit slow on mobile devices.

A quick solution would be to turn to LBP cascades - all you need is a few lines changed in your code. The performance gain is significant, and the loss in accuracy is minimal. Look for lbpcascades/lbpcascade_frontalface.xml.

If you want to dig deeper into optimzations, here is a generic optimization tip list (cross-posted from SO) Please note that face detection, being one of the most requested features of OpenCV, is already quite optimized, so advancing it further may mean deep knowledge.

Advice for optimization

A. Profile your app. Do it first on your computer, since it is much easier. Use visual studio profiler, and see what functions take the most. Optimize them. Never ever optimize because you think is slow, but because you measure it. Start with the slowest function, optimize it as much as possible, then take the second slower.

B. First, focus on algorithms. A faster algorithm can improve performance with orders of magnitude (100x). A C++ trick will give you maybe 2x performance boost.

Classical techniques:

  • Resize you video frames to be smaller. many times, you can extract the information from a 200x300px image, instead of a 1024x768. The area of the first one is 10 times smaller.

  • Use simpler operations instead of complicated ones. Use integers instead of floats. And never usedouble in a matrix or a for loop that executes thousands of times.

  • Do as little calculation as possible. Can you track an object only in a specific area of the image, instead of processing it all for all the frames? Can you make a rough/approximate detection on a very small image and then refine it on a ROI in the full frame?

C. In for loops, it may make sense to use C style instead of C++. A pointer to data matrix or a float array is much faster than mat.at<i, j=""> or std::vector<>. But change only if it's needed. Usually, a lot of processing (90%) is done in some double for loop. Focus on it. It doesn't make sense to replace vector<> all over the place, ad make your code look like spaghetti.

D. Some OpenCV functions convert data to double, process it, then convert back to the input format. Beware of them, they kill performance on mobile devices. Examples: warping, scaling, type conversions. Also, color space conversions are known to be lazy. Prefer grayscale obtained directly from native YUV.

E. ARM processors have NEON. Learn and use it. It is powerfull!

A small example:

float* a, *b, *c;
// init a and b to 1000001 elements
for(int i=0;i<1000001;i++)
    c[i] = a[i]*b[i];

can be rewritten as follows. It's more verbose, but trust me it's faster.

float* a, *b, *c;
// init a and b to 1000001 elements
float32x4_t _a, _b, _c;
int i;
for(i=0;i<1000001;i+=4)
{  
    a_ = vld1q_f32( &a[i] ); // load 4 floats from a in a NEON register
    b_ = vld1q_f32( &b[i] );
    c_ = vmulq_f32(a_, b_); // perform 4 float multiplies in parrallel
    vst1q_f32( &c[i], c_); // store the four results in c
}
// the vector size is not always multiple of 4 or 8 or 16. 
// Process the remaining elements
for(;i<1000001;i++)
    c[i] = a[i]*b[i];

Purists say you must write in assembler, but for the regular programmer guy that's a bit daunting. I found good results writing with intrinsics, like in the above example.

Also check this blog post and the following posts about NEON.

And, last but not least, I should mention that I had very good success converting the SSE optimizations (this is the NEON counterpart in x86-64 processors) in OpenCV to NEON, like here. This is the image filtering code for uchar matrices (the regular image format). You should't blindly convert instructions one by one, because there are better ways to do it, but take it as an example to start with.

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值