20-lines AR in OpenCV [w/code]

Hi,

Just wanted to share a bit of code that uses OpenCV’s solvePnP (or its C counterpart, cvFindExtrinsicCameraParams2) to recover the camera’s extrinsic parameters, i.e. its position and rotation. I wanted a simple planar object surface recovery for augmented reality, but without using any of the AR libraries; I’d rather dig into some OpenCV and OpenGL code.
This can serve as a primer, or tutorial on how to use OpenCV with OpenGL for AR.

The program is just a straightforward optical flow based tracking, fed manually with four points which are the planar object’s corners, and solving camera-pose every frame. Plain vanilla AR.

Well, the whole cpp file is ~350 lines, but there will only be 20 or fewer interesting lines… Actually far fewer. Let’s see what’s up.

I wanna run you through the code really quickly and not go into much detail, to keep things simple. So first of all, we should have two separate threads: Vision and Graphics. The vision thread will track and solve, and the graphics thread will display.

Initialize

int main(int argc, char** argv) {
     initGL(argc,argv);
     initOCV(NULL);
     
     pthread_t tId;
     pthread_attr_t tAttr;
     pthread_attr_init(&tAttr);
     pthread_create(&tId, &tAttr, startOCV, NULL);
 
     startGL(NULL); 
}

The initGL, initOCV functions just initialize stuff that can’t be initialized statically, like GLUT window definitions, some starting values for the cam-pose estimation and other boring stuff.

GLUT runs on the main thread; putting it on its own thread seems to make it unhappy, and it stops working.
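The excerpts in this post don't show how the solved pose is handed from the vision thread to the graphics thread. Below is a minimal sketch of one way to do that hand-off safely. It uses std::thread and std::mutex rather than the pthreads calls above, and SharedPose, PoseMailbox and visionLoop are hypothetical names that do not appear in the original code.

```cpp
#include <array>
#include <cassert>
#include <functional>
#include <mutex>
#include <thread>

// Hypothetical container for the pose passed between the two threads.
struct SharedPose {
    std::array<double, 9> rot{};    // row-major 3x3 rotation
    std::array<double, 3> trans{};  // translation
    int frame = 0;                  // index of the frame this pose came from
};

// Hypothetical mailbox: the vision thread publishes, the graphics thread reads.
class PoseMailbox {
public:
    // Called by the vision thread after each pose solve.
    void publish(const SharedPose& p) {
        std::lock_guard<std::mutex> lock(mutex_);
        pose_ = p;
    }
    // Called by the graphics thread at the start of each redraw.
    SharedPose latest() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return pose_;
    }
private:
    mutable std::mutex mutex_;
    SharedPose pose_;
};

// Stand-in for the vision loop: publishes `frames` made-up poses, then returns.
void visionLoop(PoseMailbox& box, int frames) {
    assert(frames >= 0);
    for (int i = 1; i <= frames; ++i) {
        SharedPose p;
        p.rot = {1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0};
        p.trans = {0.0, 0.0, static_cast<double>(i)};
        p.frame = i;
        box.publish(p);
    }
}
```

The vision side would be started with something like std::thread t(visionLoop, std::ref(box), n), while the display callback takes a copy via box.latest() before building its matrix, so it never reads a half-written pose.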

Tracking

I’m using the simplest form of optical flow in OpenCV (LK Pyramid), and the code is equally minimal.

void* startOCV(void* arg) {
     while (1) {
         //previous frame to grayscale (img must already hold a frame on the first pass)
         cvtColor(img, prev, CV_BGR2GRAY);

         //get frame off camera, stop when the stream ends
         cap >> frame;
         if(frame.empty()) break;

         frame.copyTo(img);

         cvtColor(img, next, CV_BGR2GRAY);

         //calc optical flow: track points1 from prev into next, results land in points2
         calcOpticalFlowPyrLK(prev, next, points1, points2, status, err, Size(30,30));
         cvtPtoKpts(imgPointsOnPlane, points2);

         //switch points vectors (next becomes previous)
         points1 = points2;

         //calculate camera pose
         getPlanarSurface(points1);

         //refresh 3D scene
         glutPostWindowRedisplay(glutwin);

         //show tracked points on scene
         drawKeypoints(next, imgPointsOnPlane, img_to_show, Scalar(255));
         imshow("main2", img_to_show);
         //space pauses until the next key press
         int c = waitKey(30);
         if (c == ' ') {
             waitKey(0);
         }
     }
     return NULL;
}

To use OpenCV’s ‘drawKeypoints’, which makes drawing key points much easier, we must use vector<KeyPoint>. So I created these 2 very simple converter funcs: cvtKeyPtoP and cvtPtoKpts.

You think ‘getPlanarSurface’ is complicated? Think again! 3 lines:

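void getPlanarSurface(vector<Point2f>& imgP) {
     //last frame's rotation matrix -> rotation vector, used as the starting guess
     Rodrigues(rotM,rvec);

     //solve for the pose; the final 'true' is useExtrinsicGuess, so rvec/tvec seed the solver
     solvePnP(objPM, Mat(imgP), camera_matrix, distortion_coefficients, rvec, tvec, true);

     //rotation vector -> 3x3 rotation matrix for the OpenGL side
     Rodrigues(rvec,rotM);
}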

Booya! Vision stuff is done.
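For readers wondering what the two Rodrigues calls convert between: a rotation vector points along the rotation axis and its length is the rotation angle in radians. The sketch below implements the standard Rodrigues formula in plain C++, with no OpenCV dependency, just to show the math. The names Vec3, Mat3 and rotationVectorToMatrix are made up for this illustration and are not OpenCV's API.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<double, 9>;  // row-major 3x3

// Rodrigues formula: R = cos(t) I + (1 - cos(t)) k k^T + sin(t) [k]x,
// where t = |r| is the angle and k = r / t is the unit rotation axis.
Mat3 rotationVectorToMatrix(const Vec3& r) {
    const double theta = std::sqrt(r[0] * r[0] + r[1] * r[1] + r[2] * r[2]);
    assert(std::isfinite(theta));  // reject NaN or infinite input

    // A (near) zero vector means no rotation at all.
    if (theta < 1e-12) {
        return {1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0};
    }

    const double kx = r[0] / theta, ky = r[1] / theta, kz = r[2] / theta;
    const double c = std::cos(theta), s = std::sin(theta), v = 1.0 - c;

    return {
        c + v * kx * kx,       v * kx * ky - s * kz,  v * kx * kz + s * ky,
        v * ky * kx + s * kz,  c + v * ky * ky,       v * ky * kz - s * kx,
        v * kz * kx - s * ky,  v * kz * ky + s * kx,  c + v * kz * kz
    };
}
```

For example, the vector (0, 0, pi/2) is a quarter turn around the z axis, so the resulting matrix maps the x axis onto the y axis.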

3D Graphics

A little 3D never hurt any AR system… But drawing it is very simple still:

void display(void)
{
     glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
 
     //Make sure we have a background image buffer
     if(img_to_show.data != NULL) {
         Mat tmp;
         
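         //Switch to Ortho for drawing background
         //NOTE: gluOrtho2D takes (left, right, bottom, top), so this call passes left == right.
         //OpenGL rejects that with GL_INVALID_VALUE and leaves the projection untouched.
         //The intended call was probably gluOrtho2D(0.0, 640.0, 0.0, 480.0), but changing it
         //would also mean retuning the glTranslated call below.
         glMatrixMode(GL_PROJECTION);
         glPushMatrix();
         gluOrtho2D(0.0, 0.0, 640.0, 480.0);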
         
         glMatrixMode(GL_MODELVIEW);
         
         //Textures can only have power-of-two dimensions, so closest to 640x480 is 1024x512
         tmp = Mat(Size(1024,512),CV_8UC3);
         //However we are going to use only a portion, so create an ROI
         Mat ttmp = tmp(Range(0,img_to_show.rows),Range(0,img_to_show.cols));
 
         //Some frames could be 8bit grayscale, so make sure on the output we always get 24bit RGB.
         if(img_to_show.step == img_to_show.cols)
             cvtColor(img_to_show, ttmp, CV_GRAY2RGB);
         else if(img_to_show.step == img_to_show.cols * 3)
             cvtColor(img_to_show, ttmp, CV_BGR2RGB);
         flip(ttmp,ttmp,0);
         
         glEnable(GL_TEXTURE_2D);
         glTexImage2D(GL_TEXTURE_2D, 0, 3, 1024, 512, 0, GL_RGB, GL_UNSIGNED_BYTE, tmp.data);
         
         //Finally, draw the texture using a simple quad with texture coords in corners.
         glPushMatrix();
         glTranslated(-320.0, -240.0, -500.0);//why these parameters?!
         glBegin(GL_QUADS);
         glTexCoord2i(0, 0); glVertex2i(0, 0);
         glTexCoord2i(1, 0); glVertex2i(640, 0);
         glTexCoord2i(1, 1); glVertex2i(640, 480);
         glTexCoord2i(0, 1); glVertex2i(0, 480);
         glEnd();
         glPopMatrix();
 
         glMatrixMode(GL_PROJECTION);
         glPopMatrix();
         glMatrixMode(GL_MODELVIEW);
     }
     
     glPushMatrix();
     double m[16] = {    _d[0],-_d[3],-_d[6],0,
                         _d[1],-_d[4],-_d[7],0,
                         _d[2],-_d[5],-_d[8],0,
                         tv[0],-tv[1],-tv[2],1   };
     
     //Rotate and translate according to result from solvePnP
     glLoadMatrixd(m);
     
     //Draw a basic cube
     glDisable(GL_TEXTURE_2D);
     glColor3ub(255, 0, 0); //unsigned bytes: 255 does not fit glColor3b's signed range
     glutSolidCube(1);
     glPopMatrix();
 
     glutSwapBuffers();
}

Not so horrific, huh? Most of it is drawing the background texture, and that’s only trying to avoid using glDrawPixels… The only interesting thing is loading the rotation and translation matrix.
Notice, however, that tv[0] (the x component of the translation) has no minus sign while tv[1] and tv[2] do. That’s because OpenCV’s camera model, which solvePnP uses, looks down the +z axis with +y pointing down in the image, while OpenGL’s camera looks down the -z axis with +y pointing up, so a 180-degree rotation around the x axis is needed. That rotation negates the y and z components and leaves x alone. Same goes for _d[0], _d[1] and _d[2].
With an identity model-view matrix OpenGL already looks down the -z axis, where +x goes right and +y goes up, and initGL sets the view up to look “normally” in that same way.
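The sign pattern in the m[16] array can be packaged as a small helper. This is only a sketch in plain C++: poseToGLModelView and glElement are hypothetical names, and it assumes (as the display code suggests) that the rotation is stored row-major, the way OpenCV lays out a 3x3 Mat.

```cpp
#include <array>
#include <cassert>

// Build a column-major OpenGL model-view matrix from an OpenCV-style pose.
// R is a row-major 3x3 rotation and t a translation, as solvePnP plus
// Rodrigues would give them. The y and z rows are negated, which is the
// 180-degree rotation around the x axis between the two conventions.
std::array<double, 16> poseToGLModelView(const std::array<double, 9>& R,
                                         const std::array<double, 3>& t) {
    // Each line below is one COLUMN of the 4x4 matrix.
    return {
        R[0], -R[3], -R[6], 0.0,
        R[1], -R[4], -R[7], 0.0,
        R[2], -R[5], -R[8], 0.0,
        t[0], -t[1], -t[2], 1.0
    };
}

// Element (row, col) of a column-major 4x4 matrix.
double glElement(const std::array<double, 16>& m, int row, int col) {
    assert(row >= 0 && row < 4 && col >= 0 && col < 4);
    return m[col * 4 + row];
}
```

With an identity rotation and t = (1, 2, 3), the translation column comes out as (1, -2, -3): an object 3 units in front of the OpenCV camera sits at z = -3 for OpenGL, which is in front of its camera too.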

Proof time

Not that you need it.. :) But here’s a video of it working.

Demo:

http://www.youtube.com/watch?feature=player_embedded&v=OxBa_5HvZyI


BTW: If anyone can solve the problem of the slight misalignment of the 3D and image – let me know.
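One common cause of this kind of misalignment is an OpenGL projection that doesn't match the camera intrinsics handed to solvePnP. That is only a guess here, since the post doesn't show the projection that initGL sets up. Below is a sketch of building a matching projection matrix from fx, fy, cx and cy, which are assumed to be the entries of camera_matrix. The function names are hypothetical, lens distortion is ignored, and so is the half-pixel offset of pixel centers.

```cpp
#include <array>
#include <cassert>

// Column-major OpenGL projection matrix matching a pinhole camera with focal
// lengths (fx, fy) and principal point (cx, cy), in pixels, for a
// width x height image whose origin is the top-left corner.
std::array<double, 16> projectionFromIntrinsics(double fx, double fy,
                                                double cx, double cy,
                                                double width, double height,
                                                double zNear, double zFar) {
    assert(width > 0 && height > 0 && zNear > 0 && zFar > zNear);
    std::array<double, 16> p{};  // all zeros
    p[0]  = 2.0 * fx / width;
    p[5]  = 2.0 * fy / height;
    p[8]  = 1.0 - 2.0 * cx / width;    // horizontal principal point offset
    p[9]  = 2.0 * cy / height - 1.0;   // vertical offset (image y points down)
    p[10] = -(zFar + zNear) / (zFar - zNear);
    p[11] = -1.0;
    p[14] = -2.0 * zFar * zNear / (zFar - zNear);
    return p;
}

// Check helper: push an OpenCV camera-space point through the matrix and the
// viewport transform, returning pixel (u, v) with v measured from the top.
std::array<double, 2> projectToPixel(const std::array<double, 16>& p,
                                     double xc, double yc, double zc,
                                     double width, double height) {
    const double x = xc, y = -yc, z = -zc;  // OpenCV camera -> OpenGL eye
    const double clipX = p[0] * x + p[4] * y + p[8] * z + p[12];
    const double clipY = p[1] * x + p[5] * y + p[9] * z + p[13];
    const double clipW = p[3] * x + p[7] * y + p[11] * z + p[15];
    const double ndcX = clipX / clipW;
    const double ndcY = clipY / clipW;
    return {(ndcX + 1.0) * 0.5 * width,
            height - (ndcY + 1.0) * 0.5 * height};
}
```

The matrix would be loaded with glMatrixMode(GL_PROJECTION) followed by glLoadMatrixd(p.data()). If the cube then lands where the pinhole model u = fx * X / Z + cx, v = fy * Y / Z + cy predicts, the projection is no longer the source of the offset.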

Code and Salutations

Code can be downloaded from the blog’s SVN:

svn checkout http://morethantechnical.googlecode.com/svn/trunk/OpenCVAR morethantechnical-OpenCVAR

Now let your imagination run wild!

Farewell,
Roy.


Reposted from: http://www.morethantechnical.com/2010/11/10/20-lines-ar-in-opencv-wcode/

