Step by Step Camera Pose Estimation for Visual Tracking and Planar Markers

Latest recommended article published 2023-09-16 20:22:58

(account deactivated)



Article tags: calibration

Reposted from a Stack Exchange Q&A about camera pose estimation. The explanation is simple and clear, so I am posting it here.

Source: http://dsp.stackexchange.com/questions/2736/step-by-step-camera-pose-estimation-for-visual-tracking-and-planar-markers


It is important to understand that the only problem here is to obtain the extrinsic parameters. Camera intrinsics can be measured off-line, and many tools exist for that purpose.

What are camera intrinsics?

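The camera intrinsic parameters are usually collected in the camera calibration matrix, K. We can write

$$K = \begin{bmatrix} \alpha_u & s & u_0 \\ 0 & \alpha_v & v_0 \\ 0 & 0 & 1 \end{bmatrix}$$

where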

$\alpha_u$ and $\alpha_v$ are the scale factors in the u and v coordinate directions, and are proportional to the focal length f of the camera: $\alpha_u = k_u f$ and $\alpha_v = k_v f$. $k_u$ and $k_v$ are the number of pixels per unit distance in the u and v directions.
$c = [u_0, v_0]^T$ is called the principal point, usually the coordinates of the image center.
s is the skew, non-zero only if the u and v axes are not perpendicular.
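As a minimal sketch of these definitions, the following builds K from a focal length and pixel densities and projects a point given in camera coordinates to pixel coordinates. The helper name `calibration_matrix` and all numeric values (a 4 mm lens, 200 pixels per mm, a 640×480 image) are assumptions chosen for illustration, not values from the original answer.

```python
import numpy as np

def calibration_matrix(f, k_u, k_v, u0, v0, s=0.0):
    """Build the 3x3 calibration matrix K (hypothetical helper)."""
    alpha_u = k_u * f  # scale factor along u, in pixels
    alpha_v = k_v * f  # scale factor along v, in pixels
    return np.array([[alpha_u, s,       u0],
                     [0.0,     alpha_v, v0],
                     [0.0,     0.0,     1.0]])

# Assumed example values: f in mm, k_u and k_v in pixels per mm.
K = calibration_matrix(f=4.0, k_u=200.0, k_v=200.0, u0=320.0, v0=240.0)

# Project a point expressed in camera coordinates (X, Y, Z) to pixels.
p_cam = np.array([0.1, -0.05, 2.0])
uvw = K @ p_cam            # homogeneous image coordinates
u, v = uvw[:2] / uvw[2]    # divide by the third component
print(u, v)
```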

A camera is calibrated when its intrinsics are known. This can be done easily, so it is not considered a goal in computer vision but a trivial off-line step.

Some links:

ftp://svr-ftp.eng.cam.ac.uk/pub/reports/mendonca_self-calibration.pdf

What are camera extrinsics?

The camera extrinsics, or external parameters, [R|t] form a 3×4 matrix that corresponds to the Euclidean transformation from the world coordinate system to the camera coordinate system. R is a 3×3 rotation matrix and t is a 3×1 translation vector.
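To make the role of [R|t] concrete, here is a small sketch that maps a world point into camera coordinates. The rotation (90° about the Z axis), the translation, and the helper `rotation_z` are assumed values and names for illustration only.

```python
import numpy as np

def rotation_z(theta):
    """Rotation by theta radians about the Z axis (hypothetical helper)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c,  -s,  0.0],
                     [s,   c,  0.0],
                     [0.0, 0.0, 1.0]])

R = rotation_z(np.pi / 2)             # assumed rotation
t = np.array([0.0, 0.0, 5.0])         # assumed translation
Rt = np.hstack([R, t.reshape(3, 1)])  # the 3x4 matrix [R|t]

# A world point in homogeneous coordinates (X, Y, Z, 1).
M_world = np.array([1.0, 0.0, 0.0, 1.0])

# Same result as R @ M_world[:3] + t.
M_cam = Rt @ M_world
print(M_cam)
```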

Computer-vision applications focus on estimating this matrix.

How do I compute homography from a planar marker?

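A homography is a homogeneous 3×3 matrix that relates a 3D plane to its image projection. If the plane is Z=0, a point $M = (X, Y, 0)^T$ on this plane and its corresponding 2D point m under the projection P=K[R|t] are related by

$$\tilde{m} = K \begin{bmatrix} R_1 & R_2 & R_3 & t \end{bmatrix} \begin{bmatrix} X \\ Y \\ 0 \\ 1 \end{bmatrix} = K \begin{bmatrix} R_1 & R_2 & t \end{bmatrix} \begin{bmatrix} X \\ Y \\ 1 \end{bmatrix}$$

where $R_1$, $R_2$, $R_3$ are the columns of R and $\tilde{m}$ is m in homogeneous coordinates. The homography that maps points on the plane to the image is therefore

$$H = K \begin{bmatrix} R_1 & R_2 & t \end{bmatrix}$$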
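As a quick numerical check of the relation H = K[R1 R2 t] for points on the plane Z=0, the sketch below projects the same point once with the full 3×4 matrix P and once with the 3×3 homography. The intrinsics and pose are assumed values, not taken from the original answer.

```python
import numpy as np

# Assumed intrinsics and pose, for illustration only.
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
theta = 0.3  # rotation about the X axis, in radians
c, s = np.cos(theta), np.sin(theta)
R = np.array([[1.0, 0.0, 0.0],
              [0.0,   c,  -s],
              [0.0,   s,   c]])
t = np.array([0.1, -0.2, 3.0])

P = K @ np.hstack([R, t.reshape(3, 1)])          # 3x4 projection matrix
H = K @ np.column_stack([R[:, 0], R[:, 1], t])   # 3x3 homography for Z = 0

X, Y = 0.5, 0.25
m_P = P @ np.array([X, Y, 0.0, 1.0])
m_H = H @ np.array([X, Y, 1.0])

# Compare after dividing out the homogeneous scale.
m_P = m_P[:2] / m_P[2]
m_H = m_H[:2] / m_H[2]
print(m_P, m_H)
```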

To compute the homography we need point correspondences between the marker plane and the image. If we have a planar marker, we can process a reference image of it to extract features, and then detect those features in the scene to obtain matches.

At least 4 pairs, with no three points collinear, are enough to compute the homography using the Direct Linear Transform (DLT).
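The DLT step can be sketched in a few lines of NumPy. This is a basic version that assumes exact or low-noise correspondences and skips the coordinate normalization usually recommended for real data; the function name `homography_dlt` and the test values are hypothetical. Each point pair contributes two linear equations in the nine entries of H, and the solution is the right singular vector with the smallest singular value.

```python
import numpy as np

def homography_dlt(src, dst):
    """Estimate H such that dst ~ H @ src from N >= 4 point pairs (basic DLT)."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    if src.shape != dst.shape or src.shape[0] < 4:
        raise ValueError("need at least 4 matching 2D point pairs")
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # Two equations per correspondence, linear in the entries of H.
        rows.append([-x, -y, -1.0, 0.0, 0.0, 0.0, u * x, u * y, u])
        rows.append([0.0, 0.0, 0.0, -x, -y, -1.0, v * x, v * y, v])
    A = np.array(rows)
    # The null vector of A is the last row of Vt.
    _, _, Vt = np.linalg.svd(A)
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]  # fix the free scale; assumes H[2, 2] is not zero

# Round-trip check with an assumed homography and a unit-square marker.
H_true = np.array([[ 1.2,   0.1,   5.0],
                   [-0.2,   0.9,   3.0],
                   [ 0.001, 0.002, 1.0]])
marker = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
image = []
for x, y in marker:
    p = H_true @ np.array([x, y, 1.0])
    image.append((p[0] / p[2], p[1] / p[2]))

H_est = homography_dlt(marker, image)
print(np.round(H_est, 4))
```

With noisy feature matches, more than 4 pairs and a robust estimator are usually preferable to this minimal version.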

If I have homography how can I get the camera pose?

The homography H and the projection matrix K[R|t] contain the same information, and it is easy to pass from one to the other. The last column of both is K times the translation vector, and columns one and two of H are K times columns one and two of R. With K known, $K^{-1}H$ is therefore proportional to $[R_1 \; R_2 \; t]$, and dividing by the norm of its first column removes the unknown scale. Only column three, $R_3$, of [R|t] is left, and since it has to be orthogonal to the other two it can be computed as the cross product of columns one and two:

$$R_3 = R_1 \times R_2$$
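A minimal sketch of this pose recovery, assuming K is known and the marker lies in front of the camera (so the Z component of t is positive). The function name `pose_from_homography` and the numeric values are hypothetical. The final SVD step is an addition not described in the text: with a noisy H the three columns are only approximately orthonormal, and projecting onto the nearest rotation matrix is one common way to clean that up.

```python
import numpy as np

def pose_from_homography(H, K):
    """Recover (R, t) from H ~ K [R1 R2 t], assuming the marker is in front of the camera."""
    B = np.linalg.inv(K) @ H              # proportional to [R1 R2 t]
    scale = 1.0 / np.linalg.norm(B[:, 0])
    if B[2, 2] < 0:                       # choose the sign that gives t_z > 0
        scale = -scale
    r1 = B[:, 0] * scale
    r2 = B[:, 1] * scale
    t = B[:, 2] * scale
    r3 = np.cross(r1, r2)                 # third column from orthogonality
    R = np.column_stack([r1, r2, r3])
    # Project onto the nearest rotation matrix (no effect if R is already exact).
    U, _, Vt = np.linalg.svd(R)
    return U @ Vt, t

# Round-trip check with assumed intrinsics and pose.
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
theta = 0.3
c, s = np.cos(theta), np.sin(theta)
R = np.array([[1.0, 0.0, 0.0],
              [0.0,   c,  -s],
              [0.0,   s,   c]])
t = np.array([0.1, -0.2, 3.0])

# A homography is only defined up to scale, so apply an arbitrary factor.
H = -2.5 * (K @ np.column_stack([R[:, 0], R[:, 1], t]))

R_est, t_est = pose_from_homography(H, K)
print(np.round(R_est, 4), np.round(t_est, 4))
```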