It is important to understand that the only problem here is obtaining the extrinsic parameters. Camera intrinsics can be measured off-line, and many tools exist for that purpose.
What are camera intrinsics?
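The camera intrinsic parameters are usually collected in the camera calibration matrix $K$. We can write

$$K = \begin{bmatrix} \alpha_u & s & u_0 \\ 0 & \alpha_v & v_0 \\ 0 & 0 & 1 \end{bmatrix}$$

where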
- $\alpha_u$ and $\alpha_v$ are the scale factors in the $u$ and $v$ coordinate directions. They are proportional to the focal length $f$ of the camera: $\alpha_u = k_u f$ and $\alpha_v = k_v f$, where $k_u$ and $k_v$ are the number of pixels per unit distance in the $u$ and $v$ directions.
- $c = [u_0, v_0]^T$ is the principal point, where the optical axis meets the image plane; it is usually close to the image center.
- $s$ is the skew, which is non-zero only if the $u$ and $v$ axes are not perpendicular.
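As a small illustration of these definitions, the sketch below builds $K$ from a focal length and pixel densities and projects a point that is already expressed in camera coordinates. The helper names `calibration_matrix` and `project_camera_point` and the example numbers (a 4 mm lens, 250 pixels/mm, a 640×480 image) are hypothetical; plain Python lists are used so the example has no dependencies.

```python
def calibration_matrix(f, k_u, k_v, u0, v0, s=0.0):
    """Build K from focal length f, pixel densities k_u, k_v (pixels per unit
    distance), principal point (u0, v0) and skew s."""
    alpha_u = k_u * f  # scale factor along u, in pixels
    alpha_v = k_v * f  # scale factor along v, in pixels
    return [[alpha_u, s, u0],
            [0.0, alpha_v, v0],
            [0.0, 0.0, 1.0]]


def project_camera_point(K, Xc):
    """Project a 3D point given in camera coordinates to pixel coordinates (u, v)."""
    x = [sum(K[i][j] * Xc[j] for j in range(3)) for i in range(3)]
    return x[0] / x[2], x[1] / x[2]  # divide by the third homogeneous coordinate


# Hypothetical camera: 4 mm lens, 250 pixels/mm on both axes, no skew.
K = calibration_matrix(f=4.0, k_u=250.0, k_v=250.0, u0=320.0, v0=240.0)
print(K[0][0], K[1][1])  # 1000.0 1000.0
print(project_camera_point(K, [0.0, 0.0, 5.0]))  # on the optical axis -> principal point
```

A point on the optical axis lands exactly on the principal point $(u_0, v_0)$, whatever its depth.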
A camera is said to be calibrated when its intrinsics are known. Calibration is easy to perform, so here it is not considered a goal in itself but a routine off-line step.
What are camera extrinsics?
Camera extrinsics, or external parameters, $[R|t]$ form a $3\times4$ matrix that corresponds to the Euclidean transformation from a world coordinate system to the camera coordinate system. $R$ is a $3\times3$ rotation matrix and $t$ a translation vector.
Marker-based pose estimation, like many other computer-vision applications, focuses on estimating this matrix.
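To make the role of $[R|t]$ concrete, here is a minimal sketch that stacks a rotation and a translation into the $3\times4$ extrinsic matrix, forms the projection matrix $P = K[R|t]$, and projects a world point. The rotation about the $z$ axis, the intrinsic values, and all function names are hypothetical choices for illustration only.

```python
import math


def rotation_z(theta):
    """Rotation by theta radians about the z axis (an illustrative choice of R)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def extrinsic_matrix(R, t):
    """Append the translation t as a fourth column to R, giving the 3x4 matrix [R|t]."""
    return [list(R[i]) + [t[i]] for i in range(3)]


def matmul(A, B):
    """Plain-Python matrix product A @ B."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def project_world_point(P, Xw):
    """Project a world point (X, Y, Z) to pixels with a 3x4 projection matrix P."""
    Xh = list(Xw) + [1.0]  # homogeneous world point (X, Y, Z, 1)
    x = [sum(P[i][j] * Xh[j] for j in range(4)) for i in range(3)]
    return x[0] / x[2], x[1] / x[2]


# Hypothetical setup: world origin 4 units in front of the camera, rotated 90 degrees about z.
K = [[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]]
Rt = extrinsic_matrix(rotation_z(math.pi / 2), [0.0, 0.0, 4.0])
P = matmul(K, Rt)  # 3x4 projection matrix P = K[R|t]
print(project_world_point(P, [0.0, 0.0, 0.0]))  # (320.0, 240.0): the origin lies on the optical axis
```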
How do I compute the homography from a planar marker?
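A homography is a homogeneous $3\times3$ matrix that relates points on a 3D plane to their image projections. If we place the world coordinate system so that the plane is $Z=0$, a point $M=(X,Y,0)^T$ on this plane and its corresponding 2D point $m=(u,v)^T$ under the projection $P=K[R|t]$ satisfy

$$w \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = K\,[R_1\; R_2\; R_3\; t] \begin{bmatrix} X \\ Y \\ 0 \\ 1 \end{bmatrix} = K\,[R_1\; R_2\; t] \begin{bmatrix} X \\ Y \\ 1 \end{bmatrix},$$

where $R_i$ is the $i$-th column of $R$ and $w$ is a non-zero scale factor. The column $R_3$ drops out because $Z=0$, so the homography is

$$H = K\,[R_1\; R_2\; t].$$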
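For points on the plane $Z=0$, the third column of $R$ is multiplied by zero, so $H = K[R_1\; R_2\; t]$ should map $(X, Y, 1)$ to the same pixel that the full projection $K[R|t]$ gives for $(X, Y, 0, 1)$. The sketch below checks this numerically for one hypothetical pose (a 30° tilt about the $x$ axis, marker 5 units away); the helper names are illustrative, not from any library.

```python
import math


def matvec(A, x):
    """Multiply matrix A (list of rows) by vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def dehomogenize(x):
    return x[0] / x[2], x[1] / x[2]


def homography_from_pose(K, R, t):
    """H = K [R_1 R_2 t]; the column R_3 is dropped because Z = 0 on the plane."""
    B = [[R[i][0], R[i][1], t[i]] for i in range(3)]
    return [[sum(K[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


# Hypothetical pose and intrinsics.
K = [[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]]
a = math.radians(30.0)
R = [[1.0, 0.0, 0.0],
     [0.0, math.cos(a), -math.sin(a)],
     [0.0, math.sin(a), math.cos(a)]]
t = [0.1, -0.2, 5.0]
H = homography_from_pose(K, R, t)

X, Y = 0.3, 0.7  # a point on the marker plane Z = 0
Xc = [r + ti for r, ti in zip(matvec(R, [X, Y, 0.0]), t)]  # camera coordinates R M + t
m_full = dehomogenize(matvec(K, Xc))          # full projection with P = K[R|t]
m_hom = dehomogenize(matvec(H, [X, Y, 1.0]))  # same point mapped through H
print(m_full, m_hom)  # the two pixel positions agree up to rounding
```

Because only the ratio of homogeneous coordinates matters, multiplying $H$ by any non-zero constant maps the point to the same pixel.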
To compute the homography we need pairs of corresponding points: one on the world plane and one in the camera image. If we have a planar marker, we can process an image of it to extract features and then detect those features in the scene image to obtain matches.
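At least 4 pairs, no three of which are collinear, are needed to compute the homography with the Direct Linear Transform (DLT); with more pairs, the DLT gives a least-squares estimate.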
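For exactly four correspondences, a minimal variant of the DLT fixes $h_{33} = 1$ and solves the remaining 8 unknowns from an $8\times8$ linear system (two equations per point pair). This is a simplification, assumed here for brevity: it fails if the true $h_{33}$ is zero, and the general DLT instead takes the null vector of a $2n\times9$ matrix via SVD, which also handles more than four points. In practice a library routine such as OpenCV's `cv2.findHomography` would normally be used; the function names below are hypothetical.

```python
def solve_linear(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system: degenerate point configuration")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def homography_4pt(plane_pts, image_pts):
    """DLT for exactly 4 correspondences (X, Y) -> (u, v), with h33 fixed to 1."""
    A, b = [], []
    for (X, Y), (u, v) in zip(plane_pts, image_pts):
        # u * (h31 X + h32 Y + 1) = h11 X + h12 Y + h13, and likewise for v
        A.append([X, Y, 1.0, 0.0, 0.0, 0.0, -u * X, -u * Y])
        b.append(u)
        A.append([0.0, 0.0, 0.0, X, Y, 1.0, -v * X, -v * Y])
        b.append(v)
    h = solve_linear(A, b)
    return [h[0:3], h[3:6], h[6:8] + [1.0]]


def apply_homography(H, X, Y):
    u, v, w = (H[i][0] * X + H[i][1] * Y + H[i][2] for i in range(3))
    return u / w, v / w


# Sanity check: synthesize image points from a known H, then recover it.
H_true = [[200.0, 10.0, 100.0], [5.0, 180.0, 50.0], [0.1, 0.05, 1.0]]
marker = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]  # marker corners in plane units
image = [apply_homography(H_true, X, Y) for X, Y in marker]
H_est = homography_4pt(marker, image)
```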
If I have the homography, how can I get the camera pose?
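For a planar scene, the homography $H$ and the camera pose $[R|t]$ contain the same information once $K$ is known, and it is easy to pass from one to the other. A homography estimated from point pairs is only defined up to scale, $H = \lambda K\,[R_1\; R_2\; t]$, so we first remove the intrinsics:

$$K^{-1}H = \lambda\,[R_1\; R_2\; t].$$

Columns one and two of $K^{-1}H$ are columns one ($R_1$) and two ($R_2$) of the pose matrix, and its last column is the translation vector $t$, all multiplied by the same unknown factor $\lambda$. Because the columns of a rotation matrix have unit length, we remove this redundancy by dividing by the norm of, for example, the first column, $|\lambda| = \lVert K^{-1}H_1 \rVert$, where $H_1$ is the first column of $H$, and choosing the sign so that $t_z > 0$ (the marker lies in front of the camera). The only column left is $R_3$; since $R$ has to be orthogonal, it can be computed as the cross product of columns one and two:

$$R_3 = R_1 \times R_2.$$

With noisy measurements $R_1$ and $R_2$ are only approximately orthogonal, so the resulting $R$ is often re-orthogonalized, for example with an SVD.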
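The decomposition can be sketched as follows, assuming $K$ is upper-triangular with $K_{33} = 1$ and $H$ maps points of the plane $Z=0$ to pixels. The sketch computes $K^{-1}H$, divides by the norm of its first column so that $R_1$ has unit length (rather than by a single matrix element), picks the sign that gives $t_z > 0$, and completes $R$ with a cross product. It does not re-orthogonalize $R$, and the helper names and test pose are hypothetical.

```python
import math


def inverse_K(K):
    """Closed-form inverse of an upper-triangular calibration matrix K."""
    a, s, u0 = K[0]
    b, v0 = K[1][1], K[1][2]
    return [[1.0 / a, -s / (a * b), (s * v0 - b * u0) / (a * b)],
            [0.0, 1.0 / b, -v0 / b],
            [0.0, 0.0, 1.0]]


def cross(p, q):
    return [p[1] * q[2] - p[2] * q[1],
            p[2] * q[0] - p[0] * q[2],
            p[0] * q[1] - p[1] * q[0]]


def pose_from_homography(K, H):
    """Recover R (3x3) and t from H = lambda * K [R_1 R_2 t] for the plane Z = 0."""
    Ki = inverse_K(K)
    G = [[sum(Ki[i][k] * H[k][j] for k in range(3)) for j in range(3)] for i in range(3)]
    g1, g2, g3 = [[G[i][j] for i in range(3)] for j in range(3)]  # columns of K^-1 H
    scale = math.sqrt(sum(x * x for x in g1))  # |lambda|, since R_1 has unit length
    if g3[2] < 0:  # choose the sign that puts the marker in front of the camera (t_z > 0)
        scale = -scale
    r1 = [x / scale for x in g1]
    r2 = [x / scale for x in g2]
    t = [x / scale for x in g3]
    r3 = cross(r1, r2)  # third column from orthogonality
    R = [[r1[i], r2[i], r3[i]] for i in range(3)]
    return R, t


# Hypothetical check: H built from a known pose and an arbitrary scale of -2.5.
K = [[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]]
a = math.radians(30.0)
R_true = [[1.0, 0.0, 0.0], [0.0, math.cos(a), -math.sin(a)], [0.0, math.sin(a), math.cos(a)]]
t_true = [0.1, -0.2, 5.0]
B = [[R_true[i][0], R_true[i][1], t_true[i]] for i in range(3)]
H = [[-2.5 * sum(K[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]
R, t = pose_from_homography(K, H)
```

Negating both $R_1$ and $R_2$ leaves $R_1 \times R_2$ unchanged, so the sign choice for $t_z$ does not flip the handedness of $R$.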