Camera and image: pinhole imaging model, coordinate systems, and coordinate system conversion

Three-dimensional reconstruction from an image sequence amounts to restoring two-dimensional images, each composed of many pixels, to three-dimensional space. Understanding the full projection process makes it clear how images can be used for 3D reconstruction and what the key steps of the reconstruction are. This section describes the projection process of a monocular camera.

A. Pinhole imaging model and coordinate system

The process by which a camera captures an image can be simplified to pinhole imaging, and this simplification yields a compact mathematical expression for the camera model. The model gives the mapping between points in the three-dimensional scene and pixels in the image, and through this mapping the pixels can be restored to three-dimensional space. If all the pixels of a series of images are restored in this way, the surface of the entire scene can be recovered. The structure of the pinhole imaging model is shown in Figure 1. To simplify the model, the imaging plane is placed in front of the pinhole, so that the captured image is upright rather than inverted.

Figure 1 Schematic diagram of the pinhole imaging model

In the pinhole imaging model, projecting the scene from three-dimensional space onto the picture can be decomposed into three steps involving four coordinate systems: from the world coordinate system to the camera coordinate system, then from the camera coordinate system to the image coordinate system, and finally from the image coordinate system to the pixel coordinate system. Projecting the scene in front of the camera lens onto a 2D image can thus be regarded as transforming the 3D scene from the world coordinate system to the pixel coordinate system. The four coordinate systems are defined as follows.

The world coordinate system is a fixed, absolute reference frame. It must be defined in advance by specifying its origin and orientation, after which any object can be placed in it. In 3D reconstruction and computer vision, the camera coordinate system of the first frame captured by the camera is usually taken as the world coordinate system, and the camera's subsequent trajectory and pose are then computed relative to it. Coordinates in the world coordinate system are usually represented by (Xw, Yw, Zw).

The common way to define the camera coordinate system is to take the camera's optical center (the pinhole) as the origin. The X and Y axes are parallel to the horizontal and vertical axes of the captured image, respectively. The Z axis points along the optical axis, in the direction the camera is facing. Coordinates in the camera coordinate system are usually represented by the notation (Xc, Yc, Zc).

The image coordinate system is defined on the image plane, with the center of the image as the origin and the X and Y axes pointing in the same directions as the X and Y axes of the camera coordinate system. It is expressed in physical units such as meters or centimeters. It has only two dimensions and no Z axis. Coordinates in the image coordinate system are usually represented by the notation (x, y).

The pixel coordinate system is measured in pixels. It is also two-dimensional, its X and Y axes point in the same directions as those of the image coordinate system, and its origin is at the upper-left corner of the image. A pixel is generally modeled as a square or a rectangle, and each pixel stores an intensity or gray value. Coordinates in the pixel coordinate system are usually represented by the notation (u, v).

B. Coordinate system transformation

Subsection A defined each coordinate system and how a point is represented in it. This subsection derives the whole pinhole imaging process as a mathematical model, in three steps. The transformation from the world coordinate system to the camera coordinate system involves the camera's external (extrinsic) parameters. The transformations from the camera coordinate system to the image coordinate system and then to the pixel coordinate system involve the camera's internal (intrinsic) parameters. The projection process is therefore derived in terms of the camera's intrinsic and extrinsic parameters.

(1) Camera external parameters

Conversion between the world coordinate system (Xw, Yw, Zw) and the camera coordinate system (Xc, Yc, Zc): The camera coordinate system is not fixed, because its origin and the directions of its axes change as the camera moves. 3D reconstruction and camera localization require a stable, unchanging world coordinate system in which all coordinates can be expressed consistently.

Suppose P is a point in three-dimensional space whose position is Pc in the camera coordinate system and Pw in the world coordinate system. Pw and Pc are related by a rigid transformation, which consists of a rotation matrix R and a translation vector t. Its mathematical expression is

$$P_c = R\,P_w + t \tag{1}$$

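where R is a 3×3 matrix with three degrees of freedom that represents the relative rotation between the world coordinate system and the camera coordinate system, and t is a 3×1 vector that represents the translation between their origins. When the first camera frame is taken as the world coordinate system, t is the offset of the current camera frame from that starting frame. Expanded into components, the formula reads:

$$\begin{bmatrix} X_c \\ Y_c \\ Z_c \end{bmatrix} = \begin{bmatrix} r_{11} & r_{12} & r_{13} \\ r_{21} & r_{22} & r_{23} \\ r_{31} & r_{32} & r_{33} \end{bmatrix} \begin{bmatrix} X_w \\ Y_w \\ Z_w \end{bmatrix} + \begin{bmatrix} t_1 \\ t_2 \\ t_3 \end{bmatrix} \tag{2}$$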

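Its homogeneous coordinate form can be expressed as:

$$\begin{bmatrix} X_c \\ Y_c \\ Z_c \\ 1 \end{bmatrix} = \begin{bmatrix} R & t \\ 0^T & 1 \end{bmatrix} \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix} \tag{3}$$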

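This can be simplified to:

$$\tilde{P}_c = T\,\tilde{P}_w \tag{4}$$

where $\tilde{P}_c$ and $\tilde{P}_w$ are the homogeneous coordinates of the point in the camera and world coordinate systems, and $T = \begin{bmatrix} R & t \\ 0^T & 1 \end{bmatrix}$ is called the camera's extrinsic parameter matrix.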
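
The world-to-camera step can be illustrated with a short sketch. It assumes the convention Pc = R·Pw + t, and the rotation angle, translation, and helper names (`rotation_z`, `world_to_camera`, `extrinsic_matrix`, `apply_homogeneous`) are hypothetical values chosen for illustration, not part of the original derivation. It checks that the 3×3 form and the 4×4 homogeneous form give the same camera-frame point.

```python
import math

def rotation_z(theta):
    """3x3 rotation matrix for a rotation of theta radians about the Z axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0],
            [s, c, 0.0],
            [0.0, 0.0, 1.0]]

def world_to_camera(R, t, p_w):
    """Apply P_c = R * P_w + t to a 3D point given as a list of three floats."""
    return [sum(R[i][j] * p_w[j] for j in range(3)) + t[i] for i in range(3)]

def extrinsic_matrix(R, t):
    """Stack R and t into the 4x4 homogeneous matrix T = [[R, t], [0, 1]]."""
    return [R[i] + [t[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]

def apply_homogeneous(T, p):
    """Multiply a 4x4 matrix by the homogeneous form of a 3D point."""
    p_h = list(p) + [1.0]
    out = [sum(T[i][j] * p_h[j] for j in range(4)) for i in range(4)]
    return out[:3]  # the fourth component stays 1 for a rigid transform

# Hypothetical pose: 90 degree rotation about Z, 5 units of translation along Z.
R = rotation_z(math.pi / 2)
t = [0.0, 0.0, 5.0]
p_w = [1.0, 0.0, 0.0]

p_c = world_to_camera(R, t, p_w)                          # 3x3 form
p_c_h = apply_homogeneous(extrinsic_matrix(R, t), p_w)    # 4x4 homogeneous form
```

With these values both `p_c` and `p_c_h` come out at approximately (0, 1, 5). The homogeneous form is useful mainly because successive poses can then be chained by plain matrix multiplication.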

(2) Camera internal parameters

Conversion between the camera coordinate system (Xc, Yc, Zc) and the image coordinate system (x, y): The camera coordinate system is three-dimensional and the image coordinate system is two-dimensional, so this conversion reduces the dimension of the coordinate vector from three to two. The dimension that is lost is depth.

As shown in Figure 2, let P(Xc, Yc, Zc) be an arbitrary point in the camera coordinate system and let p(x, y) be the image point that corresponds to this three-dimensional point. From Section A, the X and Y axes of the two coordinate systems point in the same directions, so the image point p can be given a third coordinate f to form the camera-frame coordinates (x, y, f), where f is the focal length of the camera.

Figure 2 Schematic diagram of the conversion between the camera coordinate system and the image coordinate system

From the similar triangles in the figure, the following relation is obtained:

$$\frac{Z_c}{f} = \frac{X_c}{x} = \frac{Y_c}{y} \tag{5}$$

Rearranging this relation gives:

$$x = f\,\frac{X_c}{Z_c}, \qquad y = f\,\frac{Y_c}{Z_c} \tag{6}$$

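In homogeneous coordinates this becomes:

$$Z_c \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} X_c \\ Y_c \\ Z_c \\ 1 \end{bmatrix} \tag{7}$$

This completes the derivation of the transformation between the camera coordinate system and the image coordinate system.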
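
As a minimal sketch of this perspective projection, assuming the similar-triangle relation x = f·Xc/Zc, y = f·Yc/Zc, the function below maps a camera-frame point to the image plane. The function name and the numeric values are hypothetical. The two sample points lie on the same ray through the optical center, which shows concretely how depth is lost.

```python
def project_to_image_plane(p_c, f):
    """Map a camera-frame point (Xc, Yc, Zc) to image-plane coordinates (x, y)."""
    X_c, Y_c, Z_c = p_c
    if Z_c <= 0:
        raise ValueError("point must lie in front of the camera (Zc > 0)")
    return (f * X_c / Z_c, f * Y_c / Z_c)

f = 0.05  # hypothetical focal length, in meters

# Two points on the same ray through the optical center, at different depths.
near = project_to_image_plane((1.0, 0.5, 2.0), f)
far = project_to_image_plane((2.0, 1.0, 4.0), f)
```

Both points project to the same image coordinates, so a single image point cannot tell them apart. This is why recovering a 3D point from a pixel needs the depth as additional information.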

Conversion between the image coordinate system and the pixel coordinate system: The core difference between these two coordinate systems is their units. Both are two-dimensional, but the image coordinate system uses physical units such as centimeters or meters, whereas the pixel coordinate system uses pixels. As shown in Figure 3, following the definitions in Section A, the origin of the image coordinate system is the center of the image, and the origin of the pixel coordinate system is the upper-left corner of the image.

Figure 3 Schematic diagram of conversion between image coordinate system and pixel coordinate system

Let (u0, v0) be the coordinates, in the pixel coordinate system, of the center of the image, which is the origin of the image coordinate system. A pixel is usually a square or a rectangle, so let dx and dy denote the physical width and height of one pixel. With these symbols, every point (x, y) in the image coordinate system corresponds to exactly one point (u, v) in the pixel coordinate system. The conversion is:

$$u = \frac{x}{d_x} + u_0, \qquad v = \frac{y}{d_y} + v_0 \tag{8}$$

This additive form is inconvenient for computation, so, as in the earlier steps, homogeneous coordinates are used: they add one dimension to the vector without changing its degrees of freedom. Equation (8) can then be written as:

$$\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} 1/d_x & 0 & u_0 \\ 0 & 1/d_y & v_0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} \tag{9}$$

Equation (9) folds the addition into a single matrix multiplication, which is convenient for computation. It is the mathematical model of the conversion between the image coordinate system and the pixel coordinate system.
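
The image-to-pixel conversion can be sketched in both its additive and its homogeneous form. The sketch assumes u = x/dx + u0 and v = y/dy + v0, with (u0, v0) the pixel coordinates of the image center and dx, dy the physical pixel size. The function names and the numeric values (10 µm pixels, a 640×480 image) are hypothetical.

```python
def image_to_pixel(x, y, dx, dy, u0, v0):
    """Additive form: scale physical units to pixels, then shift the origin."""
    return (x / dx + u0, y / dy + v0)

def image_to_pixel_homogeneous(x, y, dx, dy, u0, v0):
    """Same conversion as one 3x3 matrix applied to the vector (x, y, 1)."""
    M = [[1.0 / dx, 0.0, u0],
         [0.0, 1.0 / dy, v0],
         [0.0, 0.0, 1.0]]
    p = [x, y, 1.0]
    u, v, w = (sum(M[i][j] * p[j] for j in range(3)) for i in range(3))
    return (u / w, v / w)  # w is 1 here; dividing keeps the general pattern

# Hypothetical sensor: 10 micrometer square pixels, image center at (320, 240).
dx = dy = 1e-5
u0, v0 = 320.0, 240.0

uv_add = image_to_pixel(0.001, -0.0005, dx, dy, u0, v0)
uv_hom = image_to_pixel_homogeneous(0.001, -0.0005, dx, dy, u0, v0)
```

A point 1 mm to the right of and 0.5 mm above the image center lands at approximately pixel (420, 190) in both forms. The v value decreases because the pixel origin is at the upper-left corner.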

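Combining equations (7) and (9) gives the full projection of a three-dimensional point from the camera coordinate system to the pixel coordinate system:

$$Z_c \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} 1/d_x & 0 & u_0 \\ 0 & 1/d_y & v_0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} X_c \\ Y_c \\ Z_c \\ 1 \end{bmatrix} \tag{10}$$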

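Multiplying the two intermediate matrices and writing $f_x = f/d_x$, $f_y = f/d_y$ gives:

$$Z_c \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} f_x & 0 & u_0 & 0 \\ 0 & f_y & v_0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} X_c \\ Y_c \\ Z_c \\ 1 \end{bmatrix} \tag{11}$$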

where the left 3×3 block of this matrix, $K = \begin{bmatrix} f_x & 0 & u_0 \\ 0 & f_y & v_0 \\ 0 & 0 & 1 \end{bmatrix}$, is called the camera's internal parameter matrix.

The derivation above yields the camera's internal parameter matrix K, which has four unknowns: fx, fy, u0 and v0. All four are determined by the structure of the camera. fx and fy are shorthand for f/dx and f/dy, where f is the focal length of the camera and dx and dy are the width and height of a single pixel. (u0, v0) is the coordinate of the image center in the pixel coordinate system. The internal parameters are usually calibrated in advance, for example when the camera leaves the factory, and are treated as known quantities.
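
The role of the internal parameter matrix can be sketched as follows. The sketch assumes a zero-skew matrix with entries fx = f/dx, fy = f/dy, u0 and v0, as described above. The function names and the numeric values (a 4 mm lens, 2 µm pixels, a 1280×720 image) are hypothetical.

```python
def intrinsic_matrix(f, dx, dy, u0, v0):
    """Build the 3x3 matrix K from the focal length, pixel size and image center."""
    return [[f / dx, 0.0, u0],
            [0.0, f / dy, v0],
            [0.0, 0.0, 1.0]]

def camera_to_pixel(K, p_c):
    """Project a camera-frame point (Xc, Yc, Zc) to pixel coordinates (u, v)."""
    # h equals Zc * (u, v, 1), so dividing by its last entry removes the depth.
    h = [sum(K[i][j] * p_c[j] for j in range(3)) for i in range(3)]
    if h[2] <= 0:
        raise ValueError("point must lie in front of the camera (Zc > 0)")
    return (h[0] / h[2], h[1] / h[2])

# Hypothetical camera: f = 4 mm, 2 micrometer pixels, image center (640, 360).
K = intrinsic_matrix(0.004, 2e-6, 2e-6, 640.0, 360.0)
uv = camera_to_pixel(K, [0.1, -0.05, 2.0])
```

Here fx and fy both come out at about 2000 pixels, and the sample point projects to approximately pixel (740, 310). Expressing the focal length in pixels is what lets K combine the perspective division and the unit change in one matrix.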

(3) Combination of internal and external parameters

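The internal and external parameters of the camera together link all the transformations between the four coordinate systems of the camera model. The mathematical form is:

$$Z_c \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} K & 0 \end{bmatrix} \begin{bmatrix} R & t \\ 0^T & 1 \end{bmatrix} \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix} = K \begin{bmatrix} R & t \end{bmatrix} \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix} \tag{12}$$

where K is the 3×3 internal parameter matrix, and R and t are the rotation and translation that make up the external parameters.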

Equation (12) shows that a three-dimensional point can be projected into the pixel space of the image using the camera's internal and external parameters. Since the internal parameters are generally known, the scene in the image can be restored to three-dimensional space once the camera's external parameters and the depth Zc of each pixel are known. Solving for the camera's external parameters amounts to localizing the camera, which is usually called visual odometry. Solving for the depth of the image pixels is called depth estimation.
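
The full chain and its inverse can be sketched as a round trip. The sketch assumes the convention Pc = R·Pw + t and a zero-skew internal parameter matrix K. The function names `project` and `back_project` and all numeric values are hypothetical. It projects a world point to a pixel and then restores it from the pixel, the depth, and the camera parameters.

```python
def project(K, R, t, p_w):
    """World point -> (pixel coordinates, depth Zc) using K, R and t."""
    p_c = [sum(R[i][j] * p_w[j] for j in range(3)) + t[i] for i in range(3)]
    if p_c[2] <= 0:
        raise ValueError("point must lie in front of the camera (Zc > 0)")
    u = K[0][0] * p_c[0] / p_c[2] + K[0][2]
    v = K[1][1] * p_c[1] / p_c[2] + K[1][2]
    return (u, v), p_c[2]

def back_project(K, R, t, uv, depth):
    """Pixel plus depth -> world point, inverting the projection step by step."""
    u, v = uv
    # Pixel -> camera frame: undo the intrinsics and restore the lost depth.
    p_c = [(u - K[0][2]) * depth / K[0][0],
           (v - K[1][2]) * depth / K[1][1],
           depth]
    # Camera -> world frame: P_w = R^T (P_c - t), since R is a rotation.
    d = [p_c[i] - t[i] for i in range(3)]
    return [sum(R[j][i] * d[j] for j in range(3)) for i in range(3)]

# Hypothetical camera parameters and scene point.
K = [[800.0, 0.0, 320.0],
     [0.0, 800.0, 240.0],
     [0.0, 0.0, 1.0]]
R = [[0.0, -1.0, 0.0],   # 90 degree rotation about the Z axis
     [1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0]]
t = [0.1, 0.0, 1.0]
p_w = [0.5, -0.2, 3.0]

uv, depth = project(K, R, t, p_w)
p_w_restored = back_project(K, R, t, uv, depth)
```

With these values the point projects to approximately pixel (380, 340) at depth 4, and `p_w_restored` matches `p_w` up to floating-point error. Without the depth the pixel would only fix a ray, so the restoration needs both the external parameters and the depth.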
