Projection Matrix

Deriving Projection Matrices

Of the basic matrix transforms in any 3D graphics programmer's toolkit, projection matrices are among the more complicated. Translation and scaling can be understood at a glance, and a rotation matrix can be conjured up by anyone with a basic understanding of trigonometry, but projection is a bit tricky. If you've ever looked up the formula for such a matrix, you know that common sense isn't enough to tell you where it came from. And yet, I haven't seen many resources online that will describe just how one derives a projection matrix. That is the topic I will address in this article.

For those of you just starting out in 3D graphics, I should mention that understanding where a projection matrix comes from may be a matter of curiosity to the mathematically inclined among us, but it's not a necessity. You can make do with just the formula; and if you're using a graphics API like Direct3D that will build a projection matrix for you, you don't even need that. So, if the details of this article seem a little overwhelming, fear not. As long as you understand what projection does, you needn't concern yourself with how it works if you don't want to. This article is for programmers who like to know a bit more detail than is strictly necessary.

Overview: What Is Projection?

A computer monitor is a two-dimensional surface, so if you want to display three-dimensional images, you need a way to transform 3D geometry into a form that can be rendered as a 2D image. And that's exactly what projection does. To use a very simple example, one way to project a 3D object onto a 2D surface would be to simply throw away the z-coordinate of each point. For a cube, that might look something like Figure 1.

Figure 1: Projection onto the xy plane by discarding z-coordinates.

Of course, this is overly simple and not particularly useful in most cases. For starters, you will not be projecting onto a plane at all; rather, your projection formulae will transform your geometry into a new volume, called the canonical view volume. The exact coordinates of the canonical view volume may vary from one graphics API to another, but for the purposes of this discussion, consider it to be the box that extends from (–1, –1, 0) to (1, 1, 1), which is the convention used by Direct3D. Once all your vertices have been mapped into the canonical view volume, only their x- and y-coordinates are used to map them to the screen. The z-coordinate is not useless, however; it's typically used by a depth buffer for visibility determination. This is the reason you transform into a new volume, rather than project onto a plane.

Note that Figure 1 also depicts a left-handed coordinate system, where the camera is looking down the positive z-axis, with the y-axis pointing up and the x-axis pointing to the right. This is again a convention used by Direct3D, and one I'll use throughout the article. None of the calculations are significantly different for a right-handed coordinate system, or for a slightly different canonical view volume, so everything discussed will still apply even if your API of choice uses different conventions than those used by Direct3D.

With that, you can get into the actual projection transforms. There are quite a few different methods of projection out there, and I will cover two of the most common: orthographic and perspective.

Orthographic Projection

Orthographic projection, so called because all the lines of projection are perpendicular to the eventual drawing surface, is a relatively simple projection technique. The view volume—that is, the region of eye space that contains all the geometry you want to display—is an axis-aligned box that you transform into the canonical view volume, as shown in Figure 2.

Figure 2: Orthographic projection.

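As you can see, the view volume is defined by six planes:

$$\text{Left: } x = l \qquad \text{Right: } x = r \qquad \text{Bottom: } y = b \qquad \text{Top: } y = t \qquad \text{Near: } z = n \qquad \text{Far: } z = f$$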

Because the view volume and the canonical view volume are both axis-aligned boxes, there is no correction for distance in this type of projection. The end result is, in fact, a lot like the result in Figure 1 where you just dropped the z-coordinate of every point. Objects of the same size in 3D space appear the same size in the projection, even if one is much further from the camera than the other. Lines that are parallel in 3D space remain parallel in the final image. Using this kind of projection would be out of the question for something like a first-person shooter—imagine trying to play one of those without being able to tell how far away anything is!—but it does have its uses. You might use it in a tile-based game, for instance, especially one where the camera is positioned at a fixed angle. Figure 3 shows a simple example.

Figure 3: A simple example of orthographic projection.

So, without further ado, it's time to figure out how this mapping works. The easiest approach may be to consider each of your three axes separately, and compute how to map points along that axis from the original view volume into the canonical view volume. You begin with the x-coordinate. A point within your view volume will have an x-coordinate on the range [l, r], and you want to transform it to the range [–1, 1].

$$l \le x \le r$$

Now, in preparation to scale the range down to the size you want, you subtract l from all terms to produce a zero on the left-hand side. Another approach you could take here would be to translate the range so that it centers on zero, rather than having one of its endpoints at zero, but the algebra is a bit neater this way, so I'll do it like this for the sake of readability.

$$0 \le x - l \le r - l$$

Now that one end of your range is positioned at zero, you can scale it down to the size you want. You want the range of x-values to be two units wide, from –1 to 1, so you multiply through by 2/(r – l). Note that r – l is the width of your view volume and is thus always a positive number, so you don't have to worry about the inequalities changing directions.

$$0 \le \frac{2(x - l)}{r - l} \le 2$$

Next, you subtract one from all terms to produce your desired range of [–1, 1].

$$-1 \le \frac{2(x - l)}{r - l} - 1 \le 1$$

A bit of basic algebra allows you to write the center term as a single fraction:

$$-1 \le \frac{2x - l - r}{r - l} \le 1$$

Finally, you split the center term into two fractions so that it takes the form px + q; you need to group your terms this way so that the equations you derive can be easily translated into matrix form.

$$-1 \le \frac{2x}{r - l} - \frac{r + l}{r - l} \le 1$$

The center term of this inequality now gives you the equation you need to transform x into the canonical view volume.

$$x' = \frac{2x}{r - l} - \frac{r + l}{r - l}$$
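As a quick sanity check, the px + q form for the x-coordinate can be sketched in a few lines of Python. The function name `map_to_canonical` and the sample extents are illustrative assumptions, not part of any graphics API.

```python
def map_to_canonical(x, lo, hi):
    """Map x from [lo, hi] to [-1, 1] using the form p*x + q."""
    p = 2.0 / (hi - lo)           # scale: makes the range two units wide
    q = -(hi + lo) / (hi - lo)    # offset: centers the range on zero
    return p * x + q


# Hypothetical view-volume extents, chosen only for illustration.
l, r = -4.0, 12.0
for x in (l, (l + r) / 2.0, r):
    print(x, "->", map_to_canonical(x, l, r))
```

The left edge, the midpoint, and the right edge should land on –1, 0, and 1 respectively. The same function handles y if you pass b and t instead of l and r.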

The steps required to obtain a formula for y are exactly the same (just substitute y for x, t for r, and b for l), so rather than repeat them here, I'll just show the result:

$$y' = \frac{2y}{t - b} - \frac{t + b}{t - b}$$

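Finally, you need to derive a formula for z. It's a little different in this case because you're mapping z to the range [0, 1] rather than [–1, 1], but this should look very familiar. Here's your starting condition, a z-coordinate on the range [n, f]:

$$n \le z \le f$$

You subtract n from all terms so the lower end of the range is positioned at zero:

$$0 \le z - n \le f - n$$

And now, all that's left is to divide through by f – n to produce a final range of [0, 1]. As before, note that f – n is the depth of your viewing volume and thus is always positive.

$$0 \le \frac{z - n}{f - n} \le 1$$

Finally, you split this into two fractions so it takes the form pz + q:

$$0 \le \frac{z}{f - n} - \frac{n}{f - n} \le 1$$

This gives you your formula for transforming z:

$$z' = \frac{z}{f - n} - \frac{n}{f - n}$$

Now, you're ready to write your orthographic projection matrix. To recap your work thus far, here are the three projection equations you've derived:

$$x' = \frac{2x}{r - l} - \frac{r + l}{r - l} \qquad y' = \frac{2y}{t - b} - \frac{t + b}{t - b} \qquad z' = \frac{z}{f - n} - \frac{n}{f - n}$$

If you write this in matrix form, acting on the column vector (x, y, z, 1), you get:

$$\begin{bmatrix} x' \\ y' \\ z' \\ 1 \end{bmatrix} = \begin{bmatrix} \frac{2}{r - l} & 0 & 0 & -\frac{r + l}{r - l} \\ 0 & \frac{2}{t - b} & 0 & -\frac{t + b}{t - b} \\ 0 & 0 & \frac{1}{f - n} & -\frac{n}{f - n} \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix}$$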
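To see the off-center orthographic matrix in action, here is a minimal Python sketch that builds it and transforms the two opposite corners of a view volume. The function names and sample extents are hypothetical, and the sketch assumes the column-vector convention (translation terms in the last column). Direct3D's helpers conventionally work with row vectors, so the matrix in their documentation appears as the transpose of this one.

```python
def ortho_off_center_lh(l, r, b, t, n, f):
    """Orthographic matrix for column vectors: maps the box to (-1,-1,0)..(1,1,1)."""
    return [
        [2.0 / (r - l), 0.0, 0.0, -(r + l) / (r - l)],
        [0.0, 2.0 / (t - b), 0.0, -(t + b) / (t - b)],
        [0.0, 0.0, 1.0 / (f - n), -n / (f - n)],
        [0.0, 0.0, 0.0, 1.0],
    ]


def transform(m, point):
    """Multiply a 4x4 matrix by the column vector (x, y, z, 1)."""
    x, y, z = point
    v = (x, y, z, 1.0)
    return tuple(sum(m[i][j] * v[j] for j in range(4)) for i in range(4))


# Hypothetical off-center view volume.
l, r, b, t, n, f = -2.0, 6.0, -1.0, 3.0, 1.0, 11.0
m = ortho_off_center_lh(l, r, b, t, n, f)
near_corner = transform(m, (l, b, n))
far_corner = transform(m, (r, t, f))
print(near_corner)
print(far_corner)
```

The near-bottom-left corner should map to (–1, –1, 0) and the far-top-right corner to (1, 1, 1), with w left at 1 in both cases.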

That's it! Direct3D provides a function called D3DXMatrixOrthoOffCenterLH() (what a mouthful!) that constructs an orthographic projection matrix based on this same formula; you can find it in the DirectX documentation. The "LH" in that unwieldy function name refers to the fact that you're using a left-handed coordinate system. But, what exactly does "OffCenter" mean?

The answer to that question leads you to a simplified form of the orthographic projection matrix. Consider a few points: First, in eye space, your camera is positioned at the origin and looking directly down the z-axis. And second, you usually want your field of view to extend equally far to the left as it does to the right, and equally far above the z-axis as below. If that is the case, the z-axis passes directly through the center of your view volume, and so you have r = –l and t = –b. In other words, you can forget about r, l, t, and b altogether, and simply define your view volume in terms of a width w = r – l and a height h = t – b, along with your other clipping planes f and n. Because r + l and t + b are both zero, the translation terms for x and y vanish, and if you make those substitutions into the orthographic projection matrix above, you get this rather simplified version:

$$\begin{bmatrix} \frac{2}{w} & 0 & 0 & 0 \\ 0 & \frac{2}{h} & 0 & 0 \\ 0 & 0 & \frac{1}{f - n} & -\frac{n}{f - n} \\ 0 & 0 & 0 & 1 \end{bmatrix}$$

This equation is implemented by the Direct3D function D3DXMatrixOrthoLH(). You can almost always use this matrix instead of the more general, "off center" version that you derived above, unless you're doing something strange with your projection.

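One further point before you finish this section. It's instructive to note that this matrix can be represented as the concatenation of two simpler transforms: a translation followed by a scale. This should make sense to you if you think about it geometrically because all you're doing in an orthographic projection is shifting points from one axis-aligned box to another; the viewing volume doesn't change its shape, only its position and its size. Specifically, you have:

$$\begin{bmatrix} \frac{2}{w} & 0 & 0 & 0 \\ 0 & \frac{2}{h} & 0 & 0 \\ 0 & 0 & \frac{1}{f - n} & -\frac{n}{f - n} \\ 0 & 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} \frac{2}{w} & 0 & 0 & 0 \\ 0 & \frac{2}{h} & 0 & 0 \\ 0 & 0 & \frac{1}{f - n} & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & -n \\ 0 & 0 & 0 & 1 \end{bmatrix}$$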

This product form of your projection is perhaps a bit more intuitive because it lets you more easily visualize what's happening. First, the viewing volume is translated along the z-axis so that its near plane coincides with the origin; then, a scale is applied to bring it down to the dimensions of the canonical view volume. That's easy enough to understand, right? The matrix for an off-center orthographic projection also can be represented as the product of a translation and a scale, but it's similar enough to the result shown above that I won't list it here.
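If you'd like to confirm the factorization numerically, the following Python sketch multiplies a scale matrix by a translation matrix and compares the product with the centered orthographic matrix. All function names and sample values are assumptions made for illustration, and matrices act on column vectors, so "translation followed by scale" is written as scale times translation.

```python
def mat_mul(a, b):
    """Product of two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def scale(sx, sy, sz):
    return [[sx, 0.0, 0.0, 0.0],
            [0.0, sy, 0.0, 0.0],
            [0.0, 0.0, sz, 0.0],
            [0.0, 0.0, 0.0, 1.0]]


def translate(tx, ty, tz):
    return [[1.0, 0.0, 0.0, tx],
            [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]


def ortho_lh(w, h, n, f):
    """Centered orthographic matrix (column-vector convention)."""
    return [[2.0 / w, 0.0, 0.0, 0.0],
            [0.0, 2.0 / h, 0.0, 0.0],
            [0.0, 0.0, 1.0 / (f - n), -n / (f - n)],
            [0.0, 0.0, 0.0, 1.0]]


# Hypothetical view-volume dimensions.
w, h, n, f = 8.0, 4.0, 1.0, 11.0
product = mat_mul(scale(2.0 / w, 2.0 / h, 1.0 / (f - n)),
                  translate(0.0, 0.0, -n))
direct = ortho_lh(w, h, n, f)
max_diff = max(abs(product[i][j] - direct[i][j])
               for i in range(4) for j in range(4))
print("largest element-wise difference:", max_diff)
```

The largest difference should be zero up to floating-point rounding. Swapping the order of the two factors would give a different matrix, which is why the order of concatenation matters.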

That about wraps it up for orthographic projections, so now you can move on to something a little more challenging.

Perspective Projection

Perspective projection is a slightly more complicated method of projection, and more frequently used because it creates the illusion of distance and thus produces a more realistic image. Geometrically speaking, the difference between this method and orthographic projection is that in perspective projection, the view volume is a frustum—that is, a truncated pyramid—rather than an axis-aligned box. You can see this in Figure 4.

Figure 4: Perspective projection.

As you can see, the near plane of the view frustum extends from (l, b, n) to (r, t, n). The extents of the far plane are found by tracing a line from the origin through each of the four points on the near plane until they intersect the plane z = f. Because the view frustum gets increasingly wider as it extends further from the origin, and because you're transforming that shape into the canonical view volume, which is a box, the far end of the view frustum is compressed to a greater extent than the near end. Thus, objects further back in the view frustum are made to appear smaller, and this gives you the illusion of distance.

Because the shape of the volume changes in this transformation, perspective projection can't be expressed as a simple translation and scale like orthographic projection. You'll have to work out something a bit different. But, that doesn't mean that the work you did on orthographic projection is useless. A handy problem-solving technique in mathematics is to reduce a problem to one that you already know how to solve. So, that's what you can do here. The last time, you examined one coordinate at a time, but this time you'll do the x- and y-coordinates together, and then worry about z later on. Your plan of attack for x and y can be broken down into two steps:

Step 1: Given a point (x, y, z) within the view frustum, project it onto the near plane z = n. Because the projected point is on the near plane, its x-coordinate will be on the range [l, r], and its y-coordinate will be on the range [b, t].

Step 2: Using the formulae you derived in your study of orthographic projection, map the new x-coordinate from [l, r] to [–1, 1], and the new y-coordinate from [b, t] to [–1, 1].

Sound good? Then, have a look at Figure 5.

Figure 5: Projection of a point onto z = n using similar triangles.

In this diagram, you've drawn a line from a point (x, y, z) to the origin, and noted the point at which the line intersects the plane z = n (it's the one marked in black). From those points, you drop two perpendiculars to the z-axis, and suddenly you have a pair of similar triangles. If you've suppressed your memories of high school geometry, similar triangles are triangles that have the same shape but are not necessarily the same size. To show that two triangles are similar, it is sufficient to show that their corresponding angles are equal, and it's not hard to do that in this case. Angle 1 is shared by both triangles, and obviously it's equal to itself. Angles 2 and 3 are corresponding angles made by a transversal intersecting two parallel lines, so they are equal. And, right angles are of course equal to each other, so your two triangles are similar.

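The property of similar triangles that you're interested in is that their pairs of corresponding sides all exist in the same proportion. You know the lengths of the sides that lie along the z-axis; they're n and z. That means that the other pairs of sides also exist in the ratio n / z. So, consider what you know. By the Pythagorean theorem, the perpendicular from (x, y, z) down to the z-axis has the following length:

$$L = \sqrt{x^2 + y^2}$$

If you knew the length of the perpendicular from your projected point to the z-axis, you could figure out the x- and y-coordinates of that point. But, that's easy! Because you have similar triangles, the length is simply L multiplied by n / z:

$$L' = L \cdot \frac{n}{z} = \sqrt{x^2 + y^2} \cdot \frac{n}{z}$$

The projected point lies on the same line through the origin as the original point, so both of its components shrink by that same factor. Thus, your new x-coordinate is x * n / z, and your new y-coordinate is y * n / z. Thus concludes Step 1. Step 2 simply called for you to perform the same mapping you did in the last section, and so it's time to revisit the formulae you derived in your study of orthographic projection. Recall that you mapped your x- and y-coordinates into the canonical view volume like so:

$$x' = \frac{2x}{r - l} - \frac{r + l}{r - l} \qquad y' = \frac{2y}{t - b} - \frac{t + b}{t - b}$$

You now can invoke these same formulae again, except you need to take your projection into account; so, you replace x with x * n / z, and y with y * n / z:

$$x' = \frac{2n}{r - l} \cdot \frac{x}{z} - \frac{r + l}{r - l} \qquad y' = \frac{2n}{t - b} \cdot \frac{y}{z} - \frac{t + b}{t - b}$$

Now, you multiply through by z:

$$x'z = \frac{2n}{r - l}\,x - \frac{r + l}{r - l}\,z \qquad y'z = \frac{2n}{t - b}\,y - \frac{t + b}{t - b}\,z$$

These results are a bit odd. To write these equations directly into a matrix, you need them to be written in this form, where each c is a constant:

$$x' = c_1 x + c_2 y + c_3 z + c_4 \qquad y' = c_5 x + c_6 y + c_7 z + c_8$$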

But clearly, that's not going to happen right now, so it looks like you're at a bit of an impasse here. What to do? Well, if you can find a way to get a formula for z'z like you have for x'z and y'z, you can write a matrix transform that maps (x, y, z) to (x'z, y'z, z'z). Then, you'll just divide the components of that point by z, and you'll end up with (x', y', z'), which is what you wanted.
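Before moving on to z, the two-step plan for x and y can be checked directly, without any matrix. The sketch below is an illustration only: the helper names and the sample frustum are assumptions. It first projects a point onto the near plane using the n / z ratio from the similar triangles, then reuses the orthographic mapping.

```python
def project_to_near_plane(x, y, z, n):
    """Step 1: scale x and y by n / z (similar triangles)."""
    s = n / z
    return x * s, y * s


def map_to_canonical(v, lo, hi):
    """Step 2: the orthographic mapping from [lo, hi] to [-1, 1]."""
    return 2.0 * v / (hi - lo) - (hi + lo) / (hi - lo)


def perspective_xy(point, l, r, b, t, n):
    x, y, z = point
    xn, yn = project_to_near_plane(x, y, z, n)
    return map_to_canonical(xn, l, r), map_to_canonical(yn, b, t)


# Hypothetical frustum: near plane from (-1, -1) to (1, 1) at z = 1.
l, r, b, t, n = -1.0, 1.0, -1.0, 1.0, 1.0
close = perspective_xy((0.5, 0.5, 1.0), l, r, b, t, n)
distant = perspective_xy((0.5, 0.5, 5.0), l, r, b, t, n)
print("close:  ", close)
print("distant:", distant)
```

Both points have the same x and y in eye space, but the one five times further away lands five times closer to the center of the image, which is exactly the foreshortening effect perspective projection is meant to produce.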

Because you know that the transformation of z into z' does not depend on x or y in any way, you know that you want a formula of the form z'z = pz + q, where p and q are constants. And, you can find those constants pretty easily because you know how to get z' in two special cases: Because you're mapping [n, f] to [0, 1], you know that z' = 0 when z = n, and z' = 1 when z = f. When you plug the first set of values into z'z = pz + q, you're able to solve for q in terms of p:

$$0 \cdot n = pn + q \quad\Longrightarrow\quad q = -pn$$

Now, you plug in the second set of values, and get:

$$1 \cdot f = pf + q$$

Substitute your value for q into that equation, and you can easily solve for p:

$$f = pf - pn = p(f - n) \quad\Longrightarrow\quad p = \frac{f}{f - n}$$

Now that you have a value for p, and you found earlier that q = –pn, you can solve for q:

$$q = -pn = -\frac{fn}{f - n}$$

Finally, if you substitute these expressions for p and q back into your original formula, you get:

$$z'z = \frac{f}{f - n}\,z - \frac{fn}{f - n}$$
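The depth formula, z'z = pz + q with p = f / (f – n) and q = –fn / (f – n), is easy to tabulate. The Python sketch below uses hypothetical function names and an arbitrary choice of n = 1 and f = 100; it evaluates z' for a few depths after dividing by z.

```python
def depth_coefficients(n, f):
    """Constants p and q in z'z = p*z + q for mapping [n, f] to [0, 1]."""
    p = f / (f - n)
    q = -f * n / (f - n)
    return p, q


def projected_depth(z, n, f):
    """z' after the divide by z, i.e. z' = p + q / z."""
    p, q = depth_coefficients(n, f)
    return (p * z + q) / z


# Arbitrary clipping planes, chosen only for illustration.
n, f = 1.0, 100.0
for z in (n, 2.0, 10.0, 50.0, f):
    print(z, "->", projected_depth(z, n, f))
```

The endpoints map to 0 and 1 as required, but the mapping in between is not linear: with these planes, z' already exceeds 0.5 by z = 2. That concentration of depth precision near the near plane is one reason the choice of n deserves some care.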

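You're nearly finished now, but the unusual nature of your approach to this problem requires that you do something with the homogeneous coordinate w. Normally, you're simply content to set w' = 1 (you've probably noticed that the bottom row in a basic transform is almost always [0 0 0 1]), but now you're writing a transform to the point (x'z, y'z, z'z, w'z), and so instead of writing w' = 1, you write w'z = z. Thus, the final set of equations you'll use for perspective projection are:

$$x'z = \frac{2n}{r - l}\,x - \frac{r + l}{r - l}\,z \qquad y'z = \frac{2n}{t - b}\,y - \frac{t + b}{t - b}\,z \qquad z'z = \frac{f}{f - n}\,z - \frac{fn}{f - n} \qquad w'z = z$$

And, when you write this set of equations in matrix form, you get:

$$\begin{bmatrix} x'z \\ y'z \\ z'z \\ w'z \end{bmatrix} = \begin{bmatrix} \frac{2n}{r - l} & 0 & -\frac{r + l}{r - l} & 0 \\ 0 & \frac{2n}{t - b} & -\frac{t + b}{t - b} & 0 \\ 0 & 0 & \frac{f}{f - n} & -\frac{fn}{f - n} \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix}$$

When you apply this to the point (x, y, z, 1), it yields (x'z, y'z, z'z, z). But then, you apply the usual step of dividing through by the homogeneous coordinate, and so you end up with (x', y', z', 1). And that's perspective projection. Direct3D implements the above formula in the function D3DXMatrixPerspectiveOffCenterLH(). As with orthographic projection, if you assume that the view frustum is symmetrical and centered on the z-axis (meaning that r = –l and t = –b), you can simplify things considerably by writing the matrix in terms of the view frustum's width w and its height h:

$$\begin{bmatrix} \frac{2n}{w} & 0 & 0 & 0 \\ 0 & \frac{2n}{h} & 0 & 0 \\ 0 & 0 & \frac{f}{f - n} & -\frac{fn}{f - n} \\ 0 & 0 & 1 & 0 \end{bmatrix}$$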

Direct3D has a function for this matrix as well, called D3DXMatrixPerspectiveLH().
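Here is a small Python sketch of the off-center perspective matrix together with the divide by the homogeneous coordinate. As before, the function names and the sample frustum are assumptions, and the matrix is written for column vectors; it is not a reproduction of the D3DX implementation.

```python
def perspective_off_center_lh(l, r, b, t, n, f):
    """Perspective matrix for column vectors; the last row copies z into w."""
    return [
        [2.0 * n / (r - l), 0.0, -(r + l) / (r - l), 0.0],
        [0.0, 2.0 * n / (t - b), -(t + b) / (t - b), 0.0],
        [0.0, 0.0, f / (f - n), -f * n / (f - n)],
        [0.0, 0.0, 1.0, 0.0],
    ]


def project(m, point):
    """Transform (x, y, z, 1), then divide through by the resulting w."""
    x, y, z = point
    v = (x, y, z, 1.0)
    xc, yc, zc, wc = (sum(m[i][j] * v[j] for j in range(4)) for i in range(4))
    return xc / wc, yc / wc, zc / wc


# Hypothetical off-center frustum.
l, r, b, t, n, f = -2.0, 6.0, -1.0, 3.0, 1.0, 11.0
m = perspective_off_center_lh(l, r, b, t, n, f)
near_corner = project(m, (l, b, n))
# The far-plane corner lies on the line from the origin through (r, t, n).
far_corner = project(m, (r * f / n, t * f / n, f))
print(near_corner)
print(far_corner)
```

The near-bottom-left corner of the frustum should come out at (–1, –1, 0) and the far-top-right corner at (1, 1, 1), matching the corners of the canonical view volume.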

Finally, there's one more representation for perspective projection that often comes in handy. In this form, rather than worrying strictly about the dimensions of the view frustum, you define it based on the camera's field of view. See Figure 6 for an illustration of this concept.

Figure 6: The view frustum's height defined in terms of the vertical field of view angle a.

The vertical field of view angle is a. This angle is bisected by the z-axis, so with a bit of basic trigonometry, you can write the following equation that relates a to the near plane n and the height of the screen h:

$$\tan\frac{a}{2} = \frac{h / 2}{n} \quad\Longrightarrow\quad h = 2n \tan\frac{a}{2}$$

This expression allows you to replace the height in your projection matrix. Furthermore, you replace the width with the aspect ratio r, defined as the ratio of the width of the display area to its height (note that r no longer refers to the right clipping plane here). So, you have:

$$\frac{2n}{h} = \cot\frac{a}{2} \qquad \frac{2n}{w} = \frac{2n}{rh} = \frac{1}{r}\cot\frac{a}{2}$$

Thus, you have a perspective projection matrix in terms of the vertical field of view angle a and the aspect ratio r:

$$\begin{bmatrix} \frac{1}{r}\cot\frac{a}{2} & 0 & 0 & 0 \\ 0 & \cot\frac{a}{2} & 0 & 0 \\ 0 & 0 & \frac{f}{f - n} & -\frac{fn}{f - n} \\ 0 & 0 & 1 & 0 \end{bmatrix}$$

In Direct3D, you can get a matrix of this form by calling D3DXMatrixPerspectiveFovLH(). This form is particularly useful because you can just set r to the aspect ratio of the window you're rendering in, and a field of view angle of π / 4 is usually fine. So, the only things you really need to worry about defining are the extents of the view frustum along the z-axis.
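The field-of-view form can be cross-checked against the width-and-height form with a short Python sketch. The function names, the 16:9 aspect ratio, and the clipping planes are all assumptions for illustration, and the matrices again act on column vectors.

```python
import math


def perspective_fov_lh(fov_y, aspect, n, f):
    """Perspective matrix from vertical field of view (radians) and aspect ratio."""
    y_scale = 1.0 / math.tan(fov_y / 2.0)   # cot(a / 2), equal to 2n / h
    x_scale = y_scale / aspect              # equal to 2n / w
    return [[x_scale, 0.0, 0.0, 0.0],
            [0.0, y_scale, 0.0, 0.0],
            [0.0, 0.0, f / (f - n), -f * n / (f - n)],
            [0.0, 0.0, 1.0, 0.0]]


def perspective_lh(w, h, n, f):
    """Perspective matrix from the near plane's width and height."""
    return [[2.0 * n / w, 0.0, 0.0, 0.0],
            [0.0, 2.0 * n / h, 0.0, 0.0],
            [0.0, 0.0, f / (f - n), -f * n / (f - n)],
            [0.0, 0.0, 1.0, 0.0]]


# Hypothetical camera settings.
fov_y, aspect, n, f = math.pi / 4.0, 16.0 / 9.0, 0.5, 200.0
h = 2.0 * n * math.tan(fov_y / 2.0)   # near-plane height implied by the angle
w = aspect * h                        # near-plane width implied by the aspect ratio
from_fov = perspective_fov_lh(fov_y, aspect, n, f)
from_size = perspective_lh(w, h, n, f)
max_diff = max(abs(from_fov[i][j] - from_size[i][j])
               for i in range(4) for j in range(4))
print("largest element-wise difference:", max_diff)
```

The two matrices should agree up to rounding, which confirms that the angle-based form is just a reparameterization of the same frustum.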

Summary

That's about all you need to know about the mathematics behind projection transforms. There are some other, lesser-used projection methods out there, and of course things are slightly different if you use a right-handed coordinate system or a different canonical view volume, but you should be able to figure out those formulae easily by using the results in this article as a base. If you want more information on projection and other transforms, take a look at Real-Time Rendering by Tomas Moller and Eric Haines; or Computer Graphics: Principles and Practice by James D. Foley, Andries van Dam, Steven K. Feiner, and John F. Hughes; these are two excellent books on computer graphics that I referred to in writing this article.

If you have any questions about this article, or spot any corrections that need to be made, you can contact me via PM on the CodeGuru forums, where I go by the name Smasher/Devourer. Happy coding!



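3D Vision: An Introduction to the Projection Matrix
2011-05-02 15:00

When I was studying 3D modeling in computer vision, I could find almost no material on the subject in Chinese online, so I decided to write some notes of my own so that I don't forget it later. If any readers spot a mistake, I will correct it promptly. Author: 林宝尉

Today's note covers the projection matrix, P.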

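1. The Ideal Pinhole Model

Before introducing P, we need to assume that we have an ideal pinhole camera. In this ideal pinhole model, rigid-body motion can be modeled exactly, with no distortion. Let X denote a point in the camera frame and X' the same point in the world reference frame. In other words, we want to use the pinhole model to project the real object point X' onto the camera as X, as shown in the figure.

From this we obtain x = –f*X'/Z' and y = –f*Y'/Z'.

We do not need to worry about the minus sign here; it only indicates the orientation of the image after projection. So we take x = f*X'/Z' and y = f*Y'/Z'. (Note: in homogeneous coordinates I also use x to denote the vector [f*X'/Z', f*Y'/Z', 1]^T, where T denotes the transpose. Homogeneous coordinates come up constantly in 3D vision.)

In the ideal pinhole model, the rigid-body motion can be written as g = (R, T). In terms of the X and X' above, this is

X = RX' + T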

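We already know that x = [x, y]^T = (f/Z)[X, Y]^T (here x is the projected two-dimensional point, and X, Y, Z are the three-dimensional coordinates of the point before projection). In homogeneous coordinates this can be written as

$$Z \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}$$

We usually call Z the depth of the point P, so we write Z as an arbitrary positive scalar d.

We can then factor the matrix as follows:

$$\begin{bmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} = \begin{bmatrix} f & 0 & 0 \\ 0 & f & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}$$

We call the first factor K_f:

$$K_f = \begin{bmatrix} f & 0 & 0 \\ 0 & f & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

and the second factor P' (the standard projection matrix):

$$P' = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}$$

Also, since X = RX' + T, in homogeneous coordinates we have

$$\begin{bmatrix} X \\ 1 \end{bmatrix} = \begin{bmatrix} R & T \\ 0 & 1 \end{bmatrix} \begin{bmatrix} X' \\ 1 \end{bmatrix}$$

so we obtain

$$d \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = K_f P' \begin{bmatrix} R & T \\ 0 & 1 \end{bmatrix} \begin{bmatrix} X' \\ 1 \end{bmatrix}$$

This is equivalent to d x = K_f P' X = K_f P' g X', with X and X' in homogeneous coordinates (in the ideal model f = 1, so K_f is the identity).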
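The ideal pinhole model, rigid-body motion followed by projection, can be sketched in a few lines of Python. The function names, the rotation, the translation, and the focal length below are all hypothetical values picked for illustration, and the minus sign is dropped as in the text.

```python
def to_camera_frame(R, T, X_world):
    """Apply the rigid-body motion g = (R, T): X = R X' + T."""
    return [sum(R[i][j] * X_world[j] for j in range(3)) + T[i]
            for i in range(3)]


def pinhole_project(f, X_cam):
    """Ideal pinhole projection: x = f*X/Z, y = f*Y/Z."""
    X, Y, Z = X_cam
    return f * X / Z, f * Y / Z


# Hypothetical pose: a 90-degree rotation about the z-axis, then a shift along z.
R = [[0.0, -1.0, 0.0],
     [1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0]]
T = [0.0, 0.0, 5.0]
X_world = [1.0, 2.0, 5.0]
f = 2.0

X_cam = to_camera_frame(R, T, X_world)
x, y = pinhole_project(f, X_cam)
d = X_cam[2]   # the depth plays the role of the scale factor in d*x = K_f P' X
print("camera-frame point:", X_cam)
print("image point:", (x, y), "depth:", d)
```

Multiplying the image point by the depth d recovers f*X and f*Y, which is what the homogeneous equation expresses.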


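2. Intrinsic Camera Parameters

So far we have discussed the ideal pinhole model. In practice, because of technical limitations, no camera is a perfect pinhole camera; that is, the camera has intrinsic parameters. Below is a rough overview of what those parameters are.

In practice, we take the upper-left corner of the image as its origin (0, 0). A point projected onto the image is therefore offset from the image origin by (dx, dy). In addition, during actual image formation the theoretical point (x, y) has to be rescaled to obtain the true point coordinates on the CCD (the camera's imaging sensor).

So we obtain

$$\begin{bmatrix} x_s \\ y_s \end{bmatrix} = \begin{bmatrix} s_x & 0 \\ 0 & s_y \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}$$

where s_x and s_y denote the scale factors along the two image axes, and the matrix containing them is called the scaling matrix.

In pixel coordinates, the final point (x', y') is therefore

x' = xs + dx

y' = ys + dy.

Here x' is the actual image coordinate, while x is the ideal image coordinate.

So

$$\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} = \begin{bmatrix} s_x & 0 & dx \\ 0 & s_y & dy \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}$$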

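Sometimes the CCD pixels are not square and are slightly skewed. We denote this skew angle by θ. The scaling matrix then becomes

$$\begin{bmatrix} s_x & s_\theta \\ 0 & s_y \end{bmatrix}$$

where s_θ is a skew term determined by θ. So we obtain

$$\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} = \begin{bmatrix} s_x & s_\theta & dx \\ 0 & s_y & dy \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}$$

So, combining this with the ideal model,

$$d \begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} = \begin{bmatrix} s_x & s_\theta & dx \\ 0 & s_y & dy \\ 0 & 0 & 1 \end{bmatrix} K_f P' \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}$$

So

$$K = \begin{bmatrix} s_x & s_\theta & dx \\ 0 & s_y & dy \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} f & 0 & 0 \\ 0 & f & 0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} f s_x & f s_\theta & dx \\ 0 & f s_y & dy \\ 0 & 0 & 1 \end{bmatrix}$$

Here K is called the intrinsic parameter matrix, or the calibration matrix.

KP' is the projection matrix we are after.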
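To make the role of the intrinsic parameters concrete, here is a minimal Python sketch that assembles a calibration matrix and projects a camera-frame point to pixel coordinates. The parameter names (sx, sy, s_theta, dx, dy) and all numeric values are assumptions chosen for illustration, following the common upper-triangular form of K; they are not taken from any particular camera or library.

```python
def intrinsic_matrix(f, sx, sy, s_theta, dx, dy):
    """Assumed form of K: focal length, per-axis scales, skew term, and offset."""
    return [[f * sx, f * s_theta, dx],
            [0.0, f * sy, dy],
            [0.0, 0.0, 1.0]]


def project_to_pixels(K, X_cam):
    """Apply K P' to (X, Y, Z, 1) and divide by the depth.

    P' simply drops the homogeneous 1, so K P' acts on (X, Y, Z).
    """
    h = [sum(K[i][j] * X_cam[j] for j in range(3)) for i in range(3)]
    return h[0] / h[2], h[1] / h[2]


# Hypothetical camera: unit focal length, 800 pixels per unit, no skew,
# principal point offset (320, 240) from the upper-left image corner.
K = intrinsic_matrix(f=1.0, sx=800.0, sy=800.0, s_theta=0.0, dx=320.0, dy=240.0)
on_axis = project_to_pixels(K, [0.0, 0.0, 2.0])
off_axis = project_to_pixels(K, [0.5, -0.25, 2.0])
print("on-axis point ->", on_axis)
print("off-axis point ->", off_axis)
```

A point on the optical axis lands exactly at the offset (dx, dy), and any other point is scaled by the focal length and pixel scale before the offset is added.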

 

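As for how the projection matrix is actually computed, I will write that up together with the essential matrix and the fundamental matrix. (The principle is the same!)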

 

Textbook references:

[1] Yi Ma, Stefano Soatto, Jana Kosecka, S. Shankar Sastry, An Invitation to 3-D Vision: From Images to Geometric Models. ISBN 0-387-00893-4, Springer.

[2] Richard Szeliski, Computer Vision: Algorithms and Applications, Springer.

[3] Marc Pollefeys, Visual 3D Modeling from Images, Tutorial Notes.

