Virtual Camera

http://antongerdelan.net/opengl/virtualcamera.html

We would like to be able to navigate through our scene in 3d. To do this we need to set up a transformation pipeline which will move all of our points relative to the position and orientation of a virtual camera, or view point. We are also going to define how to project our scene with a perspective, rather than the orthogonal (squared) view that we have been using so far. This is going to make it look much more realistic, and create an illusion of depth. We can also add a few more keyboard controls for our camera.

The Transformation Pipeline

We have already seen how we can position our objects using affine transforms, or a "model matrix". When we first create our object, or load a mesh, the vertex points are given relative to some [0,0,0] local origin. In this coordinate system, we talk about the points being in local coordinate space. If we had 2 triangles, for example, created from the same points, and moved them both to different positions by multiplying them with a model matrix, then we can imagine that they are now relative to some world origin, and are therefore in a world coordinate space. In world space, we can compare the distances or angles between objects. You can imagine world space being a kind of description of the layout of all of the scenery in a computer game world. In order to "move around" in this world we need to rearrange the coordinates relative to a camera position and orientation. We would also like to create the illusion of depth with perspective. This adds a few new steps to our transformations before they are output from the vertex shader.


We typically manipulate the transformation pipeline in the vertex shader. When rendering each vertex of a mesh we position and orient it in the world by multiplying it with a model matrix, then transform it relative to the location and orientation of the camera by multiplying it with the view matrix, then add some depth perception by multiplying it again with the perspective matrix. We usually output it from the vertex shader at this point, and the remaining steps are done automatically. 
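
As a rough sketch of that pipeline in a vertex shader (the uniform and attribute names here are only illustrative, not from any particular demo):

#version 410
in vec3 vertex_position;        // local space
uniform mat4 model, view, proj; // the three pipeline matrices
void main () {
  vec4 position_world = model * vec4 (vertex_position, 1.0); // local -> world
  vec4 position_eye = view * position_world;                  // world -> eye
  gl_Position = proj * position_eye;                          // eye -> clip space
}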

Local Space to World Space

In the image above, the first 2 boxes describe our transition from local coordinates to world coordinates. The example uses a character mesh, rather than a triangle, and it has a local origin at the base of the feet. There is a vertex point on one of the hands with value [0.5, 1, 0]. All of the points are multiplied by a model matrix, which moves our character into the world (relative to the world origin). In the second box, you can see that the point on the hand now has the value of [10.5, 1, -20, 1]. This means that it was translated by the model matrix by distance [10, 0, -20]. And, the 1 on the end tells us that we used a 4X4 homogeneous matrix to do this - remember that the 1 means "this is a point that should be translated".
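
As a small worked sketch of that step, using the column-major 16-float matrix layout that we use later in this article, a translation-only model matrix and the hand vertex would look like this:

/* translation-only model matrix, column-major: the translation sits in the last column */
float model_mat[] = {
  1.0f,  0.0f, 0.0f,   0.0f,
  0.0f,  1.0f, 0.0f,   0.0f,
  0.0f,  0.0f, 1.0f,   0.0f,
  10.0f, 0.0f, -20.0f, 1.0f
};
float hand_local[] = { 0.5f, 1.0f, 0.0f, 1.0f }; /* w = 1 - a point, so it translates */
/* model_mat * hand_local = [10.5, 1.0, -20.0, 1.0] - the world space position */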

World Space to Eye Space

Now imagine that we want to view the world from the side, rather than straight-on, as we have been doing. This is the first part of our virtual camera idea. What do we need to do to achieve this? We need to give the camera a position and orientation in world coordinates.


A break-down of the transition between world space and eye space. These steps are usually combined into a single view matrix, often by using a lookAt() style of function.

In the diagram, the camera has a world position of [20, 2, -15]. In world coordinates, the camera is oriented 90 degrees on the Y axis. According to the camera, in eye coordinates, it's own forward direction is the -Z axis, and the point on the character is 5 units to the right, 1 unit below, and 9.5 units ahead, or [5, -1, -9.5]. To map between these two spaces you can see that we just need to subtract the camera's world position, giving us [-9.5, -1, -5], and rotate in the opposite direction (-90 degrees around the y-axis), giving us [5, -1, -9.5].

We can combine these operations into a view matrix, and write a function replicating the old OpenGL lookAt(), where we give it the camera position, a target to look at, and the "up" direction, which tells us how the camera should be pitched or rolled. Most maths libraries for graphics will have a function that does this.


Where U is a unit vector pointing up-wards, F forwards, R right, and P is the position of the camera, all in world coordinates. The U vector must change as the camera pitches forwards and back, or rolls to either side.

Now, to construct a matrix combining a rotation and a translation, we can not simply plug the rotation and translation components into a single matrix. We must create 2 separate matrices, and multiply them together to combine them. 

To combine our 2 matrices into a final view matrix, in column-major layout, we do not say V = T × R, because this would rotate the points around the world origin, and then move them such that the camera position is at the origin - remember that the order of operations in column-major notation is right-to-left. We want to translate such that the camera is at the origin, and then rotate them around the camera, so we say V = R × T. In row-major layout, this is the other way around, of course. The example below shows how to do this for a bird's eye (overhead) view camera.


An example of creating a bird's eye view matrix. The forward vector points -Y, and is negated to +Y, the up vector now points -Z, and the position will need to be set every time that the camera moves.
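
In code, assuming a small maths library with a 4x4 matrix type and translate/rotate helpers (all of the names here are only illustrative), this bird's eye view matrix could be built something like this:

// camera hovering at cam_pos, looking straight down the world Y axis
mat4 T = translate (identity_mat4 (), vec3 (-cam_pos.x, -cam_pos.y, -cam_pos.z)); // negated position
mat4 R = rotate_x_deg (identity_mat4 (), 90.0f); // undo the camera's -90 degree pitch
mat4 view_mat = R * T; // column-major: translate first, then rotate
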
lookAt()

Older OpenGL implementations had a built-in viewing transform, so we didn't need to create a matrix manually. A common function for building the transform was gluLookAt() which took a camera position called eye, a target to look at called center, and a vector pointing upwards, up, which determined the roll of the camera.

If you would like to construct your own lookAt (camera_position, target_position, up_direction) function, read on.

First construct a translation matrix T, remembering that the translation is the negated world position of the camera.

Next work out the 3d distance d, between the view target and the camera by subtracting the camera position from the target position:

d = [target x - camera x, target y - camera y, target z - camera z]

If we normalise this distance vector, we turn it into our forwards direction vector with a length of 1 unit. To normalise a vector, divide each component by the magnitude of the vector. The magnitude is:

m = sqrt (x * x + y * y + z * z)
f̂ = [x / m, y / m, z / m]

Note that you should never normalise a 4d vector because the w component (1 or 0) isn't really part of the geometric vector - we just use it to differentiate between points and directions.

Our right-pointing direction vector r̂ is the vector cross product of the forward vector f̂ and the upwards-pointing vector û. You can find the formula for the cross product on my 3d maths cheat sheet if you need it.

r̂ = f̂ × û

We re-calculate the up vector for consistency:

û = r̂ × f̂

And we can now plug the right, up, and negated forwards vectors into a new rotation matrix, R. The bottom row and right-most column should be zeros, except the bottom-right cell, which should be 1.

We can now combine our two matrices together to form a single view matrix.
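
Putting those steps together, here is a minimal sketch of such a function in C, assuming column-major matrices stored as 16-float arrays and a few small vector helpers (all of the names are illustrative, not from any particular library):

#include <math.h>

typedef struct { float x, y, z; } vec3;

vec3 sub (vec3 a, vec3 b) { return (vec3){ a.x - b.x, a.y - b.y, a.z - b.z }; }
vec3 cross (vec3 a, vec3 b) {
  return (vec3){ a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x };
}
vec3 normalise (vec3 v) {
  float m = sqrtf (v.x * v.x + v.y * v.y + v.z * v.z);
  return (vec3){ v.x / m, v.y / m, v.z / m };
}
float dot (vec3 a, vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

/* build a column-major view matrix from a camera position, a target, and an up direction */
void look_at (vec3 cam_pos, vec3 target, vec3 up, float *m) {
  vec3 f = normalise (sub (target, cam_pos)); /* forwards */
  vec3 r = normalise (cross (f, up));         /* right */
  vec3 u = cross (r, f);                      /* re-calculated up */
  /* rotation part R: rows are r, u, and -f (written here column by column) */
  m[0] = r.x;  m[4] = r.y;  m[8]  = r.z;
  m[1] = u.x;  m[5] = u.y;  m[9]  = u.z;
  m[2] = -f.x; m[6] = -f.y; m[10] = -f.z;
  m[3] = 0.0f; m[7] = 0.0f; m[11] = 0.0f;
  /* translation part: the result of multiplying R by a matrix T that translates by -cam_pos */
  m[12] = -dot (r, cam_pos);
  m[13] = -dot (u, cam_pos);
  m[14] = dot (f, cam_pos);
  m[15] = 1.0f;
}

The translation does not appear as a separate matrix here only because the R × T multiplication has already been folded into the right-most column.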

Note: do not forget to update the "up" input parameter as your camera pitches and rolls. If you forget, you can warp the perspective slightly, and if you let the forward vector become equal to the up vector, then you lose the ability to rotate sensibly around the 3 axes.

Eye Space to Homogeneous Clip Space

If the first part of our virtual camera is the view matrix, then the second part is the projection matrix. The projection matrix does 3 things:

  1. Define near and far clipping planes - this defines the visible range along the Z axis. Anything in front of the "near" distance or behind the "far" distance is out of the visible range. Geometry outside this range will be "clipped" - removed from the hardware pipeline in a later stage.
  2. Define a field of view (FOV). This is actually not a field at all, it's an angle. You might remember that games like Quake used a 90 degree horizontal field of view on the standard 4:3 displays. Our field of view is actually for the vertical range, so it's going to be a bit smaller - most modern games use 66 or 67 degrees for this. The horizontal angle will be calculated by our matrix when we give it the aspect ratio of the viewport - how wide compared to how high. Note that on a 4:3 display 67 * 4/3 = 89.33 degrees.

    With the clipping planes, and the angle of view, we have created a volume - the shape of a pyramid with the top chopped off. This volume is called the viewing frustum. Any geometry outside the frustum will be out of the visible range. Interestingly, there is more area visible at the far end than at the near end. How can we represent that on the 2d screen?

  3. The last job is to tell the next stage of the hardware pipeline how to calculate perspective. The fat end of the frustum will be pushed in to form a box shape - the large collection of things visible at the wide end will be squeezed together towards the middle of the screen. This gives us our perspective. Imagine looking along railway tracks and observing as the rails become smaller, and closer together in the distance.

The frustum covers a volume of the scene, and will be compressed into a box to give us perspective. The angle up and down is determined by the field of view given, the position along the Z axis of the near and far planes by a distance each, and the width and height of the near plane by the aspect ratio of the view.

The aspect ratio is the width of the viewport divided by the height of the viewport. The near and far clipping distances are just eye coordinate distances in the direction of the camera. Typical values are 0.1 for near and 100.0 for far. If your units are in meters then this means that you can see for 99.9 meters. You can not set the near clip distance to 0. In practice, you want to make the planes as close together as possible without ruining the scene, as it will give you more numerical accuracy and reduce artifacts (errors) with several effects that use depth comparison.


Note that most of what the projection matrix does is scaling (the main diagonal is used), taking the field of view (an angle) and the aspect of the view into account. The Pz component adjusts the points' range between the clipping planes, and the -1 on the bottom row replaces the vector's fourth (w) component with its negated third (z) component, which will later be used for perspective division.

The image above gives us the layout of a projection matrix, where:
Sx = (2 * near) / (range * aspect + range * aspect)
Sy = near / range
Sz = -(far + near) / (far - near)
Pz = -(2 * far * near) / (far - near)
range = tan (fov * 0.5) * near

Note that the input field of view to the matrix is actually the angle from the horizon to the top of the frustum, so we divide our full fov by 2 here.

Additional Stages in the Transformation Pipeline

We now have all we need to create a virtual camera that can move through a scene. We can optionally use geometry shaders and tessellation shaders here to further modify our geometry (see this diagram), but we will come back to these in later chapters. After the vertex/geometry/tessellation shaders finish there are a few additional transformation stages. These are fixed function stages - we do not write any code to define them; they will be computed automatically. Why do we need to know about these extra steps? When we go in reverse, of course. For example, we might want to click with the mouse cursor, and determine if it intersects with an object in world space.

Perspective Division

Perspective division is not generally explained very well, so I think it's worth doing here, because it's actually quite simple. You may have heard that it is "divide by w", and been left scratching your head because didn't we make w equal to 1? Yes we did, but something has happened to it after multiplying it by the projection matrix.

If you look at the bottom row of the projection matrix you see it has (0, 0, -1, 0). None of our other 4X4 matrices had anything in the bottom row, except the 1 in the corner - we just used it to indicate if it was a point or a direction vector. Now we see a second use of the fourth vector component, and the 4X4 matrix. Recall the matrix*vector function:
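
For a column-major matrix stored as a 16-float array, a minimal sketch of that function is:

/* column-major 4x4 matrix times a 4d column vector: out = m * v */
void mult_mat4_vec4 (const float *m, const float *v, float *out) {
  out[0] = m[0] * v[0] + m[4] * v[1] + m[8]  * v[2] + m[12] * v[3];
  out[1] = m[1] * v[0] + m[5] * v[1] + m[9]  * v[2] + m[13] * v[3];
  out[2] = m[2] * v[0] + m[6] * v[1] + m[10] * v[2] + m[14] * v[3];
  out[3] = m[3] * v[0] + m[7] * v[1] + m[11] * v[2] + m[15] * v[3]; /* the bottom row feeds w */
}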

Our matrix's bottom row is (0, 0, -1, 0). This means that the multiplication gives us 0 * x + 0 * y + -1 * z + 0 * w. This has the effect of making the fourth component equal to -z - a positive version of the depth.

After the vertex shader finishes, the hardware pipeline takes our point, and computes a perspective division. Perspective division is:

v' = [x / w, y / w, z / w];

We are dividing the x, y, and z values by a positive depth value. The larger the depth, the smaller the x and y values will become, pushing them closer to the centre of the view, and creating our railway tracks illusion. Because our projection matrix scaled the x, y, and z values to suit the range between the clipping planes, this step also squeezes our coordinates into a range where -1 to 1 marks the boundaries of the view - the shape has changed from a frustum to a cube.

You can try this out - load up a working demo, and change the value of gl_Position. So far, we have left the w component as 1.0, which will have no effect (divide by 1). Try making it smaller - perhaps 0.5. Your geometry should stretch out, giving it a "zoomed in" look. Make it 2.0 and your shape should be squished down, making it look farther away.
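
In a minimal demo where the vertex shader still ends with gl_Position = vec4 (vertex_position, 1.0);, the experiment is just:

// try one of these at a time:
gl_Position = vec4 (vertex_position, 0.5); // stretched out - looks "zoomed in"
gl_Position = vec4 (vertex_position, 2.0); // squashed down - looks farther away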

NDS

The diagram at the beginning of this article gives us 2 additional stages that happen after clip space. We already know about perspective division. The points are then normalised into a box shape with dimensions [-1 to 1, -1 to 1, -1 to 1]. The w component is discarded. This is called Normalised Device Space (NDS). The dimensions of the box are unit-sized (hence the "normalised" name), so they can be multiplied to scale to any viewport dimensions.

2d

The last major step is to completely flatten the normalised box, so that the objects farther away are drawn behind objects closer to the camera. The dimensions need to be scaled from the normalised box X and Y to the actual pixels of the 2d viewport size. We now have a 2d area called Viewport Space.
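
Roughly speaking, for a viewport that is viewport_width by viewport_height pixels and anchored at the bottom-left corner (both names here are just placeholders), the mapping for x and y is:

x_pixel = (x_nds + 1.0) * 0.5 * viewport_width;
y_pixel = (y_nds + 1.0) * 0.5 * viewport_height;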

Don't Mix Up Coordinate Systems

You can not compare vectors from different coordinate spaces. It is not valid to get the distance between a point in eye space, and a point in world space. Raise the world space point to eye space first. This is a very common source of error in shader programmes. The best way to avoid this is to postfix or prefix variable names with the space that they are in. For example: vec4 point_eye = view_matrix * point_world;.

Reversing

You can reverse any transformation by multiplying a point by the inverse of a matrix. That is, to go from eye space to world space, you can multiply a point in eye space by the inverse view matrix. You can work your way all the way back down the pipeline from viewport space. We do this for things like casting a ray by clicking with the mouse. There is a GLSL function inverse() to do this, but generally we would compute the inverses in C, so we would need an inverse function in our matrix maths library.
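
Following the naming convention from the previous section, reversing that example in a shader would look like:

vec4 point_world = inverse (view_matrix) * point_eye; // eye space back to world space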

Virtual Camera Demo with Keyboard Controls

Start with a minimal demo. You can build on to an existing one, but it will look a bit confusing if your triangle is moving as well as the camera. First, let's define a speed of movement for the camera, and a speed that it will rotate at.

We also need to keep track of the world position and world orientation of the camera. I'm going to simplify this demo by saying that the camera can only rotate around the Y axis. I'm being very careful not to start my camera at world position (0,0,0). This is because our triangle shape is at Z position 0, and we will have a near cut-off distance, so better to start the camera back a bit, or the shape will be clipped from view.

Camera Variables
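
Something like this, near the top of the demo, covers both the speeds and the camera's starting position and orientation (the values are just reasonable starting points):

float cam_speed = 1.0f;       // movement speed - 1 unit per second
float cam_yaw_speed = 10.0f;  // rotation speed - 10 degrees per second
float cam_pos[] = { 0.0f, 0.0f, 2.0f }; // start back a bit, because the triangle is at z = 0
float cam_yaw = 0.0f;         // rotation around the y axis, in degrees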

I'm going to create an initial view matrix. At this point I used my little maths library; maths_funcs.h and maths_funcs.cpp, but if you have your own matrix * matrix code, and you are happy making a Y rotation matrix and a translation matrix, then you can use that instead. The matrices here are just 1d arrays of floats. I could use a lookAt() style of function, but in this case I'm going to manually create a translation matrix, and a rotation matrix, and combine them together to build my view matrix.
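
With that library, or any equivalent translate and rotate-Y helpers, the set-up might look something like this:

mat4 T = translate (identity_mat4 (), vec3 (-cam_pos[0], -cam_pos[1], -cam_pos[2])); // negated position
mat4 R = rotate_y_deg (identity_mat4 (), -cam_yaw);                                  // negated orientation
mat4 view_mat = R * T;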

Remember that we are negating both the camera position, and its orientation to build this matrix. This is for column-order matrices. Row-order has multiplication around the other way. Don't mix up the order of multiplication here, or you'll get weird results when you start moving and rotating the camera.

Next we can set up a projection matrix. We need to define some input variables for that, and work out the values that get plugged-in to the matrix. You would usually use a maths library projection() function to do this, but I think it's worth doing by hand here as an example. Keep in mind that the tangent maths function uses radians, not degrees, so we need to do a conversion here. I made a pre-processor definition to do that, so I don't have to work it out every time. When I calculate the aspect ratio, I make sure to cast the view dimensions as floats, because we don't want to do integer division here.
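
As a sketch, assuming g_gl_width and g_gl_height hold the viewport dimensions (those names are just placeholders for whatever you use):

#define ONE_DEG_IN_RAD (2.0f * 3.14159265f / 360.0f) // degrees-to-radians conversion
// input variables
float near = 0.1f;                  // near clipping plane distance
float far = 100.0f;                 // far clipping plane distance
float fov = 67.0f * ONE_DEG_IN_RAD; // field of view, converted to radians
float aspect = (float)g_gl_width / (float)g_gl_height; // width divided by height, as floats
// values to plug in to the matrix
float range = tanf (fov * 0.5f) * near;
float Sx = (2.0f * near) / (range * aspect + range * aspect);
float Sy = near / range;
float Sz = -(far + near) / (far - near);
float Pz = -(2.0f * far * near) / (far - near);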

Now we can plug our values in to an array. Remember that it's going to look a bit funny because the array indices are counting down each column:
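
In column-major order, with the -1 sitting at the bottom of the third column:

float proj_mat[] = {
  Sx,   0.0f, 0.0f,  0.0f,
  0.0f, Sy,   0.0f,  0.0f,
  0.0f, 0.0f, Sz,   -1.0f,
  0.0f, 0.0f, Pz,    0.0f
};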

Modify Vertex Shader

Add uniforms to use the 2 new matrices:
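
These match the names used when we set the uniforms and write gl_Position below:

uniform mat4 view, proj;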

Then multiply your vertex point by the projection and view matrices. The order here is very important. Once again, this is column-major multiplication. Row order is reversed.
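
That is:

gl_Position = proj * view * vec4 (vertex_position, 1.0);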

If you are also using a model matrix to move your geometry, then it should look like this:
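
gl_Position = proj * view * model * vec4 (vertex_position, 1.0);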

Where you can change model to the name of your existing matrix.

Update Uniforms

By default, matrix uniforms are zero in the shaders. We don't want that - it will kill our geometry. Let's initialise the uniforms to our initial matrix values. Note that I was using my maths library for the view matrix. The view_mat.m there is retrieving the array of floats from the mat4 structure that I made. The projection matrix was just an array of floats, so no need to do that there.
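
Assuming your shader programme handle is called shader_programme (substitute your own variable name):

int view_mat_location = glGetUniformLocation (shader_programme, "view");
int proj_mat_location = glGetUniformLocation (shader_programme, "proj");
glUseProgram (shader_programme);
glUniformMatrix4fv (view_mat_location, 1, GL_FALSE, view_mat.m);
glUniformMatrix4fv (proj_mat_location, 1, GL_FALSE, proj_mat);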

Keep track of the location of each matrix. We won't need to change the projection matrix unless the view is resized. In a larger programme, you should use GLFW's window resize call-back to tell you when the view has changed size, and recalculate your projection matrix. If you forget to do this the view will become warped when the window resizes.

You can test it now. It should look very much the same as before, because we have not moved or rotated the camera. If you do not see a triangle any more, you can remove either the projection or view matrix from the gl_Position = proj * view * vec4 (vertex_position, 1.0); line in the vertex shader to isolate which matrix is the problem.

Keyboard Controls

We can use GLFW's keyboard manager to see if a button was pressed. I am going to use WASD for moving forwards, backwards, and sliding left and right. I will use the left and right arrow keys to turn around. It's not going to change the direction of movement as you turn, but you could calculate this too if you like. I added pgup/pgdwn for moving vertically. I don't recalculate my view matrix straight away, but I raise a flag. This is just a small optimisation in case more than 1 button is pressed. This code should go inside the main loop.
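
A sketch of that block, assuming window is your GLFWwindow pointer and elapsed_seconds is the frame time you are already calculating (both names are placeholders for whatever you use):

// control keys
bool cam_moved = false;
if (glfwGetKey (window, GLFW_KEY_A)) { cam_pos[0] -= cam_speed * elapsed_seconds; cam_moved = true; }
if (glfwGetKey (window, GLFW_KEY_D)) { cam_pos[0] += cam_speed * elapsed_seconds; cam_moved = true; }
if (glfwGetKey (window, GLFW_KEY_W)) { cam_pos[2] -= cam_speed * elapsed_seconds; cam_moved = true; }
if (glfwGetKey (window, GLFW_KEY_S)) { cam_pos[2] += cam_speed * elapsed_seconds; cam_moved = true; }
if (glfwGetKey (window, GLFW_KEY_PAGE_UP)) { cam_pos[1] += cam_speed * elapsed_seconds; cam_moved = true; }
if (glfwGetKey (window, GLFW_KEY_PAGE_DOWN)) { cam_pos[1] -= cam_speed * elapsed_seconds; cam_moved = true; }
if (glfwGetKey (window, GLFW_KEY_LEFT)) { cam_yaw += cam_yaw_speed * elapsed_seconds; cam_moved = true; }
if (glfwGetKey (window, GLFW_KEY_RIGHT)) { cam_yaw -= cam_yaw_speed * elapsed_seconds; cam_moved = true; }
// only rebuild the view matrix and update the uniform if a key was actually pressed
if (cam_moved) {
  mat4 T = translate (identity_mat4 (), vec3 (-cam_pos[0], -cam_pos[1], -cam_pos[2]));
  mat4 R = rotate_y_deg (identity_mat4 (), -cam_yaw);
  mat4 view_mat = R * T;
  glUniformMatrix4fv (view_mat_location, 1, GL_FALSE, view_mat.m);
}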

Common Errors

3d programming tip: the first thing to do is to draw a diagram of your scene on paper. Where is the camera? Where is the geometry? Is the camera in front of or behind the geometry? How big is the geometry? Is it pointing the right way? Then you can compare your hand-calculations with what you wrote in the code. This will help you spot 9/10 mistakes in 3d programming. If you can't draw the problem, yet you've started coding, then you need to go back to the theory.

The most common mistakes from the lab:

  • I see a white shape and the camera isn't working! - Check your shader logs for errors. Check if your glGetUniformLocation() calls returned less than 0.
  • Error in matrix layout. Perhaps the wrong cell was used, or a small typo. Test by substituting your code with a function from a library.
  • Camera is too close to the object (i.e. both at Z position 0). Remember the near plane will cut off anything in front of a certain distance. If your shape is at 0, try starting the camera at 2.
  • Camera is pointing backwards. Check your -Z and Z values.
  • Camera is above or below or too far either side of object. Check camera position versus extents of object.
  • Matrix multiplications in the wrong order. There are several matrix multiplications. Check the proj, view, model order in the shader, as well as the rotation and translation order for the view matrix in C.
  • Object is being front- or back-face culled and the camera is looking at it from the culled side. Disable face culling to test.
  • Shader code error. Check logs and run shader validation.
  • Mix up between degrees and radians.
  • Matrix layout given manually, but in wrong column/row order.
  • Screen aspect ratio is not using actual view dimensions. Has your view changed? Use values updated from the resize callback.
  • Field of view angle is too big (no shape) or too small (shape covers entire screen). This should be the vertical angle of view (usually around 67 degrees). Test with 67 degrees first.
  • Upwards vector not correct in view matrix calculation. If camera pitches or rolls, this should be re-calculated using vector cross products. If your up vector is wrong the camera may spin on the wrong axis when you rotate.
  • Have you attempted to write a view matrix by plugging in the position and rotations directly? Naughty! This will warp the view. Calculate the rotation and translation separately, then multiply them together.

Going Further

Keyboard Call-Back

GLFW has a keyboard call-back which presents a tidier interface than checking every key in the main loop.
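
A minimal sketch, registered once after creating the window:

void key_callback (GLFWwindow *window, int key, int scancode, int action, int mods) {
  if (GLFW_KEY_ESCAPE == key && GLFW_PRESS == action) {
    glfwSetWindowShouldClose (window, 1);
  }
  // set flags here for the camera movement keys, instead of polling every key in the loop
}

glfwSetKeyCallback (window, key_callback);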

Mouse Controls

You could also use the mouse feedback functions from GLFW to rotate the camera. This isn't going to be suitable if you plan to use the mouse cursor to click on things, but would suit some programmes.
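
A sketch of the registration, using GLFW's cursor position callback:

void cursor_position_callback (GLFWwindow *window, double x, double y) {
  // compare x and y with the previous cursor position to get a movement delta,
  // then convert that delta into a change in camera yaw (and pitch, if you support it)
}

glfwSetCursorPosCallback (window, cursor_position_callback);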

Optimisation

The clipping process can save quite a lot of GPU processing time by removing geometry from the hardware pipeline (remember, this diagram gave an overview of the hardware pipeline). This reduces the number of fragment shaders that need to run. But, notice that the vertices outside the frustum have all been calculated already. Various algorithms exist to remove out-of-scene geometry before rendering. You might investigate scene graphs, quad-trees, and oct-trees (which partition the world into a hierarchy of geometry), and frustum culling, which calculates collisions between a frustum shape and shapes approximating the geometry. Shapes that do not collide with the frustum shape are culled (not drawn).

 
