Mouse Picking with Ray Casting

Overview

It can be useful to click on, or "pick" a 3d object in our scene using the mouse cursor. One way of doing this is to project a 3d ray from the mouse, through the camera, into the scene, and then check if that ray intersects with any objects. This is usually called ray casting. This is an entirely mathematical exercise - we don't use any OpenGL code or draw any graphics - this means that it will apply to any 3d application the same way. The mathematical subject is usually called geometricintersection testing.


Instead of starting with a mesh in local space, we are starting with a 2d mouse cursor position in viewport space. We work backwards through the transformation by using inverse matrices, and arrive with a ray in world space.

With ray picking we usually simplify a scene into bounding spheres or boxes. This makes the calculation a bit easier than testing all of the individual triangles. We don't need to create a 3d sphere out of triangles that can be rendered; we just represent the sphere as a simple function. The premise is that we have a mathematical formula for the points along a ray, and a mathematical formula for the points on a sphere. If we substitute the points on the sphere with the equation for points on a ray, then we get the intersection of points that are common to both. It's interesting to do this technique now because it shows us how we can use the transformation pipeline in reverse; from 2d screen to a 3d world space by using the inverse of our matrices e.g ...inverse (view_matrix) * inverse (projection_matrix).... In a later tutorial we will look at an alternative technique using unique colours to determine where the mouse is hovering or clicking.

Calculating a Ray from the Mouse

All ray casting starts with a ray. In this case it has an originO at the position of the camera. We can do ray intersections in any space (world, eye, etc.), but everything needs to be in the same space - let's assume that we are doing our calculations in world space. This means that our ray origin is going to be the world x,y,z position of the camera.


This function expresses all the points, t, along an infinite ray projected from our camera location, O, in the direction D, which we will work out by modifying and normalising the mouse cursor position.

Step 0: 2d Viewport Coordinates

range [0:width, height:0]

We are starting with mouse cursor coordinates. These are 2d, and in the viewport coordinate system. First we need to get the mouse x,y pixel coordinates. You might have set up a call-back function (with e.g. GLFW or GLUT) something like this:

This gives us an x in the range of 0:width and y from height:0. Remember that 0 is at the top of the screen here, so the y-axis direction is opposed to that in other coordinate systems.

Step 1: 3d Normalised Device Coordinates

range [-1:1, -1:1, -1:1]

The next step is to transform it into 3d normalised device coordinates. This should be in the ranges of x [-1:1] y [-1:1] and z [-1:1]. We have an x and y already, so we scale their range, and reverse the direction of y.

We don't actually need to specify a z yet, but I put one in (for the craic).

Step 2: 4d Homogeneous Clip Coordinates

range [-1:1, -1:1, -1:1, -1:1]

We want our ray's z to point forwards - this is usually the negative z direction in OpenGL style. We can add a w, just so that we have a 4d vector.

Note: we do not need to reverse perspective division herebecause this is a ray with no intrinsic depth. Other tutorials on ray-casting will, incorrectly, tell you to do this. Ignore the false prophets! We would do that only in the special case of points, for certain special effects.

Step 3: 4d Eye (Camera) Coordinates

range [-x:x, -y:y, -z:z, -w:w]

Normally, to get into clip space from eye space we multiply the vector by a projection matrix. We can go backwards by multiplying by the inverse of this matrix.

Now, we only needed to un-project the x,y part, so let's manually set the z,w part to mean "forwards, and not a point".

Step 4: 4d World Coordinates

range [-x:x, -y:y, -z:z, -w:w]

Same again, to go back another step in the transformation pipeline. Remember that we manually specified a -1 for the z component, which means that our ray isn't normalised. We should do this before we use it.

This should balance the up-and-down, left-and-right, and forwards components for us. So, assuming our camera is looking directly along the -Z world axis, we should get [0,0,-1] when the mouse is in the centre of the screen, and less significant z values when the mouse moves around the screen. This will depend on the aspect ratio, and field-of-view defined in the view and projection matrices. We now have a ray that we can compare with surfaces in world space.

Shortcut

If you're happy with this theory then you can condense all of these instructions into one line.

Ray vs. Plane

We can describe any surface as a plane. It's often useful to do this before doing more detailed (and more computationally expensive) intersection tests. We can test, for example, if a point is inside a region bounded by planes - if not, then no need to check versus individual triangles inside the region. In my case, I wanted to create an imaginary ground plane that I could use to help me click-and-drag boxes around scenery. I do 2 ray-plane intersections to get the top-left and bottom-right corners of the box in xyz world coordinates.


I used ray-plane intersection to allow a user to drag out a "zone" on the floor. This might be useful in strategy games where you want to select a group of units. Unless you're on a globe. In that case, read on. Note that the square is projected into depth - it isn't a true square on the 2d screen, but will be if looking from directly above it.

Ray vs. Sphere

Probably the easiest method for selecting objects in a 3d scene. If you know a bounding radius, and centre point, of each object then we have the definition of its sphere. This might not be the best method to use for large, or irregularly shaped objects, which will have very large spheres that might overlap with smaller objects nearby.


Here I have an equation for all the points, P on a sphere's surface, which says "anything that is radius r distance from centre point C". The idea is to replace P with the points-along-a-ray equation, and solve for t. I skipped the re-arrangement steps, but if you're interested in a full discussion have a look at Akenine-Möller et al. Real-Time Rendering.

We can actually get 2 output intersections from a ray-sphere intersection because the ray can hit both the back and the front of the sphere. In this case we need to check for the closest intersection (smallest value of t). The ray can also miss the sphere, hit the very edge of the sphere (both t values are the same), or be cast from inside the sphere (one t value is negative).

Be sure to check the b^2 - c term before evaluating the expensive square root operator. If you miss the sphere entirely you will not get a sensible result for t. It is also possible that the ray will be cast from in front of or inside of the sphere, so you will need to check that your t values are positive.


Here I have a collection of spheres in 3d space. I generated these in Blender and scaled them to match the radius of my bounding sphere. They are all the same size, but one is farther away and behind the others - this was to test that only the closest intersection was being selected. When a sphere is clicked on, I changed its colour to pink. This helped me determine that the ray was being calculated correctly, and the intersection was returning the correct result. You can imagine each sphere surrounding a mesh or object that is being selected.

Optimisations

You probably won't need to unless you're doing a lot of clicking on spheres, but you can reduce the cost of this algorithm by adding a few checks before getting to the square root. Compare squared distances between the ray origin and the sphere centre. You can project this as a 2d distance along the ray direction (dot product) and compare the end point's squared distance to the sphere origin (use Pythagoras' theorem) with the squared radius.

Round-Up

Generally, if you want to draw on a shape (e.g. dragging boxes) then ray-casting is a good choice. If you want inaccuracy i.e. letting the user click in the general area of a fiddly little mesh to select it then ray-casting is a good choice. If you want accuracy then you're almost certainly better off using a colour-based picking method, which has the cost of one extra rendering pass, rather than 1-5 ray casts per selectable object in the scene.

Further Things (That I Might or Might Not Add Later)

Axis-Aligned Bounding Box (AABB)

You can do ray versus axis-aligned box intersection, which is quite commonly used. An object in the world is considered to have a bounding box, rather than a bounding sphere. This has the weakness that irregularly shaped, rotating objects will create gigantic boxes, but it is quite easy to work out the extents, and use these to validate a set of ray-plane intersections.

Oriented Bounding Box (OBB)

If your shapes are actually boxes, or you want the box to rotate with the object inside, then you can do a ray versus oriented box intersection in much the same way. A few optimised methods exist (Haines "Essential Ray Tracing Algorithms", Chapter 2). I haven't had occasion to use this "in the wild".

Barycentric

Ray versus triangle collisions will let you determine exactly where on a more complex mesh that you have clicked on. If you need this much accuracy, you are probably better off using the colour buffer method and getting a pixel-correct method. It might be useful for things like shooting games where you want put a bullet hole image in an exact location though. I haven't found a good reason to use this one in a while either. Before testing several different surfaces of a complex object, it's generally advised to do a ray versus bounding sphere test first - no point testing all of the triangles if the ray has missed the whole object!

  • 0
    点赞
  • 1
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
提供的源码资源涵盖了安卓应用、小程序、Python应用和Java应用等多个领域,每个领域都包含了丰富的实例和项目。这些源码都是基于各自平台的最新技术和标准编写,确保了在对应环境下能够无缝运行。同时,源码中配备了详细的注释和文档,帮助用户快速理解代码结构和实现逻辑。 适用人群: 这些源码资源特别适合大学生群体。无论你是计算机相关专业的学生,还是对其他领域编程感兴趣的学生,这些资源都能为你提供宝贵的学习和实践机会。通过学习和运行这些源码,你可以掌握各平台开发的基础知识,提升编程能力和项目实战经验。 使用场景及目标: 在学习阶段,你可以利用这些源码资源进行课程实践、课外项目或毕业设计。通过分析和运行源码,你将深入了解各平台开发的技术细节和最佳实践,逐步培养起自己的项目开发和问题解决能力。此外,在求职或创业过程中,具备跨平台开发能力的大学生将更具竞争力。 其他说明: 为了确保源码资源的可运行性和易用性,特别注意了以下几点:首先,每份源码都提供了详细的运行环境和依赖说明,确保用户能够轻松搭建起开发环境;其次,源码中的注释和文档都非常完善,方便用户快速上手和理解代码;最后,我会定期更新这些源码资源,以适应各平台技术的最新发展和市场需求。

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值