Variance Reduction Methods: a Quick Introduction to Importance Sampling (reading notes)

https://www.scratchapixel.com/lessons/mathematics-physics-for-computer-graphics/monte-carlo-methods-in-practice/variance-reduction-methods

The Art of Variance Reduction: Better for Less
Both variance reduction techniques and quasi Monte Carlo methods (which we will talk about in the next chapter) are complex topics; people have written entire books on them alone. Why should you learn about them now, and why are they so important? Monte Carlo, as you may have realized, is actually quite a simple method. Its main problem though is variance (in rendering, this turns into noise, and that is the key point to remember). To get rid of it, all we can do is increase the number of samples N, and to make things worse, each time you want to reduce the noise by a factor of two, you need not two but four times as many samples (the error only decreases with the square root of N).

Thus a great deal of research was invested into finding whether this noise could be reduced by any means other than the brute force approach of increasing N. It happens that mathematics can again help answer this question, and this led to an entire branch of Monte Carlo research focused on what are known as variance reduction methods: how can we reduce variance by any means other than just increasing N? Importance sampling and quasi Monte Carlo are two such solutions. You can imagine why people are so excited about them: they are the Holy Grail of Monte Carlo rendering, the promise of better for less.

Importance sampling and quasi Monte Carlo deserve a lesson of their own (which you will find later in this section). Teaching these techniques will also become easier once the topics of light transport and BRDFs have been reviewed (the next three lessons). But let us briefly talk about them now, just to give you a feel for what they are and how and why they work (if you are not interested in the mathematical details and just want an intuitive understanding of these methods, this introduction is probably all you need).
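To make that last remark concrete (a one-line check added here, with σ standing for the standard deviation of a single sample's contribution): the error of a Monte Carlo estimate only decreases with the square root of the number of samples, so quadrupling N only halves the error:

$$\text{error} \propto \frac{\sigma}{\sqrt{N}}, \qquad \frac{\sigma}{\sqrt{4N}} = \frac{1}{2}\cdot\frac{\sigma}{\sqrt{N}}.$$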

A Quick Introduction to Importance Sampling

In the lesson Introduction to Shading, we already explained that to shade a point P, we need to gather the light arriving at P. In 3D, light can come from all directions in the hemisphere oriented about the normal at P, which mathematically we can define as an integral over the hemisphere (the hemisphere is the domain of integration):

$$\int_{\Omega} L(\omega)\, d\omega$$
The symbol Ω is usually used in the literature to denote the hemisphere of directions (dω is a differential solid angle and ω a direction), and L(ω) is a function of direction (it returns the amount of light coming from ω). Technically, we should add a cos(θ) term to this expression, but it is not important for this demonstration. What is important is to realize that finding how much light arrives at P requires an integral over the entire hemisphere of directions above P. Unfortunately, there is no closed-form solution to this integral for arbitrary geometry. However, we can approximate its result with Monte Carlo integration. The idea is to "sample" the hemisphere, that is, to create a set of N random directions over the hemisphere about P. The approximation of the integral is just an average of the amount of light coming from these N randomly chosen directions:
$$\int_{\Omega} L(\omega)\, d\omega \approx \frac{2\pi}{N}\sum_{i=1}^{N} L(\omega_i) \qquad \text{(uniformly distributed directions, } pdf = 1/2\pi\text{)}$$
where ω_i is a random direction contained in the hemisphere above P (in the lesson Sampling from the advanced section, you can learn how to create samples over the hemisphere). An illustration of this technique can be seen in figure 1 (top).
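As a minimal sketch of this idea in C++ (in the same spirit as the program at the end of this chapter): the radiance function L() below is entirely made up for illustration, and the directions are drawn uniformly over the hemisphere, whose constant pdf is 1/(2π).

#include <cstdio>
#include <cstdlib>
#include <cmath>

// Hypothetical incoming-light function: amount of light arriving from direction (theta, phi).
// In a real renderer this would be the result of tracing a ray; here it is just a stand-in
// with two bright "lights" and dark walls, so the values are purely illustrative.
float L(float theta, float phi)
{
    bool light1 = (theta < 0.3f && phi < 1.0f);
    bool light2 = (theta < 0.3f && phi > 4.0f);
    return (light1 || light2) ? 10.0f : 0.05f;
}

int main()
{
    srand48(17);
    const int N = 64;
    double sum = 0;
    for (int i = 0; i < N; ++i) {
        // uniform sampling of the hemisphere: pdf(w) = 1 / (2*pi)
        float r1 = drand48(), r2 = drand48();
        float theta = acosf(r1);     // cos(theta) uniform in [0,1]
        float phi = 2 * M_PI * r2;   // azimuth uniform in [0, 2*pi)
        sum += L(theta, phi);
    }
    // Monte Carlo estimate: average of the samples divided by the (constant) pdf
    printf("estimated incoming light at P: %f\n", sum / N * 2 * M_PI);
    return 0;
}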
Figure 1: samples can miss parts of the function which contribute a lot to the result of the integral. It would be great to distribute these samples using a PDF that has the same shape as the function (the PDF is high where the function is high).

In two dimensions though, the hemisphere becomes a half disk. As with most of the Monte Carlo integration cases we have studied so far, the samples are uniformly distributed: each direction in the hemisphere has an equal probability of being chosen. If we consider the 2D case, a direction reduces to an angle θ going from 0 to π. The domain of integration being the interval [0, π], the PDF (which we know is 1/(b-a)) is therefore 1/π and is uniform across the whole domain, i.e. the distribution is constant (figure 1, bottom; the red line represents this PDF).
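Written out, the uniform pdf and the corresponding estimator for this 2D case are (using the general estimator that divides each sample by the pdf):

$$p(\theta) = \frac{1}{\pi}, \qquad \int_0^{\pi} L(\theta)\, d\theta \;\approx\; \frac{1}{N}\sum_{i=1}^{N}\frac{L(\theta_i)}{p(\theta_i)} \;=\; \frac{\pi}{N}\sum_{i=1}^{N} L(\theta_i).$$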

Let's now look at what the problem is. As you can see at the top of figure 1, we are trying to find out how much light arrives at P, a point located on the floor of a box. To make the demonstration easier, we will imagine that the walls of this box are very dark, so they contribute very little to the illumination of P. However, we have two lights on the ceiling of this room. Naturally, most of the light illuminating the floor will come from these two light sources. But, as you can see in the figure, of the six samples used in our Monte Carlo integration to approximate the light arriving at P, all but one miss these lights. Since the contribution of one of the lights is missing, we can assume that the approximation of the amount of light arriving at P using this set of samples will be underestimated. In other words, we can expect the difference between the approximation (the Monte Carlo integration) and the expected value of f(x) to be quite high (variance will be high).

When the function is pretty regular, the position of the samples has little impact on the result. When the function is very irregular, a small change in the sample position may cause a large-scale change in the result of the Monte Carlo integration. In general, irregular functions are difficult to sample and cause a high variance in the estimator (figure 2).

Figure 2: When the function is pretty regular (top left), the position of the samples has little impact on the result. When the function is very irregular, a small change in the sample position may cause a large-scale change in the result of the Monte Carlo integration…

Technically, you can see the amount of light arriving at P as a function defined over the interval [0, π]. This is like rotating the hand of a watch from one side of the disk to the other, and measuring how much light strikes P from the direction pointed to by the hand. At the bottom of figure 1, you can see what this function looks like (the yellow curve).
The yellow curve is also the shape that an ideal sampling PDF would follow.
It's a nice, continuous, smooth curve, with two distinct peaks, one for each of the two light sources. If we were to integrate this function over the entire domain on which it is defined, it is pretty clear that these two peaks would contribute more to the result of the integral than any other part of the curve. We can say that these peaks are where the function is the most important. As we said, it would make sense to create samples where the function is important, but as we have just explained, there is no guarantee that this will actually happen, particularly when the samples are uniformly distributed. In short, we would like to concentrate samples where the function matters most, rather than spreading them uniformly.

Keep in mind that Monte Carlo is a method built on randomness. Now you may think that this is not a problem: why not just create samples where the function is important (figure 3)? There are at least two problems with this approach. First, remember that Monte Carlo integration is a stochastic process. In other words, it only works because samples are chosen randomly. If you start deciding where you put the samples (rather than randomly distributing them), the theory of Monte Carlo integration falls apart (the result of your pseudo Monte Carlo integration will have no mathematical meaning). Concentrating samples where the function is important while ignoring the other parts of the curve will probably overestimate the result anyway, which is not good either. In other words, sampling only where the function is important introduces a problem of its own: a biased (overestimated) result.
But the second problem is even more critical: knowing where the function is important suggests that we know what that function is in the first place, which in most situations is information we don't have. In our example of finding how much light strikes P, we simply don't know where the light is coming from or in what amount. If we had this information, we would probably not use Monte Carlo integration in the first place. We only use this method precisely because we don't know beforehand what that function is, so we just sample it randomly, hoping that the average of these samples will give us a valid approximation of the integral.

This is a very important point in Monte Carlo integration. In most situations, we have no prior knowledge of the function (the integrand) being integrated, and this is particularly true in rendering.
With the blue samples on the left of figure 3 (uniform sampling), the result is not great, but it is still a valid Monte Carlo estimate. With the samples on the right, which are forced into fixed regions, the randomness requirement of Monte Carlo is no longer satisfied.

Figure 3: uniform distribution is okay but we may miss parts of the function which are important. In this example (on the left), one of the peaks is not sampled. Forcing the samples to be in the regions of the function which are the most important is not a good solution either. Monte Carlo integration relies on the samples to be “stochastically” distributed. Forcing them to be within specific regions of the function will introduce bias (in this example, on the right, you can see that the chosen samples will lead to overestimating the result of the integral).

Unfortunately, variance reduction techniques require some prior knowledge of the function being integrated. In some important cases (such as the two we described in this chapter, integrating the incoming radiance over the pixel area and the incoming light at P) we do not know what that function is. So what's the deal? Alas, there is no magical solution to this problem. If you do not have any prior knowledge of the function being integrated, just forget about variance reduction. But let's see what we can do when we do know something about f(x) prior to the integration (we will show an example related to rendering later in this chapter).

Let's imagine that the integrand is a constant function (figure 4). So you go with a uniform distribution, sample your constant function, get a result, and everything is great. Obviously, since the function is constant, every time you run this approximation you get the same result. In other words, variance is simply zero. The ultimate goal of Monte Carlo integration is to have the smallest possible variance with the smallest number of samples, so when the variance is 0, you can't get anything better than that. You might then think "wouldn't it be great if all functions that needed integration were constant". In reality, we know this is never the case: functions that need integration are never constant (it would be too easy), and some of them can even have very high, narrow peaks. But can we turn a non-constant function into a constant function?
Figure 4: the position of the samples doesn’t change the result of the Monte Carlo integration.
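Before answering that question, here is a quick check of the zero-variance claim (a short computation added here): plug a constant function f(x) = c into the basic estimator from the previous chapters, with uniform sampling over an interval [a, b] introduced just for the sake of the calculation:

$$\langle F^N \rangle = \frac{b-a}{N}\sum_{i=1}^{N} f(X_i) = \frac{b-a}{N}\sum_{i=1}^{N} c = c\,(b-a) = \int_a^b c\, dx.$$

Every run returns exactly the same number, regardless of where the samples X_i fall.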

While this may seem a rather strange thought, the answer to this question is obviously yes: you just need to divide the function by itself. If the function f(x) equals 2 when x = 0, then f(0)/2 = 1. If f(x) = 0.5 when x = 2, then f(2)/0.5 = 1. If you do this for every value of x, you end up with a constant function f(x)/f(x) = 1.

As shown in figure 5 (right), dividing f(x) by another function exactly proportional to f(x) also gives a constant function. If f′(x) is a copy of f(x) scaled up or down by a constant factor c, we have:
$$f'(x) = c \cdot f(x)$$
and thus:
$$\frac{f(x)}{f'(x)} = \frac{f(x)}{c \cdot f(x)} = \frac{1}{c}$$
Figure 5: if the function f(x) is divided by itself (in red, left) or by another function f'(x) which is proportional to f(x) (in red, right), we get a constant function (in blue).

Now you might think: that's great and simple, but what's the point, since the function we are interested in is f(x) and not a constant function? That's true, but have another look at the formula for the general Monte Carlo estimator:

$$\langle F^N \rangle = \frac{1}{N}\sum_{i=1}^{N} \frac{f(X_i)}{pdf(X_i)}$$

We kept the same color code (red for f(x) and yellow for f'(x)) so that you can more easily see where we are heading. In the Monte Carlo estimator, the integrand f(x) is divided by another function, pdf(x). Taking advantage of what we just learned, if pdf(x) is a function exactly proportional to f(x), then the variance of the Monte Carlo integral will be exactly 0!

Since Monte Carlo integration applied to a constant function has no variance (regardless of how the samples are distributed), we can take advantage of the general Monte Carlo estimator (in which the integrand f(x) is divided by pdf(x)) to create the same effect. Indeed, if pdf(x) (in the denominator) is a function whose shape is exactly proportional (the ideal case) or as similar as possible to the integrand, then variance will either be 0 (the ideal case) or greatly reduced:

$$pdf(x) \propto f(x) \quad\Rightarrow\quad \frac{f(X_i)}{pdf(X_i)} = \text{constant for every sample } X_i \quad\Rightarrow\quad \text{variance} = 0.$$
What this means in practice is that to reduce variance, our task is to find a pdf that matches the shape of the integrand as closely as possible. When the PDF matches the shape of the function exactly, you get a perfect estimator.
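As a short sanity check of that claim (a derivation added here, not from the original text, and assuming f(x) is non-negative as it is for light): a pdf must integrate to 1, so the only pdf exactly proportional to f(x) is f(x) divided by its own integral, and every sample then contributes the exact value of the integral:

$$pdf(x) = \frac{f(x)}{\int f(x)\,dx} \quad\Rightarrow\quad \frac{f(X_i)}{pdf(X_i)} = \int f(x)\,dx \quad \text{for every } X_i.$$

Of course, building this perfect pdf requires knowing the very integral we are trying to compute, which is why in practice we only aim for a pdf with a similar shape.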

Obviously, this is not always possible because, as we said before, we do not always know what f(x) looks like. However, when we do, variance reduction can be achieved by selecting a pdf whose shape is similar to the shape of f(x) (this is often possible in shading, as we will see in the lesson on importance sampling). This idea is illustrated in figure 6, in which the integrand is plotted in blue and the pdf in red. If we do not know anything about the function's shape, our best choice is probably to stick with a uniform distribution. Of course, it will not be as good as if the pdf and the integrand had a similar shape (figure 6, middle), but it is certainly still better than choosing a pdf whose shape is really different from the function's (as shown at the bottom of figure 6). In summary, if you don't know what f(x) looks like, or if you don't know how to build a pdf that matches the shape of f(x), just stick to a uniform distribution. But at all costs, avoid using a pdf which is very different from f(x), because instead of reducing variance, you will do the exact opposite and likely increase it a great deal. And remember, noise is what we are trying to get rid of, so let's not add any more of it!

Figure 6: for importance sampling to work, you need a PDF (in red) which is as close as possible to the function (in blue).
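To make that warning concrete, here is a small sketch (not from the original article) that estimates the same test integral used at the end of this chapter, the integral of sin(x) over [0, π/2] whose exact value is 1, but draws its samples from a pdf that is high exactly where the integrand is low, p_bad(x) = 8(π/2 − x)/π². The estimator remains unbiased, but the per-sample weight sin(x)/p_bad(x) blows up near π/2, where the pdf vanishes while the integrand does not, so the estimates fluctuate far more than with a uniform or well-matched pdf.

#include <cstdio>
#include <cstdlib>
#include <cmath>

int main()
{
    srand48(21);
    const int N = 16;
    for (int n = 0; n < 10; ++n) {
        double sum = 0;
        for (int i = 0; i < N; ++i) {
            double r = drand48();
            // inverse CDF of p_bad(x) = 8 * (pi/2 - x) / pi^2 on [0, pi/2]
            double x = (M_PI / 2) * (1 - sqrt(1 - r));
            double pdf = 8 * (M_PI / 2 - x) / (M_PI * M_PI);
            sum += sin(x) / pdf;  // f(X_i) / pdf(X_i)
        }
        printf("run %d: estimate = %f (exact value is 1)\n", n, sum / N);
    }
    return 0;
}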

How do we choose the pdf? This will be detailed in the lesson on importance sampling.

Before we look at a simple example (practical examples applied to rendering will be given in the lesson on importance sampling), let us explain where the term importance sampling comes from. In the chapter on Monte Carlo simulation, we introduced the general formula of the Monte Carlo estimator, in which the integrand is divided by the pdf. If the PDF is non-uniform, the density of samples varies across the domain of integration (more samples are created in areas where the PDF is high, and fewer where it is low). One important observation is that if the PDF has more or less the same shape as the integrand f(x), then using this PDF will generate more samples in areas where the function is also important.

By choosing a PDF whose shape is similar to that of the integrand f(x), we generate more samples where the function is likely to be important. Whenever we have prior knowledge of f(x), we will attempt to choose a pdf(x) with exactly or approximately the same shape as the integrand. Using this PDF to generate our samples (drawing the random variable X from this pdf) will place more samples where f(x) is important, thus reducing the risk of missing the important features of the function and consequently reducing variance. Hence the term importance sampling: we distribute more samples wherever the function f(x) is important, while still occasionally placing samples (only fewer) wherever the function is less important.

So in essence it is like combining the best of two worlds: we still have a stochastic process in which samples are chosen randomly, only they are drawn from a given PDF, so more samples are likely to be generated where the pdf is high. But by choosing this PDF wisely, we can reduce variance by reducing the "chance" of missing the important features of the function (and these features are important because they contribute the most to the result of the integral).

Remember from our original example that the problem was that out of 6 samples, only 1 sampled one of the two peaks. Because the contribution of one of the peaks is ignored, the error is high. With importance sampling, however, assuming we can create a PDF with a shape similar to our yellow curve, we increase our chances of seeing more samples around these two peaks, thus potentially reducing variance (the error between the approximation and the expected value of f(x)).

In the chapter on Monte Carlo integration, we proved that the expected value of the general MC estimator is indeed equal to the expected value of f(x). But remember that we also gave an intuitive explanation of the process: because sampling the random variable from an arbitrary PDF generates more samples in some areas and fewer in others, dividing f(x) by the pdf reduces the contribution (or weight) of samples generated in areas of higher density, and increases the contribution of those generated in areas of lower density.
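In equation form (a one-line check added here for completeness), the expectation of a single sample term is exactly the integral we are after, whatever pdf the samples are drawn from, as long as that pdf is non-zero wherever f(x) is non-zero:

$$E\!\left[\frac{f(X)}{pdf(X)}\right] = \int \frac{f(x)}{pdf(x)}\, pdf(x)\, dx = \int f(x)\, dx.$$

The choice of pdf therefore never changes what the estimator converges to; it only changes the variance, i.e. how quickly it converges.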

So again, by generating samples where the function is important we reduce the variance, and we have also proven that the formula is mathematically valid and leads to the correct solution (that is, as the number of samples increases, the Monte Carlo integration will eventually converge to the expected value of f(x)).

One more important note: even though more samples are generated in regions where the function is important, the process stays stochastic. The positions of the samples cannot be determined ahead of time, and although we can statistically expect more samples where the PDF is high (we can't know for sure, since the samples are drawn randomly), this does not mean that no samples at all will be generated where the PDF is low.

Again, it's really about the art of distributing our budget of samples in the most efficient way: devoting a good chunk of them to sampling the function where it is high or important (where it contributes the most to the integral), while keeping a few to sample the function everywhere else. The art of Monte Carlo integration is to find the optimal solution to this problem: where should I invest my samples to get the smallest possible error for a fixed number of samples N?

Example

Finally, let us look at a simple example. We will use Monte Carlo integration to approximate the following integral:
$$\int_0^{\pi/2} \sin(x)\, dx$$
Figure 7: our setup to test importance sampling. As you can see, the shape of the second pdf p'(x) is closer to the shape of the integrand.

The great thing about this function is that we can compute the result of the integral analytically using the second fundamental theorem of calculus (see the chapter Mathematics of Shading in the lesson Introduction to Shading & Radiometry). The antiderivative of sin(x) is -cos(x). Thus we can write:
$$\int_0^{\pi/2} \sin(x)\, dx = \Big[-\cos(x)\Big]_0^{\pi/2} = -\cos\!\left(\frac{\pi}{2}\right) + \cos(0) = 1.$$
The result of this integral is 1. Now we will try to approximate it using Monte Carlo integration. We will be using two different pdfs: a uniform probability distribution, p(x) = 2/π (note that the original article writes p(x) = 2/πx here, which is a typo), and a pdf p′(x) = 8x/π² which, as you can see in figure 7, has a shape that matches the shape of the integrand better than the uniform PDF does. The theory tells us that drawing samples from this second PDF should reduce the variance compared to drawing samples from the uniform PDF. Let's check it then! For the uniform distribution, we will be using the following estimator:
$$\langle F^N \rangle = \frac{\pi}{2N}\sum_{i=1}^{N} \sin(X_i)$$
where the X_i's are drawn from a PDF with uniform distribution. As for the second PDF, let's use the inverse sampling method to find the formula we will use to draw samples. Recall that we first need to compute the CDF and then invert it:

$$CDF(x) = \int_0^x p'(t)\, dt = \int_0^x \frac{8t}{\pi^2}\, dt = \frac{4x^2}{\pi^2}$$

Now, let’s invert the function:
$$\xi = CDF(x) = \frac{4x^2}{\pi^2} \quad\Rightarrow\quad x = CDF^{-1}(\xi) = \frac{\pi}{2}\sqrt{\xi}, \qquad \xi \text{ a uniform random number in } [0, 1).$$
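Before looking at the program, it is worth seeing why p′(x) helps here (a short calculation added for intuition, not from the original article). What the estimator actually averages is the ratio of the integrand to the pdf:

$$\frac{\sin(X_i)}{p'(X_i)} = \frac{\pi^2}{8}\cdot\frac{\sin(X_i)}{X_i},$$

and on [0, π/2] the quantity sin(x)/x only varies between 2/π ≈ 0.64 and 1, so each sample contributes a value between about π/4 ≈ 0.79 and π²/8 ≈ 1.23. With the uniform pdf, the per-sample contribution (π/2)·sin(X_i) ranges all the way from 0 to π/2 ≈ 1.57. This smaller spread of per-sample values is precisely the variance reduction. The small C++ program below puts the two estimators side by side: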

#include <cstdio>
#include <cstdlib>
#include <cmath>

int main(int argc, char **argv)
{
    srand48(13);  // drand48/srand48 are POSIX functions (available on Linux/macOS)
    int N = 16;   // number of samples per run
    for (int n = 0; n < 10; ++n)  // 10 independent runs
    {
        float sumUniform = 0, sumImportance = 0;
        for (int i = 0; i < N; ++i)
        {
            float r = drand48();
            // uniform sampling: X_i = r * pi/2, pdf = 2/pi (constant, applied after the loop)
            sumUniform += sin(r * M_PI * 0.5);
            // importance sampling: X_i = CDF^-1(r) = sqrt(r) * pi/2, pdf(X_i) = 8 X_i / pi^2
            float xi = sqrtf(r) * M_PI * 0.5; // this is our X_i
            sumImportance += sin(xi) / ((8 * xi) / (M_PI * M_PI));
        }
        sumUniform *= (M_PI * 0.5) / N;   // divide by N and by the uniform pdf 2/pi
        sumImportance *= 1.f / N;         // the pdf was already applied sample by sample
        printf("%f %f\n", sumUniform, sumImportance);
    }
    return 0;
}

A note on the code: why do we compute X_i this way? The random number r we draw is treated as the value of the cumulative distribution function, and we then solve for the corresponding X_i. In other words, we know that P(X ≤ x) = CDF(x); the random number plays the role of that probability, and we ask for which x the CDF takes this value. This is why we first integrate the probability density to get the expression of P(X ≤ x), the CDF computed above:

$$CDF(x) = \frac{4x^2}{\pi^2}$$

Then, once a random value ξ has been drawn, we invert it:

$$x = \frac{\pi}{2}\sqrt{\xi}$$

which gives us the corresponding sample X_i (the variable xi in the code). With it, we can evaluate sin(X_i) for each sample and plug everything into the importance sampling estimator:

$$\langle F^N \rangle = \frac{1}{N}\sum_{i=1}^{N} \frac{\sin(X_i)}{p'(X_i)}, \qquad p'(x) = \frac{8x}{\pi^2}.$$

If you run the program, you should get the following results (with N = 16):
(Table of results: for each of the 10 runs, the estimate obtained with uniform sampling and with importance sampling, together with their error relative to the exact value 1.)

You can see in the last two columns, where the error is expressed as a percentage (of the difference between the result of the Monte Carlo integration and the actual expected value of f(x), which we know is 1), that the error is greater on average with uniform sampling than it is with importance sampling (rightmost column). As a side note, notice how the last run (last row) gives a result very close to the correct solution for both methods. This is a good example of what we explained in the first chapter of this lesson: because we use random samples, an MC method can also "just" randomly fall on the exact value by pure chance on some occasions.

This concludes our "short" introduction to importance sampling. This example is very simple; its aim is just to show that the technique actually works. In the lesson Importance Sampling, you will find more complex examples which also apply more directly to rendering.

