2.4 Understanding Exponentially Weighted Averages

If beta equals 0.9, you get the red line. If it is much closer to one, say 0.98, you get the green line. And if it's much smaller, maybe 0.5, you get the yellow line.


Let's look a bit more closely at that to understand how this is computing averages of the daily temperature. Here's that equation again, $v_t = \beta v_{t-1} + (1 - \beta)\theta_t$. Let's set beta equal to 0.9 and write out a few of the equations this corresponds to.

$$
\begin{aligned}
v_{100} &= 0.1\,\theta_{100} + 0.9\,v_{99} \\
&= 0.1\,\theta_{100} + 0.9\,(0.1\,\theta_{99} + 0.9\,v_{98}) \\
&= 0.1\,\theta_{100} + 0.9\,(0.1\,\theta_{99} + 0.9\,(0.1\,\theta_{98} + 0.9\,v_{97})) \\
&\;\;\vdots \\
&= 0.1\,\theta_{100} + 0.1 \cdot 0.9\,\theta_{99} + 0.1 \cdot 0.9^{2}\,\theta_{98} + 0.1 \cdot 0.9^{3}\,\theta_{97} + \cdots
\end{aligned}
$$

One way to draw this in pictures: say we have some number of days of temperature. This axis is theta and this one is t. So $\theta_{100}$ is some value, then $\theta_{99}$ is some value, then $\theta_{98}$, and so on, so this is t equals 100, 99, 98, and so on, for some number of days of temperature. Alongside it we have an exponentially decaying function: starting from 0.1, then 0.1 times 0.9, then 0.1 times 0.9 squared, and so on. The way you compute $v_{100}$ is to take the element-wise product between these two functions and sum it up. So you take $\theta_{100}$ times 0.1, plus $\theta_{99}$ times 0.1 times 0.9, which is the second term, and so on. It's really taking the daily temperatures, multiplying them by this exponentially decaying function, and summing it up. That becomes your $v_{100}$.
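The picture described above can be checked numerically. The following is a minimal sketch, assuming made-up temperatures (the `thetas` list is hypothetical, not data from the lecture). It computes $v_{100}$ once by the recursion and once as the element-wise product with the decaying weights.

```python
import math

beta = 0.9
# Hypothetical daily temperatures theta_1 .. theta_100 (illustrative values only).
thetas = [10.0 + 5.0 * math.sin(t / 10.0) for t in range(1, 101)]

# Recursive form: v_t = beta * v_{t-1} + (1 - beta) * theta_t, with v_0 = 0.
v = 0.0
for theta in thetas:
    v = beta * v + (1 - beta) * theta

# Unrolled form: weight (1 - beta) * beta**k on the temperature from k days back.
weights = [(1 - beta) * beta**k for k in range(len(thetas))]
v_unrolled = sum(w * theta for w, theta in zip(weights, reversed(thetas)))

print(v, v_unrolled)
```

The two printed values should agree up to floating-point rounding, since the second is just the first with the recursion written out.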

It turns out that, up to details that are for later, all of these coefficients add up to one, or to very close to one, up to a detail called bias correction, which we'll talk about in the next video. Because of that, this really is an exponentially weighted average.
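To see why the coefficients sum to nearly one, one can sum the geometric series. This is a sketch assuming $v_0 = 0$ and writing $t$ for the number of days seen so far:

$$
\sum_{k=0}^{t-1} (1-\beta)\,\beta^{k} = (1-\beta)\cdot\frac{1-\beta^{t}}{1-\beta} = 1 - \beta^{t}
$$

For $\beta = 0.9$ and $t = 100$, the shortfall $\beta^{t} = 0.9^{100}$ is roughly $3 \times 10^{-5}$, so the weights sum to very nearly one. For small $t$ the shortfall is not negligible.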

And finally, you might wonder how many days of temperature this is averaging over. Well, it turns out that $0.9^{10} \approx 0.35$, and this is about $1/e$, where $e$ is the base of natural logarithms. More generally, if you have $1 - \epsilon$, so in this example $\epsilon$ would be 0.1 since this was 0.9, then

$$
(1 - \epsilon)^{1/\epsilon} \approx \frac{1}{e} \approx 0.35
$$
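A short justification, offered as a sketch rather than a formal proof: take logarithms and use $\ln(1-\epsilon) \approx -\epsilon$ for small $\epsilon$.

$$
\ln\!\left[(1-\epsilon)^{1/\epsilon}\right] = \frac{\ln(1-\epsilon)}{\epsilon} \approx \frac{-\epsilon}{\epsilon} = -1
\quad\Longrightarrow\quad
(1-\epsilon)^{1/\epsilon} \approx e^{-1}
$$

The approximation tightens as $\epsilon$ shrinks. At $\epsilon = 0.1$ the left side is about 0.349, against $1/e \approx 0.368$.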

In other words, it takes about 10 days for the height of this to decay to around a third, really $1/e$, of the peak. It's because of this that when $\beta$ equals 0.9, we say this is as if you're computing an exponentially weighted average that focuses on just the last 10 days of temperature, because it's after 10 days that the weight decays to about a third of the weight of the current day.

In contrast, if beta were equal to 0.98, what power do you need to raise 0.98 to for this to become really small? It turns out that 0.98 to the power of 50 is approximately $1/e$. So the weights will be pretty big, bigger than $1/e$, for the first 50 days, and then they decay quite rapidly after that. Intuitively, and this isn't a hard and fast rule, you can think of this as averaging over about 50 days of temperature. In this example, to use the notation from above, it's as if epsilon is equal to 0.02, so one over epsilon is 50.

And this, by the way, is how we got the formula that we're averaging over $1/(1-\beta)$ or so days: here, epsilon plays the role of $1 - \beta$. It tells you, up to some constant, roughly how many days of temperature you should think of this as averaging over. But this is just a rule of thumb for how to think about it; it isn't a formal mathematical statement.
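The rule of thumb is easy to try out numerically. The snippet below is a small sketch; the helper name `effective_window` is an assumption made for illustration and does not come from the lecture.

```python
import math

def effective_window(beta):
    """Rule-of-thumb number of days averaged over: 1 / (1 - beta)."""
    return 1.0 / (1.0 - beta)

for beta in (0.5, 0.9, 0.98):
    days = effective_window(beta)
    # Weight of the observation `days` back, relative to the current day's weight.
    relative_weight = beta ** days
    print(f"beta={beta}: about {days:.0f} days, "
          f"relative weight {relative_weight:.3f} (1/e = {1 / math.e:.3f})")
```

The match with $1/e$ is closer for beta near one and looser for small beta, which fits the point that this is a rule of thumb and not an exact statement.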

Finally, let's talk about how you actually implement this. Recall that we start with $v_0$ initialized as zero, then compute $v_1$ on the first day, $v_2$, and so on. To explain the algorithm, it was useful to write down $v_0$, $v_1$, $v_2$, and so on as distinct variables.

So, to say this again in a new format: you set $v_\theta$ to zero (the $v_0$ from before), and then repeatedly, once each day, you get the next $\theta_t$, and $v_\theta$ gets updated as beta times the old value of $v_\theta$, plus one minus beta times the current value $\theta_t$.
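The update just described can be sketched in a few lines of Python. The function name and the sample temperatures are assumptions made for illustration; only the update rule itself comes from the text.

```python
def exponentially_weighted_average(thetas, beta=0.9):
    """Yield the running average after each observation, starting from v = 0."""
    v = 0.0
    for theta in thetas:
        # Overwrite the single stored value: v := beta * v + (1 - beta) * theta_t
        v = beta * v + (1 - beta) * theta
        yield v

temps = [40.0, 49.0, 45.0]  # hypothetical daily temperatures
averages = list(exponentially_weighted_average(temps, beta=0.9))
print(averages)
```

Only the latest value of `v` is kept between iterations; the list `averages` is built here just so the intermediate values can be inspected.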

One of the advantages of this exponentially weighted average formula is that it takes very little memory. You need to keep just one real number in computer memory, and you keep overwriting it with this formula based on the latest value you got. It's really for this reason, the efficiency: it takes basically one line of code, and storage for a single real number, to compute this exponentially weighted average.

It's really not the best way, not the most accurate way, to compute an average. If you were to compute a moving window, where you explicitly sum over the last 10 days or the last 50 days of temperature and divide by 10 or by 50, that usually gives you a better estimate. But the disadvantage of explicitly keeping all the temperatures around and summing the last 10 days is that it requires more memory, is more complicated to implement, and is computationally more expensive.
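For comparison, here is a minimal sketch of the explicit moving-window average mentioned above. The function name and the choice of `collections.deque` are assumptions for illustration; the point is that the window must hold up to `window` values, whereas the exponentially weighted average holds one.

```python
from collections import deque

def moving_window_average(thetas, window=10):
    """Yield the plain mean of the last `window` observations (fewer at the start)."""
    recent = deque(maxlen=window)  # stores up to `window` past values
    for theta in thetas:
        recent.append(theta)  # the oldest value is dropped automatically when full
        yield sum(recent) / len(recent)

print(list(moving_window_average([1.0, 2.0, 3.0, 4.0], window=2)))
```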

So for cases where you need to compute averages of a lot of variables, and we'll see some examples in the next few videos, this is a very efficient way to do so, from both a computation and a memory point of view, which is why it's used a lot in machine learning. Not to mention that it's just one line of code, which is maybe another advantage.
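When many variables are tracked at once, the same update is applied to each one, with one stored number per variable. The sketch below assumes a plain Python list of running averages; the name `update_averages` and the sample values are hypothetical.

```python
def update_averages(averages, values, beta=0.9):
    """Update one running average per variable in place; one float stored per variable."""
    for i, value in enumerate(values):
        averages[i] = beta * averages[i] + (1 - beta) * value
    return averages

averages = [0.0, 0.0, 0.0]  # one running average per tracked variable
for values in ([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]):  # hypothetical observations
    update_averages(averages, values)
print(averages)
```

Memory grows with the number of variables only, not with the number of observations.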

So now you know how to implement exponentially weighted averages. There's one more technical detail worth knowing about, called bias correction.
