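This series adds my personal study notes, along with some supplementary derivations, to selected topics from the original course. If you spot any errors, corrections are welcome. After taking Andrew Ng's course, I wrote it up as text to make later lookup and review easier. Because I am continually studying English, the series is written mainly in English, and I suggest readers also rely mainly on the English, with Chinese as an aid, as groundwork for reading academic papers in the field later on. - ZJ
When reposting, please credit the author and source: ZJ, WeChat official account "SelfImprovementLab"
Zhihu: https://zhuanlan.zhihu.com/c_147249273
CSDN: http://blog.csdn.net/junjun_zhao/article/details/79140440
1.2 Orthogonalization
(Subtitle source: NetEase Cloud Classroom)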
One of the challenges with building machine learning systems is that there’s so many things you could try, so many things you could change, including, for example, so many hyperparameters you could tune. One of the things I’ve noticed about the most effective machine learning people is that they’re very clear-eyed about what to tune in order to try to achieve one effect. This is a process we call orthogonalization. Let me tell you what I mean. Here’s a picture of an old-school television, with a lot of knobs that you could tune to adjust the picture in various ways. So for these old TV sets, maybe there was one knob to adjust how tall vertically your image is and another knob to adjust how wide it is. Maybe another knob to adjust how trapezoidal it is, another knob to adjust how much to move the picture left and right, another one to adjust how much the picture’s rotated, and so on. And what TV designers had spent a lot of time doing was to build the circuitry, really often analog circuitry back then, to make sure each of the knobs had a relatively interpretable function. Such as one knob to tune this, one knob to tune this, one knob to tune this, and so on. In contrast, imagine if you had a knob that tunes 0.1 × how tall the image is, + 0.3 × how wide the image is, - 1.7 × how trapezoidal the image is, + 0.8 × the position of the image on the horizontal axis, and so on. If you tune this knob, then the height of the image, the width of the image, how trapezoidal it is, and how much it shifts all change at the same time. If you have a knob like that, it’d be almost impossible to tune the TV so that the picture gets centered in the display area. So in this context, orthogonalization refers to the fact that the TV designers had designed the knobs so that each knob kind of does only one thing. And this makes it much easier to tune the TV, so that the picture gets centered where you want it to be.
Here’s another example of orthogonalization. If you think about learning to drive a car, a car has three main controls: steering (the steering wheel decides how much you go left or right), acceleration, and braking. So there are these three controls, or really one control for steering and another two controls for your speed. That makes it relatively interpretable what your different actions through different controls will do to your car. But now imagine if someone were to build a car so that there was a joystick, where one axis of the joystick controls 0.3 × your steering angle, minus 0.8 × your speed. And you had a different control that controls 2 × the steering angle, + 0.9 × the speed of your car. In theory, by tuning these two knobs, you could get your car to steer at the angle and at the speed you want. But it’s much harder than if you had just one single control for controlling the steering angle, and a separate, distinct set of controls for controlling the speed. So the concept of orthogonalization refers to this: if you think of one dimension of what you want to do as controlling a steering angle, and another dimension as controlling your speed, then you want one knob to just affect the steering angle as much as possible, and another knob, which in the case of the car is really acceleration and braking, that controls your speed. But if you had a control that mixes the two together, like a control like this one that affects both your steering angle and your speed, something that changes both at the same time, then it becomes much harder to set the car to the speed and angle you want. Orthogonal means at 90 degrees to each other. By having orthogonal controls that are ideally aligned with the things you actually want to control, it makes it much easier to tune the knobs you have to tune: to tune the steering wheel angle, and your accelerator, your braking, to get the car to do what you want.
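To make the joystick example concrete, here is a minimal Python sketch. It assumes one reading of the lecture's description: each control's position equals the stated linear mix of steering angle and speed, so the car's state is recovered by inverting a 2×2 matrix. The names `MIXED`, `ORTHOGONAL`, and `effect_of_nudge` are hypothetical and introduced only for illustration.

```python
def invert_2x2(m):
    """Invert a 2x2 matrix given as [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("controls are degenerate: matrix is not invertible")
    return [[d / det, -b / det], [-c / det, a / det]]


def mat_vec(m, v):
    """Multiply a 2x2 matrix by a length-2 vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]


# Rows are controls, columns are (steering angle, speed).
# Assumption: control_position = row . (angle, speed), as in the lecture's example.
MIXED = [[0.3, -0.8],
         [2.0, 0.9]]
ORTHOGONAL = [[1.0, 0.0],
              [0.0, 1.0]]


def effect_of_nudge(controls, control_index, delta=1.0):
    """Return the change in (angle, speed) when one control moves by `delta`
    while the other control is held fixed."""
    nudge = [0.0, 0.0]
    nudge[control_index] = delta
    return mat_vec(invert_2x2(controls), nudge)


if __name__ == "__main__":
    print("orthogonal:", effect_of_nudge(ORTHOGONAL, 0))  # only the angle moves
    print("mixed:     ", effect_of_nudge(MIXED, 0))       # angle and speed both move
```

With the orthogonal controls, moving the first control changes only the steering angle. With the mixed controls, the same move changes both angle and speed, so reaching a target state means adjusting both controls together.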
So how does this relate to machine learning? For a supervised learning system to do well, you usually need to tune the knobs of your system to make sure that four things hold true. First, you usually have to make sure that you’re at least doing well on the training set. So performance on the training set needs to pass some acceptability assessment. For some applications, this might mean doing comparably to human-level performance. But this will depend on your application, and we’ll talk more about comparing to human-level performance next week. But after doing well on the training set, you then hope that this leads to also doing well on the dev set. And you then hope that this also does well on the test set. And finally, you hope that doing well on the test set on the cost function results in your system performing well in the real world. So you hope that this results in happy cat picture app users, for example. So to relate back to the TV tuning example, if the picture of your TV was either too wide or too narrow, you wanted one knob to tune in order to adjust that. You don’t want to have to carefully adjust five different knobs, which also affect different things. You want one knob to just affect the width of your TV image.
So in a similar way, if your algorithm is not fitting the training set well on the cost function, you want one knob (yes, that’s my attempt to draw a knob), or maybe one specific set of knobs that you can use, to make sure you can tune your algorithm to make it fit well on the training set. So the knobs you use to tune this are: you might train a bigger network, or you might switch to a better optimization algorithm, like the Adam optimization algorithm, and so on, along with some other options we’ll discuss later this week and next week. In contrast, if you find that the algorithm is not fitting the dev set well, then there’s a separate set of knobs. Yes, that’s my not very artistic rendering of another knob; you want to have a distinct set of knobs to try. So for example, if your algorithm is not doing well on the dev set, it’s doing well on the training set but not on the dev set, then you have a set of knobs around regularization that you can use to try to make it satisfy the second criterion. So the analogy is, now that you’ve tuned the width of your TV set, if the height of the image isn’t quite right, then you want a different knob in order to tune the height of the TV image. And you want to do this hopefully without affecting the width of your TV image too much. And getting a bigger training set would be another knob you could use, one that helps your learning algorithm generalize better to the dev set. Now, having adjusted the width and height of your TV image, well, what if it doesn’t meet the third criterion? What if you do well on the dev set but not on the test set? If that happens, then the knob you tune is, you probably want to get a bigger dev set. Because if it does well on the dev set but not the test set, it probably means you’ve overtuned to your dev set, and you need to go back and find a bigger dev set. And finally, if it does well on the test set, but it isn’t delivering to you a happy cat picture app user, then what that means is that you want to go back and change either the dev set or the cost function. Because if doing well on the test set according to some cost function doesn’t correspond to your algorithm doing what you need it to do in the real world, then it means that either your dev/test set distribution isn’t set correctly, or your cost function isn’t measuring the right thing.
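The chain of four criteria and their separate knobs can be sketched as a small diagnostic function. This is only an illustration under stated assumptions: the function name `diagnose`, the `target_error` and `max_gap` thresholds, and the idea of comparing error rates with fixed tolerances are not from the lecture, which names the criteria and the knobs but no numeric rule.

```python
def diagnose(train_error, dev_error, test_error, real_world_ok,
             target_error=0.05, max_gap=0.02):
    """Return (failing_stage, suggested_knobs) for the first criterion that fails.

    Assumptions (hypothetical, for illustration only):
      - errors are fractions in [0, 1];
      - `target_error` is the acceptable training error for the application;
      - `max_gap` is the largest tolerated gap between consecutive stages.
    """
    # 1. Fit the training set well on the cost function.
    if train_error > target_error:
        return ("training set",
                ["train a bigger network",
                 "switch to a better optimization algorithm such as Adam"])
    # 2. Generalize from the training set to the dev set.
    if dev_error - train_error > max_gap:
        return ("dev set",
                ["add regularization", "get a bigger training set"])
    # 3. Generalize from the dev set to the test set.
    if test_error - dev_error > max_gap:
        return ("test set", ["get a bigger dev set"])
    # 4. Perform well in the real world.
    if not real_world_ok:
        return ("real world",
                ["change the dev/test set", "change the cost function"])
    return (None, [])


if __name__ == "__main__":
    stage, knobs = diagnose(0.03, 0.10, 0.11, real_world_ok=True)
    print(stage, knobs)
```

The checks run in order, so each failure points at one distinct set of knobs. That ordering is the orthogonalization idea expressed as control flow.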
I know I’m going over these examples quite quickly, but we’ll go into much more detail on these specific knobs later this week and next week. So if you aren’t following all the details right now, don’t worry about it. But I want to give you a sense of this orthogonalization process: you want to be very clear about which of these maybe four issues the different things you could tune are trying to address. And when I train a neural network, I tend not to use early stopping. It’s not a bad technique, and quite a lot of people do it. But I personally find early stopping difficult to think about. Because this is a knob that simultaneously affects how well you fit the training set, because if you stop early, you fit the training set less well. It also simultaneously is often done to improve your dev set performance. So this is one knob that is less orthogonalized, because it simultaneously affects two things. It’s like a knob that simultaneously affects both the width and the height of your TV image. And it doesn’t mean that it’s bad to use; you can use it if you want. But when you have more orthogonalized controls, such as these other ones that I’m writing down here, then it just makes the process of tuning your network much easier.
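The point about early stopping touching two criteria at once can be seen in a small sketch. The error curves below are made-up numbers chosen only to show the typical shape (training error keeps falling, dev error falls and then rises). The function `early_stopping_epoch` and its `patience` parameter are a common formulation assumed here, not something specified in the lecture.

```python
def early_stopping_epoch(dev_errors, patience=2):
    """Return the epoch with the best dev error, giving up once the dev error
    has failed to improve for `patience` consecutive epochs."""
    best_epoch = 0
    best_error = dev_errors[0]
    for epoch, error in enumerate(dev_errors):
        if error < best_error:
            best_error, best_epoch = error, epoch
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` epochs: stop training
    return best_epoch


# Hypothetical per-epoch error curves, for illustration only.
TRAIN_ERRORS = [0.30, 0.20, 0.12, 0.07, 0.04, 0.02, 0.01]
DEV_ERRORS = [0.32, 0.24, 0.18, 0.16, 0.17, 0.19, 0.22]

if __name__ == "__main__":
    stop = early_stopping_epoch(DEV_ERRORS)
    print("stop at epoch", stop)
    # Stopping early improves the dev error relative to training to the end...
    print("dev:  ", DEV_ERRORS[stop], "vs", DEV_ERRORS[-1])
    # ...but it also leaves the training set fit less well.
    print("train:", TRAIN_ERRORS[stop], "vs", TRAIN_ERRORS[-1])
```

On these curves the single stopping-epoch knob moves both the training error and the dev error, which is why it is described as less orthogonalized than, say, a regularization knob applied to a fully trained model.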
So I hope that gives you a sense of what orthogonalization means. Just like when you look at the TV image, it’s nice if you can say, my TV image is too wide, so I’m going to tune this knob, or it’s too tall, so I’m going to tune that knob, or it’s too trapezoidal, so I’m going to have to tune that knob. In machine learning, it’s nice if you can look at your system and say, this piece of it is wrong. It does not do well on the training set, it does not do well on the dev set, it does not do well on the test set, or it’s doing well on the test set but just not in the real world. You figure out exactly what’s wrong, and then have exactly one knob, or a specific set of knobs, that helps to solve just that problem that is limiting the performance of the machine learning system. So what we’re going to do this week and next week is go through how to diagnose what exactly is the bottleneck to your system’s performance, as well as identify the specific set of knobs you could use to tune your system to improve that aspect of its performance. So let’s start going more into the details of this process.
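Key takeaways:
Structuring Machine Learning Projects: Machine Learning Strategy (1)
1. Orthogonalization
Orthogonalization means that, throughout the process of building a machine learning model, we make adjustments according to the problem that each stage reveals, so that it is easier to tell which stage has gone wrong and to apply the corresponding fix.
Orthogonalization, or orthogonality, is a system design property that ensures that modifying one instruction or component of an algorithm does not create or propagate side effects in other components of the system. Being able to verify components independently of one another makes the algorithm simpler and reduces testing and development time.
In a supervised learning model, the following 4 assumptions need to hold and to be mutually orthogonal:
- The system performs well on the training set
  - Otherwise, use a bigger neural network or a better optimization algorithm
- The system performs well on the dev set
  - Otherwise, use regularization or a bigger training set
- The system performs well on the test set
  - Otherwise, use a bigger dev set
- The system performs well in the real-world environment
  - Otherwise, change the dev/test set or change the cost function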
References:
[1] 大树先生. Andrew Ng's Coursera Deep Learning course (DeepLearning.ai), condensed notes (3-1): Machine Learning Strategy (1)