Think about learning to drive a car. A car has three main controls: steering (the steering wheel decides how much you go left or right), acceleration, and braking. So these three controls, or really one control for steering and another two that control your speed, make it relatively interpretable what your different actions, through your different controls, will do to your car. But now imagine someone were to build a car with a joystick, where the joystick controls some mixture of your steering angle and your speed, like this:
In theory, by tuning these two knobs, you could get your car to the angle and the speed you want. So the concept of orthogonalization refers to this: if you think of one dimension of what you want to do as controlling the steering angle, and another dimension as controlling your speed,
then you want one knob to just affect the steering angle as much as possible, and another knob (in the case of the car, it's really acceleration and braking) to control your speed.
But if you had a control that mixes the two together, like a control that affects both your steering angle and your speed, changing them at the same time,
then it becomes much harder to set the car to the speed and angle you want. By having orthogonal controls (orthogonal means at 90 degrees to each other) that are ideally aligned with the things you actually want to control, it becomes much easier to tune the knobs you have to tune: your steering wheel angle, and your accelerator and braking, to get the car to do what you want.
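The car analogy can be sketched numerically: treat each control as a column of a matrix describing its effect on the state [steering angle, speed], and solve for the knob settings that reach a target state. The mixing coefficients in the coupled matrix are made-up numbers for illustration, not values from this lecture.

```python
import numpy as np

# Each column is one knob's effect on the state [steering_angle, speed].
# Orthogonal controls: the wheel affects only the angle, the pedal only the speed.
orthogonal = np.array([[1.0, 0.0],
                       [0.0, 1.0]])

# Coupled controls (hypothetical joystick mix): both knobs move both quantities.
# The coefficients 0.3, 2.0, -0.8, 0.9 are arbitrary illustrative values.
coupled = np.array([[0.3, 2.0],
                    [-0.8, 0.9]])

def knob_settings(effects, target_state):
    """Solve effects @ knobs = target_state for the knob settings."""
    return np.linalg.solve(effects, target_state)

target = np.array([15.0, 30.0])  # want a 15-degree angle at 30 mph

# With orthogonal controls, each knob is set directly to its target:
print(knob_settings(orthogonal, target))  # -> [15. 30.]

# With coupled controls, both knobs must be adjusted jointly, and the
# settings bear no obvious relation to the quantities you care about:
print(knob_settings(coupled, target))
```

Both systems can reach the target, but only the orthogonal one makes each knob's setting readable as "the thing I want"; that is the sense in which aligned controls are easier to tune.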
So how does this relate to machine learning? For a supervised learning system to do well, you usually need to tune the knobs of your system to make sure that four things hold true.
In other words, if it's not doing well on the training set, dev set, or test set, or doesn't perform well in the real world, you want one knob, or one specific set of knobs, that you can use to tune your algorithm so that it fits well, meaning it satisfies that criterion.
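The one-knob-per-criterion idea can be sketched as a simple lookup from the failing criterion to the knobs you would reach for. The specific remedies listed here are illustrative assumptions (common choices in practice), not knobs named in this passage.

```python
# One knob, or one set of knobs, per criterion. The remedies shown are
# illustrative assumptions, not an exhaustive or authoritative list.
KNOBS = {
    "fit training set well":      ["bigger network", "better optimizer"],
    "fit dev set well":           ["regularization", "bigger training set"],
    "fit test set well":          ["bigger dev set"],
    "perform well in real world": ["change dev set or cost function"],
}

def knobs_for(failing_criterion):
    """Return the knobs to tune for the criterion that is failing."""
    return KNOBS[failing_criterion]

print(knobs_for("fit test set well"))  # -> ['bigger dev set']
```

The point of orthogonalization is that each entry affects its own criterion as much as possible, so diagnosing which of the four things fails tells you directly which knob to turn.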