Introduction
When I used gradient descent to fit h(x) to data generated from ‘x^2 + 2*x + 1’, I ran into a problem: the learning rate alpha had to be tiny, around 0.000001, otherwise the parameters would not converge, and training became unbearably slow. So I used Feature Scaling.
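To see the problem concretely, here is a small Python/NumPy sketch (my own illustration, not the original code; the data and learning rates are made up): with the unscaled features [1, x, x^2], a modest learning rate makes the parameters blow up, while a tiny one keeps the loss falling, just very slowly.

```python
import numpy as np

# Toy data: y = x^2 + 2x + 1, with unscaled features [1, x, x^2].
x = np.arange(1.0, 100.0)
X = np.column_stack([np.ones_like(x), x, x ** 2])
y = x ** 2 + 2 * x + 1

def descend(X, y, alpha, steps):
    """Plain batch gradient descent on half mean-squared error."""
    theta = np.zeros(X.shape[1])
    for _ in range(steps):
        grad = X.T @ (X @ theta - y) / len(y)
        theta -= alpha * grad
    return 0.5 * np.mean((X @ theta - y) ** 2)

loss0 = 0.5 * np.mean(y ** 2)           # loss at theta = 0
loss_tiny = descend(X, y, 1e-9, 100)    # tiny alpha: stable but slow
with np.errstate(over="ignore", invalid="ignore"):
    loss_big = descend(X, y, 1e-3, 100)  # modest alpha: diverges

print(loss_tiny < loss0)                              # True: loss went down
print(not np.isfinite(loss_big) or loss_big > loss0)  # True: it blew up
```

Because the x^2 column reaches nearly 10000 while the bias column stays at 1, the safe learning rates for the different parameters differ by many orders of magnitude, and the smallest one wins.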
Concept
For example, there are two features.
The first feature ranges from [1,100], and the second feature ranges from [1,10000].
So if we draw the contour map, it may look like this (the image on the left):
If we narrow each feature to roughly the range [-1,1] (a little smaller or larger is fine, e.g. from about [-1/3,1/3] up to [-3,3]), the contours become much rounder, gradient descent behaves like the image on the right, and training takes much less time.
So, for each column we find the max and min, take their difference (the column’s range), and divide every entry in that column by it.
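As a sketch of this rule in Python/NumPy (the variable names are my own, not from the original code): dividing each feature column by its range shrinks every column’s spread to exactly 1.

```python
import numpy as np

# Hypothetical data matrix: bias column, x, and x^2, as in the example below.
x = np.array([90.0, 3.0, 68.0, 43.0, 4.0])
X = np.column_stack([np.ones_like(x), x, x ** 2])

# Range (max - min) of each feature column; skip the constant bias column.
ranges = X[:, 1:].max(axis=0) - X[:, 1:].min(axis=0)
X_scaled = np.column_stack([X[:, 0], X[:, 1:] / ranges])

# After scaling, each feature column spans exactly 1.
spans = X_scaled[:, 1:].max(axis=0) - X_scaled[:, 1:].min(axis=0)
print(np.allclose(spans, 1.0))  # True
```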
For example, suppose we have a data matrix X like this:
X =
1 90 8100
1 3 9
1 68 4624
1 43 1849
1 4 16
1 88 7744
1 76 5776
1 21 441
1 12 144
1 60 3600
1 5 25
1 35 1225
1 24 576
1 5 25
1 90 8100
1 62 3844
1 6 36
1 82 6724
1 77 5929
1 15 225
1 38 1444
1 48 2304
1 46 2116
1 92 8464
1 21 441
1 45 2025
Applying feature scaling:
ranges = max(X,[],1) - min(X,[],1);
X = [ones(m,1), X(:,2) ./ ranges(2), X(:,3) ./ ranges(3)];
We get the scaled X:
X =
1.0000e+000 9.0909e-001 8.1008e-001
1.0000e+000 3.0303e-002 9.0009e-004
1.0000e+000 6.8687e-001 4.6245e-001
1.0000e+000 4.3434e-001 1.8492e-001
1.0000e+000 4.0404e-002 1.6002e-003
1.0000e+000 8.8889e-001 7.7448e-001
1.0000e+000 7.6768e-001 5.7766e-001
1.0000e+000 2.1212e-001 4.4104e-002
1.0000e+000 1.2121e-001 1.4401e-002
1.0000e+000 6.0606e-001 3.6004e-001
1.0000e+000 5.0505e-002 2.5003e-003
1.0000e+000 3.5354e-001 1.2251e-001
1.0000e+000 2.4242e-001 5.7606e-002
1.0000e+000 5.0505e-002 2.5003e-003
1.0000e+000 9.0909e-001 8.1008e-001
1.0000e+000 6.2626e-001 3.8444e-001
1.0000e+000 6.0606e-002 3.6004e-003
1.0000e+000 8.2828e-001 6.7247e-001
1.0000e+000 7.7778e-001 5.9296e-001
1.0000e+000 1.5152e-001 2.2502e-002
1.0000e+000 3.8384e-001 1.4441e-001
1.0000e+000 4.8485e-001 2.3042e-001
1.0000e+000 4.6465e-001 2.1162e-001
1.0000e+000 9.2929e-001 8.4648e-001
1.0000e+000 2.1212e-001 4.4104e-002
1.0000e+000 4.5455e-001 2.0252e-001
1.0000e+000 1.1111e-001 1.2101e-002
1.0000e+000 1.9192e-001 3.6104e-002
1.0000e+000 4.5455e-001 2.0252e-001
1.0000e+000 9.5960e-001 9.0259e-001
1.0000e+000 4.4444e-001 1.9362e-001
1.0000e+000 9.3939e-001 8.6499e-001
1.0000e+000 7.9798e-001 6.2416e-001
1.0000e+000 8.7879e-001 7.5698e-001
1.0000e+000 1.0000e+000 9.8020e-001
1.0000e+000 2.1212e-001 4.4104e-002
1.0000e+000 7.1717e-001 5.0415e-001
1.0000e+000 9.7980e-001 9.4099e-001
1.0000e+000 6.4646e-001 4.0964e-001
1.0000e+000 7.4747e-001 5.4765e-001
1.0000e+000 8.7879e-001 7.5698e-001
With scaled features, training takes only thousands of gradient-descent steps instead of millions.
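This can be checked with a rough experiment. The sketch below (Python/NumPy, with illustrative step counts and learning rates of my own choosing, not the original Octave code) runs the same number of gradient-descent steps on the raw and the range-scaled features; the scaled run ends with a much smaller loss because it tolerates a far larger learning rate.

```python
import numpy as np

x = np.arange(1.0, 100.0)
y = x ** 2 + 2 * x + 1
X_raw = np.column_stack([np.ones_like(x), x, x ** 2])

# Divide each feature column by its range, keeping the bias column as-is.
ranges = X_raw[:, 1:].max(axis=0) - X_raw[:, 1:].min(axis=0)
X_scaled = np.column_stack([X_raw[:, 0], X_raw[:, 1:] / ranges])

def final_loss(X, y, alpha, steps):
    """Half mean-squared error after `steps` batch gradient-descent updates."""
    theta = np.zeros(X.shape[1])
    for _ in range(steps):
        theta -= alpha * X.T @ (X @ theta - y) / len(y)
    return 0.5 * np.mean((X @ theta - y) ** 2)

# Raw features force a tiny alpha to avoid divergence;
# scaled features tolerate a large one.
loss_raw = final_loss(X_raw, y, 1e-9, 5000)
loss_scaled = final_loss(X_scaled, y, 0.3, 5000)
print(loss_scaled < loss_raw)  # True
```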
Extension
Besides feature scaling, there is another way to optimise gradient descent: mean normalization.
Some features may range over [0, 5000], while others range over [-200, 200].
A range centred on zero is better: [-1, 1] works much better than [0, 2].
So we shift every feature so that its values lie in a range of the form [-range, range].
Guided by this idea, we get the following expression, where mu_i is the mean of feature i:
x_i := (x_i - mu_i) / (max_i - min_i)
This will also make the training faster.
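Here is a Python/NumPy sketch of mean normalization (the data and variable names are hypothetical): subtracting each column’s mean before dividing by its range centres every feature on 0 while keeping its spread at 1.

```python
import numpy as np

# Two hypothetical features with very different, off-centre ranges.
X = np.array([[0.0, -200.0],
              [1000.0, -50.0],
              [2500.0, 20.0],
              [5000.0, 200.0]])

# Mean normalization: x' = (x - mean) / (max - min).
mu = X.mean(axis=0)
ranges = X.max(axis=0) - X.min(axis=0)
X_norm = (X - mu) / ranges

print(np.allclose(X_norm.mean(axis=0), 0.0))  # True: centred on 0
spans = X_norm.max(axis=0) - X_norm.min(axis=0)
print(np.allclose(spans, 1.0))                # True: spread of 1
```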