[Machine Learning][Linear Regression] Feature Scaling

Introduction

When I used gradient descent to fit a hypothesis h(x) close to x^2 + 2*x + 1, I found that the learning rate alpha had to be extremely small (around 0.000001), otherwise the parameters would not converge. That made training unbearably slow, so I turned to feature scaling.

Concept

For example, suppose there are two features.
The first feature ranges over [1, 100], while the second ranges over [1, 10000].
If we draw the contour map of the cost function, it may look like the image on the left: long, narrow ellipses.

So, if we scale each feature to roughly the range [-1, 1] (a little smaller or larger is also fine, e.g. at most [-3, 3] or at least [-1/3, 1/3]), the contours become much rounder, as in the image on the right, and gradient descent takes far less time.

So, for each column we find the max and min, take their difference as the range, and divide every entry in that column by this range.
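As a sketch, the same idea in Octave for an arbitrary number of feature columns (assuming the first column of X is the bias column of ones, so it is left untouched, and relying on Octave's automatic broadcasting):

range = max(X(:,2:end), [], 1) - min(X(:,2:end), [], 1);   % per-feature range (max - min)
X(:,2:end) = X(:,2:end) ./ range;                           % divide each feature by its range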

For example, suppose we have a data matrix X like this:

X =

1 90 8100
1 3 9
1 68 4624
1 43 1849
1 4 16
1 88 7744
1 76 5776
1 21 441
1 12 144
1 60 3600
1 5 25
1 35 1225
1 24 576
1 5 25
1 90 8100
1 62 3844
1 6 36
1 82 6724
1 77 5929
1 15 225
1 38 1444
1 48 2304
1 46 2116
1 92 8464
1 21 441
1 45 2025

Applying the feature scaling:

range = max(X, [], 1) - min(X, [], 1);                      % per-column range (max - min)
X = [ones(m, 1), X(:,2) ./ range(2), X(:,3) ./ range(3)];   % scale each feature column by its range

We get the scaled X:

X =

1.0000e+000 9.0909e-001 8.1008e-001
1.0000e+000 3.0303e-002 9.0009e-004
1.0000e+000 6.8687e-001 4.6245e-001
1.0000e+000 4.3434e-001 1.8492e-001
1.0000e+000 4.0404e-002 1.6002e-003
1.0000e+000 8.8889e-001 7.7448e-001
1.0000e+000 7.6768e-001 5.7766e-001
1.0000e+000 2.1212e-001 4.4104e-002
1.0000e+000 1.2121e-001 1.4401e-002
1.0000e+000 6.0606e-001 3.6004e-001
1.0000e+000 5.0505e-002 2.5003e-003
1.0000e+000 3.5354e-001 1.2251e-001
1.0000e+000 2.4242e-001 5.7606e-002
1.0000e+000 5.0505e-002 2.5003e-003
1.0000e+000 9.0909e-001 8.1008e-001
1.0000e+000 6.2626e-001 3.8444e-001
1.0000e+000 6.0606e-002 3.6004e-003
1.0000e+000 8.2828e-001 6.7247e-001
1.0000e+000 7.7778e-001 5.9296e-001
1.0000e+000 1.5152e-001 2.2502e-002
1.0000e+000 3.8384e-001 1.4441e-001
1.0000e+000 4.8485e-001 2.3042e-001
1.0000e+000 4.6465e-001 2.1162e-001
1.0000e+000 9.2929e-001 8.4648e-001
1.0000e+000 2.1212e-001 4.4104e-002
1.0000e+000 4.5455e-001 2.0252e-001
1.0000e+000 1.1111e-001 1.2101e-002
1.0000e+000 1.9192e-001 3.6104e-002
1.0000e+000 4.5455e-001 2.0252e-001
1.0000e+000 9.5960e-001 9.0259e-001
1.0000e+000 4.4444e-001 1.9362e-001
1.0000e+000 9.3939e-001 8.6499e-001
1.0000e+000 7.9798e-001 6.2416e-001
1.0000e+000 8.7879e-001 7.5698e-001
1.0000e+000 1.0000e+000 9.8020e-001
1.0000e+000 2.1212e-001 4.4104e-002
1.0000e+000 7.1717e-001 5.0415e-001
1.0000e+000 9.7980e-001 9.4099e-001
1.0000e+000 6.4646e-001 4.0964e-001
1.0000e+000 7.4747e-001 5.4765e-001
1.0000e+000 8.7879e-001 7.5698e-001

Now the training takes only thousands of gradient-descent steps, compared with the millions of steps needed before scaling.
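To make the step count concrete, here is a minimal batch gradient descent sketch in Octave (the names y, alpha and num_iters are assumptions, not from the original code); on scaled features a much larger learning rate can be used:

theta = zeros(size(X, 2), 1);        % initialise the parameters to zero
alpha = 0.1;                         % a much larger learning rate is usable on scaled features
num_iters = 2000;                    % hypothetical iteration budget
for iter = 1:num_iters
    theta = theta - (alpha / m) * (X' * (X * theta - y));   % vectorized batch update
end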

Extension

Besides feature scaling, we can use another technique to speed up gradient descent: mean normalization.
Some features may range over [0, 5000], while others may range over [-200, 200].
A range like [-1, 1] is better than [0, 2] because it is centred on zero, so we also shift every feature so that it is roughly centred around zero.
Guided by this idea, we get the following expression:

X := (X − mean(X)) / std(X)

This will also make the training faster.
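A minimal Octave sketch of this mean normalization (assuming, as before, that the first column of X is the bias column and should not be normalized):

mu    = mean(X(:,2:end));                    % per-feature mean
sigma = std(X(:,2:end));                     % per-feature standard deviation
X(:,2:end) = (X(:,2:end) - mu) ./ sigma;     % centre on zero, then divide by the spread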
