Feature Engineering and Polynomial Regression
1 Feature Engineering and Polynomial Regression Overview
Out of the box, linear regression provides a means of building models of the form:
$$f_{\mathbf{w},b} = w_0x_0 + w_1x_1 + \ldots + w_{n-1}x_{n-1} + b \tag{1}$$
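As a quick reference, here is a minimal NumPy sketch of evaluating equation (1); the values of w, b, and x are made up purely for illustration:

```python
import numpy as np

w = np.array([0.5, -1.2, 3.0])   # hypothetical weights w_0 .. w_{n-1}
b = 4.0                          # hypothetical bias
x = np.array([1.0, 2.0, 3.0])    # one example with n = 3 features

# equation (1): f_{w,b}(x) = w_0*x_0 + ... + w_{n-1}*x_{n-1} + b
f_wb = np.dot(w, x) + b
print(f_wb)                      # 0.5*1 - 1.2*2 + 3.0*3 + 4.0 = 11.1
```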
What if your features/data are non-linear, or are combinations of features? For example, housing prices do not tend to be linear with living area: the market penalizes very small and very large houses, producing a curve rather than a straight line. How can we use the machinery of linear regression to fit this curve? Recall, the 'machinery' we have is the ability to modify the parameters $\mathbf{w}$, $b$ in (1) to 'fit' the equation to the training data. However, no amount of adjusting of $\mathbf{w}$, $b$ in (1) will achieve a fit to a non-linear curve.
- In short, feature engineering means creating new features that let the model fit the data better.
- Polynomial regression means fitting the model with a polynomial.
- In fact, we already have linear regression ($f = wx + b$, fitting a line) and multiple linear regression ($f = \mathbf{w} \cdot \mathbf{x} + b$ with vectors $\mathbf{w}$, $\mathbf{x}$, fitting multiple features). Feature engineering turns higher-order terms into new features, so that multiple linear regression can fit higher-degree curves; this process is called polynomial regression.
2 Polynomial Features
Above we were considering a scenario where the data was non-linear. Let's try using what we know so far to fit a non-linear curve. We'll start with a simple quadratic: $y = 1 + x^2$.
You’re familiar with all the routines we’re using. They are available in the lab_utils.py file for review. We’ll use np.c_[..], which is a NumPy routine to concatenate along the column boundary.
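For illustration, a tiny example of np.c_ (the arrays here are made up); stacking x and x**2 side by side as columns is exactly how we will build polynomial feature matrices below:

```python
import numpy as np

x = np.array([1, 2, 3])
# stack x and x**2 as columns of a feature matrix
print(np.c_[x, x**2])
# [[1 1]
#  [2 4]
#  [3 9]]
```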
First, predict with the plain linear regression model:
import numpy as np
import matplotlib.pyplot as plt
from lab_utils_multi import zscore_normalize_features, run_gradient_descent_feng
np.set_printoptions(precision=2) # reduced display precision on numpy arrays
# create target data
x = np.arange(0, 20, 1)
y = 1 + x**2
X = x.reshape(-1, 1)
model_w,model_b = run_gradient_descent_feng(X,y,iterations=1000, alpha = 1e-2)
plt.scatter(x, y, marker='x', c='r', label="Actual Value"); plt.title("no feature engineering")
plt.plot(x,X@model_w + model_b, label="Predicted Value"); plt.xlabel("X"); plt.ylabel("y"); plt.legend(); plt.show()
As you can see, the first-degree fit is poor. We need polynomial features, so we apply feature engineering and adjust the power of x.
Well, as expected, not a great fit. What is needed is something like $y = w_0x_0^2 + b$, or a polynomial feature.
To accomplish this, you can modify the input data to engineer the needed features. If you swap the original data with a version that squares the $x$ value, then you can achieve $y = w_0x_0^2 + b$. Let's try it. Swap X for X**2 below:
# create target data
x = np.arange(0, 20, 1)
y = 1 + x**2
# Engineer features
X = x**2 #<-- added engineered feature
X = X.reshape(-1, 1) #X should be a 2-D Matrix
model_w,model_b = run_gradient_descent_feng(X, y, iterations=10000, alpha = 1e-5)
plt.scatter(x, y, marker='x', c='r', label="Actual Value"); plt.title("Added x**2 feature")
plt.plot(x, np.dot(X,model_w) + model_b, label="Predicted Value"); plt.xlabel("x"); plt.ylabel("y"); plt.legend(); plt.show()
As you can see, the quadratic fit is much better.
3 Selecting Features
Above, we knew that an $x^2$ term was required. It may not always be obvious which features are required. One could add a variety of potential features to try to find the most useful. For example, what if we had instead tried: $y = w_0x_0 + w_1x_1^2 + w_2x_2^3 + b$?
Let's look at the fit with features up to the cubic term:
# create target data
x = np.arange(0, 20, 1)
y = x**2
# engineer features
X = np.c_[x, x**2, x**3] #<-- added engineered feature
model_w,model_b = run_gradient_descent_feng(X, y, iterations=10000, alpha=1e-7)
plt.scatter(x, y, marker='x', c='r', label="Actual Value"); plt.title("x, x**2, x**3 features")
plt.plot(x, X@model_w + model_b, label="Predicted Value"); plt.xlabel("x"); plt.ylabel("y"); plt.legend(); plt.show()
Note the value of $\mathbf{w}$, [0.08 0.54 0.03], and b is 0.0106. This implies the model after fitting/training is:
$$0.08x + 0.54x^2 + 0.03x^3 + 0.0106$$
Gradient descent has emphasized the data that is the best fit to the $x^2$ data by increasing the $w_1$ term relative to the others. If you were to run for a very long time, it would continue to reduce the impact of the other terms.
Gradient descent is picking the 'correct' features for us by emphasizing their associated parameters: a smaller weight value implies a less important/correct feature.
4 An Alternate View
Above, polynomial features were chosen based on how well they matched the target data. Another way to think about this is to note that we are still using linear regression once we have created new features. Given that, the best features will be linear relative to the target. This is best understood with an example.
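One minimal sketch of that check (reusing the numpy/matplotlib setup from the cells above) is to plot each engineered feature against the target; the feature whose scatter lies closest to a straight line is the most useful linear predictor:

```python
x = np.arange(0, 20, 1)
y = x**2
X = np.c_[x, x**2, x**3]
feature_names = ['x', 'x**2', 'x**3']

# one panel per engineered feature, each plotted against the target y
fig, ax = plt.subplots(1, 3, figsize=(12, 3), sharey=True)
for i in range(len(ax)):
    ax[i].scatter(X[:, i], y)
    ax[i].set_xlabel(feature_names[i])
ax[0].set_ylabel("y")
plt.show()
```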
From the above, we can see that the quadratic feature fits best; a higher degree is not necessarily better.
5 Scaling Features
As described in the last lab, if the data set has features with significantly different scales, one should apply feature scaling to speed gradient descent. In the example above, there are $x$, $x^2$ and $x^3$, which will naturally have very different scales. Let's apply Z-score normalization to our example.
# create target data
x = np.arange(0,20,1)
X = np.c_[x, x**2, x**3]
print(f"Peak to Peak range by column in Raw X:{np.ptp(X,axis=0)}")
# apply z-score normalization
X = zscore_normalize_features(X)
print(f"Peak to Peak range by column in Normalized X:{np.ptp(X,axis=0)}")
x = np.arange(0,20,1)
y = x**2
X = np.c_[x, x**2, x**3]
X = zscore_normalize_features(X)
model_w, model_b = run_gradient_descent_feng(X, y, iterations=100000, alpha=1e-1)
plt.scatter(x, y, marker='x', c='r', label="Actual Value"); plt.title("Normalized x x**2, x**3 feature")
plt.plot(x,X@model_w + model_b, label="Predicted Value"); plt.xlabel("x"); plt.ylabel("y"); plt.legend(); plt.show()
Feature scaling allows this to converge much faster.
Note again the values of $\mathbf{w}$. The $w_1$ term, which is the $x^2$ term, is the most emphasized. Gradient descent has all but eliminated the $x^3$ term.
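Since run_gradient_descent_feng returns the fitted parameters (as used above), you can inspect them directly; this tiny check assumes the variables from the previous cell are still in scope:

```python
# with z-score normalized features the weights are directly comparable;
# expect the x**2 weight, model_w[1], to dominate
print(f"model_w: {model_w}, model_b: {model_b}")
```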
6 Complex Functions
With feature engineering, even quite complex functions can be modeled:
# create target data
x = np.arange(0,20,1)
y = np.cos(x/2)
# np.c_ concatenates the columns into a feature matrix
X = np.c_[x, x**2, x**3,x**4, x**5, x**6, x**7, x**8, x**9, x**10, x**11, x**12, x**13]
# feature scaling
X = zscore_normalize_features(X)
# run gradient descent on the engineered features
model_w,model_b = run_gradient_descent_feng(X, y, iterations=1000000, alpha = 1e-1)
plt.scatter(x, y, marker='x', c='r', label="Actual Value"); plt.title("Normalized x, x**2, ..., x**13 features")
plt.plot(x,X@model_w + model_b, label="Predicted Value"); plt.xlabel("x"); plt.ylabel("y"); plt.legend(); plt.show()