【Machine Learning】5. Feature Engineering and Polynomial Regression

1 Feature Engineering and Polynomial Regression Overview

Out of the box, linear regression provides a means of building models of the form:
$$f_{\mathbf{w},b} = w_0x_0 + w_1x_1 + \dots + w_{n-1}x_{n-1} + b \tag{1}$$
What if your features/data are non-linear or are combinations of features? For example, housing prices do not tend to be linear with living area: the market penalizes very small and very large houses, producing a curved relationship. How can we use the machinery of linear regression to fit this curve? Recall, the 'machinery' we have is the ability to modify the parameters $\mathbf{w}$, $b$ in (1) to 'fit' the equation to the training data. However, no amount of adjusting of $\mathbf{w}$, $b$ in (1) will achieve a fit to a non-linear curve.

  • In short, feature engineering means creating new features so that the model fits the data better.
  • Polynomial regression means fitting the data with a polynomial.
  • In fact, simple linear regression ($f = wx + b$, fitting a straight line) and multiple linear regression ($f = \mathbf{w}\cdot\mathbf{x} + b$, with $\mathbf{w}$, $\mathbf{x}$ as vectors, fitting several features) use the same machinery; feature engineering turns higher-order terms into new features so that multiple linear regression can fit a higher-order curve. This process is called polynomial regression (see the sketch below).
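A minimal sketch of this idea using scikit-learn (not used in this lab; PolynomialFeatures and LinearRegression are standard scikit-learn classes shown purely for illustration): mapping $x$ to $[x, x^2, x^3]$ and then fitting an ordinary linear model is exactly polynomial regression.

import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

x = np.arange(0, 20, 1).reshape(-1, 1)
y = 1 + x.ravel()**2

X_poly = PolynomialFeatures(degree=3, include_bias=False).fit_transform(x)  # columns: x, x**2, x**3
model = LinearRegression().fit(X_poly, y)
print(model.coef_, model.intercept_)  # expect roughly [0, 1, 0] and 1, since y = 1 + x**2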

2 Polynomial Features

Above we were considering a scenario where the data was non-linear. Let’s try using what we know so far to fit a non-linear curve. We’ll start with a simple quadratic: $y = 1 + x^2$

You’re familiar with all the routines we’re using. They are available in the lab_utils_multi.py file for review. We’ll use np.c_[..], a NumPy routine that concatenates along the column boundary.
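For instance, a quick illustration of what np.c_ produces (a tiny standalone example, not from the lab):

import numpy as np
a = np.arange(3)
print(np.c_[a, a**2, a**3])  # 3x3 array: each row is [a_i, a_i**2, a_i**3]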
First, fit the data with a plain linear regression model:

import numpy as np
import matplotlib.pyplot as plt
from lab_utils_multi import zscore_normalize_features, run_gradient_descent_feng
np.set_printoptions(precision=2)  # reduced display precision on numpy arrays

# create target data
x = np.arange(0, 20, 1)
y = 1 + x**2
X = x.reshape(-1, 1)  # X must be a 2-D matrix for the regression routine

model_w, model_b = run_gradient_descent_feng(X, y, iterations=1000, alpha=1e-2)

plt.scatter(x, y, marker='x', c='r', label="Actual Value"); plt.title("no feature engineering")
plt.plot(x, X@model_w + model_b, label="Predicted Value"); plt.xlabel("x"); plt.ylabel("y"); plt.legend(); plt.show()

Plot: linear regression prediction vs. the actual values.
Clearly a first-order (linear) fit is far too poor; we need polynomial features, so we engineer features by adjusting the power of x.
Well, as expected, not a great fit. What is needed is something like $y = w_0x_0^2 + b$, or a polynomial feature.
To accomplish this, you can modify the input data to engineer the needed features. If you swap the original data with a version that squares the $x$ value, then you can achieve $y = w_0x_0^2 + b$. Let’s try it. Swap X for X**2 below:

# create target data
x = np.arange(0, 20, 1)
y = 1 + x**2

# engineer features
X = x**2              # <-- added engineered feature

X = X.reshape(-1, 1)  # X should be a 2-D matrix
model_w, model_b = run_gradient_descent_feng(X, y, iterations=10000, alpha=1e-5)

plt.scatter(x, y, marker='x', c='r', label="Actual Value"); plt.title("Added x**2 feature")
plt.plot(x, np.dot(X,model_w) + model_b, label="Predicted Value"); plt.xlabel("x"); plt.ylabel("y"); plt.legend(); plt.show()

As you can see, the quadratic fit is much better.
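As a quick check (reusing model_w and model_b from the run above), you can also print the learned parameters; since the target is exactly $y = 1 + x^2$, the weight on the $x^2$ feature should be close to 1 (the bias may not fully converge within the given iterations):

print(f"w = {model_w}, b = {model_b}")  # w should be near [1.]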

3 Selecting Features

Above, we knew that an $x^2$ term was required. It may not always be obvious which features are required. One could add a variety of potential features to try and find the most useful. For example, what if we had instead tried: $y = w_0x_0 + w_1x_1^2 + w_2x_2^3 + b$?

Now let's look at a fit with features up to the cube:

# create target data
x = np.arange(0, 20, 1)
y = x**2
# engineer features
X = np.c_[x, x**2, x**3]   # <-- added engineered features

model_w, model_b = run_gradient_descent_feng(X, y, iterations=10000, alpha=1e-7)

plt.scatter(x, y, marker='x', c='r', label="Actual Value"); plt.title("x, x**2, x**3 features")
plt.plot(x, X@model_w + model_b, label="Predicted Value"); plt.xlabel("x"); plt.ylabel("y"); plt.legend(); plt.show()

Note the value of $\mathbf{w}$, [0.08 0.54 0.03], and $b$ is 0.0106. This implies the model after fitting/training is:
$$0.08x + 0.54x^2 + 0.03x^3 + 0.0106$$
Gradient descent has emphasized the data that is the best fit to the $x^2$ data by increasing the $w_1$ term relative to the others. If you were to run for a very long time, it would continue to reduce the impact of the other terms.

Gradient descent is picking the ‘correct’ features for us by emphasizing the associated parameter: a smaller weight value implies a less important/correct feature.
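One way to see this (a quick sketch, reusing model_w from the run above) is to rank the engineered features by the magnitude of their learned weights; with unscaled features this is only a rough indication, and the comparison becomes cleaner after the scaling in section 5:

feature_names = ["x", "x**2", "x**3"]
order = np.argsort(np.abs(model_w))[::-1]  # indices sorted by |w|, largest first
for i in order:
    print(f"{feature_names[i]:6s} w = {model_w[i]:.3f}")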

4 An Alternate View

Above, polynomial features were chosen based on how well they matched the target data. Another way to think about this is to note that we are still using linear regression once we have created new features. Given that, the best features will be linear relative to the target. This is best understood with an example, sketched below.

From the above, the quadratic fit is clearly the best; a higher polynomial degree is not automatically better.
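A sketch of that alternate view (reusing the imports above): plot each engineered feature against the target. The feature that is linear with respect to the target, here $x^2$, is the one linear regression can use most directly.

x = np.arange(0, 20, 1)
y = x**2
X = np.c_[x, x**2, x**3]

fig, ax = plt.subplots(1, 3, figsize=(12, 3), sharey=True)
for i, name in enumerate(["x", "x**2", "x**3"]):
    ax[i].scatter(X[:, i], y, marker='x', c='r')
    ax[i].set_xlabel(name)
ax[0].set_ylabel("y")
fig.suptitle("feature vs. target")
plt.show()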

5 Scaling Features

As described in the last lab, if the data set has features with significantly different scales, one should apply feature scaling to speed gradient descent. In the example above, there are $x$, $x^2$ and $x^3$, which will naturally have very different scales. Let's apply Z-score normalization to our example.
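zscore_normalize_features is imported from lab_utils_multi above; as a rough sketch of what such a helper might do (the lab's actual version may differ, e.g. it may also return the per-column mean and standard deviation), the hypothetical version below normalizes each column to zero mean and unit variance:

def zscore_normalize_features_sketch(X):
    # normalize each column (feature) to zero mean and unit standard deviation
    mu = np.mean(X, axis=0)
    sigma = np.std(X, axis=0)
    return (X - mu) / sigma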

# create target data
x = np.arange(0,20,1)
X = np.c_[x, x**2, x**3]
print(f"Peak to Peak range by column in Raw        X:{np.ptp(X,axis=0)}")

# apply z-score normalization
X = zscore_normalize_features(X)
print(f"Peak to Peak range by column in Normalized X:{np.ptp(X,axis=0)}")

x = np.arange(0,20,1)
y = x**2

X = np.c_[x, x**2, x**3]
X = zscore_normalize_features(X) 

model_w, model_b = run_gradient_descent_feng(X, y, iterations=100000, alpha=1e-1)

plt.scatter(x, y, marker='x', c='r', label="Actual Value"); plt.title("Normalized x x**2, x**3 feature")
plt.plot(x,X@model_w + model_b, label="Predicted Value"); plt.xlabel("x"); plt.ylabel("y"); plt.legend(); plt.show()


Feature scaling allows this to converge much faster.
Note again the values of $\mathbf{w}$. The $w_1$ term, which is the $x^2$ term, is the most emphasized. Gradient descent has all but eliminated the $x^3$ term.

6 Complex Functions

With feature engineering, even quite complex functions can be modeled:

# create the dataset
x = np.arange(0, 20, 1)
y = np.cos(x/2)

# np.c_ concatenates the engineered features column-wise
X = np.c_[x, x**2, x**3, x**4, x**5, x**6, x**7, x**8, x**9, x**10, x**11, x**12, x**13]
# feature scaling
X = zscore_normalize_features(X)

# multiple linear regression on the engineered features
model_w, model_b = run_gradient_descent_feng(X, y, iterations=1000000, alpha=1e-1)

plt.scatter(x, y, marker='x', c='r', label="Actual Value"); plt.title("Normalized x**1 ... x**13 features")
plt.plot(x,X@model_w + model_b, label="Predicted Value"); plt.xlabel("x"); plt.ylabel("y"); plt.legend(); plt.show()

