Using sklearn.preprocessing.PolynomialFeatures

yangzhenzhen

1. The sklearn.preprocessing.PolynomialFeatures class
The official documentation is at http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.PolynomialFeatures.html.
The first thing to know is that it is a class. Its full signature is: class sklearn.preprocessing.PolynomialFeatures(degree=2, interaction_only=False, include_bias=True)
The official docstring reads:
Generate polynomial and interaction features.
Generate a new feature matrix consisting of all polynomial combinations of the features with degree less than or equal to the specified degree. For example, if an input sample is two dimensional and of the form [a, b], the degree-2 polynomial features are [1, a, b, a^2, ab, b^2].

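My understanding: the class generates polynomial features, including the interaction terms between input features. For example, if an input sample is 2-dimensional, of the form [a, b], then its degree-2 polynomial features are [1, a, b, a^2, ab, b^2].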
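To make the [a, b] example concrete, here is a minimal pure-Python sketch that enumerates the monomials by name. The helper monomial_names is hypothetical (it is not part of scikit-learn); it only assumes that the features are listed from lowest to highest total degree, as in the docstring above.

```python
from itertools import combinations_with_replacement


def monomial_names(names, degree):
    """List monomials of total degree 0..degree, lowest degree first."""
    terms = []
    for d in range(degree + 1):
        for combo in combinations_with_replacement(names, d):
            if not combo:
                terms.append("1")  # the empty product is the bias term
                continue
            # Collapse repeats such as ('a', 'a') into 'a^2'.
            parts = []
            for name in dict.fromkeys(combo):
                power = combo.count(name)
                parts.append(name if power == 1 else f"{name}^{power}")
            terms.append(" ".join(parts))
    return terms


print(monomial_names(["a", "b"], 2))
# ['1', 'a', 'b', 'a^2', 'a b', 'b^2']
```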

Parameters (the signature above has only three):
degree : integer
The degree of the polynomial features. Default = 2.

interaction_only : boolean, default = False
If true, only interaction features are produced: features that are products of at most degree distinct input features (so not x[1] ** 2, x[0] * x[2] ** 3, etc.).
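In other words, when this is True (the default is False), only interaction terms are kept, i.e. products of distinct input features; pure powers such as a^2 are dropped.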

include_bias : boolean
If True (default), then include a bias column, the feature in which all polynomial powers are zero (i.e. a column of ones - acts as an intercept term in a linear model).
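The effect of the three parameters is easiest to see from the width of the output. The sketch below assumes scikit-learn and NumPy are installed and uses a small illustrative input with two features.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.arange(6).reshape(3, 2)  # 3 samples, 2 features (call them a and b)

full = PolynomialFeatures(degree=2).fit_transform(X)
inter = PolynomialFeatures(degree=2, interaction_only=True).fit_transform(X)
no_bias = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X)

print(full.shape)     # (3, 6): 1, a, b, a^2, ab, b^2
print(inter.shape)    # (3, 4): 1, a, b, ab
print(no_bias.shape)  # (3, 5): a, b, a^2, ab, b^2
```

Dropping the bias column is a common choice when the downstream linear model already fits its own intercept.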

Attributes:
powers_ : array, shape (n_output_features, n_input_features)
powers_[i, j] is the exponent of the jth input in the ith output.
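A short check of powers_ for two input features and degree 2, assuming scikit-learn is installed. Each row describes one output feature and each column one input feature; the fitted data here is a dummy array, since only the number of columns matters.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Only the number of input features matters for powers_, so fit on dummy data.
poly = PolynomialFeatures(degree=2).fit(np.zeros((1, 2)))

print(poly.powers_.shape)  # (6, 2): 6 output features, 2 input features
print(poly.powers_)
# Rows correspond to 1, a, b, a^2, ab, b^2:
# [[0 0]
#  [1 0]
#  [0 1]
#  [2 0]
#  [1 1]
#  [0 2]]
```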

n_input_features_ : int
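The total number of input features. (Newer scikit-learn releases expose this as n_features_in_; n_input_features_ was deprecated in version 1.0 and has since been removed.)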

n_output_features_ : int
The total number of polynomial output features. The number of output features is computed by iterating over all suitably sized combinations of input features.

note：Be aware that the number of features in the output array scales polynomially in the number of features of the input array, and exponentially in the degree. High degrees can cause overfitting.
Note: keep the degree modest. A high degree produces a very large number of features and tends to overfit.
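The growth in the number of output features can be computed in closed form. The sketch below uses only the standard library; n_poly_features is a hypothetical helper, and the formulas are the standard counts of monomials (with repetition) and of products of distinct features.

```python
from math import comb


def n_poly_features(n_features, degree, interaction_only=False, include_bias=True):
    """Count the polynomial output features for a given input width and degree."""
    if interaction_only:
        # Products of k distinct features, for k = 1..degree.
        total = sum(comb(n_features, k)
                    for k in range(1, min(degree, n_features) + 1))
    else:
        # Monomials of total degree 1..degree in n_features variables.
        total = comb(n_features + degree, degree) - 1
    return total + (1 if include_bias else 0)


print(n_poly_features(2, 2))                         # 6
print(n_poly_features(2, 2, interaction_only=True))  # 4
print(n_poly_features(10, 3))                        # 286
print(n_poly_features(100, 3))                       # 176851
```

Even at degree 3, an input with 100 features expands to well over a hundred thousand columns, which is why the degree is usually kept small.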

Methods:
1. fit(X, y=None)
Compute number of output features.
2. fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
Parameters:
X : numpy array of shape [n_samples, n_features]
Training set.
y : numpy array of shape [n_samples]
Target values.
Returns:
X_new : numpy array of shape [n_samples, n_features_new]
Transformed array.

Input: the feature matrix X.
Returns: the transformed (expanded) feature matrix.
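The relationship between fit, transform and fit_transform can be sketched as follows, assuming scikit-learn is installed. X_train and X_new are illustrative arrays, not data from the original post.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X_train = np.array([[0, 1], [2, 3], [4, 5]])
X_new = np.array([[1, 2]])

# fit() only records the input width; transform() does the expansion.
poly = PolynomialFeatures(degree=2)
poly.fit(X_train)
A = poly.transform(X_train)

# fit_transform() performs both steps in one call.
B = PolynomialFeatures(degree=2).fit_transform(X_train)
print(np.array_equal(A, B))  # True

# A fitted transformer can expand new samples with the same number of features.
print(poly.transform(X_new))  # the row [1, 2] maps to 1, 1, 2, 1, 2, 4
```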

get_params([deep]) Get parameters for this estimator.
set_params(**params) Set the parameters of this estimator.
transform(X[, y]) Transform data to polynomial features

Example:
>>> import numpy as np
>>> from sklearn.preprocessing import PolynomialFeatures
>>> X = np.arange(6).reshape(3, 2)
>>> X
array([[0, 1],
       [2, 3],
       [4, 5]])
>>> poly = PolynomialFeatures(2)  # degree 2, other parameters left at their defaults
>>> poly.fit_transform(X)
array([[ 1,  0,  1,  0,  0,  1],
       [ 1,  2,  3,  4,  6,  9],
       [ 1,  4,  5, 16, 20, 25]])
>>> poly = PolynomialFeatures(interaction_only=True)  # degree defaults to 2; keep interaction terms only
>>> poly.fit_transform(X)
array([[ 1,  0,  1,  0],
       [ 1,  2,  3,  6],
       [ 1,  4,  5, 20]])

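Note: in the array above, each row is one sample. For example, [0, 1] plays the role of [a, b] above, so its polynomial output is [1, a, b, a^2, ab, b^2], which gives the corresponding row [1, 0, 1, 0, 0, 1].
Now set interaction_only=True. Only the interaction terms are then generated: for [a, b] the output is [1, a, b, ab]. Terms in which a feature interacts with itself, such as a^2 or a*b^2, do not appear.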
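The rows in the example can be reproduced by hand. The following is a pure-Python sketch, not scikit-learn's implementation: expand_row is a hypothetical helper that multiplies the selected entries of a row, choosing index combinations with repetition for the full expansion and without repetition for interaction_only.

```python
from itertools import combinations, combinations_with_replacement
from math import prod


def expand_row(row, degree=2, interaction_only=False):
    """Expand one sample into its polynomial features, lowest degree first."""
    pick = combinations if interaction_only else combinations_with_replacement
    out = []
    for d in range(degree + 1):
        for idx in pick(range(len(row)), d):
            out.append(prod(row[i] for i in idx))  # the empty product is 1
    return out


X = [[0, 1], [2, 3], [4, 5]]
for row in X:
    print(expand_row(row), expand_row(row, interaction_only=True))
# [1, 0, 1, 0, 0, 1] [1, 0, 1, 0]
# [1, 2, 3, 4, 6, 9] [1, 2, 3, 6]
# [1, 4, 5, 16, 20, 25] [1, 4, 5, 20]
```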

The official site also provides a very detailed example called Polynomial interpolation: http://scikit-learn.org/stable/auto_examples/linear_model/plot_polynomial_interpolation.html#example-linear-model-plot-polynomial-interpolation-py

Polynomial interpolation

This example demonstrates how to approximate a function with a polynomial of degree n_degree by using ridge regression. Concretely, from n_samples 1d points, it suffices to build the Vandermonde matrix, which has shape n_samples x (n_degree + 1) and the following form:
[[1, x_1, x_1 ** 2, x_1 ** 3, ...],
[1, x_2, x_2 ** 2, x_2 ** 3, ...], ...]
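The Vandermonde matrix shown above can be built directly with NumPy. This is a small sketch with illustrative values for x and n_degree; np.vander with increasing=True orders the columns from power 0 upward, matching the layout above.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])  # n_samples = 3 one-dimensional points
n_degree = 3

# One row per point, one column per power 0..n_degree.
V = np.vander(x, n_degree + 1, increasing=True)
print(V.shape)  # (3, 4)
print(V)
# [[ 1.  1.  1.  1.]
#  [ 1.  2.  4.  8.]
#  [ 1.  3.  9. 27.]]
```

For a single input feature, this is the same set of columns that PolynomialFeatures(n_degree) would generate.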

Intuitively, this matrix can be interpreted as a matrix of pseudo features (the points raised to some power). The matrix is akin to (but different from) the matrix induced by a polynomial kernel.
This example shows that you can do non-linear regression with a linear model, using a pipeline to add non-linear features. Kernel methods extend this idea and can induce very high (even infinite) dimensional feature spaces.
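The idea of fitting a non-linear function with a linear model can be sketched with NumPy alone. Note the assumptions: this uses ordinary least squares (np.linalg.lstsq) in place of ridge regression, and the target is an illustrative quadratic chosen so that the recovered coefficients are easy to check.

```python
import numpy as np

x = np.linspace(-1.0, 1.0, 9)
y = 1.0 + 2.0 * x + 3.0 * x ** 2  # a non-linear function of x

# Pseudo features 1, x, x^2: the model is linear in these columns.
A = np.vander(x, 3, increasing=True)

# Ordinary least squares on the expanded features (no regularization).
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
print(np.round(coef, 6))  # approximately [1. 2. 3.]
```

The model stays linear in its coefficients; all the non-linearity lives in the feature columns, which is what the pipeline in the official example automates.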

The code is as follows (print statements updated to Python 3 syntax):
# -*- coding: utf-8 -*-

# Author: Mathieu Blondel
#         Jake Vanderplas
# License: BSD 3 clause

import numpy as np
import matplotlib.pyplot as plt

from sklearn.linear_model import Ridge
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline


def f(x):
    """Function to approximate by polynomial interpolation."""
    return x * np.sin(x)


# generate points used to plot
x_plot = np.linspace(0, 10, 100)  # 100 evenly spaced values in [0, 10]
print('x_plot', x_plot, x_plot.shape)

# generate points and keep a subset of them
x = np.linspace(0, 10, 100)
rng = np.random.RandomState(0)  # seeded pseudo-random number generator
rng.shuffle(x)  # shuffles the array in place
x = np.sort(x[:20])  # keep the first 20 shuffled values as training points
print('x', x, type(x), x.shape)
y = f(x)  # y = x * np.sin(x)
print('y', y, type(y), y.shape)

# create matrix versions of these arrays
X = x[:, np.newaxis]
print('X', X, type(X), X.shape)
X_plot = x_plot[:, np.newaxis]  # turn the 1-D array into a column matrix
print('X_plot', X_plot, type(X_plot), X_plot.shape)

plt.plot(x_plot, f(x_plot), label="ground truth")
plt.scatter(x, y, label="training points")  # scatter plot of the training points

for degree in [3, 4, 5]:
    # pipeline: polynomial feature expansion followed by ridge regression
    model = make_pipeline(PolynomialFeatures(degree), Ridge())
    model.fit(X, y)  # train the model
    y_plot = model.predict(X_plot)  # predict on the plotting grid
    plt.plot(x_plot, y_plot, label="degree %d" % degree)

plt.legend(loc='lower left')  # draw the legend

plt.show()

The resulting figure: polyfeature.png
