Using the sklearn.preprocessing.PolynomialFeatures class

panghaomingme

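Copyright notice: This is an original article by the author, released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.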

Article link: https://blog.csdn.net/panghaomingme/article/details/56677028

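The code in my earlier posts repeatedly uses two classes, sklearn.pipeline.Pipeline and sklearn.preprocessing.PolynomialFeatures. When I looked for material on them, I found very few articles or blog posts apart from the official English documentation, which is in fact very well written. Since my English is limited, I decided to write some notes of my own on these two classes.
1. The sklearn.preprocessing.PolynomialFeatures class
The official documentation is at http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.PolynomialFeatures.html.
The first thing to know is that it is a class. Its full signature is: class sklearn.preprocessing.PolynomialFeatures(degree=2, interaction_only=False, include_bias=True)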

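The official description reads:
Generate polynomial and interaction features.
Generate a new feature matrix consisting of all polynomial combinations of the features with degree less than or equal to the specified degree. For example, if an input sample is two dimensional and of the form [a, b], the degree-2 polynomial features are [1, a, b, a^2, ab, b^2].

My understanding: the class generates polynomial terms from the input features, including the cross terms (interactions) between different features. For example, if an input sample is 2-dimensional, of the form [a, b], its degree-2 polynomial features are [1, a, b, a^2, ab, b^2].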
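To make the [a, b] example concrete, here is a minimal pure-Python sketch of the expansion. The helper name poly_expand is hypothetical and is not part of scikit-learn; it only mimics the ordering described above by enumerating every combination (with repetition) of the inputs up to the given degree.

```python
from itertools import combinations_with_replacement
from math import prod


def poly_expand(sample, degree=2):
    """Return all monomials of the inputs with total degree <= `degree`.

    Hypothetical helper for illustration only, not the scikit-learn code.
    """
    features = []
    for d in range(degree + 1):
        # Each multiset of d inputs yields one monomial.
        # d = 0 yields the empty product, i.e. the constant term 1.
        for combo in combinations_with_replacement(sample, d):
            features.append(prod(combo))
    return features


# With a = 2 and b = 3: [1, a, b, a^2, ab, b^2]
print(poly_expand([2, 3]))  # [1, 2, 3, 4, 6, 9]
```

The result lines up term by term with [1, a, b, a^2, ab, b^2].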

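Parameters (the signature above has only three):
degree : integer
The degree of the polynomial features. Default = 2.
This is the maximum degree of the generated polynomial terms; it is 2 unless you specify otherwise.

interaction_only : boolean, default = False
If true, only interaction features are produced: features that are products of at most degree distinct input features (so not x[1] ** 2, x[0] * x[2] ** 3, etc.).
If set to True (the default is False), only interaction terms are generated, that is, products of distinct features; powers of a single feature such as a^2 are left out.

include_bias : boolean
If True (default), then include a bias column, the feature in which all polynomial powers are zero (i.e. a column of ones - acts as an intercept term in a linear model).
Whether to include the bias column, i.e. the leading column of ones.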
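The effect of the three parameters is easiest to see from the shape of the output. The following is a small sketch, assuming NumPy and scikit-learn are installed; the variable names are only for illustration.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.arange(6).reshape(3, 2)  # 3 samples, 2 features

full = PolynomialFeatures(degree=2).fit_transform(X)
inter = PolynomialFeatures(degree=2, interaction_only=True).fit_transform(X)
no_bias = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X)
cubic = PolynomialFeatures(degree=3).fit_transform(X)

print(full.shape)     # (3, 6)  -> 1, a, b, a^2, ab, b^2
print(inter.shape)    # (3, 4)  -> 1, a, b, ab
print(no_bias.shape)  # (3, 5)  -> same as `full` without the column of ones
print(cubic.shape)    # (3, 10) -> all terms up to degree 3
```

Dropping the bias column is the usual choice when the downstream linear model already fits its own intercept.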

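Attributes:
powers_ : array, shape (n_output_features, n_input_features)
powers_[i, j] is the exponent of the jth input in the ith output.
In other words, row i lists the exponent of every input feature in the ith output feature.

n_input_features_ : int
The total number of input features.
The number of input features.

n_output_features_ : int
The total number of polynomial output features. The number of output features is computed by iterating over all suitably sized combinations of input features.
The number of polynomial output features. It is obtained by iterating over all combinations of input features of the appropriate sizes.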
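As a sketch of how these attributes can be read after fitting (assuming NumPy and scikit-learn are installed), the snippet below prints powers_ and then rebuilds the transformed matrix from the exponents alone. The rebuilt variable is only an illustration of what the exponents mean, not how the library computes its result.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.arange(6).reshape(3, 2)
poly = PolynomialFeatures(degree=2).fit(X)

print(poly.n_output_features_)  # 6
print(poly.powers_)
# One row per output feature, one column per input feature:
# [[0 0]   -> 1
#  [1 0]   -> a
#  [0 1]   -> b
#  [2 0]   -> a^2
#  [1 1]   -> ab
#  [0 2]]  -> b^2

# Rebuild each output feature as the product over inputs of x_j ** powers_[i, j].
rebuilt = np.prod(X[:, np.newaxis, :] ** poly.powers_[np.newaxis, :, :], axis=2)
print(np.allclose(rebuilt, poly.transform(X)))  # True
```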

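Note: Be aware that the number of features in the output array scales polynomially in the number of features of the input array, and exponentially in the degree. High degrees can cause overfitting.
In short: the number of output features grows quickly with both the number of input features and the degree, so do not set the degree too high; a high degree can also lead to overfitting.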
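To get a feel for this growth, the number of output features can be computed in closed form with binomial coefficients. The helper n_poly_features below is hypothetical (it is not a scikit-learn function) and assumes the standard count of monomials of total degree at most d in n variables, C(n + d, d).

```python
from math import comb


def n_poly_features(n_features, degree, interaction_only=False, include_bias=True):
    """Count the polynomial output features (hypothetical helper)."""
    if interaction_only:
        # Products of k distinct features, for k = 0..degree.
        total = sum(comb(n_features, k) for k in range(degree + 1))
    else:
        # All monomials of total degree <= degree in n_features variables.
        total = comb(n_features + degree, degree)
    return total if include_bias else total - 1


print(n_poly_features(2, 2))  # 6, matching [1, a, b, a^2, ab, b^2]

# With 10 input features the count rises quickly as the degree increases.
for degree in range(1, 6):
    print(degree, n_poly_features(10, degree))
```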

Methods:
1. fit(X, y=None)
Compute number of output features.
This computes the number of output features.
2. fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
Parameters:
X : numpy array of shape [n_samples, n_features]
Training set.
y : numpy array of shape [n_samples]
Target values.
Returns:
X_new : numpy array of shape [n_samples, n_features_new]
Transformed array.

Input: the input feature matrix.
Returns: the output (transformed) feature matrix.

3. get_params([deep])
Get parameters for this estimator.
4. set_params(**params)
Set the parameters of this estimator.
5. transform(X[, y])
Transform data to polynomial features.
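The methods listed above fit together roughly as in the following sketch (assuming NumPy and scikit-learn are installed; X_train and X_new are made-up data). Calling fit and then transform gives the same result as fit_transform, and a fitted transformer can be reused on new samples with the same number of features.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X_train = np.arange(6).reshape(3, 2)
X_new = np.array([[6, 7]])  # a new sample with the same 2 features

poly = PolynomialFeatures(degree=2)
poly.fit(X_train)                    # step 1: work out the output features
two_step = poly.transform(X_train)   # step 2: expand the data

one_step = PolynomialFeatures(degree=2).fit_transform(X_train)
print(np.array_equal(two_step, one_step))  # True

# Reuse the fitted transformer on unseen data.
new_features = poly.transform(X_new)
print(new_features)  # values 1, 6, 7, 36, 42, 49

# Inspect and change the constructor parameters.
degree_before = poly.get_params()["degree"]
poly.set_params(degree=3)
refit_shape = poly.fit_transform(X_train).shape  # refit after changing parameters
print(degree_before, refit_shape)  # 2 (3, 10)
```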

An example:
>>> import numpy as np
>>> from sklearn.preprocessing import PolynomialFeatures
>>> X = np.arange(6).reshape(3, 2)
>>> X
array([[0, 1],
       [2, 3],
       [4, 5]])
>>> poly = PolynomialFeatures(2)  # degree 2, other parameters left at their defaults
>>> poly.fit_transform(X)
array([[ 1.,  0.,  1.,  0.,  0.,  1.],
       [ 1.,  2.,  3.,  4.,  6.,  9.],
       [ 1.,  4.,  5., 16., 20., 25.]])
>>> poly = PolynomialFeatures(interaction_only=True)  # degree defaults to 2; interaction terms only
>>> poly.fit_transform(X)
array([[ 1.,  0.,  1.,  0.],
       [ 1.,  2.,  3.,  6.],
       [ 1.,  4.,  5., 20.]])

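Remarks: each row of the array above is one sample. For example, the first row [0, 1] plays the role of [a, b] above, so its polynomial output is [1, a, b, a^2, ab, b^2], which gives [1, 0, 1, 0, 0, 1].
With interaction_only=True, only the interaction terms are generated. For [a, b] the output is [1, a, b, ab]; terms in which a feature interacts with itself, such as a^2 or a*b^2, are not produced.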
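The interaction-only case can be sketched in pure Python as well. The helper interaction_expand is hypothetical and not part of scikit-learn; the only difference from a full polynomial expansion is that features are combined without repetition, so no feature is ever multiplied by itself.

```python
from itertools import combinations
from math import prod


def interaction_expand(sample, degree=2):
    """Return products of at most `degree` distinct inputs (hypothetical helper)."""
    features = []
    for d in range(degree + 1):
        # combinations() never repeats an element, so a^2 cannot appear.
        for combo in combinations(sample, d):
            features.append(prod(combo))
    return features


# [a, b] = [2, 3] -> [1, a, b, ab]
print(interaction_expand([2, 3]))        # [1, 2, 3, 6]

# [a, b, c] = [2, 3, 5] with degree 3 -> [1, a, b, c, ab, ac, bc, abc]
print(interaction_expand([2, 3, 5], 3))  # [1, 2, 3, 5, 6, 10, 15, 30]
```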
