Scikit Learn - Extended Linear Modeling
This chapter focuses on the polynomial features and pipelining tools in Sklearn.
Introduction to Polynomial Features
Linear models trained on non-linear functions of the data generally keep the fast performance of linear methods while being able to fit a much wider range of data. That is why linear models trained on non-linear functions of the inputs are widely used in machine learning.
One such example is that a simple linear regression can be extended by constructing polynomial features from the input features.
Mathematically, suppose we have a standard linear regression model. For 2-D data it looks like this:
$$Y=W_{0}+W_{1}X_{1}+W_{2}X_{2}$$
Now, we can combine the features into second-order polynomials, and the model looks as follows:
$$Y=W_{0}+W_{1}X_{1}+W_{2}X_{2}+W_{3}X_{1}X_{2}+W_{4}X_1^2+W_{5}X_2^2$$
The above is still a linear model, because it is linear in the weights $W_{0},\dots,W_{5}$ even though it is non-linear in $X_{1}$ and $X_{2}$. The resulting polynomial regression therefore belongs to the same class of linear models and can be solved in the same way.
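To make the "still linear" point concrete, the following is a minimal sketch that builds the second-order features by hand with NumPy and solves for the weights with ordinary least squares. The synthetic data and the weight values in `true_w` are assumptions chosen purely for illustration; they do not come from the original text.

```python
import numpy as np

# Hypothetical synthetic data: two input features and known weights W0..W5.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
x1, x2 = X[:, 0], X[:, 1]
true_w = np.array([1.0, 2.0, -3.0, 0.5, 4.0, -1.5])

# Design matrix with columns 1, X1, X2, X1*X2, X1^2, X2^2,
# in the same order as the terms of the second-order equation.
Phi = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2, x1 ** 2, x2 ** 2])
y = Phi @ true_w

# Ordinary least squares on the expanded features: the model is linear in W,
# so a standard linear solver recovers the weights.
w_hat, *_ = np.linalg.lstsq(Phi, y, rcond=None)
print(np.round(w_hat, 6))
```

Because the targets here are noise-free, `w_hat` should match `true_w` up to floating-point error; with noisy data the same solver would return the least-squares estimate instead.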
To do so, scikit-learn provides a transformer named PolynomialFeatures in the sklearn.preprocessing module. It transforms an input data matrix into a new data matrix that contains the polynomial combinations of the input features up to a given degree.
Parameters
The following table lists the parameters used by PolynomialFeatures:
Sr.No | Parameter & Description |
---|---|
1 | degree − integer, default = 2. The degree of the polynomial features. |
2 | interaction_only − Boolean, default = False. If set to True, only interaction features are produced, i.e. features that are products of at most degree distinct input features (so no powers such as X_1^2). |
3 | include_bias − Boolean, default = True. If True, a bias column is included, i.e. the feature in which all polynomial powers are zero (a column of ones). |
4 | order − str in {'C', 'F'}, default = 'C'. The order of the output array in the dense case. 'F' order is faster to compute but may slow down subsequent estimators. |
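As a small sketch of how these parameters change the output, the snippet below transforms the same input with the defaults, with include_bias=False, and with interaction_only=True. The input array and the variable names `full`, `no_bias` and `inter` are assumptions made for illustration.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.arange(6).reshape(3, 2)  # rows: [0, 1], [2, 3], [4, 5]

# Defaults: columns 1, a, b, a^2, a*b, b^2
full = PolynomialFeatures(degree=2).fit_transform(X)

# Without the bias column: a, b, a^2, a*b, b^2
no_bias = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X)

# Interaction features only: 1, a, b, a*b (no squared terms)
inter = PolynomialFeatures(degree=2, interaction_only=True).fit_transform(X)

print(full.shape, no_bias.shape, inter.shape)
```

For two input features and degree 2, the three outputs have 6, 5 and 4 columns respectively.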
Attributes
The following table lists the attributes of a fitted PolynomialFeatures transformer:
Sr.No | Attributes & Description |
---|---|
1 | powers_ − array, shape (n_output_features, n_input_features). powers_[i, j] is the exponent of the jth input in the ith output. |
2 | n_input_features_ − int. The total number of input features. (Newer scikit-learn releases expose this as n_features_in_.) |
3 | n_output_features_ − int. The total number of polynomial output features. |
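The attributes can be inspected after fitting. The following is a minimal sketch on a small (4, 2) array; the array and the variable name `poly` are assumptions made for illustration.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.arange(8).reshape(4, 2)
poly = PolynomialFeatures(degree=2).fit(X)

# Each row of powers_ describes one output feature; each column is the
# exponent applied to the corresponding input feature.
print(poly.powers_)
# [[0 0]   -> 1
#  [1 0]   -> a
#  [0 1]   -> b
#  [2 0]   -> a^2
#  [1 1]   -> a*b
#  [0 2]]  -> b^2

print(poly.n_output_features_)  # 6
```

Reading powers_ row by row is a convenient way to check which column of the transformed matrix corresponds to which polynomial term.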
Implementation Example
The following Python script reshapes an array of 8 values into shape (4, 2) and uses the PolynomialFeatures transformer to expand it into degree-2 polynomial features, giving an output of shape (4, 6):
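from sklearn.preprocessing import PolynomialFeatures
import numpy as np
#Input matrix with 4 samples and 2 features.
X = np.arange(8).reshape(4, 2)
#Output columns are 1, a, b, a^2, a*b, b^2 for each row [a, b].
poly = PolynomialFeatures(degree=2)
poly.fit_transform(X)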
Output
array(
[
[ 1., 0., 1., 0., 0., 1.],
[ 1., 2., 3., 4., 6., 9.],
[ 1., 4., 5., 16., 20., 25.],
[ 1., 6., 7., 36., 42., 49.]
]
)
Streamlining using Pipeline tools
This sort of preprocessing, i.e. transforming an input data matrix into a new data matrix of a given degree, can be streamlined with the Pipeline tools, which chain multiple estimators into one.
Example
The Python script below uses scikit-learn's Pipeline tools to streamline the preprocessing; it fits a model to data generated from an order-3 polynomial.
#First, import the necessary packages.
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
import numpy as np
#Next, create a Pipeline that chains the polynomial expansion and the linear regression.
#fit_intercept=False is used because PolynomialFeatures already adds the bias column.
Stream_model = Pipeline([('poly', PolynomialFeatures(degree=3)), ('linear', LinearRegression(fit_intercept=False))])
#Generate sample data from a known order-3 polynomial and fit the model.
x = np.arange(5)
y = 3 - 2 * x + x ** 2 - x ** 3
Stream_model = Stream_model.fit(x[:, np.newaxis], y)
#Read back the fitted polynomial coefficients.
Stream_model.named_steps['linear'].coef_
Output
array([ 3., -2., 1., -1.])
The above output shows that the linear model trained on polynomial features recovers the exact coefficients of the input polynomial, listed in increasing order of power (constant term first).
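A fitted pipeline can also be used for prediction, since calling predict applies the polynomial expansion and the linear model in sequence. The following is a minimal sketch under the same setup as the example above; the names `model`, `x_new`, `y_pred` and `y_true` are assumptions introduced here for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

# Same two-step pipeline as in the example: expand to degree 3, then fit linearly.
model = Pipeline([
    ("poly", PolynomialFeatures(degree=3)),
    ("linear", LinearRegression(fit_intercept=False)),
])

x = np.arange(5)
y = 3 - 2 * x + x ** 2 - x ** 3
model.fit(x[:, np.newaxis], y)

# Predict on inputs that were not part of the training data.
x_new = np.array([5, 6])
y_pred = model.predict(x_new[:, np.newaxis])

# Values of the generating polynomial at the same points, for comparison.
y_true = 3 - 2 * x_new + x_new ** 2 - x_new ** 3
print(y_pred, y_true)
```

Because the training data lie exactly on a cubic, the predictions should agree with the generating polynomial up to floating-point error; with noisy data, extrapolating a polynomial fit beyond the training range is generally much less reliable.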
Translated from: https://www.tutorialspoint.com/scikit_learn/scikit_learn_extended_linear_modeling.htm