使用sklearn.preprocessing.PolynomialFeatures来进行特征的构造。
它是使用多项式的方法来进行的,如果有a,b两个特征,那么它的2次多项式为(1,a,b,a^2,ab, b^2)。
PolynomialFeatures有三个参数
degree:控制多项式的度
interaction_only: 默认为False,如果指定为True,那么就不会有特征自己和自己结合的项,上面的二次项中没有a^2和b^2。
include_bias:默认为True。如果为True的话,那么就会有上面的 1那一项。
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
path = r"activity_recognizer\1.csv"
#数据在https://archive.ics.uci.edu/ml/datasets/Activity+Recognition+from+Single+Chest-Mounted+Accelerometer
df = pd.read_csv(path, header=None)
df.columns = ['index', 'x', 'y', 'z', 'activity']
knn = KNeighborsClassifier()
knn_params = {'n_neighbors':[3, 4, 5, 6]}
X = df[['x', 'y', 'z']]
y = df['activity']
from sklearn.preprocessing import PolynomialFeatures
poly = PolynomialFeatures(degree=2, include_bias=False, interaction_only=False)
X_ploly = poly.fit_transform(X)
X_ploly_df = pd.DataFrame(X_ploly, columns=poly.get_feature_names())
print(X_ploly_df.head())
结果:
x0 x1 x2 x0^2 x0 x1 x0 x2 x1^2 \
0 1502.0 2215.0 2153.0 2256004.0 3326930.0 3233806.0 4906225.0
1 1667.0 2072.0 2047.0 2778889.0 3454024.0 3412349.0 4293184.0
2 1611.0 1957.0 1906.0 2595321.0 3152727.0 3070566.0 3829849.0
3 1601.0 1939.0 1831.0 2563201.0 3104339.0 2931431.0 3759721.0
4 1643.0 1965.0 1879.0 2699449.0 3228495.0 3087197.0 3861225.0
x1 x2 x2^2
0 4768895.0 4635409.0
1 4241384.0 4190209.0
2 3730042.0 3632836.0
3 3550309.0 3352561.0
4 3692235.0 3530641.0