Polynomial regression
from sklearn.preprocessing import PolynomialFeatures
poly3 = PolynomialFeatures(degree=3)
x_x2_and_x3 = poly3.fit_transform(arbitrary_data[['x']])
Code notes: degree=3 tells PolynomialFeatures to compute every power of the input up to 3, turning the data into a feature matrix.
You can pass any concrete data to .fit_transform(); the .shape attribute (not a method) shows the dimensions of the resulting feature matrix.
Use .get_feature_names_out([feature]) to see the feature names (older scikit-learn versions call this .get_feature_names()).
Afterwards, run an ordinary linear regression on the transformed features to make predictions.
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LinearRegression

pipeline_version_of_model = Pipeline([
    ('josh_transform', PolynomialFeatures(degree=3)),
    ('josh_regression', LinearRegression(fit_intercept=False)),
])
pipeline_version_of_model.fit(arbitrary_data[['x']],arbitrary_data['y'])
pipeline_version_of_model.named_steps['josh_regression'].coef_
This writes both steps (building the feature matrix and fitting the regression) in one object.
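A self-contained sketch of the same pipeline on synthetic data; the cubic y = 2 + 3x - x^3 is an assumption for illustration. With fit_intercept=False, the coefficients of [1, x, x^2, x^3] come back directly:

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Synthetic data from y = 2 + 3x - x^3 (an assumption for illustration)
x = np.linspace(-2, 2, 50).reshape(-1, 1)
y = 2 + 3 * x.ravel() - x.ravel() ** 3

model = Pipeline([
    ('transform', PolynomialFeatures(degree=3)),
    ('regression', LinearRegression(fit_intercept=False)),
])
model.fit(x, y)

# Coefficients of [1, x, x^2, x^3], recovered exactly since the data is noise-free
print(model.named_steps['regression'].coef_.round(3))  # [ 2.  3.  0. -1.]
```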
six_vehicles = vehicle_data.sample(6)
Randomly draws 6 rows from the data.
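A minimal sketch with a hypothetical vehicle_data frame; random_state is optional but makes the draw repeatable:

```python
import pandas as pd

# Hypothetical stand-in for vehicle_data
vehicle_data = pd.DataFrame({'mpg': range(20), 'weight': range(20, 40)})

# Draw 6 rows uniformly at random; random_state makes the draw repeatable
six_vehicles = vehicle_data.sample(6, random_state=0)
print(six_vehicles.shape)  # (6, 2)
```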
Training data and validation data
This page has more on the bias-variance tradeoff:
https://en.wikipedia.org/wiki/Bias-variance_tradeoff
Split the data into two parts: the training data is used to fit the model, and the validation data is used to compute the loss.
from sklearn.utils import shuffle
diamond_data = shuffle(diamond_data)
Code notes: shuffle randomly reorders the rows of the data.
a, b, c = np.split(diamond_data, [1500, 1800])
Code notes: np.split divides the data into three parts: rows before index 1500, rows 1500 to 1800, and rows from 1800 on.
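Putting the shuffle and the split together on hypothetical diamond-like data (the carat feature and the linear price model are assumptions), with a validation loss computed at the end:

```python
import numpy as np
from sklearn.utils import shuffle
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Hypothetical diamond-like data: price is roughly linear in carat plus noise
rng = np.random.default_rng(0)
carat = rng.uniform(0.2, 2.0, size=(2000, 1))
price = 5000 * carat.ravel() + rng.normal(0, 200, 2000)

# Shuffle rows (keeping features and targets aligned), then split:
# rows 0-1499 train, 1500-1799 validate, 1800+ test
carat, price = shuffle(carat, price, random_state=0)
X_train, X_val, X_test = np.split(carat, [1500, 1800])
y_train, y_val, y_test = np.split(price, [1500, 1800])

model = LinearRegression()
model.fit(X_train, y_train)

# The validation loss estimates how the model does on unseen rows
val_mse = mean_squared_error(y_val, model.predict(X_val))
print(val_mse)
```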
Regularization
from sklearn import linear_model

ridge = linear_model.Ridge(alpha=1)
# fit on the training slice: all columns except price predict price
ridge.fit(a.drop(columns=['price']), a[['price']])
Code notes: alpha is a hyperparameter tied to model complexity; the larger alpha is, the smaller the fitted coefficients become.
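The shrinking effect is easy to see on toy data (the single feature and true slope of 4 are assumptions); the fitted coefficient falls as alpha grows:

```python
import numpy as np
from sklearn.linear_model import Ridge

# Toy data: one noisy linear feature with true slope 4 (an assumption)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1))
y = 4 * X.ravel() + rng.normal(0, 0.5, 100)

# Larger alpha -> stronger penalty -> smaller fitted coefficient
for alpha in [0.1, 10, 1000]:
    print(alpha, Ridge(alpha=alpha).fit(X, y).coef_.round(3))
```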
from sklearn.preprocessing import StandardScaler

ss = StandardScaler()
ss.fit(a.drop(columns=['price']))
a_ss = ss.transform(a.drop(columns=['price']))

ridge_scaled = linear_model.Ridge(alpha=100)
ridge_scaled.fit(a_ss, a[['price']])
Code notes: this scales each feature to mean 0 and standard deviation 1 before fitting, so the alpha penalty treats all coefficients on a common scale.
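What the scaler does, shown on a hypothetical two-column matrix with very different scales:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical feature matrix whose columns have very different scales
X = np.array([[1.0, 1000.0], [2.0, 2000.0], [3.0, 3000.0]])

ss = StandardScaler()
X_ss = ss.fit_transform(X)

# Each column now has mean 0 and standard deviation 1
print(X_ss.mean(axis=0).round(6))  # [0. 0.]
print(X_ss.std(axis=0).round(6))   # [1. 1.]
```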
Writing this with a Pipeline is simpler:
abc = Pipeline([
    ('scale', StandardScaler()),
    ('model', linear_model.Ridge(alpha=100)),
])
abc.fit(a.drop(columns=['price']), a[['price']])
Lasso regression
abc = Pipeline([
    ('scale', StandardScaler()),
    ('poly', PolynomialFeatures(degree=3)),
    ('model', linear_model.Lasso(alpha=80, fit_intercept=False)),
])
abc.fit(a.drop(columns=['price']), a[['price']])
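One reason to reach for Lasso: the L1 penalty can drive irrelevant coefficients to exactly zero. A sketch on toy data (three features of which only the first matters is an assumption):

```python
import numpy as np
from sklearn.linear_model import Lasso

# Toy data: y depends only on the first of three features (an assumption)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 5 * X[:, 0] + rng.normal(0, 0.1, 200)

lasso = Lasso(alpha=1.0)
lasso.fit(X, y)

# The L1 penalty shrinks the useful coefficient and zeros out the others
print(lasso.coef_.round(3))
```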