Notes collected along my data mining journey: commonly used machine learning models (with code), continuously updated.
Data splitting
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, train_size=0.8, random_state=14)
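As a quick sanity check on the split, here is a minimal sketch on a small synthetic dataset. The arrays `x_demo` and `y_demo` and the names `x_tr`, `x_te`, `y_tr`, `y_te` are made up for illustration and are not the data used in these notes. It only confirms that `train_size=0.8` keeps 80% of the rows for training and leaves the rest for testing.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 100 samples, 3 features (an assumption, not the post's data)
rng = np.random.default_rng(0)
x_demo = rng.normal(size=(100, 3))
y_demo = rng.normal(size=100)

# Same arguments as in the notes: 80% train, fixed seed for a reproducible shuffle
x_tr, x_te, y_tr, y_te = train_test_split(
    x_demo, y_demo, train_size=0.8, random_state=14
)

print(x_tr.shape, x_te.shape)  # row counts of the training and test parts
```

Because `random_state` is fixed, rerunning the script gives the same partition, which keeps scores comparable between models.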
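Data standardization
Tree-based models do not need this step. Standardization rescales each feature to zero mean and unit variance; it does not change the shape of the distribution, so the result is N(0,1) only if the feature was Gaussian to begin with. Many ML algorithms assume that the input features are centered at 0 and have variances of the same order of magnitude, for example SVMs with an RBF kernel and linear regression with L1 or L2 regularization.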
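from sklearn.preprocessing import StandardScaler
ss = StandardScaler()
# Learn the per-feature mean and standard deviation from the training set only
x_train = ss.fit_transform(x_train)
# Reuse the training statistics on the test set so no test information leaks in
x_test = ss.transform(x_test)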
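To make the scaling step concrete, the sketch below compares `StandardScaler` with the manual formula (x - mean) / std on synthetic data. The arrays `train_demo` and `test_demo` are hypothetical and chosen only so that the two features have very different scales. The point it illustrates is that the test set is transformed with the mean and standard deviation of the training set.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical toy data: two features on very different scales
train_demo = rng.normal(loc=[10.0, -5.0], scale=[2.0, 100.0], size=(200, 2))
test_demo = rng.normal(loc=[10.0, -5.0], scale=[2.0, 100.0], size=(50, 2))

scaler = StandardScaler()
train_scaled = scaler.fit_transform(train_demo)  # fit on train, then transform it
test_scaled = scaler.transform(test_demo)        # transform test with train statistics

# Manual equivalent: subtract the training mean, divide by the training std (ddof=0)
mu = train_demo.mean(axis=0)
sigma = train_demo.std(axis=0)
manual_test = (test_demo - mu) / sigma

print(train_scaled.mean(axis=0))  # approximately 0 for each feature
print(train_scaled.std(axis=0))   # approximately 1 for each feature
print(np.allclose(test_scaled, manual_test))
```

The scaled test set will generally not have exactly zero mean or unit variance, which is expected: its statistics were never used for fitting.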
Regression:
Building a linear regression model
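from sklearn.linear_model import LinearRegression
lr = LinearRegression()
lr.fit(x_train, y_train)
lr_y_test_hat = lr.predict(x_test)
# score() returns the coefficient of determination R^2 on the test set
lr_score = lr.score(x_test, y_test)
print("lr:", lr_score)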
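For reference, the value returned by `score` on a regressor is the coefficient of determination R^2 = 1 - SS_res / SS_tot. The sketch below checks this by hand on synthetic data. The data, the coefficients `[1.5, -2.0, 0.5]`, and the simple slice-based split are assumptions made for illustration only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
# Hypothetical linear data with a little Gaussian noise
x_demo = rng.normal(size=(200, 3))
y_demo = x_demo @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)

# Simple holdout: first 160 rows to fit, last 40 rows to evaluate
x_fit, x_eval = x_demo[:160], x_demo[160:]
y_fit, y_eval = y_demo[:160], y_demo[160:]

model = LinearRegression().fit(x_fit, y_fit)
y_hat = model.predict(x_eval)

# R^2 computed by hand: 1 - residual sum of squares / total sum of squares
ss_res = np.sum((y_eval - y_hat) ** 2)
ss_tot = np.sum((y_eval - y_eval.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

# All three values should agree up to floating-point error
print(model.score(x_eval, y_eval), r2_score(y_eval, y_hat), r2_manual)
```

A score of 1 means perfect prediction, 0 means no better than predicting the mean of `y_eval`, and negative values are possible on held-out data.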
Building a Lasso regression model
import numpy as np
from sklearn.linear_model import LassoCV
# Cross-validate over 20 alphas spaced logarithmically between 1e-3 and 1e1
lasso = LassoCV(alphas=np.logspace(-3, 1, 20))