This is the first post in a series on machine learning with Python, on fitting and discussing linear models. For the theory, see 《python大战机器学习》, which is also the main source of the code in this series. The linear regression models are trained on the diabetes dataset bundled with scikit-learn, which has 10 features, all numeric.
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets, linear_model, model_selection

def load_data():
    diabetes = datasets.load_diabetes()
    # train_test_split moved from the removed sklearn.cross_validation module to sklearn.model_selection
    return model_selection.train_test_split(diabetes.data, diabetes.target, test_size=0.25, random_state=0)
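To get a feel for the data, a quick inspection (a minimal sketch; it just prints the dataset's shape and feature names):

diabetes = datasets.load_diabetes()
print(diabetes.data.shape)     # (442, 10): 442 samples, 10 numeric features
print(diabetes.feature_names)  # ['age', 'sex', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6']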
def test_LinearRegression(*data):
    X_train, X_test, y_train, y_test = data
    # the book also passes normalize=True, an argument removed in scikit-learn 1.2
    regr = linear_model.LinearRegression(fit_intercept=True, copy_X=True, n_jobs=-1)
    regr.fit(X_train, y_train)
    print('Coefficients :%s, intercept %.2f ' % (regr.coef_, regr.intercept_))
    # despite the label, this is the mean squared error on the test set
    print('Residual sum of squares: %.2f' % np.mean((regr.predict(X_test) - y_test) ** 2))
    print('Score: %.2f ' % regr.score(X_test, y_test))
Use load_data() to load the data and test_LinearRegression() to evaluate plain multivariate linear regression; the printed metrics are shown below:
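A minimal driver (the four variables it creates are reused by every test function below):

X_train, X_test, y_train, y_test = load_data()
test_LinearRegression(X_train, X_test, y_train, y_test)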
'''
Coefficients :[ -43.26774487 -208.67053951 593.39797213 302.89814903 -560.27689824
261.47657106 -8.83343952 135.93715156 703.22658427 28.34844354], intercept 153.07
Residual sum of squares: 3180.20
Score: 0.36
'''
There are two common ways to improve on plain multivariate linear regression: L1 regularization and L2 regularization. L2 regularization (ridge regression) shrinks all coefficients and tends to spread weight evenly across correlated features, while L1 regularization (the Lasso) drives the weights of weak features exactly to zero, so it can also serve as a feature-selection method. First, the effect of L2 regularization.
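For reference, the objectives scikit-learn minimizes (with $w$ the coefficient vector and $\alpha$ the regularization strength):

$$\min_w \; \|Xw - y\|_2^2 + \alpha \|w\|_2^2 \quad \text{(Ridge, L2)}$$
$$\min_w \; \frac{1}{2 n_{\text{samples}}} \|Xw - y\|_2^2 + \alpha \|w\|_1 \quad \text{(Lasso, L1)}$$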
def test_Ridge(*data):
    X_train, X_test, y_train, y_test = data
    # the book also passes normalize=False, an argument removed in scikit-learn 1.2
    regr = linear_model.Ridge(alpha=1.0, fit_intercept=True, copy_X=True)
    regr.fit(X_train, y_train)
    print('Coefficients :%s, intercept %.2f ' % (regr.coef_, regr.intercept_))
    print('Residual sum of squares: %.2f' % np.mean((regr.predict(X_test) - y_test) ** 2))
    print('Score: %.2f ' % regr.score(X_test, y_test))
The results:
'''
Coefficients :[ 21.19927911 -60.47711393 302.87575204 179.41206395 8.90911449
-28.8080548 -149.30722541 112.67185758 250.53760873 99.57749017], intercept 152.45
Residual sum of squares: 3192.33
Score: 0.36
'''
The figure below shows how the alpha parameter affects the score:
As alpha increases, the test score first rises and then falls.
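The plotting code for that figure is omitted in this post; a sketch that would regenerate it (the name test_Ridge_alpha is mine, mirroring test_Lasso_alpha below):

def test_Ridge_alpha(*data):
    X_train, X_test, y_train, y_test = data
    alphas = [0.01, 0.02, 0.05, 0.1, 0.5, 1, 2, 5, 10, 20, 50, 100, 200, 500, 1000]
    # refit a Ridge model at each regularization strength and record the test score
    scores = [linear_model.Ridge(alpha=alpha).fit(X_train, y_train).score(X_test, y_test)
              for alpha in alphas]
    fig = plt.figure()
    ax = fig.add_subplot(111)
    ax.plot(alphas, scores)
    ax.set_xscale('log')
    ax.set_xlabel(r"$\alpha$")
    ax.set_ylabel("Score")
    ax.set_title("Ridge")
    plt.show()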
Next, the effect of L1 regularization on the results; the test code is as follows:
def test_Lasso(*data):
    X_train, X_test, y_train, y_test = data
    # the book sets normalize=True, an argument removed in scikit-learn 1.2;
    # the second result block below is the normalize=True run
    regr = linear_model.Lasso()
    regr.fit(X_train, y_train)
    print('Coefficients :%s, intercept %.2f ' % (regr.coef_, regr.intercept_))
    print('Residual sum of squares: %.2f' % np.mean((regr.predict(X_test) - y_test) ** 2))
    print('Score: %.2f ' % regr.score(X_test, y_test))
test_Lasso(X_train, X_test, y_train, y_test)
The results of both runs:
'''
Coefficients :[ 0. -0. 442.67992538 0. 0.
0. -0. 0. 330.76014648 0. ], intercept 152.52
Residual sum of squares: 3583.42
Score: 0.28
with normalize = True:
Coefficients :[ 0. -0. 474.30362799 12.72676075 0.
0. -0. 0. 356.52419331 0. ], intercept 152.58
Residual sum of squares: 3524.80
Score: 0.29
'''
Similarly, let's test how the Lasso's alpha parameter affects the result:
def test_Lasso_alpha(*data):
    X_train, X_test, y_train, y_test = data
    alphas = [0.01, 0.02, 0.05, 0.1, 0.5, 1, 2, 5, 10, 20, 50, 100, 200, 500, 1000]
    scores = []
    for alpha in alphas:
        regr = linear_model.Lasso(alpha=alpha)
        regr.fit(X_train, y_train)
        scores.append(regr.score(X_test, y_test))
    fig = plt.figure()
    ax = fig.add_subplot(111)
    ax.plot(alphas, scores)
    ax.set_xlabel(r"$\alpha$")
    ax.set_ylabel("Score")
    ax.set_xscale('log')
    ax.set_title("Lasso")
    plt.savefig('Lasso.png')  # the figure must be saved before show(), otherwise the saved image is all white
    plt.show()
test_Lasso_alpha(X_train, X_test, y_train, y_test)
The result:
As alpha grows, the score drops sharply: the larger alpha is, the simpler the model, and as alpha tends to infinity the model degenerates to predicting a constant (the intercept b), so the result is bound to be very poor.
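A quick sanity check of that limit (a sketch reusing the split from above): with a huge alpha every coefficient is forced to zero and the prediction collapses to the intercept, which is just the training-set mean of the target.

regr = linear_model.Lasso(alpha=1e5).fit(X_train, y_train)
print(regr.coef_)                       # all zeros: every feature switched off
print(regr.intercept_, y_train.mean())  # the two values coincide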
Finally, we apply L1 and L2 regularization at the same time (the ElasticNet), which accordingly has two parameters; the code is as follows:
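scikit-learn combines the two penalties with an overall strength alpha and a mixing ratio l1_ratio (the rho used below); the objective it minimizes is:

$$\min_w \; \frac{1}{2 n_{\text{samples}}} \|Xw - y\|_2^2 + \alpha \rho \|w\|_1 + \frac{\alpha (1 - \rho)}{2} \|w\|_2^2$$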
def test_ElasticNet(*data):
    X_train, X_test, y_train, y_test = data
    regr = linear_model.ElasticNet()  # defaults: alpha=1.0, l1_ratio=0.5
    regr.fit(X_train, y_train)
    print('Coefficients :%s, intercept %.2f ' % (regr.coef_, regr.intercept_))
    print('Residual sum of squares: %.2f' % np.mean((regr.predict(X_test) - y_test) ** 2))
    print('Score: %.2f ' % regr.score(X_test, y_test))
test_ElasticNet(X_train, X_test, y_train, y_test)
The results:
'''
Coefficients :[ 0.40560736 0. 3.76542456 2.38531508 0.58677945 0.22891647
-2.15858149 2.33867566 3.49846121 1.98299707], intercept 151.93
Residual sum of squares: 4922.36
Score: 0.01
'''
Clearly this result is much worse; next, let's see how tuning alpha and rho affects the score:
def test_ElasticNet_alpha_rho(*data):
    X_train, X_test, y_train, y_test = data
    alphas = np.logspace(-2, 2)
    rhos = np.linspace(0.01, 1)
    scores = []
    for alpha in alphas:
        for rho in rhos:
            regr = linear_model.ElasticNet(alpha=alpha, l1_ratio=rho)
            regr.fit(X_train, y_train)
            scores.append(regr.score(X_test, y_test))
    # scores were collected alpha-major, so build the grids with indexing='ij';
    # the default meshgrid layout is rho-major and would silently transpose the surface
    alphas, rhos = np.meshgrid(alphas, rhos, indexing='ij')
    scores = np.array(scores).reshape(alphas.shape)
    from matplotlib import cm
    fig = plt.figure()
    # Axes3D(fig) no longer attaches itself to the figure in recent matplotlib
    ax = fig.add_subplot(projection='3d')
    surf = ax.plot_surface(alphas, rhos, scores, rstride=1, cstride=1, cmap=cm.jet, linewidth=0, antialiased=False)
    fig.colorbar(surf, shrink=0.5, aspect=5)
    ax.set_xlabel(r"$\alpha$")
    ax.set_ylabel(r"$\rho$")
    ax.set_zlabel("score")
    ax.set_title("ElasticNet")
    plt.savefig('ElasticNet.png')  # save before show(), otherwise the saved image is all white
    plt.show()
test_ElasticNet_alpha_rho(X_train, X_test, y_train, y_test)
The resulting surface plot:
It shows the joint effect of the two parameters on the score.
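Rather than reading the best combination off the surface by eye, scikit-learn can search it with cross-validation; a minimal sketch using ElasticNetCV (the parameter grids here are illustrative, not tuned):

from sklearn.linear_model import ElasticNetCV

# cross-validates over both the mixing ratio and the strength on the training set
regr = ElasticNetCV(l1_ratio=np.linspace(0.01, 1, 10), alphas=np.logspace(-2, 2, 20), cv=5)
regr.fit(X_train, y_train)
print(regr.alpha_, regr.l1_ratio_)  # the selected parameter pair
print(regr.score(X_test, y_test))   # test score at the selected parameters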