Regularized Linear Regression

打倒帝国主义

Published 2024-06-16 17:16:44

Tags: linear regression, algorithms, artificial intelligence

Article link: https://blog.csdn.net/2401_83040292/article/details/139722462

Implementing ridge regression with sklearn

class sklearn.linear_model.Ridge(alpha=1.0, fit_intercept=True, normalize=False,
copy_X=True, max_iter=None, tol=0.001, solver='auto', random_state=None)

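Parameters:

alpha: the regularization strength, a float with default 1.0. Regularization improves the conditioning of the problem and reduces the variance of the estimates. Larger values specify stronger regularization.

fit_intercept: whether to fit an intercept, a bool with default True.

normalize: whether to normalize X first, a bool with default False. If True, the regressors X are normalized before regression. This parameter was deprecated in scikit-learn 1.0 and removed in 1.2. In newer versions, scale X beforehand instead, for example with StandardScaler.

copy_X: whether to copy X, a bool with default True. If True, X is copied. Otherwise X may be overwritten.

max_iter: the maximum number of iterations, an int with default None. For sparse_cg and lsqr, the default is determined by scipy.sparse.linalg. For sag, the default is 1000.

tol: the precision of the solution, a float with default 0.001.

solver: the solving method, a str with default 'auto'. The options are auto, svd, cholesky, lsqr, sparse_cg and sag. Newer versions also offer saga and lbfgs.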
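This sketch makes concrete what alpha controls. scikit-learn's Ridge minimizes ||y - Xw||² + alpha·||w||² and leaves the intercept unpenalized. The sketch solves that problem in closed form on centered data. It then checks that several of the solvers listed above reach the same coefficients. The synthetic data, alpha=0.5 and tol=1e-10 are arbitrary assumptions, not values from the original example.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic data (assumption): 3 features, known weights, Gaussian noise
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + 4.0 + rng.normal(scale=0.3, size=200)

alpha = 0.5

# Closed-form ridge: center X and y so the intercept is not penalized,
# then solve (Xc^T Xc + alpha*I) w = Xc^T yc and recover the intercept
x_mean, y_mean = X.mean(axis=0), y.mean()
Xc, yc = X - x_mean, y - y_mean
w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
b = y_mean - x_mean @ w

# Different solvers should converge to the same minimizer
results = {}
for solver in ["svd", "cholesky", "lsqr", "sparse_cg"]:
    reg = Ridge(alpha=alpha, solver=solver, tol=1e-10).fit(X, y)
    results[solver] = (reg.coef_, reg.intercept_)
    print(solver, np.round(reg.coef_, 4), round(float(reg.intercept_), 4))

print("closed form", np.round(w, 4), round(float(b), 4))
```

The choice of solver affects speed and memory on large or sparse problems. For a small dense problem like this one, it should not change the answer beyond the tolerance.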

Example:

import numpy as np
import matplotlib.pyplot as plt
from sklearn import linear_model
from sklearn.metrics import mean_squared_error, r2_score

# Number of samples and features
num_sample = 1000  # number of data points
num_feature = 1
weight = -3.4      # true slope
b_true = 4.3       # true intercept
np.random.seed(0)  # fix the random seed so the results are reproducible
feature = np.random.normal(size=(num_sample, num_feature))  # X, shape (1000, 1)
# y = -3.4x + 4.3 + random noise; kept 1-D so coef_ and intercept_ are easy to print
label = weight * feature[:, 0] + b_true + np.random.normal(size=num_sample)

X_train = feature[:-100]  # all samples except the last 100 form the training set
X_test = feature[-100:]   # the last 100 samples form the test set
y_train = label[:-100]
y_test = label[-100:]

# Create the Ridge model
reg = linear_model.Ridge(alpha=0.5, fit_intercept=True)  # ridge parameter alpha=0.5, with intercept
reg.fit(X_train, y_train)
y_predict = reg.predict(X_test)  # predictions on the test set
# Mean squared error
print("mean_square_error: %.2f" % mean_squared_error(y_test, y_predict))
# Coefficient of determination R^2
print('Coefficient of determination: %.2f' % r2_score(y_test, y_predict))
# Slope (coef_ holds one entry per feature)
print("Coefficient of the model: %.2f" % reg.coef_[0])
# Intercept
print("intercept of the model: %.2f" % reg.intercept_)

# Plot the test data and the fitted line
ax = plt.subplot(111)
ax.scatter(X_test, y_test)
ax.plot(X_test, y_predict)
ax.set_ylabel('Y')
ax.set_xlabel('X')
plt.show()
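The example uses alpha=0.5 on 900 training samples. Because this alpha is tiny compared with the data's sum of squares, it barely shrinks the slope. The sketch below refits Ridge over a range of alpha values and shows the fitted slope moving toward 0 as alpha grows. The data are synthetic and follow the same y = -3.4x + 4.3 + noise model. The alpha grid is an arbitrary assumption.

For a single feature, the ridge slope has the closed form S_xy / (S_xx + alpha), where S_xx and S_xy are the centered sum of squares and cross-products. Alpha therefore only matters once it is comparable to S_xx, which is about 900 here.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic data from the same model as the example: y = -3.4x + 4.3 + noise
rng = np.random.default_rng(0)
x = rng.normal(size=900)
y = -3.4 * x + 4.3 + rng.normal(size=900)
X = x.reshape(-1, 1)

# Centered sums used by the one-feature closed form
s_xx = np.sum((x - x.mean()) ** 2)
s_xy = np.sum((x - x.mean()) * (y - y.mean()))

alphas = [0.01, 0.5, 10.0, 100.0, 1000.0, 10000.0]
slopes = []
for a in alphas:
    reg = Ridge(alpha=a).fit(X, y)
    slopes.append(reg.coef_[0])
    # closed-form slope for a single feature: s_xy / (s_xx + alpha)
    print(f"alpha={a:>8}: sklearn={reg.coef_[0]:.4f}  closed form={s_xy / (s_xx + a):.4f}")
```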

Choosing the ridge parameter by cross-validation

class sklearn.linear_model.RidgeCV(alphas=(0.1, 1.0, 10.0), *, fit_intercept=True,
normalize='deprecated', scoring=None, cv=None, gcv_mode=None,
store_cv_values=False, alpha_per_target=False)

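Parameters:

alphas: an array of candidate ridge parameters to choose from. Every element must be a positive float.

fit_intercept: whether to fit an intercept, a bool with default True.

cv: the cross-validation setting, default None. With None, an efficient form of leave-one-out cross-validation is used. An integer is taken as the number of folds for K-fold cross-validation.

store_cv_values: a bool with default False. This sets whether to store the cross-validation values for each alpha in the cv_values_ attribute. It is only compatible with cv=None.

alpha_per_target: a bool with default False. This sets whether to optimize alpha separately for each target when y has several columns.

Attributes:

cv_values_: an ndarray of shape (n_samples, n_alphas). It holds the cross-validation value for each sample and each alpha, and is only available when store_cv_values=True.

coef_: an ndarray of floats holding the fitted regression coefficients, without the intercept.

intercept_: a float or ndarray of floats holding the fitted intercept.

alpha_: the selected alpha. If alpha_per_target=True, there is one value per target.

best_score_: the score obtained with the best alpha. If alpha_per_target=True, there is one score per target.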
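One way to understand the cv parameter is to redo the search by hand. This is a sketch under several assumptions. The synthetic data and the alpha grid are made up. With cv=5, RidgeCV is expected to use an unshuffled 5-fold split, and because scoring=None it should rank each alpha by the mean R² on the held-out folds. With the default cv=None, it would instead use leave-one-out squared errors.

```python
import numpy as np
from sklearn.linear_model import Ridge, RidgeCV
from sklearn.model_selection import KFold, cross_val_score

# Synthetic data (assumption): 5 features, fairly noisy target
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
y = X @ np.array([2.0, -1.0, 0.0, 0.5, 0.0]) + rng.normal(scale=2.0, size=100)

alphas = [0.1, 1.0, 10.0, 100.0]

# cv=5: 5-fold cross-validation over the candidate alphas
model = RidgeCV(alphas=alphas, cv=5).fit(X, y)

# The same search written out by hand: mean held-out R^2 for each alpha
kfold = KFold(n_splits=5)
mean_scores = [cross_val_score(Ridge(alpha=a), X, y, cv=kfold).mean() for a in alphas]
manual_alpha = alphas[int(np.argmax(mean_scores))]

print("RidgeCV alpha_:", model.alpha_, "best_score_:", round(model.best_score_, 4))
print("manual 5-fold choice:", manual_alpha, "mean R^2:", round(max(mean_scores), 4))
```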

Example:

from sklearn.model_selection import train_test_split
import numpy as np
from sklearn.linear_model import RidgeCV
data=[
[0.607492, 3.965162], [0.358622, 3.514900], [0.147846, 3.125947], [0.637820, 4.094115], [0.230372, 3.476039], [0.070237, 3.210610], [0.067154, 3.190612], [0.925577, 4.631504], [0.717733, 4.295890], [0.015371, 3.085028], [0.067732, 3.176513], [0.427810, 3.816464], [0.995731, 4.550095], [0.738336, 4.256571], [0.981083, 4.560815], [0.247809, 3.476346], [0.648270, 4.119688], [0.731209, 4.282233], [0.236833, 3.486582], [0.969788, 4.655492], [0.335070, 3.448080], [0.040486, 3.167440], [0.212575, 3.364266], [0.617218, 3.993482], [0.541196, 3.891471], [0.526171, 3.929515], [0.378887, 3.526170], [0.033859, 3.156393], [0.132791, 3.110301], [0.138306, 3.149813]
]

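# Build the X and y arrays
dataMat = np.array(data)
X = dataMat[:, 0:1]  # feature x, kept 2-D with shape (n_samples, 1)
y = dataMat[:, 1]    # target y
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.8, random_state=0)
# RidgeCV accepts several candidate values and picks the best one by cross-validation
model = RidgeCV(alphas=[0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 1.0, 2.0, 5.0, 10.0])
model.fit(X_train, y_train)  # fit the ridge regression model
print('Coefficients and intercept:\n', model.coef_, model.intercept_)
print('Model:\n', model)
print('Best alpha from cross-validation:', model.alpha_)  # alpha_ only exists on the CV estimator (RidgeCV)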
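The example above has a single target, so alpha_per_target has no effect there. The sketch below uses two made-up targets: one nearly noise-free and one very noisy. The data are an assumption, not from the article. It shows how alpha_ and best_score_ become one entry per target when a separate alpha is chosen for each target.

```python
import numpy as np
from sklearn.linear_model import RidgeCV

# Two targets (assumption): same weights, very different noise levels
rng = np.random.default_rng(2)
X = rng.normal(size=(60, 3))
w = np.array([1.0, -2.0, 0.5])
y_clean = X @ w + rng.normal(scale=0.1, size=60)
y_noisy = X @ w + rng.normal(scale=5.0, size=60)
Y = np.column_stack([y_clean, y_noisy])

alphas = [0.01, 0.1, 1.0, 10.0, 100.0]
shared = RidgeCV(alphas=alphas).fit(X, Y)                             # one alpha for both targets
per_target = RidgeCV(alphas=alphas, alpha_per_target=True).fit(X, Y)  # one alpha per target

print("shared alpha_:", shared.alpha_)
print("per-target alpha_:", per_target.alpha_)
print("per-target best_score_:", per_target.best_score_)
```

The noisier target would typically be expected to favor a larger alpha, though the exact choice depends on the sample.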

Lasso

class sklearn.linear_model.Lasso(alpha=1.0, *, fit_intercept=True, normalize='deprecated', precompute=False, copy_X=True, max_iter=1000,
tol=0.0001, warm_start=False, positive=False, random_state=None, selection='cyclic')

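Parameters (only those not already described for Ridge):

precompute: a bool or an array of shape (n_features, n_features), default False. This sets whether to use a precomputed Gram matrix to speed up computation.

warm_start: a bool with default False. This sets whether to reuse the solution of the previous call to fit as the starting point.

positive: a bool with default False. This sets whether to force all coefficients to be non-negative.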
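The simplest case shows what alpha does in Lasso. Lasso minimizes (1/(2n))·||y - Xw||² + alpha·||w||₁. Take a single feature standardized to mean 0 and mean square 1. The minimizer is then the soft-thresholded least-squares slope z, that is, sign(z)·max(|z| - alpha, 0). With positive=True, the slope is restricted to w ≥ 0, which gives max(z - alpha, 0). The sketch below checks both cases. The data are synthetic, mirroring the y = -3.4x + 4.3 + noise model. Both the data and alpha=0.5 are assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso

# One standardized feature (assumption): mean 0 and mean square 1
rng = np.random.default_rng(0)
x = rng.normal(size=500)
x = (x - x.mean()) / x.std()          # np.std uses 1/n, so mean(x**2) == 1
y = -3.4 * x + 4.3 + rng.normal(size=500)
X = x.reshape(-1, 1)

z = np.mean(x * (y - y.mean()))       # least-squares slope for this standardized x

def soft_threshold(z, alpha):
    """sign(z) * max(|z| - alpha, 0)"""
    return np.sign(z) * max(abs(z) - alpha, 0.0)

alpha = 0.5
lasso = Lasso(alpha=alpha).fit(X, y)
lasso_pos = Lasso(alpha=alpha, positive=True).fit(X, y)

print("least-squares slope z:", round(z, 4))
print("Lasso coef:", round(lasso.coef_[0], 4), "soft threshold:", round(soft_threshold(z, alpha), 4))
# z < 0 here, so the non-negative optimum is exactly w = 0
print("Lasso(positive=True) coef:", lasso_pos.coef_[0], "closed form:", max(z - alpha, 0.0))
```

For this kind of data, the Lasso slope ends up roughly alpha closer to zero than the least-squares slope.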

Example:

import numpy as np
import matplotlib.pyplot as plt
from sklearn import linear_model
from sklearn.metrics import mean_squared_error, r2_score

# Number of samples and features
num_sample = 1000  # number of data points
num_feature = 1
weight = -3.4
b_true = 4.3
np.random.seed(0)  # reproducible results
feature = np.random.normal(size=(num_sample, num_feature))
# y = -3.4x + 4.3 + noise; kept 1-D so coef_ and intercept_ are easy to print
label = weight * feature[:, 0] + b_true + np.random.normal(size=num_sample)

X_train = feature[:-100]
X_test = feature[-100:]
y_train = label[:-100]
y_test = label[-100:]
# Lasso
reg = linear_model.Lasso(alpha=0.5, fit_intercept=True)  # regularization parameter alpha=0.5, with intercept
reg.fit(X_train, y_train)
y_predict = reg.predict(X_test)
# Mean squared error
print("mean_square_error: %.2f" % mean_squared_error(y_test, y_predict))
# Coefficient of determination R^2
print('Coefficient of determination: %.2f' % r2_score(y_test, y_predict))
print("Coefficient of the model: %.2f" % reg.coef_[0])
print("intercept of the model: %.2f" % reg.intercept_)

# Plot the test data and the fitted line
ax = plt.subplot(111)
ax.scatter(X_test, y_test)
ax.plot(X_test, y_predict)
ax.set_ylabel('Y')
ax.set_xlabel('X')
plt.show()
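A one-feature example cannot show the main practical difference between Lasso and Ridge. The L1 penalty can set coefficients exactly to zero, while the L2 penalty only shrinks them. The sketch below assumes 10 synthetic features, of which only the first 3 influence y. The alpha values are arbitrary choices.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Assumption: 10 features, only the first 3 actually influence y
rng = np.random.default_rng(3)
X = rng.normal(size=(300, 10))
true_w = np.array([3.0, -2.0, 1.5, 0, 0, 0, 0, 0, 0, 0])
y = X @ true_w + rng.normal(size=300)

lasso = Lasso(alpha=0.2).fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

print("Lasso coef:", np.round(lasso.coef_, 3))
print("Ridge coef:", np.round(ridge.coef_, 3))
# Coordinate descent for Lasso produces exact zeros; Ridge coefficients stay nonzero
print("exact zeros, Lasso:", np.sum(lasso.coef_ == 0), "Ridge:", np.sum(ridge.coef_ == 0))
```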

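Lasso regression with alpha chosen by cross-validation: sklearn.linear_model.LassoCV

Lasso regression solved with the least-angle regression (LARS) method:
sklearn.linear_model.LassoLars

Lasso regression solved with least-angle regression, with alpha chosen by cross-validation:
sklearn.linear_model.LassoLarsCV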
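The sketch below uses the two cross-validated variants on the same problem. LassoCV runs coordinate descent over an automatically generated alpha grid. LassoLarsCV follows the LARS path. Both should pick a small alpha and recover the three relevant coefficients approximately. The sparse synthetic data and cv=5 are assumptions.

```python
import numpy as np
from sklearn.linear_model import LassoCV, LassoLarsCV

# Assumption: 3 relevant features out of 10, moderate noise
rng = np.random.default_rng(4)
X = rng.normal(size=(200, 10))
true_w = np.array([3.0, -2.0, 1.5, 0, 0, 0, 0, 0, 0, 0])
y = X @ true_w + rng.normal(scale=0.5, size=200)

cd = LassoCV(cv=5).fit(X, y)        # coordinate descent, 5-fold CV over an alpha grid
lars = LassoLarsCV(cv=5).fit(X, y)  # least-angle regression path, 5-fold CV

print("LassoCV     alpha_:", round(cd.alpha_, 5), "coef:", np.round(cd.coef_, 3))
print("LassoLarsCV alpha_:", round(lars.alpha_, 5), "coef:", np.round(lars.coef_, 3))
```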

Elastic net regression with sklearn

class sklearn.linear_model.ElasticNet(alpha=1.0, l1_ratio=0.5, fit_intercept=True, normalize=False,
precompute=False, max_iter=1000, copy_X=True, tol=0.0001,
warm_start=False, positive=False, random_state=None,
selection='cyclic')

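The ElasticNet parameters alpha and l1_ratio correspond to the parameters of the same name in the objective function it minimizes:

$$\min_{w}\ \frac{1}{2\,n_{\text{samples}}}\lVert y - Xw \rVert_2^2 + \alpha\,\rho\,\lVert w \rVert_1 + \frac{\alpha\,(1-\rho)}{2}\,\lVert w \rVert_2^2, \qquad \rho = \texttt{l1\_ratio}$$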

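Setting l1_ratio=0 is equivalent to using only the L2 penalty, i.e. ridge regression. Setting l1_ratio=1 is equivalent to using only the L1 penalty, i.e. lasso regression. ElasticNet divides the squared error by 2·n_samples, so ElasticNet(alpha=a, l1_ratio=0) corresponds to Ridge(alpha=n_samples·a) rather than Ridge(alpha=a).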
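The sketch below checks both limits numerically. For l1_ratio=1, ElasticNet and Lasso solve the same problem, so their fits should agree. For l1_ratio=0, multiplying the objective by 2·n gives ||y - Xw - b||² + n·alpha·||w||². The sketch fits Ridge with alpha_ridge = n·alpha and confirms that the gradient of the l1_ratio=0 ElasticNet objective vanishes at that solution. The synthetic data and alpha=0.1 are assumptions.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, Ridge

# Synthetic data (assumption): 4 features, Gaussian noise
rng = np.random.default_rng(5)
X = rng.normal(size=(150, 4))
y = X @ np.array([2.0, -1.0, 0.0, 0.5]) + rng.normal(scale=0.5, size=150)
n, alpha = len(y), 0.1

# l1_ratio=1: the objective reduces to the Lasso objective
enet_l1 = ElasticNet(alpha=alpha, l1_ratio=1.0).fit(X, y)
lasso = Lasso(alpha=alpha).fit(X, y)
print("ElasticNet(l1_ratio=1):", np.round(enet_l1.coef_, 4))
print("Lasso(alpha=0.1):      ", np.round(lasso.coef_, 4))

# l1_ratio=0: (1/(2n))||y - Xw - b||^2 + (alpha/2)||w||^2, i.e. Ridge with alpha_ridge = n*alpha
ridge = Ridge(alpha=n * alpha).fit(X, y)
resid = y - ridge.predict(X)
# Gradient of the l1_ratio=0 ElasticNet objective with respect to w, at the Ridge solution
grad = -(X.T @ resid) / n + alpha * ridge.coef_
print("Ridge(alpha=n*alpha):", np.round(ridge.coef_, 4))
print("max |gradient|:", np.abs(grad).max())
```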
