Machine Learning in Practice: Linear Regression

Author: 隔壁的NLP小哥

Article link: https://blog.csdn.net/hei653779919/article/details/104955253

1. Basics

1.1 The sklearn.linear_model.LinearRegression class

1.1.1 Main parameters

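fit_intercept: bool, default True. Whether to compute the intercept, i.e. the bias term B of the linear model. If False, no intercept is used (B is fixed at 0 and the data are expected to be centered).
normalize: bool, default False. Whether to normalize X before fitting. It is ignored when fit_intercept=False; when True, each column of X is centered (its mean is subtracted) and then divided by its L2 norm, not by its variance. This parameter was deprecated in scikit-learn 1.0 and removed in 1.2; for standardization, use sklearn.preprocessing.StandardScaler in a pipeline instead.
copy_X: bool, default True. Whether to copy X; if False, X may be overwritten.
n_jobs: int or None, default None. The number of jobs used for the computation. It only speeds things up for sufficiently large problems (for example several targets with sparse X); None means 1 and -1 means all processors.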
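Because normalize no longer exists in current scikit-learn (removed in 1.2), it helps to know what dropping it costs. For ordinary least squares, centering and rescaling the columns only re-expresses the coefficients; the fitted values stay the same. A minimal NumPy sketch under that assumption (the helper ols_fit and the data are made up for illustration, not scikit-learn APIs) that applies what normalize=True used to do and compares predictions:

```python
import numpy as np

def ols_fit(X, y):
    # Hypothetical helper: least squares with an intercept (appended column of ones)
    A = np.hstack([X, np.ones((X.shape[0], 1))])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w[:-1], w[-1]  # (coefficients, intercept)

# Made-up data whose columns live on very different scales
rng = np.random.RandomState(0)
X = rng.rand(20, 3) * np.array([1.0, 100.0, 0.01])
y = X @ np.array([2.0, 0.03, 50.0]) + 1.0 + 0.1 * rng.randn(20)

# What normalize=True used to do: center each column, then divide by its L2 norm
Xc = X - X.mean(axis=0)
Xn = Xc / np.linalg.norm(Xc, axis=0)

coef_raw, b_raw = ols_fit(X, y)
coef_n, b_n = ols_fit(Xn, y)

# Same fitted values; only the coefficients are expressed on a different scale
print(np.allclose(X @ coef_raw + b_raw, Xn @ coef_n + b_n))
```

For penalized models such as Ridge and Lasso, scaling does change the solution, so there a StandardScaler step in a Pipeline is the usual replacement.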

1.1.2 Main attributes

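coef_: the coefficients of the linear model, an ndarray of shape (n_features,), or (n_targets, n_features) when there are several targets.
intercept_: the bias (intercept) term of the model; a float for a single target, an ndarray of shape (n_targets,) for several targets, and 0.0 when fit_intercept=False.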
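With fit_intercept=True, the two attributes can be obtained by centering X and y, solving least squares on the centered data, and then recovering the intercept from the means. A minimal NumPy sketch of that idea, using the same toy data as in 1.1.4 (this illustrates the math; it is not scikit-learn's internal code):

```python
import numpy as np

# Same toy data as in 1.1.4: y = 1*x1 + 2*x2 + 3
X = np.array([[1, 1], [2, 2], [1, 2], [2, 3]], dtype=float)
y = X @ np.array([1.0, 2.0]) + 3

# Center, solve least squares for the slopes, then recover the intercept
X_mean, y_mean = X.mean(axis=0), y.mean()
coef, *_ = np.linalg.lstsq(X - X_mean, y - y_mean, rcond=None)
intercept = y_mean - X_mean @ coef

print(coef)       # approximately [1, 2]
print(intercept)  # approximately 3
```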

1.1.3 Main methods

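fit(X, Y): train the model, i.e. fit it to the data.
get_params(): get the estimator's parameters.
predict(X): predict with the fitted linear model.
score(X, Y): return the coefficient of determination R² of the predictions (1.0 is a perfect fit; the value can be negative for a model worse than always predicting the mean).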
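Once coef_ and intercept_ are known, predict and score reduce to short formulas: predictions are X @ coef_ + intercept_, and R² is 1 minus the residual sum of squares over the total sum of squares. A NumPy sketch (predict and r2 are hypothetical stand-ins, not scikit-learn functions; the targets are made up, with the last one perturbed so the fit is not perfect):

```python
import numpy as np

def predict(X, coef, intercept):
    # Linear prediction: X @ coef + intercept
    return X @ coef + intercept

def r2(y_true, y_pred):
    # Coefficient of determination: 1 - SS_res / SS_tot
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

X = np.array([[1, 1], [2, 2], [1, 2], [2, 3]], dtype=float)
y = np.array([6.0, 9.0, 8.0, 11.5])      # made-up targets
coef, intercept = np.array([1.0, 2.0]), 3.0

y_hat = predict(X, coef, intercept)      # predictions 6, 9, 8, 11
print(y_hat)
print(r2(y, y_hat))
```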

1.1.4 Basic usage

import numpy as np
from sklearn.linear_model import LinearRegression

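X = np.array([[1,1],[2,2],[1,2],[2,3]])
# Targets generated exactly from y = 1*x1 + 2*x2 + 3
Y = np.dot(X,np.array([1,2])) + 3
# normalize=True was removed in scikit-learn 1.2; plain least squares does not need it
reg = LinearRegression()
reg.fit(X,Y)
print(reg.coef_)       # learned coefficients
print(reg.intercept_)  # learned intercept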

Output (the intercept is 3 up to floating-point round-off):

[1. 2.]
3.0000000000000018

2. Diabetes Prediction

We use the diabetes dataset bundled with sklearn and import the evaluation metrics mean_squared_error and r2_score from sklearn.metrics. The code is as follows:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.datasets import load_diabetes
from sklearn.metrics import mean_squared_error,r2_score
from sklearn.model_selection import train_test_split

X,Y = load_diabetes(return_X_y=True)

# Linear model on a single feature: keep only the third column (index 2, BMI) as a 2-D array
X = X[:,np.newaxis,2]
X_train,X_test,Y_train,Y_test = train_test_split(X,Y,test_size=0.2)
regr = LinearRegression()
# Fit on the training split only, so the test metrics are not inflated by test data
regr.fit(X_train,Y_train)
Y_pre = regr.predict(X_test)

print("the coef is: ",regr.coef_)
print("the Mean squared error is: %.2f" % mean_squared_error(Y_test,Y_pre))
print("the r2 score is %.2f" % r2_score(Y_test,Y_pre))

plt.scatter(X_test,Y_test,color='black')
plt.plot(X_test,Y_pre,color='blue',linewidth=3)

plt.xticks(())
plt.yticks(())
plt.show()

Example output (train_test_split shuffles randomly, so your numbers will differ):

the coef is:  [949.43526038]
the Mean squared error is: 4256.65
the r2 score is 0.45
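The MSE is in squared units of the target, which makes it hard to read on its own; R² rescales the same squared error by the spread of the targets. A NumPy sketch of both definitions (mse is a hypothetical helper, and the four-point example data are made up):

```python
import numpy as np

def mse(y_true, y_pred):
    # Average of the squared residuals, the quantity mean_squared_error reports
    return np.mean((y_true - y_pred) ** 2)

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])
err = mse(y_true, y_pred)
print(err)  # 0.375

# R^2 = 1 - MSE / Var(y_true), using the population variance (ddof=0)
r2 = 1.0 - err / np.var(y_true)
print(round(r2, 4))  # 0.9486
```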

The model above fits only a single feature, and the fit is clearly poor (a test R² of about 0.45). Next we fit using all of the features.

from sklearn.linear_model import LinearRegression
from sklearn.datasets import load_diabetes
from sklearn.metrics import mean_squared_error,r2_score
from sklearn.model_selection import train_test_split

X,Y = load_diabetes(return_X_y=True)

# Use all 10 features this time
X_train,X_test,Y_train,Y_test = train_test_split(X,Y,test_size=0.2)
regr = LinearRegression()
regr.fit(X_train,Y_train)  # train on the training split only
Y_pre = regr.predict(X_test)

print("the coef is: ",regr.coef_)
print("the Mean squared error is: %.2f" % mean_squared_error(Y_test,Y_pre))
print("the r2 score is %.2f" % r2_score(Y_test,Y_pre))

Example output (values vary with the random split):

the coef is:  [ -10.01219782 -239.81908937  519.83978679  324.39042769 -792.18416163
  476.74583782  101.04457032  177.06417623  751.27932109   67.62538639]
the Mean squared error is: 3289.84
the r2 score is 0.50
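On the training data, adding features to an OLS model can never lower its R², because the smaller model is a special case of the larger one (the extra coefficients set to zero). Held-out R² carries no such guarantee, which is why the comparison above is made on the test split. A NumPy sketch on synthetic data (train_r2 and the data are made up for illustration):

```python
import numpy as np

def train_r2(X, y):
    # Hypothetical helper: OLS with intercept, returns R^2 on the training data itself
    A = np.hstack([X, np.ones((len(X), 1))])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ w
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)

# Synthetic data standing in for a multi-feature dataset
rng = np.random.RandomState(0)
X = rng.randn(100, 5)
y = X @ np.array([3.0, 0.0, 1.5, 0.0, -2.0]) + rng.randn(100)

r2_one = train_r2(X[:, [2]], y)  # single column, like X[:, np.newaxis, 2]
r2_all = train_r2(X, y)          # all five columns
print(r2_one, r2_all)
```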

Going further, we expand the inputs with polynomial features to raise the degree of the model.

from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LinearRegression
from sklearn.datasets import load_diabetes
from sklearn.metrics import mean_squared_error,r2_score
from sklearn.model_selection import train_test_split

def polynomial_model(degree=1):
    # degree: highest power of the generated polynomial features
    polynomial_features = PolynomialFeatures(degree=degree)
    linear_re = LinearRegression()  # normalize=True was removed in scikit-learn 1.2
    pipeline = Pipeline([("polynomial_features",polynomial_features),("linear_re",linear_re)])
    return pipeline

X,Y = load_diabetes(return_X_y=True)

X_train,X_test,Y_train,Y_test = train_test_split(X,Y,test_size=0.2)

degrees = [2,1]
result = []
for d in degrees:
    model = polynomial_model(degree=d)
    model.fit(X_train,Y_train)
    Y_pre = model.predict(X_test)
    train_score = model.score(X_train,Y_train)
    mse = mean_squared_error(Y_test,Y_pre)
    r_score = r2_score(Y_test,Y_pre)
    result.append({'degree':d,'train_score':train_score,'mse':mse,'r_score':r_score})


for item in result:
    print("the train_score is:{0}; the mse is:{1};the r2_score is:{2};the degree is:{3}".format(
        item['train_score'],item['mse'],item['r_score'],item['degree']))

Example output (values vary with the random split):

the train_score is:0.5710720567036127; the mse is:2667.9315051349363;the r2_score is:0.5994026521411335;the degree is:2
the train_score is:0.49066017970917464; the mse is:2755.4503370951693;the r2_score is:0.5862614557110543;the degree is:1
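PolynomialFeatures(degree=2) replaces each sample with a bias column, the original features, and all pairwise products including squares, so the 10 diabetes features become 66 inputs to the linear model (more capacity, and more risk of overfitting). A NumPy sketch reproducing that expansion (poly2 is a hypothetical helper; the column order follows the [1, a, b, a², ab, b²] example in the scikit-learn documentation):

```python
import numpy as np
from itertools import combinations_with_replacement

def poly2(X):
    # Hypothetical helper mirroring PolynomialFeatures(degree=2):
    # bias column, then x_i, then x_i * x_j for i <= j
    n, d = X.shape
    cols = [np.ones(n)]
    cols += [X[:, i] for i in range(d)]
    cols += [X[:, i] * X[:, j] for i, j in combinations_with_replacement(range(d), 2)]
    return np.column_stack(cols)

print(poly2(np.array([[2.0, 3.0]])))     # [[1. 2. 3. 4. 6. 9.]]
print(poly2(np.zeros((1, 10))).shape)   # (1, 66)
```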

3. Ridge Regression (Linear Regression with L2 Regularization)

3.1 sklearn.linear_model.Ridge

3.1.1 Main parameters

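alpha: float, default 1.0. Regularization strength, i.e. the weight of the L2 penalty; it must be non-negative, and larger values mean stronger regularization.
fit_intercept: bool, default True. Whether to fit the intercept B.
normalize: bool, default False. Whether to normalize X before fitting (same behavior as in LinearRegression; deprecated in 1.0 and removed in 1.2).
solver: {'auto', 'svd', 'cholesky', 'lsqr', 'sparse_cg', 'sag', 'saga'}, default 'auto' (newer releases also accept 'lbfgs').
'svd' computes the Ridge coefficients from a singular value decomposition of X; 'cholesky' obtains a closed-form solution with scipy.linalg.solve; 'sparse_cg' and 'lsqr' use iterative solvers from scipy.sparse.linalg; 'sag' uses Stochastic Average Gradient descent and 'saga' its improved, unbiased variant SAGA, both iterative. In older scikit-learn releases, only 'sparse_cg' supports sparse input when fit_intercept is True; later releases extend sparse support to more solvers.
random_state: int, RandomState instance or None, default None. Seed of the pseudo-random number generator used to shuffle the data when solver is 'sag' or 'saga'.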
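For reference, the objective Ridge minimizes (as given in the scikit-learn documentation, with the intercept b left unpenalized) and its closed-form solution, which is what a direct solver such as 'cholesky' computes. Here X_c and y_c denote X and y with their column means subtracted; setting the gradient with respect to w to zero gives the normal equations on the second line:

```latex
\min_{w,\,b}\ \lVert y - Xw - b\mathbf{1} \rVert_2^2 + \alpha \lVert w \rVert_2^2

-2X_c^{\top}(y_c - X_c w) + 2\alpha w = 0
\;\Longrightarrow\;
w^{*} = \left(X_c^{\top} X_c + \alpha I\right)^{-1} X_c^{\top} y_c,
\qquad b^{*} = \bar{y} - \bar{x}^{\top} w^{*}
```

Adding alpha to the diagonal also keeps the system solvable when the columns of X are collinear.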

3.1.2 Main attributes

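coef_: the coefficients of the ridge regression model.
intercept_: the intercept term (0.0 when fit_intercept=False).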

3.1.3 Main methods

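fit(X_train, Y_train): train the model
get_params(): get the estimator's parameters
predict(X_test): predict with the model
score(X, y): return the coefficient of determination R² of the predictions
set_params(): set the estimator's parameters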

3.1.4 Basic usage

import numpy as np
from sklearn.linear_model import Ridge

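n_samples,n_features = 10,5
rng = np.random.RandomState(0)   # fixed seed for reproducible random data
y = rng.rand(n_samples)
X = rng.rand(n_samples,n_features)
clf = Ridge(alpha=0.5)           # L2 penalty weight 0.5
clf.fit(X,y)
print(clf.coef_)                 # one (shrunken) coefficient per feature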
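To see what the penalty does, the closed-form ridge solution can be computed directly on the same random data and swept over alpha: the coefficient vector shrinks toward zero as alpha grows. A NumPy sketch, assuming the dense, fit_intercept=True case (ridge_closed_form is a hypothetical helper; its coefficients are expected to match clf.coef_ up to round-off, but this is not scikit-learn's code):

```python
import numpy as np

def ridge_closed_form(X, y, alpha):
    # Hypothetical helper: minimize ||y - Xw - b||^2 + alpha*||w||^2, b unpenalized
    X_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - X_mean, y - y_mean                # centering handles the intercept
    A = Xc.T @ Xc + alpha * np.eye(X.shape[1])     # L2 penalty added to the diagonal
    coef = np.linalg.solve(A, Xc.T @ yc)
    return coef, y_mean - X_mean @ coef

# Same data as the Ridge example above
n_samples, n_features = 10, 5
rng = np.random.RandomState(0)
y = rng.rand(n_samples)
X = rng.rand(n_samples, n_features)

coef, intercept = ridge_closed_form(X, y, alpha=0.5)
print(coef)

# Larger alpha shrinks the coefficient vector toward zero
norms = [np.linalg.norm(ridge_closed_form(X, y, a)[0]) for a in (0.01, 0.5, 10.0, 1000.0)]
print(norms)
```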

4. The Sparse Linear Model: Lasso

4.1 The sklearn.linear_model.Lasso class (linear regression with L1 regularization)

4.1.1 Main parameters

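alpha: float, default 1.0. Constant that multiplies the L1 penalty, i.e. the regularization weight.
fit_intercept: bool, default True. Whether to fit the intercept B.
normalize: bool, default False. Whether to normalize X before fitting (deprecated in 1.0 and removed in 1.2, as for LinearRegression).
precompute: bool or array-like, default False. Whether to use a precomputed Gram matrix (XᵀX) to speed up the calculations.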
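The objective Lasso minimizes, as given in the scikit-learn documentation, together with the coordinate-wise update that coordinate descent applies to it (x_j are the centered columns of X, y_c the centered targets, S the soft-thresholding operator). Note the 1/(2n) factor: Lasso's alpha is on a different scale from Ridge's.

```latex
\min_{w,\,b}\ \frac{1}{2n}\lVert y - Xw - b\mathbf{1} \rVert_2^2 + \alpha \lVert w \rVert_1

w_j \leftarrow \frac{S(\rho_j,\ \alpha)}{z_j},\qquad
\rho_j = \frac{1}{n}\, x_j^{\top}\Big(y_c - \sum_{k \ne j} x_k w_k\Big),\qquad
z_j = \frac{1}{n}\lVert x_j \rVert_2^2,\qquad
S(\rho, t) = \operatorname{sign}(\rho)\,\max(|\rho| - t,\ 0)
```

Whenever |ρ_j| ≤ α the update sets w_j exactly to zero, which is why Lasso produces sparse coefficient vectors.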

4.1.2 Main attributes

coef_: the model coefficients (parameter vector w).
sparse_coef_: a SciPy sparse-matrix representation of coef_.
intercept_: the intercept.

4.1.3 Main methods

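fit(X_train, Y_train): train the model
get_params(): get the estimator's parameters
predict(X_test): predict with the model
score(X, y): return the coefficient of determination R² of the predictions
set_params(): set the estimator's parameters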

4.1.4 Basic usage

import numpy as np
from sklearn.linear_model import Lasso

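# The two features are identical (x1 == x2), so they are perfectly collinear
X = np.array([[0,0],[1,1],[2,2]])
Y = np.array([0,1,2])

clf = Lasso(alpha=0.5)   # L1 penalty weight 0.5
clf.fit(X,Y)
print(clf.coef_)         # dense coefficient array
print(clf.sparse_coef_)  # the same coefficients as a SciPy sparse matrix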
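scikit-learn fits Lasso by coordinate descent. A minimal cyclic coordinate-descent sketch with soft-thresholding, run on the same collinear data (soft_threshold and lasso_cd are hypothetical helpers, without the convergence checks of the real solver). Because the two features are identical, the L1 penalty puts all the weight on one of them and sets the other exactly to zero; the result, roughly coef [0.25, 0] and intercept 0.75, should agree with the Lasso(alpha=0.5) example above:

```python
import numpy as np

def soft_threshold(z, t):
    # Proximal step for the L1 penalty: shrink toward 0, clip at 0
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_cd(X, y, alpha, n_iter=100):
    # Hypothetical helper: minimize (1/(2n))||y - Xw - b||^2 + alpha*||w||_1
    n, d = X.shape
    X_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - X_mean, y - y_mean          # intercept handled by centering
    w = np.zeros(d)
    for _ in range(n_iter):
        for j in range(d):
            # Residual with feature j's current contribution removed
            r = yc - Xc @ w + Xc[:, j] * w[j]
            rho = Xc[:, j] @ r / n
            z = Xc[:, j] @ Xc[:, j] / n
            w[j] = soft_threshold(rho, alpha) / z if z > 0 else 0.0
    return w, y_mean - X_mean @ w

X = np.array([[0, 0], [1, 1], [2, 2]], dtype=float)
y = np.array([0, 1, 2], dtype=float)
coef, intercept = lasso_cd(X, y, alpha=0.5)
print(coef, intercept)
```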
