- Simple linear regression
Task description:
Define a simple linear regression function linearRegression that takes the arrays xArr and yArr (the x and y values) as input and returns the parameters w1 and w0.
Using the input feature age and the target feature charges from the US medical-insurance dataset insurance.csv, obtain the regression parameters w1 and w0 from linearRegression, rounded to two decimal places.
The closed-form parameter values given by the least-squares method are:
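For reference, the closed-form least-squares solution that linearRegression implements below is:

```latex
w_1 = \frac{\sum_i x_i y_i - n\,\bar{x}\,\bar{y}}{\sum_i x_i^2 - n\,\bar{x}^2},
\qquad
w_0 = \bar{y} - w_1\,\bar{x}
```

Each term maps directly onto a line of the function: the numerator and denominator sums, then the intercept from the two means.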
import pandas as pd
from sklearn import linear_model
insurance = pd.read_csv('insurance.csv')
age = insurance['age'].values
charges = insurance['charges'].values
def linearRegression(xArr, yArr):
    # closed-form least-squares estimates for y = w1*x + w0
    mean_x = xArr.mean()
    mean_y = yArr.mean()
    numerator = sum(xArr * yArr) - len(xArr) * mean_x * mean_y
    denominator = sum(xArr * xArr) - len(xArr) * mean_x * mean_x
    w1 = numerator / denominator
    w0 = mean_y - w1 * mean_x
    return (w0, w1)
print("Model trained; parameter values:")
w0, w1 = linearRegression(age, charges)
print("%.2f" % w1, '\n', "%.2f" % w0)
print("sklearn training result:")
regr = linear_model.LinearRegression()
regr.fit(age.reshape(-1, 1), charges)  # keep y 1-D so coef_ is shape (1,) and intercept_ is a scalar
print("%.2f" % regr.coef_[0])
print("%.2f" % regr.intercept_)
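To sanity-check linearRegression independently of the insurance data, it can be run on noise-free synthetic points from a line with known slope and intercept (the values 3 and 5 here are hypothetical, chosen only for the check); the estimator should recover them exactly up to floating-point error. The function definition is repeated so the snippet is self-contained.

```python
import numpy as np

def linearRegression(xArr, yArr):
    # same closed-form least-squares estimator as above
    mean_x = xArr.mean()
    mean_y = yArr.mean()
    numerator = sum(xArr * yArr) - len(xArr) * mean_x * mean_y
    denominator = sum(xArr * xArr) - len(xArr) * mean_x * mean_x
    w1 = numerator / denominator
    w0 = mean_y - w1 * mean_x
    return (w0, w1)

# Noise-free data on the line y = 3x + 5
x_syn = np.arange(10, dtype=float)
y_syn = 3.0 * x_syn + 5.0
w0_hat, w1_hat = linearRegression(x_syn, y_syn)
print(round(w1_hat, 2), round(w0_hat, 2))  # 3.0 5.0
```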
- Multiple linear regression
Code:
from sklearn import linear_model
from numpy import mat, linalg, column_stack, ones
import pandas as pd
insurance = pd.read_csv('insurance.csv')
def linearRegression(xArr, yArr):
    xMat = mat(xArr)
    yMat = mat(yArr).T
    xTx = xMat.T * xMat
    if linalg.det(xTx) == 0:
        # the normal equations have no unique solution for a singular matrix,
        # so stop here instead of calling solve anyway
        raise ValueError("singular matrix, can't do inverse")
    ws = linalg.solve(xTx, xMat.T * yMat)
    return ws
print('Model trained; parameter values:')
X = insurance[['age', 'bmi', 'children']].values
X = column_stack((X,ones(X.shape[0])))
y = insurance['charges']
ws = linearRegression(X,y)
print(ws)
print('sklearn training result:')
regr = linear_model.LinearRegression()
# X still carries the appended ones column; sklearn fits its own
# intercept, so that column's coefficient comes out (near) zero
regr.fit(X, y)
print(regr.coef_)
print(regr.intercept_)
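The normal-equation step above can be checked the same way as the univariate case: build a design matrix with an appended ones column, generate noise-free targets from known (hypothetical) coefficients, and confirm that solving X^T X w = X^T y recovers them.

```python
import numpy as np

rng = np.random.default_rng(0)
X_syn = rng.normal(size=(50, 3))
X_syn = np.column_stack((X_syn, np.ones(50)))  # ones column -> intercept is the last entry
true_w = np.array([2.0, -1.0, 0.5, 7.0])       # hypothetical coefficients incl. intercept
y_syn = X_syn @ true_w                          # noise-free targets

# normal equations: solve (X^T X) w = X^T y
w_hat = np.linalg.solve(X_syn.T @ X_syn, X_syn.T @ y_syn)
print(np.round(w_hat, 2))
```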
- Linear regression in practice: predicting medical costs
Task description
One-hot encode the nominal features in insurance.csv, producing the data variable insurance.
Use the custom multiple-regression function linearRegression to obtain the regression-model parameters and the predictions y_pred.
Define a coefficient-of-determination function r2_Score (named to distinguish it from sklearn's r2_score), keep two decimal places, and compute the model's actual coefficient of determination, score.
Compare training and evaluating the model with sklearn against training and evaluating it with the custom functions.
Code:
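The one-hot step mentioned above can be sketched with pandas' get_dummies on a toy frame (column names mirror insurance.csv; the rows are made up for illustration):

```python
import pandas as pd

# Toy frame with the same nominal columns as insurance.csv.
df = pd.DataFrame({
    'sex': ['male', 'female', 'male'],
    'smoker': ['yes', 'no', 'no'],
    'region': ['northwest', 'southeast', 'northeast'],
})

# drop_first=True removes one dummy per feature, avoiding the
# dummy-variable trap (perfectly collinear columns would make
# X^T X singular in a normal-equation solver).
encoded = pd.get_dummies(df, columns=['sex', 'smoker', 'region'], drop_first=True)
print(list(encoded.columns))
```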
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
data = pd.read_csv('insurance.csv')  # same dataset as above
# One-hot encode the nominal features, as the task statement asks;
# drop_first=True avoids perfectly collinear dummy columns, which would
# make x.T * x singular in the custom normal-equation solver below
data = pd.get_dummies(data, columns=['sex', 'smoker', 'region'], drop_first=True)
X = data.drop(['charges'], axis=1)
y = data['charges']
regr = LinearRegression()  # renamed from rfc: this is a regressor, not a random-forest classifier
regr.fit(X, y)
y_pred = regr.predict(X)
print('Prediction with sklearn:', y_pred)
def r2_Score(y, y_pred):
    # R^2 = SSR/SST; for an OLS fit this equals the usual 1 - SSE/SST
    sst = sum((y - y.mean()) * (y - y.mean()))
    ssr = sum((y_pred - y.mean()) * (y_pred - y.mean()))
    r2 = ssr / sst
    return round(r2, 2)
print('Custom r^2:', r2_Score(y, y_pred))
print('Actual model coefficient of determination:', regr.score(X, y))
def linear(X, y):
    # normal-equation solution with a leading ones column,
    # so the first entry of B is the intercept
    x = np.mat(np.c_[np.ones(X.shape[0]), X])
    y_data = np.mat(y)
    B = np.linalg.inv(x.T * x) * x.T * y_data.T
    y_hat = x * B
    return y_hat, B
ypre, coef = linear(X, y)
print('Prediction with the custom function:', ypre)
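A quick check on r2_Score itself (the definition is repeated so the snippet is self-contained): a perfect prediction should score exactly 1.0.

```python
import numpy as np

def r2_Score(y, y_pred):
    # R^2 as explained sum of squares over total sum of squares,
    # rounded to two decimals as in the task statement
    sst = sum((y - y.mean()) ** 2)
    ssr = sum((y_pred - y.mean()) ** 2)
    return round(ssr / sst, 2)

y_true = np.array([1.0, 2.0, 3.0, 4.0])
print(r2_Score(y_true, y_true.copy()))  # 1.0
```

Note that SSR/SST equals 1 - SSE/SST only when the predictions come from an ordinary-least-squares fit; sklearn's r2_score uses the more general 1 - SSE/SST form, which can go negative for a bad model.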