As in the previous post, this one plots a linear regression equation together with confidence interval lines. Libraries used:
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
The data is generated randomly. Unlike the previous post, x and y here are two-dimensional arrays (100 rows, 1 column each).
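Because the fitted slope and intercept are later read as `model.coef_[0][0]` and `model.intercept_[0]`, it helps to check the shapes that a 2-D target produces. The sketch below reuses the article's data recipe; the comparison model `model_1d` (fitted on a flattened y) is an illustrative addition, not part of the original code.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Same recipe as the article: 100 random integers in [1, 100), one per row
np.random.seed(1000)
x = np.random.randint(1, 100, (100, 1))
# Each i is a 1-element row of x, so each y entry is a 1-element array too
y = np.array([2*i + (np.random.randint(-9, 9))**2 + np.random.randint(100) for i in x])

print(x.shape, y.shape)  # (100, 1) (100, 1)

# With a 2-D target, coef_ is (n_targets, n_features) and intercept_ is (n_targets,)
model = LinearRegression().fit(x, y)
print(model.coef_.shape, model.intercept_.shape)  # (1, 1) (1,)

# Hypothetical comparison: a 1-D target gives a 1-D coef_ and a scalar intercept_
model_1d = LinearRegression().fit(x, y.ravel())
print(model_1d.coef_.shape)  # (1,)
```

This is why the article needs two indices for the slope and one for the intercept; with `y.ravel()` the same values would be `model_1d.coef_[0]` and `model_1d.intercept_`.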
The complete code:
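import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
# Build the data: x has shape (100, 1); y is built row by row, so it is also (100, 1)
np.random.seed(1000)
x = np.random.randint(1, 100, (100, 1))
y = np.array([2*i + (np.random.randint(-9, 9))**2 + np.random.randint(100) for i in x])
print(x, y)
size1 = 20
# Global font settings ('font.family' was listed twice before; only the last value takes effect)
mpl.rcParams.update(
{
'text.usetex': False,
'mathtext.fontset': 'stix',
'font.family': 'serif',
'font.size': size1,
'font.serif': ['Times New Roman'],
}
)
fig, ax = plt.subplots(figsize=(8, 6))
# Scatter plot, fitted line and seaborn's shaded 95% confidence band
# (recent seaborn versions require keyword arguments; ravel() passes 1-D data)
sns.regplot(x=x.ravel(), y=y.ravel(), ax=ax)
ax.set_xlim(0, 100)
ax.set_ylim(0, 400)
ax.set_xlabel('xlabel')
ax.set_ylabel('ylabel')
# Fit the regression equation
model = LinearRegression()
model.fit(x, y)
a = model.coef_[0][0]    # slope: coef_ has shape (1, 1) because y is 2-D
b = model.intercept_[0]  # intercept: intercept_ has shape (1,)
ax.text(8, 350, '$y$ = {:.2f}$x$ + {:.2f}'.format(a, b))
# R^2 (built-in round() works whether r2_score returns a NumPy or a Python float)
y_pred = model.predict(x)
r2 = round(r2_score(y, y_pred), 2)
ax.text(8, 300, f'$R^2$ = {r2}')
# Approximate 95% band: use the spread of the residuals (y - y_pred),
# not the spread of the predictions themselves
std = np.std(y - y_pred)
std_z = 1.96  # from z-table for 95%
confidence_interval = std * std_z
order = np.argsort(x.ravel())  # sort by x so each band is drawn as one clean line
ax.plot(x[order], y_pred[order] - confidence_interval, label="95%-")
ax.plot(x[order], y_pred[order] + confidence_interval, label="95%+")
ax.legend()
plt.tight_layout()
plt.savefig('out.png', dpi=600)
plt.show()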
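The ±1.96·σ lines above are a constant-width band around the fitted line. For simple linear regression there is also a closed-form band whose width depends on x. The sketch below (the helper name `regression_bands` is hypothetical) computes both the confidence band for the mean line and the wider prediction band for new observations, using the residual standard error s = sqrt(SS_res / (n - 2)). It keeps the article's z = 1.96 normal approximation; a t-quantile with n - 2 degrees of freedom would be slightly more exact for small samples.

```python
import numpy as np

def regression_bands(x, y, x_grid, z=1.96):
    """Return (fit, ci_lo, ci_hi, pi_lo, pi_hi) on x_grid for a simple linear fit.

    z=1.96 is the normal approximation to the 95% quantile (an assumption).
    """
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    n = x.size
    a, b = np.polyfit(x, y, 1)               # slope, intercept
    resid = y - (a * x + b)
    s = np.sqrt(np.sum(resid**2) / (n - 2))  # residual standard error
    x_mean = x.mean()
    sxx = np.sum((x - x_mean) ** 2)
    fit = a * x_grid + b
    # Standard error of the fitted mean line: narrowest at x_mean
    se_mean = s * np.sqrt(1.0 / n + (x_grid - x_mean) ** 2 / sxx)
    # Standard error for one new observation: adds the noise variance (the "1.0 +")
    se_pred = s * np.sqrt(1.0 + 1.0 / n + (x_grid - x_mean) ** 2 / sxx)
    return fit, fit - z * se_mean, fit + z * se_mean, fit - z * se_pred, fit + z * se_pred

# Usage with the article's data
np.random.seed(1000)
x = np.random.randint(1, 100, (100, 1))
y = np.array([2*i + (np.random.randint(-9, 9))**2 + np.random.randint(100) for i in x])
x_grid = np.linspace(0, 100, 101)
fit, ci_lo, ci_hi, pi_lo, pi_hi = regression_bands(x, y, x_grid)
# With matplotlib, e.g.: ax.fill_between(x_grid, ci_lo, ci_hi, alpha=0.3)
```

The confidence band should look similar to seaborn's shaded band (which seaborn estimates by bootstrap), while the prediction band is the closer analogue of the residual-based ±1.96·σ lines.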
The resulting plot is shown below:
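In practice, the data is usually split into a training set and a test set in some proportion. The split can be done with train_test_split from sklearn.model_selection:
from sklearn.model_selection import train_test_split
The code: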
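from sklearn.model_selection import train_test_split
X, Y = x, y  # features and targets built in the code above
# split the data into training and test sets at 80% : 20%
# random_state can be any fixed value; it makes the split reproducible
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=5)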
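After splitting, the model would normally be fitted on the training part only and scored on the held-out part. A minimal sketch, assuming the same data recipe and 80/20 split as above; the variable names `r2_train` and `r2_test` are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Same data as the article
np.random.seed(1000)
x = np.random.randint(1, 100, (100, 1))
y = np.array([2*i + (np.random.randint(-9, 9))**2 + np.random.randint(100) for i in x])

X_train, X_test, Y_train, Y_test = train_test_split(x, y, test_size=0.2, random_state=5)
print(X_train.shape, X_test.shape)  # (80, 1) (20, 1)

model = LinearRegression().fit(X_train, Y_train)      # fit on training data only
r2_train = r2_score(Y_train, model.predict(X_train))
r2_test = r2_score(Y_test, model.predict(X_test))     # score on data the model never saw
print(f"train R^2 = {r2_train:.2f}, test R^2 = {r2_test:.2f}")

# The same random_state reproduces exactly the same split
X_train2, X_test2, _, _ = train_test_split(x, y, test_size=0.2, random_state=5)
```

A test R² far below the training R² would suggest overfitting; for a one-feature linear model on this data the two are expected to be of similar size.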