Data source: the UN11 dataset from the alr4 package that ships with R, exported to CSV:
library(alr4)
data<-UN11
write.table(data,"C:/Users/admin/Desktop/数据分析/a.csv",row.names=FALSE,col.names=TRUE,sep=",")
Next, analyze it in Python. A few notes on the functions used:
np.array(x).reshape(-1,1): reshapes the array into a single column (one sample per row), which is the input format that regr.fit and regr.predict expect.
math.log(x): returns the natural logarithm; math.log(x, base) sets a different base, e.g. math.log(100, 10) = 2.
'Slope: %.3f': a % format string that prints the regression coefficient with three decimal places.
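The three notes above can be checked interactively; a minimal sketch:

```python
import math
import numpy as np

x = [1.0, 2.0, 3.0]
col = np.array(x).reshape(-1, 1)    # column form: shape (3, 1), one sample per row
print(col.shape)                    # (3, 1)
print(math.log(math.e))             # 1.0 -- natural log by default
print(math.log(100, 10))            # 2.0 -- optional second argument sets the base
print('Slope: %.3f' % 0.20714979)   # Slope: 0.207
```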
import math
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn import linear_model
data=pd.read_csv("a.csv")
x=data["ppgdp"]
y=data["fertility"]
x=[math.log(v) for v in x]  # avoid reusing x as the loop variable
y=[math.log(v) for v in y]
# build the linear regression model
regr = linear_model.LinearRegression()
regr.fit(np.array(x).reshape(-1,1), y)
# fitted values
y_pred=regr.predict(np.array(x).reshape(-1,1))
#print(y_pred)
# regression coefficients
# a, b = regr.coef_, regr.intercept_
# print(a,b)
print('Slope: %.3f' % regr.coef_[0])  # regr.coef_ is a 1-d array; index it to get a scalar
print('Intercept: %.3f' % regr.intercept_)
#print(type(regr.predict(np.array(x).reshape(-1,1))))
plt.scatter(x, y, color ="blue")
plt.plot(x, regr.predict(np.array(x).reshape(-1,1)), color = 'orange', linewidth = 4)
plt.show()
Output: Slope: -0.207 Intercept: 2.666
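As a side note, regr.score(X, y) returns the coefficient of determination R² directly, without refitting (for the real a.csv data it should match the R-squared that the statsmodels summary reports). A minimal sketch with synthetic data, since a.csv is not assumed available here:

```python
import numpy as np
from sklearn import linear_model

# synthetic stand-in for (log(ppgdp), log(fertility)); slope/intercept taken from the text
rng = np.random.default_rng(0)
x = rng.uniform(4, 11, size=100)
y = 2.666 - 0.207 * x + rng.normal(0, 0.3, size=100)

X = x.reshape(-1, 1)
regr = linear_model.LinearRegression().fit(X, y)
print('Slope: %.3f' % regr.coef_[0])
print('R^2: %.3f' % regr.score(X, y))  # coefficient of determination on the given data
```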
The first method used the sklearn library; below is another approach, using statsmodels.api:
import statsmodels.api as sm # least squares
import matplotlib.pyplot as plt
import pandas as pd
import math
import numpy as np
data=pd.read_csv("a.csv")
x=data["ppgdp"]
y=data["fertility"]
x=[math.log(v) for v in x]  # avoid reusing x as the loop variable
y=[math.log(v) for v in y]
plt.scatter(x, y, color ="blue")
x=sm.add_constant(x) # add a constant column so the fit includes an intercept: y = kx + b
# after add_constant, x is a numpy array: first column all ones, second column the data
regr = sm.OLS(y, x) # ordinary least squares model
res = regr.fit()
# regression coefficients
print(res.params)
# regression summary
print(res.summary())
# get the fitted values
y_fitted = res.fittedvalues
# x[:,1] extracts the second column (the log(ppgdp) values) as a numpy array
plt.scatter(x[:,1],y,color='g',label="data")
plt.plot(x[:,1],y_fitted,color='r',label="OLS")
plt.legend(loc='best') # generate the legend automatically; loc='best' lets matplotlib pick the position
plt.xlabel("ppgdp")
plt.ylabel("fertility")
plt.grid(True)
plt.show()
Output: [ 2.66550734 -0.20714979]
OLS Regression Results
==============================================================================
Dep. Variable: y R-squared: 0.526
Model: OLS Adj.R-squared: 0.524
Method: Least Squares F-statistic: 218.6
Date: Mon, 18 Nov 2019 Prob (F-statistic): 9.06e-34
Time: 16:24:31 Log-Likelihood: -46.435
No. Observations: 199 AIC: 96.87
Df Residuals: 197 BIC: 103.5
Df Model: 1
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
const 2.6655 0.121 22.108 0.000 2.428 2.903
x1 -0.2071 0.014 -14.785 0.000 -0.235 -0.180
==============================================================================
Omnibus: 1.037 Durbin-Watson: 2.130
Prob(Omnibus): 0.595 Jarque-Bera (JB): 1.148
Skew: -0.151 Prob(JB): 0.563
Kurtosis: 2.782 Cond. No. 48.3
==============================================================================