Python: Using OLS to Estimate a Company's Average Annual Net Profit Growth Rate

belldeep

Last modified 2023-08-21 21:35:38

Category: python. Tags: python, statsmodels, OLS, linear regression

First published 2022-02-20 14:34:42

Article link: https://blog.csdn.net/belldeep/article/details/123030614

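Using the 2006-2021 net profit of 长春高新 (Changchun High-Tech, stock code 000661) as sample data, first compute the compound annual growth rate directly from the first and last values: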

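import math

print("长春高新: average annual net profit growth rate")
a = 3757 / 5.267  # 2021 net profit / 2006 net profit
n = 15            # number of years from 2006 to 2021
r = math.pow(a, 1/n)
print("r= {0:.2f}%".format((r-1)*100))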

r= 54.96%
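
The same endpoint calculation can be wrapped in a small reusable function. This is only a sketch; the name `cagr` and its argument names are illustrative and do not come from the original script.

```python
def cagr(start_value, end_value, periods):
    """Compound annual growth rate between two values `periods` years apart."""
    if start_value <= 0 or end_value <= 0 or periods <= 0:
        raise ValueError("start_value, end_value and periods must be positive")
    return (end_value / start_value) ** (1.0 / periods) - 1.0


# 2006 -> 2021 covers 15 yearly intervals (16 data points)
rate = cagr(5.267, 3757, 15)
print("r = {0:.2f}%".format(rate * 100))
```

Because only the first and last values enter the formula, the result is very sensitive to an unusually low or high endpoint, which is the motivation for fitting a line through all the points instead.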

Next, use OLS to estimate the company's average annual net profit growth rate.

First, the data file 000661.txt (the jlr column is net profit, in millions of yuan):

year,jlr
2006,5.267
2007,6.5
2008,20.1
2009,73.15
2010,86.5
2011,109.5
2012,299.8
2013,283.9
2014,318.2
2015,384.5
2016,484.8
2017,662
2018,1006.5
2019,1775
2020,3047
2021,3757
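
Before fitting anything, it can help to look at the year-over-year growth implied by this table. The sketch below hard-codes the values from the table above; the variable names `years`, `jlr` and `growth` are chosen for illustration only.

```python
years = list(range(2006, 2022))
jlr = [5.267, 6.5, 20.1, 73.15, 86.5, 109.5, 299.8, 283.9,
       318.2, 384.5, 484.8, 662, 1006.5, 1775, 3047, 3757]

# growth[i] is the growth from years[i] to years[i + 1]
growth = [b / a - 1.0 for a, b in zip(jlr, jlr[1:])]

for year, g in zip(years[1:], growth):
    print("{0}: {1:+.1f}%".format(year, g * 100))
```

The yearly rates vary widely (2013 is even negative, since 283.9 is below 299.8), so a single averaged figure hides a lot of variation.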

Write ols_model_1.py as follows:

# coding=utf-8
import os, sys
import numpy as np
import pandas as pd
import statsmodels.api as sm
import matplotlib.pyplot as plt

# Simple (one-variable) linear regression with the statsmodels library
if len(sys.argv) ==2:
    fcode = sys.argv[1]
else:
    print('usage: python ols_model_1.py fcode ')
    sys.exit(1)

if len(fcode) !=6:
    print(' fcode is char(6)')
    sys.exit(2)

file1 = "./" +fcode +'.txt'
if not os.path.exists(file1):
    print(file1 +' does not exist.')
    sys.exit(3)

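# Read the CSV file with pandas
df = pd.read_csv(file1)
y = df['jlr'].values # net profit

# Build the regressors
x = np.arange(0,len(y),1) # x = 0, 1, 2, ... (years since the first observation)
X = sm.add_constant(x) # add a column x0=1 for the intercept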
 
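# Build the regression model
# OLS(endog, exog=None, missing='none', hasconst=None) (endog: dependent variable, exog: independent variables)
model = sm.OLS(y,X) # ordinary least squares
res = model.fit()   # fit the data
beta = res.params   # coefficients: [intercept, slope]
print(res.summary())  # regression summary
print('beta=',beta)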

# Plot
Y = res.fittedvalues    # fitted values
fig, ax = plt.subplots(figsize=(10,6))
ax.plot(x, y, '-', label='jlr')  # original data
ax.plot(x, Y, 'r--.',label='fit') # fitted line
ax.legend(loc='upper left') # show the legend with the labels
plt.title('net profit OLS fit: ' +fcode)
plt.xlabel('x')
plt.ylabel('jlr')
plt.grid()
plt.show()

Run: python ols_model_1.py 000661

                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.644
Model:                            OLS   Adj. R-squared:                  0.619
Method:                 Least Squares   F-statistic:                     25.37
Date:                Sat, 19 Mar 2022   Prob (F-statistic):           0.000182
Time:                        10:55:13   Log-Likelihood:                -126.41
No. Observations:                  16   AIC:                             256.8
Df Residuals:                      14   BIC:                             258.4
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const       -660.5318    333.371     -1.981      0.068   -1375.542      54.479
x1           190.7352     37.868      5.037      0.000     109.515     271.955
==============================================================================
Omnibus:                        2.911   Durbin-Watson:                   0.298
Prob(Omnibus):                  0.233   Jarque-Bera (JB):                1.777
Skew:                           0.815   Prob(JB):                        0.411
Kurtosis:                       2.902   Cond. No.                         17.0
==============================================================================

beta= [-660.53178676  190.73521324]
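
As a cross-check, the two coefficients can be reproduced without statsmodels from the closed-form formulas for simple linear regression (slope = Sxy / Sxx, intercept = mean(y) - slope * mean(x)). The sketch below uses the data from 000661.txt; the names `sxx`, `sxy` and `end_fit` are illustrative.

```python
jlr = [5.267, 6.5, 20.1, 73.15, 86.5, 109.5, 299.8, 283.9,
       318.2, 384.5, 484.8, 662, 1006.5, 1775, 3047, 3757]
x = list(range(len(jlr)))  # 0, 1, ..., 15, as in the script
n = len(jlr)

x_mean = sum(x) / n
y_mean = sum(jlr) / n

# Sums of squares and cross products around the means
sxx = sum((xi - x_mean) ** 2 for xi in x)
sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, jlr))

slope = sxy / sxx
intercept = y_mean - slope * x_mean
print("const = {0:.4f}, x1 = {1:.4f}".format(intercept, slope))

# Value of the fitted line at the last observation (x = 15, i.e. 2021)
end_fit = intercept + slope * x[-1]
print("fitted value at x = 15: {0:.1f}".format(end_fit))
```

The slope and intercept should match the `x1` and `const` rows of the summary, and the fitted value at x = 15 is the end point of the dashed line in the plot.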

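Reading off the plot: 110 is the value where the two lines intersect (near x = 4, i.e. 2010), and 2200 is the value at the end of the fitted line (x = 15, i.e. 2021), which gives n = 11 years between the two points.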

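import math

print("长春高新: average annual net profit growth rate")
a = 2200 / 110  # end of the fitted line / value at the intersection
n = 11          # years between the intersection (x=4) and the end (x=15)
r = math.pow(a, 1/n)
print("r= {0:.2f}%".format((r-1)*100))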

Conclusion: the company's average annual net profit growth rate is about 31.3%.
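
For comparison, a common alternative (not the method used in this post) is to run OLS on the logarithm of net profit, ln(jlr) = a + b*x. The slope b is then a continuous growth rate, and exp(b) - 1 is the implied average annual growth rate, with no need to read values off the plot. The sketch below is a minimal pure-Python version under that assumption; it only works when all profits are positive, and its result will generally differ from the figure above because the two methods weight the data differently.

```python
import math

jlr = [5.267, 6.5, 20.1, 73.15, 86.5, 109.5, 299.8, 283.9,
       318.2, 384.5, 484.8, 662, 1006.5, 1775, 3047, 3757]
x = list(range(len(jlr)))
log_y = [math.log(v) for v in jlr]  # requires every value to be positive

n = len(x)
x_mean = sum(x) / n
ly_mean = sum(log_y) / n

# Closed-form OLS slope and intercept for ln(jlr) = intercept + slope * x
slope = (sum((xi - x_mean) * (li - ly_mean) for xi, li in zip(x, log_y))
         / sum((xi - x_mean) ** 2 for xi in x))
intercept = ly_mean - slope * x_mean

growth = math.exp(slope) - 1.0
print("log-linear growth rate = {0:.2f}%".format(growth * 100))
```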

Appendix: explanation of the regression output

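    Dep. Variable: y, the dependent variable
    Model: OLS, ordinary least squares model
    Method: Least Squares
    No. Observations: number of sample observations
    Df Residuals: degrees of freedom of the residuals (observations minus estimated parameters)
    Df Model: degrees of freedom of the model (number of regressors, excluding the constant)
    Covariance Type: nonrobust, i.e. the standard (non-robust) covariance estimator is used
    R-squared: coefficient of determination R²
    Adj. R-squared: adjusted coefficient of determination
    F-statistic: F statistic for the overall significance of the regression
    Prob (F-statistic): p-value of the F test
    Log-Likelihood: log-likelihood of the fitted model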

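    coef: estimated coefficients of the constant term and the regressors, b0, b1, b2, ..., bm
    std err: standard error of each coefficient estimate
    t: t statistic of each coefficient
    P>|t|: p-value of the t test
    [0.025, 0.975]: lower and upper bounds of the 95% confidence interval of each coefficient
    Omnibus: normality test of the residuals based on skewness and kurtosis
    Prob(Omnibus): p-value of the Omnibus test
    Durbin-Watson: tests for autocorrelation in the residuals
    Skew: skewness, the degree of asymmetry of the residual distribution
    Kurtosis: kurtosis, how peaked or flat the residual distribution is
    Jarque-Bera (JB): normality test of the residuals based on skewness and kurtosis
    Prob(JB): p-value of the Jarque-Bera test
    Cond. No.: condition number, which indicates exact or near-exact linear dependence (multicollinearity) among the regressors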
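
To make several of these entries concrete, the sketch below recomputes R-squared, Adj. R-squared, the F-statistic, Durbin-Watson, Skew, Kurtosis and Jarque-Bera from the residuals of the fit, using the standard textbook formulas in plain Python. The variable names are illustrative, and the moment-based skewness and kurtosis are assumed here to be the uncorrected (biased) sample versions.

```python
jlr = [5.267, 6.5, 20.1, 73.15, 86.5, 109.5, 299.8, 283.9,
       318.2, 384.5, 484.8, 662, 1006.5, 1775, 3047, 3757]
n = len(jlr)
x = list(range(n))
k = 1  # number of regressors, excluding the constant (Df Model)

# Closed-form simple linear regression
x_mean = sum(x) / n
y_mean = sum(jlr) / n
slope = (sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, jlr))
         / sum((xi - x_mean) ** 2 for xi in x))
intercept = y_mean - slope * x_mean
resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, jlr)]

sse = sum(e * e for e in resid)               # residual sum of squares
sst = sum((yi - y_mean) ** 2 for yi in jlr)   # total sum of squares
df_resid = n - k - 1                          # Df Residuals

r2 = 1.0 - sse / sst
adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / df_resid
f_stat = ((sst - sse) / k) / (sse / df_resid)

# Durbin-Watson: squared successive differences of residuals over SSE
dw = sum((b - a) ** 2 for a, b in zip(resid, resid[1:])) / sse

# Central moments of the residuals for skewness, kurtosis and Jarque-Bera
e_mean = sum(resid) / n
m2 = sum((e - e_mean) ** 2 for e in resid) / n
m3 = sum((e - e_mean) ** 3 for e in resid) / n
m4 = sum((e - e_mean) ** 4 for e in resid) / n
skew = m3 / m2 ** 1.5
kurt = m4 / m2 ** 2
jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)

print("R2={0:.3f} adjR2={1:.3f} F={2:.2f}".format(r2, adj_r2, f_stat))
print("DW={0:.3f} skew={1:.3f} kurt={2:.3f} JB={3:.3f}".format(dw, skew, kurt, jb))
```

A Durbin-Watson value far below 2, as here, points to strong positive autocorrelation in the residuals, which is a sign that a straight line on the raw values does not describe this series well.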