Official documentation: https://www.statsmodels.org/stable/user-guide.html and https://www.statsmodels.org/stable/api.html
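I. Overview
1. Introduction
(1) Background:
See: https://zhuanlan.zhihu.com/p/91384305
statsmodels is a Python module for statistical analysis. It grew out of work by Jonathan Taylor, a professor of statistics at Stanford University, and Skipper Seabold and Josef Perktold formally created the project in 2010. It includes many classical statistical and econometric algorithms, mainly:
① Regression models: linear regression, generalized linear models, robust linear models, linear mixed-effects models, etc.
② Analysis of variance (ANOVA)
③ Time series analysis and state space models: AR, ARMA, ARIMA, VAR, etc.
④ Generalized method of moments (GMM)
⑤ Nonparametric methods: kernel density estimation, kernel regression
⑥ Visualization of statistical model results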
(2) Project structure:
statsmodels/
    __init__.py
    api.py
    discrete/
        __init__.py
        discrete_model.py
        tests/
            results/
    tsa/
        __init__.py
        api.py
        tsatools.py
        stattools.py
        arima_process.py
        vector_ar/
            __init__.py
            var_model.py
            tests/
                results/
        tests/
            results/
    stats/
        __init__.py
        api.py
        stattools.py
        tests/
    tools/
        __init__.py
        tools.py
        decorators.py
        tests/
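2. Relationship to other modules:
① patsy: Inspired by R's formula system, Nathaniel Smith created the patsy project. It provides the formula/model specification framework used by statsmodels
② scikit-learn: statsmodels focuses more on statistical inference, whereas scikit-learn focuses more on prediction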
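The formula framework mentioned above can be illustrated with a minimal sketch. It assumes pandas is installed and uses synthetic data; the column names x and y and the true coefficients are made up for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data (illustrative only): y = 1 + 2*x + noise
rng = np.random.default_rng(0)
df = pd.DataFrame({"x": rng.normal(size=100)})
df["y"] = 1.0 + 2.0 * df["x"] + rng.normal(scale=0.1, size=100)

# The R-style formula is parsed into a design matrix;
# an intercept column is added automatically.
formula_res = smf.ols("y ~ x", data=df).fit()
print(formula_res.params)   # entries are labeled "Intercept" and "x"
```

With the formula interface no explicit constant column is needed, unlike the array-based classes described later.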
3. Installation and import
(1) Installation:
pip install statsmodels
(2) Import:
① For interactive use, importing the API interfaces is recommended:
import statsmodels.api as sm
import statsmodels.tsa.api as tsa
② Importing functions/models/submodules directly:
from statsmodels.regression.linear_model import OLS, WLS
from statsmodels.datasets import macrodata
import statsmodels.regression.linear_model as lm
(3) Listing the available functions/classes:
>>> dir(sm)
['BayesGaussMI', 'BinomialBayesMixedGLM'...'webdoc']
>>> dir(sm.tsa)
['AR', 'ARIMA'...'x13_arima_select_order']
(4) Comparison of the import styles:
See: https://www.statsmodels.org/stable/api-structure.html#import-paths-and-structure
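II. Cross-Sectional Study
1. Interface
# Conventionally imported as sm:
import statsmodels.api as sm
# Notes:
① This interface is recommended for interactive use
② The classes/functions are actually defined elsewhere; sm only re-exports them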
(1) Regression:
Ordinary Least Squares (OLS): class sm.OLS(<endog>,<exog>[,missing='none',hasconst=None,**kwargs])
# Actually class statsmodels.regression.linear_model.OLS
# Parameters:
endog: the y values (dependent variable) of the data points; 1-D array-like
exog: the x values (regressors) of the data points; n×k array-like, where n=len(<endog>) and k is the number of regressors. No intercept is added automatically (use sm.add_constant if one is needed)
missing: how to handle missing values; "none" (no NaN check is done)/"drop" (drop the affected observations)/"raise" (raise an error)
hasconst: indicates whether the right-hand side (RHS) includes a user-supplied constant; None/bool
If True, a constant is not checked for, k_constant is set to 1, and all result statistics are calculated as if a constant is present. If False, a constant is not checked for and k_constant is set to 0
kwargs: extra arguments passed through when the formula interface is used
######################################################################################################################
Generalized Least Squares (GLS): class sm.GLS(<endog>,<exog>[,sigma=None,missing='none',hasconst=None,**kwargs])
# Actually class statsmodels.regression.linear_model.GLS
# Parameters: the other parameters are the same as for sm.OLS
sigma: the weighting matrix of the covariance; None/scalar/array
# The default is None, meaning no scaling
# If sigma is a scalar, it is treated as an n x n diagonal matrix with that scalar as the value of each diagonal element
# If sigma is an n-length vector, it is treated as a diagonal matrix with the given values on the diagonal
# This should be the same as WLS
######################################################################################################################
Generalized Least Squares with AR covariance structure (GLSAR): class sm.GLSAR(<endog>,<exog>[,rho=1,missing='none',hasconst=None,**kwargs])
# Actually class statsmodels.regression.linear_model.GLSAR
######################################################################################################################
Weighted Least Squares (WLS): class sm.WLS(<endog>,<exog>[,weights=1.0,missing='none',hasconst=None,**kwargs])
# Actually class statsmodels.regression.linear_model.WLS
# Parameters: the other parameters are the same as for sm.OLS
weights: the weights; scalar/1-D array-like
######################################################################################################################
Recursive Least Squares (RecursiveLS): class sm.RecursiveLS(<endog>,<exog>[,constraints=None,**kwargs])
# Actually class statsmodels.regression.recursive_ls.RecursiveLS
######################################################################################################################
Rolling Ordinary Least Squares (RollingOLS): class sm.RollingOLS(<endog>,<exog>[,window=None,min_nobs=None,missing='drop',expanding=False])
# Actually class statsmodels.regression.rolling.RollingOLS
######################################################################################################################
Rolling Weighted Least Squares (RollingWLS): class sm.RollingWLS(<endog>,<exog>[,window=None,weights=None,min_nobs