statsmodels API (statsmodels.api)

Article link: https://blog.csdn.net/qq_33790600/article/details/122883507

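1 Introduction

The main statsmodels API is split into the following modules:

statsmodels.api: cross-sectional models and methods.
statsmodels.tsa.api: time-series models and methods.
statsmodels.formula.api: a convenience interface for specifying models using formula strings and DataFrames.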

2 statsmodels.api

2.1 Regression

Class	Description
OLS(endog[, exog, missing, hasconst])	Ordinary least squares
WLS(endog, exog[, weights, missing, hasconst])	Weighted least squares
GLS(endog, exog[, sigma, missing, hasconst])	Generalized least squares
GLSAR(endog[, exog, rho, missing, hasconst])	Generalized least squares with an AR covariance structure
RecursiveLS(endog, exog[, constraints])	Recursive least squares
RollingOLS(endog, exog[, window, min_nobs, …])	Rolling ordinary least squares
RollingWLS(endog, exog[, window, weights, …])	Rolling weighted least squares

2.1.1 statsmodels.regression.linear_model.OLS

Ordinary least squares.

import statsmodels.api as sm

model = sm.OLS(endog, exog=None, missing='none', hasconst=None, **kwargs)

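'''Parameters
endog: array_like
	A 1-d endogenous (dependent) response variable.
exog: array_like
	A nobs x k array, where nobs is the number of observations and k is the number of regressors.
	An intercept is not included by default and should be added by the user,
	for example with statsmodels.tools.add_constant.
missing: str
	Available options are 'none', 'drop' and 'raise'. If 'none', no nan checking is done. If 'drop', any observations with nans are dropped. If 'raise', an error is raised. The default is 'none'.
hasconst: None or bool
	Indicates whether the RHS includes a user-supplied constant. If True, a constant is not checked for, k_constant is set to 1, and all result statistics are calculated as if a constant is present. If False, a constant is not checked for and k_constant is set to 0.
**kwargs
	Extra arguments that are used to set model properties when using the formula interface.
'''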

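# Attributes
model.weights		# scalar. Because OLS inherits from WLS, it has the attribute weights = array(1.0).

# Methods
model.fit([method, cov_type, cov_kwds, use_t])			# Full fit of the model.
model.fit_regularized([method, alpha, L1_wt, ...])		# Return a regularized fit to a linear regression model.
model.from_formula(formula, data[, subset, drop_cols])	# Create a model from a formula and a dataframe.
model.get_distribution(params, scale[, exog, ...])		# Construct a random number generator for the predictive distribution.
model.hessian(params[, scale])							# Evaluate the Hessian function at a given point.
model.hessian_factor(params[, scale, observed])			# Compute the weights for calculating the Hessian.
model.information(params)								# Fisher information matrix of the model.
model.initialize()										# Initialize model components.
model.loglike(params[, scale])							# The likelihood function for the OLS model.
model.predict(params[, exog])							# Return linear predicted values from a design matrix.
model.score(params[, scale])							# Evaluate the score function at a given point.
model.whiten(x)											# OLS model whitener does nothing.

# Properties
model.df_model		# The model degrees of freedom.
model.df_resid		# The residual degrees of freedom.
model.endog_names	# Names of the endogenous (dependent) variables.
model.exog_names	# Names of the exogenous (independent) variables.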

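2.2 Imputation

Class	Description
BayesGaussMI(data[, mean_prior, cov_prior, …])	Bayesian imputation using a Gaussian model.
MI(imp, model[, model_args_fn, …])	Performs multiple imputation using a provided imputer object.
MICE(model_formula, model_class, data[, …])	Multiple imputation with chained equations.
MICEData(data[, perturbation_method, k_pmm, …])	Wraps a data set to allow missing data handling with MICE.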

2.3 Generalized Estimating Equations

2.4 Generalized Linear Models

2.5 Discrete and Count Models

Class	Description
Logit(endog, exog[, check_rank])	Logit model
Probit(endog, exog[, check_rank])	Probit model
MNLogit(endog, exog[, check_rank])	Multinomial logit model

2.6 Multivariate Models

Class	Description
Factor([endog, n_factor, corr, method, smc, …])	Factor analysis
MANOVA(endog, exog[, missing, hasconst])	Multivariate analysis of variance
PCA(data[, ncomp, standardize, demean, …])	Principal component analysis
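
As an example of the multivariate classes, the sketch below runs `PCA` on four correlated synthetic variables and keeps two components. The data are invented, and the class is imported from `statsmodels.multivariate.pca` directly; the attribute names used (`factors`, `loadings`, `eigenvals`, `projection`) are those I understand the class to expose.

```python
import numpy as np
from statsmodels.multivariate.pca import PCA

# Synthetic data (illustrative only): four variables driven by one latent factor.
rng = np.random.default_rng(0)
latent = rng.normal(size=(100, 1))
data = latent @ np.array([[1.0, 0.8, 0.6, 0.4]]) + rng.normal(scale=0.3, size=(100, 4))

# Standardize the columns and keep the first two principal components.
pc = PCA(data, ncomp=2, standardize=True)

print(pc.factors.shape)     # scores: one row per observation, one column per component
print(pc.loadings)          # weights linking the original variables to the components
print(pc.eigenvals)         # eigenvalues associated with the components
print(pc.projection.shape)  # data reconstructed from the retained components
```

Because a single latent factor drives all four columns, the first eigenvalue should dominate the second.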

2.7 Other Models

2.8 Graphics

2.9 Statistics

2.10 Tools

Function	Description
test([extra_args, exit])	Run the test suite.
add_constant(data[, prepend, has_constant])	Add a column of ones to an array.

2.10.1 statsmodels.tools.tools.add_constant

Add a column of ones to an array.

import statsmodels.api as sm

sm.add_constant(data, prepend=True, has_constant='skip')

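'''Parameters
data: array_like
	A column-ordered design matrix.
prepend: bool
	If True, the constant is placed in the first column. Otherwise the constant is appended (last column).
has_constant: str {'raise', 'add', 'skip'}
	Behavior if data already has a constant column. The default, 'skip', returns data without adding another constant. 'raise' raises an error if any column has a constant value. 'add' adds a column of ones even if a constant column is already present.
'''

'''Returns
array_like
	The original values with a constant (column of ones) as the first or last column. The returned type depends on the input type.
'''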