Python has no direct equivalent of R's MASS::stepAIC for stepwise regression.
Stepwise regression is a greedy algorithm: at each iteration it either adds the single feature that most improves the model or removes one from it, until an optimal subset is reached. There are two variants: forward stepwise regression and backward stepwise regression (a minimal sketch of the forward variant is shown after the brute-force example below).
However, we can take a brute-force approach and compute the AIC of every candidate model.
Here is a code example that selects features by exhaustively searching all feature subsets for the lowest AIC:
import numpy as np
import pandas as pd
import statsmodels.api as sm
from itertools import combinations
# Load Boston Housing dataset
boston = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/BostonHousing.csv')
# Split dataset into features (X) and target (y)
X = boston.drop('medv', axis=1)
y = boston['medv']
# Add constant to features (for intercept)
X = sm.add_constant(X)
# Create list of all possible feature combinations (excluding constant)
feature_combinations = [c for i in range(1, len(X.columns)) for c in combinations(X.columns[1:], i)]
# Initialize best AIC and best feature combination
best_aic = np.inf
best_features = None
# Loop through all feature combinations and calculate AIC
for features in feature_combinations:
    # Fit OLS model with current feature combination
    model = sm.OLS(y, X[list(features) + ['const']])
    results = model.fit()
    # Calculate AIC for current model
    aic = results.aic
    # Update best AIC and best feature combination if current model has lower AIC
    if aic < best_aic:
        best_aic = aic
        best_features = features
# Fit final OLS model with best feature combination
final_model = sm.OLS(y, X[list(best_features) + ['const']])
final_results = final_model.fit()
# Print summary of final model
print(final_results.summary())
This brute-force search fits an OLS model for every possible subset of the features, computes the AIC of each, and keeps the subset with the lowest AIC; the final model is then refit on that best subset and its summary is printed.
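For comparison, the greedy forward procedure described earlier can also be written by hand. Below is a minimal sketch that assumes the X (with its 'const' column) and y defined above; forward_stepwise_aic is a hypothetical helper written for illustration, not a statsmodels function:
import statsmodels.api as sm
def forward_stepwise_aic(X, y):
    # Greedy forward selection: start from the intercept-only model and,
    # at each step, add the single feature that lowers the AIC the most.
    # Stop when no remaining feature improves the AIC.
    remaining = [c for c in X.columns if c != 'const']
    selected = []
    current_aic = sm.OLS(y, X[['const']]).fit().aic
    while remaining:
        candidates = [(sm.OLS(y, X[selected + [f, 'const']]).fit().aic, f) for f in remaining]
        best_new_aic, best_feature = min(candidates)
        if best_new_aic >= current_aic:
            break  # no candidate lowers the AIC any further
        selected.append(best_feature)
        remaining.remove(best_feature)
        current_aic = best_new_aic
    return selected, current_aic
selected, aic = forward_stepwise_aic(X, y)
print(selected, aic)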
Note that such selection procedures can still overfit: the greedy stepwise search picks the locally best feature at each step rather than considering all combinations, and even the exhaustive AIC search chooses the subset on the training data alone. It is therefore advisable to use cross-validation or other methods to evaluate the performance of the selected features.
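As one way to do that, here is a minimal sketch using scikit-learn's cross_val_score to check the subset found above. It assumes the X, y, and best_features variables from the brute-force example; the 'const' column is omitted because LinearRegression fits an intercept itself:
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
# 5-fold cross-validated MSE for the selected feature subset
scores = cross_val_score(LinearRegression(), X[list(best_features)], y,
                         cv=5, scoring='neg_mean_squared_error')
print('Mean CV MSE:', -scores.mean())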