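scikit-learn has no built-in forward/backward stepwise selection method driven by AIC or p-values.
Use RFE (recursive feature elimination) instead.
How it works: you set the number of features to keep, and on each pass the least important features are dropped until that number remains.
Reference: http://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.RFE.html
Tip: consider using grid search to tune the n_features_to_select parameter.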
from sklearn.linear_model import LogisticRegression
from sklearn.feature_selection import RFE
reg = LogisticRegression(C=1, solver="newton-cg", max_iter=1000, penalty="l2")
model_select = RFE(estimator=reg, n_features_to_select=4)
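The grid-search idea mentioned above can be sketched as follows. This is only an illustration under assumptions: the data is synthetic (make_classification), the candidate values 2, 4, 6, 8 and the 5-fold accuracy scoring are arbitrary choices, and the l2 penalty is left at its default rather than passed explicitly.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic data, used only for illustration (assumption, not from the notes)
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=4, random_state=0
)

# Same estimator settings as above; l2 is the default penalty
reg = LogisticRegression(C=1, solver="newton-cg", max_iter=1000)

# RFE exposes n_features_to_select as a regular hyperparameter,
# so GridSearchCV can tune it directly
search = GridSearchCV(
    estimator=RFE(estimator=reg),
    param_grid={"n_features_to_select": [2, 4, 6, 8]},
    cv=5,
    scoring="accuracy",
)
search.fit(X, y)

best_rfe = search.best_estimator_   # RFE refit on all data with the best setting
print(search.best_params_)          # chosen number of features
print(best_rfe.support_)            # boolean mask of the kept features
print(best_rfe.ranking_)            # 1 = kept, larger = eliminated earlier
```

RFECV in the same module performs a similar cross-validated search over the number of features and may be simpler when nothing else needs tuning.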
I wrote my own function that performs stepwise selection using the AIC criterion:
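import math
# AIC in its least-squares form: 2k + n*ln(RSS/n), with k = number of features and n = number of samples.
# Column 1 of predict_proba is P(y = 1), so the residual is taken against it (y is assumed to be coded 0/1).
AIC = lambda estimator, X, y: 2*X.shape[1] + X.shape[0]*math.log(((y - estimator.predict_proba(X)[:, 1])**2).sum()/X.shape[0])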
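As a rough illustration of how an AIC score can drive selection, here is a minimal forward-only sketch. It rests on several assumptions: a binary target, synthetic data, and the log-likelihood form AIC = 2k - 2 ln L in place of the residual-based form above. The names aic_logistic and forward_select_aic are hypothetical, and a large C is used only to approximate an unpenalized fit.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss


def aic_logistic(estimator, X, y):
    """AIC = 2k - 2*ln(L), with k = number of coefficients plus the intercept."""
    # With normalize=False, log_loss returns the summed negative log-likelihood
    nll = log_loss(y, estimator.predict_proba(X), normalize=False)
    k = X.shape[1] + 1
    return 2 * k + 2 * nll


def forward_select_aic(X, y, make_estimator):
    """Greedily add the column that lowers AIC the most; stop when none does."""
    selected, best_aic = [], np.inf
    remaining = list(X.columns)
    while remaining:
        scores = {}
        for col in remaining:
            cols = selected + [col]
            est = make_estimator().fit(X[cols], y)
            scores[col] = aic_logistic(est, X[cols], y)
        best_col = min(scores, key=scores.get)
        if scores[best_col] >= best_aic:
            break  # no candidate improves AIC any further
        selected.append(best_col)
        remaining.remove(best_col)
        best_aic = scores[best_col]
    return selected, best_aic


# Synthetic data, used only for illustration
X_arr, y = make_classification(
    n_samples=300, n_features=8, n_informative=3, n_redundant=0, random_state=0
)
X = pd.DataFrame(X_arr, columns=[f"x{i}" for i in range(8)])

selected, aic = forward_select_aic(
    X, y, lambda: LogisticRegression(C=1e6, max_iter=1000)
)
print(selected, round(aic, 2))
```

A backward step could be added by also trying to drop each selected column after every addition and keeping the removal when it lowers AIC.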
import pandas as pd
import numpy as np
from sklearn.linear_model import LogisticRegression
def stepwise_selection(X, y, initial_list=[], verbose=True):
    """ Perform a forward-backward feature selection
    based on p-value from statsmodels.api.OLS
    Arguments:
        X - pandas.DataFrame with candidate features
        y - list-like with the target
        initial_list - list of features to start with (column names of X)
        threshold_in - include a feature if its p-value < threshold_in
        threshold_out - exclude a feature if its p-value > thres