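Feature selection is an important step before model training. The two cases below apply FeatureSelector to real datasets.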
from feature_selector import FeatureSelector
import pandas as pd
Case 1: Air Quality Dataset
air_quality = pd.read_csv('data/AirQualityUCI.csv')
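# Parse the date column, then express it as seconds elapsed since the earliest date
air_quality['Date'] = pd.to_datetime(air_quality['Date'])
air_quality['Date'] = (air_quality['Date'] - air_quality['Date'].min()).dt.total_seconds()
# Keep only the hour (the first two characters of the time string) as an integer
air_quality['Time'] = [int(x[:2]) for x in air_quality['Time']]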
air_quality.head()
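To make the date and time conversion above easier to follow, here is a minimal sketch on a toy frame. The sample values and their formats (ISO dates, `HH:MM:SS` times) are assumptions for illustration and may not match the actual CSV.

```python
import pandas as pd

# Toy frame standing in for the real CSV; values and formats are assumed.
toy = pd.DataFrame({
    "Date": ["2004-03-10", "2004-03-11", "2004-03-12"],
    "Time": ["18:00:00", "19:00:00", "07:00:00"],
})

# Date -> seconds elapsed since the earliest date in the column
toy["Date"] = pd.to_datetime(toy["Date"])
toy["Date"] = (toy["Date"] - toy["Date"].min()).dt.total_seconds()

# Time -> hour of day, taken from the first two characters
toy["Time"] = toy["Time"].str[:2].astype(int)

print(toy)
```

Both columns end up numeric, which is what a tree-based importance model needs in order to use them as features.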
# Use the PT08.S5(O3) column as the regression target and drop it from the features
labels = air_quality['PT08.S5(O3)']
air_quality = air_quality.drop(columns = 'PT08.S5(O3)')
fs = FeatureSelector(data = air_quality, labels = labels)
fs.identify_all(selection_params = {'missing_threshold': 0.5,
                                    'correlation_threshold': 0.7,
                                    'task': 'regression',
                                    'eval_metric': 'l2',
                                    'cumulative_importance': 0.9})
。。。
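As a rough illustration of what a `missing_threshold` of 0.5 means, the sketch below computes the fraction of missing values per column with plain pandas. It is not FeatureSelector's own code; the toy columns are made up, and the strict `>` comparison is an assumption.

```python
import numpy as np
import pandas as pd

# Made-up columns with different amounts of missing data
toy = pd.DataFrame({
    "mostly_missing": [np.nan, np.nan, np.nan, 1.0],
    "half_missing": [np.nan, 2.0, np.nan, 4.0],
    "complete": [1.0, 2.0, 3.0, 4.0],
})

# Fraction of missing values in each column
missing_fraction = toy.isnull().mean()

# Flag columns whose missing fraction exceeds the threshold (strict comparison assumed)
threshold = 0.5
to_drop = missing_fraction[missing_fraction > threshold].index.tolist()

print(missing_fraction)
print(to_drop)
```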
fs.plot_collinear(plot_all=True)
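The idea behind a `correlation_threshold` can be sketched with plain pandas and NumPy: take the absolute correlation matrix, look only at the upper triangle so each pair is counted once, and flag one column from every pair above the threshold. The helper `find_collinear` is a hypothetical name, and this is a sketch of the general technique rather than the library's implementation.

```python
import numpy as np
import pandas as pd

def find_collinear(df, threshold):
    """Return columns that correlate above `threshold` with an earlier column."""
    corr = df.corr().abs()
    # Boolean mask for the upper triangle; k=1 excludes the diagonal
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    upper = corr.where(mask)
    return [col for col in upper.columns if (upper[col] > threshold).any()]

toy = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.0, 4.0, 6.0, 8.0, 10.0],  # exactly 2 * a, so perfectly correlated
    "c": [5.0, 1.0, 4.0, 2.0, 3.0],   # only weakly correlated with a and b
})

to_drop = find_collinear(toy, threshold=0.7)
print(to_drop)
```

Only the later column of each correlated pair is flagged, so `a` is kept and `b` is the removal candidate.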
Case 2: Insurance Dataset
insurance = pd.read_csv('data/caravan-insurance-challenge.csv')
insurance = insurance[insurance['ORIGIN'] == 'train']
labels = insurance['CARAVAN']
insurance = insurance.drop(columns = ['ORIGIN', 'CARAVAN'])
insurance.head()
fs = FeatureSelector(data = insurance, labels = labels)
fs.identify_all(selection_params = {'missing_threshold': 0.8,
                                    'correlation_threshold': 0.85,
                                    'task': 'classification',
                                    'eval_metric': 'auc',
                                    'cumulative_importance': 0.8})
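A `cumulative_importance` of 0.8 can be read as "keep the most important features until they account for 80% of the total importance". The sketch below shows one way to do that in pure Python. The function name, the importance values, and the way the boundary is handled are assumptions, so FeatureSelector may split features slightly differently.

```python
def split_by_cumulative_importance(importances, threshold):
    """Split features into (keep, drop) by cumulative normalized importance."""
    total = sum(importances.values())
    # Rank features from most to least important
    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    keep, drop, running = [], [], 0.0
    for name, value in ranked:
        # Keep a feature while the features before it have not yet reached the threshold
        if running < threshold:
            keep.append(name)
        else:
            drop.append(name)
        running += value / total
    return keep, drop

# Made-up importance scores for illustration
toy_importances = {"f1": 40, "f2": 30, "f3": 20, "f4": 10}
keep, drop = split_by_cumulative_importance(toy_importances, threshold=0.8)
print(keep, drop)
```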
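# Remove only the features flagged by the missing-value and zero-importance methods
insurance_missing_zero = fs.remove(methods = ['missing', 'zero_importance'])
# List the features identified for removal
to_remove = fs.check_removal()
fs.feature_importances.head()
# Remove features flagged by every method; keep_one_hot=False also drops the one-hot encoded columns
insurance_removed = fs.remove(methods = 'all', keep_one_hot=False)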