Feature Engineering Made Easy Notes
Chapter 5 Feature Selection
Predictive performance metrics
- True positive rate and false positive rate
- Mean absolute error (regression)
- $R^2 = 1 - \frac{SS_{res}}{SS_{total}}$, the coefficient of determination, where $SS_{res} = \sum(y_i - \widehat{y})^2$ and $SS_{total} = \sum(y_i - \overline{y})^2$. The closer $R^2$ is to 1, the better the model (a worked check appears after this list).
- Training time
- Time needed to predict on new data
- …
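As a quick sanity check of the $R^2$ definition above, the hand-computed value matches sklearn's r2_score (a minimal sketch with made-up numbers, not data from the book):

import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([3.0, -0.5, 2.0, 7.0])   # toy targets
y_pred = np.array([2.5, 0.0, 2.0, 8.0])    # toy predictions
ss_res = np.sum((y_true - y_pred) ** 2)             # residual sum of squares
ss_total = np.sum((y_true - y_true.mean()) ** 2)    # total sum of squares
print(1 - ss_res / ss_total)       # ~0.949
print(r2_score(y_true, y_pred))    # same value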
Statistics-based feature selection
- Pearson correlation $\frac{Cov(x,y)}{\sigma_x \sigma_y}$: measures the linear relationship between two variables (a selection sketch appears after the SelectKBest example below).
- Hypothesis test:
  - ANOVA F-value
  - chi-squared
Use SelectKBest to keep the k best features:
from sklearn.feature_selection import SelectKBest, f_classif
# keep the 5 features with the highest ANOVA F-values
k_best = SelectKBest(f_classif, k=5)
k_best.fit_transform(X, y)
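Correlation-based selection can also be done by hand; a minimal sketch assuming X is a pandas DataFrame, y is array-like, and 0.2 is an arbitrary example cutoff:

import pandas as pd

# Pearson correlation of each column with the target
correlations = X.corrwith(pd.Series(y, index=X.index))
selected_columns = correlations[correlations.abs() > 0.2].index
X_selected = X[selected_columns]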
Model-based feature selection
Use SelectFromModel with a machine-learning model that exposes feature importances or coefficients, e.g.:
- Decision Tree
- Logistic regression for classification, linear regression for regression tasks (a coefficient-based sketch follows this list)
- SVC (for binary classification datasets)
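For linear models, SelectFromModel ranks features by the absolute value of their coefficients rather than feature_importances_; a minimal sketch with LogisticRegression (the l1 penalty and 'mean' threshold are example choices, not the book's settings):

from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression

# coefficient-based selection with a linear model
logit_selector = SelectFromModel(LogisticRegression(penalty='l1', solver='liblinear'),
                                 threshold='mean')
X_logit = logit_selector.fit_transform(X, y)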
from sklearn.tree import DecisionTreeClassifier
from sklearn.feature_selection import SelectFromModel
import pandas as pd

# fit a decision tree and rank features by importance
tree = DecisionTreeClassifier()
tree.fit(X, y)
importances = pd.DataFrame({'importance': tree.feature_importances_,
                            'feature': X.columns}).sort_values('importance', ascending=False)
# grid reused below for the pipeline's 'classifier' step
tree_pipe_params = {'classifier__max_depth': [1, 3, 5, 7]}

# keep only features whose importance exceeds the threshold
select_from_model = SelectFromModel(DecisionTreeClassifier(), threshold=.05)
selected_X = select_from_model.fit_transform(X, y)
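To see which columns survived the threshold, get_support() on the fitted selector can be mapped back to the column names (assumes X is a pandas DataFrame):

kept_columns = X.columns[select_from_model.get_support()]
print(kept_columns)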
from copy import deepcopy
from sklearn.pipeline import Pipeline

d_tree = DecisionTreeClassifier()

# pipeline: model-based selection followed by a decision tree classifier
select_from_pipe = Pipeline([('select', SelectFromModel(DecisionTreeClassifier())),
                             ('classifier', d_tree)])

select_from_pipe_params = deepcopy(tree_pipe_params)
select_from_pipe_params.update({
    'select__threshold': [.01, .05, .1, "mean", "median", "2.*mean"],
    'select__estimator__max_depth': [None, 1, 3, 5, 7]
})
print(select_from_pipe_params)
# not better than the original feature set
get_best_model_and_accuracy(select_from_pipe,
                            select_from_pipe_params,
                            X, y)
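get_best_model_and_accuracy is a helper defined earlier in the book; a rough sketch of what such a helper could look like with GridSearchCV (the exact metrics and output format here are assumptions, not the book's code):

from sklearn.model_selection import GridSearchCV

def get_best_model_and_accuracy(model, params, X, y):
    # hypothetical re-implementation: grid-search the estimator and report results
    grid = GridSearchCV(model, params, error_score=0.)  # failed fits score 0
    grid.fit(X, y)
    print("Best Accuracy: {}".format(grid.best_score_))
    print("Best Parameters: {}".format(grid.best_params_))
    print("Average Time to Fit (s): {}".format(round(grid.cv_results_['mean_fit_time'].mean(), 3)))
    print("Average Time to Score (s): {}".format(round(grid.cv_results_['mean_score_time'].mean(), 3)))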