Interpretable Models - RuleFit

1. Introduction

Q: The linear regression model does not account for interactions between features. Would it not be convenient to have a model that is as simple and interpretable as linear models, but also integrates feature interactions?

A: RuleFit learns a sparse linear model with the original features and also a number of new features that are decision rules. Each path through a tree can be transformed into a decision rule by combining the split decisions into a rule. The node predictions are discarded and only the splits are used in the decision rules. In other words, RuleFit can be understood as linear regression on the original features plus new features derived from decision rules.
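To make the path-to-rule idea concrete, here is a minimal sketch (not part of RuleFit or the rulefit package; the synthetic data and the helper paths are purely illustrative) that fits one small scikit-learn regression tree and prints the conjunction of split conditions leading to every non-root node. Each printed conjunction is one candidate rule; the node predictions themselves are never used.

from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=200, n_features=3, random_state=0)
tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, y)
t = tree.tree_

def paths(node=0, conditions=()):
    # Yield the conjunction of split conditions leading to every non-root node.
    if conditions:
        yield conditions
    if t.children_left[node] != -1:  # internal node: recurse into both children
        feat, thr = t.feature[node], t.threshold[node]
        yield from paths(t.children_left[node], conditions + ((feat, "<=", thr),))
        yield from paths(t.children_right[node], conditions + ((feat, ">", thr),))

for conds in paths():
    rule = " & ".join(f"x{f} {op} {thr:.2f}" for f, op, thr in conds)
    print(rule)  # each conjunction of splits is one candidate rule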

2. The Two Steps of RuleFit

RuleFit consists of two parts: the first generates rules from decision trees, and the second fits a linear model that takes both the new rules and the original features as inputs.

2.1 Rule generation

Bagged ensembles, random forest, AdaBoost and MART can be used to fit an ensemble of decision trees by regressing or classifying y with your original features X.

Each rule is a product of indicator functions over the split conditions along a tree path:

r_m(x) = ∏_{j ∈ T_m} I(x_j ∈ s_jm)

where T_m is the set of features used by the m-th rule, I(·) is the indicator function, and s_jm is the subset of values of feature j selected by the splits.

For example, the rule I(temp < 15) · I(weather ∈ {good, cloudy}) returns 1 only for instances that satisfy both conditions, and 0 otherwise.

Altogether, the number of rules created from an ensemble of M trees with t_m terminal nodes each is:

K = Σ_{m=1}^{M} 2(t_m − 1)
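For instance, an ensemble of M = 100 trees, each grown to t_m = 4 terminal nodes, yields K = 100 · 2 · (4 − 1) = 600 candidate rules.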

2.2 Sparse linear model

Before training, the original features are winsorized (extreme values are capped) to make them robust against outliers.

A sparse linear model is then trained with L1 regularization (Lasso) on the combined set of original features and generated rules, so that only a small number of rules and features end up with nonzero coefficients.
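A rough sketch of this step (not the actual RuleFit code): assume a 0/1 matrix rule_features, with one column per generated rule, has already been built from the tree ensemble; X_orig, rule_features and the target below are made-up stand-ins. The rules and the original features are concatenated and fed to an L1-penalized regression, which drives most coefficients to exactly zero.

import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
X_orig = rng.normal(size=(200, 3))                  # stand-in for winsorized original features
rule_features = (X_orig[:, :2] > 0).astype(float)   # stand-in for 0/1 rule indicators
y = X_orig[:, 0] + 2 * rule_features[:, 1] + rng.normal(scale=0.1, size=200)

# Concatenate original features and rule features, then fit an L1-penalized model;
# Lasso zeroes out most rule coefficients, keeping the final model sparse.
X_all = np.hstack([X_orig, rule_features])
model = LassoCV(cv=5).fit(X_all, y)
print(model.coef_)   # nonzero entries are the selected features and rules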

3. Advantages and Disadvantages

Advantages: the model remains highly interpretable; to keep the rules short and readable, the author recommends a maximum tree depth of no more than 3.

Disadvantages: predictive performance is often only average, and interpretability suffers when rules overlap.

A Python package for RuleFit: https://github.com/christophM/rulefit

Another similar package is skope-rules. It differs in the way it learns the final rules: first, similar and duplicate rules are removed; then skope-rules selects rules based on precision and recall instead of relying on Lasso.
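A minimal quick-start sketch for skope-rules might look like the following (based on its README; the import path skrules and the shown parameters are assumptions about that package, and it requires a scikit-learn version the package supports):

from sklearn.datasets import load_iris
from skrules import SkopeRules  # package: skope-rules

data = load_iris()
feature_names = ["sepal_length", "sepal_width", "petal_length", "petal_width"]

# Fit a rule learner for one class against the rest (a binary target is required)
clf = SkopeRules(n_estimators=30, precision_min=0.3, recall_min=0.1,
                 feature_names=feature_names)
clf.fit(data.data, data.target == 0)

# rules_ holds the selected rules together with their precision/recall statistics
for rule in clf.rules_[:3]:
    print(rule)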

Taking the first package, rulefit, as an example:

import numpy as np
import pandas as pd

from rulefit import RuleFit
from sklearn.ensemble import GradientBoostingRegressor

boston_data = pd.read_csv("boston.csv", index_col=0)
y = boston_data.medv.values
X = boston_data.drop("medv", axis=1)
features = X.columns
X = X.values  # as_matrix() was removed in recent pandas versions

gb = GradientBoostingRegressor(n_estimators=500, max_depth=10, learning_rate=0.01)
rf = RuleFit(tree_generator=gb)  # pass the ensemble explicitly as the tree generator

rf.fit(X, y, feature_names=features)

rules = rf.get_rules()
rules = rules[rules.coef != 0].sort_values("support", ascending=False)
print(rules)

 

4. Other Interpretable Models

4.1 Naive Bayes Classifier

Naturally interpretable because of its conditional feature-independence assumption: each feature contributes to the predicted class probability independently, so its contribution can be read off directly.
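A small sketch with scikit-learn's GaussianNB (the attribute names theta_, var_ and class_prior_ assume a reasonably recent scikit-learn): because features are treated as conditionally independent, the class score is a sum of one term per feature, and each term can be inspected as that feature's contribution.

import numpy as np
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
nb = GaussianNB().fit(X, y)

x = X[0]
# Gaussian log-likelihood of each feature value under each class: one additive
# term per feature, i.e. that feature's contribution to the class score.
per_feature = (-0.5 * np.log(2 * np.pi * nb.var_)
               - 0.5 * (x - nb.theta_) ** 2 / nb.var_)
class_score = np.log(nb.class_prior_) + per_feature.sum(axis=1)

print(per_feature)   # rows: classes, columns: per-feature contributions
print(class_score)   # unnormalized log-posterior; its argmax matches nb.predict([x])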

4.2 KNN

Locally interpretable but not globally interpretable: a single prediction can be explained by its nearest neighbors, but there are no global weights or structures explicitly learned.
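A small sketch of what local interpretability means in practice, using scikit-learn's KNeighborsClassifier on the iris data (chosen here only for illustration): a single prediction is explained by retrieving the neighbors that produced it, while there is no global model to inspect.

from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)

x = X[:1]                          # one query instance
dist, idx = knn.kneighbors(x)      # the 5 training points behind this prediction
print(knn.predict(x))              # the prediction itself
print(y[idx], dist)                # local explanation: neighbor labels and distances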
