DataWhale - Tree Models and Ensemble Learning - Task 06 - Gradient Boosting Trees (1) - 2021-10

Part D: Gradient Boosting Trees — Datawhale

I. Exercises

1. Exercise 1

Solution:

(1) Mean squared loss

                               L(y_i,\hat{y})=\frac{1}{N}\sum_{i=1}^N(y_i-\hat{y})^2

\begin{equation} \begin{aligned} \frac{\partial L}{\partial \hat{y}}&=\frac{2}{N}\sum_{i=1}^N(\hat{y}-y_i)=0 \end{aligned} \end{equation}

Therefore F_0(\mathbf{X}_i)=\frac{1}{N}\sum_{i=1}^N y_i, the sample mean.

(2) Absolute loss

                                  L(y_i,\hat{y})=\frac{1}{N}\sum_{i=1}^N |y_i-\hat{y}|

\frac{\partial L}{\partial \hat{y}}=\frac{1}{N}\sum_{i=1}^{N}\frac{|\hat{y}-y_i|}{\hat{y}-y_i}=\frac{1}{N}\sum_{i=1}^{N}\mathrm{sign}(\hat{y}-y_i)=0

Therefore F_0(\mathbf{X}_i) is the median of the y_i.
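As a quick numerical check (a sketch I added; the grid search is purely illustrative), the constant minimizing the empirical squared loss indeed lands on the sample mean, and the absolute-loss minimizer on the median:

import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(size=101)

# brute-force both losses over a grid of candidate constants
grid = np.linspace(y.min(), y.max(), 2001)
mse = ((y[:, None] - grid) ** 2).mean(axis=0)
mae = np.abs(y[:, None] - grid).mean(axis=0)

print(grid[mse.argmin()], y.mean())      # ~ the same value
print(grid[mae.argmin()], np.median(y))  # ~ the same value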

2. Exercise 2

Solution:

                              r_i=y_i-F_{m-1}(X_i)

                              L(w_i)=(r_i-w_i)^2

Taking a single gradient-descent step from w=0 with unit step size:

\begin{equation} \begin{aligned} w_i^{*}&=0-\frac{\partial L}{\partial w}\Big|_{w=0}\\ &=2r_i \end{aligned} \end{equation}
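A finite-difference spot check (illustrative only; the value of r is arbitrary) confirms that the negative gradient of L(w)=(r-w)^2 at w=0 is 2r:

r = 1.7
L = lambda w: (r - w) ** 2
eps = 1e-6
# central-difference approximation of -dL/dw at w = 0
print(-(L(eps) - L(-eps)) / (2 * eps), 2 * r)  # both ~3.4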

3. Exercise 3

Solution:

Gradient descent does not apply here; we take h_m^{*}(X_i)=0 directly.

4. Exercise 4

Solution:

Newton's method finds a root of L^{'}(y_i,\bar{w})=0. Applying the Newton iteration:

                               \begin{equation} \begin{aligned} \bar{w}_i^{(m)}&=\bar{w}_i^{(m-1)}-\frac{L^{'}(y_i,\bar{w}_i^{(m-1)})}{L^{''}(y_i,\bar{w}_i^{(m-1)})}\\ &=F_{m-1}(X_i)-\frac{L^{'}(y_i,F_{m-1}(X_i))}{L^{''}(y_i,F_{m-1}(X_i))} \end{aligned} \end{equation}

From this we obtain:

w_i^{*}=0-\frac{L^{'}(y_i,\bar{w})}{L^{''}(y_i,\bar{w})}\Big|_{\bar{w}=F_{m-1}(X_i)}
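As a concrete instance (my example, not part of the original solution): for the binary log loss L(y,F)=-yF+\log(1+e^F) we have L^{'}=p-y and L^{''}=p(1-p) with p=\mathrm{sigmoid}(F), so the Newton step is (y-p)/(p(1-p)):

import numpy as np

def newton_step(y, F):
    # w* = -L'/L'' for the binary log loss, evaluated at the current score F
    p = 1.0 / (1.0 + np.exp(-F))
    return (y - p) / (p * (1 - p))

print(newton_step(1, 0.0))  # p = 0.5, so the step is 0.5 / 0.25 = 2.0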

5. Exercise 5

Solution:

For K\geq 3, when y_{Ki}=1 we have:

\begin{equation} \begin{aligned} -\frac{\partial L}{\partial F_{ki}}\Big|_{F_i=F_i^{(m-1)}}&=-\frac{\partial}{\partial F_{ki}}\log\Big(1+\sum_{c=1}^{K-1}e^{F_{ci}}\Big)\Big|_{F_i=F_i^{(m-1)}}\\ &=-\frac{e^{F_{ki}^{(m-1)}}}{1+\sum_{c=1}^{K-1}e^{F_{ci}^{(m-1)}}} \end{aligned} \end{equation}

When y_{Ki}=0, we have:

\begin{equation} \begin{aligned} -\frac{\partial L}{\partial F_{ki}}\Big|_{F_i=F_i^{(m-1)}}&=\frac{\partial}{\partial F_{ki}}\sum_{c=1}^{K-1}y_{ci}\log\frac{e^{F_{ci}}}{1+\sum_{c'=1}^{K-1}e^{F_{c'i}}}\Big|_{F_i=F_i^{(m-1)}}\\ &=\frac{\partial}{\partial F_{ki}}\Big[\sum_{c=1}^{K-1}y_{ci}F_{ci}-\log\Big(1+\sum_{c'=1}^{K-1}e^{F_{c'i}}\Big)\Big]\Big|_{F_i=F_i^{(m-1)}}\\ &=y_{ki}-\frac{e^{F_{ki}^{(m-1)}}}{1+\sum_{c'=1}^{K-1}e^{F_{c'i}^{(m-1)}}} \end{aligned} \end{equation}

where the second equality uses \sum_{c=1}^{K-1}y_{ci}=1, since the label is one-hot with y_{Ki}=0. In both cases the negative gradient is y_{ki} minus the predicted probability of class k.
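The result can be verified by finite differences (a sketch I added; K=4 and the sample scores are arbitrary):

import numpy as np

K = 4
F = np.array([0.3, -1.2, 0.7])      # K-1 free scores; class K is the reference
y = np.array([0.0, 1.0, 0.0, 0.0])  # one-hot label over the K classes

def loss(F, y):
    z = 1.0 + np.exp(F).sum()
    # log-probabilities of classes 1..K-1 followed by the reference class K
    logp = np.concatenate([F - np.log(z), [-np.log(z)]])
    return -(y * logp).sum()

eps = 1e-6
for k in range(K - 1):
    d = np.zeros(K - 1); d[k] = eps
    num = -(loss(F + d, y) - loss(F - d, y)) / (2 * eps)
    print(num, y[k] - np.exp(F[k]) / (1.0 + np.exp(F).sum()))  # rows match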

6. Exercise 6

Solution:

Therefore:

-\frac{\partial L}{\partial F_i}=y_i\frac{1+e^{F_i}}{e^{F_i}}[\frac{e^{F_i}}{1+e^{F_i}}-(\frac{e^{F_i}}{1+e^{F_i}})^2]-(1-y_i)\frac{e^{F_i}}{1+e^{F_i}}=y_i-\frac{e^{F_i}}{1+e^{F_i}}

7. Exercise 7

Solution:

\frac{1}{10}=p_1=\frac{e^{F^{(0)}}}{1+e^{F^{(0)}}}

which gives:

F^{(0)}=\log\frac{1}{9}=-\log 9
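Both facts check out numerically (a quick sketch; the values of y and F below are illustrative):

import numpy as np

# initialization: sigmoid(F) = 1/10  =>  F^(0) = log(1/9) = -log 9
p = 0.1
print(np.log(p / (1 - p)), -np.log(9))

# negative gradient of the log loss is y - sigmoid(F) (Exercise 6)
y, F, eps = 1.0, 0.4, 1e-6
L = lambda F: -y * F + np.log(1 + np.exp(F))
print(-(L(F + eps) - L(F - eps)) / (2 * eps), y - 1 / (1 + np.exp(-F)))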

II. Code Implementation

1. GBDT Regression

Instructor GYH's code:

from sklearn.tree import DecisionTreeRegressor as DT
from sklearn.datasets import make_regression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
import numpy as np

class GBDTRegressor:

    def __init__(self, max_depth=4, n_estimator=1000, lr=0.2):
        self.max_depth = max_depth
        self.n_estimator = n_estimator
        self.lr = lr
        self.booster = []

        self.best_round = None

    def record_score(self, y_train, y_val, train_predict, val_predict, i):
        # evaluation uses the mean absolute error (MAE), not the MSE
        mae_val = mean_absolute_error(y_val, val_predict)
        if (i+1) % 10 == 0:
            mae_train = mean_absolute_error(y_train, train_predict)
            print("Round %d\tTrain MAE: %.4f\t"
                "Val MAE: %.4f" % (i+1, mae_train, mae_val))
        return mae_val

    def fit(self, X, y):
        # split off a validation set for early stopping
        X_train, X_val, y_train, y_val = train_test_split(
            X, y, test_size=0.25, random_state=0)
        train_predict, val_predict = 0, 0
        # initialize with the median, the optimal constant under absolute loss
        next_fit_val = np.full(X_train.shape[0], np.median(y_train))
        # bookkeeping for early stopping
        last_val_score = np.inf
        for i in range(self.n_estimator):
            cur_booster = DT(max_depth=self.max_depth)
            cur_booster.fit(X_train, next_fit_val)
            train_predict += cur_booster.predict(X_train) * self.lr
            val_predict += cur_booster.predict(X_val) * self.lr
            # the squared loss is ((y - (F_{m-1} + w))^2)/2; writing r for the
            # residual, it becomes ((r - w)^2)/2, whose negative gradient at
            # w = 0 is exactly r, so the next tree fits y_train - train_predict
            next_fit_val = y_train - train_predict
            self.booster.append(cur_booster)
            cur_val_score = self.record_score(
                y_train, y_val, train_predict, val_predict, i)
            if cur_val_score > last_val_score:
                self.best_round = i
                print("\nTraining finished! Best round: %d" % (i+1))
                break
            last_val_score = cur_val_score
        # keep every fitted round if the validation score never rose
        if self.best_round is None:
            self.best_round = self.n_estimator

    def predict(self, X):
        cur_predict = 0
        # stop at the round with the best validation score to limit overfitting
        for i in range(self.best_round):
            cur_predict += self.lr * self.booster[i].predict(X)
        return cur_predict

if __name__ == "__main__":

    X, y = make_regression(
        n_samples=10000, n_features=50, n_informative=20, random_state=1)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=0)

    model = GBDTRegressor()
    model.fit(X_train, y_train)
    prediction = model.predict(X_test)
    mae = mean_absolute_error(y_test, prediction)
    print("\nTest MAE: %.4f" % (mae))

2. GBDT Classification

Instructor GYH's binary classification code:

from sklearn.tree import DecisionTreeRegressor as DT
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
import numpy as np

class GBDTClassifier:

    def __init__(self, max_depth=4, n_estimator=1000, lr=0.2):
        self.max_depth = max_depth
        self.n_estimator = n_estimator
        self.lr = lr
        self.booster = []

        self.best_round = None

    def record_score(self, y_train, y_val, train_predict, val_predict, i):
        # map raw scores to probabilities through the sigmoid before scoring
        train_predict = np.exp(train_predict) / (1 + np.exp(train_predict))
        val_predict = np.exp(val_predict) / (1 + np.exp(val_predict))
        auc_val = roc_auc_score(y_val, val_predict)
        if (i+1)%10==0:
            auc_train = roc_auc_score(y_train, train_predict)
            print("第%d轮\t训练集: %.4f\t"
                "验证集: %.4f"%(i+1, auc_train, auc_val))
        return auc_val

    def fit(self, X, y):
        X_train, X_val, y_train, y_val = train_test_split(
            X, y, test_size=0.25, random_state=0)
        train_predict, val_predict = 0, 0
        # initialize with the binary log-odds F^{(0)} = log(p/(1-p)) (Exercise 7)
        fit_val = np.log(y_train.mean() / (1 - y_train.mean()))
        next_fit_val = np.full(X_train.shape[0], fit_val)
        last_val_score = -np.inf
        for i in range(self.n_estimator):
            cur_booster = DT(max_depth=self.max_depth)
            cur_booster.fit(X_train, next_fit_val)
            train_predict += cur_booster.predict(X_train) * self.lr
            val_predict += cur_booster.predict(X_val) * self.lr
            # negative gradient of the log loss: y - sigmoid(F) (Exercise 6)
            next_fit_val = y_train - np.exp(
                train_predict) / (1 + np.exp(train_predict))
            self.booster.append(cur_booster)
            cur_val_score = self.record_score(
                y_train, y_val, train_predict, val_predict, i)
            if cur_val_score < last_val_score:
                self.best_round = i
                print("\nTraining finished! Best round: %d" % (i+1))
                break
            last_val_score = cur_val_score
        # keep every fitted round if the validation AUC never dropped
        if self.best_round is None:
            self.best_round = self.n_estimator

    def predict(self, X):
        cur_predict = 0
        for i in range(self.best_round):
            cur_predict += self.lr * self.booster[i].predict(X)
        return np.exp(cur_predict) / (1 + np.exp(cur_predict))

if __name__ == "__main__":

    X, y = make_classification(
        n_samples=10000, n_features=50, n_informative=20, random_state=1)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=0)

    model = GBDTClassifier()
    model.fit(X_train, y_train)
    prediction = model.predict(X_test)
    auc = roc_auc_score(y_test, prediction)
    print("\n测试集的AUC为 %.4f"%(auc))

Instructor GYH's multiclass source:

from sklearn.tree import DecisionTreeRegressor as DT
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
import numpy as np

def one_hot(y):
    # one-hot encode integer labels into an (n_samples, n_classes) matrix
    res = np.zeros((y.size, y.max()+1))
    res[np.arange(y.size), y] = 1
    return res

class GBDTMultiClassifier:

    def __init__(self, max_depth=4, n_estimator=1000, lr=0.2):
        self.max_depth = max_depth
        self.n_estimator = n_estimator
        self.lr = lr
        self.booster = []

        self.n_classes = None
        self.best_round = None

    def get_init_val(self, y):
        # per-class initialization: F_c^{(0)} is the log of the class proportion
        init_val = []
        y = np.argmax(y, axis=1)
        for c in range(self.n_classes):
            init_val.append(np.log((y==c).mean()))
        return np.full((y.shape[0], self.n_classes), init_val)

    def record_score(self, y_train, y_val, train_predict, val_predict, i):
        train_predict = np.exp(train_predict) / np.exp(
            train_predict).sum(1).reshape(-1, 1)
        val_predict = np.exp(val_predict) / np.exp(
            val_predict).sum(1).reshape(-1, 1)
        auc_val = roc_auc_score(y_val, val_predict)
        if (i+1)%10==0:
            auc_train = roc_auc_score(y_train, train_predict)
            print("第%d轮\t训练集: %.4f\t"
                "验证集: %.4f"%(i+1, auc_train, auc_val))
        return auc_val

    def fit(self, X, y):
        X_train, X_val, y_train, y_val = train_test_split(
            X, y, test_size=0.25, random_state=0)
        self.n_classes = y.shape[1]
        train_predict = np.zeros((X_train.shape[0], self.n_classes))
        val_predict = np.zeros((X_val.shape[0], self.n_classes))
        next_fit_val = self.get_init_val(y_train)
        last_val_score = -np.inf
        for i in range(self.n_estimator):
            self.booster.append([])
            for m in range(self.n_classes):
                cur_booster = DT(max_depth=self.max_depth)
                cur_booster.fit(X_train, next_fit_val[:, m])
                train_predict[:, m] += cur_booster.predict(X_train) * self.lr
                val_predict[:, m] += cur_booster.predict(X_val) * self.lr
                self.booster[-1].append(cur_booster)
            # negative gradient of the softmax cross-entropy at the updated
            # scores, y - softmax(F); computed after the inner loop so the next
            # round fits the current model's residuals, as in the binary case
            next_fit_val = y_train - np.exp(train_predict) / np.exp(
                train_predict).sum(1).reshape(-1, 1)
            cur_val_score = self.record_score(
                y_train, y_val, train_predict, val_predict, i)
            if cur_val_score < last_val_score:
                self.best_round = i
                print("\nTraining finished! Best round: %d" % (i+1))
                break
            last_val_score = cur_val_score
        # keep every fitted round if the validation AUC never dropped
        if self.best_round is None:
            self.best_round = self.n_estimator

    def predict(self, X):
        cur_predict = np.zeros((X.shape[0], self.n_classes))
        for i in range(self.best_round):
            for m in range(self.n_classes):
                cur_predict[:, m] += self.lr * self.booster[i][m].predict(X)
        return np.exp(cur_predict) / np.exp(cur_predict).sum(1).reshape(-1, 1)


if __name__ == "__main__":

    X, y = make_classification(
        n_samples=10000, n_features=50, n_informative=20,
        n_classes=3, random_state=1)
    y = one_hot(y)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=0)

    model = GBDTMultiClassifier()
    model.fit(X_train, y_train)
    prediction = model.predict(X_test)
    auc = roc_auc_score(y_test, prediction)
    print("\n测试集的AUC为 %.4f"%(auc))
