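The EasyEnsemble and BalanceCascade Algorithms

Classification with imbalanced samples

An example of imbalanced data: samples labeled 1 are far fewer than samples labeled 0.
Common remedies include undersampling, oversampling, combining undersampling with oversampling, models that use class (label) weights, and the SMOTE algorithm. Two other methods are introduced below.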
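As a point of reference for the two methods that follow, plain random undersampling and oversampling can be sketched with NumPy alone. This is a minimal illustration, not a library API: the function names `random_undersample` and `random_oversample` and the toy data are assumptions made for the example.

```python
import numpy as np


def random_undersample(X, y, majority_label, rng):
    """Drop majority rows at random until both classes have the same size."""
    majority = np.flatnonzero(y == majority_label)
    minority = np.flatnonzero(y != majority_label)
    keep = rng.choice(majority, size=len(minority), replace=False)
    idx = np.concatenate((keep, minority))
    return X[idx], y[idx]


def random_oversample(X, y, majority_label, rng):
    """Duplicate minority rows at random until both classes have the same size."""
    majority = np.flatnonzero(y == majority_label)
    minority = np.flatnonzero(y != majority_label)
    extra = rng.choice(minority, size=len(majority) - len(minority), replace=True)
    idx = np.concatenate((majority, minority, extra))
    return X[idx], y[idx]


# Toy data: 950 samples labeled 0 and 50 samples labeled 1.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))
y = np.array([0] * 950 + [1] * 50)

X_u, y_u = random_undersample(X, y, majority_label=0, rng=rng)
X_o, y_o = random_oversample(X, y, majority_label=0, rng=rng)
print(np.bincount(y_u), np.bincount(y_o))  # [50 50] [950 950]
```

Undersampling discards most of the majority class, which is the information loss that EasyEnsemble and BalanceCascade try to reduce by training several models on different majority subsets.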

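EasyEnsemble:
An ensemble learning algorithm that combines the ideas of Bagging and AdaBoost:
(1) The Bagging part: in every round, the majority class (the class with more samples) is sampled with Bagging's sampling method (bootstrap) so that the number of sampled majority samples equals the number of minority samples.
(2) The AdaBoost part: the sampled majority subset is combined with all samples of the minority class, and an AdaBoost model is trained on the combined set.
(3) Finally, the T AdaBoost models are used as base models and combined into one ensemble.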
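The three steps above can be sketched with scikit-learn's `AdaBoostClassifier`. This is a simplified illustration under stated assumptions: the helper names `easy_ensemble_fit` and `easy_ensemble_predict` are hypothetical, the minority class is assumed to be labeled 1, and the final combination averages the per-model class probabilities, which is a simplification and not necessarily the combination rule of the original algorithm.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier


def easy_ensemble_fit(X, y, n_subsets=5, n_rounds=10, seed=0):
    """Train one AdaBoost model per balanced subset (minority label assumed to be 1)."""
    rng = np.random.default_rng(seed)
    minority = np.flatnonzero(y == 1)
    majority = np.flatnonzero(y == 0)
    models = []
    for t in range(n_subsets):
        # Step (1): bootstrap-sample the majority class down to the minority size.
        sampled = rng.choice(majority, size=len(minority), replace=True)
        # Step (2): combine with all minority samples and train AdaBoost.
        idx = np.concatenate((sampled, minority))
        model = AdaBoostClassifier(n_estimators=n_rounds, random_state=seed + t)
        models.append(model.fit(X[idx], y[idx]))
    return models


def easy_ensemble_predict(models, X):
    """Step (3), simplified: average the probability of class 1 and threshold at 0.5."""
    proba = np.mean([m.predict_proba(X)[:, 1] for m in models], axis=0)
    return (proba >= 0.5).astype(int)


# Toy imbalanced data: roughly 90% label 0 and 10% label 1.
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
models = easy_ensemble_fit(X, y)
preds = easy_ensemble_predict(models, X)
print("minority recall on training data:", (preds[y == 1] == 1).mean())
```

Every model sees all minority samples but a different slice of the majority class, so the ensemble as a whole uses far more majority data than a single undersampled model would.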

The AdaBoost procedure is as follows:
[Figure: the AdaBoost procedure]
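For readers without the figure, the standard discrete AdaBoost loop can be sketched as follows. This is a minimal version under assumptions: labels are in {-1, +1}, the weak learner is an exhaustive-search decision stump, and all function names (`fit_stump`, `adaboost_fit`, and so on) are chosen for this example only.

```python
import numpy as np


def stump_predict(X, feature, threshold, polarity):
    """Predict +1 where polarity * (x - threshold) >= 0, else -1."""
    return np.where(polarity * (X[:, feature] - threshold) >= 0, 1, -1)


def fit_stump(X, y, w):
    """Return (weighted error, feature, threshold, polarity) of the best stump."""
    best = (np.inf, 0, 0.0, 1)
    for feature in range(X.shape[1]):
        for threshold in np.unique(X[:, feature]):
            for polarity in (1, -1):
                pred = stump_predict(X, feature, threshold, polarity)
                err = w[pred != y].sum()
                if err < best[0]:
                    best = (err, feature, threshold, polarity)
    return best


def adaboost_fit(X, y, n_rounds=10):
    """Discrete AdaBoost; y must contain only -1 and +1."""
    w = np.full(len(y), 1.0 / len(y))  # start with uniform sample weights
    ensemble = []
    for _ in range(n_rounds):
        err, feature, threshold, polarity = fit_stump(X, y, w)
        err = np.clip(err, 1e-10, 1 - 1e-10)  # avoid division by zero
        alpha = 0.5 * np.log((1 - err) / err)  # weight of this weak learner
        pred = stump_predict(X, feature, threshold, polarity)
        w = w * np.exp(-alpha * y * pred)  # up-weight misclassified samples
        w = w / w.sum()
        ensemble.append((alpha, feature, threshold, polarity))
    return ensemble


def adaboost_predict(ensemble, X):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(a * stump_predict(X, f, t, p) for a, f, t, p in ensemble)
    return np.where(score >= 0, 1, -1)


# Toy data: two Gaussian blobs labeled -1 and +1.
rng = np.random.default_rng(0)
X = np.vstack((rng.normal(-2, 1, size=(50, 2)), rng.normal(2, 1, size=(50, 2))))
y = np.array([-1] * 50 + [1] * 50)
ensemble = adaboost_fit(X, y, n_rounds=10)
print("training accuracy:", (adaboost_predict(ensemble, X) == y).mean())
```

The two key quantities are the learner weight `alpha`, which grows as the weighted error shrinks, and the sample-weight update, which makes the next weak learner focus on the samples the current one got wrong.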
The EasyEnsemble pseudocode is as follows:
[Figure: EasyEnsemble pseudocode]

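BalanceCascade:
The basic architecture is the same as EasyEnsemble. The difference is that after each (AdaBoost) classifier is trained, the majority-class samples it classifies correctly are removed, and only the majority-class samples it misclassifies are kept in the sample pool for later rounds. The misclassified samples are selected by adjusting the classifier's decision threshold: the threshold is set so that the false positive rate (the fraction of majority-class samples that are misclassified) equals $f = \sqrt[T-1]{|P| / |N|}$, where $|P|$ is the number of minority-class samples, $|N|$ is the number of majority-class samples, and $T$ is the number of rounds.
Since each round keeps a fraction $f$ of the majority pool, after $T-1$ rounds the number of majority-class samples is $|N| \cdot f^{T-1} = |P|$.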
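The shrinking of the majority pool can be checked with a few lines of arithmetic. The sketch below assumes the rate $f = (|P|/|N|)^{1/(T-1)}$ and picks the threshold as a quantile of the classifier scores on the majority pool; the function names and the random scores are illustrative assumptions only.

```python
import numpy as np


def false_positive_rate(n_minority, n_majority, n_rounds):
    """f = (|P| / |N|) ** (1 / (T - 1)), the fraction of the majority pool kept per round."""
    return (n_minority / n_majority) ** (1.0 / (n_rounds - 1))


def threshold_for_fp_rate(majority_scores, f):
    """Score threshold such that about a fraction f of majority samples score above it."""
    return np.quantile(majority_scores, 1.0 - f)


n_minority, n_majority, n_rounds = 100, 1600, 5
f = false_positive_rate(n_minority, n_majority, n_rounds)

# Majority pool size before round 1, 2, ..., T.
sizes = [n_majority * f ** t for t in range(n_rounds)]
print(round(f, 6), [round(s) for s in sizes])  # 0.5 [1600, 800, 400, 200, 100]

# Stand-in for classifier scores on the majority pool (higher = looks more like minority).
rng = np.random.default_rng(0)
scores = rng.random(n_majority)
threshold = threshold_for_fp_rate(scores, f)
kept = scores > threshold  # misclassified majority samples stay in the pool
print("fraction of majority pool kept:", kept.mean())
```

With 100 minority and 1600 majority samples over 5 rounds, the pool halves each round and reaches the minority size exactly when the last classifier is trained.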
The BalanceCascade pseudocode is as follows:
[Figure: BalanceCascade pseudocode]

Here is example Python code implementing the EasyEnsemble and BalanceCascade algorithms:

EasyEnsemble:

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin, clone
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils import check_array, check_X_y
from sklearn.utils.validation import check_is_fitted


class EasyEnsembleClassifier(ClassifierMixin, BaseEstimator):
    def __init__(self, n_estimators=10, base_estimator=None, random_state=None):
        self.n_estimators = n_estimators
        # Defaults to a decision tree; pass an AdaBoostClassifier to follow
        # the description of EasyEnsemble given above.
        self.base_estimator = base_estimator
        self.random_state = random_state

    def fit(self, X, y):
        X, y = check_X_y(X, y)
        self.classes_, counts = np.unique(y, return_counts=True)
        if len(self.classes_) != 2:
            raise ValueError("Only binary classification is supported.")
        # Identify the minority class by count instead of assuming a label order.
        minority_label = self.classes_[np.argmin(counts)]
        minority_indices = np.flatnonzero(y == minority_label)
        majority_indices = np.flatnonzero(y != minority_label)
        rng = np.random.default_rng(self.random_state)
        self.estimators_ = []
        for _ in range(self.n_estimators):
            # Bootstrap-sample the majority class down to the minority size
            sampled = rng.choice(majority_indices, size=len(minority_indices), replace=True)
            sample_indices = np.concatenate((sampled, minority_indices))
            # Fit a fresh copy of the base estimator on the balanced subset
            if self.base_estimator is None:
                estimator = DecisionTreeClassifier()
            else:
                estimator = clone(self.base_estimator)
            estimator.fit(X[sample_indices], y[sample_indices])
            self.estimators_.append(estimator)
        return self

    def predict(self, X):
        check_is_fitted(self)
        X = check_array(X)
        # Majority vote over all estimators; ties go to classes_[0]
        votes = np.stack([est.predict(X) for est in self.estimators_], axis=1)
        is_second_class = (votes == self.classes_[1]).mean(axis=1) > 0.5
        return np.where(is_second_class, self.classes_[1], self.classes_[0])
```

BalanceCascade:

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin, clone
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils import check_array, check_X_y
from sklearn.utils.validation import check_is_fitted


class BalanceCascadeClassifier(ClassifierMixin, BaseEstimator):
    def __init__(self, n_max_estimators=10, base_estimator=None, random_state=None):
        self.n_max_estimators = n_max_estimators
        self.base_estimator = base_estimator
        self.random_state = random_state

    def fit(self, X, y):
        X, y = check_X_y(X, y)
        self.classes_, counts = np.unique(y, return_counts=True)
        if len(self.classes_) != 2:
            raise ValueError("Only binary classification is supported.")
        minority_label = self.classes_[np.argmin(counts)]
        minority_indices = np.flatnonzero(y == minority_label)
        majority_pool = np.flatnonzero(y != minority_label)
        rng = np.random.default_rng(self.random_state)
        self.estimators_ = []
        while len(self.estimators_) < self.n_max_estimators:
            # Undersample the remaining majority pool to the minority size
            sampled = rng.choice(majority_pool, size=len(minority_indices), replace=False)
            sample_indices = np.concatenate((sampled, minority_indices))
            if self.base_estimator is None:
                estimator = DecisionTreeClassifier()
            else:
                estimator = clone(self.base_estimator)
            estimator.fit(X[sample_indices], y[sample_indices])
            self.estimators_.append(estimator)
            # Remove correctly classified majority samples; keep the misclassified ones.
            # Simplification: no threshold adjustment to hit a target false positive rate.
            misclassified = estimator.predict(X[majority_pool]) != y[majority_pool]
            majority_pool = majority_pool[misclassified]
            # Stop when the pool can no longer supply a balanced subset
            if len(majority_pool) < len(minority_indices):
                break
        return self

    def predict(self, X):
        check_is_fitted(self)
        X = check_array(X)
        # Majority vote over all estimators; ties go to classes_[0]
        votes = np.stack([est.predict(X) for est in self.estimators_], axis=1)
        is_second_class = (votes == self.classes_[1]).mean(axis=1) > 0.5
        return np.where(is_second_class, self.classes_[1], self.classes_[0])
```

These classifiers are used like any other scikit-learn classifier. For example, to use the EasyEnsemble classifier:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

X, y = make_classification(n_classes=2, class_sep=2, weights=[0.1, 0.9],
                           n_informative=3, n_redundant=1, flip_y=0,
                           n_features=20, n_clusters_per_class=1,
                           n_samples=1000, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

eec = EasyEnsembleClassifier(n_estimators=50, random_state=42)
eec.fit(X_train, y_train)
y_pred = eec.predict(X_test)
print(classification_report(y_test, y_pred))
```

To use the BalanceCascade classifier:

```python
bc = BalanceCascadeClassifier(n_max_estimators=50, random_state=42)
bc.fit(X_train, y_train)
y_pred = bc.predict(X_test)
print(classification_report(y_test, y_pred))
```

In both cases `classification_report` prints per-class precision, recall, and F1-score; the exact figures depend on the data split and the random seeds.