Random Forest: Handling Imbalanced Data
Add the class_weight="balanced" parameter
# Handle imbalanced classes
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn import datasets
iris = datasets.load_iris()
features = iris.data
target = iris.target
# Remove the first 40 observations so that class 0 becomes a small minority
features = features[40:, :]
target = target[40:]
# Binarize the target: class 0 stays 0, every other class becomes 1
target = np.where((target == 0), 0, 1)
# Create the classifier with class_weight="balanced"
randomforest = RandomForestClassifier(random_state=0, n_jobs=-1, class_weight="balanced")
# Train the model (class_weight can also take explicit per-class weights)
model = randomforest.fit(features, target)
Discussion
A useful option is class_weight="balanced", which automatically weights classes inversely proportional to how frequently they appear in the data:
$$w_j = \frac{n}{k \, n_j}$$
where $w_j$ is the weight for class $j$, $n$ is the number of observations, $n_j$ is the number of observations in class $j$, and $k$ is the total number of classes.
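To make the formula concrete, here is a minimal sketch that evaluates it by hand. The helper name balanced_class_weights is hypothetical (it is not part of scikit-learn), and the target list is an assumption that only reproduces the class counts of the recipe: after dropping the first 40 iris rows and binarizing, 10 observations of class 0 and 100 of class 1 remain.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Hypothetical helper: return {class: n / (k * n_j)} for the labels."""
    counts = Counter(labels)   # n_j for every class j
    n = len(labels)            # total number of observations
    k = len(counts)            # number of distinct classes
    return {cls: n / (k * n_j) for cls, n_j in counts.items()}


# Assumed stand-in for the recipe's target: 10 zeros and 100 ones
target = [0] * 10 + [1] * 100

weights = balanced_class_weights(target)
print(weights)
```

With these counts the minority class 0 gets weight 110 / (2 * 10) = 5.5 and the majority class 1 gets 110 / (2 * 100) = 0.55, so each minority observation counts ten times as much as a majority one.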