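import pandas as pd
from imblearn.over_sampling import SMOTE
# Resample the minority classes ('auto' is the safe choice if unsure). 'minority' grows only the single smallest class,
# so it balances just two classes; 'auto' (= 'not majority') grows every non-majority class, so it balances all of them.
sm = SMOTE(sampling_strategy='auto', random_state=7)
# Drop the '姓名' (name) column and the '头发颜色' (hair colour) label from the features, then fit and generate synthetic samples.
oversampled_data, oversampled_label = sm.fit_resample(table.drop(['姓名', '头发颜色'], axis=1), table['头发颜色'])
oversampled_table = pd.concat([oversampled_data, oversampled_label], axis=1)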
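To see what the `sampling_strategy` comment in the snippet means in practice, here is a minimal NumPy-only sketch of SMOTE-style oversampling. It is an illustration under stated assumptions, not imblearn's implementation: the helper `smote_like` and the toy three-class hair-colour data are hypothetical. Each synthetic point is placed at a random position on the segment between a class sample and one of its k nearest same-class neighbours, and the strategy only decides which classes get grown up to the majority count.

```python
import numpy as np
from collections import Counter


def smote_like(X, y, strategy="auto", k=5, seed=7):
    """Hypothetical SMOTE-style oversampler, for illustration only.

    strategy="minority": only the single smallest class is grown to the majority count.
    strategy="auto": every class except the majority is grown to the majority count.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    counts = Counter(y.tolist())
    majority_n = max(counts.values())

    if strategy == "minority":
        targets = [min(counts, key=counts.get)]
    elif strategy == "auto":
        targets = [c for c, n in counts.items() if n < majority_n]
    else:
        raise ValueError(f"unknown strategy: {strategy}")

    new_X, new_y = [X], [y]
    for cls in targets:
        Xc = X[y == cls]
        if len(Xc) < 2:
            raise ValueError("need at least 2 samples in every class being oversampled")
        n_new = majority_n - len(Xc)
        # Pairwise distances inside the class; a point is never its own neighbour.
        d = np.linalg.norm(Xc[:, None, :] - Xc[None, :, :], axis=-1)
        np.fill_diagonal(d, np.inf)
        kk = min(k, len(Xc) - 1)
        neighbors = np.argsort(d, axis=1)[:, :kk]
        # Pick a random base sample and one of its k nearest neighbours for each new point.
        base = rng.integers(0, len(Xc), size=n_new)
        nb = neighbors[base, rng.integers(0, kk, size=n_new)]
        # Interpolate: base + gap * (neighbour - base), with gap drawn from [0, 1).
        gap = rng.random((n_new, 1))
        synthetic = Xc[base] + gap * (Xc[nb] - Xc[base])
        new_X.append(synthetic)
        new_y.append(np.full(n_new, cls, dtype=y.dtype))

    return np.vstack(new_X), np.concatenate(new_y)


# Toy data: 6 black, 3 brown, 2 blond samples with two numeric features each.
X = [[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5], [0.2, 0.8],
     [5, 5], [5, 6], [6, 5],
     [9, 0], [9, 1]]
y = ["black"] * 6 + ["brown"] * 3 + ["blond"] * 2

X_auto, y_auto = smote_like(X, y, strategy="auto")
X_min, y_min = smote_like(X, y, strategy="minority")
print(sorted(Counter(y_auto.tolist()).items()))  # [('black', 6), ('blond', 6), ('brown', 6)]
print(sorted(Counter(y_min.tolist()).items()))   # [('black', 6), ('blond', 6), ('brown', 3)]
```

With three classes, 'minority' leaves the brown class at 3 samples while 'auto' brings both brown and blond up to 6, which is the difference the comment describes.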
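Table of Contents
- What is sample imbalance
- How to balance a dataset's samples: resampling
  - Undersampling
    - Select all rows of the large class
    - Randomly sample a fixed number of those rows
    - Concatenate them with the small class to form the final dataset
    - The classes are now balanced
  - Oversampling
    - Expand the small class with the imblearn library
    - The classes are now balanced

What is sample imbalance

import pandas as pd
import numpy as np
import seaborn as sns
values = {"姓名":["A","B","C","D","E",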