Weka up-sampling & down-sampling

[b]up-sampling:[/b]

SMOTE algorithm: the minority class is over-sampled by creating "synthetic" examples rather than by over-sampling with replacement.

[b]Weka supervised SMOTE filter [/b]
Two parameters:
[list]
[*]nearestNeighbors: how many nearest-neighbor instances (surrounding the currently considered instance) are used to build an in-between synthetic instance. Default: 5.
[*]percentage: how many synthetic instances are created, as a percentage of the number of instances in the class with fewer instances. Default: 100. For example, if the minority class has 25 samples, 25 new samples are synthesized from the nearest neighbors, and the minority class then has 50 samples.
[/list]
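To make the two parameters concrete, here is a minimal plain-Java sketch of the SMOTE idea for numeric features. It is not Weka's implementation: the class name SmoteSketch and its methods are hypothetical, and cycling through the minority instances is a simplifying assumption. Each synthetic point is placed at a random position on the segment between a minority instance and one of its k nearest minority neighbors.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

/** Minimal SMOTE-style sketch for numeric features; NOT Weka's implementation. */
class SmoteSketch {

    /** Number of synthetic instances for a given minority size and percentage. */
    static int syntheticCount(int minoritySize, double percentage) {
        return (int) Math.floor(minoritySize * percentage / 100.0);
    }

    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Indices of the (at most) k nearest neighbours of minority[index], excluding itself. */
    static int[] nearestNeighbors(double[][] minority, int index, int k) {
        List<Integer> others = new ArrayList<>();
        for (int i = 0; i < minority.length; i++) {
            if (i != index) {
                others.add(i);
            }
        }
        others.sort(Comparator.comparingDouble(
                i -> squaredDistance(minority[index], minority[i])));
        int n = Math.min(k, others.size());
        int[] result = new int[n];
        for (int i = 0; i < n; i++) {
            result[i] = others.get(i);
        }
        return result;
    }

    /** Creates synthetic points between an instance and a randomly chosen near neighbour. */
    static double[][] oversample(double[][] minority, int k, double percentage, long seed) {
        if (minority.length < 2) {
            throw new IllegalArgumentException("need at least two minority instances");
        }
        Random rnd = new Random(seed);
        int total = syntheticCount(minority.length, percentage);
        double[][] synthetic = new double[total][];
        for (int s = 0; s < total; s++) {
            int base = s % minority.length;        // assumption: cycle through the instances
            int[] nn = nearestNeighbors(minority, base, k);
            double[] neighbor = minority[nn[rnd.nextInt(nn.length)]];
            double gap = rnd.nextDouble();         // position along the segment, in [0, 1)
            double[] point = new double[minority[base].length];
            for (int j = 0; j < point.length; j++) {
                point[j] = minority[base][j] + gap * (neighbor[j] - minority[base][j]);
            }
            synthetic[s] = point;
        }
        return synthetic;
    }

    public static void main(String[] args) {
        double[][] minority = {{1.0, 1.0}, {2.0, 1.0}, {1.0, 2.0}, {2.0, 2.0}};
        double[][] synthetic = oversample(minority, 5, 100.0, 42L);
        System.out.println(synthetic.length + " synthetic instances created"); // 4
        System.out.println(Arrays.toString(synthetic[0]));
    }
}
```

With percentage = 100 the sketch produces one synthetic instance per minority instance, which matches the 25 to 50 example above.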

[b]down-sampling:[/b]
The majority class is under-sampled by randomly removing samples from the majority class population until the minority class becomes some specified percentage of the majority class.
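The random removal described above can be sketched in plain Java as follows. This is only an illustration under stated assumptions: RandomUnderSampler, targetMajoritySize and underSample are hypothetical names, and the target is expressed as the minority size as a percentage of the majority size.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Plain-Java sketch of random under-sampling; names are illustrative only. */
class RandomUnderSampler {

    /**
     * Majority size at which the minority class makes up minorityPercent
     * of the majority class, never larger than the current majority size.
     */
    static int targetMajoritySize(int minoritySize, int majoritySize, double minorityPercent) {
        int target = (int) Math.round(minoritySize * 100.0 / minorityPercent);
        return Math.min(majoritySize, target);
    }

    /** Keeps a random subset of the majority samples; the input list is not modified. */
    static <T> List<T> underSample(List<T> majority, int minoritySize,
                                   double minorityPercent, long seed) {
        List<T> shuffled = new ArrayList<>(majority);
        Collections.shuffle(shuffled, new Random(seed));
        int keep = targetMajoritySize(minoritySize, majority.size(), minorityPercent);
        return new ArrayList<>(shuffled.subList(0, keep));
    }

    public static void main(String[] args) {
        List<Integer> majority = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            majority.add(i);
        }
        // 25 minority samples should become 50% of the majority class.
        List<Integer> kept = underSample(majority, 25, 50.0, 7L);
        System.out.println(kept.size()); // 50
    }
}
```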

[b]Weka supervised SpreadSubsample filter[/b]
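maxCount: can be set to the number of samples in the minority class, n.
If maxCount < n: the number of samples in both the positive and the negative class is reduced to maxCount.
If maxCount > n: the number of samples in the minority class, n, is unchanged, and the number of samples in the majority class is reduced to maxCount (if it has more than that).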
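The effect of maxCount on the class sizes can be checked with a small plain-Java sketch. It only reproduces the arithmetic of the rule above and does not call Weka; MaxCountSketch and capCounts are hypothetical names, and a positive maxCount is assumed.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Arithmetic of the maxCount rule only; this does not call Weka. */
class MaxCountSketch {

    /** Caps every class count at maxCount (assumed to be positive). */
    static Map<String, Integer> capCounts(Map<String, Integer> classCounts, int maxCount) {
        Map<String, Integer> capped = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : classCounts.entrySet()) {
            capped.put(e.getKey(), Math.min(e.getValue(), maxCount));
        }
        return capped;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("minority", 25);   // n = 25
        counts.put("majority", 500);
        System.out.println(capCounts(counts, 25));  // {minority=25, majority=25}
        System.out.println(capCounts(counts, 10));  // {minority=10, majority=10}
        System.out.println(capCounts(counts, 100)); // {minority=25, majority=100}
    }
}
```

Setting maxCount to n, as in the first call, gives a balanced data set without discarding any minority samples.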



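import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.supervised.instance.SpreadSubsample;

Instances train = DataSource.read(path);
train.setClassIndex(train.numAttributes() - 1); // the last attribute is the class
SpreadSubsample sps = new SpreadSubsample();
sps.setMaxCount(n); // n = number of samples in the minority class
sps.setInputFormat(train);
Instances ins = Filter.useFilter(train, sps); // useFilter is a static method of Filter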