Running a dataset through Tomek links in Matlab or Python

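This article shows how to undersample a dataset with the Tomek Links method as a way of dealing with class imbalance. Using a synthetic two-class dataset, it applies Tomek Links to filter out noisy samples near the class boundary, which leaves a cleaner and somewhat less imbalanced dataset for subsequent machine learning model training.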
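To make the idea concrete: a Tomek link is a pair of samples from different classes that are each other's nearest neighbour. The following is a minimal NumPy sketch of that definition, not the imbalanced-learn implementation; the function name `find_tomek_links` and the five-point toy data are assumptions made up for illustration.

```python
import numpy as np


def find_tomek_links(X, y):
    """Return index pairs (i, j), i < j, that form Tomek links.

    Hypothetical helper for illustration only: brute-force O(n^2)
    distances, fine for small arrays.
    """
    # Pairwise squared Euclidean distances between all samples
    diff = X[:, None, :] - X[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diff, diff)
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbour
    nearest = dist.argmin(axis=1)   # nearest neighbour of every sample

    links = []
    for i, j in enumerate(nearest):
        # Mutual nearest neighbours that carry different labels
        if i < j and nearest[j] == i and y[i] != y[j]:
            links.append((i, int(j)))
    return links


# Toy data on a line: samples 2 and 3 sit close together across the classes
X = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 0.0], [1.2, 0.0], [3.0, 0.0]])
y = np.array([0, 0, 0, 1, 1])

links = find_tomek_links(X, y)
print(links)  # [(2, 3)]
```

Samples 0 and 1 are mutual nearest neighbours but share a label, and sample 4 is not the nearest neighbour of sample 3, so only the pair (2, 3) qualifies. As far as I know, `TomekLinks` in imbalanced-learn with its default settings then drops only the majority-class member of each such pair, which is why the method cleans the class boundary rather than equalizing the class counts.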

import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.utils import shuffle
from imblearn.under_sampling import TomekLinks

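# Build an imbalanced synthetic dataset: 500 majority samples (class 0)
# and 50 minority samples (class 1) centred at (2, 2)
rng = np.random.RandomState(0)
n_samples_1 = 500
n_samples_2 = 50
X_syn = np.r_[1.5 * rng.randn(n_samples_1, 2),
              0.5 * rng.randn(n_samples_2, 2) + [2, 2]]
y_syn = np.array([0] * n_samples_1 + [1] * n_samples_2)
X_syn, y_syn = shuffle(X_syn, y_syn)
# Train/test split kept from the original script; the resampling and
# plotting code works on the full dataset
X_syn_train, X_syn_test, y_syn_train, y_syn_test = train_test_split(X_syn,
                                                                    y_syn)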

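# Remove Tomek links. Recent imbalanced-learn releases no longer accept
# return_indices or provide fit_sample(); use fit_resample() and read the
# indices of the kept samples from the sample_indices_ attribute.
tl = TomekLinks()
X_resampled, y_resampled = tl.fit_resample(X_syn, y_syn)
idx_resampled = tl.sample_indices_
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
# Indices of the samples that the resampling dropped
idx_samples_removed = np.setdiff1d(np.arange(X_syn.shape[0]),
                                   idx_resampled)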

# Plot the kept samples of each class together with the removed samples
idx_class_0 = y_resampled == 0
ax.scatter(X_resampled[idx_class_0, 0], X_resampled[idx_class_0, 1],
           alpha=.8, label='Class #0')
ax.scatter(X_resampled[~idx_class_0, 0], X_resampled[~idx_class_0, 1],
           alpha=.8, label='Class #1')
ax.scatter(X_syn[idx_samples_removed, 0], X_syn[idx_samples_removed, 1],
           alpha=.8, label='Removed samples')
ax.legend()
plt.show()
