The SMOTE Oversampling Method

Latest recommended article published on 2024-08-21 20:54:16

mambasmile

Category: Data Mining Techniques

Article link: https://blog.csdn.net/qq_26890109/article/details/81316770

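SMOTE (Synthetic Minority Over-sampling Technique) is an oversampling method for imbalanced datasets. For each minority-class sample, it finds that sample's k nearest neighbors among the minority samples, picks one of them at random, and generates a new synthetic sample at a random position on the line segment between the base sample and the chosen neighbor.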

(Summary generated automatically by CSDN.)
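
The interpolation step described in the summary can be sketched in a few lines. This is only an illustration under stated assumptions: the helper name `smote_point` and the sample values are hypothetical and do not come from the original post.

```python
import numpy as np


def smote_point(x, neighbor, gap):
    """Return a point on the segment from x to neighbor (hypothetical helper).

    gap is expected to lie in [0, 1]: 0 returns x, 1 returns neighbor.
    """
    x = np.asarray(x, dtype=float)
    neighbor = np.asarray(neighbor, dtype=float)
    return x + gap * (neighbor - x)


# Made-up base sample and one of its nearest neighbors.
rng = np.random.default_rng(0)
x = np.array([1.0, 2.0])
neighbor = np.array([3.0, 6.0])

# A random gap places the synthetic sample somewhere between the two points.
new_point = smote_point(x, neighbor, rng.random())
```

With `gap = 0.5` the helper returns the midpoint of the two samples. SMOTE draws a fresh random gap for every synthetic sample, so the new points spread along the segment rather than piling up at one spot.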

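import numpy as np
from sklearn.neighbors import NearestNeighbors


class Smote:
    def __init__(self, samples, N=100, k=5):
        # samples: 2-D array of minority-class samples, shape (n_samples, n_attrs)
        # N: amount of oversampling in percent (100 = one synthetic sample per original)
        # k: number of nearest neighbors considered for each sample
        self.n_samples, self.n_attrs = samples.shape
        self.N = N
        self.k = k
        self.samples = samples
        self.newindex = 0  # next free row in self.synthetic

    def over_sampling(self):
        # Synthetic samples to generate per original sample.
        # Note: N below 100 gives 0 here, so nothing is generated.
        N = int(self.N / 100)
        self.synthetic = np.zeros((self.n_samples * N, self.n_attrs))
        self.newindex = 0
        # Ask for k + 1 neighbors: a sample queried against its own training
        # set normally comes back as its own nearest neighbor (needs n_samples > k).
        neighbors = NearestNeighbors(n_neighbors=self.k + 1).fit(self.samples)
        for i in range(len(self.samples)):
            nnarray = neighbors.kneighbors(self.samples[i].reshape(1, -1), return_distance=False)[0]
            # Drop the first entry (assumed to be the sample itself), leaving k neighbors.
            self._populate(N, i, nnarray[1:])
        return self.synthetic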


    # For each minority-class sample, choose N of its k nearest neighbors and generate N synthetic samples.
    def _populate(self,N,i,nnarray):