The whiten function in SciPy

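While using the kmeans function, I noticed that the data is passed through the whiten function first. On looking it up, I found that whiten normalizes the input data by dividing each feature (column) by its standard deviation.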
$$variance = \frac{\sum_{i=1}^{n}(x_{i} - x_{mean})^{2}}{n}$$
$$standard\_deviation = \sqrt{variance}$$
After whitening:
$$x_{i}^{'} = \frac{x_{i}}{standard\_deviation}$$

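Unlike standardization, whitening does not subtract the mean.
Below, a step-by-step implementation and a direct call to the library function give the same result.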
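As a short sketch of why this matters, the derivation below checks what dividing by the standard deviation does to the mean and the variance. The symbols $\mu$ (for $x_{mean}$) and $\sigma$ (for the standard deviation defined above) are shorthand introduced here, and $\sigma \neq 0$ is assumed.

```latex
\begin{aligned}
\mathrm{mean}(x') &= \frac{1}{n}\sum_{i=1}^{n}\frac{x_{i}}{\sigma} = \frac{\mu}{\sigma} \\
\mathrm{var}(x')  &= \frac{1}{n}\sum_{i=1}^{n}\left(\frac{x_{i}}{\sigma} - \frac{\mu}{\sigma}\right)^{2}
                   = \frac{1}{\sigma^{2}}\cdot\frac{\sum_{i=1}^{n}(x_{i}-\mu)^{2}}{n}
                   = \frac{\sigma^{2}}{\sigma^{2}} = 1
\end{aligned}
```

So each whitened feature has unit variance, but its mean is $\mu/\sigma$ rather than 0.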

import numpy as np
from numpy import array
from scipy.cluster.vq import vq, kmeans, whiten
import matplotlib.pyplot as plt
features  = array([[ 1.9,2.3],
    [ 1.5,2.5],
    [ 0.8,0.6],
    [ 0.4,1.8],
    [ 0.1,0.1],
    [ 0.2,1.8],
    [ 2.0,0.5],
    [ 0.3,1.5],
    [ 1.0,1.0]])

mean = np.mean(features, axis=0)  # mean of each column: [0.91111111 1.34444444]
squared_diff = np.power(features - mean, 2)  # squared deviation from the column mean
# print('mean:\n', mean)
print('features - mean:\n', features - mean)
# print('square:\n', squared_diff)
variance = np.mean(squared_diff, axis=0)  # variance of each column (divides by n)
# print('variance:\n', variance)
standard_deviation = np.sqrt(variance)  # standard deviation of each column
# print('standard_deviation:\n', standard_deviation)
np_whitened = features / standard_deviation  # scale each column by its standard deviation
print('******* np_whitened ******')
print(np_whitened)

whitened = whiten(features)  # call the SciPy function
print('******* whitened ******')
print(whitened)
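Rather than comparing the two printouts by eye, the equality can be checked numerically. The sketch below reuses the same sample data; the names `manual` and `standardized` are introduced here for illustration. It relies on `np.std` defaulting to `ddof=0` (division by n), which matches the variance formula above, and it also contrasts whitening with z-score standardization.

```python
import numpy as np
from scipy.cluster.vq import whiten

# Same sample data as in the example above
features = np.array([[1.9, 2.3],
                     [1.5, 2.5],
                     [0.8, 0.6],
                     [0.4, 1.8],
                     [0.1, 0.1],
                     [0.2, 1.8],
                     [2.0, 0.5],
                     [0.3, 1.5],
                     [1.0, 1.0]])

# np.std uses ddof=0 by default, i.e. it divides by n
manual = features / np.std(features, axis=0)
whitened = whiten(features)

print(np.allclose(manual, whitened))   # True if both approaches agree
print(np.std(whitened, axis=0))        # per-column standard deviation after whitening
print(np.mean(whitened, axis=0))       # per-column mean after whitening (not zero)

# z-score standardization subtracts the mean before scaling
standardized = (features - features.mean(axis=0)) / features.std(axis=0)
print(np.mean(standardized, axis=0))   # per-column mean after standardization
```

Using `np.allclose` instead of `==` avoids false mismatches caused by floating-point rounding.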

kmeans after whitening

from numpy import random
import numpy as np
from numpy import array
from scipy.cluster.vq import vq, kmeans, whiten
import matplotlib.pyplot as plt

random.seed((1000,2000))
codes = 3
print(kmeans(whitened, codes))
# Example output (centroids, distortion); the exact values depend on the random initialization:
# (array([[ 2.3110306 ,  2.86287398],
#        [ 1.32544402,  0.65607529],
#        [ 0.40782893,  2.02786907]]), 0.5196582527686241)

# Create 50 datapoints in two clusters a and b
pts = 50
a = np.random.multivariate_normal([0, 0], [[4, 1], [1, 4]], size=pts)  # mean, covariance, number of samples
b = np.random.multivariate_normal([30, 10],
               [[10, 2], [2, 1]],
               size=pts)
features = np.concatenate((a, b))
# Whiten data
whitened = whiten(features)
# Find 2 clusters in the data
codebook, distortion = kmeans(whitened, 2)  # returns the cluster centroids and the distortion
# Plot whitened data and cluster centers in red
plt.scatter(whitened[:, 0], whitened[:, 1])
# plt.scatter(features[:, 0], features[:, 1],c = 'k')
plt.scatter(codebook[:, 0], codebook[:, 1], c='r',label = 'centroid')

plt.legend()
plt.show()
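The centroids returned by kmeans live in the whitened space. A possible follow-up, sketched below, is to assign each point to its nearest centroid with `vq` and to map the centroids back to the original units by multiplying by the per-column standard deviation. The seed value, the use of `np.random.default_rng`, and the names `std_dev` and `centroids_original` are assumptions made for this illustration, not part of the original example.

```python
import numpy as np
from scipy.cluster.vq import kmeans, vq, whiten

rng = np.random.default_rng(0)  # arbitrary seed, chosen only for this sketch
a = rng.multivariate_normal([0, 0], [[4, 1], [1, 4]], size=50)
b = rng.multivariate_normal([30, 10], [[10, 2], [2, 1]], size=50)
features = np.concatenate((a, b))

# Keep the scaling factors so the centroids can be mapped back later
std_dev = features.std(axis=0)
whitened = whiten(features)

codebook, distortion = kmeans(whitened, 2)

# vq returns, for each observation, the index of the nearest centroid
# and the distance to that centroid
labels, dists = vq(whitened, codebook)

# whiten only divides by the standard deviation, so multiplying undoes it
centroids_original = codebook * std_dev

print(np.bincount(labels))   # number of points assigned to each centroid
print(centroids_original)    # centroids expressed in the original units
```

Because whitening does not subtract the mean, no offset needs to be added back when undoing the scaling.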
