Code source: Python数据分析与挖掘实战
The original source code contains the following errors, corrected here:
line 22: original: data.reshape; corrected: data.values.reshape
line 23: original: sort(0); corrected: sort_values(0)
line 24: original: pd.rolling_mean(c, 2).iloc[1:]; corrected: pd.DataFrame.rolling(c, 2).mean().iloc[1:]
line 31: original: [j for i in d[d==j]]; corrected: [i for i in d[d==j]]
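The three pandas corrections listed above can be tried in isolation with the minimal sketch below. It assumes a recent pandas release in which the old calls are no longer available; the toy values and the names `arr` and `midpoints` are hypothetical stand-ins, not taken from the book's dataset. The method form `c.rolling(2).mean()` is used here as an equivalent spelling of `pd.DataFrame.rolling(c, 2).mean()`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy values standing in for the real column.
data = pd.Series([0.3, 0.1, 0.4, 0.2])

# Old: data.reshape(...). Reshape the underlying NumPy array instead.
arr = data.values.reshape((len(data), 1))

# Old: DataFrame.sort(0). Sort the rows by column 0 with sort_values.
c = pd.DataFrame([[0.4], [0.1], [0.3]]).sort_values(0)

# Old: pd.rolling_mean(c, 2). Average each pair of adjacent rows;
# the first row has no predecessor and is NaN, so it is dropped.
midpoints = c.rolling(2).mean().iloc[1:]

print(arr.shape)
print(midpoints)
```

The rolling mean over the sorted values gives the midpoints between neighbouring rows, which is why the first (NaN) row is discarded with `.iloc[1:]`.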
import pandas as pd
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
plt.rcParams['font.sans-serif'] = ['SimHei']
plt.rcParams['axes.unicode_minus'] = False
datafile = '../data/discretization_data.xls'
data = pd.read_excel(datafile)
data = data[u'肝气郁结证型系数'].copy()
k = 4
d1 = pd.cut(data, k, labels=range(k))  # Equal-width discretization; the bins are labeled 0, 1, 2, 3 in order
# Equal-frequency discretization
w = [1.0*i/k for i in range(k+1)]
w = data.describe(percentiles=w)[4:4+k+1] #使用desc