feature_dict = {i:label for i,label in zip(range(4),('spepal length in cm','spepal width in cm','petal length in cm','petal width in cm'))} import pandas as pd df = pd.io.parsers.read_csv(filepath_or_buffer='链接(省略)',header=None,sep=',',) df columns = [l for i,l in sorted(feature_dict.items())] + ['class label'] df.dropna(how="all",inplace=True) df.tail()
映射:将y变成可衡量的数据格式
from sklearn.preprocessing import LabelEncoder X = df[['spepal length in cm','spepal width in cm','petal length in cm','petal width in cm']].values y = df['class label'].values enc = LabelEncoder() label_encoder = enc.fit(y) y = label_encoder.transform(y) + 1
转化后的结果:
![](https://i-blog.csdnimg.cn/blog_migrate/65d44377a4c224c1ccdc9377a6f782a7.png)
每类样例的均值:
import numpy as np np.set_printoptions(precision=4) mean_vectors = [] for c1 in range(1,4): mean_vectors.append(np.mean(X[y==c1],axis=0) print('Mean vector class %s:%s\n'%(c1,mean_vectors[c1-1]))