Clustering overview
Common clustering algorithms
- Partitioning (splitting) methods, e.g. k-means
- Hierarchical methods
- Density-based methods
The three clustering methods
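As a rough illustration of the three families listed above, the sketch below runs one scikit-learn estimator from each on the same toy data. The data is made up, and the choice of `AgglomerativeClustering` and `DBSCAN` as representatives, along with the `eps` and `min_samples` values, is an assumption made for this example rather than something the notes prescribe.

```python
import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering, DBSCAN

# Two well-separated blobs of made-up 2-D points (illustrative data only).
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=(0.0, 0.0), scale=0.3, size=(50, 2))
blob_b = rng.normal(loc=(5.0, 5.0), scale=0.3, size=(50, 2))
points = np.vstack([blob_a, blob_b])

# Partitioning: the number of clusters is fixed up front.
kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(points)

# Hierarchical: points are merged bottom-up until two clusters remain.
hier_labels = AgglomerativeClustering(n_clusters=2).fit_predict(points)

# Density-based: clusters are dense regions. eps and min_samples are guesses
# that suit this toy data; a label of -1 would mark a point as noise.
dbscan_labels = DBSCAN(eps=1.0, min_samples=5).fit_predict(points)

results = {
    "k-means": kmeans_labels,
    "hierarchical": hier_labels,
    "DBSCAN": dbscan_labels,
}
for name, labels in results.items():
    n_clusters = len(set(labels.tolist()) - {-1})
    print(name, "found", n_clusters, "clusters")
```

On data this cleanly separated all three approaches should agree. They tend to differ on real data, where cluster shapes, densities and noise vary.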
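k-means algorithm overview
- Randomly choose k points as the initial cluster centers
- Compute the distance from every point to each of the k centers
- Assign each point to the cluster of its nearest center
- Recompute each cluster center as the mean of the points assigned to it
- Compare the new centers with the previous ones: if they are unchanged, the clustering is final; otherwise repeat steps 2 to 5 (from the distance computation onward)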
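The five steps above can be sketched directly in NumPy. This is a minimal illustration, not how scikit-learn implements it: the function name `kmeans`, the `max_iter` and `seed` parameters, the use of Euclidean distance, and the handling of empty clusters are all assumptions made for the example.

```python
import numpy as np

def kmeans(points, k, max_iter=100, seed=0):
    """Plain k-means following the five steps; returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    # Step 1: pick k distinct data points as the initial centers.
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(max_iter):
        # Step 2: Euclidean distance from every point to every center, shape (n, k).
        distances = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        # Step 3: assign each point to its nearest center.
        labels = distances.argmin(axis=1)
        # Step 4: move each center to the mean of its assigned points
        # (an empty cluster keeps its previous center).
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        # Step 5: stop once the centers no longer move.
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels

# Six made-up points forming two obvious groups.
points = np.array([[1.0, 1.0], [1.2, 0.8], [0.8, 1.1],
                   [8.0, 8.0], [8.2, 7.9], [7.9, 8.3]])
centers, labels = kmeans(points, k=2)
print(labels)
print(centers)
```

Which group receives label 0 depends on the random initial centers, so only the grouping itself is meaningful, not the label numbers.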
k-means in practice
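# Cluster the rows of a CSV file into two groups
import os

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Resolve the data file relative to this script
BASE_DIR = os.path.dirname(os.path.abspath(__file__))
fname = os.path.join(BASE_DIR, 'data', 'luqu2.csv')
df = pd.read_csv(fname)

# Use columns 1-3 as features (column 0 is skipped)
x = df.iloc[:, 1:4].values

# n_jobs was removed from KMeans in scikit-learn 1.0, so it is not passed here
kms = KMeans(n_clusters=2, max_iter=500)
y = kms.fit_predict(x)

# Plot the cluster label of each row against its row index
x1 = np.arange(len(y))
plt.plot(x1, y, 'o')
plt.show()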
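Because `luqu2.csv` is not included with these notes, the following sketch repeats the same flow on made-up data and shows a few attributes of the fitted model that are useful for inspecting the result. The column names and the generated values are hypothetical stand-ins, and `n_init` and `random_state` are set only to make the run reproducible.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Made-up stand-in for the CSV: an id column followed by three numeric features.
# Column names and values are hypothetical.
rng = np.random.default_rng(42)
low = rng.normal(loc=[300, 2.5, 2.0], scale=[10, 0.2, 0.3], size=(20, 3))
high = rng.normal(loc=[700, 3.8, 4.5], scale=[10, 0.2, 0.3], size=(20, 3))
df = pd.DataFrame(np.vstack([low, high]), columns=["score_a", "score_b", "score_c"])
df.insert(0, "id", range(len(df)))

# Same column slice as the script above: skip column 0, keep the next three.
x = df.iloc[:, 1:4].values
kms = KMeans(n_clusters=2, n_init=10, max_iter=500, random_state=0)
y = kms.fit_predict(x)

print(kms.cluster_centers_)         # one row per cluster, one column per feature
print(kms.inertia_)                 # sum of squared distances to the nearest center
print(pd.Series(y).value_counts())  # number of rows assigned to each cluster
```

Looking at `cluster_centers_` is often more informative than the label-versus-index plot, since it shows what distinguishes the groups in terms of the original features.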
# Cluster Taobao items by price and comment count, read from MySQL
import matplotlib.pyplot as plt
import pandas as pd
import pymysql
from sklearn.cluster import KMeans

db = pymysql.connect(host='127.0.0.1', user='root', password='123456', database='taobao')
sql = 'select price, comment from taob limit 300;'
df = pd.read_sql(sql, con=db)
db.close()
x = df.values

# n_jobs was removed from KMeans in scikit-learn 1.0, so it is not passed here
kms = KMeans(n_clusters=3, max_iter=500)
y = kms.fit_predict(x)
print(y)

# Plot each item as (price, comment), with one marker style per cluster
for i in range(len(y)):
    x1 = df.iloc[i:i+1, 0:1].values
    y1 = df.iloc[i:i+1, 1:2].values
    if y[i] == 0:
        plt.plot(x1, y1, '*r')
    elif y[i] == 1:
        plt.plot(x1, y1, 'sy')
    else:
        plt.plot(x1, y1, 'pk')
plt.show()
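The loop above issues one plot call per point. A possible alternative is to select each cluster with a boolean mask and draw it in a single call, which is usually faster and makes a legend easy to add. The helper name `plot_clusters`, the sample data and the output file name are hypothetical; the marker styles are the ones used in the script above.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line to use plt.show() interactively
import matplotlib.pyplot as plt
import numpy as np

def plot_clusters(ax, points, labels, styles=("*r", "sy", "pk")):
    """Draw each cluster with its own marker, one plot call per cluster."""
    for cluster_id in np.unique(labels):
        mask = labels == cluster_id
        style = styles[int(cluster_id) % len(styles)]
        ax.plot(points[mask, 0], points[mask, 1], style, label=f"cluster {cluster_id}")
    ax.legend()
    return ax

# Made-up (price, comment count) pairs and labels, for illustration only.
points = np.array([[10.0, 5.0], [12.0, 7.0], [200.0, 150.0],
                   [210.0, 160.0], [900.0, 20.0], [950.0, 25.0]])
labels = np.array([0, 0, 1, 1, 2, 2])

fig, ax = plt.subplots()
plot_clusters(ax, points, labels)
ax.set_xlabel("price")
ax.set_ylabel("comment")
fig.savefig("clusters.png")
```

With the database example, `points` would be `df.values` and `labels` would be the array returned by `fit_predict`.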