Demo
Model & Solution
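- Objective function:
$$J = \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \, \lVert x_n - \mu_k \rVert^2$$
where $r_{nk} \in \{0, 1\}$ indicates whether sample $x_n$ is assigned to cluster $k$, and $\mu_k$ is the center of cluster $k$.
- Solution (alternate the two steps until the assignments stop changing):
$$r_{nk} = \begin{cases} 1, & k = \arg\min_j \lVert x_n - \mu_j \rVert^2 \\ 0, & \text{otherwise} \end{cases}$$
,
$$\mu_k = \frac{\sum_n r_{nk} \, x_n}{\sum_n r_{nk}}$$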
- General :
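The closed-form center update can be checked with a short derivation. This is a sketch that assumes the standard K-means distortion $J$ (squared Euclidean distance, binary indicators $r_{nk}$) and holds the assignments fixed while differentiating with respect to one center $\mu_k$:

```latex
\begin{aligned}
J &= \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \, \lVert x_n - \mu_k \rVert^2 \\
\frac{\partial J}{\partial \mu_k} &= -2 \sum_{n=1}^{N} r_{nk} \, (x_n - \mu_k) = 0 \\
\Rightarrow \quad \mu_k &= \frac{\sum_{n} r_{nk} \, x_n}{\sum_{n} r_{nk}}
\end{aligned}
```

With the centers fixed instead, $J$ splits into one independent term per sample, so assigning each $x_n$ to its nearest center minimizes it. Neither step can increase $J$, which is why the alternation converges, although possibly only to a local minimum.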
Relation to MoG
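The MoG (mixture of Gaussians) model [link to the MoG post] is:
$$p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)$$
Parameter updates (EM):
$$\gamma(z_{nk}) = \frac{\pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k)}{\sum_{j} \pi_j \, \mathcal{N}(x_n \mid \mu_j, \Sigma_j)}, \quad N_k = \sum_{n} \gamma(z_{nk})$$
$$\mu_k = \frac{1}{N_k} \sum_{n} \gamma(z_{nk}) \, x_n, \quad \Sigma_k = \frac{1}{N_k} \sum_{n} \gamma(z_{nk}) \, (x_n - \mu_k)(x_n - \mu_k)^T, \quad \pi_k = \frac{N_k}{N}$$
In particular, if the K Gaussian components all share the same covariance matrix $\epsilon I$, i.e.:
$$p(x \mid \mu_k, \Sigma_k) = \frac{1}{(2\pi\epsilon)^{D/2}} \exp\left\{ -\frac{1}{2\epsilon} \lVert x - \mu_k \rVert^2 \right\}$$
then the responsibility of each sample point is computed as:
$$\gamma(z_{nk}) = \frac{\pi_k \exp\{ -\lVert x_n - \mu_k \rVert^2 / 2\epsilon \}}{\sum_{j} \pi_j \exp\{ -\lVert x_n - \mu_j \rVert^2 / 2\epsilon \}}$$
As $\epsilon \to 0$,
$\gamma(z_{nk}) \to r_{nk}$, where $r_{nk}$ is the K-means indicator (1 for the nearest center, 0 otherwise). That is, the responsibility switches from the soft mode to the hard mode of K-means. Btw, in this limit the points of each cluster are concentrated at that cluster's center!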
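The soft-to-hard transition can be observed numerically. The following is a minimal sketch, assuming components with shared covariance $\epsilon I$; the function name `responsibilities` and the toy points, centers, and mixing weights are made up for illustration:

```python
import numpy as np

def responsibilities(X, mus, pis, eps):
    """Soft responsibilities for Gaussians sharing the covariance eps * I."""
    # Squared Euclidean distance from every point to every center: shape (N, K)
    sq_dist = ((X[:, None, :] - mus[None, :, :]) ** 2).sum(axis=2)
    # Work in log space and subtract the row maximum so a tiny eps does not underflow
    log_w = np.log(pis)[None, :] - sq_dist / (2.0 * eps)
    log_w -= log_w.max(axis=1, keepdims=True)
    w = np.exp(log_w)
    # Normalize each row so the responsibilities of one point sum to 1
    return w / w.sum(axis=1, keepdims=True)

# Toy data (assumed values, for illustration only)
X = np.array([[0.0, 0.0], [1.0, 0.0], [4.0, 0.0]])
mus = np.array([[0.0, 0.0], [3.0, 0.0]])
pis = np.array([0.5, 0.5])

for eps in (10.0, 1.0, 0.01):
    print("eps =", eps)
    print(np.round(responsibilities(X, mus, pis, eps), 3))
```

For a large `eps` every point has noticeable weight on both centers. As `eps` shrinks, each row approaches a one-hot vector that selects the nearest center, which is the K-means assignment.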
Code
- utils
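import numpy as np

# Euclidean distance between two points
def distEclud(x, y):
    return np.sqrt(np.sum((x - y) ** 2))

# initial centers: k distinct samples picked at random from the data set
def randCent(dataSet, k):
    m = dataSet.shape[0]
    # sample without replacement so that no two initial centers coincide
    indices = np.random.choice(m, size=k, replace=False)
    # copy as float so later center updates never modify dataSet
    centroids = np.array(dataSet[indices, :], dtype=float)
    return centroids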
- K-means
def KMeans(dataSet, k):
    m = np.shape(dataSet)[0]
    # column 0: cluster index (-1 = not assigned yet), column 1: squared distance to its center
    clusterAssment = np.zeros((m, 2))
    clusterAssment[:, 0] = -1
    clusterChange = True
    # initial centroids
    centroids = randCent(dataSet, k)
    while clusterChange:
        clusterChange = False
        for i in range(m):
            minDist = np.inf
            minIndex = -1
            # update pseudo_label for each instance
            for j in range(k):
                distance = distEclud(centroids[j, :], dataSet[i, :])
                if distance < minDist:
                    minDist = distance
                    minIndex = j
            if clusterAssment[i, 0] != minIndex:
                clusterChange = True
            clusterAssment[i, :] = minIndex, minDist ** 2
        # update pseudo_center for each cluster
        for j in range(k):
            pointsInCluster = dataSet[clusterAssment[:, 0] == j]
            # keep the old center if the cluster is empty (its mean would be NaN)
            if len(pointsInCluster) > 0:
                centroids[j, :] = np.mean(pointsInCluster, axis=0)
    return centroids, clusterAssment
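The same two-step loop can also be written with NumPy broadcasting instead of explicit Python loops over samples. The following is a sketch only, not the author's implementation: the name `kmeans_vectorized`, the `seed` and `max_iter` parameters, and the synthetic two-blob data are all assumptions made for illustration. It returns a 1-D label array rather than the two-column `clusterAssment`.

```python
import numpy as np

def kmeans_vectorized(X, k, seed=0, max_iter=100):
    """Hypothetical vectorized variant of the loop-based KMeans above."""
    rng = np.random.default_rng(seed)
    # Initial centers: k distinct samples drawn without replacement
    centroids = X[rng.choice(X.shape[0], size=k, replace=False)].astype(float)
    labels = np.full(X.shape[0], -1)
    for _ in range(max_iter):
        # Assignment step: squared distances of shape (m, k), then the nearest center
        sq_dist = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = sq_dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments unchanged, so the centers would not move either
        labels = new_labels
        # Update step: mean of each non-empty cluster
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    return centroids, labels

# Two well-separated synthetic blobs (made-up data, for illustration only)
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 0.5, size=(50, 2)),
               rng.normal(5.0, 0.5, size=(50, 2))])
centroids, labels = kmeans_vectorized(X, k=2)
print(centroids)
```

The broadcasting step builds an (m, k, n) intermediate array, so this form trades memory for speed. For large data sets the distances could be computed in chunks instead.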
Application
Reference
[1] Bishop, Christopher M. Pattern Recognition and Machine Learning. Springer Science+Business Media, 2006.