Unsupervised Nearest Neighbors Clustering With Application to Hyperspectral Images

Abstract
KSEM is a stochastic extension of the kNN density-based clustering method KNNCLUST: it randomly assigns objects to clusters by sampling a local posterior class-label distribution.


Notations

$X$: Dataset, i.e. $X=\{x_i\}$, $x_i\in\mathbb{R}^d$, $i=1,\ldots,n$.
$C_i$: Discrete random variable corresponding to the class label held by object $x_i$.
$c_i$: Outcome label sampled from some distribution on $C_i$.
$c$: $c=[c_1,\ldots,c_n]^T$, the vector of cluster labels.
$p(C_i\mid x_i;\{x_j,c_j\}_{j\neq i})$: Local posterior distribution of $C_i$.
$\kappa(i)$: Set of indices of the $k$ NNs of $x_i$.
$\Omega(i)$: $\{c_j\mid j\in\kappa(i)\}$, the set of labels held by the $k$ NNs of $x_i$.

Algorithm

The local posterior label distribution in KSEM is first modelled as:

$$\hat{p}(C_i=c_L\mid x_i;\{x_j,c_j\}_{j\in\kappa(i)})\propto\sum_{j\in\kappa(i)}g(x_j,x_i)\,\delta_{c_j c_L}\tag{1}$$

for $c_L\in\Omega(i)$, $1\le i\le n$, where $g$ is a (non-negative) kernel function defined on $\mathbb{R}^d$ and $\delta_{ij}$ is the Kronecker delta. Though many kernel functions could be used, in this work they restrict themselves to the following Gaussian kernel:
$$g(x,x_i)=\frac{1}{\left(\sqrt{2\pi}\,d_{k,X}(x_i)\right)^d}\exp\left(-\frac{1}{2}\,\frac{\|x-x_i\|_2^2}{d_{k,X}^2(x_i)}\right),\tag{2}$$

where $x\in\mathbb{R}^d$ and $d_{k,S}(x_i)$ denotes the distance from $x_i$ to its $k$-th NN in the set $S$. They then propose the following estimate of the posterior label distribution:
$$\hat{p}_\alpha(C_i=c_L\mid x_i;\{x_j,c_j\}_{j\in\kappa(i)})=\frac{\left[\sum_{j\in\kappa(i)}g(x_j,x_i)\,\delta_{c_j c_L}\right]^\alpha}{\sum_{c_m\in\Omega(i)}\left[\sum_{j\in\kappa(i)}g(x_j,x_i)\,\delta_{c_j c_m}\right]^\alpha}\tag{3}$$

for $c_L\in\Omega(i)$, $1\le i\le n$, where $\alpha\in[1,+\infty)$ is a parameter controlling the degree of determinism in the construction of the pseudo-sample: $\alpha=1$ corresponds to the SEM (stochastic) scheme, while $\alpha\to+\infty$ corresponds to the CEM (deterministic) scheme, yielding a labeling rule similar to KNNCLUST's. In this work, they recommend setting $\alpha=1.2$.
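As a concrete illustration, here is a minimal NumPy sketch of the stochastic labeling step defined by (1)-(3). The array names (`knn_idx` for $\kappa$, `bandwidth` for $d_{k,X}$) are illustrative assumptions, not identifiers from the paper:

```python
import numpy as np

def sample_label(i, X, labels, knn_idx, bandwidth, alpha=1.2, rng=None):
    """Sample a label for object i from the local posterior (3).

    knn_idx[i] holds kappa(i), the indices of the k NNs of X[i];
    bandwidth[i] holds d_{k,X}(x_i), the distance from x_i to its k-th NN.
    """
    rng = np.random.default_rng() if rng is None else rng
    d = X.shape[1]
    neigh = knn_idx[i]
    # Gaussian kernel (2) evaluated at the k NNs of x_i.
    sq = np.sum((X[neigh] - X[i]) ** 2, axis=1)
    g = np.exp(-0.5 * sq / bandwidth[i] ** 2) / (np.sqrt(2 * np.pi) * bandwidth[i]) ** d
    # Kernel mass per candidate label c_L in Omega(i), eq. (1).
    cands = np.unique(labels[neigh])
    mass = np.array([g[labels[neigh] == cL].sum() for cL in cands])
    # Sharpen with alpha and normalize, eq. (3), then sample one label.
    p = mass ** alpha
    return rng.choice(cands, p=p / p.sum())
```

Note that since (3) normalizes over $\Omega(i)$, the kernel's normalization constant cancels and could be dropped in practice.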
Letting $S_{c_L}=\{x_i\in X\mid c_i=c_L\}$, the Kozachenko-Leonenko conditional differential entropy estimate writes:
$$\hat{h}(X\mid c_L)=\frac{d}{n_L}\sum_{x_i\in S_{c_L}}\ln d_{k,S_{c_L}}(x_i)+\ln(n_L-1)-\psi(k)+\ln V_d\tag{4}$$

for $c_L\in\Omega$, where $n_L=|S_{c_L}|$, $\psi(k)=\Gamma'(k)/\Gamma(k)$ is the digamma function, $\Gamma(k)$ is the gamma function, and $V_d=\pi^{d/2}/\Gamma(d/2+1)$ is the volume of the unit ball in $\mathbb{R}^d$. An overall clustering entropy measure can be obtained from the conditional entropies (4) as:
$$\hat{h}(X\mid c)=\frac{1}{n}\sum_{c_L\in\Omega}n_L\,\hat{h}(X\mid c_L)\tag{5}$$

This measure lends itself naturally to a stopping criterion. Since objects are aggregated into previously formed clusters during the iterations, the individual class-conditional entropies can only increase, and so does the conditional entropy (5). When convergence is reached, however, this measure attains an upper limit, so a stopping criterion can be set up from its relative magnitude variation $\Delta h=\left|\hat{h}(X\mid c^{(t)})-\hat{h}(X\mid c^{(t-1)})\right|/\hat{h}(X\mid c^{(t-1)})$, where $c^{(t)}$ is the vector of cluster labels at iteration $t$. The stopping criterion $\Delta h<10^{-4}$ is recommended.
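A corresponding sketch of (4) and (5), assuming SciPy is available; the guard against small clusters is an assumption of this sketch (the estimate (4) needs a $k$-th within-cluster neighbour and $n_L>1$):

```python
import numpy as np
from scipy.special import digamma, gammaln

def conditional_entropy(S, k):
    """Kozachenko-Leonenko estimate (4) for one cluster S of shape (n_L, d)."""
    n_L, d = S.shape
    # Distance from each point to its k-th NN within the cluster: d_{k,S_cL}(x_i).
    dist = np.linalg.norm(S[:, None, :] - S[None, :, :], axis=-1)
    kth = np.sort(dist, axis=1)[:, k]                       # column 0 is the point itself
    log_Vd = (d / 2) * np.log(np.pi) - gammaln(d / 2 + 1)   # ln V_d
    return (d / n_L) * np.log(kth).sum() + np.log(n_L - 1) - digamma(k) + log_Vd

def clustering_entropy(X, labels, k):
    """Overall measure (5): size-weighted average of the per-cluster estimates."""
    total = 0.0
    for cL in np.unique(labels):
        S = X[labels == cL]
        if len(S) >= k + 2:     # assumption: skip clusters too small for (4)
            total += len(S) * conditional_entropy(S, k)
    return total / X.shape[0]
```

The stopping test between successive iterations is then simply `abs(h_t - h_prev) / h_prev < 1e-4`.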

Pseudo-code
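In lieu of the paper's pseudo-code, here is one possible sketch of the main KSEM loop, assuming the `sample_label` and `clustering_entropy` helpers above; the singleton-cluster initialization and the random sweep order are assumptions, not details taken from the paper:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def ksem(X, k, alpha=1.2, tol=1e-4, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Precompute kappa(i) and d_{k,X}(x_i) for all objects.
    dist, idx = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X)
    knn_idx, bandwidth = idx[:, 1:], dist[:, k]     # drop each point's self-match
    labels = np.arange(n)                           # start from singleton clusters
    h_prev = None
    for _ in range(max_iter):
        for i in rng.permutation(n):                # one stochastic sweep
            labels[i] = sample_label(i, X, labels, knn_idx, bandwidth, alpha, rng)
        h = clustering_entropy(X, labels, k)
        # Stop when the relative variation of (5) falls below tol.
        if h_prev is not None and abs(h - h_prev) <= tol * abs(h_prev):
            break
        h_prev = h
    return labels
```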

Application

Despite the reduction in complexity brought by the kNN search, image segmentation by unsupervised clustering of pixels with KSEM remains computationally demanding, which can severely limit its use on large images. In multivariate (multispectral/hyperspectral) imagery in particular, the objects of interest are grouped primarily according to their spectral characteristics. To help the clustering of image pixels, one often exploits spatial information, namely the fact that two neighboring pixels are likely to belong to the same cluster. They therefore limit the search for a pixel's kNNs to a subset of its spatial neighbors, selected via a predefined sampling pattern whose local sampling density is inversely proportional to the distance from the central (query) pixel (as shown in the figure below).
[Figure: spatial sampling pattern around the query pixel; sampling density decreases with distance from the center]
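The paper's exact pattern is given in the figure; purely as an illustration, one simple way to build a pattern whose density decays roughly as the inverse of the distance is to place a fixed number of offsets on rings of geometrically growing radius (the radii and per-ring counts below are assumptions):

```python
import numpy as np

def sampling_pattern(radii=(1, 2, 4, 8, 16), per_ring=8):
    """Pixel offsets around the query pixel; a constant count per ring with
    geometrically spaced radii gives a density decaying with distance."""
    offsets = set()
    for r in radii:
        for t in np.linspace(0.0, 2.0 * np.pi, per_ring, endpoint=False):
            dy, dx = int(round(r * np.sin(t))), int(round(r * np.cos(t)))
            if (dy, dx) != (0, 0):
                offsets.add((dy, dx))
    return sorted(offsets)
```

A pixel's kNNs are then searched only among the spectra found at these offsets (clipped at the image borders), which makes the per-pixel search cost independent of the image size.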
