Unsupervised Nearest Neighbors Clustering With Application to Hyperspectral Images


gyarenas


Category: Paper reading. Tags: clustering

Article link: https://blog.csdn.net/gyarenas/article/details/51587958


A dynamic niching clustering algorithm based on individual-connectedness and its application to color image segmentation

Abstract
The paper proposes KSEM, a stochastic extension of the $k$NN density-based clustering (KNNCLUST) method, which assigns objects to clusters at random by sampling from a posterior class label distribution.

Notations

$X$ : Dataset, i.e. $X=\{x_i\}$ , $x_i\in \mathbb{R}^d$ , $i=1,\dots,n$ .
$C_i$ : Discrete random variable corresponding to the class label held by object $x_i$ .
$c_i$ : Outcome label sampled from some distribution on $C_i$ .
$c$ : Vector of cluster labels, $c={[c_1,\dots, c_n]}^T$ .
$p\left(C_i|x_i;\{x_j,c_j\}_{j\not=i}\right)$ : Local posterior distribution of $C_i$ .
$\kappa(i)$ : Set of indices of the $k$ NNs of $x_i$ .
$\Omega(i)$ : $\{c_j|j\in\kappa(i)\}$ .
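
To make the notation concrete, here is a minimal NumPy sketch that builds $\kappa(i)$ and $\Omega(i)$ for a toy dataset. The helper name `knn_indices`, the brute-force search, and the choice to exclude $x_i$ from its own neighbor list are assumptions of this sketch, not something stated in the notes above.

```python
import numpy as np

def knn_indices(X, k):
    """Return an (n, k) array whose row i holds kappa(i), the indices of the k NNs of x_i."""
    # Pairwise squared Euclidean distances (brute force, O(n^2) memory).
    diff = X[:, None, :] - X[None, :, :]
    d2 = np.einsum("ijk,ijk->ij", diff, diff)
    np.fill_diagonal(d2, np.inf)  # assumption: x_i is not counted as its own neighbor
    return np.argsort(d2, axis=1)[:, :k]

X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.2], [5.0, 5.0], [5.1, 5.0]])
c = np.array([0, 0, 1, 2, 2])                      # current label vector c
kappa = knn_indices(X, k=2)                        # kappa(i) for every i
omega = [set(c[row].tolist()) for row in kappa]    # Omega(i) for every i
```

Each `omega[i]` is the set of candidate labels that object $x_i$ can take at the next update.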

Algorithm

In KSEM, the local posterior label distribution is first modelled as:

$$\hat{p}\left(C_i=c_{\mathscr{L}}|x_i;\{x_j,c_j\}_{j\in\kappa(i)}\right)\propto \sum_{j\in\kappa(i)}{g(x_j,x_i)\delta_{c_j c_\mathscr{L}}}\tag{1}$$

$\forall c_{\mathscr{L}}\in\Omega(i)$ , $1\leq i\leq n$ , where $g$ is a (non-negative) kernel function defined on $\mathbb{R}^d$ and $\delta_{ij}$ is the Kronecker delta. Although many kernel functions could be used, the authors restrict themselves to the following Gaussian kernel:

$$g(x,x_i)=\frac{1}{\left(\sqrt{2\pi}\,d_{k,\kappa(x_i)}(x_i)\right)^d}\exp{\left(-\frac{1}{2}\frac{\|x-x_i\|_2^2}{d_{k,\kappa(x_i)}^2(x_i)}\right)},\tag{2}$$

where $x\in\mathbb{R}^d$ and $d_{k,\kappa(x_i)}(x_i)$ denotes the distance from $x_i$ to its $k$th NN. The authors then propose the following estimate of the posterior label distribution:

$$\hat{p}_\alpha\left(C_i=c_{\mathscr{L}}|x_i;\{x_j,c_j\}_{j\in\kappa(i)}\right)=\frac{\left[\sum\limits_{j\in\kappa(i)}{g(x_j,x_i)\delta_{c_jc_\mathscr{L}}}\right]^\alpha}{\sum\limits_{c_m\in\Omega(i)}\left[\sum\limits_{j\in\kappa(i)}{g(x_j,x_i)\delta_{c_jc_m}}\right]^\alpha}\tag{3}$$

$\forall c_{\mathscr{L}}\in \Omega(i)$ , $1\leq i\leq n$ , where $\alpha \in [1, +\infty)$ is a parameter controlling the degree of determinism in the construction of the pseudo-sample: $\alpha=1$ corresponds to the SEM (stochastic) scheme, while $\alpha \rightarrow +\infty$ corresponds to the CEM (deterministic) scheme, which leads to a labeling rule similar to that of KNNCLUST. The authors recommend setting $\alpha=1.2$ .
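
As an illustration of Eqs. (2) and (3), the sketch below evaluates the tempered posterior for one object and samples a new label from it. The function names, the toy data, and the way the bandwidth is read off the neighbor list (the largest distance among the listed $k$ NNs) are assumptions made for this example; it also assumes no duplicate points, so the bandwidth is non-zero.

```python
import numpy as np

def gaussian_kernel(x, x_i, d_k):
    """Eq. (2): Gaussian kernel centred on x_i with bandwidth d_k (k-th NN distance of x_i)."""
    d = x_i.shape[0]
    sq_dist = np.sum((x - x_i) ** 2)
    return np.exp(-0.5 * sq_dist / d_k**2) / (np.sqrt(2.0 * np.pi) * d_k) ** d

def label_posterior(i, X, c, kappa_i, alpha=1.2):
    """Eq. (3): tempered posterior over the labels in Omega(i)."""
    neighbors = X[kappa_i]
    # Bandwidth: distance from x_i to its farthest listed neighbor, i.e. the k-th NN.
    d_k = np.max(np.linalg.norm(neighbors - X[i], axis=1))
    weights = np.array([gaussian_kernel(x_j, X[i], d_k) for x_j in neighbors])
    labels = np.unique(c[kappa_i])                       # Omega(i), sorted
    # Inner sums of Eq. (3): kernel mass carried by each candidate label.
    scores = np.array([weights[c[kappa_i] == l].sum() for l in labels])
    tempered = scores ** alpha
    return labels, tempered / tempered.sum()

rng = np.random.default_rng(0)
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.2], [0.3, 0.1]])
c = np.array([0, 1, 1, 2])
labels, probs = label_posterior(0, X, c, kappa_i=np.array([1, 2, 3]), alpha=1.2)
new_label = rng.choice(labels, p=probs)                  # stochastic assignment of c_0
```

With a large `alpha` the probability mass concentrates on the label with the largest kernel sum, which is the deterministic (CEM-like) limit described above.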
Letting $S_{c_\mathscr{L}}=\{x_i\in X|c_i=c_{\mathscr{L}}\}$ , the Kozachenko-Leonenko estimate of the conditional differential entropy reads:

$$\hat{h}(X|c_{\mathscr{L}})=\frac{d}{n_\mathscr{L}}\sum_{x_i\in S_{c_\mathscr{L}}}\ln d_{k,S_{c_\mathscr{L}}}(x_i)+\ln(n_\mathscr{L}-1)-\psi(k)+\ln V_d\tag{4}$$

$\forall c_\mathscr{L} \in \Omega$ , where $n_\mathscr{L}=|S_{c_\mathscr{L}}|$ , $\psi(k)=\Gamma'(k)/\Gamma(k)$ is the digamma function, $\Gamma(k)$ is the gamma function, and $V_d=\pi^{d/2}/\Gamma(d/2+1)$ is the volume of the unit ball in $\mathbb{R}^d$ . An overall clustering entropy measure can be obtained from the conditional entropies (4) as:

$$\hat{h}(X|c)=\frac{1}{n}\sum_{c_\mathscr{L}\in\Omega}{n_\mathscr{L}\hat{h}(X|c_\mathscr{L})}\tag{5}$$
This measure can be used quite naturally as a stopping criterion during the iterations. Since objects are aggregated into previously formed clusters as the iterations proceed, the individual class-conditional entropies can only increase, and so does the conditional entropy (5). When convergence is achieved, this measure reaches an upper limit, so a stopping criterion can be based on its relative magnitude variation $\Delta_h=|\hat{h}(X|c^{(t)})-\hat{h}(X|c^{(t-1)})|/\hat{h}(X|c^{(t-1)})$ , where $c^{(t)}$ is the vector of cluster labels at iteration $t$ . The recommended stopping criterion is $\Delta_h<{10}^{-4}$ .
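
The entropy-based stopping rule of Eqs. (4) and (5) can be sketched as follows. This is only an illustration under stated assumptions: the function names are made up for the example, every cluster is assumed to hold more than $k$ distinct points, $k$ is a positive integer (so $\psi(k)$ can be computed from the harmonic sum), and the absolute value in the denominator of `relative_change` is added here to keep the ratio non-negative when the differential entropy is negative.

```python
import math
import numpy as np

def digamma_int(k):
    """psi(k) for a positive integer k: -gamma + sum_{j=1}^{k-1} 1/j."""
    return -np.euler_gamma + sum(1.0 / j for j in range(1, k))

def kl_entropy(S, k):
    """Eq. (4): Kozachenko-Leonenko entropy estimate for the points of one cluster S."""
    n_l, d = S.shape
    if n_l <= k:
        raise ValueError("cluster needs more than k points")  # assumption of this sketch
    dists = np.linalg.norm(S[:, None, :] - S[None, :, :], axis=2)
    # Column 0 of each sorted row is the zero self-distance, so column k is the k-th NN.
    d_k = np.sort(dists, axis=1)[:, k]
    log_v_d = 0.5 * d * math.log(math.pi) - math.lgamma(0.5 * d + 1.0)   # ln V_d
    return d * np.mean(np.log(d_k)) + math.log(n_l - 1) - digamma_int(k) + log_v_d

def clustering_entropy(X, c, k):
    """Eq. (5): size-weighted average of the class-conditional entropies."""
    total = 0.0
    for label in np.unique(c):
        S = X[c == label]
        total += len(S) * kl_entropy(S, k)
    return total / len(X)

def relative_change(h_new, h_old):
    """Delta_h, to be compared against the 1e-4 threshold."""
    return abs(h_new - h_old) / abs(h_old)   # abs() in the denominator is an assumption

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
c = (X[:, 0] > 0).astype(int)                # toy two-cluster labelling
h_old = clustering_entropy(X, c, k=5)
h_new = clustering_entropy(X, np.zeros(200, dtype=int), k=5)
converged = relative_change(h_new, h_old) < 1e-4
```

In an actual run, `h_old` and `h_new` would be the values of (5) at two consecutive iterations rather than at two hand-made labellings.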

Pseudo-code
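
What follows is not the paper's pseudo-code but a minimal Python sketch of a KSEM-style loop assembled from Eqs. (2) and (3). Several points are assumptions of this sketch: every object starts in its own cluster, labels are updated sequentially in a random visiting order, the loop runs for a fixed number of sweeps (`n_sweeps`) instead of using the entropy criterion, and the normalising constant of Eq. (2) is dropped because it is the same for all neighbors of $x_i$ and cancels in Eq. (3).

```python
import numpy as np

def ksem_sketch(X, k, alpha=1.2, n_sweeps=20, seed=0):
    """Hypothetical KSEM-style loop: resample every label from Eq. (3) for a fixed number of sweeps."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)                       # x_i is not its own neighbor
    kappa = np.argsort(dists, axis=1)[:, :k]              # kappa(i)
    nn_dist = np.take_along_axis(dists, kappa, axis=1)    # distances to the k NNs, ascending
    d_k = nn_dist[:, -1:]                                 # k-th NN distance of each x_i
    # Unnormalised Eq. (2): g[i, j] is the kernel weight of the j-th neighbor of x_i.
    g = np.exp(-0.5 * (nn_dist / d_k) ** 2)
    c = np.arange(n)                                      # assumption: one cluster per object at start
    for _ in range(n_sweeps):
        for i in rng.permutation(n):                      # assumption: sequential, random order
            labels, inverse = np.unique(c[kappa[i]], return_inverse=True)   # Omega(i)
            scores = np.bincount(inverse, weights=g[i]) ** alpha            # numerators of Eq. (3)
            c[i] = rng.choice(labels, p=scores / scores.sum())
    return c

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.1, (30, 2)), rng.normal(10, 0.1, (30, 2))])
labels = ksem_sketch(X, k=5)
```

Because a label can only spread through neighbor lists, two groups of points whose $k$ NNs never cross cannot end up sharing a label, and the number of distinct labels never increases from one update to the next.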

Application

Despite the reduction in complexity brought by the $k$NN search, segmenting an image by unsupervised clustering of its pixels with KSEM remains computationally demanding, which can severely limit its use for large images. In the particular domain of multivariate imagery (multispectral/hyperspectral), the objects of interest are grouped primarily on the basis of their spectral characteristics. To help the clustering of image pixels, one often also uses spatial information, namely the fact that two neighboring pixels are likely to belong to the same cluster. The authors therefore limit the search for a pixel's $k$ NNs to a subset of its spatial neighbors, selected via a predefined sampling pattern. Specifically, the pattern has a local sampling density inversely proportional to the distance from the central (query) point (as shown in the figure below).
[Figure: spatial sampling pattern around the central (query) pixel]
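
One way to obtain a pattern with this kind of density is sketched below. It is not the pattern used in the paper: the function name `sampling_offsets`, the random (rather than fixed) construction, and the parameters `radius` and `n_samples` are all hypothetical. The idea is that drawing the radius and the angle uniformly gives, in 2-D, a point density per unit area proportional to $1/r$.

```python
import numpy as np

def sampling_offsets(radius, n_samples, seed=0):
    """Hypothetical pattern: pixel offsets whose density falls off roughly as 1/r from the query pixel."""
    rng = np.random.default_rng(seed)
    # Uniform radius and uniform angle give an areal density proportional to 1/r in 2-D.
    r = rng.uniform(1.0, radius, n_samples)
    theta = rng.uniform(0.0, 2.0 * np.pi, n_samples)
    rows_cols = np.stack([r * np.sin(theta), r * np.cos(theta)], axis=1)
    offsets = np.rint(rows_cols).astype(int)              # snap to the pixel grid
    offsets = np.unique(offsets, axis=0)                  # drop duplicate pixels
    return offsets[np.any(offsets != 0, axis=1)]          # drop the query pixel itself

offsets = sampling_offsets(radius=15, n_samples=200)
```

For a query pixel at `(row, col)`, the candidate neighbors would be the pixels at `(row, col) + offsets` that fall inside the image, and the $k$NN search in spectral space would then be restricted to those candidates. Rounding and de-duplication flatten the density close to the centre, so the $1/r$ behaviour is only approximate.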
