There are two major issues to consider in clustering:
What is a good inter-cluster distance?
Single-link: Using the distance between the closest elements
Complete-link: Using the distance between the furthest elements
Group average: Using the average distance over all pairs of elements, one from each cluster
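The three linkage criteria above can be sketched directly from a pairwise distance matrix. A minimal example (function name and sample points are illustrative):

```python
import numpy as np

def cluster_distances(A, B):
    """Inter-cluster distances between clusters A and B (rows are points)."""
    # Pairwise Euclidean distance matrix via broadcasting
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
    return {
        "single-link": d.min(),      # distance between the closest pair
        "complete-link": d.max(),    # distance between the furthest pair
        "group-average": d.mean(),   # average over all cross-cluster pairs
    }

A = np.array([[0.0, 0.0], [1.0, 0.0]])
B = np.array([[3.0, 0.0], [5.0, 0.0]])
print(cluster_distances(A, B))  # single: 2.0, complete: 5.0, average: 3.5
```

Note that single-link tends to produce elongated "chained" clusters, while complete-link favors compact ones; group average sits in between.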
How many clusters are there?
It can be given a priori, e.g., the number of desired regions
It can be inferred indirectly by setting a distance threshold below which two points are considered to belong to the same cluster.
Two problems to avoid
Under-fitting: too few clusters
Over-fitting: too many clusters
Two clustering strategies
Agglomerative clustering
Divisive clustering
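Agglomerative clustering with a distance threshold combines both ideas above: merging stops once the closest pair of clusters is further apart than the threshold, so the number of clusters falls out of the threshold rather than being given a priori. A naive single-link sketch (the function name, data, and threshold are illustrative):

```python
import numpy as np

def agglomerative_single_link(X, threshold):
    """Bottom-up clustering: start with one cluster per point and repeatedly
    merge the closest pair (single-link distance) until the smallest
    inter-cluster distance exceeds the threshold."""
    clusters = [[i] for i in range(len(X))]
    while len(clusters) > 1:
        best = (None, None, np.inf)
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single-link: distance between the closest pair of members
                d = min(np.linalg.norm(X[a] - X[b])
                        for a in clusters[i] for b in clusters[j])
                if d < best[2]:
                    best = (i, j, d)
        i, j, d = best
        if d > threshold:          # stopping rule infers the cluster count
            break
        clusters[i] += clusters[j]
        del clusters[j]
    return clusters

X = np.array([[0., 0.], [0.2, 0.], [5., 5.], [5.2, 5.]])
print(agglomerative_single_link(X, threshold=1.0))  # two clusters: [0, 1] and [2, 3]
```

Setting the threshold too large under-fits (too few clusters), too small over-fits (too many), matching the two problems noted above. Divisive clustering works in the opposite direction, recursively splitting a single all-inclusive cluster.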
Missing Data Problem: Example
Let us consider an example of a missing-data problem.
Assume that people can be classified into three groups according to physical size: big, medium, and small.
Each group is characterized by its population percentage and a 2-D Gaussian describing the joint distribution of weight and height.
The reason for using Gaussian distributions instead of hard thresholds is the uncertainty, or error, in weight and height measurements.
Now you are given data from a certain population along with two tasks:
Estimate the model parameters for each class
Classify each data point into one of three classes
That means the class labels are missing from the data we have collected, and we need to estimate them.
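The two tasks can be tackled jointly with the Expectation-Maximization (EM) algorithm, which alternates between soft-labeling the data and re-estimating the class parameters. A minimal 1-D sketch (the example above is 2-D; the data, initial values, and group parameters here are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic 1-D "weights" drawn from three assumed groups
y = np.concatenate([rng.normal(50, 5, 300),   # small
                    rng.normal(70, 5, 400),   # medium
                    rng.normal(95, 5, 300)])  # big

K = 3
priors = np.full(K, 1 / K)             # population percentages (unknown)
mu = np.array([40., 60., 80.])         # initial guesses for the means
var = np.full(K, 25.)                  # initial guesses for the variances

for _ in range(50):
    # E-step: posterior responsibility of each class for each sample
    lik = np.exp(-(y[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
    post = priors * lik
    post /= post.sum(axis=1, keepdims=True)
    # M-step: re-estimate parameters from the soft labels
    Nk = post.sum(axis=0)
    priors = Nk / len(y)
    mu = (post * y[:, None]).sum(axis=0) / Nk
    var = (post * (y[:, None] - mu) ** 2).sum(axis=0) / Nk

labels = post.argmax(axis=1)   # task 2: classify each point into one class
print(np.sort(mu))             # task 1: estimated means, near 50, 70, 95
```

Each iteration fills in the missing labels (softly) given the current parameters, then refits the parameters given those labels.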
Probabilistic Formulation
Prior probability: what you know before you even see the data or the observation. It is like your prior knowledge.
Likelihood function: a way to evaluate how likely a data sample is to have been generated by a certain class. It is like your evidence.
Posterior probability: based on what you see and what you know, the probability that a data sample y belongs to a certain class label. It is like the estimate of the missing data.
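The three quantities combine through Bayes' rule: the posterior is proportional to the prior times the likelihood. A small sketch for the size-group example (all priors and class parameters below are hypothetical):

```python
import numpy as np

def gaussian_pdf(y, mu, sigma):
    """1-D Gaussian likelihood of observation y under N(mu, sigma^2)."""
    return np.exp(-(y - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

# Hypothetical priors and class-conditional parameters: small, medium, big
priors = np.array([0.3, 0.5, 0.2])   # P(class), the population percentages
mus    = np.array([50., 70., 95.])   # class mean weights (assumed, in kg)
sigmas = np.array([5., 5., 5.])

y = 72.0                                      # an observed weight
lik = gaussian_pdf(y, mus, sigmas)            # likelihood P(y | class)
post = priors * lik / (priors * lik).sum()    # posterior P(class | y)
print(post.round(3))   # the "medium" class dominates for this sample
```

Classifying the sample then amounts to picking the class with the highest posterior, which is exactly the soft label that EM estimates for every data point.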