F1 score
In statistical analysis of binary classification, the F1 score (also F-score or F-measure) is a measure of a test's accuracy. It considers both the precision p and the recall r of the test to compute the score:
- p is the number of correct positive results divided by the number of all positive results returned by the classifier
- r is the number of correct positive results divided by the number of all relevant samples (all samples that should have been identified as positive).
Two examples (from Wikipedia) explaining the two concepts:
Suppose a computer program for recognizing dogs in photographs identifies 8 dogs in a picture containing 12 dogs and some cats. Of the 8 identified as dogs, 5 actually are dogs (true positives), while the rest are cats (false positives). The program’s precision is 5/8 while its recall is 5/12.
When a search engine returns 30 pages, only 20 of which are relevant, while failing to return 40 additional relevant pages, its precision is 20/30 = 2/3 while its recall is 20/60 = 1/3. So, in this case, precision is "how useful the search results are", and recall is "how complete the results are".
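The two worked examples above can be sketched directly from the true-positive, false-positive, and false-negative counts given in the text:

```python
# Precision and recall from raw counts (values taken from the examples above).

def precision_recall(tp, fp, fn):
    """precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall

# Dog photo example: 5 true positives, 3 false positives, 7 missed dogs.
p, r = precision_recall(tp=5, fp=3, fn=7)
print(p, r)  # 0.625 (= 5/8) and ~0.4167 (= 5/12)

# Search engine example: 20 relevant returned, 10 irrelevant, 40 missed.
p, r = precision_recall(tp=20, fp=10, fn=40)
print(p, r)  # ~0.6667 (= 2/3) and ~0.3333 (= 1/3)
```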
The F1 score is the harmonic mean of the precision and recall, where an F1 score reaches its best value at 1 (perfect precision and recall) and worst at 0.
Definition:
F_{1}=\left(\frac{\text{recall}^{-1}+\text{precision}^{-1}}{2}\right)^{-1}=2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision}+\text{recall}}
The more general F_beta score is:
F_{\beta}=\left(1+\beta^{2}\right) \cdot \frac{\text{precision} \cdot \text{recall}}{\left(\beta^{2} \cdot \text{precision}\right)+\text{recall}}
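A minimal sketch of the F1 and F_beta formulas, using the precision 2/3 and recall 1/3 from the search-engine example above:

```python
# F_beta = (1 + beta^2) * p * r / (beta^2 * p + r); beta=1 gives the F1 score.

def f_beta(precision, recall, beta=1.0):
    if precision == 0 and recall == 0:
        return 0.0  # avoid division by zero when both are 0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Search engine example: precision 2/3, recall 1/3.
print(f_beta(2/3, 1/3))          # F1 = 4/9, about 0.444
print(f_beta(2/3, 1/3, beta=2))  # F2 = 10/27, about 0.370 (recall weighted more)
```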
The F-measure can also be used to evaluate clustering (assuming ground-truth labels are given):
P\left(P_{j}, C_{i}\right)=\frac{\left|P_{j} \cap C_{i}\right|}{\left|C_{i}\right|}
R\left(P_{j}, C_{i}\right)=\frac{\left|P_{j} \cap C_{i}\right|}{\left|P_{j}\right|}
F\left(P_{j}, C_{i}\right)=\frac{2 \cdot P\left(P_{j}, C_{i}\right) \cdot R\left(P_{j}, C_{i}\right)}{P\left(P_{j}, C_{i}\right)+R\left(P_{j}, C_{i}\right)}
where P_j is a manually labeled (ground-truth) cluster and C_i is a cluster produced by our clustering method. Supposing the clustering method produces m clusters, we can define the F value for every P_j:
F\left(P_{j}\right)=\max_{1 \leq i \leq m} F\left(P_{j}, C_{i}\right)
So the F value of the clustering result can be calculated as a weighted average:
F=\sum_{j=1}^{s} w_{j} \cdot F\left(P_{j}\right), \quad w_{j}=\frac{\left|P_{j}\right|}{\sum_{i=1}^{s}\left|P_{i}\right|}=\frac{\left|P_{j}\right|}{n}
where s is the number of ground-truth clusters and n is the total number of samples.
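The clustering F-measure above can be sketched as follows, assuming ground-truth classes and predicted clusters are given as per-sample label lists (a hypothetical input format chosen for illustration):

```python
from collections import defaultdict

def clustering_f(true_labels, pred_labels):
    """Weighted-average F-measure of a clustering against ground-truth labels."""
    # Group sample indices by ground-truth class P_j and by predicted cluster C_i.
    P = defaultdict(set)
    C = defaultdict(set)
    for idx, (t, c) in enumerate(zip(true_labels, pred_labels)):
        P[t].add(idx)
        C[c].add(idx)

    n = len(true_labels)
    total = 0.0
    for Pj in P.values():
        best = 0.0
        for Ci in C.values():
            inter = len(Pj & Ci)
            if inter == 0:
                continue
            p = inter / len(Ci)  # P(Pj, Ci)
            r = inter / len(Pj)  # R(Pj, Ci)
            best = max(best, 2 * p * r / (p + r))
        total += (len(Pj) / n) * best  # weight w_j = |Pj| / n
    return total

# Toy example: a clustering that exactly recovers the two ground-truth classes.
print(clustering_f([0, 0, 1, 1], [0, 0, 1, 1]))  # 1.0
```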
The F1 score considers precision and recall as equally important. If needed, we can instead choose F_{0.5}, which weighs precision higher than recall, or F_{2}, which weighs recall higher than precision.