Evaluation Methods for Cluster Analysis (Applied to Bioinformatics)
-----------------------【Basic information】--------------------------------
【Density】
Density describes how closely the nodes in a cluster interact with each other. Given a cluster consisting of n nodes and m edges, its density is 2m/(n(n-1)), i.e. the fraction of all possible node pairs that are actually connected by an edge.
【size distribution】
The size distribution summarizes the clustering result in a chart that shows, for each cluster size (number of nodes), how many clusters have that size.
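The two basic statistics above can be sketched in a few lines of Python. This is only an illustration: the function names `cluster_density` and `size_distribution` are hypothetical, clusters are assumed to be given as sets of node identifiers, and returning 0 for clusters with fewer than two nodes is an assumption, since the formula is undefined there.

```python
from collections import Counter


def cluster_density(n_nodes, n_edges):
    """Density 2m / (n(n-1)) of a cluster with n nodes and m edges."""
    if n_nodes < 2:
        # Assumption: the formula is undefined for n < 2, so report 0.
        return 0.0
    return 2 * n_edges / (n_nodes * (n_nodes - 1))


def size_distribution(clusters):
    """Map each cluster size to the number of clusters of that size."""
    return Counter(len(cluster) for cluster in clusters)


# Toy data (made up for illustration).
clusters = [{"a", "b", "c"}, {"d", "e"}, {"f", "g", "h"}]
print(cluster_density(4, 6))  # 4 nodes, all 6 possible edges present
print(sorted(size_distribution(clusters).items()))
```

The resulting size-to-count mapping is what a bar chart of the size distribution would plot.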
-----------------------------【c-score】-------------------------------------
【Gene Annotation】
1, GO (Gene Ontology): Biological Process (BP) / Molecular Function (MF) / Cellular Component (CC)
2, MIPS: Munich Information Center for Protein Sequences, a genomics research center in Germany
3, Other
【p-value】
To characterize the function of the predicted clusters, we compare them with a known functional classification. A P-value based on the hypergeometric distribution is often used to estimate whether a given set of proteins is enriched in a category by chance. It has been used as a criterion for assigning a main function to each predicted cluster. Here, we also calculate the P-value of each predicted cluster against every function category and assign the cluster the category for which the P-value is minimal.
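A minimal sketch of this assignment is given below. It assumes the usual upper-tail form of the hypergeometric test: the probability of seeing at least k category members in a cluster of n proteins drawn from a background of N proteins, M of which belong to the category. The function names, the toy protein identifiers, and the category names are all hypothetical, and every category member is assumed to lie inside the background.

```python
from math import comb


def hypergeom_pvalue(N, M, n, k):
    """P(X >= k) for a hypergeometric draw.

    N: background size, M: proteins in the category,
    n: cluster size,    k: overlap between cluster and category.
    """
    upper = min(n, M)
    # Count the draws of n proteins that contain at least k category members.
    tail = sum(comb(M, i) * comb(N - M, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def assign_main_function(cluster, categories, background_size):
    """Return (category, p_value) for the category with the smallest P-value."""
    best = None
    for name, members in categories.items():
        overlap = len(cluster & members)
        p = hypergeom_pvalue(background_size, len(members), len(cluster), overlap)
        if best is None or p < best[1]:
            best = (name, p)
    return best


# Toy data (made up for illustration).
cluster = {"p1", "p2", "p3"}
categories = {
    "transport": {"p1", "p2", "p9"},
    "repair": {"p3", "p6", "p7", "p8"},
}
print(assign_main_function(cluster, categories, background_size=20))
```

Exact integer binomials from `math.comb` keep the sketch dependency-free. For large backgrounds, a statistics library's hypergeometric survival function would be the more practical choice.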
【precision】
The precision of a cluster is the number of true positives divided by the total number of elements labeled as belonging to the positive class, i.e. all nodes in the cluster.
precision = tp/(tp+fp), where tp is the size of the overlap between the cluster and the known functional category and tp+fp is the number of nodes in the cluster.
【recall】
Recall is defined as the number of true positives divided by the total number of elements that actually belong to the positive class.
recall = tp/(tp+fn), where tp is the size of the overlap and tp+fn is the size of the background.
【f-measure】
The F-measure (also called the traditional F-measure or balanced F-score) combines precision and recall as their harmonic mean.
f-measure = 2*precision*recall/(precision+recall)
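The three formulas above can be computed together for one predicted cluster and one reference set. The sketch below is an illustration under stated assumptions: `precision_recall_f` is a hypothetical name, both inputs are assumed to be sets of node identifiers, and returning 0 when a denominator is zero is a convention chosen here, not something the definitions prescribe.

```python
def precision_recall_f(cluster, reference):
    """Return (precision, recall, f_measure) of a cluster against a reference set."""
    tp = len(cluster & reference)  # overlap between cluster and reference
    # tp + fp is the cluster size; tp + fn is the reference size.
    precision = tp / len(cluster) if cluster else 0.0
    recall = tp / len(reference) if reference else 0.0
    if precision + recall == 0:
        # Assumption: define the F-measure as 0 when there is no overlap.
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Toy data (made up for illustration): 3 of 4 cluster nodes are in the reference.
cluster = {"a", "b", "c", "d"}
reference = {"b", "c", "d", "e", "f", "g"}
print(precision_recall_f(cluster, reference))
```

With this toy data the overlap is 3, so precision is 3/4 and recall is 3/6. The harmonic mean sits closer to the smaller of the two values than an arithmetic mean would.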