Evaluation Methods for Cluster Analysis (Applied to Bioinformatics)

-----------------------【Basic information】--------------------------------

【Density】

Density describes how closely the nodes in a cluster interact with each other. Given a cluster consisting of n nodes and m edges, its density is 2m/(n(n-1)), that is, the number of edges present divided by the maximum possible number of edges n(n-1)/2.
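As a minimal sketch of the density formula, assuming an undirected cluster without self-loops (the function name `cluster_density` and the choice to return 0 for clusters with fewer than two nodes are illustrative assumptions, not part of the original method):

```python
def cluster_density(n_nodes: int, n_edges: int) -> float:
    """Density 2m / (n(n-1)) of an undirected cluster without self-loops."""
    if n_nodes < 2:
        # Assumption: a cluster with fewer than two nodes has density 0.
        return 0.0
    return 2.0 * n_edges / (n_nodes * (n_nodes - 1))


# A 4-node cluster with 5 of the 6 possible edges.
print(cluster_density(4, 5))
```

A fully connected cluster (a clique) has density 1, and a cluster with no edges has density 0.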


【size distribution】

The size distribution summarizes the clustering result with a chart that shows, for each cluster size (number of nodes), how many clusters have that size.
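The counts behind such a chart could be computed as follows. This is only a sketch: representing clusters as Python sets and the name `size_distribution` are assumptions made for illustration.

```python
from collections import Counter


def size_distribution(clusters):
    """Map each cluster size to the number of clusters with that size."""
    counts = Counter(len(cluster) for cluster in clusters)
    return dict(sorted(counts.items()))


# Hypothetical clustering result: five clusters of protein identifiers.
clusters = [
    {"A", "B", "C"},
    {"D", "E"},
    {"F", "G", "H"},
    {"I", "J"},
    {"K", "L", "M", "N"},
]
print(size_distribution(clusters))  # {2: 2, 3: 2, 4: 1}
```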


-----------------------------【c-score】-------------------------------------

【Gene Annotation】
1. GO: Biological Process (BP) / Molecular Function (MF) / Cellular Component (CC)
2. MIPS: Munich Information Center for Protein Sequences, a genomics research center in Germany
3. Other


【p-value】

To detect the functional characteristics of the predicted clusters, we compare them with known functional classifications. The P-value based on the hypergeometric distribution is often used to estimate whether a given set of proteins shares a function merely by chance. It has been used as a criterion for assigning each predicted cluster a main function. Here, we also calculate the P-value of each predicted cluster for every function category and assign to the cluster the category at which the minimum P-value occurs.
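The text does not spell out the formula, so the sketch below assumes the commonly used upper-tail form: with N proteins in the background, F of them annotated with a category, and a cluster of size C containing k annotated proteins, the P-value is the probability of seeing at least k annotated proteins in a random draw of C. The names `hypergeom_pvalue` and `assign_function`, and the toy annotation data, are hypothetical.

```python
from math import comb


def hypergeom_pvalue(N: int, F: int, C: int, k: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N, F, C) (upper tail, an assumption)."""
    upper = min(C, F)
    favorable = sum(comb(F, i) * comb(N - F, C - i) for i in range(k, upper + 1))
    return favorable / comb(N, C)


def assign_function(cluster, annotations, N):
    """Return the (category, p_value) pair with the minimum P-value."""
    best = None
    for category, members in annotations.items():
        k = len(cluster & members)
        p = hypergeom_pvalue(N, len(members), len(cluster), k)
        if best is None or p < best[1]:
            best = (category, p)
    return best


# Toy data: 10 background proteins, two function categories.
annotations = {
    "ribosome": {"p1", "p2", "p3", "p4"},
    "transport": {"p5", "p6"},
}
cluster = {"p1", "p2", "p7"}
print(assign_function(cluster, annotations, N=10))
```

In this toy case the cluster shares two proteins with "ribosome" and none with "transport", so "ribosome" receives the smaller P-value and becomes the assigned category.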


【precision】

The precision of a cluster is the number of true positives divided by the total number of elements labeled as belonging to the positive class.
precision = tp/(tp+fp), where tp is the size of the overlap between the cluster and the reference set, and tp+fp is the number of nodes in the cluster


【recall】

Recall is defined as the number of true positives divided by the total number of elements that actually belong to the positive class.
recall = tp/(tp+fn), where tp is the size of the overlap between the cluster and the reference set, and tp+fn is the size of the reference (background) set


【f-measure】

The traditional F-measure, or balanced F-score, combines precision and recall as their harmonic mean.
f-measure = 2*precision*recall/(precision+recall)
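The three scores above can be sketched together for one cluster and one reference set. The set-based representation, the name `precision_recall_f`, and returning 0 when a denominator is empty are assumptions made for illustration.

```python
def precision_recall_f(cluster, reference):
    """Return (precision, recall, f_measure) of a cluster against a reference set."""
    tp = len(cluster & reference)  # size of the overlap
    precision = tp / len(cluster) if cluster else 0.0      # tp / (tp + fp)
    recall = tp / len(reference) if reference else 0.0     # tp / (tp + fn)
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Toy example: 3 of the 4 cluster members are in a 6-protein reference set.
cluster = {"a", "b", "c", "d"}
reference = {"b", "c", "d", "e", "f", "g"}
print(precision_recall_f(cluster, reference))
```

Here precision is 3/4, recall is 3/6, and the f-measure is their harmonic mean, approximately 0.6.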




-------------【Comparison with known complexes】-----------------------------



【Compare with known complexes】 OS/Kc/Pc

To evaluate the effectiveness of an algorithm for detecting protein complexes, we compare the predicted clusters produced by the algorithm with known protein complexes. The overlapping score OS(Pc,Kc) between a predicted cluster Pc and a known complex Kc is calculated by the following formula:
OS(Pc,Kc) = i*i/(a*b)
where i is the size of the intersection of the predicted cluster and the known complex, a is the size of the predicted cluster, and b is the size of the known complex.
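A minimal sketch of the overlapping score, assuming clusters and complexes are represented as sets of protein identifiers (the name `overlap_score` and the zero score for an empty set are illustrative assumptions):

```python
def overlap_score(predicted, known):
    """OS(Pc, Kc) = i^2 / (a * b), with i the intersection size."""
    if not predicted or not known:
        return 0.0  # assumption: an empty set overlaps with nothing
    i = len(predicted & known)
    return i * i / (len(predicted) * len(known))


# Predicted cluster of size 4 and known complex of size 5 sharing 3 proteins.
pc = {"A", "B", "C", "D"}
kc = {"B", "C", "D", "E", "F"}
print(overlap_score(pc, kc))  # 9 / 20 = 0.45
```

The score is 1 when the two sets are identical and 0 when they share no protein.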


【Sn & Sp】

Sensitivity and specificity are two important measures of the performance of algorithms for detecting protein complexes. A predicted cluster Pc and a known complex Kc are considered matched if OS(Pc,Kc) ≥ os, where os is the overlap threshold. The default os value is 0.2, but you can set a different value.

Sensitivity is the fraction of true positives out of the true positives plus the false negatives:
Sn = TP/(TP+FN)
where TP (true positive) is the number of predicted clusters matched by known complexes with OS(Pc,Kc) ≥ os, and FN (false negative) is the number of known complexes that are not matched by any predicted cluster.

Specificity is the fraction of true-positive predictions out of all positive predictions:
Sp = TP/(TP+FP)
where FP (false positive) equals the total number of predicted clusters minus TP.
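The definitions of Sn and Sp can be sketched as follows, assuming clusters and complexes are sets of protein identifiers and using the i*i/(a*b) overlapping score as the matching criterion. The names `overlap_score` and `sn_sp`, and the toy data, are hypothetical.

```python
def overlap_score(predicted, known):
    """OS(Pc, Kc) = i^2 / (a * b), with i the intersection size."""
    if not predicted or not known:
        return 0.0
    i = len(predicted & known)
    return i * i / (len(predicted) * len(known))


def sn_sp(predicted_clusters, known_complexes, os_threshold=0.2):
    """Return (Sn, Sp) using the matching rule OS(Pc, Kc) >= os_threshold."""
    # TP: predicted clusters matched by at least one known complex.
    tp = sum(
        1 for pc in predicted_clusters
        if any(overlap_score(pc, kc) >= os_threshold for kc in known_complexes)
    )
    # FN: known complexes not matched by any predicted cluster.
    fn = sum(
        1 for kc in known_complexes
        if not any(overlap_score(pc, kc) >= os_threshold for pc in predicted_clusters)
    )
    # FP: all remaining predicted clusters.
    fp = len(predicted_clusters) - tp
    sn = tp / (tp + fn) if tp + fn else 0.0
    sp = tp / (tp + fp) if tp + fp else 0.0
    return sn, sp


predicted = [{"A", "B", "C"}, {"D", "E", "F"}, {"X", "Y"}]
known = [{"A", "B", "C", "D"}, {"E", "F", "G"}, {"M", "N", "O"}, {"P", "Q"}]
print(sn_sp(predicted, known))
```

With the default threshold of 0.2, two predicted clusters are matched (TP = 2), one is not (FP = 1), and two known complexes stay unmatched (FN = 2), giving Sn = 0.5 and Sp = 2/3.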