F1 score
In statistical analysis of binary classification, the F1 score (also F-score or F-measure) is a measure of a test's accuracy. It considers both the precision p and the recall r of the test to compute the score:
- p is the number of correct positive results divided by the number of all positive results returned by the classifier
- r is the number of correct positive results divided by the number of all relevant samples (all samples that should have been identified as positive).
Two examples (from Wikipedia) explaining the two concepts:
Suppose a computer program for recognizing dogs in photographs identifies 8 dogs in a picture containing 12 dogs and some cats. Of the 8 identified as dogs, 5 actually are dogs (true positives), while the rest are cats (false positives). The program’s precision is 5/8 while its recall is 5/12.
When a search engine returns 30 pages, only 20 of which are relevant, while failing to return 40 additional relevant pages, its precision is 20/30 = 2/3 while its recall is 20/60 = 1/3. So, in this case, precision is "how useful the search results are", and recall is "how complete the results are".
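The two worked examples above can be sketched directly from the true-positive, false-positive, and false-negative counts given in the text:

```python
# Precision and recall from raw counts (values taken from the examples above).

def precision_recall(tp, fp, fn):
    """precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall

# Dog photo example: 5 true positives, 3 false positives, 7 missed dogs.
p, r = precision_recall(tp=5, fp=3, fn=7)
print(p, r)  # 0.625 (= 5/8) and ~0.4167 (= 5/12)

# Search engine example: 20 relevant returned, 10 irrelevant, 40 missed.
p, r = precision_recall(tp=20, fp=10, fn=40)
print(p, r)  # ~0.6667 (= 2/3) and ~0.3333 (= 1/3)
```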
The F1 score is the harmonic mean of the precision and recall, where an F1 score reaches its best value at 1 (perfect precision and recall) and worst at 0.
Definition:
F_{1}=\left(\frac{\text{recall}^{-1}+\text{precision}^{-1}}{2}\right)^{-1}=2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision}+\text{recall}}
The more general F_beta score is:
F_{\beta}=\left(1+\beta^{2}\right) \cdot \frac{\text{precision} \cdot \text{recall}}{\left(\beta^{2} \cdot \text{precision}\right)+\text{recall}}
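A minimal sketch of the F1 and F_beta formulas, using the precision 2/3 and recall 1/3 from the search-engine example above:

```python
# F_beta = (1 + beta^2) * p * r / (beta^2 * p + r); beta=1 gives the F1 score.

def f_beta(precision, recall, beta=1.0):
    if precision == 0 and recall == 0:
        return 0.0  # avoid division by zero when both are 0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Search engine example: precision 2/3, recall 1/3.
print(f_beta(2/3, 1/3))          # F1 = 4/9, about 0.444
print(f_beta(2/3, 1/3, beta=2))  # F2 = 10/27, about 0.370 (recall weighted more)
```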
The F-measure can also be used to evaluate clustering (assuming ground-truth labels are given):
P\left(P_{j}, C_{i}\right)=\frac{\left|P_{j} \cap C_{i}\right|}{\left|C_{i}\right|}
R\left(P_{j}, C_{i}\right)=\frac{\left|P_{j} \cap C_{i}\right|}{\left|P_{j}\right|}
F\left(P_{j}, C_{i}\right)=\frac{2 \cdot P\left(P_{j}, C_{i}\right) \cdot R\left(P_{j}, C_{i}\right)}{P\left(P_{j}, C_{i}\right)+R\left(P_{j}, C_{i}\right)}
where P_j is a manually labeled (ground-truth) cluster and C_i is a cluster produced by our clustering method. Supposing the clustering method produces m clusters, we can define the F value for every P_j:
F\left(P_{j}\right)=\max_{1 \leq i \leq m} F\left(P_{j}, C_{i}\right)
So the F value of the clustering result can be calculated as a weighted average:
F=\sum_{j=1}^{s} w_{j} \cdot F\left(P_{j}\right), \quad w_{j}=\frac{\left|P_{j}\right|}{\sum_{i=1}^{s}\left|P_{i}\right|}=\frac{\left|P_{j}\right|}{n}
where s is the number of ground-truth clusters and n is the total number of samples.
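The clustering F-measure above can be sketched as follows, assuming ground-truth classes and predicted clusters are given as per-sample label lists (a hypothetical input format chosen for illustration):

```python
from collections import defaultdict

def clustering_f(true_labels, pred_labels):
    """Weighted-average F-measure of a clustering against ground-truth labels."""
    # Group sample indices by ground-truth class P_j and by predicted cluster C_i.
    P = defaultdict(set)
    C = defaultdict(set)
    for idx, (t, c) in enumerate(zip(true_labels, pred_labels)):
        P[t].add(idx)
        C[c].add(idx)

    n = len(true_labels)
    total = 0.0
    for Pj in P.values():
        best = 0.0
        for Ci in C.values():
            inter = len(Pj & Ci)
            if inter == 0:
                continue
            p = inter / len(Ci)  # P(Pj, Ci)
            r = inter / len(Pj)  # R(Pj, Ci)
            best = max(best, 2 * p * r / (p + r))
        total += (len(Pj) / n) * best  # weight w_j = |Pj| / n
    return total

# Toy example: a clustering that exactly recovers the two ground-truth classes.
print(clustering_f([0, 0, 1, 1], [0, 0, 1, 1]))  # 1.0
```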
The F1 score considers precision and recall as equally important. If needed, we can instead choose F_{0.5}, which weighs precision higher than recall, or F_{2}, which weighs recall higher than precision.