Fleiss’ kappa is a statistical measure of inter-rater reliability: it quantifies the agreement among multiple raters who each classify the same set of items into categories. It is often described as an extension of Cohen’s kappa, which measures agreement between exactly two raters (strictly speaking, it generalizes the closely related Scott’s pi to any number of raters).
Fleiss’ kappa accounts for the agreement that would occur by chance and reports a chance-corrected agreement score. It is commonly used in fields such as psychology, sociology, and medical research, where multiple raters classify the same set of items.
Fleiss’ kappa compares the observed agreement among raters to the agreement expected by chance. The formula is:
κ = (P_obs - P_exp) / (1 - P_exp)
where κ is the Fleiss’ kappa value, P_obs is the mean observed proportion of agreement across the rated subjects, and P_exp is the proportion of agreement expected if raters assigned categories at random according to the overall category frequencies.
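Concretely, in the standard formulation there are N subjects, n raters per subject, and k categories, and n_ij denotes the number of raters who assigned subject i to category j. The agreement on subject i is
P_i = (Σ_j n_ij² - n) / (n(n - 1))
and the overall proportion of ratings falling in category j is
p_j = (Σ_i n_ij) / (N · n)
P_obs is then the mean of the P_i over all subjects, and P_exp = Σ_j p_j².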
Fleiss’ kappa ranges from -1 to 1, where a value of 1 indicates perfect agreement, 0 indicates agreement equivalent to chance, and negative values indicate less agreement than expected by chance.
It’s worth noting that Fleiss’ kappa assumes the categories are mutually exclusive, so each rater assigns every item to exactly one category, and that every item is rated by the same number of raters (the raters need not be the same individuals for each item). An adequate number of raters and items is also needed to obtain a reliable estimate.
Fleiss’ kappa is also sensitive to the number of raters, the number of categories, and the marginal distribution of ratings; in particular, heavily unbalanced category prevalences can depress kappa even when raw percent agreement is high.
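The following NumPy sketch implements the computation directly from the formulas above. It assumes the input is an N × k count matrix in the standard Fleiss layout: one row per subject, one column per category, each entry counting how many raters assigned that subject to that category, and every row summing to the same number of raters n.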
import numpy as np
def fleiss_kappa(ratings):
    """Fleiss' kappa for a subjects x categories matrix of rating counts.

    ratings[i, j] is the number of raters who assigned subject i to
    category j; every row must sum to the same number of raters.
    """
    num_subjects, num_categories = ratings.shape
    num_raters = np.sum(ratings[0, :])  # raters per subject, assumed constant across rows
    # Observed agreement per subject: P_i = (sum_j n_ij^2 - n) / (n(n - 1))
    p_i = np.sum(ratings * ratings, axis=1)
    p_i = (p_i - num_raters) / (num_raters * (num_raters - 1))
    # Chance agreement: p_j is the overall proportion of ratings in category j
    p_j = np.sum(ratings, axis=0) / (num_subjects * num_raters)
    p_exp = np.sum(p_j * p_j)
    # Chance-corrected mean agreement
    kappa = (np.mean(p_i) - p_exp) / (1 - p_exp)
    return kappa
# Each row is one subject; each entry counts how many of the 4 raters
# assigned that subject to the given category (every row sums to 4).
ratings = np.array([
    [0, 3, 1, 0],  # Subject 1: raters split 3-1 between categories 2 and 3
    [1, 2, 1, 0],  # Subject 2
    [0, 1, 3, 0],  # Subject 3
    [0, 0, 0, 4],  # Subject 4: unanimous on category 4
])
kappa = fleiss_kappa(ratings)
print("Fleiss' kappa:", kappa)  # approximately 0.341