Mutual Information

Mutal Information, MI, 中文名称:互信息. 用于描述两个概率分布的相似/相关程度. 常用于衡量两个不同聚类算法在同一个数据集的聚类结果的相似性/共享的信息量.
给定两种聚类结果\(X,Y\), 现在用MI来衡量它们之间的相似程度 计算方式为:
\[ MI(X, Y) = \sum_{u \in U} \sum_{v in V} p(u, v)log \frac{p(u, v)}{p(u)p(v)} \]
其中\(U=set(X), V = set(Y)\)(set()为去重操作).
从概率论的角度来理解, \(\frac{p(u, v)}{p(u)p(v)}\)描述了\(u, v\)之间的相关性: 相关性越大, 值越大(大于1);若独立, 则为1. 从整体来看, \(X, Y\)的distribution pattern越相似, MI越大.

下面是摘自http://www.cnblogs.com/ziqiao/archive/2011/12/13/2286273.html的matlab代码, 可帮助理解.

function MIhat = nmi( A, B ) %NMI Normalized mutual information
% http://en.wikipedia.org/wiki/Mutual_information
% http://nlp.stanford.edu/IR-book/html/htmledition/evaluation-of-clustering-1.html
% Author: http://www.cnblogs.com/ziqiao/   [2011/12/13] 
if length( A ) ~= length( B)
    error('length( A ) must == length( B)');
end
total = length(A);
A_ids = unique(A);
B_ids = unique(B);

% Mutual information
MI = 0;
for idA = A_ids
    for idB = B_ids
         idAOccur = find( A == idA );
         idBOccur = find( B == idB );
         idABOccur = intersect(idAOccur,idBOccur); 
         
         px = length(idAOccur)/total;
         py = length(idBOccur)/total;
         pxy = length(idABOccur)/total;
         
         MI = MI + pxy*log2(pxy/(px*py)+eps); % eps : the smallest positive number

    end
end

% Normalized Mutual information
Hx = 0; % Entropies
for idA = A_ids
    idAOccurCount = length( find( A == idA ) );
    Hx = Hx - (idAOccurCount/total) * log2(idAOccurCount/total + eps);
end
Hy = 0; % Entropies
for idB = B_ids
    idBOccurCount = length( find( B == idB ) );
    Hy = Hy - (idBOccurCount/total) * log2(idBOccurCount/total + eps);
end

MIhat = 2 * MI / (Hx+Hy);
end

% Example :  
% (http://nlp.stanford.edu/IR-book/html/htmledition/evaluation-of-clustering-1.html)
% A = [1 1 1 1 1 1   2 2 2 2 2 2    3 3 3 3 3];
% B = [1 2 1 1 1 1   1 2 2 2 2 3    1 1 3 3 3];
% nmi(A,B)% ans = 0.3646

转载于:https://www.cnblogs.com/dengdan890730/p/6280051.html

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值