Community Detection Evaluation Metrics: A Python Implementation of Overlapping Normalized Mutual Information (NMIov)

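This post shares a Python implementation of overlapping normalized mutual information, an evaluation metric for overlapping community detection algorithms in complex network research. The code includes three variants: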

  • LFK: the original version proposed by Lancichinetti et al., in which the mutual information is normalized community by community.
  • MGH: the improved version proposed by McDaid et al., who argued that the original NMI normalization was flawed and introduced a new (global) normalization by the maximum of the two entropies.
  • MGH_LFK: a variant of the LFK method, also introduced by McDaid et al., with the same type of normalization applied globally instead of at each community.

For the exact differences between the variants, see the formula definitions in McDaid's paper: *[1] McDaid A F, Greene D, Hurley N. Normalized Mutual Information to evaluate overlapping community finding algorithms[J]. Computer Science, 2011.*
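
As a reading aid, the three normalizations can be written out from the branches of `onmi` in the listing that follows. This is a transcription of what the code appears to compute, not a quotation from the paper. Here $X$ and $Y$ are the two covers, $H(X)$ is the sum of per-community entropies (`__cover_entropy`), and $H(X_i|Y)=\min_j H(X_i|Y_j)$ is the best-match conditional entropy (`__cover_conditional_entropy`). The symbols are chosen for illustration only.

```latex
\begin{align}
H(X) &= \sum_i H(X_i), \qquad H(X|Y) = \sum_i \min_j H(X_i|Y_j) \\
\mathrm{NMI}_{\mathrm{LFK}} &= 1 - \frac{1}{2}\left[
    \frac{1}{|X|}\sum_i \frac{H(X_i|Y)}{H(X_i)}
  + \frac{1}{|Y|}\sum_j \frac{H(Y_j|X)}{H(Y_j)} \right] \\
\mathrm{NMI}_{\mathrm{MGH\_LFK}} &= 1 - \frac{1}{2}\left[
    \frac{H(X|Y)}{H(X)} + \frac{H(Y|X)}{H(Y)} \right] \\
I(X:Y) &= \frac{1}{2}\left[ H(X) - H(X|Y) + H(Y) - H(Y|X) \right] \\
\mathrm{NMI}_{\mathrm{MGH}} &= \frac{I(X:Y)}{\max\left(H(X),\, H(Y)\right)}
\end{align}
```

In the LFK branch the code sets a term $H(X_i|Y)/H(X_i)$ to 1 when $H(X_i)=0$, and in the MGH_LFK branch a ratio is set to 0 when its numerator or denominator is 0.
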
import scipy as sp
import math
import scipy.stats

################## Helper functions ##############
logBase = 2


def __partial_entropy_a_proba(proba):
    if proba == 0:
        return 0
    return -proba * math.log(proba, logBase)


def __cover_entropy(cover, allNodes):  # cover is a list of sets, without community IDs
    allEntr = []
    for com in cover:
        fractionIn = len(com) / len(allNodes)
        allEntr.append(sp.stats.entropy([fractionIn, 1 - fractionIn], base=logBase))

    return sum(allEntr)


def __com_pair_conditional_entropy(cl, clKnown, allNodes):  # cl, clKnown: communities (sets of nodes); allNodes: set of all nodes
    # H(Xi|Yj ) =H(Xi, Yj ) − H(Yj )
    # h(a,n) + h(b,n) + h(c,n) + h(d,n)
    # −h(b + d, n)−h(a + c, n)
    # a: count agreeing on not belonging
    # b: count disagreeing : not in 1 but in 2
    # c: count disagreeing : not in 2 but in 1
    # d: count agreeing on belonging
    nbNodes = len(allNodes)

    a = len((allNodes - cl) - clKnown) / nbNodes
    b = len(clKnown - cl) / nbNodes
    c = len(cl - clKnown) / nbNodes
    d = len(cl & clKnown) / nbNodes

    if __partial_entropy_a_proba(a) + __partial_entropy_a_proba(d) > __partial_entropy_a_proba(
            b) + __partial_entropy_a_proba(c):
        entropyKnown = sp.stats.entropy([len(clKnown) / nbNodes, 1 - len(clKnown) / nbNodes], base=logBase)
        conditionalEntropy = sp.stats.entropy([a, b, c, d], base=logBase) - entropyKnown
        # print("normal",entropyKnown,sp.stats.entropy([a,b,c,d],base=logBase))
    else:
        conditionalEntropy = sp.stats.entropy([len(cl) / nbNodes, 1 - len(cl) / nbNodes], base=logBase)
    # print("abcd",a,b,c,d,conditionalEntropy,cl,clKnown)

    return conditionalEntropy  # *nbNodes


def __cover_conditional_entropy(cover, coverRef, allNodes, normalized=False):  # cover and coverRef are lists of sets

    allMatches = []
    # print(cover)
    # print(coverRef)
    for com in cover:
        matches = [(com2, __com_pair_conditional_entropy(com, com2, allNodes)) for com2 in coverRef]
        bestMatch = min(matches, key=lambda c: c[1])
        HXY_part = bestMatch[1]
        if normalized:
            HX = __partial_entropy_a_proba(len(com) / len(allNodes)) + __partial_entropy_a_proba(
                (len(allNodes) - len(com)) / len(allNodes))
            if HX == 0:
                HXY_part = 1
            else:
                HXY_part = HXY_part / HX
        allMatches.append(HXY_part)
    # print(allMatches)
    to_return = sum(allMatches)
    if normalized:
        to_return = to_return / len(cover)
    return to_return


################## Main function ##############


def onmi(cover, coverRef, allNodes=None, variant="LFK"):  # cover and coverRef should be lists of sets, without community IDs
    """
    Compute the overlapping NMI.
    This implementation can compute three variants of the overlapping NMI:
    LFK: The original implementation proposed by Lancichinetti et al. (1). The normalization of mutual information is done community by community
    MGH: In (2), McDaid et al. argued that the original NMI normalization was flawed and introduced a new (global) normalization by the max of entropy
    MGH_LFK: This is a variant of the LFK method introduced in (2), with the same type of normalization but done globally instead of at each community
    Results are checked to be similar to the C++ implementations by the authors of (2): https://github.com/aaronmcdaid/Overlapping-NMI
    :param cover: list of sets of nodes
    :param coverRef: list of sets of nodes
    :param allNodes: the full node set, for the case where both covers only partially cover a larger graph. Keep the default unless you know what you're doing
    :param variant: one of "LFK", "MGH", "MGH_LFK"
    :return: an onmi score
    :Reference:
    1. Lancichinetti, A., Fortunato, S., & Kertesz, J. (2009). Detecting the overlapping and hierarchical community structure in complex networks. New Journal of Physics, 11(3), 033015.
    2. McDaid, A. F., Greene, D., & Hurley, N. (2011). Normalized mutual information to evaluate overlapping community finding algorithms. arXiv preprint arXiv:1110.2515.
    """
    if (len(cover) == 0 and len(coverRef) != 0) or (len(cover) != 0 and len(coverRef) == 0):
        return 0
    if cover == coverRef:
        return 1

    if allNodes is None:
        allNodes = {n for c in coverRef for n in c}
        allNodes |= {n for c in cover for n in c}

    if variant == "LFK":
        HXY = __cover_conditional_entropy(cover, coverRef, allNodes, normalized=True)
        HYX = __cover_conditional_entropy(coverRef, cover, allNodes, normalized=True)
    else:
        HXY = __cover_conditional_entropy(cover, coverRef, allNodes)
        HYX = __cover_conditional_entropy(coverRef, cover, allNodes)

    HX = __cover_entropy(cover, allNodes)
    HY = __cover_entropy(coverRef, allNodes)
    # print("HX:", HX, "  ", "HY:", HY,"  HXY:", HXY, "  ", "HYX:", HYX)
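    # Reject unknown variant names explicitly instead of relying on a sentinel value
    if variant not in ("LFK", "MGH_LFK", "MGH"):
        raise ValueError("unknown variant: %r" % (variant,))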
    if variant == "LFK":
        NMI = 1 - (0.5 * (HXY + HYX))
    elif variant == "MGH_LFK":
        if HXY == 0 or HX == 0:
            a = 0
        else:
            a = HXY / HX
        if HYX == 0 or HY == 0:
            b = 0
        else:
            b = HYX / HY
        NMI = 1 - 0.5 * (a + b)
    elif variant == "MGH":
        IXY = 0.5 * (HX - HXY + HY - HYX)
        NMI = IXY / (max(HX, HY))
    if NMI < 0 or NMI > 1 or math.isnan(NMI):
        # print("NMI: %s  from %s %s %s %s " % (NMI, HXY, HYX, HX, HY))
        raise Exception("incorrect NMI")
    return NMI


if __name__ == "__main__":
    c = [{512, 513, 642, 132, 260, 517, 646, 136, 392, 648, 395, 526, 912, 277, 919, 152, 921, 161, 289, 673, 677, 933, 296, 297, 170, 681, 44, 809, 936, 559, 305, 433, 947, 564, 823, 824, 825, 314, 187, 953, 317, 67, 324, 837, 967, 585, 335, 847, 209, 594, 595, 598, 727, 217, 478, 96, 608, 98, 228, 614, 615, 616, 363, 113, 244, 372, 118, 374, 501, 888, 507}, {770, 903, 521, 651, 272, 533, 537, 410, 923, 159, 32, 802, 421, 40, 298, 172, 431, 434, 690, 694, 440, 568, 316, 323, 712, 74, 75, 586, 461, 843, 208, 340, 342, 855, 601, 218, 476, 990, 352, 994, 489, 362, 750, 367, 629, 633, 378, 895}, {386, 4, 389, 134, 775, 138, 652, 141, 16, 656, 403, 534, 23, 538, 417, 930, 292, 550, 935, 430, 175, 179, 692, 55, 951, 952, 698, 59, 443, 962, 332, 716, 718, 591, 336, 467, 982, 603, 732, 737, 865, 100, 230, 870, 104, 232, 361, 364, 625, 887, 255}, {892, 777, 14, 271, 783, 17, 405, 794, 926, 162, 549, 294, 39, 684, 943, 690, 949, 54, 191, 198, 840, 203, 215, 89, 349, 605, 861, 483, 359, 487, 114, 373, 127, 383}, {384, 897, 259, 510, 902, 139, 910, 144, 273, 785, 663, 26, 539, 31, 804, 303, 304, 562, 818, 439, 312, 696, 188, 449, 835, 838, 199, 328, 333, 590, 207, 977, 341, 597, 600, 988, 735, 482, 103, 744, 632, 748, 880, 503, 376, 122, 763, 254}, {8, 524, 528, 529, 18, 531, 25, 544, 547, 560, 63, 575, 576, 582, 583, 79, 592, 607, 101, 107, 626, 124, 638, 639, 650, 653, 662, 667, 682, 685, 183, 184, 705, 196, 197, 719, 211, 213, 726, 730, 219, 740, 742, 234, 746, 752, 753, 242, 764, 252, 766, 256, 257, 772, 261, 265, 268, 269, 278, 281, 793, 805, 325, 329, 348, 353, 357, 881, 891, 896, 389, 396, 401, 917, 406, 918, 408, 931, 424, 937, 944, 946, 446, 460, 471, 985, 987, 991, 997, 494, 502, 504, 506}, {2, 3, 5, 12, 15, 529, 532, 30, 38, 553, 42, 557, 46, 50, 52, 578, 584, 76, 77, 81, 599, 95, 97, 617, 620, 624, 116, 628, 637, 644, 133, 140, 657, 659, 157, 669, 160, 679, 169, 192, 707, 708, 713, 202, 729, 238, 239, 754, 243, 755, 759, 249, 784, 787, 798, 287, 801, 300, 812, 
311, 826, 830, 321, 848, 864, 358, 369, 370, 371, 901, 905, 394, 398, 912, 404, 412, 413, 414, 925, 416, 928, 934, 426, 958, 959, 451, 965, 973, 468, 983, 484, 499}, {904, 765, 143, 785, 150, 542, 670, 293, 808, 558, 180, 185, 319, 66, 452, 453, 839, 464, 486, 870, 112, 886, 247, 248, 377, 635, 509}, {641, 898, 131, 515, 645, 7, 649, 654, 914, 148, 792, 800, 37, 43, 299, 555, 308, 568, 441, 697, 571, 828, 964, 721, 338, 468, 724, 93, 734, 102, 231, 105, 233, 745, 877, 625, 246, 250, 381, 767}, {385, 387, 135, 778, 397, 142, 655, 927, 418, 422, 426, 683, 569, 444, 573, 833, 71, 204, 337, 210, 860, 477, 609, 488, 492, 110, 751}, {768, 1, 769, 516, 773, 393, 522, 781, 270, 399, 657, 894, 913, 788, 915, 151, 153, 154, 156, 796, 797, 929, 419, 41, 173, 815, 307, 310, 58, 315, 572, 954, 195, 963, 457, 717, 590, 722, 84, 852, 86, 470, 472, 346, 731, 221, 94, 223, 480, 481, 867, 360, 617, 875, 621, 241, 497, 116, 884, 760, 126}, {6, 520, 11, 523, 540, 541, 545, 35, 47, 561, 57, 64, 68, 73, 82, 83, 596, 88, 613, 622, 111, 115, 631, 120, 121, 634, 123, 128, 146, 158, 176, 177, 688, 689, 703, 194, 711, 714, 715, 725, 747, 243, 258, 771, 263, 264, 782, 276, 278, 288, 803, 295, 810, 302, 822, 313, 827, 318, 322, 858, 860, 350, 862, 872, 362, 876, 365, 366, 882, 380, 893, 382, 388, 390, 907, 908, 920, 437, 955, 956, 447, 960, 961, 969, 970, 463, 466, 978, 475, 993, 498}, {0, 774, 9, 660, 21, 33, 45, 687, 178, 699, 829, 574, 72, 841, 458, 459, 334, 859, 351, 610, 871, 490, 108, 245, 757, 375, 636}, {518, 530, 535, 536, 28, 551, 552, 556, 565, 566, 580, 69, 587, 593, 87, 602, 91, 606, 119, 129, 130, 642, 137, 658, 158, 671, 166, 678, 168, 680, 688, 181, 182, 200, 206, 720, 212, 220, 736, 226, 739, 228, 761, 251, 262, 266, 780, 275, 290, 807, 811, 813, 814, 309, 326, 330, 845, 849, 853, 344, 345, 856, 857, 863, 354, 356, 916, 407, 409, 427, 940, 950, 442, 450, 966, 971, 972, 975, 976, 980, 469, 474, 995, 996, 485, 493, 495, 505, 508}, {512, 519, 525, 527, 20, 24, 29, 543, 546, 548, 
48, 49, 560, 563, 567, 56, 570, 61, 581, 70, 78, 80, 89, 93, 99, 611, 630, 643, 149, 664, 155, 691, 693, 701, 702, 201, 720, 723, 216, 728, 222, 225, 738, 743, 235, 253, 779, 268, 786, 789, 282, 283, 285, 286, 306, 820, 831, 320, 832, 836, 842, 846, 851, 346, 347, 873, 874, 368, 372, 899, 906, 909, 911, 924, 415, 420, 425, 938, 939, 942, 432, 436, 948, 445, 448, 454, 462, 974, 979, 984, 992, 491}, {655, 145, 402, 659, 790, 791, 666, 27, 284, 675, 423, 819, 65, 709, 331, 587, 854, 90, 986, 92, 479, 227, 106, 619, 109, 237, 749, 878, 883, 885}, {900, 391, 274, 147, 22, 799, 672, 674, 163, 36, 165, 806, 554, 686, 816, 817, 189, 190, 62, 193, 710, 327, 456, 850, 85, 604, 355, 488, 623, 240, 879, 500, 890, 766}, {514, 3, 5, 10, 13, 19, 34, 51, 53, 60, 577, 579, 588, 77, 589, 91, 612, 618, 627, 117, 125, 640, 647, 653, 661, 151, 665, 668, 164, 676, 167, 171, 174, 695, 186, 700, 704, 706, 205, 214, 733, 735, 224, 229, 741, 236, 756, 758, 762, 776, 267, 780, 273, 279, 280, 795, 286, 291, 301, 821, 830, 834, 836, 844, 339, 343, 866, 868, 869, 889, 379, 400, 922, 411, 932, 428, 429, 941, 945, 946, 435, 438, 957, 455, 968, 465, 981, 473, 986, 989, 486, 998, 999, 496, 511}]
    d = [{0, 774, 9, 660, 21, 33, 45, 687, 178, 699, 829, 574, 72, 841, 458, 459, 334, 859, 351, 610, 871, 490, 108, 245, 757, 375, 636}, {768, 1, 769, 516, 773, 393, 522, 781, 270, 399, 657, 894, 913, 788, 915, 151, 153, 154, 156, 796, 797, 929, 419, 41, 173, 815, 307, 310, 58, 315, 572, 954, 195, 963, 457, 717, 722, 84, 852, 86, 470, 472, 346, 731, 221, 94, 223, 480, 481, 867, 360, 617, 875, 621, 241, 497, 884, 760, 126}, {2, 3, 12, 15, 529, 532, 30, 38, 553, 42, 557, 46, 50, 52, 578, 584, 76, 77, 81, 599, 95, 97, 617, 620, 624, 116, 628, 637, 644, 133, 140, 657, 157, 669, 160, 679, 169, 192, 707, 708, 713, 202, 729, 238, 239, 754, 755, 759, 249, 784, 787, 798, 287, 801, 300, 812, 311, 826, 830, 321, 848, 864, 358, 369, 370, 371, 901, 905, 394, 398, 912, 404, 412, 413, 414, 925, 416, 928, 934, 958, 959, 451, 965, 973, 468, 983, 484, 499}, {386, 4, 389, 134, 775, 138, 652, 141, 16, 656, 403, 534, 23, 538, 417, 930, 292, 550, 935, 430, 175, 179, 692, 55, 951, 952, 698, 59, 443, 962, 332, 716, 718, 591, 336, 467, 982, 603, 732, 737, 865, 100, 230, 870, 104, 232, 361, 364, 625, 887, 255}, {514, 5, 10, 13, 19, 34, 51, 53, 60, 577, 579, 588, 77, 589, 91, 612, 618, 627, 117, 125, 640, 647, 653, 661, 665, 668, 164, 676, 167, 171, 174, 695, 186, 700, 704, 706, 205, 214, 733, 735, 224, 229, 741, 236, 756, 758, 762, 776, 267, 780, 279, 280, 795, 286, 291, 301, 821, 830, 834, 836, 844, 339, 343, 866, 868, 869, 889, 379, 400, 922, 411, 932, 428, 429, 941, 945, 946, 435, 438, 957, 455, 968, 465, 981, 473, 986, 989, 486, 998, 999, 496, 511}, {6, 520, 11, 523, 540, 541, 545, 35, 47, 561, 57, 64, 68, 73, 82, 83, 596, 88, 613, 622, 111, 115, 631, 120, 121, 634, 123, 128, 146, 158, 176, 177, 688, 689, 703, 194, 711, 714, 715, 725, 747, 243, 258, 771, 263, 264, 782, 276, 278, 288, 803, 295, 810, 302, 822, 313, 827, 318, 322, 858, 860, 350, 862, 872, 876, 365, 366, 882, 380, 893, 382, 388, 390, 907, 908, 920, 437, 955, 956, 447, 960, 961, 969, 970, 463, 466, 978, 475, 993, 498}, 
{641, 898, 131, 515, 645, 7, 649, 654, 914, 148, 792, 800, 37, 43, 299, 555, 308, 441, 697, 571, 828, 964, 721, 338, 724, 93, 734, 102, 231, 105, 233, 745, 877, 625, 246, 250, 381, 767}, {8, 524, 528, 529, 18, 531, 25, 544, 547, 63, 575, 576, 582, 583, 79, 592, 607, 101, 107, 626, 124, 638, 639, 650, 653, 662, 667, 682, 685, 183, 184, 187, 705, 196, 197, 719, 211, 213, 726, 730, 219, 740, 742, 234, 746, 752, 753, 242, 764, 252, 766, 256, 257, 772, 261, 265, 268, 269, 278, 281, 793, 805, 325, 329, 348, 353, 357, 881, 891, 896, 389, 396, 401, 917, 406, 918, 408, 931, 424, 937, 944, 946, 446, 460, 471, 985, 987, 991, 997, 494, 502, 504, 506}, {900, 391, 10, 274, 147, 22, 799, 672, 674, 163, 36, 165, 806, 554, 686, 816, 817, 189, 190, 62, 193, 710, 327, 456, 850, 85, 604, 355, 623, 240, 879, 500, 890, 766}, {892, 777, 14, 271, 783, 17, 405, 794, 926, 162, 549, 294, 39, 684, 943, 949, 54, 191, 198, 840, 203, 215, 89, 349, 605, 861, 483, 359, 487, 114, 373, 127, 383}, {512, 519, 525, 527, 20, 24, 29, 543, 546, 548, 48, 49, 560, 563, 567, 56, 570, 61, 581, 70, 78, 80, 93, 99, 611, 630, 643, 149, 664, 155, 691, 693, 701, 702, 201, 723, 216, 728, 222, 225, 738, 743, 235, 253, 779, 786, 789, 282, 283, 285, 286, 306, 820, 831, 320, 832, 836, 842, 846, 851, 346, 347, 873, 874, 368, 899, 906, 909, 911, 924, 415, 420, 425, 938, 939, 942, 432, 436, 948, 445, 448, 454, 462, 974, 979, 984, 992, 491}, {384, 897, 259, 510, 902, 139, 910, 144, 273, 785, 663, 26, 539, 31, 804, 303, 304, 562, 818, 439, 312, 696, 188, 449, 835, 838, 199, 328, 333, 590, 207, 977, 341, 597, 600, 988, 735, 482, 103, 744, 632, 748, 880, 503, 376, 122, 763, 254}, {655, 145, 402, 659, 790, 791, 666, 27, 284, 675, 423, 819, 65, 709, 331, 854, 90, 986, 92, 479, 227, 106, 619, 109, 237, 749, 878, 883, 885}, {518, 530, 535, 536, 28, 551, 552, 556, 565, 566, 580, 69, 587, 593, 87, 602, 91, 606, 119, 129, 130, 642, 137, 658, 158, 671, 166, 678, 168, 680, 688, 181, 182, 200, 206, 720, 212, 220, 736, 226, 739, 228, 
761, 251, 262, 266, 780, 275, 290, 807, 811, 813, 814, 309, 326, 330, 845, 849, 853, 344, 345, 856, 857, 863, 354, 356, 916, 407, 409, 427, 940, 950, 442, 450, 966, 971, 972, 975, 976, 980, 469, 474, 995, 996, 485, 493, 495, 505, 508}, {770, 903, 521, 651, 272, 533, 537, 410, 923, 159, 32, 802, 421, 40, 298, 172, 431, 434, 690, 694, 440, 568, 316, 323, 712, 74, 75, 586, 461, 843, 208, 340, 342, 855, 601, 218, 476, 990, 352, 994, 489, 362, 750, 367, 629, 633, 378, 895}, {512, 513, 642, 132, 260, 517, 646, 136, 392, 648, 395, 526, 912, 277, 919, 152, 921, 161, 289, 673, 677, 933, 296, 297, 170, 681, 44, 809, 936, 559, 305, 433, 947, 564, 823, 824, 825, 314, 187, 953, 317, 67, 324, 837, 967, 585, 335, 847, 209, 594, 595, 598, 727, 217, 478, 223, 96, 608, 98, 228, 614, 615, 616, 363, 113, 244, 372, 118, 374, 501, 888, 507}, {904, 765, 143, 785, 150, 542, 670, 293, 808, 558, 180, 185, 319, 66, 452, 453, 839, 464, 486, 870, 112, 886, 247, 248, 377, 635, 509}, {385, 387, 135, 778, 397, 142, 655, 927, 418, 422, 426, 683, 569, 444, 573, 833, 71, 204, 337, 210, 860, 477, 609, 488, 492, 110, 751}]
    print("LFK", onmi(c, d, variant='LFK'))
    print("MGH_LFK", onmi(c, d, variant='MGH_LFK'))
    print("MGH", onmi(c, d, variant='MGH'))
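
To make the pairwise step in `__com_pair_conditional_entropy` easier to follow, here is a minimal standalone sketch on toy data. It mirrors the a/b/c/d fractions and the fallback to H(X_i) from the listing above, but uses only `math`. The names `h` and `pair_conditional_entropy` are hypothetical helpers introduced for this illustration and are not part of the original code.

```python
import math


def h(p, base=2):
    """Partial entropy term -p * log(p), with h(0) defined as 0."""
    return 0.0 if p == 0 else -p * math.log(p, base)


def pair_conditional_entropy(x, y, all_nodes):
    """Sketch of H(X_i | Y_j) for two communities given as sets of nodes."""
    n = len(all_nodes)
    a = len(all_nodes - x - y) / n  # nodes in neither community
    b = len(y - x) / n              # nodes only in y
    c = len(x - y) / n              # nodes only in x
    d = len(x & y) / n              # nodes in both
    if h(a) + h(d) > h(b) + h(c):
        # H(X_i, Y_j) - H(Y_j)
        return h(a) + h(b) + h(c) + h(d) - (h(b + d) + h(a + c))
    # Otherwise y is treated as uninformative about x: fall back to H(X_i)
    return h(c + d) + h(a + b)


nodes = set(range(8))
same = pair_conditional_entropy({0, 1, 2, 3}, {0, 1, 2, 3}, nodes)
close = pair_conditional_entropy({0, 1, 2, 3}, {0, 1, 2}, nodes)
opposite = pair_conditional_entropy({0, 1, 2, 3}, {4, 5, 6, 7}, nodes)
print(same, close, opposite)
```

Identical communities should give a conditional entropy of 0 (up to floating point), a near match gives a small positive value, and complementary communities trigger the fallback, so the result equals the full entropy of the first community (1 bit here).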
