Python co-occurrence: building a co-occurrence matrix from a list of words in Python

I have a list of names like:

names = ['A', 'B', 'C', 'D']

and a list of documents, each of which mentions some of these names:

document = [['A', 'B'], ['C', 'B', 'K'], ['A', 'B', 'C', 'D', 'Z']]

I would like to get an output as a matrix of co-occurrences like:

  A B C D
A 0 2 1 1
B 2 0 2 1
C 1 2 0 1
D 1 1 1 0

There is a solution for this problem in R (Creating co-occurrence matrix), but I couldn't reproduce it in Python. I am thinking of doing it in Pandas, but I have made no progress yet.

Solution

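This can be extended for your purposes, but it performs the general operation you have in mind. For each pair of different names it counts the documents that mention both; for a name paired with itself it counts the pairs of repeated mentions within a document: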

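import math

# `document` is the list of lists defined in the question.
for a in 'ABCD':
    for b in 'ABCD':
        count = 0
        for x in document:
            if a != b:
                # Different names: count documents that mention both.
                if a in x and b in x:
                    count += 1
            else:
                # Same name: count pairs of repeated mentions, C(n, 2).
                n = x.count(a)
                if n >= 2:
                    # Integer division keeps the count an int on Python 3.
                    count += math.factorial(n) // math.factorial(n - 2) // 2
        print('{} x {} = {}'.format(a, b, count))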
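
The loop above prints one line per pair rather than the matrix asked for in the question. Below is a minimal sketch of one way to collect the same pairwise counts into a matrix using only the standard library. It is not the answerer's code: the function name cooccurrence_matrix is made up for illustration, and it assumes a name counts at most once per document, so the diagonal is always 0 (the loop above would instead count repeated mentions there).

```python
from collections import Counter
from itertools import combinations

names = ['A', 'B', 'C', 'D']
document = [['A', 'B'], ['C', 'B', 'K'], ['A', 'B', 'C', 'D', 'Z']]


def cooccurrence_matrix(names, documents):
    """Return a len(names) x len(names) list of lists of pair counts.

    Hypothetical helper: assumes each name counts once per document.
    """
    wanted = set(names)
    pair_counts = Counter()
    for doc in documents:
        # Keep each tracked name once per document, in sorted order,
        # so a pair is always stored under the same (a, b) key.
        present = sorted(wanted.intersection(doc))
        pair_counts.update(combinations(present, 2))
    # Counter returns 0 for pairs that never occurred together.
    return [
        [0 if a == b else pair_counts[tuple(sorted((a, b)))] for b in names]
        for a in names
    ]


matrix = cooccurrence_matrix(names, document)
for name, row in zip(names, matrix):
    print(name, *row)
```

If Pandas is available, passing the result to pandas.DataFrame(matrix, index=names, columns=names) should give a labelled table like the one in the question.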
