I have a very large dictionary with tuples as keys and counts as values. It is meant to represent an adjacency matrix of word co-occurrence counts, e.g. 'work' appears with 'experience' 16 times and 'work' appears with 'services' 25 times. Whether or not this is the preferred storage method is another issue (with the massive amount of data I have, nested dictionaries became a nightmare to traverse), but it's simply what I have right now.
Frequency = {
('work', 'experience'): 16,
('work', 'services'): 25,
('must', 'services'): 15,
('data', 'services'): 10,
...
...}
Thanks to a previous post, I've been able to do a simple binary adjacency matrix with NetworkX, simply by using this methodology:
A = list(Frequency.keys())
networkx.Graph(A)
That result was great at the time, but my question is: what do I have to do to convert Frequency into an adjacency matrix that uses each co-occurrence count as the value in the matrix, so that the result would look something along the lines of this:
array([[ 0., 16., 25., 0.],
[ 16., 0., 1., 0.],
[ 25., 1., 0., 1.],
[ 10., 0., 0., 0.]
...)
I apologize if this is similar to previous posts, but I just can't find the correct way to convert these tuples to a matrix that I can use in NetworkX. I'm assuming I would use numpy, but I cannot find any documentation for a method like this.
Thanks in advance,
Ron
Solution
This answer may be of help. With your sample data:
>>> import numpy as np
>>> frequency = {('work', 'experience'): 16,
...              ('work', 'services'): 25,
...              ('must', 'services'): 15,
...              ('data', 'services'): 10}
>>> # list() is needed on Python 3, where keys() and values() return views
>>> keys = np.array(list(frequency.keys()))
>>> vals = np.array(list(frequency.values()))
>>> keys
array([['work', 'experience'],
       ['work', 'services'],
       ['must', 'services'],
       ['data', 'services']], dtype='<U10')
>>> vals
array([16, 25, 15, 10])
>>> # unique words, plus the index of each word of each pair in that list
>>> unq_keys, key_idx = np.unique(keys, return_inverse=True)
>>> key_idx = key_idx.reshape(-1, 2)
>>> unq_keys
array(['data', 'experience', 'must', 'services', 'work'], dtype='<U10')
>>> key_idx
array([[4, 1],
       [4, 3],
       [2, 3],
       [0, 3]])
>>> n = len(unq_keys)
>>> adj = np.zeros((n, n), dtype=vals.dtype)
>>> adj[key_idx[:, 0], key_idx[:, 1]] = vals
>>> adj
array([[ 0,  0,  0, 10,  0],
       [ 0,  0,  0,  0,  0],
       [ 0,  0,  0, 15,  0],
       [ 0,  0,  0,  0,  0],
       [ 0, 16,  0, 25,  0]])
>>> adj += adj.T
>>> adj
array([[ 0,  0,  0, 10,  0],
       [ 0,  0,  0,  0, 16],
       [ 0,  0,  0, 15,  0],
       [10,  0, 15,  0, 25],
       [ 0, 16,  0, 25,  0]])
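If you want this as something reusable, the steps above could be wrapped up roughly as follows. This is only a sketch: the function name `cooccurrence_matrix` is made up for illustration, and it uses a plain Python loop instead of the vectorized indexing, so it may be slower on a very large dictionary. It also assumes that if both ('a', 'b') and ('b', 'a') occur as keys their counts should be summed, and that a pair of a word with itself should be counted once; neither case appears in your sample data.

```python
import numpy as np


def cooccurrence_matrix(frequency):
    """Return (labels, matrix) for a dict mapping (word_a, word_b) -> count.

    Hypothetical helper, not part of NumPy or NetworkX.
    """
    # Sorted vocabulary, so the row/column order is deterministic.
    labels = sorted({word for pair in frequency for word in pair})
    index = {word: i for i, word in enumerate(labels)}

    adj = np.zeros((len(labels), len(labels)), dtype=np.int64)
    for (a, b), count in frequency.items():
        # Accumulate instead of assigning, so that ('a', 'b') and
        # ('b', 'a') entries add up rather than overwrite each other.
        adj[index[a], index[b]] += count
        if a != b:
            # Mirror the count to keep the matrix symmetric.
            adj[index[b], index[a]] += count
    return labels, adj


frequency = {('work', 'experience'): 16,
             ('work', 'services'): 25,
             ('must', 'services'): 15,
             ('data', 'services'): 10}

labels, adj = cooccurrence_matrix(frequency)
print(labels)
print(adj)
```

Row and column i of `adj` correspond to `labels[i]`. In recent NetworkX versions the array can then be turned into a weighted graph with `networkx.from_numpy_array(adj)`, which numbers the nodes 0 to n-1, so you would still need to relabel them with the words yourself.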