Modifying key values in a Python dictionary: how to update the value of a key-value pair in a nested dictionary?

Latest recommended article published on 2024-07-28 03:10:02

weixin_39870132

Tags: Python dictionary key-value modification

I am trying to build an inverted document index, so for every unique word in a collection I need to know which documents it occurs in and how often.

I used this answer to create a nested dictionary. The provided solution works fine, with one problem though.

First I open the file and make a list of unique words. I then want to compare these unique words with the original file. When there is a match, the frequency counter should be updated and its value stored in the two-dimensional array.

The output should eventually look like this:

word1, {doc1 : freq}, {doc2 : freq}

word2, {doc1 : freq}, {doc2 : freq}, {doc3:freq}

etc....
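One way to picture this target layout, as a sketch rather than the final code, is a plain nested dict that maps each word to a {document: frequency} dict. The words, document names, and counts below are made-up placeholders, and `format_entry` is a hypothetical helper used only to print the layout shown above.

```python
# Hypothetical sample data: word -> {document name -> frequency}
index = {
    "word1": {"doc1": 3, "doc2": 1},
    "word2": {"doc1": 2, "doc2": 5, "doc3": 1},
}

def format_entry(word, postings):
    """Render one index entry in the 'word, {doc : freq}, ...' layout."""
    parts = ["{%s : %d}" % (doc, freq) for doc, freq in sorted(postings.items())]
    return ", ".join([word] + parts)

# One formatted line per word, in alphabetical order
lines = [format_entry(word, index[word]) for word in sorted(index)]
for line in lines:
    print(line)
```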

The problem is that I cannot update the dictionary variable. When I try to do so, I get this error:

      File "scriptV3.py", line 45, in main
        freq = dictionary[keyword][filename] + 1
    TypeError: unsupported operand type(s) for +: 'AutoVivification' and 'int'

I think I need to cast the AutoVivification instance to int somehow.
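The failure can be reproduced in isolation. Looking up a key that was never set does not return a number or None: it creates and returns a new, empty AutoVivification, so the `+ 1` has no integer to add to, and the `is not None` check in the script is always true. A minimal sketch, assuming Python 3 and the same class as in the script (the key names `"word"` and `"doc_1"` are placeholders):

```python
class AutoVivification(dict):
    """Same class as in the script: missing keys create nested instances."""
    def __getitem__(self, item):
        try:
            return dict.__getitem__(self, item)
        except KeyError:
            value = self[item] = type(self)()
            return value

dictionary = AutoVivification()

# The key was never assigned, so this lookup creates an empty AutoVivification
entry = dictionary["word"]["doc_1"]
entry_type = type(entry).__name__

# Adding an int to that object raises the TypeError from the traceback
try:
    freq = entry + 1
    error = None
except TypeError as exc:
    error = str(exc)

print(entry_type)
print(error)
```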

How should I proceed?

Thanks in advance.

My code:

    #!/usr/bin/env python
    # encoding: utf-8

    import sys
    import os
    import re
    import glob
    import string
    import sets

    class AutoVivification(dict):
        """Implementation of perl's autovivification feature."""
        def __getitem__(self, item):
            try:
                return dict.__getitem__(self, item)
            except KeyError:
                value = self[item] = type(self)()
                return value

    def main():
        pad = 'temp/'
        dictionary = AutoVivification()
        docID = 0
        for files in glob.glob(os.path.join(pad, '*.html')):  # for all files in specified folder
            docID = docID + 1
            filename = "doc_" + str(docID)
            text = open(files, 'r').read()  # returns content of file as string
            text = extract(text, '', '')  # call extract function to extract text from within tags; the tag strings are missing here
            text = text.lower()  # all words to lowercase
            exclude = set(string.punctuation)  # set of all punctuation characters
            text = ''.join(char for char in text if char not in exclude)  # use created exclude set to remove characters from files
            text = text.split()  # creates list (array) from string
            uniques = set(text)  # make list unique (is that useful? we still need to count)

            for keyword in uniques:  # for every unique word
                for word in text:  # for every word in doc
                    if (word == keyword and dictionary[keyword][filename] is not None):  # if there is an occurrence of keyword, increment counter
                        freq = dictionary[keyword][filename]  # here we fail, cannot cast object instance to integer
                        freq = dictionary[keyword][filename] + 1
                        print(keyword, dictionary[keyword])
                    else:
                        dictionary[word][filename] = 1

    # extract text between substring 1 and 2
    def extract(text, sub1, sub2):
        return text.split(sub1, 1)[-1].split(sub2, 1)[0]

    if __name__ == '__main__':
        main()

Solution

One could use Python's collections.defaultdict instead of creating an AutoVivification class and then instantiating dictionary as an object of that type.

    import collections

    dictionary = collections.defaultdict(lambda: collections.defaultdict(int))

This creates a dictionary of dictionaries whose innermost values default to 0. When you wish to increment an entry, use:

    dictionary[keyword][filename] += 1
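Putting the pieces together, a minimal sketch of the inverted index with this approach might look as follows. It assumes Python 3 and uses two made-up in-memory documents in place of the HTML files; `build_index` and the sample texts are hypothetical names introduced for illustration, not part of the original script. Counting every word directly also makes the separate set of unique words unnecessary.

```python
import collections
import string

def build_index(docs):
    """Return word -> {doc name -> frequency} for a dict of doc name -> text."""
    index = collections.defaultdict(lambda: collections.defaultdict(int))
    exclude = set(string.punctuation)
    for filename, text in docs.items():
        # Lowercase the text and strip punctuation, as in the original script
        cleaned = ''.join(ch for ch in text.lower() if ch not in exclude)
        for word in cleaned.split():
            # A missing entry starts at 0, so += 1 works on first sight of a word
            index[word][filename] += 1
    return index

# Hypothetical sample documents standing in for the files in 'temp/'
docs = {
    "doc_1": "The cat sat. The cat ran!",
    "doc_2": "A dog sat.",
}

index = build_index(docs)
for word in sorted(index):
    print(word, dict(index[word]))
```

Note that merely reading a missing key from a defaultdict inserts it with the default value, so to check for a word without adding it, use `word in index` or `index.get(word)` instead of `index[word]`.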