Python: How do I update the value of a key-value pair in a nested dictionary?

I am trying to build an inverted document index. For every unique word in a collection, I need to know which documents it occurs in and how often.

I used this answer to create a nested dictionary. The provided solution works fine, with one problem.

First I open the file and make a list of its unique words. I then compare these unique words with the words of the original file. When there is a match, the frequency counter should be incremented and its value stored in the two-dimensional structure (the nested dictionary).
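Counting how often each word occurs in a single document does not actually require comparing every unique word against every word of the text; one pass over the words is enough. A minimal sketch using `collections.Counter` from the standard library, with the same lowercasing and punctuation stripping as the question's code; the helper name `count_words` and the sample sentence are hypothetical:

```python
import string
from collections import Counter


def count_words(text):
    """Return a Counter mapping each word in `text` to its frequency (hypothetical helper)."""
    # Lowercase and strip punctuation, mirroring the preprocessing in the question.
    exclude = set(string.punctuation)
    cleaned = ''.join(ch for ch in text.lower() if ch not in exclude)
    # A single pass over the words; no nested loop over the unique words.
    return Counter(cleaned.split())


counts = count_words("The cat sat. The cat ran!")
print(counts["the"], counts["cat"], counts["sat"])  # 2 2 1
```

A `Counter` also returns 0 for words it has never seen, so lookups never need a separate existence check.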

The output should eventually look like this:

word1, {doc1 : freq}, {doc2 : freq}

word2, {doc1 : freq}, {doc2 : freq}, {doc3 : freq}

etc.
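Once the index is a plain nested dictionary (word → {document → frequency}), producing lines in that format is a matter of joining the postings. A sketch, assuming a small hand-written index; the sample data and the helper name `format_entry` are hypothetical:

```python
# Hypothetical index: word -> {document name -> frequency}
index = {
    "word1": {"doc1": 3, "doc2": 1},
    "word2": {"doc1": 2, "doc2": 5, "doc3": 1},
}


def format_entry(word, postings):
    """Render one word and its postings as 'word, {doc : freq}, {doc : freq}'."""
    parts = ["{%s : %d}" % (doc, freq) for doc, freq in sorted(postings.items())]
    return ", ".join([word] + parts)


for word in sorted(index):
    print(format_entry(word, index[word]))
# word1, {doc1 : 3}, {doc2 : 1}
# word2, {doc1 : 2}, {doc2 : 5}, {doc3 : 1}
```

Note that `sorted()` orders document names as strings, so a name like `doc10` would sort before `doc2`.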

The problem is that I cannot update the value stored in the dictionary. When I try, I get this error:

```
File "scriptV3.py", line 45, in main
    freq = dictionary[keyword][filename] + 1
TypeError: unsupported operand type(s) for +: 'AutoVivification' and 'int'
```
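The error can be reproduced in isolation. For a key that was never assigned, `AutoVivification.__getitem__` stores and returns a new, empty `AutoVivification`, so the failing expression is really "empty dict + int", and the `is not None` check in the question's code is always true. A minimal reproduction, copying only the class from the question; the variable names are illustrative:

```python
class AutoVivification(dict):
    """Implementation of perl's autovivification feature (as in the question)."""

    def __getitem__(self, item):
        try:
            return dict.__getitem__(self, item)
        except KeyError:
            # A missing key is filled with a new, empty AutoVivification, not with 0.
            value = self[item] = type(self)()
            return value


d = AutoVivification()
inner = d["word"]["doc_1"]          # never assigned, so it autovivifies
print(type(inner).__name__, inner)  # AutoVivification {}

try:
    inner + 1
except TypeError as exc:
    error_message = str(exc)
    print(error_message)  # unsupported operand type(s) for +: 'AutoVivification' and 'int'
```

Casting cannot help here: the empty dictionary does not hold a count, so the missing value has to default to 0 instead.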

I think I need to somehow cast the AutoVivification instance to an int...

How do I do that?

Thanks in advance.

My code:

```python
#!/usr/bin/env python
# encoding: utf-8

import sys
import os
import re
import glob
import string


class AutoVivification(dict):
    """Implementation of perl's autovivification feature."""

    def __getitem__(self, item):
        try:
            return dict.__getitem__(self, item)
        except KeyError:
            value = self[item] = type(self)()
            return value


def main():
    pad = 'temp/'
    dictionary = AutoVivification()
    docID = 0

    for files in glob.glob(os.path.join(pad, '*.html')):  # for all files in specified folder
        docID = docID + 1
        filename = "doc_" + str(docID)
        text = open(files, 'r').read()  # returns content of file as string
        # call extract() to get the text between an opening and a closing tag;
        # '<tag>' is a placeholder for the HTML tag used in the original script
        text = extract(text, '<tag>', '</tag>')
        text = text.lower()  # all words to lowercase
        exclude = set(string.punctuation)  # set of all punctuation characters
        text = ''.join(char for char in text if char not in exclude)  # remove punctuation
        text = text.split()  # creates list (array) from string
        uniques = set(text)  # make list unique (is that useful? we still need to count)

        for keyword in uniques:  # for every unique word
            for word in text:  # for every word in doc
                if word == keyword and dictionary[keyword][filename] is not None:  # occurrence of keyword: increment counter
                    freq = dictionary[keyword][filename]  # here we fail, cannot cast object instance to integer
                    freq = dictionary[keyword][filename] + 1
                    print(keyword, dictionary[keyword])
                else:
                    dictionary[word][filename] = 1


# extract text between substring 1 and 2
def extract(text, sub1, sub2):
    return text.split(sub1, 1)[-1].split(sub2, 1)[0]


if __name__ == '__main__':
    main()
```
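If you want to keep the AutoVivification class, one workaround is to read the innermost count with `dict.get()`, which does not go through the overridden `__getitem__` and therefore returns a plain default of 0 for a missing document. A sketch under that assumption; the helper `add_document` and the sample texts are hypothetical, and the single loop over the words replaces the nested unique-word loop:

```python
import string


class AutoVivification(dict):
    """Implementation of perl's autovivification feature (as in the question)."""

    def __getitem__(self, item):
        try:
            return dict.__getitem__(self, item)
        except KeyError:
            value = self[item] = type(self)()
            return value


def add_document(index, filename, text):
    """Add the word frequencies of `text` under `filename` (hypothetical helper)."""
    exclude = set(string.punctuation)
    words = ''.join(c for c in text.lower() if c not in exclude).split()
    for word in words:
        postings = index[word]  # the outer level autovivifies as intended
        # dict.get() bypasses the overridden __getitem__, so a missing
        # document count defaults to the int 0 instead of a new AutoVivification.
        postings[filename] = postings.get(filename, 0) + 1


index = AutoVivification()
add_document(index, "doc_1", "a rose is a rose")
add_document(index, "doc_2", "a daisy")
print(index["rose"])  # {'doc_1': 2}
print(index["a"])     # {'doc_1': 2, 'doc_2': 1}
```

The result is stored back with an explicit assignment; in the question's code, `freq` is computed but never written into the dictionary.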

Solution

You can use Python's collections.defaultdict instead of writing an AutoVivification class and instantiating dictionary as an object of that type:

```python
import collections

dictionary = collections.defaultdict(lambda: collections.defaultdict(int))
```

This creates a dictionary of dictionaries whose inner values default to 0. To increment an entry, use:

```python
dictionary[keyword][filename] += 1
```
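Combining the two lines of this answer with the preprocessing from the question, a single-pass index builder might look like the sketch below. It assumes the documents have already been read into a dict mapping document name to text (the glob and `extract()` steps are left out), and `build_index` is a hypothetical name:

```python
import collections
import string


def build_index(documents):
    """Build word -> {document name -> frequency} from a dict of name -> text."""
    # Missing words get an inner defaultdict(int); missing document names get 0.
    index = collections.defaultdict(lambda: collections.defaultdict(int))
    exclude = set(string.punctuation)
    for filename, text in documents.items():
        cleaned = ''.join(c for c in text.lower() if c not in exclude)
        for word in cleaned.split():  # one pass; no nested loop over unique words
            index[word][filename] += 1
    return index


index = build_index({"doc_1": "Spam, spam and eggs.", "doc_2": "Eggs!"})
for word in sorted(index):
    print(word, dict(index[word]))
```

Like AutoVivification, a defaultdict inserts an entry as a side effect of merely reading a missing key, so when only querying the finished index, test with `word in index` first.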
