I have multiple texts and I would like to create profiles of them based on their usage of various parts of speech, like nouns and verbs. Basically, I need to count how many times each part of speech is used.
I have tagged the text but am not sure how to go further:
import nltk
tokens = nltk.word_tokenize(text.lower())
text = nltk.Text(tokens)
tags = nltk.pos_tag(text)
How can I save the counts for each part of speech into a variable?
Solution
The pos_tag function returns a list of (token, tag) pairs:
tagged = [('the', 'DT'), ('dog', 'NN'), ('sees', 'VB'), ('the', 'DT'), ('cat', 'NN')]
If you are using Python 2.7 or later, you can count the tags with collections.Counter:
>>> from collections import Counter
>>> counts = Cou
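As a minimal sketch of the counting step, assuming Python 3 and a list of (token, tag) pairs shaped like the tagged example above: the tags can be fed to a Counter, and the raw counts can then be normalised so that texts of different lengths are comparable. The names total and profile are illustrative and not part of the original answer.

```python
from collections import Counter

# Example (token, tag) pairs in the shape returned by nltk.pos_tag
tagged = [('the', 'DT'), ('dog', 'NN'), ('sees', 'VB'), ('the', 'DT'), ('cat', 'NN')]

# Count how often each tag occurs, ignoring the tokens themselves
counts = Counter(tag for _token, tag in tagged)

# Optional: turn raw counts into relative frequencies (assumes Python 3 division)
total = sum(counts.values())
profile = {tag: n / total for tag, n in counts.items()}
```

Since a Counter is a dict subclass, counts['NN'] gives the count for a single tag, and a tag that never occurred returns 0 instead of raising a KeyError.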