Counting word frequency by part of speech in Python: how do you determine word frequency in a text based on part of speech?

Latest recommended article published 2022-07-26 17:09:25

chipsmile2017

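Tags: counting word frequency by part of speech in Python

Copyright notice: this is an original article by the blogger, licensed under CC 4.0 BY-SA. When reposting, include a link to the original source and this notice.

Article link: https://blog.csdn.net/weixin_28976179/article/details/114440872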

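Summary (generated by CSDN using intelligent technology): This post describes how to use Python's nltk library for part-of-speech tagging, remove stopwords, single-character tokens, and numbers, and count word frequencies after lemmatization. Although nltk handles part-of-speech analysis well, homographs such as 'brown' raise the question of how to count their different senses separately according to part of speech.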

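I can easily get the most common words:

    import nltk
    from nltk.corpus import wordnet

    def wordnet_pos_code(tag):
        # Not shown in the original post. Assumed here: map a Penn Treebank
        # tag to a WordNet POS constant, or None for any other tag.
        for prefix, pos in (('NN', wordnet.NOUN), ('VB', wordnet.VERB),
                            ('JJ', wordnet.ADJ), ('RB', wordnet.ADV)):
            if tag.startswith(prefix):
                return pos
        return None

    # The function name and signature are assumed; the original shows only the body.
    # Needs the NLTK tokenizer, tagger, stopwords and WordNet data to be installed.
    def most_common_words(text, words_count, remove_stopwords=True):
        stopwords = set(nltk.corpus.stopwords.words('english'))
        tagged_words = nltk.word_tokenize(text)
        tagged_words = nltk.pos_tag(tagged_words)
        # Remove single-character tokens (mostly punctuation)
        tagged_words = [tagged_word for tagged_word in tagged_words if len(tagged_word[0]) > 1]
        # Remove numbers
        tagged_words = [tagged_word for tagged_word in tagged_words if not tagged_word[0].isnumeric()]
        # Remove stopwords (the NLTK list is lowercase, so compare in lowercase)
        if remove_stopwords:
            tagged_words = [tagged_word for tagged_word in tagged_words if tagged_word[0].lower() not in stopwords]
        # Lemmatize each word according to its part of speech
        lemmatizer = nltk.stem.WordNetLemmatizer()
        words = []
        for tagged_word in tagged_words:
            pos = wordnet_pos_code(tagged_word[1])
            # Ignore all words except nouns, verbs, adjectives and adverbs
            if pos is not None:
                # Store a tuple, not a dict: FreqDist needs hashable items
                words.append((lemmatizer.lemmatize(tagged_word[0], pos=pos), tagged_word[1]))
        # Calculate the frequency distribution of (lemma, tag) pairs
        fdist = nltk.FreqDist(words)
        # Return the words_count most common entries
        res = []
        for (word, pos), frequency in fdist.most_common(words_count):
            word_dict = {}
            word_dict['word'] = word
            word_dict['pos'] = pos
            word_dict['count'] = frequency
            res.append(word_dict)
        return res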

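But some words are not what they seem: "Brown" the surname and "brown" the color are not the same word. Fine, I can check the capitalization. But what if the text is: Brown is not just a color. Brown is part of lifestyle. And Mr Brown should agree with me.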
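To make the problem concrete, here is a minimal sketch that counts case-sensitive surface forms in the example sentences using only the standard library. The regular-expression tokenizer is a simplification assumed for this illustration; it is not what `nltk.word_tokenize` does.

```python
import re
from collections import Counter

text = ("Brown is not just a color. Brown is part of lifestyle. "
        "And Mr Brown should agree with me.")

# Naive tokenizer (an assumption for this sketch): runs of ASCII letters only.
tokens = re.findall(r"[A-Za-z]+", text)

# Count case-sensitive surface forms.
counts = Counter(tokens)

# Every occurrence is capitalized, so case alone cannot separate the
# color (at the start of a sentence) from the surname (after "Mr").
print(counts["Brown"], counts["brown"])  # prints: 3 0
```

Because the color happens to open two of the sentences, a capitalization check lumps all three occurrences together.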

So nltk does a good job of part-of-speech analysis. But how can I get the most common words broken down by part of speech?
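One possible approach, sketched here under stated assumptions, is to count (word, tag) pairs instead of bare words, and to keep one counter per tag for a top-N list per part of speech. The tags below are written by hand for illustration; real input would come from `nltk.pos_tag`, whose output on this text may differ. The helper name `frequency_by_pos` is hypothetical.

```python
from collections import Counter, defaultdict

# Hand-tagged (token, Penn Treebank tag) pairs. These tags are an
# assumption for illustration, not actual tagger output.
tagged_words = [
    ("Brown", "NN"), ("is", "VBZ"), ("not", "RB"), ("just", "RB"),
    ("a", "DT"), ("color", "NN"),
    ("Brown", "NN"), ("is", "VBZ"), ("part", "NN"), ("of", "IN"),
    ("lifestyle", "NN"),
    ("And", "CC"), ("Mr", "NNP"), ("Brown", "NNP"), ("should", "MD"),
    ("agree", "VB"), ("with", "IN"), ("me", "PRP"),
]

def frequency_by_pos(tagged, top_n=3):
    """Return {tag: [(word, count), ...]} with the top_n words per tag."""
    per_tag = defaultdict(Counter)
    for word, tag in tagged:
        # Lowercase so that case no longer matters; the tag carries the sense.
        per_tag[tag][word.lower()] += 1
    return {tag: counter.most_common(top_n) for tag, counter in per_tag.items()}

# Counting (word, tag) pairs keeps the two uses of "Brown" apart.
pair_counts = Counter((word.lower(), tag) for word, tag in tagged_words)
top_by_pos = frequency_by_pos(tagged_words)

print(pair_counts[("brown", "NN")], pair_counts[("brown", "NNP")])  # prints: 2 1
print(top_by_pos["NN"])
```

The result is only as good as the tags: if the tagger labels a sentence-initial "Brown" as a proper noun, the color and the surname are merged again, so it is worth inspecting the tagger's output on a sample of the text first.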
