How to read words in Python: syllable count of the words in a text


weixin_39713814


Article tags: how to read words in Python

I have the following code excerpt, which uses NLTK to find the number of syllables of every word in a given input text file, 'sample.txt':

import re

import nltk
from nltk.corpus import cmudict

d = cmudict.dict()  # CMU Pronouncing Dictionary: word -> list of pronunciations

with open("sample.txt") as fp:
    data = fp.read()

tokens = nltk.wordpunct_tokenize(data)
text = nltk.Text(tokens)
words = [w.lower() for w in text]
print(words)  # print all the words in the input text

regexp = "[A-Za-z]+"
exp = re.compile(regexp)

def nsyl(word):
    # Vowel phonemes in CMUdict end in a stress digit (0/1/2), so counting them
    # gives the number of syllables; the longest pronunciation is used.
    # Raises KeyError if the word is not in the dictionary.
    return max([len([y for y in x if y[-1].isdigit()]) for x in d[word]])

sum1 = 0    # total number of syllables
count = 0   # words with fewer than 3 syllables
count1 = 0  # "complex" words (3 or more syllables)

for a in words:
    if exp.match(a):
        syllables = nsyl(a)
        print(a)
        print("no of syllables:", syllables)
        sum1 = sum1 + syllables
        if syllables < 3:
            count = count + 1
        else:
            count1 = count1 + 1

print("sum of syllables:", sum1)
print("no of words with syll count less than 3:", count)
print("no of complex words:", count1)
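The syllable count in `nsyl` relies on the format of the CMU Pronouncing Dictionary: `cmudict.dict()` maps each lowercase word to a list of pronunciations, each a list of ARPAbet phonemes, and every vowel phoneme ends in a stress digit (0, 1 or 2). Counting the phonemes that end in a digit therefore counts the syllables, and `max` picks the pronunciation with the most syllables. The sketch below shows this on a small hand-written dictionary in the same shape, so it runs without downloading the corpus; the entries and the `count_syllables` helper are illustrative assumptions, not data loaded from NLTK.

```python
# Hand-written stand-in for nltk.corpus.cmudict.dict(): word -> list of
# pronunciations, each a list of ARPAbet phonemes. Vowels end in a stress digit.
sample_dict = {
    "python": [["P", "AY1", "TH", "AA0", "N"]],
    "syllable": [["S", "IH1", "L", "AH0", "B", "AH0", "L"]],
    "fire": [["F", "AY1", "ER0"], ["F", "AY1", "R"]],
}

def count_syllables(word, pron_dict):
    """Hypothetical helper: syllable count of `word`, using its longest pronunciation."""
    return max(
        sum(1 for phoneme in pron if phoneme[-1].isdigit())
        for pron in pron_dict[word]
    )

for w in ["python", "syllable", "fire"]:
    print(w, count_syllables(w, sample_dict))
```

Here `fire` has two listed pronunciations (two syllables and one), and `max` returns 2. A word missing from the dictionary raises `KeyError` at `pron_dict[word]`.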

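This code matches each input word against the CMU dictionary and gives the word's syllable count. However, if a word is not found in the dictionary, or if I use proper nouns in the input, it fails with an error. I want to check whether the word exists in the dictionary and, if it does not, skip it and move on to the next word. How can I do that?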

Solution:

I'm guessing this is a KeyError. Replace your definition with:

def nsyl(word):
    lowercase = word.lower()
    if lowercase not in d:
        return -1  # the word is not in the dictionary
    else:
        return max([len([y for y in x if y[-1].isdigit()]) for x in d[lowercase]])
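With the `-1` sentinel version, the calling loop still has to skip the `-1` results; otherwise a missing word would subtract one from the syllable total and be counted as a word with fewer than 3 syllables. A minimal sketch of such a loop, assuming a small hand-written dictionary in the `cmudict.dict()` format in place of the real corpus (the entries and the `summarize` helper are hypothetical):

```python
import re

# Hand-written stand-in for nltk.corpus.cmudict.dict() (illustrative entries only).
d = {
    "alice": [["AE1", "L", "AH0", "S"]],
    "likes": [["L", "AY1", "K", "S"]],
    "beautiful": [["B", "Y", "UW1", "T", "AH0", "F", "AH0", "L"]],
}

def nsyl(word):
    lowercase = word.lower()
    if lowercase not in d:
        return -1  # sentinel: the word is not in the dictionary
    return max(len([y for y in x if y[-1].isdigit()]) for x in d[lowercase])

def summarize(words):
    """Hypothetical helper: (total syllables, words with < 3 syllables, complex words)."""
    exp = re.compile("[A-Za-z]+")
    total = simple = complex_words = 0
    for a in words:
        if not exp.match(a):
            continue  # skip punctuation tokens
        syllables = nsyl(a)
        if syllables == -1:
            continue  # unknown word: skip it and move on to the next one
        total += syllables
        if syllables < 3:
            simple += 1
        else:
            complex_words += 1
    return total, simple, complex_words

print(summarize(["Alice", "likes", "beautiful", "Zyxwv", "!"]))
```

The made-up token `Zyxwv` is skipped, so the result covers only the three dictionary words.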

Alternatively, you can check whether the word is in the dictionary before calling nsyl; then nsyl itself does not need to handle that case.
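The alternative can be sketched as follows: the membership test moves into the loop, and `nsyl` keeps its original form. The small dictionary again stands in for `cmudict.dict()`, and its entries and the `known` result mapping are illustrative assumptions. Because the question's code already lowercases every token, `a not in d` can be tested directly.

```python
# Hand-written stand-in for nltk.corpus.cmudict.dict() (illustrative entries only).
d = {
    "the": [["DH", "AH0"], ["DH", "AH1"], ["DH", "IY0"]],
    "cat": [["K", "AE1", "T"]],
    "elephant": [["EH1", "L", "AH0", "F", "AH0", "N", "T"]],
}

def nsyl(word):
    # Unchanged from the question: raises KeyError for unknown words.
    return max(len([y for y in x if y[-1].isdigit()]) for x in d[word])

words = ["the", "cat", "saw", "an", "elephant"]
known = {}
for a in words:
    if a not in d:
        print("skipping:", a)  # not in the dictionary: move on to the next word
        continue
    known[a] = nsyl(a)

print(known)
```

A dictionary membership test is a hash lookup, so checking before each call adds very little cost.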

Tags: nltk, python

Source: https://codeday.me/bug/20191208/2091572.html
