I have the following code snippet, which uses NLTK to find the syllable count of every word in a given input text 'sample.txt':
import re
import nltk
from curses.ascii import isdigit
from nltk.corpus import cmudict
import nltk.data
import pprint

d = cmudict.dict()
tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')
fp = open("sample.txt")
data = fp.read()
tokens = nltk.wordpunct_tokenize(data)
text = nltk.Text(tokens)
words = [w.lower() for w in text]
print(words)  # to print all the words in input text
regexp = "[A-Za-z]+"
exp = re.compile(regexp)

def nsyl(word):
    # Vowel phonemes in cmudict end in a stress digit, so count those.
    # Raises KeyError when the word is not in the dictionary.
    return max([len([y for y in x if isdigit(y[-1])]) for x in d[word]])

sum1 = 0
count = 0
count1 = 0
for a in words:
    if exp.match(a):
        print(a)
        print("no of syllables:", nsyl(a))
        sum1 = sum1 + nsyl(a)
        print("sum of syllables:", sum1)
        if nsyl(a) < 3:
            count = count + 1
        else:
            count1 = count1 + 1
print("no of words with syll count less than 3:", count)
print("no of complex words:", count1)
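This code looks up each input word in the CMU dictionary and gives its syllable count. However, if a word is not found in the dictionary, or if the input contains proper nouns, it fails with an error. I want to check whether the word exists in the dictionary and, if it does not, skip it and move on to the next word. How can I do that?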
Solution:
I'm guessing it's a KeyError. Replace your definition with
def nsyl(word):
    lowercase = word.lower()
    if lowercase not in d:
        # word is missing from the CMU dictionary
        return -1
    else:
        return max([len([y for y in x if isdigit(y[-1])]) for x in d[lowercase]])
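Note that with this version the caller has to treat -1 as "not found"; otherwise the -1 is added to the syllable total and the word is counted as a short word. Here is a minimal sketch of such a loop. It assumes a small hand-written stand-in for cmudict.dict() (the two entries are only illustrative copies of the CMU format, not loaded from NLTK), uses str.isdigit in place of curses.ascii.isdigit, and the helper name count_syllables is hypothetical:

```python
# Stand-in for cmudict.dict() (assumption): word -> list of pronunciations,
# each a list of phonemes; vowel phonemes end in a stress digit.
d = {
    "hello": [["HH", "AH0", "L", "OW1"], ["HH", "EH0", "L", "OW1"]],
    "dictionary": [["D", "IH1", "K", "SH", "AH0", "N", "EH2", "R", "IY0"]],
}

def nsyl(word):
    lowercase = word.lower()
    if lowercase not in d:
        return -1  # sentinel: word is not in the dictionary
    return max(len([y for y in x if y[-1].isdigit()]) for x in d[lowercase])

def count_syllables(words):
    """Hypothetical helper: total syllables, short words, complex words."""
    total = short = complex_words = 0
    for w in words:
        n = nsyl(w)
        if n < 0:
            continue  # skip unknown words and proper nouns
        total += n
        if n < 3:
            short += 1
        else:
            complex_words += 1
    return total, short, complex_words

result = count_syllables(["Hello", "dictionary", "Zaphod"])
print(result)
```

Calling nsyl once per word and reusing the result also avoids the repeated lookups in the original loop.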
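Alternatively, you can check whether the word is in the dictionary before calling nsyl, and then you don't have to worry about it inside nsyl itself.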
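A rough sketch of that second approach, again under the assumption of a tiny hand-written stand-in for cmudict.dict() so that it runs without the NLTK corpus. The word list and the names known and counts are made up for illustration:

```python
# Stand-in for cmudict.dict() (assumption, not loaded from NLTK).
d = {
    "the": [["DH", "AH0"], ["DH", "AH1"], ["DH", "IY0"]],
    "quick": [["K", "W", "IH1", "K"]],
    "animal": [["AE1", "N", "AH0", "M", "AH0", "L"]],
}

def nsyl(word):
    # Assumes the caller has already verified that word is a key of d.
    return max(len([p for p in pron if p[-1].isdigit()]) for pron in d[word])

words = ["The", "quick", "Gandalf", "animal"]

# Filter first: keep only the words that the dictionary knows about.
known = [w.lower() for w in words if w.lower() in d]

# nsyl can now be called without risking a KeyError.
counts = {w: nsyl(w) for w in known}
print(counts)
```

Filtering up front keeps nsyl free of sentinel values, at the cost of the caller having to remember the membership check.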
Tags: nltk, python
Source: https://codeday.me/bug/20191208/2091572.html