我有以下代码
import nltk, os, json, csv, string, cPickle
from scipy.stats import scoreatpercentile
lmtzr = nltk.stem.wordnet.WordNetLemmatizer()
def sanitize(wordList):
answer = [word.translate(None, string.punctuation) for word in wordList]
answer = [lmtzr.lemmatize(word.lower()) for word in answer]
return answer
words = []
for filename in json_list:
words.extend([sanitize(nltk.word_tokenize(' '.join([tweet['text']
for tweet in json.load(open(filename,READ))])))])
当我写的时候,我在一个单独的tests.py文件中测试了第2-4行
import nltk, os, json, csv, string, cPickle
from scipy.stats import scoreatpercentile
wordList= ['\'the', 'the', '"the']
print wordList
wordList2 = [word.translate(None, string.punctuation) for word in wordList]
print wordList2
answer = [lmtzr.lemmatize(word.lower()) for word in wordList2]
print answer
freq = nltk.FreqDist(wordList2)
print freq
并且命令提示符返回[‘the’,’the’,’the’],这是我想要的(删除标点符号)。
但是,当我将完全相同的代码放在不同的文件中时,python会返回一个TypeError
File "foo.py", line 8, in
for tweet in json.load(open(filename, READ))])))])
File "foo.py", line 2, in sanitize
answer = [word.translate(None, string.punctuation) for word in wordList]
TypeError: translate() takes exactly one argument (2 given)
json_list是所有文件路径的列表(我打印并检查此列表是否有效)。我对此TypeError感到困惑,因为当我只是在另一个文件中测试时,所有的工作都很好。