Background
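While working through the Overview tutorial of Bag of Words Meets Bags of Popcorn, a beginner-level NLP competition, I came across a preprocessing step that splits each review into sentences, with every sentence stored as a list of words. The tutorial's original code looks like this: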
sentences = []  # Initialize an empty list of sentences

print "Parsing sentences from training set"
for review in train["review"]:
    sentences += review_to_sentences(review, tokenizer)

print "Parsing sentences from unlabeled set"
for review in unlabeled_train["review"]:
    sentences += review_to_sentences(review, tokenizer)
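For readers who have not seen the helper, here is a minimal sketch of what a function like review_to_sentences might do. It is not the tutorial's implementation: it assumes only that the tokenizer object exposes a tokenize(text) method returning sentence strings, and NaiveSentenceTokenizer is a hypothetical regex-based stand-in for a real sentence tokenizer, so that the example runs with the standard library alone.

```python
import re


class NaiveSentenceTokenizer(object):
    """Hypothetical stand-in for a real sentence tokenizer (assumption)."""

    def tokenize(self, text):
        # Split after '.', '!' or '?' when followed by whitespace.
        return re.split(r"(?<=[.!?])\s+", text)


def review_to_wordlist(review):
    # Replace everything except ASCII letters with spaces, then lowercase and split.
    letters_only = re.sub(r"[^a-zA-Z]", " ", review)
    return letters_only.lower().split()


def review_to_sentences(review, tokenizer):
    # Returns a list of sentences, each sentence being a list of words.
    sentences = []
    for raw_sentence in tokenizer.tokenize(review.strip()):
        words = review_to_wordlist(raw_sentence)
        if words:  # skip sentences that contain no words
            sentences.append(words)
    return sentences


example = review_to_sentences("I loved it. Would watch again!",
                              NaiveSentenceTokenizer())
print(example)
```

Each review yields a list of word lists, which is why the tutorial's loop extends sentences with += instead of calling append.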
Running the tutorial's code takes as long as five minutes to finish processing the 7500