1. Case conversion
sentence = sentence.lower()  # strings are immutable, so reassign the result
2. Remove punctuation
import string
# Translation table that deletes every character in string.punctuation
punct = str.maketrans('', '', string.punctuation)
sentence = sentence.translate(punct)
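As a quick check of what the translation table does, here is a minimal sketch on a made-up sample sentence (the sample string and the name `cleaned` are illustrative assumptions). Note that string.punctuation covers ASCII punctuation only, and that the apostrophe is removed as well, so "It's" becomes "Its".

```python
import string

# Translation table that maps every ASCII punctuation character to None.
punct = str.maketrans('', '', string.punctuation)

sentence = "Hello, world! It's fine."  # made-up sample input
cleaned = sentence.translate(punct)
print(cleaned)  # Hello world Its fine
```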
3. Tokenization
Splitting on whitespace is enough.
sentence = sentence.split()  # no argument: splits on any run of whitespace
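Splitting on a literal space and splitting on any whitespace behave differently when the text contains consecutive spaces, tabs, or newlines. The sketch below uses a made-up sample string (an assumption for illustration) to show the difference.

```python
sentence = "the  quick brown fox"  # note the two spaces after "the"

# split(' ') treats every single space as a separator,
# so consecutive spaces produce empty strings.
by_space = sentence.split(' ')

# split() with no argument collapses runs of whitespace
# and never yields empty tokens.
by_whitespace = sentence.split()

print(by_space)       # ['the', '', 'quick', 'brown', 'fox']
print(by_whitespace)  # ['the', 'quick', 'brown', 'fox']
```

For text that has not been normalized, the argument-free form is usually the safer choice.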
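4. Remove stop words
from nltk.corpus import stopwords
# Requires a one-time nltk.download('stopwords')
stop = set(stopwords.words('english'))
# sentence is the token list produced by the previous step
sentence = [w for w in sentence if w not in stop]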
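The four steps can be chained into one function. This is only a sketch under stated assumptions: the function name `preprocess`, the sample sentence, and the tiny hand-written stop-word set are all hypothetical, and the set stands in for the NLTK English list so the example runs without downloading any corpus.

```python
import string

# Translation table that deletes ASCII punctuation.
PUNCT_TABLE = str.maketrans('', '', string.punctuation)

# Hypothetical stand-in for set(stopwords.words('english')) from NLTK.
SAMPLE_STOP_WORDS = {"the", "is", "a", "on", "and"}

def preprocess(sentence, stop_words=SAMPLE_STOP_WORDS):
    """Lowercase, strip punctuation, tokenize, and drop stop words."""
    sentence = sentence.lower()                 # 1. case conversion
    sentence = sentence.translate(PUNCT_TABLE)  # 2. remove punctuation
    tokens = sentence.split()                   # 3. whitespace tokenization
    return [w for w in tokens if w not in stop_words]  # 4. stop-word filter

tokens = preprocess("The cat is on the mat, and a dog barks!")
print(tokens)  # ['cat', 'mat', 'dog', 'barks']
```

To use the real NLTK list instead, pass set(stopwords.words('english')) as the stop_words argument.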
Reference: https://blog.csdn.net/weixin_43216017/article/details/88324093