Removing stop words when cleaning data in Python: deleting custom stop words from a phrase in Python - CSDN Blog

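Before processing the input any further, I am trying to remove certain phrases and words from the user's input. While doing this I ran into an "index out of range" error and am now completely stuck. How can I fix it?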

I convert the input phrase to a string, split it into a list so that each word can be compared, and keep the stop words in a predefined list.
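The preprocessing step described above might look roughly like the sketch below. It assumes the raw input is a plain string that is lowercased and split on whitespace. The name `to_word_list` is hypothetical and does not appear in the original code.

```python
def to_word_list(raw_phrase):
    """Turn raw user input into a list of lowercase words (assumed preprocessing)."""
    # str() guards against non-string input; split() with no arguments
    # splits on any run of whitespace and drops empty strings.
    return str(raw_phrase).lower().split()


words = to_word_list("Well you know  the weather")
print(words)
```

Lowercasing is an assumption made here so that the words can match the lowercase entries in the stop word list.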

Input examples:

["well", "you", "know", "weather", "bad"]

["you", "know", "what", "i", "mean", "so", "just", "turn", "lights", "on"]

# Gets the user input, removes the selected stop words from it, and returns the filtered phrase.
def stop_word_remover(phrase_list):
    stop_words_lst = ["yo", "so", "well", "um", "a", "the", "you know", "i mean"]

    # Initialize the clean phrase string.
    clean_input_phrase = ""

    # Copy phrase_list into a new variable for stop word removal.
    Copy_phrase_list = list(phrase_list)

    # Cleanup loop: compare each pair of adjacent words with the stop word list.
    for i in range(1, len(phrase_list)):
        has_stop_words = False
        for x in range(len(stop_words_lst)):
            has_stop_words = False
            # If the current pair of words matches one of the stop words, the flag is raised.
            if (phrase_list[i-1] + " " + phrase_list[i]) == stop_words_lst[x].strip():
                has_stop_words = True
            # If the flag is raised, remove the matching word from the copy.
            if has_stop_words == True:
                Copy_phrase_list.remove(Copy_phrase_list[i-1])

    # The first loop goes through the phrase one word at a time.
    for i in range(len(Copy_phrase_list)):
        # Flag initialized for marking stop words.
        has_stop_words = False
        # The second loop compares every stop word with the word passed on by the first loop.
        for x in range(len(stop_words_lst)):
            # If one of the stop words matches the current word, the flag is raised.
            if Copy_phrase_list[i] == stop_words_lst[x].strip():
                has_stop_words = True
        # Add the word to the result only if the flag is not raised, so all stop words are filtered out.
        if has_stop_words == False:
            clean_input_phrase += str(Copy_phrase_list[i]) + " "

    return clean_input_phrase
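
One likely cause of the "index out of range" error is that the cleanup loop takes its indices from `phrase_list` but uses them on `Copy_phrase_list`, which gets shorter with every removal. After a couple of matches, `i-1` can point past the end of the copy. A minimal sketch of one way around this is to build a new list instead of deleting from a copy, checking the two-word window before the single word. The name `remove_stop_words` and the use of a set are assumptions for illustration, not part of the original code.

```python
STOP_WORDS = {"yo", "so", "well", "um", "a", "the", "you know", "i mean"}


def remove_stop_words(phrase_list, stop_words=STOP_WORDS):
    """Return the phrase as a string without single- or two-word stop words."""
    kept = []
    i = 0
    while i < len(phrase_list):
        # Check the two-word window first so that "you know" is dropped as a unit.
        if (i + 1 < len(phrase_list)
                and phrase_list[i] + " " + phrase_list[i + 1] in stop_words):
            i += 2
        elif phrase_list[i] in stop_words:
            i += 1
        else:
            kept.append(phrase_list[i])
            i += 1
    # Joining avoids the trailing space that string concatenation leaves behind.
    return " ".join(kept)


print(remove_stop_words(["well", "you", "know", "weather", "bad"]))
print(remove_stop_words(["you", "know", "what", "i", "mean", "so",
                         "just", "turn", "lights", "on"]))
```

Because nothing is removed from a list while it is being indexed, the loop index always stays within bounds. Unlike the original function, this version returns the phrase without a trailing space.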