SequenceMatcher will do the job for you. Tune the ratio threshold (0.5 in the code below) to get better results.
Try this:
from difflib import SequenceMatcher

sentence_list = ["I love cat", "I love dog", "I love fish", "I hate banana", "I hate apple", "I hate orange"]
result = []
for sentence in sentence_list:
    for group in result:
        # Compare against the first sentence of each existing group.
        score = SequenceMatcher(None, sentence, group[0]).ratio()
        if score >= 0.5:
            # Similar enough: join this group, skipping exact duplicates.
            if score != 1:
                group.append(sentence)
            break
    else:
        # No group matched (or result is still empty): start a new group.
        result.append([sentence])
print(result)

Output:
[['I love cat', 'I love dog', 'I love fish'], ['I hate banana', 'I hate apple', 'I hate orange']]
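
To pick a sensible threshold, it can help to look at the ratios themselves before grouping. The following is only a rough sketch on a subset of the sentences above; `pairwise_ratios` is a hypothetical helper name of my own, not something difflib provides.

```python
from difflib import SequenceMatcher
from itertools import combinations

def pairwise_ratios(sentences):
    """Return {(a, b): ratio} for every unordered pair of sentences."""
    return {
        (a, b): SequenceMatcher(None, a, b).ratio()
        for a, b in combinations(sentences, 2)
    }

sentences = ["I love cat", "I love dog", "I hate banana", "I hate apple"]
ratios = pairwise_ratios(sentences)

# Print the most similar pairs first.
for (a, b), r in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{r:.2f}  {a!r} vs {b!r}")
```

For this sample data, pairs that share the same verb should come out above 0.5 and mixed pairs below it, which is why 0.5 separates the two groups. With your own sentences the gap may sit elsewhere, so move the threshold accordingly.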