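I'm new to Python programming. I'm trying to work with a CSV file that has two columns of string values, and I'd like to compute the similarity ratio between the strings in the two columns. I then want to take those values and write the ratios to another file.
The CSV might look like this: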
Column 1|Column 2
tomato|tomatoe
potato|potatao
apple|appel
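I want the output file to show, for each row, how similar the string in Column 1 is to the string in Column 2. I'm using difflib to produce the ratio score.
Here is my code so far: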
import csv
import difflib

f = open('test.csv')
csf_f = csv.reader(f)
row_a = []
row_b = []
for row in csf_f:
    row_a.append(row[0])
    row_b.append(row[1])
a = row_a
b = row_b

def similar(a, b):
    return difflib.SequenceMatcher(a, b).ratio()

match_ratio = similar(a, b)
match_list = []
for row in match_ratio:
    match_list.append(row)

with open("output.csv", "wb") as f:
    writer = csv.writer(f, delimiter=',')
    writer.writerows(match_list)
f.close()
I get this error:
Traceback (most recent call last):
  File "comparison.py", line 24, in <module>
    for row in match_ratio:
TypeError: 'float' object is not iterable
I suspect I'm not importing the column lists correctly and not running them through the SequenceMatcher function properly.
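
For reference, here is a minimal sketch of one way this could be made to work. It is not a definitive solution. Two things seem to cause the error: the first positional parameter of `difflib.SequenceMatcher` is `isjunk`, so the two strings need to be passed after a leading `None`, and `ratio()` returns a single float, so it has to be called once per row rather than once on the whole column lists. The sketch assumes Python 3 (hence `newline=""` instead of the `"wb"` mode), a comma-delimited file with a header row, and a hypothetical helper name `write_similarity_ratios` that is not part of any library.

```python
import csv
import difflib


def similar(a, b):
    # SequenceMatcher's first positional parameter is `isjunk`,
    # so the two strings go in as the second and third arguments.
    return difflib.SequenceMatcher(None, a, b).ratio()


def write_similarity_ratios(input_path, output_path, has_header=True):
    # Hypothetical helper (name and signature are assumptions):
    # copies both columns to the output and appends their ratio.
    with open(input_path, newline="") as src, \
            open(output_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        if has_header:
            header = next(reader, None)
            if header is not None:
                writer.writerow(header[:2] + ["ratio"])
        for row in reader:
            if len(row) < 2:
                continue  # skip blank or malformed lines
            # One ratio per row: compare the two strings on this row only.
            writer.writerow([row[0], row[1], similar(row[0], row[1])])
```

Calling `write_similarity_ratios("test.csv", "output.csv")` should then write one line per input row, with the two strings followed by their ratio. If the file really is pipe-delimited, passing `delimiter="|"` to `csv.reader` would be the adjustment to try.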