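Comparing every item in a list with every item in another (or the same) list is known in mathematics as the Cartesian product. Python's standard library has a function for exactly this task: itertools.product. It is equivalent to nested for loops.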
Suppose A and B are lists:

for x in A:
    for y in B:
        print(x, y)
Or, more concisely:

from itertools import product
for pair in product(A, B):
    print(pair)
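As a quick check of that equivalence, the sketch below builds the pairs both ways and compares them. The contents of A and B are made-up sample values, not taken from the question.

```python
from itertools import product

# Hypothetical sample lists, chosen only for illustration.
A = [1, 2]
B = ["a", "b"]

# Pairs collected with explicit nested loops.
nested = []
for x in A:
    for y in B:
        nested.append((x, y))

# The same pairs produced by itertools.product, in the same order.
pairs = list(product(A, B))

print(pairs)
print(pairs == nested)
```

The last item of each pair varies fastest, just as the inner loop does.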
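In your case you are comparing all the items of a list with the same list, so you could write product(texts, texts). For this situation, however, product has an optional keyword argument, repeat: product(A, repeat=4) means the same as product(A, A, A, A).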
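A minimal sketch of the repeat argument, using a made-up two-item list, shows that both spellings give the same pairs:

```python
from itertools import product

# Hypothetical sample list, chosen only for illustration.
A = ["x", "y"]

# Passing the list twice and using repeat=2 are two spellings of the same call.
same_list_twice = list(product(A, A))
with_repeat = list(product(A, repeat=2))

print(with_repeat)
print(same_list_twice == with_repeat)
```

Note that the result contains each item paired with itself, and both orderings of every other pair.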
You can now rewrite your code like this:

from itertools import product
caesar = """BOOK I
I.--All Gaul is divided into three parts, one of which the Belgae
inhabit, the Aquitani another, those who in their own language are
called Celts, in ours Gauls, the third. All these differ from each other
in language, customs and laws."""
hamlet = """Who's there?"
"Nay, answer me. Stand and unfold yourself."
"Long live the King!"
"Barnardo!"
"He." (I.i.1-5)"""
macbeth = """ACT I SCENE I A desert place. Thunder and lightning.
[Thunder and lightning. Enter three Witches]
First Witch When shall we three meet again
In thunder, lightning, or in rain?
Second Witch When the hurlyburly's done,
When the battle's lost and won."""
texts = [caesar, hamlet, macbeth]
def similarity(x, y):
    """Placeholder similarity based on the length of the text.

    Substitute a similarity function from the Natural Language Toolkit.
    """
    return float(len(x)) / len(y)

# Compare every text with every text, including itself.
for pair in product(texts, repeat=2):
    print("{}".format(similarity(*pair)))
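The loop above prints nine bare numbers, which makes it hard to tell which pair each score belongs to. One possible variation, sketched here, is to run product over (name, text) pairs so every score can be labeled. The named_texts dictionary and its shortened strings are stand-ins invented for this example, and similarity is the same length-ratio placeholder.

```python
from itertools import product

# Hypothetical short stand-ins for the full texts.
named_texts = {
    "caesar": "All Gaul is divided into three parts",
    "hamlet": "Who's there?",
    "macbeth": "When shall we three meet again",
}

def similarity(x, y):
    """Placeholder length-ratio measure; swap in a real similarity function."""
    return len(x) / len(y)

# product over the (name, text) items yields every ordered pair,
# including each text paired with itself.
scores = {}
for (name_x, text_x), (name_y, text_y) in product(named_texts.items(), repeat=2):
    scores[(name_x, name_y)] = similarity(text_x, text_y)

for (name_x, name_y), score in scores.items():
    print("{} vs {}: {:.2f}".format(name_x, name_y, score))
```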