You can see which blocks it considers to be matches:

>>> difflib.SequenceMatcher(isjunk=lambda x: x == " ", a="a b c", b="a bc").get_matching_blocks()
[Match(a=0, b=0, size=3), Match(a=4, b=3, size=1), Match(a=5, b=4, size=0)]
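
The first two tell you that it matches "a b" to "a b" and "c" to "c". (The last one is trivial: a zero-length block that always terminates the list.)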
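
To make the index triples easier to read, here is a minimal sketch that slices both strings by each non-empty block. The helper name `matched_substrings` is made up for this illustration and is not part of difflib.

```python
import difflib

a, b = "a b c", "a bc"
sm = difflib.SequenceMatcher(isjunk=lambda x: x == " ", a=a, b=b)

def matched_substrings(matcher, a, b):
    """Return (slice of a, slice of b) for every non-empty matching block.

    Hypothetical helper for illustration only.
    """
    return [
        (a[m.a:m.a + m.size], b[m.b:m.b + m.size])
        for m in matcher.get_matching_blocks()
        if m.size  # skip the zero-length terminator block
    ]

print(matched_substrings(sm, a, b))
# [('a b', 'a b'), ('c', 'c')]
```

Note that the first block contains a space even though the space is declared junk.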
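
The question is why "a b" can be matched at all. I found the answer in the source code. First, the algorithm finds a set of matching blocks by repeatedly calling find_longest_match. What is notable about find_longest_match is that it allows junk characters at the ends of a block, as its docstring says:

If isjunk is defined, first the longest matching block is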
determined as above, but with the additional restriction that no
junk element appears in the block. Then that block is extended as
far as possible by matching (only) junk elements on both sides. So
the resulting block never matches on junk except as identical junk
happens to be adjacent to an "interesting" match.
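
This means it first finds "a " and "b" as separate matches: the non-junk "a" is extended over the identical junk space that follows it, and "b" is then matched on its own in the remainder of the strings.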
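
You can watch this happen by calling find_longest_match by hand. This is only a sketch of the first two steps; get_matching_blocks drives the search itself, and the variable names `first` and `rest` are just for illustration.

```python
import difflib

a, b = "a b c", "a bc"
sm = difflib.SequenceMatcher(isjunk=lambda x: x == " ", a=a, b=b)

# Longest match over the full strings: the non-junk "a",
# extended over the identical junk space right after it.
first = sm.find_longest_match(0, len(a), 0, len(b))
print(first, repr(a[first.a:first.a + first.size]))
# Match(a=0, b=0, size=2) 'a '

# Search again in what lies to the right of that block.
rest = sm.find_longest_match(first.a + first.size, len(a),
                             first.b + first.size, len(b))
print(rest, repr(a[rest.a:rest.a + rest.size]))
# Match(a=2, b=2, size=1) 'b'
```

The two blocks end and start at the same index in both strings, which is what makes the next step possible.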
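
Then comes the interesting part: the code makes one final pass to check whether any blocks are adjacent, and merges them if they are. See this comment in the code:

# It's possible that we have adjacent equal blocks in the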
# matching_blocks list now. Starting with 2.5, this code was added
# to collapse them.
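
So basically it matches "a " and "b", then merges those two adjacent blocks into "a b" and calls that a match, even though the space character is junk.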
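
The merging step can be sketched roughly as follows. This is a simplified re-implementation written for this answer, not the library's code verbatim, and `collapse_adjacent` is a made-up name. The input is the list of blocks as they would look before merging, based on the matches described above.

```python
def collapse_adjacent(blocks):
    """Merge (i, j, size) blocks that touch in both sequences.

    Illustrative sketch only; assumes blocks are sorted by position.
    """
    merged = []
    i1 = j1 = k1 = 0  # the block currently being accumulated
    for i2, j2, k2 in blocks:
        if i1 + k1 == i2 and j1 + k1 == j2:
            # The next block starts exactly where the current one ends
            # in both strings, so absorb it.
            k1 += k2
        else:
            if k1:
                merged.append((i1, j1, k1))
            i1, j1, k1 = i2, j2, k2
    if k1:
        merged.append((i1, j1, k1))
    return merged

# "a " at (0, 0), "b" at (2, 2), "c" at (4, 3)
print(collapse_adjacent([(0, 0, 2), (2, 2, 1), (4, 3, 1)]))
# [(0, 0, 3), (4, 3, 1)]
```

The "c" block is not merged because it is adjacent in b (index 3) but not in a (index 4, after the extra space).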