Ignoring spaces when comparing strings in Python

Latest recommended article published on 2025-01-14 19:45:04

weixin_39630126

Tags: printing two strings in Python with no space between them

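This article explains how the isjunk parameter of the SequenceMatcher class in Python's difflib library is used and how it works. Examples show how isjunk affects the way characters are matched, and an alternative approach is provided for ignoring specific characters altogether.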

I am using Python's difflib package. Whether or not I set the isjunk argument, the calculated ratios are the same. Shouldn't differences in spaces be ignored when isjunk is lambda x: x == " "?

In [193]: difflib.SequenceMatcher(isjunk=lambda x: x == " ", a="a b c", b="a bc").ratio()

Out[193]: 0.8888888888888888

In [194]: difflib.SequenceMatcher(a="a b c", b="a bc").ratio()

Out[194]: 0.8888888888888888

Solution

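isjunk works a little differently than you might think. It does not remove characters from the comparison. It only marks characters in the second string that a match may not be anchored on; such characters can still be absorbed into a match that sits next to them, and they are still included in the total character count that ratio() divides by. For example, consider the following: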

>>> from difflib import SequenceMatcher

>>> SequenceMatcher(lambda x: x in "abcd", " abcd", "abcd abcd").ratio()

0.7142857142857143

The first four characters of the second string ("abcd") are all ignorable, so the second string can be compared to the first string beginning with the space. Starting with the space in both the first string and the second string, then, the above SequenceMatcher finds ten matching characters (five in each string) and four non-matching characters (the ignorable first four characters in the second string). Both strings together contain 14 characters, so this gives you a ratio of 10/14 (0.7142857142857143).
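For contrast, here is a case where isjunk does change the result. It is a small sketch based on the example in the difflib documentation, which marks the space (rather than the letters) as junk; the variable names are chosen only for illustration.

```python
from difflib import SequenceMatcher

a, b = " abcd", "abcd abcd"

# The space is junk in `b`, so it cannot seed a match; the longest
# junk-free match is then the first "abcd" in `b`.
with_junk = SequenceMatcher(lambda x: x == " ", a, b)
# No junk function (the autojunk heuristic only applies to 200+ items).
without_junk = SequenceMatcher(None, a, b)

print(with_junk.find_longest_match(0, len(a), 0, len(b)))     # Match(a=1, b=0, size=4)
print(without_junk.find_longest_match(0, len(a), 0, len(b)))  # Match(a=0, b=4, size=5)

print(with_junk.ratio())     # 2*4/14: one fewer character is matched
print(without_junk.ratio())  # 2*5/14
```

Marking the space as junk here lowers the ratio instead of making the comparison more lenient, which is the opposite of "ignoring spaces".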

In your case, then, the first string "a b c" matches the second string at indices 0, 1, and 2 (with values "a b"). Index 3 of the first string (" ") has no counterpart in the second string, so it is skipped, and index 4 ("c") matches index 3 of the second string. Four characters match in each string, and ratio() returns twice the number of matches divided by the combined length, so 8 of your 9 characters match, giving you a ratio of 0.8888888888888888. The unmatched space still counts toward the total, which is why the ratio is the same with and without isjunk.
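The arithmetic above can be checked by inspecting the matching blocks directly. This is a minimal sketch; the helper name describe is hypothetical and only wraps standard SequenceMatcher calls.

```python
from difflib import SequenceMatcher

a, b = "a b c", "a bc"

def describe(matcher):
    """Return (blocks, matched, total, ratio) for a SequenceMatcher."""
    # Each block is (index in a, index in b, size); the last one is a
    # zero-length sentinel that difflib always appends.
    blocks = [tuple(m) for m in matcher.get_matching_blocks()]
    matched = sum(size for _, _, size in blocks)
    total = len(matcher.a) + len(matcher.b)
    return blocks, matched, total, 2.0 * matched / total

with_junk = describe(SequenceMatcher(lambda x: x == " ", a, b))
without_junk = describe(SequenceMatcher(None, a, b))

print(with_junk)     # blocks (0, 0, 3) and (4, 3, 1), 4 matched of 9 total
print(without_junk)  # identical blocks, hence the identical ratio
```

Both matchers report the same blocks, "a b" and "c", so the space at index 3 of the first string is the only unmatched character in either case.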

If you want spaces to be ignored entirely, remove them from both strings before comparing:

>>> import difflib

>>> a, b = "a b c", "a bc"

>>> c = a.replace(' ', '')

>>> d = b.replace(' ', '')

>>> difflib.SequenceMatcher(a=c, b=d).ratio()

1.0
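The same idea can be wrapped in a small helper. The sketch below is one possible variant, not part of difflib: the name ratio_ignoring_whitespace is hypothetical, and it assumes you want to drop every kind of whitespace (tabs and newlines too), not only the space character.

```python
from difflib import SequenceMatcher

def ratio_ignoring_whitespace(left: str, right: str) -> float:
    """Similarity ratio after removing all whitespace (hypothetical helper)."""
    # str.split() with no arguments splits on any run of whitespace,
    # so joining the pieces drops spaces, tabs, and newlines alike.
    left_compact = "".join(left.split())
    right_compact = "".join(right.split())
    return SequenceMatcher(None, left_compact, right_compact).ratio()

print(ratio_ignoring_whitespace("a b c", "a bc"))    # 1.0
print(ratio_ignoring_whitespace("a\tb c", "a b d"))  # below 1.0: "abc" vs "abd"
```

Use str.replace(' ', '') as in the snippet above instead if only literal spaces should be ignored.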