Introduction to difflib
In the shell, we can compare two text files directly with the vimdiff command.
vimdiff a.txt b.txt
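How do we compare files in Python?
We can import difflib, which is part of Python's standard library, so nothing needs to be installed.
- Purpose: compare texts and report their differences, with optional output as a readable HTML document.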
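As a rough Python counterpart to the vimdiff command above, the sketch below reads two files and returns their differences as text. The helper name diff_files is hypothetical, and difflib.unified_diff is just one of several difflib functions that could be used here.

```python
import difflib
from pathlib import Path


def diff_files(path_a, path_b):
    """Return a unified diff of two text files as a single string.

    diff_files is a hypothetical helper written for illustration.
    """
    # keepends=True preserves line endings, so the joined diff prints cleanly
    lines_a = Path(path_a).read_text(encoding="utf-8").splitlines(keepends=True)
    lines_b = Path(path_b).read_text(encoding="utf-8").splitlines(keepends=True)
    diff = difflib.unified_diff(
        lines_a, lines_b, fromfile=str(path_a), tofile=str(path_b)
    )
    return "".join(diff)
```

Calling print(diff_files("a.txt", "b.txt")) prints the differences, assuming both files exist; identical files produce an empty string.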
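Common functions
difflib.Differ()
difflib.Differ().compare(text1, text2)
Takes two sequences of lines and shows their differences in a format similar to the shell's diff command. Each output line starts with a two-character code; the symbols are explained in the table below: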
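Symbol | Meaning |
---|---|
'+ ' | The line appears only in the second sequence |
'- ' | The line appears only in the first sequence |
'  ' | The line is the same in both sequences (the prefix is two spaces, not an empty string) |
'? ' | Hint line marking intraline differences; it appears in neither input sequence |
^ | Inside a '? ' line, marks a character that was changed |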
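The prefix codes in the table can be used to sort the output of compare() into groups. The following is a minimal sketch; the sample lines and variable names are made up for illustration.

```python
import difflib

first = ["apple\n", "banana\n", "cherry\n"]
second = ["apple\n", "cherry\n", "date\n"]

result = list(difflib.Differ().compare(first, second))

# Every output line starts with a two-character code; strip it with line[2:]
removed = [line[2:] for line in result if line.startswith("- ")]
added = [line[2:] for line in result if line.startswith("+ ")]
unchanged = [line[2:] for line in result if line.startswith("  ")]

# Two nearly identical lines also produce '? ' hint lines
near = list(difflib.Differ().compare(["Simple is better\n"], ["Simple is bettor\n"]))
hints = [line for line in near if line.startswith("? ")]

print(removed, added, unchanged)
print("".join(near))
```

The '? ' lines exist only in the output, so they should be skipped when rebuilding either input from the diff.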
Example
import difflib
a = '''
1. Beautiful is better than ugly.
2. Explicit is better than implicit.
3. Simple is better than complex.
4. Complex is better than complicated.
'''.splitlines(keepends=True)  # keep the line endings
b = '''
1. Beautiful is better than ugly.
3. Simple is better than complex.
4. Complicated is better than complex.
5. Flat is better than nested.
'''.splitlines(keepends=True)  # keep them here too, so identical lines compare equal
print(a)
print(b)
# compare() returns a generator of diff lines
diff_content = difflib.Differ().compare(a, b)
print(''.join(diff_content))
Result:
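difflib.HtmlDiff()
The output above is not very convenient to read, so we can use this class instead to see the differences more clearly.
difflib.HtmlDiff() can create a complete HTML file that shows a line-by-line comparison of the texts, with interline and intraline changes highlighted.
difflib.HtmlDiff().make_file(fromlines, tolines)
Compares two lists of lines (not file names) and returns a string containing a complete HTML file. The file contains a table showing the line-by-line differences, with interline and intraline changes highlighted.
difflib.HtmlDiff().make_table(fromlines, tolines)
Compares two lists of lines and returns a string containing a complete HTML table showing the line-by-line differences, with interline and intraline changes highlighted.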
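To make the difference between the two return values concrete, here is a small sketch that calls both methods on the same input. The sample lines are made up; fromdesc and todesc are optional parameters of both methods that set the column headings.

```python
import difflib

old_lines = ["alpha", "beta", "gamma"]
new_lines = ["alpha", "beta2", "gamma"]

differ = difflib.HtmlDiff()

# make_file returns a whole HTML document that wraps the table
page = differ.make_file(old_lines, new_lines, fromdesc="old", todesc="new")

# make_table returns only the <table> markup
table = differ.make_table(old_lines, new_lines, fromdesc="old", todesc="new")

print("<html" in page)
print("<html" in table)
```

Because make_table returns only the table, it is the better fit when the result is embedded in a page that already exists.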
Example
import difflib
a = '''
1. Beautiful is better than ugly.
2. Explicit is better than implicit.
3. Simple is better than complex.
4. Complex is better than complicated.
'''.splitlines(keepends=True)  # keep the line endings
b = '''
1. Beautiful is better than ugly.
3. Simple is better than complex.
4. Complicated is better than complex.
5. Flat is better than nested.
'''.splitlines(keepends=True)  # keep them here too, so identical lines compare equal
html_content = difflib.HtmlDiff().make_file(a, b)
# print(html_content)
# make_file declares a utf-8 charset by default, so write the file as utf-8
with open('diff.html', 'w', encoding='utf-8') as f:
    f.write(html_content)
Result:
import difflib
a = '''
1. Beautiful is better than ugly.
2. Explicit is better than implicit.
3. Simple is better than complex.
4. Complex is better than complicated.
'''.splitlines(keepends=True)  # keep the line endings
b = '''
1. Beautiful is better than ugly.
3. Simple is better than complex.
4. Complicated is better than complex.
5. Flat is better than nested.
'''.splitlines(keepends=True)  # keep them here too, so identical lines compare equal
# make_table returns only the table markup, not a full HTML page
html_content = difflib.HtmlDiff().make_table(a, b)
# print(html_content)
with open('diff.html', 'w', encoding='utf-8') as f:
    f.write(html_content)
Result:
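By comparison, difflib.HtmlDiff().make_file() is the more convenient of the two: its output is a complete page that can be opened directly in a browser, whereas make_table() returns only the table, without the page wrapper and styles.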
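Putting the pieces together, the sketch below builds an HTML report from two files on disk. The helper name write_html_report and its context_lines argument are hypothetical; context and numlines are optional make_file parameters that limit the report to changed lines plus a few lines around them.

```python
import difflib
from pathlib import Path


def write_html_report(path_a, path_b, out_path, context_lines=3):
    """Write an HTML diff of two text files to out_path and return out_path.

    write_html_report is a hypothetical helper written for illustration.
    """
    lines_a = Path(path_a).read_text(encoding="utf-8").splitlines()
    lines_b = Path(path_b).read_text(encoding="utf-8").splitlines()
    html = difflib.HtmlDiff().make_file(
        lines_a,
        lines_b,
        fromdesc=str(path_a),    # heading of the left column
        todesc=str(path_b),      # heading of the right column
        context=True,            # show only changes and nearby lines
        numlines=context_lines,  # number of surrounding lines to keep
    )
    # make_file declares utf-8 by default, so the file is written as utf-8
    Path(out_path).write_text(html, encoding="utf-8")
    return out_path
```

For example, write_html_report("a.txt", "b.txt", "diff.html") would produce a page like the one in the make_file example, assuming both input files exist.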