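I. What is difflib
1. What is difflib, and what is it used for?
difflib is a module in the Python standard library, so no installation is needed.
It compares sequences of text and reports the differences between them.
It can also render the comparison as a readable HTML document, and its text output resembles that of the Linux diff command.
This makes it useful in version control and similar workflows.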
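As a rough illustration of the similarity to diff, the sketch below uses difflib.unified_diff, which produces output in the style of `diff -u`. The sample lines and the labels old.txt and new.txt are placeholders made up for this example, not real files.

```python
import difflib

# Illustrative inputs; each line keeps its trailing newline.
old = ["apple\n", "banana\n", "cherry\n"]
new = ["apple\n", "blueberry\n", "cherry\n"]

# unified_diff yields two header lines, hunk markers, and lines
# prefixed with ' ', '-' or '+'. The file labels are hypothetical.
diff_lines = list(
    difflib.unified_diff(old, new, fromfile="old.txt", tofile="new.txt")
)
print("".join(diff_lines), end="")
```

Because every input line ends with a newline, the generated lines can be concatenated directly.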
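2. Reading the symbols
Each line produced by Differ starts with a two-character code:
Code    Meaning
'- '    The line appears only in the first sequence.
'+ '    The line appears only in the second sequence.
'  '    The line is identical in both sequences.
'? '    A guide line that is in neither sequence; it marks intraline differences in the line above it.
Within a '? ' guide line, '^' marks a character that was replaced, '-' a character that was deleted, and '+' a character that was inserted.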
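To see these codes in practice, here is a minimal sketch that runs Differ on two short lists and tallies the two-character prefix of every output line. The sample sentences are chosen only for illustration; the second list changes one character in one line so that a '? ' guide line is produced.

```python
import difflib
from collections import Counter

a = [
    "Beautiful is better than ugly.",
    "Explicit is better than implicit.",
    "Simple is better than complex.",
]
b = [
    "Beautiful is better than ugly.",
    "Simple is better than complex!",
    "Flat is better than nested.",
]

delta = list(difflib.Differ().compare(a, b))

# Every output line starts with one of '- ', '+ ', '  ' or '? '.
prefix_counts = Counter(line[:2] for line in delta)

for line in delta:
    # '? ' guide lines end with a newline of their own, so strip it.
    print(line.rstrip("\n"))
print(dict(prefix_counts))
```

The '? ' lines appear only next to lines that Differ considers similar enough to be an edited version of each other; unrelated lines are reported as a plain deletion plus a plain insertion.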
II. Examples
(1) Comparing two pieces of text (using the built-in module)
import difflib
text1 = """
1. Beautiful is better than ugly.
2. Explicit is better than implicit.
3. Simple is better than complex.
4. Complex is better than complicated.
""".splitlines(keepends=False)
# splitlines() splits the string at line boundaries and
# returns a list with one element per line.
# keepends=False drops the line endings (the default);
# keepends=True keeps them.
print(text1)
text2 = """
1. Beautiful is better than ugly.
3. Simple is better than complex.
4. Complicated is better than complex.
5. Flat is better than nested.
""".splitlines(keepends=False)
print(text2)
# Produce a diff similar to the Linux diff command
d = difflib.Differ()
# The input lines carry no line endings, so join the output with '\n';
# the '?' guide lines already end in '\n', so strip that first.
print('\n'.join(line.rstrip('\n') for line in d.compare(text1, text2)))
Result:
['', ' 1. Beautiful is better than ugly.', ' 2. Explicit is better than implicit.', ' 3. Simple is better than complex.', ' 4. Complex is better than complicated.']
['', '1. Beautiful is better than ugly.', ' 3. Simple is better than complex.', ' 4. Complicated is better than complex.', ' 5. Flat is better than nested.']
- 1. Beautiful is better than ugly.
? -
+ 1. Beautiful is better than ugly.
- 2. Explicit is better than implicit.
- 3. Simple is better than complex.
+ 3. Simple is better than complex.
? ++
- 4. Complex is better than complicated.
? ^ ---- ^
+ 4. Complicated is better than complex.
? ++++ ^ ^
+ 5. Flat is better than nested.
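The splitlines() calls above drop the line endings, and that choice determines how the compare() output has to be joined. The following sketch, using two made-up strings, shows the difference between the two keepends settings and one way to concatenate the result when line endings are kept.

```python
import difflib

old_text = "alpha\nbeta\n"
new_text = "alpha\ngamma\n"

# keepends=False (the default) strips the line endings.
without_endings = old_text.splitlines()
# keepends=True preserves them.
with_endings = old_text.splitlines(keepends=True)
print(without_endings)
print(with_endings)

# When the inputs keep their line endings, the Differ output lines
# also end in '\n' and can be concatenated with an empty separator.
report = "".join(
    difflib.Differ().compare(
        old_text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
    )
)
print(report, end="")
```

Keeping the line endings is the simpler option when the diff is printed or written to a file; dropping them suits HtmlDiff, which lays out each line in its own table cell.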
d = difflib.HtmlDiff()
htmlContent = d.make_file(text1,text2)
print(htmlContent)
Result:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
<head>
<meta http-equiv="Content-Type"
content="text/html; charset=utf-8" />
<title></title>
<style type="text/css">
table.diff {font-family:Courier; border:medium;}
.diff_header {background-color:#e0e0e0}
td.diff_header {text-align:right}
.diff_next {background-color:#c0c0c0}
.diff_add {background-color:#aaffaa}
.diff_chg {background-color:#ffff77}
.diff_sub {background-color:#ffaaaa}
</style>
</head>
<body>
<table class="diff" id="difflib_chg_to0__top"
cellspacing="0" cellpadding="0" rules="groups" >
<colgroup></colgroup> <colgroup></colgroup> <colgroup></colgroup>
<colgroup></colgroup> <colgroup></colgroup> <colgroup></colgroup>
<tbody>
<tr><td class="diff_next" id="difflib_chg_to0__0"><a href="#difflib_chg_to0__0">f</a></td><td class="diff_header" id="from0_1">1</td><td nowrap="nowrap"></td><td class="diff_next"><a href="#difflib_chg_to0__0">f</a></td><td class="diff_header" id="to0_1">1</td><td nowrap="nowrap"></td></tr>
<tr><td class="diff_next"><a href="#difflib_chg_to0__top">t</a></td><td class="diff_header" id="from0_2">2</td><td nowrap="nowrap"><span class="diff_sub"> </span>1. Beautiful is better than ugly.</td><td class="diff_next"><a href="#difflib_chg_to0__top">t</a></td><td class="diff_header" id="to0_2">2</td><td nowrap="nowrap">1. Beautiful is better than ugly.</td></tr>
<tr><td class="diff_next"></td><td class="diff_header" id="from0_3">3</td><td nowrap="nowrap"><span class="diff_sub"> 2. Explicit is better than implicit.</span></td><td class="diff_next"></td><td class="diff_header"></td><td nowrap="nowrap"></td></tr>
<tr><td class="diff_next"></td><td class="diff_header" id="from0_4">4</td><td nowrap="nowrap"> 3. Simple is better than complex.</td><td class="diff_next"></td><td class="diff_header" id="to0_3">3</td><td nowrap="nowrap"> 3.<span class="diff_add"> </span> Simple is better than complex.</td></tr>
<tr><td class="diff_next"></td><td class="diff_header" id="from0_5">5</td><td nowrap="nowrap"><span class="diff_sub"> 4. Complex is better than complicated.</span></td><td class="diff_next"></td><td class="diff_header" id="to0_4">4</td><td nowrap="nowrap"><span class="diff_add"> 4. Complicated is better than complex.</span></td></tr>
<tr><td class="diff_next"></td><td class="diff_header"></td><td nowrap="nowrap"></td><td class="diff_next"></td><td class="diff_header" id="to0_5">5</td><td nowrap="nowrap"><span class="diff_add"> 5. Flat is better than nested.</span></td></tr>
</tbody>
</table>
<table class="diff" summary="Legends">
<tr> <th colspan="2"> Legends </th> </tr>
<tr> <td> <table border="" summary="Colors">
<tr><th> Colors </th> </tr>
<tr><td class="diff_add"> Added </td></tr>
<tr><td class="diff_chg">Changed</td> </tr>
<tr><td class="diff_sub">Deleted</td> </tr>
</table></td>
<td> <table border="" summary="Links">
<tr><th colspan="2"> Links </th> </tr>
<tr><td>(f)irst change</td> </tr>
<tr><td>(n)ext change</td> </tr>
<tr><td>(t)op</td> </tr>
</table></td> </tr>
</table>
</body>
</html>
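htmlContent = d.make_file(text1, text2)
# Write the generated report to a file; make_file() declares
# charset=utf-8 in the page, so use the same encoding here.
with open('diff.html', 'w', encoding='utf-8') as f:
    f.write(htmlContent)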
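make_file() also accepts optional arguments that label the two columns and restrict the report to changed regions. The sketch below shows them together with make_table(), which returns only the comparison table rather than a full page. The sample lines and the labels before.txt and after.txt are assumptions made for this example.

```python
import difflib

before = ["line one", "line two", "line three", "line four"]
after = ["line one", "line 2", "line three", "line four"]

html_diff = difflib.HtmlDiff()

# fromdesc/todesc set the column headings (hypothetical labels here).
# context=True limits the output to changed lines plus `numlines`
# lines of surrounding context.
report = html_diff.make_file(
    before,
    after,
    fromdesc="before.txt",
    todesc="after.txt",
    context=True,
    numlines=1,
)

# make_table() produces just the <table> element, which can be
# embedded in an existing page.
table_only = html_diff.make_table(
    before, after, fromdesc="before.txt", todesc="after.txt"
)

print(len(report), len(table_only))
```

The full page from make_file() carries its own stylesheet and legend, so it is the better choice for a standalone report; make_table() needs the surrounding page to supply the CSS classes.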
(2) Comparing files on Linux
Compare /mnt/passwd with /mnt/passwd1 (a modified copy of /mnt/passwd).
import difflib
filename1 = '/mnt/passwd'
filename2 = '/mnt/passwd1'
with open(filename1) as f1, open(filename2) as f2:
    content1 = f1.read().splitlines(keepends=True)
    content2 = f2.read().splitlines(keepends=True)
d = difflib.HtmlDiff()
htmlContent = d.make_file(content1, content2)
# Write the result to passwdDiff.html
with open('passwdDiff.html', 'w', encoding='utf-8') as f:
    f.write(htmlContent)
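The same steps can be wrapped in a small reusable function. The sketch below is one possible arrangement, not the only one: write_html_diff is a hypothetical helper name, and the example runs against two temporary files with made-up passwd-style content so that it does not depend on /mnt/passwd existing.

```python
import difflib
import tempfile
from pathlib import Path


def write_html_diff(path_a, path_b, out_path):
    """Hypothetical helper: compare two text files, save an HTML report."""
    lines_a = Path(path_a).read_text(encoding="utf-8").splitlines()
    lines_b = Path(path_b).read_text(encoding="utf-8").splitlines()
    html = difflib.HtmlDiff().make_file(
        lines_a,
        lines_b,
        fromdesc=Path(path_a).name,
        todesc=Path(path_b).name,
    )
    Path(out_path).write_text(html, encoding="utf-8")
    return html


# Demonstration on temporary files with illustrative content.
with tempfile.TemporaryDirectory() as tmp:
    file_a = Path(tmp) / "passwd"
    file_b = Path(tmp) / "passwd1"
    file_a.write_text(
        "root:x:0:0:root:/root:/bin/bash\nbin:x:1:1:bin:/bin:/sbin/nologin\n",
        encoding="utf-8",
    )
    file_b.write_text(
        "root:x:0:0:root:/root:/bin/zsh\nbin:x:1:1:bin:/bin:/sbin/nologin\n",
        encoding="utf-8",
    )
    out_file = Path(tmp) / "passwdDiff.html"
    html = write_html_diff(file_a, file_b, out_file)
    report_exists = out_file.exists()
    print(report_exists, len(html))
```

Reading and writing with an explicit encoding keeps the saved file consistent with the charset that make_file() declares in the page header.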