My project needs the GNU diff command on several operating systems, so I used Python's built-in difflib library to simulate its output format.
Code
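import difflib
import re

# Matches a unified-diff hunk header such as "@@ -5,3 +5,2 @@".
HUNK_RE = re.compile(r'@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@')


def _range(start, length):
    """Convert a unified-diff (start, length) pair to a normal-diff line range."""
    if length <= 1:
        return str(start)
    return '{},{}'.format(start, start + length - 1)


def diff(file1, file2, encoding='utf-8'):
    """
    Simulates the normal-format output of GNU diff.
    You can use `diff(f1, f2)` to approximate `diff -w f1 f2`
    """
    texts = []
    for path in [file1, file2]:
        with open(path, 'r', encoding=encoding) as fp:
            text = fp.read()
        # Ignore whitespace characters (tab, CR, VT, FF).
        # Spaces are kept, so this only approximates `diff -w`.
        for ch in '\t\r\v\f':
            text = text.replace(ch, '')
        texts += [text.split('\n')]
    text1, text2 = texts
    output = []
    need_separator = False
    lines = difflib.unified_diff(text1, text2, fromfile=file1, tofile=file2, n=0, lineterm='')
    for num, line in enumerate(lines):
        if num < 2:  # skip the '---' / '+++' file header lines
            continue
        flag = line[0]
        if flag == '@':  # hunk header: "@@ -start,length +start,length @@"
            start1, len1, start2, len2 = HUNK_RE.match(line).groups()
            len1 = 1 if len1 is None else int(len1)
            len2 = 1 if len2 is None else int(len2)
            # a = lines added, d = lines deleted, c = lines changed
            command = 'a' if len1 == 0 else 'd' if len2 == 0 else 'c'
            output += [_range(int(start1), len1) + command + _range(int(start2), len2)]
            # GNU diff prints '---' only between the two halves of a change
            need_separator = (command == 'c')
            continue
        if flag == '-':  # line unique to sequence 1
            new_flag = '< '
        elif flag == '+':  # line unique to sequence 2
            new_flag = '> '
            if need_separator:
                need_separator = False
                output += ['---']
        elif flag == ' ':  # line common to both sequences (not emitted with n=0)
            continue
        else:
            new_flag = flag
        output += [new_flag + line[1:]]
    return '\n'.join(output)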
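To see what the function has to translate, it can help to look at the raw lines that `difflib.unified_diff` yields with `n=0`. The sketch below uses two in-memory lists instead of files; the names `old`, `new`, `old.txt` and `new.txt` are made up for illustration. Note that a unified hunk header gives each range as start,length, while GNU diff's normal format gives start,end, so the two only look the same when a hunk starts at line 1.

```python
import difflib

# Hypothetical inputs: lines 3-4 of `old` are replaced by one line in `new`.
old = ['a', 'b', 'c', 'd', 'e']
new = ['a', 'b', 'X', 'e']

# n=0 drops context lines; lineterm='' keeps the header lines free of newlines.
raw = list(difflib.unified_diff(old, new, fromfile='old.txt', tofile='new.txt',
                                n=0, lineterm=''))
for line in raw:
    print(line)

# Prints:
# --- old.txt
# +++ new.txt
# @@ -3,2 +3 @@
# -c
# -d
# +X
```

For this input GNU diff would report the hunk as `3,4c3`: the old range `3,2` (start 3, length 2) becomes `3,4`, and a length of 1 is written as the bare line number.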
Demo
Output of GNU diff:
will@Will:/nt/d/1$ diff -w 1.txt 2.txt
1,3c1,2
< 1
< 2
< 3
---
> Hello
> World
Output of the Python simulation:
>>> print(diff('1.txt', '2.txt'))
1,3c1,2
< 1
< 2
< 3
---
> Hello
> World
For this example, the simulated output matches GNU diff exactly.
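The comparison above was done by eye. On a machine where a `diff` executable happens to be installed, the check could be automated along the following lines. This is only a sketch: `gnu_diff` is a hypothetical helper name, and it assumes the `diff` found on PATH is GNU diff (other implementations may default to a different output format).

```python
import shutil
import subprocess


def gnu_diff(file1, file2):
    """Return the output of `diff -w file1 file2`, or None if no `diff` is on PATH."""
    exe = shutil.which('diff')
    if exe is None:
        return None
    # diff exits with status 1 when the files differ, so check=True must not be used.
    proc = subprocess.run([exe, '-w', file1, file2],
                          stdout=subprocess.PIPE, universal_newlines=True)
    if proc.returncode > 1:  # status 2 signals trouble, e.g. a missing file
        raise RuntimeError('diff failed with exit code {}'.format(proc.returncode))
    # Drop the trailing newline so the result is comparable to a '\n'.join(...) string.
    return proc.stdout.rstrip('\n')
```

With this helper, the result of `gnu_diff('1.txt', '2.txt')` can be compared directly against the string returned by the Python `diff` function for the same two files.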