Python text newlines: getting newline statistics for a text file in Python

I had a nasty CRLF / LF conflict in a file in git that was probably committed from a Windows machine. Is there a cross-platform way (preferably in Python) to detect which type of newline is dominant in the file? I tried this script:

import sys

if not sys.argv[1:]:
    sys.exit('usage: %s ' % sys.argv[0])

# Read the raw bytes so that no newline translation takes place.
with open(sys.argv[1], "rb") as f:
    d = f.read()

# The data is bytes, so the patterns must be bytes too (required on Python 3).
crlf, lfcr = d.count(b'\r\n'), d.count(b'\n\r')
cr, lf = d.count(b'\r'), d.count(b'\n')

print('crlf: %s' % crlf)
print('lfcr: %s' % lfcr)
print('cr: %s' % cr)
print('lf: %s' % lf)
print('\ncr-crlf-lfcr: %s' % (cr - crlf - lfcr))
print('lf-crlf-lfcr: %s' % (lf - crlf - lfcr))
print('\ntotal (lf+cr-2*crlf-2*lfcr): %s\n' % (lf + cr - 2*crlf - 2*lfcr))

But it reports the wrong stats (for this file):

crlf: 1123
lfcr: 58
cr: 1123
lf: 1123

cr-crlf-lfcr: -58
lf-crlf-lfcr: -58

total (lf+cr-2*crlf-2*lfcr): -116
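
A likely reason for the negative totals, assuming the file is CRLF-terminated throughout: every `\r\n` already contributes one to `cr` and one to `lf`, and `\n\r` also matches across the boundary between two consecutive CRLF endings, which is what an empty line produces. The 58 `lfcr` hits would then be empty lines rather than real LFCR terminators. A small sketch on a made-up sample (the `sample` bytes are hypothetical, not taken from the file in question):

```python
# Hypothetical sample: three CRLF-terminated lines, the second one empty.
sample = b"a\r\n\r\nb\r\n"

crlf = sample.count(b"\r\n")
lfcr = sample.count(b"\n\r")
cr = sample.count(b"\r")
lf = sample.count(b"\n")

# Every CR and LF belongs to a CRLF pair, yet b"\n\r" still matches once,
# across the boundary between the first and the second line ending.
print(crlf, lfcr, cr, lf)    # 3 1 3 3

# Bare CR and bare LF are what remains after removing the CRLF pairs;
# the spurious lfcr count is not subtracted here.
print(cr - crlf, lf - crlf)  # 0 0
```

If the file really contained LFCR terminators, this subtraction would no longer be right, so treat it as an illustration rather than a general fix.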

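Solution

import sys


def calculate_line_endings(path):
    # Order matters: check the two-byte endings before the single-byte ones.
    endings = [
        b'\r\n',
        b'\n\r',
        b'\n',
        b'\r',
    ]
    counts = dict.fromkeys(endings, 0)

    # Binary mode keeps the raw line endings; iteration splits on b'\n' only.
    with open(path, 'rb') as fp:
        for line in fp:
            for x in endings:
                if line.endswith(x):
                    counts[x] += 1
                    break
    print(counts)


if __name__ == '__main__':
    if len(sys.argv) == 2:
        calculate_line_endings(sys.argv[1])
    else:
        # Show the usage message only when the argument is missing.
        sys.exit('usage: %s ' % sys.argv[0])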

It gives this output for your file:

crlf: 1123
lfcr: 0
cr: 0
lf: 0

Is that enough?
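
If reading the whole file into memory is acceptable, another way to get the counts is to match terminators with a regular expression, trying `\r\n` first so that a CRLF pair is never split into a CR and an LF. This is only a sketch under that assumption; `count_newlines` and `dominant_newline` are names made up here, and LFCR is deliberately not treated as its own style (an isolated `\n\r` would be counted as one LF plus one CR).

```python
import re
from collections import Counter

# Longest alternative first, so b"\r\n" is consumed as a single terminator.
NEWLINE_RE = re.compile(rb"\r\n|\r|\n")

# Human-readable names for each terminator.
NAMES = {b"\r\n": "crlf", b"\r": "cr", b"\n": "lf"}


def count_newlines(data):
    """Return a Counter of newline styles found in a bytes object."""
    counts = Counter({name: 0 for name in NAMES.values()})
    for match in NEWLINE_RE.finditer(data):
        counts[NAMES[match.group()]] += 1
    return counts


def dominant_newline(data):
    """Return the name of the most common style, or None if there are no newlines."""
    name, n = count_newlines(data).most_common(1)[0]
    return name if n else None


# Hypothetical mixed sample: three CRLF, one LF, one bare CR.
sample = b"one\r\ntwo\r\n\r\nthree\nfour\r"
print(dict(count_newlines(sample)))  # {'crlf': 3, 'cr': 1, 'lf': 1}
print(dominant_newline(sample))      # crlf
```

For a real file, read it in binary mode first (for example `with open(path, 'rb') as fp: data = fp.read()`) and pass `data` to `count_newlines`.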
