Reading file line counts in bulk with Python: how to get the line count of a large file cheaply in Python?

I need to get a line count of a large file (hundreds of thousands of lines) in Python. What is the most efficient way, both memory- and time-wise?

At the moment I do:

def file_len(fname):
    i = -1  # stays -1 for an empty file, so the function returns 0
    with open(fname) as f:
        for i, l in enumerate(f):
            pass
    return i + 1

Is it possible to do any better?

Solution

I had to post this on a similar question until my reputation score jumped a bit (thanks to whoever bumped me!).

All of the other solutions ignore one way to make this run considerably faster: use the unbuffered (raw) interface, work with bytes rather than text, and do your own buffering. (This only applies in Python 3. In Python 2, the raw interface may or may not be used by default, but in Python 3 files open in text mode unless you ask otherwise, so you get Unicode decoding by default.)

Using a modified version of the timing tool, I believe the following code is faster (and marginally more pythonic) than any of the solutions offered:

def rawcount(filename):
    lines = 0
    buf_size = 1024 * 1024  # read 1 MiB at a time
    with open(filename, 'rb') as f:  # closes the file when done
        read_f = f.raw.read  # bypass the BufferedReader layer
        buf = read_f(buf_size)
        while buf:
            lines += buf.count(b'\n')
            buf = read_f(buf_size)
    return lines

Using a separate generator function, this runs a smidge faster:

def _make_gen(reader):
    # Yield 1 MiB chunks until reader returns an empty bytes object (EOF)
    b = reader(1024 * 1024)
    while b:
        yield b
        b = reader(1024 * 1024)

def rawgencount(filename):
    with open(filename, 'rb') as f:
        f_gen = _make_gen(f.raw.read)
        return sum(buf.count(b'\n') for buf in f_gen)
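
The `_make_gen` helper above is the read-until-empty pattern that Python's built-in two-argument `iter(callable, sentinel)` also expresses. A minimal sketch, assuming a regular file where the raw `read` returns `b''` at end of file; the name `rawitercount` and the `buf_size` parameter are hypothetical, and this variant was not measured in the timings given in this answer:

```python
from functools import partial

def rawitercount(filename, buf_size=1024 * 1024):
    """Count newline bytes, reading raw chunks via iter(callable, sentinel)."""
    with open(filename, 'rb') as f:
        # iter() calls f.raw.read(buf_size) repeatedly and stops
        # as soon as a call returns the sentinel b'' (end of file).
        chunks = iter(partial(f.raw.read, buf_size), b'')
        return sum(chunk.count(b'\n') for chunk in chunks)
```

Whether this is any faster or slower than the generator helper would need to be measured on your own files.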

The same thing can be done entirely in-line with generator expressions and itertools, but it gets pretty weird looking:

from itertools import takewhile, repeat

def rawincount(filename):
    with open(filename, 'rb') as f:
        # Keep reading 1 MiB chunks; takewhile stops at the first empty read (EOF)
        bufgen = takewhile(lambda x: x, (f.raw.read(1024 * 1024) for _ in repeat(None)))
        return sum(buf.count(b'\n') for buf in bufgen)
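
One caveat applies to all three functions above: they count newline bytes, so a file whose last line has no trailing newline comes out one lower than the `enumerate`-based `file_len` from the question. A minimal sketch of one way to account for that, assuming you want the unterminated last line counted; the name `rawcount_with_tail` and the `buf_size` parameter are illustrative and not from the original answer:

```python
def rawcount_with_tail(filename, buf_size=1024 * 1024):
    """Count lines, including a final line that lacks a trailing newline."""
    lines = 0
    last_byte = b'\n'  # an empty file has no partial line to add
    with open(filename, 'rb') as f:
        read_f = f.raw.read  # same raw interface as rawcount
        buf = read_f(buf_size)
        while buf:
            lines += buf.count(b'\n')
            last_byte = buf[-1:]  # remember the final byte seen so far
            buf = read_f(buf_size)
    # If the file did not end with a newline, the last line is still uncounted
    if last_byte != b'\n':
        lines += 1
    return lines
```

The extra bookkeeping is one slice per chunk, so with 1 MiB chunks it should add very little work, though that is not reflected in the timings in this answer.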

Here are my timings:

function      average, s   min, s   ratio
rawincount    0.0043       0.0041   1.00
rawgencount   0.0044       0.0042   1.01
rawcount      0.0048       0.0045   1.09
bufcount      0.008        0.0068   1.64
wccount       0.01         0.0097   2.35
itercount     0.014        0.014    3.41
opcount       0.02         0.02     4.83
kylecount     0.021        0.021    5.05
simplecount   0.022        0.022    5.25
mapcount      0.037        0.031    7.46
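
The timing tool itself is not shown here. As a rough sketch of how a table like this could be produced with the standard `timeit` module, assuming `ratio` means each function's best time divided by the fastest function's best time (the original tool may compute it differently); `time_counters` and its parameters are hypothetical names:

```python
import timeit

def time_counters(counters, filename, repeat=5, number=10):
    """Return (name, average_s, min_s, ratio) rows, fastest first."""
    rows = []
    for func in counters:
        # Each entry of `totals` is the total time for `number` calls
        totals = timeit.repeat(lambda: func(filename), repeat=repeat, number=number)
        per_call = [t / number for t in totals]
        rows.append((func.__name__, sum(per_call) / len(per_call), min(per_call)))
    rows.sort(key=lambda row: row[2])  # order by best per-call time
    fastest = rows[0][2]
    return [(name, avg, best, best / fastest) for name, avg, best in rows]
```

For example, `time_counters([rawcount, rawgencount, rawincount], path)` would give one row per function. Absolute numbers depend heavily on the file, the disk cache, and the machine, so only the relative ordering is worth comparing.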
