Batch-reading file line counts in Python: how to cheaply get the line count of a large file in Python?

Latest recommended article published on 2022-10-10 17:54:17

深蓝保

Article tags: batch-reading file line counts in Python

I need to get a line count of a large file (hundreds of thousands of lines) in Python. What is the most efficient way, both memory- and time-wise?

At the moment I do:

def file_len(fname):
    with open(fname) as f:
        for i, l in enumerate(f):
            pass
    return i + 1

Is it possible to do any better?

Solution

I had to post this on a similar question until my reputation score jumped a bit (thanks to whoever bumped me!).

All of these solutions ignore one way to make this run considerably faster, namely by using the unbuffered (raw) interface, reading bytes instead of text, and doing your own buffering. (This only applies in Python 3. In Python 2, the raw interface may or may not be used by default, but in Python 3, a file opened in the default text mode is decoded to Unicode as it is read.)
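To make the layering concrete, here is a small sketch of what `open` hands back in Python 3 in text mode and in binary mode, and where the raw interface sits underneath. The temporary file and the variable names are only there for illustration; they are not part of the original answer.

```python
import io
import os
import tempfile

# Create a small throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile('wb', delete=False) as tmp:
    tmp.write(b'one\ntwo\nthree\n')
    path = tmp.name

# Text mode (the default): bytes are decoded to str as they are read.
with open(path) as text_f:
    text_layer = type(text_f)

# Binary mode: a buffered reader that wraps the raw file object.
with open(path, 'rb') as bin_f:
    buffered_layer = type(bin_f)
    raw_layer = type(bin_f.raw)
    chunk = bin_f.raw.read(1024)  # one unbuffered read, returns bytes

os.remove(path)

print(text_layer, buffered_layer, raw_layer)
print(chunk.count(b'\n'))  # newline bytes found in the chunk
```

Reading through `.raw` skips both the decoding step and the buffered layer, which is why you then have to pick a chunk size and loop yourself.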

Using a modified version of the timing tool, I believe the following code is faster (and marginally more pythonic) than any of the solutions offered:

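def rawcount(filename):
    lines = 0
    buf_size = 1024 * 1024  # read in 1 MiB chunks
    # 'rb' skips text decoding; the with block closes the file afterwards
    with open(filename, 'rb') as f:
        read_f = f.raw.read  # unbuffered read from the raw layer
        buf = read_f(buf_size)
        while buf:
            # count newline bytes in the current chunk
            lines += buf.count(b'\n')
            buf = read_f(buf_size)
    return lines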
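The timing tool mentioned above is not shown here, so the following is only a rough stand-in using the standard `timeit` module. It re-declares both functions so it runs on its own, with two small assumptions: `file_len` starts `i` at -1 so an empty file gives 0 instead of raising, and `rawcount` uses a `with` block. The file contents, the 200,000-line size, and the repeat count of 5 are arbitrary choices for illustration.

```python
import os
import tempfile
import timeit


def file_len(fname):
    # Baseline from the question: iterate over decoded text lines.
    i = -1  # assumption: lets an empty file return 0
    with open(fname) as f:
        for i, _ in enumerate(f):
            pass
    return i + 1


def rawcount(filename):
    # Raw-interface version: count newline bytes in 1 MiB chunks.
    lines = 0
    buf_size = 1024 * 1024
    with open(filename, 'rb') as f:
        read_f = f.raw.read
        buf = read_f(buf_size)
        while buf:
            lines += buf.count(b'\n')
            buf = read_f(buf_size)
    return lines


# Hypothetical test data: 200,000 short lines, each ending in a newline.
with tempfile.NamedTemporaryFile('wb', delete=False) as tmp:
    tmp.write(b'some sample text\n' * 200_000)
    path = tmp.name

n_text = file_len(path)
n_raw = rawcount(path)
t_text = timeit.timeit(lambda: file_len(path), number=5)
t_raw = timeit.timeit(lambda: rawcount(path), number=5)
os.remove(path)

print(f'file_len: {n_text} lines, {t_text:.3f}s for 5 runs')
print(f'rawcount: {n_raw} lines, {t_raw:.3f}s for 5 runs')
```

The two counts agree when the file ends with a newline. If the last line has no trailing newline, `rawcount` reports one fewer than `file_len`, because it counts newline bytes rather than lines.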

With the chunked reads moved into a separate generator function, the line count runs a smidge faster: