python读取大文件

最新推荐文章于 2024-04-11 21:49:50 发布

半九拾

最新推荐文章于 2024-04-11 21:49:50 发布

阅读量240

点赞数

分类专栏：大数据

本文链接：https://blog.csdn.net/b285795298/article/details/109197607

版权

大数据专栏收录该内容

5 篇文章 0 订阅

订阅专栏

read(size=-1)

Read up to size bytes from the object and return them. As a convenience, if size is unspecified or -1, all bytes until EOF are returned. Otherwise, only one system call is ever made. Fewer than size bytes may be returned if the operating system call returns fewer than size bytes.

If 0 bytes are returned, and size was not 0, this indicates end of file. If the object is in non-blocking mode and no bytes are available, None is returned.

readline(size=-1)

Read and return one line from the stream. If size is specified, at most size bytes will be read.

The line terminator is always b'\n' for binary files; for text files, the newline argument to open() can be used to select the line terminator(s) recognized.

readlines(hint=-1)

Read and return a list of lines from the stream. hint can be specified to control the number of lines read: no more lines will be read if the total size (in bytes/characters) of all lines so far exceeds hint.

Note that it’s already possible to iterate on file objects using for line in file: ... without calling file.readlines().

处理7GB的文本文件：

# File: readline-example-3.py
 
file = open("sample.txt")
 
while 1:
    lines = file.readlines(100000)
    if not lines:
        break
    for line in lines:
        pass # do something**strong text**

为了每秒处理96,900行文本。其他作者建议使用islice（）

from itertools import islice
 
with open(...) as f:
    while True:
        next_n_lines = list(islice(f, n))
        if not next_n_lines:
            break
        # process next_n_lines

list(islice(f, n))将返回n文件的下一行列表f。在循环中使用它将为您提供大量n行的文件

itertools.islice(iterable, start, stop[, step])

Make an iterator that returns selected elements from the iterable. If start is non-zero, then elements from the iterable are skipped until start is reached. Afterward, elements are returned consecutively unless step is set higher than one which results in items being skipped. If stop is None, then iteration continues until the iterator is exhausted, if at all; otherwise, it stops at the specified position. Unlike regular slicing, islice() does not support negative values for start, stop, or step. Can be used to extract related fields from data where the internal structure has been flattened (for example, a multi-line report may list a name field on every third line). Roughly equivalent to:

def islice(iterable, *args):
    # islice('ABCDEFG', 2) --> A B
    # islice('ABCDEFG', 2, 4) --> C D
    # islice('ABCDEFG', 2, None) --> C D E F G
    # islice('ABCDEFG', 0, None, 2) --> A C E G
    s = slice(*args)
    start, stop, step = s.start or 0, s.stop or sys.maxsize, s.step or 1
    it = iter(range(start, stop, step))
    try:
        nexti = next(it)
    except StopIteration:
        # Consume *iterable* up to the *start* position.
        for i, element in zip(range(start), iterable):
            pass
        return
    try:
        for i, element in enumerate(iterable):
            if i == nexti:
                yield element
                nexti = next(it)
    except StopIteration:
        # Consume to *stop*.
        for i, element in zip(range(i + 1, stop), iterable):
            pass

If start is None, then iteration starts at zero. If step is None, then the step defaults to one.

半九拾

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
打赏
1
评论
python读取大文件

read(size=-1)Read up tosizebytes from the object and return them. As a convenience, ifsizeis unspecified or -1, all bytes until EOF are returned. Otherwise, only one system call is ever made. Fewer thansizebytes may be returned if the operating sys...
复制链接

扫一扫