python 的read 、readline和 readlines以及如何读取大文件


read()方法:


1、读取整个文件,将文件内容放到一个字符串变量中,如果需要对文件按行进行处理,则不可用该方法
2、如果文件大于可用内存(好几个G的),不可能使用这种处理,系统会报错:MemoryError 以下两种实现方式

f = open(r'C:\Users\DELL\Desktop\python\words.txt','r') #创建一个文件对象,也是一个可迭代对象
try:
    content = f.read()  #结果为str类型
    print (type(content))
    print (content)
finally:
    f.close()


with open(r'C:\Users\DELL\Desktop\python\words.txt','r') as f:#创建一个文件对象,也是一个可迭代对象
    content = f.read()  #结果为str类型
    print (type(content))
    print (content)

结果为:

<class 'str'>
Last sunday, it was a fine day. My friend and I went to Mount Daifu.
In the morning,we rode bikes to the foot of the mountain.After a short rest,we climbed the mountain.on its peak,
we shared the beautiful scenery in our eyes.
 
There were lots of light foggy clouds around us.What's more, varieties of birds flying around us were pretty.
What a harmony situation! At noon,we had lunch in a restaurant. With a happy emotion, we finished our climbing.
Although tired, we still felt happy and relaxed.we hope our country will become more and more beautiful.

readline()方法:


1、readline()每次读取一行,比readlines()慢得多
2、readline()返回的是一个字符串对象,保存当前行的内容,两种方式:

f = open(r'C:\Users\DELL\Desktop\python\words.txt','r')
try:
  while True:
      line = f.readline()
      if line:
          print (line)
      else:
          break
finally:
    f.close()


with open(r'C:\Users\DELL\Desktop\python\words.txt','r') as f:
      while True:
          line = f.readline()
          if line:
              print (line)
          else:
              break  #跳出循环,不写的if条件判断Line是不是读完了的话就是死循环

因为readline是按行读取,按行打印,而print函数默认输出完,需要跨行!所以每打印一行中间都有空一行

Last sunday, it was a fine day. My friend and I went to Mount Daifu.

In the morning,we rode bikes to the foot of the mountain.After a short rest,we climbed the mountain.on its peak,

we shared the beautiful scenery in our eyes.

 

There were lots of light foggy clouds around us.What's more, varieties of birds flying around us were pretty.

What a harmony situation! At noon,we had lunch in a restaurant. With a happy emotion, we finished our climbing.

Although tired, we still felt happy and relaxed.we hope our country will become more and more beautiful.

readlines()方法

1、一次性读取整个文件。

2、自动将文件内容分析成一个行的列表。两种实现方式如下

f = open(r'C:\Users\DELL\Desktop\python\words.txt','r')
try:
  lines = f.readlines()
  print ("type(lines)=",type(lines)) #type(lines)= <type 'list'>
  print (lines)
finally:
    f.close()


with open(r'C:\Users\DELL\Desktop\python\words.txt','r') as f:
    lines = f.readlines()
    print ("type(lines)=",type(lines)) #type(lines)= <type 'list'>
    for line in lines:
        print (line) 

结果如下:

type(lines)= <class 'list'>
['Last sunday, it was a fine day. My friend and I went to Mount Daifu.\n', 'In the morning,we rode bikes to the foot of the mountain.After a short rest,we climbed the mountain.on its peak,\n', 'we shared the beautiful scenery in our eyes.\n', ' \n', "There were lots of light foggy clouds around us.What's more, varieties of birds flying around us were pretty.\n", 'What a harmony situation! At noon,we had lunch in a restaurant. With a happy emotion, we finished our climbing.\n', 'Although tired, we still felt happy and relaxed.we hope our country will become more and more beautiful.']




type(lines)= <class 'list'>
Last sunday, it was a fine day. My friend and I went to Mount Daifu.

In the morning,we rode bikes to the foot of the mountain.After a short rest,we climbed the mountain.on its peak,

we shared the beautiful scenery in our eyes.

 

There were lots of light foggy clouds around us.What's more, varieties of birds flying around us were pretty.

What a harmony situation! At noon,we had lunch in a restaurant. With a happy emotion, we finished our climbing.

Although tired, we still felt happy and relaxed.we hope our country will become more and more beautiful.

大文件读取数据

处理大文件是很容易想到的就是将大文件分割成若干小文件处理,处理完每个小文件后释放该部分内存。这里用了iter & yield

  • 分批读取
def read_in_chunks(filePath, chunk_size=1024*1024):
    """    Lazy function (generator) to read a file piece by piece.    
    Default chunk size: 1M    You can set your own chunk size     """    
    file_object = open(filePath)    
    while True:
        chunk_data = file_object.read(chunk_size)        
        if not chunk_data:            
            break        
        yield chunk_data
        
if __name__ == "__main__":    
    filePath = './path/filename'    
    for chunk in read_in_chunks(filePath):        
        process(chunk) # <do something with chunk>

注意:

1 size设置成1024*1024的意思,就是1bit*1024*1024 = 1M,也就是说,每隔1M读取一次数据

2 这里使用了yield哦!!! 为啥不用return,要使用yeild的原因,可以看看这个:

https://blog.csdn.net/Strive_0902/article/details/82924728

例如,就像上面的例子,我们在read()的括号里面加了个参数,表示一次要读取得大小,代码改变如下:

def read_trunk(file_name,trunk_size):
    f = open(file_name,'r') #创建一个文件对象,也是一个可迭代对象
    while True:
        content = f.read(trunk_size)  #结果为str类型
        if not content:
            break
        #print (type(content))
        #print()
        #print (content)
        yield content
        #return content
        
if __name__=='__main__':
    file_name = r'C:\Users\DELL\Desktop\python\words.txt'
    #print(read_trunk(file_name,200))
    for i in read_trunk(file_name,200):
        print(i)
        print()

可以看到,它会根据每200大小,读取一次,这里说明一下,为啥不能用return,而要用yeild,因为return执行一次,就带着返回值退出函数了,后面得没法读取,在上面得代码中,如果用注释掉得return语句时,函数运行结果如下:

  • Using with open()

对可迭代对象 f,进行迭代遍历:for line in f,会自动地使用缓冲IO(buffered IO)以及内存管理,而不必担心任何大文件的问题。

with open(filename, 'rb') as f:
    for line in f:
        <do something with the line>

 

 

  • 12
    点赞
  • 80
    收藏
    觉得还不错? 一键收藏
  • 3
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论 3
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值