Multithreaded file reading and writing in Python: reading txt files with multiple threads

I'm trying to read a file in Python (scan its lines and look for terms) and write the results, let's say a counter for each term. I need to do that for a large number of files (more than 3000). Is it possible to do that with multiple threads? If yes, how?

So, the scenario is like this:

Read each file and scan its lines

Write the counters for all the files I've read to the same output file.

My second question: does it improve read/write speed?

Hope it is clear enough. Thanks,

Ron.

Solution

I agree with @aix, multiprocessing is definitely the way to go. Regardless, you will be I/O bound: you can only read so fast, no matter how many parallel processes you have running. But there can easily be some speedup.

Consider the following (input/ is a directory that contains several .txt files from Project Gutenberg).

    import os
    import time
    from multiprocessing import Pool

    def process_file(name):
        ''' Process one file: count number of lines and words '''
        linecount = 0
        wordcount = 0
        # errors='replace' keeps a stray undecodable byte from aborting the run
        with open(name, 'r', errors='replace') as inp:
            for line in inp:
                linecount += 1
                # split() with no argument splits on any run of whitespace
                wordcount += len(line.split())
        return name, linecount, wordcount

    def list_files(top):
        ''' Collect the path of every file below top '''
        return [os.path.join(dirname, name)
                for dirname, _, names in os.walk(top) for name in names]

    def process_files_parallel(paths):
        ''' Process each file in parallel via Pool.map() '''
        with Pool() as pool:
            return pool.map(process_file, paths)

    def process_files(paths):
        ''' Process each file in sequence via map() '''
        # list() forces the lazy map object so the work actually runs
        return list(map(process_file, paths))

    if __name__ == '__main__':
        paths = list_files('input/')

        start = time.time()
        process_files(paths)
        print("process_files()", time.time() - start)

        start = time.time()
        process_files_parallel(paths)
        print("process_files_parallel()", time.time() - start)

When I run this on my dual core machine there is a noticeable (but not 2x) speedup:

    $ python process_files.py
    process_files() 1.71218085289
    process_files_parallel() 1.28905105591

If the files are small enough to fit in memory, and you have a lot of processing to do that isn't I/O bound, then you should see an even better improvement.
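The question also asks for per-term counters written to a single output file, which the timing example does not cover. One way to do that, sketched here under assumptions, is to let each worker return its counts and have only the parent process write the file, so no locking is needed. The names `TERMS`, `count_terms`, `write_counts` and `counts.csv`, the CSV layout, and the simple punctuation stripping are all illustrative choices, not part of the original answer.

```python
import csv
import os
from multiprocessing import Pool

# Hypothetical search terms; replace with the terms you are looking for.
TERMS = ("whale", "ship", "sea")

def count_terms(path):
    """Return (path, counts), where counts maps each term to its occurrences in one file."""
    counts = dict.fromkeys(TERMS, 0)
    with open(path, "r", errors="replace") as inp:
        for line in inp:
            for word in line.lower().split():
                # Crude normalisation (an assumption): drop surrounding punctuation.
                word = word.strip(".,;:!?\"'()")
                if word in counts:
                    counts[word] += 1
    return path, counts

def write_counts(paths, out_path, processes=None):
    """Count terms in parallel; only the parent process writes the output file."""
    with Pool(processes) as pool, open(out_path, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(("file",) + TERMS)
        # imap_unordered yields each result as soon as a worker finishes it.
        for path, counts in pool.imap_unordered(count_terms, paths):
            writer.writerow([path] + [counts[term] for term in TERMS])

if __name__ == "__main__":
    paths = [os.path.join(dirname, name)
             for dirname, _, names in os.walk("input/") for name in names]
    if paths:
        write_counts(paths, "counts.csv")
    else:
        print("no files found under input/")
```

Because `imap_unordered` returns results in completion order, the rows in the output file are not sorted by file name; use `pool.imap` instead if the order matters.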
