Background:
Python 3.5.1, Windows 7
I have a network drive that holds a large number of files and directories. I am trying to write a script to parse all of them as quickly as possible, find every file that matches a RegEx, and copy those files to my local PC for review. There are roughly 3500 directories and subdirectories, and a few million files. I am trying to keep this as generic as possible (i.e. not hard-coding it to this exact file structure) so I can reuse it on other network drives. The code works when run against a small network drive; the problem here appears to be scalability.
I have tried a few things with the multiprocessing library, but I cannot seem to get it to work reliably. My idea was to create a new job to parse each subdirectory so that the work finishes as quickly as possible. I have a recursive function that parses every object in a directory, calls itself for any subdirectories, and checks any files it finds against the RegEx.

Question: How can I limit the number of threads/processes, without using a Pool, to achieve my goal?

My attempts: If I only use Process jobs, then after more than a few hundred threads have started I get the error RuntimeError: can't start new thread, and the drive starts dropping connections. I end up finding roughly half of the files, because about half of the directories error out (code below).

To limit the total number of threads, I tried using the Pool method, but according to this question a Pool object cannot be passed into the called method, which makes a recursive implementation impossible.

To fix that, I tried calling Processes inside the Pool method, but I get the error daemonic processes are not allowed to have children.

I think that if I can limit the number of concurrent threads, then my solution will work as designed.
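One way to cap concurrency while keeping the recursive structure is a BoundedSemaphore shared across all processes — a minimal sketch, not the original code (the names parse_dir, collect, and MAX_CONCURRENT are made up). Note the caveat: it limits how many processes scan directories at once, but one (mostly idle) process per directory is still created, so it mitigates rather than fully removes the scaling problem:

```python
import os
from multiprocessing import BoundedSemaphore, Manager, Process

MAX_CONCURRENT = 8  # assumed cap; tune to the machine and network


def parse_dir(path, results, sem):
    # Hold a semaphore slot only while scanning this directory's entries.
    jobs = []
    with sem:
        for item in os.scandir(path):
            if item.is_dir(follow_symlinks=False):
                p = Process(target=parse_dir, args=(item.path, results, sem))
                jobs.append(p)
                p.start()  # the child blocks in its own `with sem` until a slot frees
            elif item.is_file():
                results.append(item.path)
    # Join outside the `with` block: this process's slot is already released,
    # so its children can acquire it and no deadlock occurs.
    for job in jobs:
        job.join()


def collect(root):
    with Manager() as manager:
        results = manager.list()
        sem = BoundedSemaphore(MAX_CONCURRENT)
        parse_dir(root, results, sem)
        return list(results)
```

The key design point is releasing the slot before joining the children; joining while holding the slot would let blocked descendants starve the scan.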
Code:

import os
import re
import shutil
from multiprocessing import Process, Manager

CheckLocations = ['network drive location 1', 'network drive location 2']
SaveLocation = 'local PC location'
FileNameRegex = re.compile('RegEx here', flags=re.IGNORECASE)

# Loop through all items in folder, and call itself for subfolders.
def ParseFolderContents(path, DebugFileList):
    FolderList = []
    jobs = []
    TempList = []
    if not os.path.exists(path):
        return
    try:
        for item in os.scandir(path):
            try:
                if item.is_dir():
                    p = Process(target=ParseFolderContents, args=(item.path, DebugFileList))
                    jobs.append(p)
                    p.start()
                elif FileNameRegex.search(item.name) != None:
                    DebugFileList.append((path, item.name))
                else:
                    pass
            except Exception as ex:
                if hasattr(ex, 'message'):
                    print(ex.message)
                else:
                    print(ex)
                # print('Error in file:\t' + item.path)
    except Exception as ex:
        if hasattr(ex, 'message'):
            print(ex.message)
        else:
            print('Error in path:\t' + path)
            pass
    else:
        print('\tToo many threads to restart directory.')
    for job in jobs:
        job.join()
# Save list of debug files.
def SaveDebugFiles(DebugFileList):
    for file in DebugFileList:
        try:
            shutil.copyfile(file[0] + '\\' + file[1], SaveLocation + file[1])
        except PermissionError:
            continue
if __name__ == '__main__':
    with Manager() as manager:
        # Iterate through all directories to make a list of all desired files.
        DebugFileList = manager.list()
        jobs = []
        for path in CheckLocations:
            p = Process(target=ParseFolderContents, args=(path, DebugFileList))
            jobs.append(p)
            p.start()
        for job in jobs:
            job.join()
        print('\n' + str(len(DebugFileList)) + ' files found.\n')
        if len(DebugFileList) == 0:
            quit()
        # Iterate through all debug files and copy them to local PC.
        n = 25  # Number of files to grab for each parallel path.
        TempList = [DebugFileList[i:i + n] for i in range(0, len(DebugFileList), n)]  # Split list into small chunks.
        jobs = []
        for item in TempList:
            p = Process(target=SaveDebugFiles, args=(item,))
            jobs.append(p)
            p.start()
        for job in jobs:
            job.join()
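For comparison, the recursion above can be dropped entirely: a fixed set of worker processes share a JoinableQueue of directories, and a worker that finds a subdirectory enqueues it instead of spawning a new Process. This puts a hard cap on the total process count without using a Pool. A sketch under assumed names (find_matching, _worker, num_workers are mine, not part of the original script):

```python
import os
import re
from multiprocessing import JoinableQueue, Manager, Process


def _worker(dir_queue, found, pattern):
    regex = re.compile(pattern, flags=re.IGNORECASE)
    while True:
        path = dir_queue.get()
        try:
            for item in os.scandir(path):
                if item.is_dir(follow_symlinks=False):
                    dir_queue.put(item.path)  # enqueue instead of recursing
                elif regex.search(item.name):
                    found.append((path, item.name))
        except OSError as ex:
            print('Error in path:\t{}\t{}'.format(path, ex))
        finally:
            dir_queue.task_done()  # lets dir_queue.join() count this directory


def find_matching(roots, pattern, num_workers=8):
    # Return (directory, filename) pairs whose filename matches `pattern`.
    with Manager() as manager:
        found = manager.list()
        dir_queue = JoinableQueue()
        for root in roots:
            dir_queue.put(root)
        workers = [Process(target=_worker, args=(dir_queue, found, pattern),
                           daemon=True)
                   for _ in range(num_workers)]
        for w in workers:
            w.start()
        dir_queue.join()  # blocks until every queued directory has been scanned
        return list(found)


if __name__ == '__main__':
    matches = find_matching(['network drive location 1'], 'RegEx here')
    print(len(matches), 'files found.')
```

Because task_done() is called in a finally block, dir_queue.join() returns once every directory has been scanned even when some paths raise errors; the daemon workers then simply die with the main process.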