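The problem is that, in order to write to a file, open has to create a file handle in the operating system. Every call to open creates a new file handle. Instead, open the file once and pass the resulting file object to get_by_id as an argument. Each thread then holds only one file handle.
Alternatively, you can call file.close() to release the operating system resource. This may eventually happen through garbage collection once the file goes out of scope, but relying on the GC here is very bad practice. In any case, creating unnecessary objects inside a loop is bad practice. So do something like this:
import requests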
import os
import threading
import time
class myThread(threading.Thread):
    def __init__(self, threadID, name, st, ed):
        threading.Thread.__init__(self)
        self.threadID = threadID
        self.name = name
        self.st = st
        self.ed = ed

    def run(self):
        print("Starting " + self.name)
        get_range(self.st, self.ed)
        print("Exiting " + self.name)
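def get_by_id(n, f):
    """Fetch the page for id n and write it to the already-open file f.

    Returns 1 on success, -1 if the response contains "Cannot find",
    and -2 if the request fails or the status code is not 200.
    """
    payload = {"id": n}
    url = "http://www.example.com"  # This is for example
    headers = {
        'Content-Type': 'application/x-www-form-urlencoded',
        'Accept': "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        'Accept-Encoding': "gzip, deflate",
    }
    try:
        r = requests.post(url, data=payload, headers=headers)
    except Exception:
        return -2
    # Compare integers with !=, not with an identity check.
    if r.status_code != 200:
        return -2
    if "Cannot find" in r.text:
        return -1
    else:
        f.write(r.text)
        return 1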
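def get_range(a, b):
    # Open one output file per range, so each thread holds a single file handle.
    # The file name below is an assumed placeholder; use whatever naming scheme suits you.
    with open(os.path.join("./pages", "{}-{}.txt".format(a, b)), 'w') as f:
        for i in range(a, b):
            get_by_id(str(i), f)
    # The with block closes the file on exit, so no explicit f.close() is needed.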
if __name__ == "__main__":
    threads = []
    for x in range(20):
        threads.append(myThread(x, "Thread-" + str(x), 800000000000 + x * 4000, 800000000000 + (x + 1) * 4000))
        threads[-1].start()
        time.sleep(0.3)
    for t in threads:
        t.join()
    print("Exiting Main")
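
If you want to try the "open once, pass the file object around" idea without touching the network, here is a minimal sketch. The names write_record, write_range and demo are hypothetical stand-ins for get_by_id and get_range, and the temporary directory is only there so the example cleans up after itself.

```python
import os
import tempfile


def write_record(n, f):
    """Write one record through an already-open file object (no open() here)."""
    f.write("record {}\n".format(n))


def write_range(a, b, path):
    """Open path once, write every record in [a, b), and return the file object."""
    with open(path, "w") as f:
        for i in range(a, b):
            write_record(i, f)
    # Leaving the with block has already closed f and released the OS handle.
    return f


def demo():
    """Write three records, then report whether the handle was closed and what was written."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "records.txt")
        f = write_range(0, 3, path)
        with open(path) as check:
            lines = check.read().splitlines()
        return f.closed, lines


if __name__ == "__main__":
    print(demo())
```

Only one open() call is made per range, however many records are written, and the with block guarantees the handle is released even if a write raises an exception.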