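preface: While watching a Jikexueyuan (极客学院) video on XPath, I happened to notice that multiprocessing (specifically multiprocessing.dummy, whose Pool is backed by threads) can be used to crawl web pages with multiple threads. It is only a short piece of code, so I am posting it here first as a bookmark. Update: added XPath extraction of page content, which mainly parses the HTML text, stores the results as dictionaries, and writes them out to a folder.
Reference: Jikexueyuan's video introducing and demonstrating parallelization in Python.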
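Before the real script, here is a minimal sketch of the idea: mapping a fetch function over a list of URLs with the thread-backed Pool and comparing that against a plain loop. To keep it runnable offline, `fake_fetch` is a hypothetical stand-in for `requests.get` that just sleeps, and `build_urls`, `run_serial` and `run_pooled` are names I made up for illustration; the worker count of 4 is also only an assumption.

```python
from multiprocessing.dummy import Pool as ThreadPool
import time


def fake_fetch(url):
    """Hypothetical stand-in for requests.get(url): sleeps to mimic network latency."""
    time.sleep(0.1)
    return len(url)


def build_urls(base, pages):
    """Build one URL per page number, from 1 to pages inclusive."""
    return [base + str(i) for i in range(1, pages + 1)]


def run_serial(urls):
    """Fetch the URLs one after another."""
    return [fake_fetch(u) for u in urls]


def run_pooled(urls, workers=4):
    """Fetch the URLs concurrently with a pool of worker threads."""
    pool = ThreadPool(workers)
    try:
        # map() blocks until every URL is processed and keeps input order.
        return pool.map(fake_fetch, urls)
    finally:
        pool.close()
        pool.join()


if __name__ == "__main__":
    urls = build_urls("http://tieba.baidu.com/p/3522395718?pn=", 20)

    start = time.time()
    run_serial(urls)
    print("serial: %.2fs" % (time.time() - start))

    start = time.time()
    run_pooled(urls)
    print("pooled: %.2fs" % (time.time() - start))
```

Because the work here is waiting rather than computing, threads are enough to overlap the waits. With real requests the gain would depend on the network and on the server, so the timings from this sketch should not be read as a benchmark.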
coding:
#!/usr/bin/env python
# coding=utf-8
from multiprocessing.dummy import Pool as ThreadPool
import requests
import time


def getsource(url):
    # Download the page; the response is not used further here.
    html = requests.get(url)


urls = []
for i in range(1, 21):
    newpage = "http://tieba.baidu.com/p/3522395718?pn=" + str(i)
    urls.append(newpage)  # build the URL list

# Single-threaded baseline: fetch the pages one by one.
time1 = time.time()
for i in urls:
    print(i)
    getsource(i)
time2 = time.ti