ajax pool: Crawling AJAX pages in Python with pywebkit and a thread pool?

weixin_39917894

Tags: ajax pool

But I would like to know how to crawl pages concurrently (multi-threaded).

I wrote the code below, but it seems the function do_crawl is never called by pool.apply_async. If pool.apply is used instead, do_crawl does get called.

import gtk
import webkit
from multiprocessing.pool import ThreadPool

class WebView(webkit.WebView):
    def get_html(self):
        self.execute_script('oldtitle=document.title;document.title=document.documentElement.innerHTML;')
        html = self.get_main_frame().get_title()
        self.execute_script('document.title=oldtitle;')
        return html

class Crawler(gtk.Window):
    def __init__(self, url):
        gtk.gdk.threads_init() # suggested by Nicholas Herriot for Ubuntu Koala
        gtk.Window.__init__(self)
        self._url = url

    def crawl(self):
        view = WebView()
        view.open(self._url)
        view.connect('load-finished', self._finished_loading)
        self.add(view)
        gtk.main()

    def _finished_loading(self, view, frame):
        view.get_html()
        gtk.main_quit()

def main():
    pool = ThreadPool(10)
    [pool.apply_async(do_crawl, ('http://google.com/')) for i in range(100)]
    pool.join()

def do_crawl(url):
    crawler = Crawler(url)
    crawler.crawl()

if __name__ == '__main__':
    main()
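One thing worth checking in the code above, independent of GTK, is how the arguments reach apply_async. The expression ('http://google.com/') is a plain string rather than a one-element tuple, and apply_async stores any exception on its result object instead of raising it in the caller, so a failed call can look as if the function was never invoked. The sketch below is only an illustration under that assumption: do_crawl is replaced by a hypothetical stand-in that records its argument, and no GTK or network access is involved. It may not be the only issue in the original program.

```python
from multiprocessing.pool import ThreadPool

calls = []  # URLs that the stand-in do_crawl actually received


def do_crawl(url):
    # Hypothetical stand-in for the real crawler: just record the URL.
    calls.append(url)
    return url


pool = ThreadPool(2)

# ('http://google.com/') is a string, so apply_async unpacks it character
# by character. do_crawl is never entered; the TypeError is stored on the
# AsyncResult and only surfaces when .get() is called.
bad = pool.apply_async(do_crawl, ('http://google.com/'))
try:
    bad.get(timeout=5)
    bad_error = None
except TypeError as exc:
    bad_error = exc

# With a one-element tuple (note the trailing comma) the call goes through.
good = pool.apply_async(do_crawl, ('http://google.com/',))
good_result = good.get(timeout=5)

pool.close()  # join() requires close() or terminate() to be called first
pool.join()
```

Keeping the AsyncResult objects and calling .get() on each of them is a simple way to make errors from worker threads visible.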
