Web Crawlers
mannnn__
Implementing a custom crawler framework in Python
import urllib2
from lxml import etree
import Queue
import ssl
import re
import threading
import json

class CrawlThread(threading.Thread):
    def __init__(self, urlQueue, dataQueue, threadName):
        ...
Original · 2018-10-11 16:26:48 · 944 views · 0 comments
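The preview above is cut off after the `CrawlThread` constructor, so the rest of the author's framework is not visible here. As a rough illustration of the general pattern that signature suggests (worker threads that take URLs from one queue and push results onto another), here is a minimal sketch. It is not the author's code: it assumes Python 3 (`queue` rather than the Python 2 `Queue` module in the excerpt), and `fake_fetch`, `crawl`, and the `fetch` parameter are hypothetical names added for illustration. No network request is made.

```python
import queue
import threading


def fake_fetch(url):
    # Hypothetical stand-in for a real HTTP request; returns a fake page body.
    return "<html>%s</html>" % url


class CrawlThread(threading.Thread):
    def __init__(self, urlQueue, dataQueue, threadName, fetch=fake_fetch):
        # The fetch parameter is an assumption, not part of the original signature.
        super().__init__(name=threadName)
        self.urlQueue = urlQueue
        self.dataQueue = dataQueue
        self.fetch = fetch

    def run(self):
        # Keep taking URLs until the queue is empty, then exit.
        while True:
            try:
                url = self.urlQueue.get_nowait()
            except queue.Empty:
                break
            self.dataQueue.put((url, self.fetch(url)))


def crawl(urls, workers=3):
    # Hypothetical helper: fill the URL queue, run the workers, collect results.
    urlQueue = queue.Queue()
    dataQueue = queue.Queue()
    for url in urls:
        urlQueue.put(url)
    threads = [
        CrawlThread(urlQueue, dataQueue, "crawl-%d" % i) for i in range(workers)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    results = {}
    while not dataQueue.empty():
        url, body = dataQueue.get()
        results[url] = body
    return results
```

Calling `crawl(["http://example.com/a", "http://example.com/b"])` returns a dict that maps each URL to its stubbed page body. In a real crawler, a function that performs the HTTP request and parses the response would replace `fake_fetch`.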
Scraping the novel 三寸人间 with urllib2
# -*- coding: UTF-8 -*-
import urllib2
import re
import ssl
import sys

if __name__ == "__main__":
    # Proxy
    proxy = {
        'http': 'xxx',
        'https': 'xxx'
    }
    ssl_context = ssl._...
Original · 2018-10-10 16:20:57 · 271 views · 0 comments