scrapy
qq123aa2006
Crawling a novel site with Scrapy and saving the results to a database
spider.py: # -*- coding: utf-8 -*- import scrapy import uuid from datetime import datetime from novel.items import NovelItem, ChapterItem class A17kSpider(scrapy.Spider): name = '17k' allowed_...
Original · 2019-04-01 18:01:08 · 1245 views · 0 comments
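The excerpt above stops before the storage step, so the following is only a minimal sketch of how items could be written to a database from a Scrapy item pipeline. It assumes SQLite via the standard library; the post does not say which database it uses. The class name `SQLitePipeline`, the `chapter` table, and the item fields `id`, `novel_id`, `title`, and `content` are all hypothetical and are not taken from the original `ChapterItem`.

```python
import sqlite3


class SQLitePipeline:
    """Hypothetical item pipeline that stores chapter items in SQLite."""

    def __init__(self, db_path="novel.db"):
        # Assumed file name; the original post does not name its database.
        self.db_path = db_path
        self.conn = None

    def open_spider(self, spider):
        # Scrapy calls this once when the spider starts.
        self.conn = sqlite3.connect(self.db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS chapter ("
            "id TEXT PRIMARY KEY, novel_id TEXT, title TEXT, content TEXT)"
        )

    def close_spider(self, spider):
        # Scrapy calls this once when the spider finishes.
        self.conn.commit()
        self.conn.close()

    def process_item(self, item, spider):
        # Column names are illustrative; the real ChapterItem fields are unknown.
        self.conn.execute(
            "INSERT OR REPLACE INTO chapter (id, novel_id, title, content) "
            "VALUES (?, ?, ?, ?)",
            (item["id"], item.get("novel_id"), item.get("title"), item.get("content")),
        )
        self.conn.commit()
        # Returning the item lets any later pipeline keep processing it.
        return item
```

A pipeline like this is switched on through the `ITEM_PIPELINES` setting in `settings.py`; the module path used there depends on the project layout, which the excerpt does not show.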
Adding random headers and proxies in a Scrapy middleware
from fake_useragent import UserAgent class RandomUserAgentMiddlerware(object): def __init__(self, crawler): super(RandomUserAgentMiddlerware, self).__init__() self.ua = UserAgent() ...
Original · 2019-05-05 08:37:22 · 180 views · 0 comments
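Since the excerpt is cut off before the request hook, here is a rough sketch of what a downloader middleware of this kind might look like. It is not the author's code: it swaps `fake_useragent` for a fixed list so that it runs without extra packages, and the class name `RandomHeaderProxyMiddleware`, the sample User-Agent strings, and the proxy address are placeholders. It relies only on Scrapy's `process_request(request, spider)` hook and on the `request.meta["proxy"]` key, which Scrapy's built-in proxy handling reads.

```python
import random

# Assumed sample values; the original post uses fake_useragent instead.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
]
PROXIES = ["http://127.0.0.1:8888"]  # placeholder proxy address


class RandomHeaderProxyMiddleware:
    """Hypothetical downloader middleware: random User-Agent plus random proxy."""

    def __init__(self, user_agents, proxies):
        self.user_agents = list(user_agents)
        self.proxies = list(proxies)

    def process_request(self, request, spider):
        # Overwrite the User-Agent header on every outgoing request.
        request.headers["User-Agent"] = random.choice(self.user_agents)
        # Route the request through a proxy when one is configured.
        if self.proxies:
            request.meta["proxy"] = random.choice(self.proxies)
        # Returning None tells Scrapy to continue handling the request.
        return None
```

In a real project the middleware would be registered under `DOWNLOADER_MIDDLEWARES` in `settings.py`, and the proxy list could come from a pool rather than a constant.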
Building your own proxy pool
MAX_SCORE = 100 MIN_SCORE = 0 INITIAL_SCORE = 10 REDIS_HOST = "127.0.0.1" REDIS_PORT = 6379 REDIS_PASSWORD = None REDIS_KEY = "proxies" import redis from random import choice import time import ...
Original · 2019-05-05 08:50:34 · 176 views · 0 comments
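The excerpt shows only the score constants and the Redis connection settings, so the scoring rules below are an assumption about how such constants are typically used, not the author's implementation. To keep the sketch runnable without a Redis server, it holds scores in a plain dictionary; `InMemoryProxyPool` and its method names are hypothetical. The assumed rules are: new proxies start at `INITIAL_SCORE`, a successful check sets the score to `MAX_SCORE`, a failed check subtracts one, and a proxy is dropped once its score would reach `MIN_SCORE`.

```python
import random

MAX_SCORE = 100
MIN_SCORE = 0
INITIAL_SCORE = 10


class InMemoryProxyPool:
    """Hypothetical stand-in for a Redis-backed pool, using a dict of scores."""

    def __init__(self):
        self._scores = {}

    def add(self, proxy, score=INITIAL_SCORE):
        # Only add proxies that are not already tracked.
        if proxy in self._scores:
            return False
        self._scores[proxy] = score
        return True

    def mark_ok(self, proxy):
        # A proxy that passed a check is promoted to the top score.
        self._scores[proxy] = MAX_SCORE

    def mark_failed(self, proxy):
        # Each failure costs one point; drop the proxy at the minimum score.
        score = self._scores.get(proxy)
        if score is None:
            return
        if score - 1 <= MIN_SCORE:
            del self._scores[proxy]
        else:
            self._scores[proxy] = score - 1

    def random(self):
        # Pick at random among the proxies that share the highest score.
        if not self._scores:
            raise LookupError("proxy pool is empty")
        best = max(self._scores.values())
        return random.choice([p for p, s in self._scores.items() if s == best])

    def count(self):
        return len(self._scores)
```

Keeping the score logic separate from storage makes it easy to swap the dictionary for a Redis sorted set keyed by `REDIS_KEY`, which is what the constants in the excerpt suggest.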