Scrapy: 301 and 403 errors when downloading images

1. 301 error

 

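A 301 is a redirect. Scrapy's media pipelines do not follow redirects by default (MEDIA_ALLOW_REDIRECTS defaults to False), so a redirected image download is treated as a failure. Adding this line to settings.py fixes it:

MEDIA_ALLOW_REDIRECTS = True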
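
For context, here is a minimal sketch of how the relevant part of settings.py might look when the built-in ImagesPipeline is used. The storage directory name and the pipeline priority are assumptions made for illustration; only MEDIA_ALLOW_REDIRECTS comes from the fix described above.

```python
# settings.py (fragment) - a sketch, not a complete project configuration.

# Enable Scrapy's built-in image pipeline. The priority value 1 is an
# arbitrary choice for this example.
ITEM_PIPELINES = {
    "scrapy.pipelines.images.ImagesPipeline": 1,
}

# Directory where downloaded images are stored (assumed name).
IMAGES_STORE = "images"

# Let the media pipelines follow 301/302 redirects instead of failing.
MEDIA_ALLOW_REDIRECTS = True
```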

 

2. 403 error

 

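A 403 means access is forbidden. In my case the target site checks the Referer header and returns 403 when it is empty. The fix is to set Referer on the request inside the downloader middleware's process_request method. Replace xxxxx below with the target site's URL.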

    def process_request(self, request, spider):
        # Called for each request that goes through the downloader
        # middleware.

        # Must either:
        # - return None: continue processing this request
        # - or return a Response object
        # - or return a Request object
        # - or raise IgnoreRequest: process_exception() methods of
        #   installed downloader middleware will be called

        # `agents` is assumed to be a list of User-Agent strings defined
        # elsewhere in this module; `random` must be imported at the top.
        agent = random.choice(agents)
        request.headers["User-Agent"] = agent

        #request.meta["proxy"] = proxyServer
        #request.headers["Proxy-Authorization"] = proxyAuth

        # Replace xxxxx with the target site's URL.
        request.headers['Referer'] = 'xxxxx'

        return None
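
If only the Referer fix is needed, it can be isolated in its own small middleware. The following is a minimal sketch under stated assumptions: the class name ImageRefererMiddleware and the constant TARGET_REFERER are made up for this example, and the URL is a placeholder to be replaced with the real target site.

```python
# Hypothetical placeholder: replace with the URL of the site hosting the images.
TARGET_REFERER = "https://www.example.com/"


class ImageRefererMiddleware:
    """Downloader middleware that sets a Referer header on every request."""

    def process_request(self, request, spider):
        # Add (or overwrite) the Referer so the server's empty-Referer
        # check no longer rejects the request with a 403.
        request.headers["Referer"] = TARGET_REFERER
        # Returning None lets Scrapy continue processing the request.
        return None
```

To take effect, the class has to be registered in the DOWNLOADER_MIDDLEWARES setting in settings.py, keyed by its import path within your own project (for instance a path such as myproject.middlewares.ImageRefererMiddleware, where myproject is a hypothetical project name) with a priority number.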

http://www.waitingfy.com/archives/3290
For the User-Agent and proxy settings, see the previous article, 《Scrapy middleware 设置随机User-Agent 和 proxy》 (setting a random User-Agent and proxy in Scrapy middleware).
