1. The get_media_requests method is called once.
2. The file_path method is called twice:
2017-02-10 14:51:32 [scrapy] DEBUG: Crawled (200) <GET http://ojyhagqv7.bkt.clouddn.com/9cn_2017021007f9c6940c907f35490bcffafc5b53d1.png?imageView2/2/w/200> (referer: None)
2017-02-10 14:51:32 [scrapy] DEBUG: File (downloaded): Downloaded file from <GET http://ojyhagqv7.bkt.clouddn.com/9cn_2017021007f9c6940c907f35490bcffafc5b53d1.png?imageView2/2/w/200> referred in <None>
2017-02-10 14:51:32 [PIL.PngImagePlugin] DEBUG: STREAM IHDR 16 13
2017-02-10 14:51:32 [PIL.PngImagePlugin] DEBUG: STREAM gAMA 41 4
2017-02-10 14:51:32 [PIL.PngImagePlugin] DEBUG: STREAM cHRM 57 32
2017-02-10 14:51:32 [PIL.PngImagePlugin] DEBUG: cHRM 57 32 (unknown)
request=== <GET http://ojyhagqv7.bkt.clouddn.com/9cn_2017021007f9c6940c907f35490bcffafc5b53d1.png?imageView2/2/w/200> 4f49e9ef3aea424a8199701fbdc82056----------8f1a52a03cd74c6ebb67e5ae75c41c8a
request=== <GET http://ojyhagqv7.bkt.clouddn.com/9cn_2017021007f9c6940c907f35490bcffafc5b53d1.png?imageView2/2/w/200> 4f49e9ef3aea424a8199701fbdc82056----------f29dbde1f4b24cf28026194afcdac434
3. Finally, the item_completed method is called once:
2017-02-10 14:51:33 [scrapy] DEBUG: Scraped from <200 http://www.9.cn/cx/getList.html?cate=&status=1&order=1&page=1>
{'app_id': '884061d3ce784ec5a8470b87994046cc', 'id': '4f49e9ef3aea424a8199701fbdc82056', 'image_paths': ['miniapp/8f1a52a03cd74c6ebb67e5ae75c41c8a.jpg'], 'image_type': 0, 'image_urls': [u'http://ojyhagqv7.bkt.clouddn.com/9cn_2017021007f9c6940c907f35490bcffafc5b53d1.png?imageView2/2/w/200']}
2017-02-10 14:51:33 [scrapy] INFO: Closing spider (finished)
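Summary: Scrapy first crawls the image URL and then performs the download, so the path actually written to disk is the one from the second file_path call (ending in f29dbde1f4b24cf28026194afcdac434), while item_completed receives the path from the first call, made at crawl time (8f1a52a03cd74c6ebb67e5ae75c41c8a, as shown in image_paths). That is why the two paths do not match.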
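
One possible way to avoid the mismatch is to make file_path return the same value every time it is called for the same request. The sketch below is only an illustration. It assumes the custom file_path builds its name from a fresh random value on each call, which the two different suffixes in the log suggest but these notes do not confirm. The helper names random_image_path and stable_image_path, the "miniapp" prefix default, and the example URL are hypothetical.

```python
import hashlib
import uuid


def random_image_path(url, prefix="miniapp"):
    # Hypothetical reconstruction of the problem: a fresh UUID per call means
    # two calls for the same request return two different paths.
    return "{}/{}.jpg".format(prefix, uuid.uuid4().hex)


def stable_image_path(url, prefix="miniapp"):
    # Derive the name from the URL only, so every call for the same request
    # returns the same path.
    digest = hashlib.sha1(url.encode("utf-8")).hexdigest()
    return "{}/{}.jpg".format(prefix, digest)


if __name__ == "__main__":
    url = "http://example.com/image.png?imageView2/2/w/200"  # placeholder URL
    print(random_image_path(url) == random_image_path(url))  # False
    print(stable_image_path(url) == stable_image_path(url))  # True
```

In an ImagesPipeline subclass, file_path(self, request, response=None, info=None) could then return stable_image_path(request.url). With a deterministic name, the path reported in image_paths and the path used for storage should agree no matter how many times Scrapy calls file_path.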