While working through the final Scrapy section of the 莫烦Python (Morvan Python) web-scraping tutorial, I ran scrapy runspider try24.py -o res.json in the terminal and got this output:
Fatal error in launcher: Unable to create process using
'"d:\bld\scrapy_1572360424769_h_env\python.exe"
"G:\Anaconda3\Scripts\scrapy.exe" runspider try24.py -o res.json'
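The fix was to run python -m scrapy runspider try24.py -o res.json instead. The error suggests that the scrapy.exe launcher is tied to a Python interpreter path (the d:\bld\ one in the message) that cannot be started on this machine, and going through python -m avoids the launcher entirely. The -m option runs a module as a script; Python's own help text describes it as: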
-m mod : run library module as a script (terminates option list)
In other words, Python looks up the named module in the library and runs it as a script.
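To make the python -m idea concrete, here is a minimal sketch that launches a module through the interpreter that is currently running, which is the same thing the fixed command does by hand. The helper name run_module is my own, and json.tool (a standard-library module) is used only as a stand-in so the snippet runs even without Scrapy installed.

```python
import subprocess
import sys


def run_module(module, args=(), stdin_text=None):
    """Run `python -m <module> <args>` using the interpreter running this script.

    run_module is a hypothetical helper for illustration, not part of Scrapy.
    """
    cmd = [sys.executable, "-m", module, *args]
    # capture_output=True collects stdout/stderr instead of printing them live
    return subprocess.run(cmd, input=stdin_text, capture_output=True, text=True)


# Stand-in demo: json.tool pretty-prints the JSON it receives on stdin.
result = run_module("json.tool", stdin_text='{"name": "mofan"}')
print(result.returncode)
print(result.stdout)

# For the spider in this post the call would look like this
# (assumes Scrapy is installed for this interpreter):
# run_module("scrapy", ["runspider", "try24.py", "-o", "res.json"])
```

Because sys.executable is the interpreter that is actually running, this route never touches the scrapy.exe launcher or the interpreter path it tries to start.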
Final code:
import scrapy


class MofanSpider(scrapy.Spider):
    name = "mofan"
    start_urls = [
        'https://morvanzhou.github.io/',
    ]
    # unseen = set()
    # seen = set()  # we don't need these two as scrapy will deal with them automatically

    def parse(self, response):
        yield {  # return some results
            # replace() removes any double quotes from the title
            'title': response.css('h1::text').extract_first(default='Missing').strip().replace('"', ""),
            'url': response.url,
        }

        urls = response.css('a::attr(href)').re(r'^/.+?/$')  # find all sub urls
        for url in urls:
            yield response.follow(url, callback=self.parse)  # duplicates are filtered automatically


# lastly, run this in terminal
# python -m scrapy runspider try24.py -o res.json
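To see which links the spider actually follows, the sketch below applies the same regular expression with the standard re module. It only approximates what the selector's .re() call does for this pattern, and the href values are made up for illustration; they are not taken from the real site.

```python
import re

# Same pattern the spider passes to .re(): a leading slash, at least one
# character, and a trailing slash.
SUB_URL = re.compile(r'^/.+?/$')

# Hypothetical href values, invented for this example only.
hrefs = [
    '/tutorials/',
    '/tutorials/python-basic/',
    '/',
    '/about',
    'https://example.com/page/',
    '#top',
]

# Keep only the hrefs that the pattern matches
followed = [h for h in hrefs if SUB_URL.search(h)]
print(followed)
```

Only the two site-relative paths that both start and end with a slash survive. The bare '/', the path without a trailing slash, the absolute URL, and the fragment are all skipped, so the crawl stays on pages within the same site.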