Error when running the Scrapy crawler framework from the command line

海平面远方开始阴霾

Categories: python3, vscode

Article link: https://blog.csdn.net/weixin_44416759/article/details/104544969

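While working through the last part of the 莫烦Python (Morvan Python) web-scraping tutorial, which covers the Scrapy framework, I ran scrapy runspider try24.py -o res.json in the terminal and got this output: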

Fatal error in launcher: Unable to create process using
'"d:\bld\scrapy_1572360424769_h_env\python.exe"
"G:\Anaconda3\Scripts\scrapy.exe" runspider try24.py -o res.json'
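A quick way to look into this kind of launcher error is to check whether the interpreter path named in the message exists at all. The sketch below is only an illustration: the helper name diagnose is hypothetical, and it assumes the error comes from the launcher pointing at a missing python.exe, which the message suggests but does not prove.

```python
import shutil
import sys
from pathlib import Path


def diagnose(launcher_python, command="scrapy"):
    """Collect a few facts that help explain a 'Fatal error in launcher' message.

    launcher_python: the interpreter path quoted in the error message.
    command: the console command that failed (assumed to be 'scrapy' here).
    """
    return {
        # Does the interpreter the launcher wants to start actually exist?
        "launcher_python_exists": Path(launcher_python).exists(),
        # Where does the shell find the command, if anywhere?
        "command_on_path": shutil.which(command),
        # Which interpreter is running this script right now?
        "current_python": sys.executable,
    }


if __name__ == "__main__":
    # Path copied from the error message above.
    info = diagnose(r"d:\bld\scrapy_1572360424769_h_env\python.exe")
    for key, value in info.items():
        print(f"{key}: {value}")
```

If launcher_python_exists comes back False while current_python points at a working interpreter, running the module through that interpreter is a reasonable next step.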

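Reference

The error is raised by the scrapy.exe launcher before Scrapy itself starts: the launcher tries to start the Python interpreter at d:\bld\scrapy_1572360424769_h_env\python.exe, a path that most likely does not exist on this machine.

The fix is to run python -m scrapy runspider try24.py -o res.json instead. Prefixing the command with python -m runs the scrapy module as a script using the python found on PATH, so the scrapy.exe launcher is bypassed. From python --help:
-m mod : run library module as a script (terminates option list)
That is, a Python module from the installed libraries is executed as a script.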
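The same idea can be used from inside a Python script: start the module with the interpreter that is currently running, so no .exe launcher is involved. This is a minimal sketch; the helper names run_module and scrapy_command are made up for illustration, and the file names are the ones used in this post.

```python
import subprocess
import sys


def run_module(module, *args):
    """Run `python -m <module> <args...>` using the current interpreter."""
    cmd = [sys.executable, "-m", module, *args]
    # capture_output=True collects stdout/stderr instead of printing them.
    return subprocess.run(cmd, capture_output=True, text=True)


def scrapy_command(spider_file, output_file):
    """Build the argument list for `python -m scrapy runspider ... -o ...`."""
    return [sys.executable, "-m", "scrapy", "runspider", spider_file, "-o", output_file]


# Example call (requires Scrapy to be installed for this interpreter):
# result = run_module("scrapy", "runspider", "try24.py", "-o", "res.json")
# print(result.returncode, result.stderr)
```

Using sys.executable instead of a bare "python" keeps the call tied to the environment the script itself runs in, which avoids picking up a different interpreter from PATH.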

Final code:

import scrapy


class MofanSpider(scrapy.Spider):
    name = "mofan"
    start_urls = [
        'https://morvanzhou.github.io/',
    ]
    # unseen = set()
    # seen = set()      # not needed: Scrapy filters duplicate requests automatically

    def parse(self, response):
        yield {     # emit one result item per page
            # replace() removes double quotes from the title
            'title': response.css('h1::text').extract_first(default='Missing').strip().replace('"', ""),
            'url': response.url,
        }

        urls = response.css('a::attr(href)').re(r'^/.+?/$')     # keep only site-relative links that end with '/'
        for url in urls:
            yield response.follow(url, callback=self.parse)     # Scrapy filters duplicate requests automatically


# lastly, run this in terminal
# python -m scrapy runspider try24.py -o res.json
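To see which links the spider follows, the regular expression from parse can be tried on its own with the standard re module. This is a small sketch; the href values are invented for illustration and are not taken from the real site.

```python
import re

# Same pattern as in the spider: starts with '/', ends with '/',
# with at least one character in between.
SUB_URL = re.compile(r'^/.+?/$')

# Hypothetical href values of the kind found on a page.
hrefs = [
    '/tutorials/',
    '/about/',
    'https://example.com/',   # absolute URL: does not start with '/'
    '/static/logo.png',       # does not end with '/'
    '/',                      # too short: nothing between the slashes
    '#top',
]

followed = [h for h in hrefs if SUB_URL.search(h)]
print(followed)
```

Only the site-relative paths that end with a slash are kept, so the crawl stays on the same site and skips files and anchors.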
