Creating a Scrapy Spider

Leo_xzp

Article link: https://blog.csdn.net/weixin_41906961/article/details/79866171

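Basic steps for building a Scrapy spider:

1. Create a new project (scrapy startproject xxx): this generates the skeleton of a new crawler project.

For a project named tutorial, the basic files are: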

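scrapy.cfg: the project's configuration file.
tutorial/: the project's Python module. You will add your code here later.
tutorial/items.py: the project's item definitions.
tutorial/pipelines.py: the project's item pipelines.
tutorial/settings.py: the project's settings file.
tutorial/spiders/: the directory that holds the spider code.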

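2. Define the target (edit items.py): declare which data you want to scrape.

3. Write the spider (spiders/xxspider.py): fetch the pages and extract the data.

4. Store the scraped content (pipelines.py).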

Scrapy: basic structure

Core engine, request queue, page downloading, content filtering, and pipeline storage.
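How these five parts cooperate can be sketched with a toy crawler in plain Python. This is not Scrapy's implementation, only one reading of the list above: the FAKE_WEB dictionary, the page names and the functions download, parse and crawl are all invented for the illustration, and the "web" is a dictionary so that the sketch runs offline.

```python
from collections import deque

# Canned "web" so the sketch runs offline; page names and contents are made up.
FAKE_WEB = {
    "page1": {"text": "first", "links": ["page2"]},
    "page2": {"text": "second", "links": ["page1"]},
}


def download(url):
    """Page downloading: fetch a page (here, a dictionary lookup)."""
    return FAKE_WEB[url]


def parse(url, page):
    """Content filtering: extract one item and the follow-up URLs from a page."""
    yield {"url": url, "text": page["text"]}
    for link in page["links"]:
        yield link  # a plain string stands in for a follow-up request


def crawl(start_urls):
    """Core engine: move work between the queue, downloader, parser and storage."""
    queue = deque(start_urls)  # request queue
    seen = set(start_urls)     # URLs already queued, to avoid revisiting pages
    stored = []                # pipeline storage
    while queue:
        url = queue.popleft()
        page = download(url)
        for result in parse(url, page):
            if isinstance(result, dict):
                stored.append(result)   # items go to storage
            elif result not in seen:
                seen.add(result)
                queue.append(result)    # new URLs go back to the queue
    return stored
```

Calling crawl(["page1"]) visits both pages once, even though they link to each other, and returns the two stored items in visiting order.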

Command format:

scrapy <command> [options] [args]

Available commands:
  bench         Run quick benchmark test
  fetch         Fetch a URL using the Scrapy downloader
  genspider     Generate new spider using pre-defined templates
  runspider     Run a self-contained spider (without creating a project)
  settings      Get settings values
  shell         Interactive scraping console
  startproject  Create new project
  version       Print Scrapy version
  view          Open URL in browser, as seen by Scrapy

  [ more ]      More commands available when run from project directory
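The command format above can also be assembled from a Python script, which is handy when a crawl is launched by another program. The sketch below is an assumption-laden illustration: scrapy_argv and run_scrapy are hypothetical helper names, and the command only runs if a scrapy executable is found on PATH.

```python
import shutil
import subprocess


def scrapy_argv(command, options=(), args=()):
    """Assemble `scrapy <command> [options] [args]` as an argument list."""
    return ["scrapy", command, *options, *args]


def run_scrapy(command, options=(), args=()):
    """Run the command if `scrapy` is on PATH; otherwise return None."""
    if shutil.which("scrapy") is None:
        return None
    return subprocess.run(
        scrapy_argv(command, options, args),
        capture_output=True,  # collect stdout and stderr instead of printing them
        text=True,
        check=False,          # leave the handling of non-zero exit codes to the caller
    )


if __name__ == "__main__":
    result = run_scrapy("version")
    print(result.stdout if result else "scrapy is not installed")
```

For example, scrapy_argv("startproject", args=["tutorial"]) corresponds to typing scrapy startproject tutorial in a shell, and scrapy_argv("fetch", options=["--nolog"], args=["https://example.com/"]) corresponds to scrapy fetch --nolog https://example.com/.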