Python Crawler Series 2: Scrapy Directory Structure and Configuration Explained

Latest recommended article published on 2024-07-21 22:35:51

lijian12388806

Category: Python Crawler Series. Tags: python, Scrapy, crawler

Article link: https://blog.csdn.net/lijian12388806/article/details/80070997

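This article describes the directory structure of a Scrapy crawler project in detail: the scrapy.cfg configuration file, the __init__.py initialization file, items.py for defining data structures, pipelines.py for the data-processing pipelines, settings.py as the core settings file, and the spiders folder that holds the spider files. Understanding these pieces makes it easier to set up and manage a Scrapy project.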

Scrapy Directory Structure and Configuration Files Explained

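    First, the architecture diagram (found online). Whether or not it makes sense yet, it gives a first impression. Read it alongside the directory listing and explanations below, and the principles become clear once you start practicing.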

[Figure: Scrapy architecture diagram]

├── mySpider
│   ├── __init__.py
│   ├── items.py
│   ├── middlewares.py
│   ├── pipelines.py
│   ├── __pycache__
│   ├── settings.py
│   └── spiders
│       ├── __init__.py
│       └── __pycache__
└── scrapy.cfg

The scrapy.cfg file

# Automatically created by: scrapy startproject
#
# For more information about the [deploy] section see:
# https://scrapyd.readthedocs.io/en/latest/deploy.html

[settings]
default = mySpider.settings

[deploy]
#url = http://localhost:6800/
project = mySpider

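    This is the project's base configuration file. The [settings] section tells Scrapy which settings module to load (here mySpider.settings), and the [deploy] section holds deployment information for scrapyd. The features the crawler enables, such as concurrency and item pipelines, are configured in that settings module (settings.py), not in scrapy.cfg itself.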
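
To make the role of scrapy.cfg more concrete, the sketch below parses the same content with Python's standard configparser module and reads out the settings module path. It only illustrates the file format and is not meant to show how Scrapy loads the file internally; the variable names are made up for this example.

```python
import configparser

# Same content as the scrapy.cfg shown above, inlined so the example is self-contained.
CFG_TEXT = """
[settings]
default = mySpider.settings

[deploy]
#url = http://localhost:6800/
project = mySpider
"""

parser = configparser.ConfigParser()
parser.read_string(CFG_TEXT)

# [settings] default names the Python module that holds the project settings.
settings_module = parser.get("settings", "default")
# [deploy] project is the project name used when deploying to scrapyd.
deploy_project = parser.get("deploy", "project")
# The url line is commented out, so the option is absent.
has_url = parser.has_option("deploy", "url")

print(settings_module)
print(deploy_project)
print(has_url)
```

Options such as CONCURRENT_REQUESTS and ITEM_PIPELINES are then set inside the module that this file points to, mySpider/settings.py.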

The __init__.py file (Python package initialization file)

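    This is the Python package initialization file. You can define the __all__ list in it (a variable, not a function) to control which names are exported by "from package import *", or leave the file completely empty. Either way, keep the file in place: without it the directory may not be importable as a regular package, and the project's modules can fail to import.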
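
As a small illustration of what __all__ does, the sketch below builds a throwaway module in memory and runs a star import against it. The module name demo_pkg and both helper functions are hypothetical, standing in for definitions that would normally live in a package's __init__.py.

```python
import sys
import types

# Build a throwaway module in memory; "demo_pkg" is a hypothetical name
# standing in for a package whose __init__.py contains these definitions.
demo_pkg = types.ModuleType("demo_pkg")
exec(
    "__all__ = ['public_helper']\n"
    "def public_helper():\n"
    "    return 'exported'\n"
    "def internal_helper():\n"
    "    return 'not exported'\n",
    demo_pkg.__dict__,
)
sys.modules["demo_pkg"] = demo_pkg

# A star import copies only the names listed in __all__.
namespace = {}
exec("from demo_pkg import *", namespace)

# Ignore dunder entries such as __builtins__ that exec adds on its own.
exported = sorted(name for name in namespace if not name.startswith("__"))
print(exported)
```

Names left out of __all__ are still reachable through an explicit import; the list only affects the star form.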

The items.py file

# -*- coding: utf-8 -*-

# Define here the models for your scraped items
#
# See documentation in:
# https://doc.scrapy.org/en/latest/topics/items.html

import scrapy


class MyspiderItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    pass

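    This file is commonly called the model file: it is where the fields of your scraped data are declared. The listing above is the minimal generated template. You declare each field name with scrapy.Field(), and then store and read values through those fields in whatever way suits your data.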
