Full-site data crawling in Scrapy via manual request sending
- yield scrapy.Request(url, callback) issues a GET request
    - callback specifies the parse function used to extract the data
- yield scrapy.FormRequest(url, callback, formdata) issues a POST request
    - formdata: a dict holding the request parameters
- Why are the URLs in the start_urls list automatically sent as GET requests?
    - Because the parent-class method start_requests iterates over the list and issues a GET request for each URL
    # Parent-class method: this is roughly its original implementation
    def start_requests(self):
        for u in self.start_urls:
            yield scrapy.Request(url=u, callback=self.parse)
- How can the URLs in start_urls be sent as POST requests by default?
    # Override the parent method so the start URLs are POSTed instead
    def start_requests(self):
        for u in self.start_urls:
            # Note: FormRequest only switches to POST when formdata is supplied
            yield scrapy.FormRequest(url=u, callback=self.parse, formdata={'key': 'value'})  # placeholder parameters
Getting started
Create a Scrapy project: scrapy startproject proName
Enter the project directory and create a spider source file: scrapy genspider spiderName www.xxx.com
Run the project: scrapy crawl spiderName
Configure the pipelines.py file
# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: https://docs.scrapy.org/en/latest/topics/item-pipeline.html
# useful for handling different item types with a single interface
from itemadapter import ItemAdapter
class GpcPipeline:
    def process_item(self, item, spider):
        # Receives every item yielded by the spider; must return the item
        # so any lower-priority pipelines can process it as well.
        return item
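As the template comment above notes, the pipeline only runs once it is registered in settings.py. A minimal sketch, assuming the project is named gpcPro (the module path must match your actual project name):

```python
# settings.py -- enable the pipeline so Scrapy actually calls process_item.
# "gpcPro" is an assumed project name; the integer is the pipeline's priority
# (0-1000, lower values run earlier when several pipelines are enabled).
ITEM_PIPELINES = {
    "gpcPro.pipelines.GpcPipeline": 300,
}
```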