Using the Scrapy Framework to Fetch Data from a GET Request and the Content of Its Detail Pages

FREE_QIU

Last modified: 2022-06-25 16:28:48

First published: 2022-06-25 16:26:55

Article link: https://blog.csdn.net/weixin_45681435/article/details/125460940

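Use the Scrapy crawling framework to scrape the data returned by a GET request, together with the content of each linked detail page.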

"""
Request the page data directly with GET; items and pipelines are not used.
Pagination is not covered for now.
"""
import scrapy

class XxxSpider(scrapy.Spider):
    ...  # name, allowed_domains
    start_urls = ['URL of the list page to scrape']

    def parse(self, response):
        """Extract the list of elements from the listing page."""
        # Check the status code and the length of the returned text
        if response.status == 200 and len(response.text) > 10:
            # Parse the page with XPath to get the elements we need
            titles = response.xpath('XPath expression').getall()
            urls = response.xpath('XPath expression').getall()

            for title, url in zip(titles, urls):
                # Debug output
                print(url, title)
                # Request each detail-page URL and hand the response to html();
                # urljoin() resolves relative links against the current page
                yield scrapy.Request(url=response.urljoin(url), callback=self.html)
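
    def html(self, response):
        """Extract the data from a detail page."""
        # Check the status code and the length of the returned text
        if response.status == 200 and len(response.text) > 10:
            # Parse the detail page with XPath to get the elements we need
            content = response.xpath('XPath expression').getall()
            # Debug output
            print(content)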