Python practice: crawling Autohome articles with Scrapy

人僧苦短

Tags: web crawler, python

Article link: https://blog.csdn.net/zhangjiabin1010/article/details/78978012

autohome.py (the spider file)

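# -*- coding: utf-8 -*-
import scrapy
from Autohome.items import AutohomeItem


class AutohomeSpider(scrapy.Spider):
    name = 'autohome'
    # allowed_domains takes bare domain names, not URLs
    allowed_domains = ['autohome.com.cn']
    start_urls = ['https://www.autohome.com.cn/all/']

    def parse(self, response):
        # Returns a SelectorList with every node matching the expression
        tit_list = response.xpath("//div[@class='article-wrapper']/ul/li/a")
        for tit in tit_list:
            item = AutohomeItem()
            # get() returns the first match as a str (or None if nothing matched);
            # text() selects the text content instead of the raw HTML element
            title = tit.xpath("./h3/text()").get()
            url = tit.xpath("./@href").get()
            jianjie = tit.xpath("./p/text()").get()  # jianjie: the article summary

            # Resolve relative links against the page URL
            item['url'] = response.urljoin(url) if url else None
            item['jianjie'] = jianjie
            item['title'] = title
            # Return each extracted item to the Scrapy engine
            yield item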
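Scrapy's offsite filtering compares the host of each request against `allowed_domains`, and an entry such as `autohome.com.cn` also admits subdomains like `www.autohome.com.cn`. That is why the list should hold bare domains rather than a full URL. The following is a simplified, standard-library-only illustration of that matching rule, not Scrapy's actual implementation; the helper names `host_is_allowed` and `domains_from_urls` are hypothetical.

```python
from urllib.parse import urlparse


def host_is_allowed(url, allowed_domains):
    """Simplified offsite check (illustration only): the URL passes if its
    host equals an allowed domain or is a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in allowed_domains)


def domains_from_urls(urls):
    """Derive the bare host names that allowed_domains expects from full URLs."""
    hosts = {urlparse(u).hostname for u in urls}
    return sorted(h for h in hosts if h)


if __name__ == "__main__":
    start_urls = ["https://www.autohome.com.cn/all/"]
    print(domains_from_urls(start_urls))
    # A bare domain matches the www subdomain
    print(host_is_allowed("https://www.autohome.com.cn/news/", ["autohome.com.cn"]))
    # A full URL used as a "domain" never matches any host
    print(host_is_allowed("https://www.autohome.com.cn/news/",
                          ["https://www.autohome.com.cn/all/"]))
```

The suffix test uses `"." + d` rather than a plain `endswith(d)` so that a host such as `notautohome.com.cn` is not mistaken for a subdomain.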

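The XPath in `parse()` depends entirely on the page markup, so it can help to check it offline before running the crawl. The sketch below swaps in the standard library's `xml.etree.ElementTree`, which supports a small XPath subset, instead of Scrapy selectors. It applies the same path to hypothetical, well-formed sample markup. The markup, the protocol-relative `href` values and the `extract_articles` name are assumptions and do not come from the real site. The real page is also not guaranteed to be well-formed XML.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Hypothetical, simplified markup shaped like the list the spider targets
SAMPLE = """<html><body>
  <div class="article-wrapper">
    <ul>
      <li><a href="//www.autohome.com.cn/news/1.html"><h3>Title one</h3><p>Summary one</p></a></li>
      <li><a href="//www.autohome.com.cn/news/2.html"><h3>Title two</h3><p>Summary two</p></a></li>
    </ul>
  </div>
</body></html>"""


def extract_articles(markup, base_url="https://www.autohome.com.cn/all/"):
    """Run the spider's path with ElementTree and resolve links like response.urljoin."""
    root = ET.fromstring(markup)
    articles = []
    for a in root.findall(".//div[@class='article-wrapper']/ul/li/a"):
        articles.append({
            "url": urljoin(base_url, a.get("href")),  # make the link absolute
            "title": a.findtext("h3"),                 # text of the <h3> child
            "jianjie": a.findtext("p"),                # text of the <p> child
        })
    return articles


if __name__ == "__main__":
    for article in extract_articles(SAMPLE):
        print(article)
```

If the selectors return nothing against a saved copy of the real page, the class name or nesting has most likely changed and the XPath in the spider needs updating.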