Python Practice Mini-Project: Scraping Used-Car Data

Category: Scraper Mini-Projects
Tags: python, programming language, pycharm

Article link: https://blog.csdn.net/weixin_62853513/article/details/130344335

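Preface

Good morning, afternoon, or evening, everyone ❤ ~ Welcome to this article.

Highlights of this example:

1. Systematically analyzing the target page

2. Parsing data out of HTML tags

3. Saving a large amount of data in one go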

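Environment:

Before writing any code, we need the tools to run it:

Python 3.8.8 | Anaconda, Inc.: the interpreter
PyCharm 2021.2: the code editor

If you're not sure how to install them, you can reach me through the contact card at the end of the article 😎

requests >>> used to send HTTP requests. It is a third-party module, not part of the Python standard library, so install it with pip if your environment does not already include it.
parsel >>> an HTML/XML parsing library, also the selector library used by the well-known Scrapy framework. It is a third-party module as well and needs to be installed.

To install a third-party module: press Win + R, type cmd, then run pip install <module name> (for example, pip install requests parsel).

If the install fails with red error text, the likely cause is a network timeout; switching pip to a mirror closer to you (a domestic mirror, for users in China) usually helps.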
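
Before running the scraper, it can help to confirm that both modules are importable. The snippet below is a minimal sketch of such a check using only the standard library; the helper name `missing_modules` is made up for this illustration and is not part of the original code.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # The two third-party modules this article relies on
    missing = missing_modules(["requests", "parsel"])
    if missing:
        print("Missing modules, install with: pip install " + " ".join(missing))
    else:
        print("All required modules are available.")
```

If anything is reported as missing, run the printed pip command in the same environment that PyCharm uses as its interpreter.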

Goal

Code

Import the modules

import parsel
import requests

headers = {
    'Host': 'www.che168.com',
    'Referer': 'https://****m/china/a0_0msdgscncgpi1ltocsp100exx0/?pvareaid=102179',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36',
}

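def get_proxies():
    # Ask the proxy provider's API for one proxy (the host is redacted in the original)
    url = 'http://*****/getip?secret=pdozxje3vveh2uvj&num=1&type=json&port=1&time=3&mr=1&sign=c651882369b0fffa9a01aeef9ae275b1'
    json_data = requests.get(url).json()
    data = json_data['data'][0]
    proxy = f'http://{data["ip"]}:{data["port"]}'
    # requests looks proxies up by URL scheme, so the keys must be 'http' and 'https';
    # keys such as 'http://' never match and the proxy would be silently skipped
    proxies = {
        'http': proxy,
        'https': proxy,
    }
    return proxies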
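
requests picks a proxy by looking up the URL scheme ('http' or 'https') in the proxies mapping. The sketch below separates that mapping step from the network call so it can be tried offline. The function name `build_proxies` and the sample payload are assumptions for illustration; the payload shape only mirrors the `data`, `ip`, and `port` fields read in get_proxies(), and the real API response may contain more.

```python
def build_proxies(payload):
    """Turn a proxy-API response into a requests-style proxies mapping.

    `payload` is assumed to look like {"data": [{"ip": "...", "port": ...}]}.
    """
    entries = payload.get("data") or []
    if not entries:
        # No proxy returned; requests accepts proxies=None and then
        # falls back to its default behaviour
        return None
    first = entries[0]
    proxy_url = f'http://{first["ip"]}:{first["port"]}'
    # Route both plain HTTP and HTTPS traffic through the same proxy
    return {"http": proxy_url, "https": proxy_url}


# Made-up payload using a documentation-only IP address
sample = {"data": [{"ip": "203.0.113.7", "port": 8080}]}
print(build_proxies(sample))
print(build_proxies({"data": []}))
```

Returning None for an empty response avoids the IndexError that `json_data['data'][0]` would raise when the provider has no proxy to hand out.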

Send a request for each list page

for page in range(1, 101):
    url = f'https://****/china/a0_0msdgscncgpi1ltocsp{page}exx0/'
    response = requests.get(url=url, headers=headers, proxies=get_proxies())

Get the data

    html_data = response.text

Parse the data

    select = parsel.Selector(html_data)
    detail_url_list = select.xpath("//ul[@class='viewlist_ul']/li/a[@class='carinfo']/@href").getall()
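    for detail_url in detail_url_list[:-1]:
        # Protocol-relative links ("//host/path") only need a scheme;
        # site-relative links need the site's base URL (redacted in the original)
        if detail_url.startswith('//'):
            detail_url = 'https:' + detail_url
        else:
            detail_url = '******' + detail_url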
        detail_html = requests.get(detail_url, headers=headers, proxies=get_proxies()).text
        detail_select = parsel.Selector(detail_html)
        title = detail_select.xpath("string(//h3[@class='car-brand-name'])").get("").strip()
        licheng = detail_select.xpath("//ul[@class='brand-unit-item fn-clear']/li[1]/h4/text()").get("").strip()
        shangpai = detail_select.xpath("//ul[@class='brand-unit-item fn-clear']/li[2]/h4/text()").get("").strip()
        pailiang = detail_select.xpath("//ul[@class='brand-unit-item fn-clear']/li[3]/h4/text()").get("").strip()
        suozaidi = detail_select.xpath("//ul[@class='brand-unit-item fn-clear']/li[4]/h4/text()").get("").strip()
        guobiao = detail_select.xpath("//ul[@class='brand-unit-item fn-clear']/li[5]/h4/text()").get("").strip()
        price = detail_select.xpath("string(//span[@id='overlayPrice'])").get("").strip()
        print(title, licheng, shangpai, pailiang, suozaidi, guobiao, price, detail_url)
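
The code above only prints each record. One possible way to save the records, sketched here with the standard-library csv module, is to collect each car as a dict and write all of them with a header row. The helper `write_rows`, the `FIELDS` list, and the sample record are assumptions for illustration, not the author's code; the field names reuse the variables above (licheng = mileage, shangpai = registration date, pailiang = engine displacement, suozaidi = location, guobiao = emission standard).

```python
import csv
import io

# Column order, matching the variables printed in the scraping loop
FIELDS = ["title", "licheng", "shangpai", "pailiang",
          "suozaidi", "guobiao", "price", "detail_url"]


def write_rows(rows, stream):
    """Write car records (dicts keyed by FIELDS) to `stream` as CSV with a header."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


# Made-up record, only to show the shape of one row
sample_rows = [{
    "title": "Example Car 2020",
    "licheng": "32000 km",
    "shangpai": "2020-06",
    "pailiang": "1.5T",
    "suozaidi": "Beijing",
    "guobiao": "China VI",
    "price": "12.80",
    "detail_url": "https://example.com/car/1",
}]

# An in-memory buffer keeps the demo free of side effects
buffer = io.StringIO()
write_rows(sample_rows, buffer)
print(buffer.getvalue())

# To write a real file instead (the file name is an assumption):
# with open("cars.csv", "w", newline="", encoding="utf-8-sig") as f:
#     write_rows(rows, f)
```

In the scraping loop, each iteration would append one such dict to a list instead of calling print, and the list would be written once at the end. The utf-8-sig encoding is a common choice when the file will be opened in Excel and contains Chinese text.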

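Closing 💝

That's about it for today's share!

For the complete code, more resources, and answers to questions, click the contact card below.

Let me know in the comments what you'd like to see next! I'll write it up when I see it (ง •_•)ง

If you liked this, follow the author, or like, bookmark, and comment on the article!

Finally, a quick plug~ 👇👇👇 More source code, materials, assets, answers, and discussion are all available through the contact card below 👇👇👇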
