Fetching the Autohome attention (关注度) ranking with Python

平平无奇大师兄

Tags: python, cars

Article link: https://blog.csdn.net/qq_27676831/article/details/134631052

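Approach:
1. Inspect the HTML structure: the attention data is stored in the third div with the class tab-content-item.
2. Running a script against the page as first loaded returns empty data, because the attention data is only loaded after the "按关注度" (by attention) tab is clicked. Selenium is therefore used to simulate the click.
3. The car model names to extract are stored in h4 elements; extract them and save them.
4. The Python code is as follows: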

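from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.common.exceptions import StaleElementReferenceException, TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

url = "https://www.autohome.com.cn/suvc/80_1-0.0_0.0-0-0-0-0-0-0-0-0/"

# The "按关注度" (by attention) tab is the second <li> under #view-tab
button_xpath = '//*[@id="view-tab"]/li[2]'


def third_tab_has_names(drv):
    """Return True once the third tab-content-item div contains at least one <h4>."""
    items = drv.find_elements(By.CSS_SELECTOR, "div.tab-content-item")
    return len(items) >= 3 and len(items[2].find_elements(By.TAG_NAME, "h4")) > 0


# Start Chrome through Selenium
driver = webdriver.Chrome()
try:
    # Load the page
    driver.get(url)

    # Wait until the tab can be clicked, then click it to load the ranking
    button = WebDriverWait(driver, 10).until(
        EC.element_to_be_clickable((By.XPATH, button_xpath))
    )
    button.click()

    # Wait for the ranking data instead of sleeping for a fixed time;
    # the DOM may be re-rendered while loading, so stale elements are retried
    try:
        WebDriverWait(
            driver, 10, ignored_exceptions=(StaleElementReferenceException,)
        ).until(third_tab_has_names)
    except TimeoutException:
        print("Warning: the ranking data did not appear within 10 seconds.")

    # Grab the updated page source after switching tabs
    page_source = driver.page_source
finally:
    # Close the browser even if something above failed
    driver.quit()

# Parse the updated HTML
soup = BeautifulSoup(page_source, "html.parser")

# Find every div whose class list includes "tab-content-item"
tab_content_items = soup.find_all("div", class_="tab-content-item")

# Make sure there are at least three tab-content-item elements
if len(tab_content_items) >= 3:
    # The attention ranking is in the third tab-content-item
    third_tab_content_item = tab_content_items[2]

    # Collect the text of every <h4> (car model name) in that tab
    car_names = [h4.get_text(strip=True) for h4 in third_tab_content_item.find_all("h4")]

    print("Car Names:", car_names)
else:
    print("Error: fewer than three tab-content-item elements on the updated page.")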
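Step 3 says the names are extracted and saved, but the script above only prints them. The sketch below separates the parsing from the browser, so it can be checked against a saved HTML string, and writes the names with their rank to a CSV file. The function names extract_car_names and save_ranking_csv, the file name ranking.csv and the sample HTML are hypothetical: the sample only mimics the structure described in steps 1 and 3 (the third tab-content-item div holding h4 elements), not the real Autohome markup. It also assumes that the order of the h4 elements is the ranking order.

```python
import csv

from bs4 import BeautifulSoup


def extract_car_names(html, tab_index=2):
    """Return the stripped <h4> texts inside the tab-content-item at tab_index (0-based)."""
    soup = BeautifulSoup(html, "html.parser")
    items = soup.find_all("div", class_="tab-content-item")
    if len(items) <= tab_index:
        # Design choice for this sketch: an empty list instead of an exception
        return []
    return [h4.get_text(strip=True) for h4 in items[tab_index].find_all("h4")]


def save_ranking_csv(names, path):
    """Write a header plus (rank, name) rows, assuming list order equals ranking order."""
    # utf-8-sig adds a BOM so that Excel displays Chinese model names correctly
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.writer(f)
        writer.writerow(["rank", "name"])
        for rank, name in enumerate(names, start=1):
            writer.writerow([rank, name])


if __name__ == "__main__":
    # Hypothetical sample that only imitates the structure described above
    sample_html = """
    <div class="tab-content-item"><h4>A</h4></div>
    <div class="tab-content-item"><h4>B</h4></div>
    <div class="tab-content-item active">
      <h4> Model X </h4><h4>Model Y</h4>
    </div>
    """
    names = extract_car_names(sample_html)
    print(names)  # ['Model X', 'Model Y']
    save_ranking_csv(names, "ranking.csv")
```

In the Selenium script, extract_car_names(page_source) could replace the parsing block after driver.quit(). Note that class_="tab-content-item" also matches a div whose class list contains extra classes such as "active".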