Analysis approach referenced from: 《爬取7K小说网用户书架信息》 (CSDN blog).
For some unknown reason, a few heroes cannot be scraped; I'll set that aside for now. (Most likely the page fills in part of the hero list with JavaScript, so a plain requests.get only sees the static portion.)
Code:
# Scrape 王者荣耀 (Honor of Kings) hero images, save them in heroPhoto under
# the same directory, and name each image <hero name>.jpg
# https://pvp.qq.com/web201605/herolist.shtml
import os
import requests
from lxml import etree
from bs4 import BeautifulSoup
from pyquery import PyQuery as pq
url = 'https://pvp.qq.com/web201605/herolist.shtml'
response = requests.get(url)
content = response.content
os.makedirs('heroPhoto', exist_ok=True)  # make sure the output directory exists
# XPath parsing
# html = etree.HTML(content)
# image_urls = html.xpath('//ul[@class="herolist clearfix"]/li/a/img/@src')
# print(image_urls)
# hero_list = html.xpath('//ul[@class="herolist clearfix"]/li/a/text()')
# print(hero_list)
# for i in range(len(image_urls)):  # Can't XPath parse element nodes directly?
#     image_url = image_urls[i]
#     name = hero_list[i]
#     url = f'https:{image_url}'
#     jpg_content = requests.get(url).content
#     with open(f'heroPhoto/{name}.jpg', 'wb') as file:
#         file.write(jpg_content)
#     print(f'Saved image {i}')
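To answer the question in the comment above: yes, lxml can evaluate XPath relative to an element node, which keeps each hero's name and image paired instead of relying on two parallel lists. A minimal sketch on hand-written markup that mimics the herolist structure (the HTML here is illustrative, not the live page):

```python
from lxml import etree

# Illustrative snippet mimicking the herolist structure, not the live page
snippet = '''
<ul class="herolist clearfix">
  <li><a href="#"><img src="//game.gtimg.cn/a.jpg"/>HeroA</a></li>
  <li><a href="#"><img src="//game.gtimg.cn/b.jpg"/>HeroB</a></li>
</ul>
'''
html = etree.HTML(snippet)
for li in html.xpath('//ul[@class="herolist clearfix"]/li'):
    # an XPath starting with "./" is evaluated relative to this <li> node
    name = li.xpath('./a/text()')[0]
    src = li.xpath('./a/img/@src')[0]
    print(name, f'https:{src}')
```

Because name and src come from the same `li`, a hero with a missing image simply raises an IndexError for that one entry rather than shifting every later pair.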
# BeautifulSoup parsing
# soup = BeautifulSoup(content, 'lxml')
# image_a = soup.select('.herolist.clearfix li a')  # find the a nodes
# for a in image_a:
#     href = a.img.attrs['src']
#     hero_name = a.get_text()
#     hero_url = f'https:{href}'
#     jpg_content = requests.get(hero_url).content
#     with open(f'heroPhoto/{hero_name}.jpg', 'wb') as f:
#         f.write(jpg_content)
#     print(f'Saved image for {hero_name}')
# pyquery parsing
doc = pq(content)
image_as = doc('.herolist.clearfix li a').items()
print(image_as, type(image_as))
for a in image_as:
    href = a.find('img').attr('src')
    hero_name = a.text()
    hero_url = f'https:{href}'
    jpg_content = requests.get(hero_url).content
    with open(f'heroPhoto/{hero_name}.jpg', 'wb') as f:
        f.write(jpg_content)
    print(f'Saved image for {hero_name}')
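As for the heroes that never show up: the page appears to fill most of the list in with JavaScript from a JSON file, which requests never executes. A sketch of the workaround, assuming the commonly cited herolist.json endpoint and avatar URL pattern (neither is confirmed in this article, so treat both as assumptions to verify):

```python
# herolist.json entries look roughly like {"ename": 105, "cname": "廉颇", ...};
# ename is the numeric hero id. The URL pattern below is an assumption.
def hero_image_url(hero):
    eid = hero['ename']
    return f'https://game.gtimg.cn/images/yxzj/img201606/heroimg/{eid}/{eid}.jpg'

sample = {'ename': 105, 'cname': '廉颇'}
print(sample['cname'], hero_image_url(sample))
# A full run would first fetch the list, e.g.:
# heroes = requests.get('https://pvp.qq.com/web201605/js/herolist.json').json()
# and then download hero_image_url(hero) for every entry, as in the loop above.
```

Fetching the JSON directly would sidestep the JavaScript problem entirely, since the full list arrives as data rather than rendered HTML.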
The scraped images are shown below:
That's the end of this post. I'm a beginner, so if there are mistakes, corrections are welcome, and if you have questions, feel free to discuss. If this post helped you, a little like would be encouraging. Thanks, everyone, and keep at it!