Scraping Honor of Kings (王者荣耀) heroes with a crawler

Scraping the tags that contain the heroes

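import os
import re

import requests
import bs4
from bs4 import BeautifulSoup
# https://beautifulsoup.readthedocs.io/zh_CN/v4.4.0/#


def main():
    url = 'https://pvp.qq.com/web201605/herolist.shtml'
    html = requests.get(url=url, timeout=10)
    html.raise_for_status()
    # html.encoding only affects html.text (not the raw bytes in html.content),
    # so decode the page as GBK through .text before parsing
    html.encoding = 'gbk'
    bs = bs4.BeautifulSoup(markup=html.text, features='lxml')
    # Each hero entry is a tag whose href points to a "herodetail" page
    hero_list = bs.find_all(href=re.compile('herodetail'))
    # Write one tag per line; the next script reads out.txt line by line
    os.makedirs('../file', exist_ok=True)
    with open(file='../file/out.txt', mode='w', encoding='utf-8') as file:
        for i in hero_list:
            file.write(str(i) + '\n')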


if __name__ == '__main__':
    main()
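The filtering step can be tried offline, without network access or lxml. The sketch below swaps in the standard library's html.parser.HTMLParser in place of BeautifulSoup. It keeps every <a> whose href contains "herodetail" and records the src and alt of the <img> inside it. The HTML snippet and the class name HeroLinkParser are hypothetical, shaped like the tags the second script expects; the real page markup may differ.

```python
from html.parser import HTMLParser


class HeroLinkParser(HTMLParser):
    """Hypothetical helper: collect (href, img src, img alt) for <a> tags linking to 'herodetail'."""

    def __init__(self):
        super().__init__()
        self.heroes = []        # list of (href, src, alt) tuples
        self._current = None    # href of the hero <a> we are currently inside, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'a' and 'herodetail' in (attrs.get('href') or ''):
            self._current = attrs['href']
        elif tag == 'img' and self._current is not None:
            # The image inside a hero link carries the picture URL and the hero name
            self.heroes.append((self._current, attrs.get('src'), attrs.get('alt')))

    def handle_endtag(self, tag):
        if tag == 'a':
            self._current = None


# Hypothetical markup for illustration only; not copied from pvp.qq.com
SAMPLE = '''
<ul>
  <li><a href="herodetail/1.shtml"><img src="//example.com/1.jpg" alt="HeroA">HeroA</a></li>
  <li><a href="news.shtml">News</a></li>
  <li><a href="herodetail/2.shtml"><img src="//example.com/2.jpg" alt="HeroB">HeroB</a></li>
</ul>
'''

parser = HeroLinkParser()
parser.feed(SAMPLE)
for href, src, alt in parser.heroes:
    print(alt, href, src)
```

The news link is skipped because its href does not contain "herodetail". This is the same idea as the re.compile('herodetail') filter passed to find_all.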


Getting each hero's detail-page URL and image URL

import bs4
from bs4 import BeautifulSoup


def main():
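    hero_href = []
    hero_img = []
    hero_name = []
    with open(file='../file/out.txt', mode='r', encoding='utf-8') as f:
        for i in f:
            if not i.strip():
                continue  # skip blank lines
            bs = bs4.BeautifulSoup(markup=i, features='lxml')
            hero_href.append(bs.a['href'])    # relative link to the hero's detail page
            hero_img.append(bs.a.img['src'])  # protocol-relative image URL (starts with //)
            hero_name.append(bs.img['alt'])   # hero name, taken from the image's alt text
    href = 'https://pvp.qq.com/web201605/'    # base for the relative detail-page links
    img = 'https:'                            # scheme prefix for the image URLs
    with open(file='../file/out1.txt', mode='w', encoding='utf-8') as file:
        # One line per hero: name, detail-page URL, image URL
        for name, detail, pic in zip(hero_name, hero_href, hero_img):
            file.write(name + ' ' + href + detail + ' ' + img + pic + '\n')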


if __name__ == '__main__':
    main()
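The script above builds absolute URLs by prepending hard-coded strings. The standard library's urllib.parse.urljoin can instead resolve both kinds of link against the list page URL: relative links such as the herodetail ones, and protocol-relative image links that start with //. This is a minimal sketch. The helper names to_absolute and format_line and the sample values are hypothetical.

```python
from urllib.parse import urljoin

PAGE_URL = 'https://pvp.qq.com/web201605/herolist.shtml'


def to_absolute(link, base=PAGE_URL):
    """Resolve a relative, protocol-relative, or absolute link against the list page URL."""
    return urljoin(base, link)


def format_line(name, detail_href, img_src):
    """Build one output line: name, detail-page URL, image URL, separated by spaces."""
    return f'{name} {to_absolute(detail_href)} {to_absolute(img_src)}'


# Hypothetical values shaped like the ones read from out.txt
print(format_line('HeroA', 'herodetail/1.shtml', '//example.com/1.jpg'))
```

With urljoin, an href that is already absolute passes through unchanged, so the output stays correct if the site switches link styles.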