Scraping the Bilibili Manga Fan-Support Ranking with Scrapy + Selenium (Dynamically Loaded Page)

Target data:

[Screenshot of the target data: for each entry on the fan-support ranking, the rank, rank movement, cover image, title, author, fan-support value, and up to three top supporters.]
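The post never shows `items.py`, but every field follows from the spider code below. A minimal matching definition (reconstructed, not the author's original file) could look like this:

```python
# items.py -- reconstructed from the fields the spider fills in below
import scrapy


class BilibiliYyItem(scrapy.Item):
    paiming = scrapy.Field()       # rank, zero-padded ("01", "02", ...)
    pmqingkuang = scrapy.Field()   # rank movement: 保持 (hold) / 上升 (up) / 下降 (down)
    pic_link = scrapy.Field()      # cover image URL
    cartoon_link = scrapy.Field()  # detail-page URL
    name = scrapy.Field()          # manga title
    author = scrapy.Field()        # author line
    fensizhi = scrapy.Field()      # fan-support value (unit: 万, i.e. 10,000)
    zhugong1 = scrapy.Field()      # top supporter
    zhugong2 = scrapy.Field()      # second supporter
    zhugong3 = scrapy.Field()      # third supporter
```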

Spider code:

```python
# -*- coding: utf-8 -*-
import re

import scrapy
from selenium import webdriver

from bilibili_yy.items import BilibiliYyItem


class BiliSpider(scrapy.Spider):
    name = 'bili'
    # allowed_domains = ['manga.bilibili.com']
    start_urls = ['https://manga.bilibili.com/ranking?from=manga_homepage#/ouenn/']

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # The ranking is rendered by JavaScript, so a real browser is needed.
        # The driver itself is meant to be driven from a downloader middleware
        # (the post doesn't show that file; a sketch follows after this block).
        self.driver = webdriver.Chrome()

    def parse(self, response):
        for data_s in response.xpath('//div[@class="rank-item dp-i-block border-box p-relative"]'):
            # Create a fresh item per entry; reusing one item across yields
            # risks pipelines seeing already-overwritten values.
            item = BilibiliYyItem()

            # The rank number is drawn as one <span> per digit, and the digit
            # only appears in the class name (e.g. "... digit-3"), so pull it
            # out with a regex and zero-pad to two characters.
            digit_classes = data_s.xpath(
                './/span[starts-with(@class,"digit-item bg-center bg-contain bg-no-repeat dp-i-block digit-")]/@class'
            ).extract()
            item['paiming'] = ''.join(re.findall(r'\d', c)[0] for c in digit_classes).zfill(2)

            # Rank movement (hold / up / down) is likewise encoded in a class name.
            movement = data_s.xpath(
                './/div[starts-with(@class,"rank-movement p-absolute bg-center bg-cover bg-no-repeat")]/@class'
            ).extract()[0]
            if 'hold' in movement:
                item['pmqingkuang'] = '保持'  # held position
            elif 'up' in movement:
                item['pmqingkuang'] = '上升'  # moved up
            else:
                item['pmqingkuang'] = '下降'  # moved down

            item['pic_link'] = data_s.xpath(
                './/div[starts-with(@class,"manga-cover bg-center bg-cover bg-no-repeat")]/@data-src'
            ).extract()[0]
            item['cartoon_link'] = 'https://manga.bilibili.com' + data_s.xpath(
                './/a[starts-with(@class,"dp-block manga-title")]/@href'
            ).extract()[0]
            item['name'] = data_s.xpath('.//a[starts-with(@class,"dp-block manga-title")]/text()').extract()[0]
            item['author'] = data_s.xpath('.//p[@class="fans-author-text t-over-hidden t-no-wrap"]/text()').extract()[0]
            # Fan-support value, e.g. "12.3 万 粉丝值" -> "12.3" (unit: 万)
            item['fensizhi'] = data_s.xpath('.//p[@class="fans-value"]/text()').extract()[0].replace(' 万 粉丝值', '')

            # Top three supporters live in child divs 2-4; any may be missing.
            for i, field in enumerate(('zhugong1', 'zhugong2', 'zhugong3'), start=2):
                title = data_s.xpath(f'.//div[@class="award-user-ctnr p-absolute w-100"]/div[{i}]/@title').extract()
                item[field] = title[0] if title else ''

            yield item

    def closed(self, reason):
        # Scrapy calls closed(reason) when the spider finishes; release the browser.
        print('Closing the browser')
        self.driver.quit()
```
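Note that `parse` never touches `self.driver` directly: in the usual Scrapy + Selenium setup, a downloader middleware loads each request in the browser and hands the JavaScript-rendered HTML back to the spider. The post doesn't include that file, so the following is only a minimal sketch; the class name, module path, and fixed three-second wait are assumptions:

```python
# middlewares.py -- assumed sketch: render each request in the spider's
# Chrome instance and return the fully rendered HTML to parse().
import time

from scrapy.http import HtmlResponse


class SeleniumMiddleware:
    def process_request(self, request, spider):
        spider.driver.get(request.url)
        time.sleep(3)  # crude wait; a WebDriverWait on the rank list is more robust
        # Returning an HtmlResponse short-circuits Scrapy's own downloader.
        return HtmlResponse(
            url=spider.driver.current_url,
            body=spider.driver.page_source,
            encoding='utf-8',
            request=request,
        )
```

Enable it in `settings.py` (the module path is hypothetical) with `DOWNLOADER_MIDDLEWARES = {'bilibili_yy.middlewares.SeleniumMiddleware': 543}`, then run the spider with `scrapy crawl bili`.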

Writing the items out to MongoDB:

[Screenshot of the scraped items stored in MongoDB.]
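The pipeline code isn't shown in the post either; a minimal sketch with pymongo (the database and collection names and the local connection are assumptions):

```python
# pipelines.py -- assumed sketch: store each scraped item as a MongoDB document.
import pymongo


class BilibiliYyPipeline:
    def open_spider(self, spider):
        self.client = pymongo.MongoClient('localhost', 27017)
        self.collection = self.client['bilibili']['ouenn_rank']  # assumed names

    def process_item(self, item, spider):
        self.collection.insert_one(dict(item))
        return item

    def close_spider(self, spider):
        self.client.close()
```

Register it in `settings.py` with `ITEM_PIPELINES = {'bilibili_yy.pipelines.BilibiliYyPipeline': 300}` (again, the module path is assumed).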
