[Web Scraping] Scraping Bilibili's Blackroom (小黑屋)

YancyKahn

Category: Web scraping. Tags: python, selenium, web scraping, Bilibili

Article link: https://blog.csdn.net/qq_37753409/article/details/108968345

Scraping Bilibili Blackroom data

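Because Bilibili has updated its anti-scraping measures, one workable way to scrape it now is to drive a real browser programmatically. The following Python modules need to be installed: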

pip3 install selenium 
pip3 install bs4

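Selenium drives the browser and repeatedly scrolls the Blackroom page down; the number of scrolls is configurable. Note that the script must sleep for a while after each scroll, otherwise the page will not finish loading. Once enough of the page has been loaded, the data is cleaned.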
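The scroll-then-wait loop described above could be sketched roughly as follows. This is a minimal sketch, not the author's code: `scroll_page`, `times`, and `pause` are illustrative names, and it assumes the page loads more entries whenever the window is scrolled to the bottom. It relies only on `execute_script` and `page_source`, both of which Selenium WebDriver objects provide.

```python
import time


def scroll_page(browser, times=5, pause=1.0):
    """Scroll to the bottom `times` times, pausing after each scroll.

    `browser` is expected to be a Selenium WebDriver. This helper is
    hypothetical and not part of the original post. Returns the page
    source as it stands after the last scroll.
    """
    for _ in range(times):
        # Jump to the bottom so the page requests the next batch of entries.
        browser.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        # Give the page time to load new content before scrolling again.
        time.sleep(pause)
    return browser.page_source
```

With a WebDriver instance that has already opened the page, this might be called as `html = scroll_page(browser, times=10, pause=2)`. A suitable pause depends on network speed, so the value is worth tuning.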

from selenium import webdriver
from bs4 import BeautifulSoup  
import time
import json
import re


class BSpider():

    def __init__(self):
        # Run the browser in headless mode (no visible window)
        options = webdriver.FirefoxOptions()
        options.add_argument('--headless')
        self.browser = webdriver.Firefox(options = options)
        self.blackroom_page = 'https://www.bilibili.com/blackroom/ban'
        self.count = 0

    # Fetch the page
    def get_page(self):
        
        self.browser
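The listing stops before the data-cleaning step mentioned earlier. As a rough illustration of what that step might involve, the sketch below parses page HTML with BeautifulSoup and tidies the text with `re`. The markup is an assumption: the class names `black-card`, `uname`, and `reason`, the helper `clean_cards`, and the sample HTML are all hypothetical placeholders, not Bilibili's real page structure, so the selectors would need to be adjusted against the live page.

```python
import json
import re

from bs4 import BeautifulSoup


def clean_cards(html):
    """Extract one record per entry from the scrolled page's HTML.

    The class names used here (`black-card`, `uname`, `reason`) are
    hypothetical placeholders, not Bilibili's real markup.
    """
    soup = BeautifulSoup(html, "html.parser")
    records = []
    for card in soup.find_all("div", class_="black-card"):
        name = card.find(class_="uname")
        reason = card.find(class_="reason")
        records.append({
            "user": name.get_text(strip=True) if name else None,
            # Collapse runs of whitespace left over from the page layout.
            "reason": re.sub(r"\s+", " ", reason.get_text()).strip() if reason else None,
        })
    return records


# Made-up sample input, used only to exercise the function.
sample = """
<div class="black-card"><span class="uname"> user_a </span>
<p class="reason">posting   spam
links</p></div>
"""
print(json.dumps(clean_cards(sample), ensure_ascii=False))
```

Returning plain dictionaries keeps the result easy to serialize with `json.dumps`; `ensure_ascii=False` leaves any Chinese text readable in the output.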
