Web Scraping Practice Project

Gray.

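Tags: web scraping

Copyright notice: This is an original article by the author, released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.

Article link: https://blog.csdn.net/2302_76568248/article/details/135134738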

Graduate School

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver import ActionChains
import time

# Keep the browser window open after the script finishes
options = webdriver.ChromeOptions()
options.add_experimental_option('detach', True)
driver = webdriver.Chrome(options=options)

# Open the graduate school site and give the page time to load
driver.get('https://yjsy.hunnu.edu.cn')
time.sleep(5)

# The fourth top-level menu entry and the second item in its drop-down
xpath_1 = "//ul[@class='menu']/li[4]/a"
xpath_2 = "//ul[@class='menu']/li[4]/ul/li[2]/a"
button_1 = driver.find_element(By.XPATH, xpath_1)
button_2 = driver.find_element(By.XPATH, xpath_2)

# Hover over the menu entry to reveal the drop-down, then click the sub-item
ActionChains(driver).move_to_element(button_1).perform()
time.sleep(5)
ActionChains(driver).move_to_element(button_2).click().perform()
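
The script above pauses with fixed `time.sleep(5)` calls. One possible alternative is to poll until a condition holds and stop as soon as it does. The sketch below shows that idea in plain Python; `wait_until` and its parameters are hypothetical names introduced here for illustration, not part of Selenium or of the original script.

```python
import time


def wait_until(condition, timeout=10.0, interval=0.5):
    """Call `condition` repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds
    have passed without a truthy result. (Hypothetical helper.)
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)
```

With the script above, a call such as `wait_until(lambda: driver.find_elements(By.XPATH, xpath_1))` would return as soon as the menu link exists, because `find_elements` returns an empty (falsy) list while nothing matches. Selenium also ships its own version of this pattern, `WebDriverWait` combined with `expected_conditions`, which is probably the better choice in real code.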

Bilibili 1

import time

from bs4 import BeautifulSoup
from selenium import webdriver

url = 'https://www.bilibili.com/video/BV1iN4y1a7KJ'

# Keep the browser window open after the script finishes
options = webdriver.ChromeOptions()
options.add_experimental_option('detach', True)
driver = webdriver.Chrome(options=options)

# Load the video page, wait for it to render, then grab the HTML
driver.get(url)
time.sleep(5)
html = driver.page_source

soup = BeautifulSoup(html, 'lxml')

# Title, view count, danmaku (bullet comment) count, and publish date
title = soup.find('h1', class_="video-title")
count = soup.find('span', class_="view item")
dm = soup.find('span', class_="dm item")
# Renamed from `datetime` to avoid shadowing the standard-library module name;
# collected but not printed below
pubdate = soup.find('span', class_="pubdate-text")

# Collect the commenter name and the text of each comment
comments = soup.find_all('div', class_="content-warp")
comments_text = []
for comment in comments:
    name = comment.find('div', class_="user-info").text
    text = comment.find('span', class_="reply-content").text
    comments_text.append({
        'name': name,
        'text': text
    })

# Print the results
print(f"Title: {title.text}, views: {count.text.strip()}, danmaku: {dm.text.strip()}")
for comment in comments_text:
    print(f"Comment:\nID: {comment['name']}, text: {comment['text']}")

driver.close()
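
One thing to watch in the script above: `soup.find(...)` returns `None` when nothing matches, so a call like `title.text` raises `AttributeError` if the page layout changes or the content has not loaded yet. A minimal sketch of a guard is shown below. `text_of` is a hypothetical helper, and the `SimpleNamespace` objects only stand in for BeautifulSoup results so the snippet runs without a browser.

```python
from types import SimpleNamespace


def text_of(node, default=""):
    """Return the stripped text of a parsed element, or `default` if it is None."""
    if node is None:
        return default
    return node.text.strip()


# Stand-ins for what soup.find(...) might return (assumed values, for illustration)
found = SimpleNamespace(text="  Some video title  ")
missing = None

print(text_of(found))
print(text_of(missing, "N/A"))
```

In the script, `text_of(title)` could then replace `title.text`, and the same call works for `count` and `dm`.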

Bilibili 2

import time

from bs4 import BeautifulSoup
from selenium import webdriver

# Keep the browser window open after the script finishes
options = webdriver.EdgeOptions()
options.add_experimental_option('detach', True)
driver = webdriver.ChromiumEdge(options=options)

# Load the popular-videos page and give the cards time to render
# (same fixed wait as in the earlier examples)
url = 'https://www.bilibili.com/v/popular/all/'
driver.get(url)
time.sleep(5)

# Parse the page source with BeautifulSoup and pick out each video card
soup = BeautifulSoup(driver.page_source, 'lxml')
result = soup.find_all('div', class_='video-card')
for item in result:
    title = item.find('p', class_='video-name')
    up = item.find('span', class_='up-name__text')
    count = item.find('span', class_='play-text')
    print(f'Video: {title.text}, UP: {up.text}, views: {count.text.strip()}')
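
The view counts scraped above are plain strings, and Bilibili typically abbreviates large numbers with the suffixes 万 (ten thousand) and 亿 (hundred million). If the values need to be sorted or compared, they have to be converted to integers first. The following is a rough sketch of such a conversion; `parse_count` is a hypothetical helper, and it assumes the text contains nothing other than the number and an optional suffix.

```python
def parse_count(text):
    """Convert a count string such as '12.3万' or '9876' to an integer.

    Assumes only the suffixes 万 (10,000) and 亿 (100,000,000) occur.
    """
    multipliers = {"万": 10_000, "亿": 100_000_000}
    cleaned = text.strip().replace(",", "")
    if cleaned and cleaned[-1] in multipliers:
        # Scale the numeric part by the suffix and round away float noise
        return round(float(cleaned[:-1]) * multipliers[cleaned[-1]])
    return int(cleaned)


print(parse_count("12.3万"))
print(parse_count(" 9876 "))
```

For example, `parse_count(count.text)` inside the loop above would give a number that can be used as a sort key.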
