A Web Scraper for Personal Exam Use


Author: 大三考试人


Tags: python, programming languages

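Copyright notice: This is an original article by the blogger, licensed under CC 4.0 BY-SA. If you repost it, please include a link to the original source and this notice.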

Article link: https://blog.csdn.net/m0_66593765/article/details/135116195


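```python
import time

from bs4 import BeautifulSoup
from selenium import webdriver

url = 'https://www.bilibili.com/video/BV1Wi4y1a7h5'

options = webdriver.ChromeOptions()
# 'detach' keeps Chrome open if the script exits without calling driver.quit()
# (e.g. after an error), which helps when inspecting the page
options.add_experimental_option('detach', True)
driver = webdriver.Chrome(options=options)

driver.get(url)
# Fixed wait so the page's JavaScript can render the video info and comments
time.sleep(5)

# Parse the rendered HTML, not the raw server response
html = driver.page_source
soup = BeautifulSoup(html, 'lxml')

# Video metadata (class names are the ones used on the page when this post was written)
title = soup.find('h1', class_="video-title")
count = soup.find('span', class_="view item")
dm = soup.find('span', class_="dm item")
pubdate = soup.find('span', class_="pubdate-text")  # not named "datetime" to avoid shadowing the module

# Each comment block holds the commenter's name and the comment text
comments = soup.find_all('div', class_="content-warp")
comments_text = []

for comment in comments:
    name = comment.find('div', class_="user-info").text
    text = comment.find('span', class_="reply-content").text
    comments_text.append({
        'name': name.strip(),
        'text': text.strip()
    })

# Print the results
print(f"Title: {title.text.strip()}, views: {count.text.strip()}, "
      f"danmaku: {dm.text.strip()}, published: {pubdate.text.strip()}")
print("Comments:")
for comment in comments_text:
    print(f"ID: {comment['name']}, comment: {comment['text']}")

# quit() ends the whole WebDriver session; close() only closes the current window
driver.quit()
```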
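Because the parsing step only needs an HTML string, it can be separated from Selenium and checked against a saved page or a hand-written snippet without opening a browser. Below is a minimal sketch of that idea, assuming the same class names as the script above (`video-title`, `view item`, `dm item`, `pubdate-text`, `content-warp`, `user-info`, `reply-content`). The names `parse_video_page` and `text_or_empty` are hypothetical helpers, not part of the original post, and the sketch uses BeautifulSoup's built-in `html.parser` backend instead of `lxml` so it needs one fewer dependency. Elements that are missing yield empty strings instead of raising `AttributeError`.

```python
from bs4 import BeautifulSoup


def text_or_empty(tag):
    """Return the stripped text of a tag, or '' if find() returned None."""
    return tag.get_text(strip=True) if tag is not None else ''


def parse_video_page(html):
    """Hypothetical helper: extract video metadata and comments from rendered HTML.

    The class names are assumed to match the ones used in the Selenium script.
    """
    soup = BeautifulSoup(html, 'html.parser')
    info = {
        'title': text_or_empty(soup.find('h1', class_='video-title')),
        # class_='view item' matches the exact class attribute string "view item"
        'views': text_or_empty(soup.find('span', class_='view item')),
        'danmaku': text_or_empty(soup.find('span', class_='dm item')),
        'pubdate': text_or_empty(soup.find('span', class_='pubdate-text')),
        'comments': [],
    }
    for block in soup.find_all('div', class_='content-warp'):
        info['comments'].append({
            'name': text_or_empty(block.find('div', class_='user-info')),
            'text': text_or_empty(block.find('span', class_='reply-content')),
        })
    return info


if __name__ == '__main__':
    # Hand-written sample markup, not real Bilibili data
    sample = """
    <h1 class="video-title"> Demo video </h1>
    <span class="view item"> 1234 </span>
    <span class="dm item"> 56 </span>
    <div class="content-warp">
      <div class="user-info">alice</div>
      <span class="reply-content">Nice video!</span>
    </div>
    """
    print(parse_video_page(sample))
```

In the Selenium script, the same helper would be called as `parse_video_page(driver.page_source)` after the wait, so the browser code and the parsing code can be debugged separately.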

