A Beginner-Friendly Guide: Scraping Bilibili's Top 100 Ranking with Python and Saving It as a CSV File

沉稳缓

Article link: https://blog.csdn.net/asgria/article/details/104333488

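This post shows how to use Python's requests, beautifulsoup4 and csv libraries to scrape the top 100 entries of Bilibili's ranking and save the data as a CSV file. It explains the role of each library: requests handles the network connection, bs4 parses the web page, and csv writes the data in CSV format. The page is fetched with requests' get (or post) method, BeautifulSoup parses it to locate the needed information, and the data is finally stored in a CSV file.

(Summary generated automatically by CSDN.)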

import requests
from bs4 import BeautifulSoup
import csv

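Hey, this project only needs three libraries:
1. requests, to connect to the URL and download the page (getting the data)
2. beautifulsoup4 (imported as bs4), to parse the downloaded HTML (parsing the data)
3. csv, to save the results as a CSV file, which opens like a spreadsheet in Excel (saving the data)
csv ships with Python's standard library, so it never needs installing. If requests or beautifulsoup4 is missing, install it from a command prompt with 'pip install requests beautifulsoup4'.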
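Because csv comes with Python, saving the scraped fields needs no extra install. Below is a minimal sketch, assuming each video has already been turned into a list of strings; the function names, column names and sample row are hypothetical, not taken from the original code.

```python
import csv
import io


def write_rows(f, header, rows):
    """Write one header row, then all data rows, to an open text file."""
    writer = csv.writer(f)
    writer.writerow(header)
    writer.writerows(rows)


def save_rows_csv(path, header, rows):
    """Save rows to a CSV file that Excel can open directly."""
    # newline='' is what the csv docs recommend for files used with csv.writer;
    # 'utf-8-sig' prepends a BOM so Excel detects UTF-8 (e.g. Chinese titles).
    with open(path, 'w', newline='', encoding='utf-8-sig') as f:
        write_rows(f, header, rows)


def rows_to_csv_text(header, rows):
    """Return the same CSV content (without the BOM) as a string, for checking."""
    buf = io.StringIO()
    write_rows(buf, header, rows)
    return buf.getvalue()


header = ['rank', 'title', 'up', 'score']
rows = [['1', 'Video A', 'Uploader A', '98765']]
print(rows_to_csv_text(header, rows))
```

Calling save_rows_csv('bilibili_top100.csv', header, rows) writes the actual file. csv.writer quotes any field containing a comma automatically, which matters for video titles.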

1. First, use the requests library to connect to the site and download the page:

url='https://www.bilibili.com/ranking'
response = requests.get(url)

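Pass the URL of the site you want to scrape to requests' get (or post) function and it downloads that page for you. An ordinary page like this ranking page is fetched with get; post is meant for sending data, such as a filled-in form, to the server.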
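A slightly more defensive fetch is sketched below. The browser-like User-Agent header, the 10-second timeout and the fetch_html helper are assumptions added for illustration, not part of the original code; some sites answer the default python-requests User-Agent differently. The last lines build requests without sending them, so the GET/POST difference can be seen offline; the example.com search URL and its form field are hypothetical.

```python
import requests

# Browser-like User-Agent: an assumption, not required by the original code.
HEADERS = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'}


def fetch_html(url, timeout=10):
    """Download a page with GET and return its HTML as text."""
    response = requests.get(url, headers=HEADERS, timeout=timeout)
    response.raise_for_status()  # raise requests.HTTPError on a 4xx/5xx status
    return response.text


# Prepared (not sent) requests show what GET and POST would transmit.
get_req = requests.Request('GET', 'https://www.bilibili.com/ranking',
                           headers=HEADERS).prepare()
post_req = requests.Request('POST', 'https://example.com/search',
                            data={'q': 'python'}).prepare()
print(get_req.method, get_req.body)    # GET carries no body; the URL says it all
print(post_req.method, post_req.body)  # POST carries the form data in its body
```

With this helper, html_text = fetch_html(url) can stand in for requests.get(url) followed by response.text.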
2. Next, parse the page with BeautifulSoup from bs4 ('html.parser' is the parser built into Python):

html_text=response.text
soup = BeautifulSoup(html_text,'html.parser')
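To see what BeautifulSoup does without touching the live site, here is a minimal sketch that parses a small hand-written HTML snippet. SAMPLE_HTML and parse_rank_items are hypothetical: the markup only imitates the class names the scraping code in this post relies on (rank-item, title, pts, num), and the real Bilibili page may look different. get_text(strip=True) is used instead of .text so surrounding whitespace is trimmed.

```python
from bs4 import BeautifulSoup

# Hypothetical markup imitating the class names used by this post's code.
SAMPLE_HTML = """
<ul>
  <li class="rank-item">
    <div class="num">1</div>
    <a href="//www.bilibili.com/video/example1"><img alt="cover"></a>
    <a class="title" href="//www.bilibili.com/video/example1">Video A</a>
    <a href="//space.bilibili.com/1">Uploader A</a>
    <div class="pts"><div>98765</div>points</div>
  </li>
  <li class="rank-item">
    <div class="num">2</div>
    <a href="//www.bilibili.com/video/example2"><img alt="cover"></a>
    <a class="title" href="//www.bilibili.com/video/example2">Video B</a>
    <a href="//space.bilibili.com/2">Uploader B</a>
    <div class="pts"><div>54321</div>points</div>
  </li>
</ul>
"""


def parse_rank_items(html_text):
    """Return one dict per <li class="rank-item"> found in html_text."""
    soup = BeautifulSoup(html_text, 'html.parser')
    results = []
    for itm in soup.find_all('li', {'class': 'rank-item'}):
        title_link = itm.find('a', {'class': 'title'})
        results.append({
            'rank': itm.find('div', {'class': 'num'}).get_text(strip=True),
            'title': title_link.get_text(strip=True),
            'url': title_link.get('href'),
            # third <a> in the entry: cover link, title link, uploader link
            'up': itm.find_all('a')[2].get_text(strip=True),
            # first <div> nested inside <div class="pts">
            'score': itm.find('div', {'class': 'pts'}).find('div').get_text(strip=True),
        })
    return results


for row in parse_rank_items(SAMPLE_HTML):
    print(row['rank'], row['title'], row['up'], row['score'])
```

find returns the first matching tag (or None), while find_all returns a list of every match, which is why the entries are looped over and the uploader link is picked by index.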

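Then use soup's find_all (findAll is an older alias of the same method) to locate the data we want. Below I extract six attributes for each entry: title, up (the uploader), score, rank, url and space.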

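items = soup.find_all('li', {'class': 'rank-item'})  # one <li> per ranked video
print(len(items))  # number of entries found; 100 for the full top-100 list
for itm in items:
    # Title text and video link both come from the <a class="title"> tag
    title = itm.find('a', {'class': 'title'}).text
    # The third <a> in each entry is the uploader's link
    up = itm.find_all('a')[2].text
    # Score: text of the first <div> inside <div class="pts">
    score = itm.find('div', {'class': 'pts'}).find('div').text
    rank = itm.find('div', {'class': 'num'}).text
    url = itm.find('a', {'class': 'title'}).get('href')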
    space = itm.find_all('a')[2