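1. Scraping the bilibili ranking with Python
- Install requests and beautifulsoup4
- Create a Python file and import the libraries
- Use requests to fetch the HTML document
- Parse the HTML document with bs4
- Write the parsed results to a file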
1.1 Install requests and beautifulsoup4
1.1.1 Install requests with PyCharm
1.1.2 Install beautifulsoup4
Install beautifulsoup4 through PyCharm in the same way.
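After installing, it can help to confirm that the interpreter PyCharm is using actually sees both packages. The sketch below is one possible check and is not part of the original tutorial; the helper name missing_packages is hypothetical. Note that beautifulsoup4 is imported under the name bs4.

```python
import importlib.util


def missing_packages(import_names):
    """Return the import names that this interpreter cannot find."""
    return [name for name in import_names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # requests is imported as "requests"; beautifulsoup4 is imported as "bs4".
    missing = missing_packages(["requests", "bs4"])
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("requests and beautifulsoup4 are available")
```

If a name is reported as missing, the package was probably installed into a different interpreter than the one selected for the project.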
1.2 Create a Python file and import the libraries
Import requests and BeautifulSoup, fetch the document with requests.get(), and parse it with BeautifulSoup.
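import requests
from bs4 import BeautifulSoup

# Ranking page to scrape
url = "https://www.bilibili.com/v/popular/rank/all"
# Download the page, then parse the raw bytes with Python's built-in HTML parser
page = requests.get(url)
soup = BeautifulSoup(page.content, "html.parser")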
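A bare requests.get(url) call has no timeout and does not raise on HTTP error codes, and some sites respond differently to clients that send no browser-like User-Agent. The sketch below wraps the fetch-and-parse step with those three safeguards. The helper name fetch_soup, the User-Agent string, and the 10-second timeout are assumptions chosen for illustration, not values from the original post; the optional session argument is there only so the function can be exercised without network access.

```python
import requests
from bs4 import BeautifulSoup

# Assumed header value; any browser-like User-Agent string could be used instead.
DEFAULT_HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; tutorial-scraper)"}


def fetch_soup(url, session=None, timeout=10):
    """Download url and return it parsed as a BeautifulSoup document."""
    # Fall back to the requests module itself when no session-like object is given.
    client = session if session is not None else requests
    response = client.get(url, headers=DEFAULT_HEADERS, timeout=timeout)
    # Raise requests.HTTPError on 4xx/5xx instead of silently parsing an error page.
    response.raise_for_status()
    return BeautifulSoup(response.content, "html.parser")
```

Calling fetch_soup with the ranking URL would then take the place of the separate requests.get and BeautifulSoup lines.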
1.3 Scrape the data and write it to a text file
import requests
from bs4 import BeautifulSoup

url = "https://www.bilibili.com/v/popular/rank/all"
page = requests.get(url)
soup = BeautifulSoup(page.content, "html.parser")
title = soup.title.text  # page title (not used below)

all_products = []
# Each ranked video is an <li class="rank-item"> element.
# These selectors depend on the page's markup and may need updating if it changes.
products = soup.select("li.rank-item")
for product in products:
    rank = product.select("div.num")[0].text
    name = product.select("div.info > a")[0].text.strip()
    # The span.data-box elements are read in order: plays, danmaku count, uploader.
    data_boxes = product.select("span.data-box")
    play = data_boxes[0].text.strip()
    comment = data_boxes[1].text.strip()
    up = data_boxes[2].text.strip()
    # Use a separate name so the page URL above is not overwritten.
    link = product.select("div.info > a")[0].attrs["href"].strip()
    all_products.append(
        {
            "Rank": rank,
            "Title": name,
            "Plays": play,
            "Danmaku": comment,
            "Uploader": up,
            "Link": link,
        }
    )

# Write one "key,value" line per field, followed by a separator after each video.
with open("bili.txt", "w", encoding="utf-8-sig") as f:
    for item in all_products:
        for k, v in item.items():
            f.write("{},{}\n".format(k, v))
        f.write("--------------------------\n")
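The key,value text format is easy to read but awkward to load into a spreadsheet. As an alternative, the sketch below writes a list of dictionaries as CSV using the standard csv module, and first resolves each link against the page URL with urllib.parse.urljoin in case the href values are relative or protocol-relative (starting with //). The function names, the dictionary keys, and the sample record are hypothetical and only for illustration; pass whichever key holds the link in your own records.

```python
import csv
import io
from urllib.parse import urljoin

PAGE_URL = "https://www.bilibili.com/v/popular/rank/all"


def absolutize(records, link_key, base_url=PAGE_URL):
    """Return copies of the records with the value under link_key made absolute."""
    return [{**r, link_key: urljoin(base_url, r[link_key])} for r in records]


def write_csv(records, stream):
    """Write a list of same-keyed dicts to stream, preceded by a header row."""
    if not records:
        return
    writer = csv.DictWriter(stream, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)


# Hypothetical sample record standing in for one scraped entry.
sample = [{"rank": "1", "name": "demo video", "link": "//www.bilibili.com/video/BV0000000000"}]
buffer = io.StringIO()
write_csv(absolutize(sample, "link"), buffer)
print(buffer.getvalue())
```

To write to disk instead of an in-memory buffer, open the target file with newline="" as the csv module documentation recommends, for example open("bili.csv", "w", newline="", encoding="utf-8-sig"), where bili.csv is an assumed file name.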