一、BeautifulSoup简介
BeautifulSoup是Python最受欢迎的HTML/XML解析库之一,它能将复杂的网页文档转换为树形结构,支持多种解析器(如lxml、html.parser)。配合requests库,可以快速构建网页爬虫项目。
二、环境准备
pip install requests beautifulsoup4 matplotlib
三、实战:百度热搜数据获取
1. 获取网页内容
import requests
from bs4 import BeautifulSoup
url = 'https://top.baidu.com/board?tab=realtime'
headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
}
response = requests.get(url, headers=headers)
html = response.content
<