Without further ado, let's dive right in!
Key points
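In short, Beautiful Soup is a Python library whose main purpose is extracting data from web pages. The official documentation lists the parsers it can work with:

| Parser | Usage |
|---|---|
| Python standard library | `BeautifulSoup(markup, "html.parser")` |
| lxml HTML parser | `BeautifulSoup(markup, "lxml")` |
| html5lib | `BeautifulSoup(markup, "html5lib")` |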
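The parsers differ in speed, leniency, and how they repair broken markup, and `lxml` and `html5lib` are separate packages that must be installed on their own. A minimal sketch, assuming only `beautifulsoup4` is installed: the hypothetical helper `make_soup` tries a preferred parser and falls back to the built-in `html.parser` when bs4 raises `FeatureNotFound`.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(markup, preferred="lxml"):
    """Hypothetical helper: parse with the preferred parser, falling back to
    Python's built-in html.parser if that parser is not installed."""
    try:
        return BeautifulSoup(markup, preferred)
    except FeatureNotFound:
        # bs4 raises this when no installed tree builder matches `preferred`
        return BeautifulSoup(markup, "html.parser")


# Parsers repair invalid markup differently; html.parser ignores the
# stray </p> and simply closes the <a> tag.
print(BeautifulSoup("<a></p>", "html.parser"))  # <a></a>

soup = make_soup("<html><head><title>Demo</title></head><body></body></html>")
print(soup.title.text)  # Demo
```

Falling back instead of failing keeps a script usable on machines without `lxml`, at the cost of slightly different tree repair on malformed pages.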
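Required libraries (the example below also uses requests and the lxml parser):
```shell
pip install beautifulsoup4 requests lxml
```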
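Note that the pip package name and the import name differ: `beautifulsoup4` is imported as `bs4`. As a quick sanity check, a small sketch like the following, using only the standard library's `importlib.util.find_spec`, reports which packages are importable. The package list is an assumption covering the libraries mentioned in this post.

```python
import importlib.util

# pip package name -> module name used in `import` (assumed list for this post)
MODULES = {
    "beautifulsoup4": "bs4",
    "requests": "requests",
    "lxml": "lxml",
    "html5lib": "html5lib",
}


def check_installed(modules=MODULES):
    """Map each pip package name to True/False depending on whether its
    module can be found on the current Python path."""
    return {pkg: importlib.util.find_spec(mod) is not None
            for pkg, mod in modules.items()}


for pkg, ok in check_installed().items():
    status = "installed" if ok else f"missing (pip install {pkg})"
    print(f"{pkg:15} {status}")
```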
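```python
import requests
from bs4 import BeautifulSoup
# Baidu search URL copied from the browser; the wd parameter holds the query
url = 'https://www.baidu.com/s?ie=utf-8&f=8&rsv_bp=1&rsv_idx=1&tn=baidu&wd=百度好不好&fenlei=256&rsv_pq=ec71a01d0021673f&rsv_t=b948k99bLAGTlkyuTIfZPaCd0zbmzagzHXilM0NkUIPLNUN3uW5LKyIADMI&rqlang=cn&rsv_enter=1&rsv_dl=tb&rsv_n=2&rsv_sug3=1&rsv_sug1=1&rsv_sug7=000&rsv_sug2=0&rsv_btype=t&inputT=3343&rsv_sug4=3343&rsv_sug=1'
# Browser User-Agent; requests sets Host from the URL itself (www.baidu.com), so no Host entry
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.99 Safari/537.36',
}
# Download the page (timeout avoids hanging; raise_for_status stops on HTTP errors)
resp = requests.get(url, headers=headers, timeout=10)
resp.raise_for_status()
# Parse with lxml (pip install lxml); 'html.parser' needs no extra package
soup = BeautifulSoup(resp.text, 'lxml')
# Print the text of the page's <title> tag
print(soup.title.text)
```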
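Once a page is in a `BeautifulSoup` object you usually want more than the title. A minimal offline sketch, assuming a made-up HTML snippet in place of a real download (Baidu's actual result markup is not shown here and may differ), using `select` for CSS selectors and `find_all` with `class_`:

```python
from bs4 import BeautifulSoup

# Made-up sample markup standing in for a downloaded page (not Baidu's real HTML)
html = """
<html>
  <head><title>Sample results</title></head>
  <body>
    <div class="result"><a href="https://example.com/1">First hit</a></div>
    <div class="result"><a href="https://example.com/2">Second hit</a></div>
    <a href="/about">About</a>
  </body>
</html>
"""

soup = BeautifulSoup(html, "html.parser")
print(soup.title.text)  # Sample results

# CSS selector: only the links inside div.result, not the /about link
for a in soup.select("div.result a"):
    print(a.get_text(strip=True), a.get("href"))

# find_all uses class_ because `class` is a Python keyword
results = soup.find_all("div", class_="result")
print(len(results))  # 2
```

Working on a saved string like this lets you develop selectors without sending repeated requests to the live site.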
Output: the script prints the text of the page's `<title>` tag.
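`soup.title` is `None` when the returned HTML has no `<title>` tag (for example, if the server sends back something other than the expected page), and then `soup.title.text` in the script above raises `AttributeError`. A small guard, written as a hypothetical helper `page_title`:

```python
from bs4 import BeautifulSoup


def page_title(html):
    """Hypothetical helper: return the stripped <title> text, or None when
    the page has no <title> tag (soup.title is None in that case)."""
    soup = BeautifulSoup(html, "html.parser")
    if soup.title is None:
        return None
    return soup.title.get_text(strip=True)


print(page_title("<title> Hello </title>"))  # Hello
print(page_title("<p>no title here</p>"))    # None
```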