Source code:
from bs4 import BeautifulSoup
import requests

header = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.51 Safari/537.36'}

def requestsdata(urln):  # takes a URL and returns a "soup" object that bs4 can operate on
    r = requests.get(urln, headers=header)
    r.encoding = 'utf-8'
    soup = BeautifulSoup(r.text, 'html.parser')
    return soup

print(requestsdata('https://www.baidu.com/'))
PS: Baidu can actually be scraped whether or not you add the request header. It really depends on the site's anti-scraping measures; Baidu's CAPTCHA is the truly nasty part.
My understanding of GET vs. POST:
Personally, GET feels more common and simpler to use.
POST requires assembling a whole chunk of request data, which is more involved.
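A small sketch of that difference, using requests' Request/prepare API so nothing is actually sent over the network (the httpbin.org URLs here are just illustrative placeholders): a GET request carries its parameters in the URL's query string, while a POST request carries them in the request body.

import requests

# GET: parameters end up in the URL's query string
get_req = requests.Request('GET', 'https://httpbin.org/get',
                           params={'q': 'python'}).prepare()

# POST: parameters are form-encoded into the request body instead
post_req = requests.Request('POST', 'https://httpbin.org/post',
                            data={'user': 'demo', 'pwd': '123456'}).prepare()

print(get_req.url)    # the query string is visible in the URL
print(post_req.body)  # the form data sits in the body, not the URL

So for a simple page fetch GET is usually all you need, while POST comes up when the site expects a form submission (login pages, search forms, and so on).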