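1. The requests module: a third-party Python library for sending HTTP requests (it is not part of the standard library). Install it with pip install requests; if the installation fails, see https://blog.csdn.net/qq_42231156/article/details/113786757.
1.1 Purpose: simulate a browser sending requests. Project page: https://pypi.org/project/requests/
1.2 Usage: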
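import json  # only needed for the POST/JSON variant at the bottom

import requests

if __name__ == "__main__":
    url = "https://www.xxx.com/web"  # placeholder: replace with the real search URL
    kw = input("Enter search term: ")
    # Query-string parameters; requests appends them to the URL as ?query=...
    params = {
        "query": kw
    }
    # UA spoofing: send a browser User-Agent so the request looks like it came from Chrome
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36"
    }
    filepath = kw + ".html"
    res = requests.get(url=url, params=params, headers=headers)
    page_html = res.text
    with open(filepath, "w", encoding="utf-8") as fp:
        fp.write(page_html)
    print("GET crawl succeeded")

    # POST variant: params is sent as a form body instead of in the query string.
    # res = requests.post(url=url, data=params, headers=headers)
    # If the response is JSON (check the response header
    # Content-Type: application/json;charset=utf-8), parse it with res.json():
    # page_json = res.json()
    # with open(kw + ".json", "w", encoding="utf-8") as fp:
    #     json.dump(page_json, fp=fp, ensure_ascii=False)
    # print("POST crawl succeeded")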
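To see what the GET and POST calls above actually send without touching the network, the requests can be prepared instead of sent. The sketch below assumes the placeholder URL from the example; `build_requests` and `parse_body` are hypothetical helper names. `parse_body` shows one way to act on the Content-Type check mentioned in the comments, since `res.json()` itself does not look at that header.

```python
import json

import requests

URL = "https://www.xxx.com/web"  # placeholder URL from the example above


def build_requests(kw):
    """Prepare (but do not send) a GET and a POST request for the same search term."""
    params = {"query": kw}
    # GET: params is URL-encoded into the query string
    get_req = requests.Request("GET", URL, params=params).prepare()
    # POST: data is URL-encoded into the request body as a form
    post_req = requests.Request("POST", URL, data=params).prepare()
    return get_req, post_req


def parse_body(content_type, text):
    """Hypothetical helper: return parsed JSON if Content-Type says JSON, else the raw text.

    In real code it would be called as
    parse_body(res.headers.get("Content-Type", ""), res.text).
    """
    # Content-Type may carry parameters, e.g. "application/json;charset=utf-8"
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == "application/json":
        return json.loads(text)
    return text


if __name__ == "__main__":
    get_req, post_req = build_requests("python")
    print(get_req.url)    # https://www.xxx.com/web?query=python
    print(get_req.body)   # None
    print(post_req.url)   # https://www.xxx.com/web
    print(post_req.body)  # query=python
    print(post_req.headers["Content-Type"])  # application/x-www-form-urlencoded
    print(parse_body("application/json;charset=utf-8", '{"ok": true}'))  # {'ok': True}
```

Calling `res.json()` on a body that is not JSON raises a decoding error, which is why checking the header first can be worthwhile.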
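Crawler technique: UA spoofing, i.e. setting the User-Agent header to a real browser's value (the headers dict above) so the request looks like it came from a normal browser. Without it, requests sends a default User-Agent of the form python-requests/<version>.
Anti-crawling mechanism: UA checking, i.e. the server inspects and validates the User-Agent of each request and may reject those that do not look like a browser.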
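The interplay between a UA check and UA spoofing can be illustrated offline. This is a sketch under stated assumptions: `looks_like_crawler` is a hypothetical server-side check with an assumed list of markers (real sites use their own, usually stricter, rules), and `outgoing_user_agent` only prepares a request through a `Session`, the same way `requests.get` does internally, to reveal which User-Agent would be sent.

```python
import requests

BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36"
)


def looks_like_crawler(user_agent):
    """Hypothetical server-side UA check: flag missing or script-like User-Agents."""
    if not user_agent:
        return True
    # Assumed markers of common HTTP tools; not taken from any real site
    markers = ("python-requests", "python-urllib", "curl", "wget")
    ua = user_agent.lower()
    return any(m in ua for m in markers)


def outgoing_user_agent(headers=None):
    """Return the User-Agent that requests would send with these headers (nothing is sent)."""
    with requests.Session() as session:
        req = requests.Request("GET", "https://www.xxx.com/web", headers=headers)
        # Session headers (including the default User-Agent) are merged in here
        return session.prepare_request(req).headers.get("User-Agent")


if __name__ == "__main__":
    default_ua = outgoing_user_agent()
    spoofed_ua = outgoing_user_agent({"User-Agent": BROWSER_UA})
    print(default_ua, looks_like_crawler(default_ua))  # python-requests/<version> True
    print(spoofed_ua, looks_like_crawler(spoofed_ua))  # Mozilla/5.0 ... False
```

Headers passed per request override the session's defaults, which is why the spoofed value replaces `python-requests/<version>`.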