Purpose
Simulates a browser sending requests
How to use it
Coding workflow for the requests module
① Specify the URL
② Send the request (GET/POST)
③ Get the response data
④ Persist the data to storage
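The four steps above can be sketched as one small function. This is a minimal sketch, not a definitive implementation: the name `fetch_and_save`, the `get` parameter (there only so the function can be exercised without network access) and the 10-second timeout are assumptions added for illustration. `raise_for_status()` is an extra safety check that is not part of the four steps.

```python
import requests


def fetch_and_save(url, path, get=requests.get):
    """Fetch `url` and write the response text to `path`.

    `get` defaults to requests.get; it is a hypothetical hook that lets
    a stand-in function be passed when no network is available.
    """
    # Step 1: the URL is supplied by the caller.
    # Step 2: send a GET request (the timeout value is an assumption).
    response = get(url, timeout=10)
    # Extra check: raise an exception on 4xx/5xx status codes.
    response.raise_for_status()
    # Step 3: get the response body as text.
    page_text = response.text
    # Step 4: persist the text to disk.
    with open(path, 'w', encoding='utf-8') as fp:
        fp.write(page_text)
    return len(page_text)
```

Calling `fetch_and_save('https://www.sogou.com/', './sogou.html')` would perform all four steps against the Sogou homepage and return the number of characters written.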
Environment setup
pip install requests
Hands-on coding
Task: scrape the page data of the Sogou homepage
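import requests

if __name__ == "__main__":
    # Step 1: specify the URL
    url = 'https://www.sogou.com/'
    # Step 2: send a GET request
    response = requests.get(url)
    # Step 3: get the response body as text
    page_text = response.text
    print(page_text)
    # Step 4: persist the page to disk
    with open('./sogou.html', 'w', encoding='utf-8') as fp:
        fp.write(page_text)
    print('Scraping finished!')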
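Practice exercises
Task: scrape the search results page for a given Sogou search term
UA detection: the server inspects the User-Agent request header and may refuse requests that do not appear to come from a browser
UA spoofing: set the User-Agent header to that of a real browser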
import requests

if __name__ == "__main__":
    # UA spoofing: present the request as coming from a browser
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36'
    }
    url = 'https://www.sogou.com/web'
    # The search term is sent as the `query` URL parameter
    kw = input('enter a word:')
    params = {
        'query': kw
    }
    response = requests.get(url=url, params=params, headers=headers)
    page_text = response.text
    fileName = kw + '.html'
    with open(fileName, 'w', encoding='utf-8') as fp:
        fp.write(page_text)
    print(fileName, 'saved successfully')
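To see what the `params` and `headers` arguments above actually change, the request can be prepared without being sent. The sketch below is illustrative only: `build_search_request` and `BROWSER_UA` are names made up for this example, and nothing here touches the network. It shows that `params` is encoded into the URL's query string, and that without UA spoofing requests would identify itself with a `python-requests/...` User-Agent, which is what UA detection typically looks for.

```python
import requests

# The browser User-Agent string used in the example above.
BROWSER_UA = ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
              '(KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36')


def build_search_request(keyword):
    """Prepare (but do not send) the Sogou search request."""
    request = requests.Request(
        'GET',
        'https://www.sogou.com/web',
        params={'query': keyword},
        headers={'User-Agent': BROWSER_UA},
    )
    # prepare() encodes params into the query string and fixes the headers.
    return request.prepare()


# The User-Agent requests sends when none is supplied.
default_ua = requests.utils.default_user_agent()

prepared = build_search_request('python')
print(default_ua)                      # starts with "python-requests/"
print(prepared.url)                    # https://www.sogou.com/web?query=python
print(prepared.headers['User-Agent'])  # the spoofed browser string
```

A non-ASCII search term is percent-encoded in the URL in the same way, so the keyword can be passed to `params` as a plain string.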
Task: crack Baidu Translate
POST request (carries parameters)
The response data is JSON
import requests
import json

if __name__ == "__main__":
    post_url = 'https://fanyi.baidu.com/sug'
    kw = input("enter key word\n")
    # Form data sent in the body of the POST request
    data = {
        'kw': kw
    }
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36'
    }
    response = requests.post(url=post_url, data=data, headers=headers)
    # json() parses the JSON response body into a Python object
    dic_obj = response.json()
    file = kw + '.json'
    # ensure_ascii=False keeps Chinese characters readable in the file
    with open(file, 'w', encoding='utf-8') as fp:
        json.dump(dic_obj, fp=fp, ensure_ascii=False)
    print(file + ' scraped successfully!')
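Two details of the POST example can be checked offline. The sketch below prepares the request without sending it, to show that the `data` dict is form-encoded into the request body, and then contrasts `ensure_ascii=True` (the default) with `ensure_ascii=False`. The function name `build_sug_request` and the `sample` dict are hypothetical; the dict is made-up data and does not reflect the real response format of the endpoint.

```python
import json

import requests


def build_sug_request(keyword):
    """Prepare (but do not send) the POST request used above."""
    request = requests.Request(
        'POST',
        'https://fanyi.baidu.com/sug',
        data={'kw': keyword},
    )
    return request.prepare()


prepared = build_sug_request('dog')
print(prepared.body)                     # kw=dog
print(prepared.headers['Content-Type'])  # application/x-www-form-urlencoded

# Hypothetical sample data, NOT the real shape of the server's response.
sample = {'word': 'dog', 'meaning': '狗'}

escaped = json.dumps(sample)                       # non-ASCII becomes \uXXXX
readable = json.dumps(sample, ensure_ascii=False)  # non-ASCII kept as-is
print(escaped)
print(readable)
```

This is why the file is opened with `encoding='utf-8'` in the example above: with `ensure_ascii=False` the Chinese characters are written directly, so the file's encoding must be able to represent them.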
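Task: scrape the Douban movie category rankings
Task: scrape the number of KFC restaurants at a specified location
Task: scrape data related to Cosmetics Production Licenses of the People's Republic of China from the National Medical Products Administration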