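I. requests
The requests module: a third-party Python library for sending HTTP requests (it is not part of the standard library and must be installed separately). It is powerful, simple to use, and efficient.
Purpose: simulate a browser sending requests.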
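- 1.1 How to use it (the coding workflow for the requests module):
1. Specify the URL
2. Send the request
3. Get the response data
4. Persist the data to storage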
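The four steps above can be sketched as two small helpers. This is a minimal illustration, not the only way to structure the code: the names `fetch_page` and `save_page` are hypothetical, and the `timeout` argument and the `raise_for_status()` call are additions assumed here for safety rather than part of the workflow as described.

```python
import requests


def fetch_page(url, timeout=10):
    """Steps 1-3: take a URL, send a GET request, and return the response text."""
    # Step 2: send the request (timeout is an assumed safeguard, in seconds).
    response = requests.get(url, timeout=timeout)
    # Assumed extra check: raise an exception on 4xx/5xx status codes.
    response.raise_for_status()
    # Step 3: get the response data as a decoded string.
    return response.text


def save_page(text, path):
    """Step 4: persist the response data to a local file."""
    with open(path, "w", encoding="utf-8") as fp:
        fp.write(text)
    return path
```

A call such as `save_page(fetch_page("https://www.sogou.com/"), "sogou.html")` would then run all four steps in order.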
- 1.2 Environment setup
pip install requests
- 1.3 Hands-on coding
requests example: a simple web page collector
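import requests

if __name__ == "__main__":
    # UA spoofing: present the request as coming from a mobile Chrome browser.
    header = {
        'User-Agent': 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) '
                      'Chrome/100.0.4896.127 Mobile Safari/537.36'
    }
    # 1. Specify the URL; the search keyword is passed as a query parameter.
    url = 'https://www.sogou.com/web'
    kw = input('Please enter a word:')
    param = {
        'query': kw
    }
    # 2. Send the request.
    response = requests.get(url=url, params=param, headers=header)
    # 3. Get the response data.
    page_text = response.text
    # 4. Persist the data to a local HTML file named after the keyword.
    fileName = kw + '.html'
    with open(fileName, 'w', encoding='utf-8') as fp:
        fp.write(page_text)
    print(fileName, 'saved successfully!')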
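Result: the search results page is saved as <keyword>.html in the current directory, and a confirmation message is printed.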
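- 1.4 UA spoofing
UA spoofing means setting the User-Agent header, which identifies the client (the "request carrier") sending the request, so that the crawler presents itself as a particular browser.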
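To make the effect of UA spoofing visible, the sketch below compares the User-Agent that requests would send by default with the one sent when a browser string is supplied. It only prepares the request and does not touch the network. The helper name `effective_user_agent` is hypothetical; the browser string is the one used in the collector example.

```python
import requests

# Browser identity reused from the web page collector example.
BROWSER_UA = (
    "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/100.0.4896.127 Mobile Safari/537.36"
)


def effective_user_agent(headers=None):
    """Return the User-Agent header requests would send (no network I/O)."""
    session = requests.Session()
    request = requests.Request("GET", "https://www.sogou.com/web", headers=headers)
    # prepare_request merges the session's default headers with our own.
    prepared = session.prepare_request(request)
    return prepared.headers["User-Agent"]


default_ua = effective_user_agent()
spoofed_ua = effective_user_agent({"User-Agent": BROWSER_UA})

print(default_ua)  # library default, of the form python-requests/<version>
print(spoofed_ua)  # the browser string defined above
```

Without the custom header, the default value identifies the client as the requests library, which a server can use to detect and reject crawler traffic.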