Requests module
A Python module for making network requests; it simulates a browser sending requests.
1. Module installation
pip install requests
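2. requests module request workflow
- Specify the URL
- Send the request (get/post)
  - The get method returns a response object
  - .text returns the response data as a string
  - 💎💎💎 A saved .html file is poorly formatted and hard to read; the Ctrl+Alt+L shortcut (Reformat Code in PyCharm) tidies it up
- Get the response data
- Persist the data to storage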
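The four steps above can be sketched as two small helpers. This is a minimal sketch rather than a fixed recipe: the names `fetch_text` and `save_text`, the 10-second timeout, and the `raise_for_status()` check are assumptions added for illustration and do not come from the notes themselves.

```python
from pathlib import Path

import requests


def fetch_text(url, headers=None, timeout=10):
    """Steps 1-3: take a URL, send a GET request, return the body as a string."""
    # Using the response as a context manager closes the connection for us.
    with requests.get(url, headers=headers, timeout=timeout) as resp:
        resp.raise_for_status()  # assumption: treat 4xx/5xx statuses as errors
        return resp.text


def save_text(text, path):
    """Step 4: persist the data to disk as UTF-8 and return the path."""
    out = Path(path)
    out.write_text(text, encoding="utf-8")
    return out
```

A call such as `save_text(fetch_text("https://www.sogou.com/"), "sogou.html")` would run all four steps in order.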
3. Hands-on coding
3.1 Task: scrape the page data of the Sogou home page
import requests

url = "https://www.sogou.com/"
# Send a browser User-Agent so the request looks like a normal visit
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36"
}
resp = requests.get(url=url, headers=headers)
page = resp.text  # response body as a string
# Persist the page to disk
with open("sogou.html", "w", encoding="utf-8") as f:
    f.write(page)
resp.close()
print(page)
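3.2 Task: scrape the Sogou search results page for a given keyword (a simple page collector)
import requests

url = "https://www.sogou.com/web"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36"
}
name = input("Enter the search keyword: ")
# Query parameters are passed as a dict; requests appends them to the URL
data = {
    "query": name
}
resp = requests.get(url=url, headers=headers, params=data)
page = resp.text
# Save the results page in a file named after the keyword
with open(name + ".html", "w", encoding="utf-8") as f:
    f.write(page)
resp.close()
print(page)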
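To see what `params=data` does in 3.2, it helps to look at the URL that ends up being requested. The sketch below uses the standard library's `urlencode` to build an equivalent query string by hand. requests performs this encoding for you, so this is only an illustration, and the helper name `build_search_url` is hypothetical.

```python
from urllib.parse import urlencode


def build_search_url(base_url, params):
    """Append a percent-encoded query string to base_url."""
    return base_url + "?" + urlencode(params)


print(build_search_url("https://www.sogou.com/web", {"query": "python"}))
# https://www.sogou.com/web?query=python

# Non-ASCII keywords are percent-encoded as UTF-8 bytes.
print(build_search_url("https://www.sogou.com/web", {"query": "爬虫"}))
# https://www.sogou.com/web?query=%E7%88%AC%E8%99%AB
```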
3.3 Task: cracking Baidu Translate
import requests

url = "https://fanyi.baidu.com/sug"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36"
}
words = input("Enter the word to look up: ")
# Form data sent in the POST body
data = {
    "kw": words
}
resp = requests.post(url=url, headers=headers, data=data)
response = resp.json()  # parse the JSON body into a dict
for i in response["data"]:
    print(i)
resp.close()
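The request method is POST, and the request must carry parameters (passed through data).
The response data is JSON; resp.json() returns a Python object, here a dict.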
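As a rough illustration of the JSON-to-dict step, the sketch below parses a response body with the standard library. The sample payload is made up for this example and is not a real Baidu response: only the top-level "data" key is taken from the loop in 3.3, and the entry keys "k" and "v" are assumptions.

```python
import json

# Hypothetical response body, shaped the way the loop in 3.3 expects:
# a top-level "data" key holding a list of entries.
sample_body = '{"data": [{"k": "dog", "v": "n. 狗"}, {"k": "dogs", "v": "n. 狗 (复数)"}]}'

# resp.json() parses the body in essentially the same way that
# json.loads parses a string.
parsed = json.loads(sample_body)

print(type(parsed).__name__)  # dict
for entry in parsed["data"]:
    print(entry["k"], "->", entry["v"])
```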
3.4 Task: scrape Douban movies
import json
import requests

url = "https://movie.douban.com/j/chart/top_list"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36"
}
# Query parameters copied from the chart request
data = {
    "type": 24,
    "interval_id": "100:90",
    "action": "",
    "start": 0,
    "limit": "20"
}
resp = requests.get(url=url, headers=headers, params=data)
resp_obj = resp.json()
# ensure_ascii=False keeps Chinese text readable in the saved file
with open("douban_movie_chart.json", "w", encoding="utf-8") as f:
    json.dump(resp_obj, f, ensure_ascii=False)
print("Scraping finished!")
resp.close()
💎💎💎 To save JSON data to a file, use the **json.dump(obj, fp, ensure_ascii=False)** function, where fp is the open file object rather than the file name 💎💎💎
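The effect of `ensure_ascii=False` is easiest to see side by side. In this small sketch, `io.StringIO` stands in for an open file, and the movie record is a hypothetical sample rather than real scraped data.

```python
import io
import json

movie = {"title": "肖申克的救赎", "score": "9.7"}  # hypothetical sample record

# json.dump writes to a file-like object; StringIO stands in for an open file.
escaped = io.StringIO()
json.dump(movie, escaped)  # default: ensure_ascii=True

readable = io.StringIO()
json.dump(movie, readable, ensure_ascii=False)

print(escaped.getvalue())   # non-ASCII characters appear as \uXXXX escapes
print(readable.getvalue())  # {"title": "肖申克的救赎", "score": "9.7"}
```

Both outputs load back to the same dict; the difference only affects how readable the saved file is.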
3.5 Task: scrape KFC restaurant locations
import requests

url = "http://www.kfc.com.cn/kfccda/ashx/GetStoreList.ashx"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36"
}
# Sent in the URL query string
params = {
    "op": "keyword"
}
# Sent as form data in the POST body
data = {
    "cname": "",
    "pid": "",
    "keyword": input("Enter the city to search for: "),
    "pageIndex": 1,
    "pageSize": 10
}
resp = requests.post(url=url, headers=headers, params=params, data=data)
with open("kfc_store_locations.txt", "w", encoding="utf-8") as fp:
    fp.write(resp.text)
print("Scraping finished!")
resp.close()
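The request in 3.5 passes both `params` and `data`, and the two end up in different places. One way to check this without touching the network is to prepare the request and inspect it, as in the sketch below. The keyword value "beijing" is a placeholder chosen for this example.

```python
import requests

url = "http://www.kfc.com.cn/kfccda/ashx/GetStoreList.ashx"
params = {"op": "keyword"}
data = {"cname": "", "pid": "", "keyword": "beijing", "pageIndex": 1, "pageSize": 10}

# Build the request without sending it, so no network access is needed.
prepared = requests.Request("POST", url, params=params, data=data).prepare()

print(prepared.url)
# http://www.kfc.com.cn/kfccda/ashx/GetStoreList.ashx?op=keyword
print(prepared.body)
# cname=&pid=&keyword=beijing&pageIndex=1&pageSize=10
print(prepared.headers["Content-Type"])
# application/x-www-form-urlencoded
```

So `params` goes into the query string of the URL, while `data` is form-encoded into the request body.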