Python web scraping: the requests library
1. Determine the URL
2. Spoof the User-Agent (UA)
headers = {
    'User-Agent': ''  # paste your browser's User-Agent string here
}
3. Send the request: requests.get / requests.post
requests.get(url, params=param, headers=headers)
requests.post(url, data=data, headers=headers)
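To see what the two calls actually send, the sketch below builds both requests without sending them. It relies on requests.Request(...).prepare(), which returns the prepared request that requests.get / requests.post would transmit. The parameter values and the short User-Agent string are placeholders chosen for illustration, and reusing the Douban URL for the POST is only an assumption for the demo.

```python
import requests

url = 'https://movie.douban.com/j/chart/top_list'
headers = {'User-Agent': 'Mozilla/5.0'}  # placeholder UA string
param = {'type': '24', 'start': '0', 'limit': '20'}  # illustrative values

# GET: params are encoded into the query string of the URL.
get_req = requests.Request('GET', url, params=param, headers=headers).prepare()
print(get_req.url)

# POST: data is form-encoded into the request body, the URL stays unchanged.
post_req = requests.Request('POST', url, data=param, headers=headers).prepare()
print(post_req.url)
print(post_req.body)
```

Passing params, data and headers as keyword arguments avoids mixing them up, since requests.get only accepts url and params positionally.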
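4. Read the response data
4.1 response.text (returns the body as a string)
4.2 response.json() (returns the parsed object)
4.3 response.content (returns the raw bytes)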
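The three accessors are different views of the same body. Roughly, response.content is the raw bytes, response.text is those bytes decoded with the response's encoding, and response.json() parses the body as JSON. The sketch below imitates that relationship with the standard library only; the variable raw and its sample payload are made up for illustration and stand in for a real response.

```python
import json

# Stand-in for response.content: the raw bytes of a JSON body (made-up sample).
raw = '[{"title": "example movie", "score": "9.0"}]'.encode('utf-8')

# Stand-in for response.text: the bytes decoded to a string.
text = raw.decode('utf-8')

# Stand-in for response.json(): the string parsed into Python objects.
data = json.loads(text)

print(type(raw).__name__, type(text).__name__, type(data).__name__)
print(data[0]['title'])
```

As a rule of thumb, use text for HTML pages, json() for AJAX endpoints that return JSON, and content for images or other binary files.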
5. Persist the data
Text:
with open('filename', 'w', encoding='utf-8') as fp:
    fp.write(response.text)
JSON:
with open('filename', 'w', encoding='utf-8') as fp:
    json.dump(response.json(), fp, ensure_ascii=False)
Binary content:
with open('filename', 'wb') as fp:
    fp.write(response.content)
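A minimal runnable sketch of the three storage modes follows. It writes into a temporary directory so nothing is left behind; the file names and the sample values page_text, movie_list and image_bytes are hypothetical stand-ins for response.text, response.json() and response.content.

```python
import json
import os
import tempfile

# Hypothetical stand-ins for response.text, response.json(), response.content.
page_text = '<html><body>豆瓣</body></html>'
movie_list = [{'title': '豆瓣', 'score': '9.0'}]
image_bytes = b'\x89PNG\r\n'

with tempfile.TemporaryDirectory() as folder:
    text_path = os.path.join(folder, 'page.html')
    json_path = os.path.join(folder, 'movies.json')
    bin_path = os.path.join(folder, 'image.bin')

    # Text: open in 'w' mode with an explicit encoding.
    with open(text_path, 'w', encoding='utf-8') as fp:
        fp.write(page_text)

    # JSON: json.dump serializes the object straight into the file.
    with open(json_path, 'w', encoding='utf-8') as fp:
        json.dump(movie_list, fp, ensure_ascii=False)

    # Binary: open in 'wb' mode, no encoding argument.
    with open(bin_path, 'wb') as fp:
        fp.write(image_bytes)

    # Read the files back to check what was stored.
    with open(json_path, encoding='utf-8') as fp:
        saved_json = fp.read()
    with open(bin_path, 'rb') as fp:
        saved_bytes = fp.read()

print(saved_json)
```

With ensure_ascii=False the Chinese characters are written as-is rather than as \uXXXX escapes, which is why the file is opened with encoding='utf-8'.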
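AJAX requests: requests where the URL stays the same while the page content changes. Capture the request packet under the XHR filter of the browser's Network panel.
Example: scraping Douban movie data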
import requests
import json

if __name__ == '__main__':
    # AJAX endpoint captured under XHR in the Network panel
    url = 'https://movie.douban.com/j/chart/top_list'
    # Query-string parameters sent with the request
    param = {
        'type': '24',
        'interval_id': '100:90',
        'action': '',
        'start': '0',
        'limit': '20'
    }
    # UA spoofing
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.83 Safari/537.36'
    }
    response = requests.get(url=url, params=param, headers=headers)
    # The endpoint returns JSON, so parse it into Python objects
    list_data = response.json()
    # Persist the result; the with block closes the file automatically
    fileName = '豆瓣.json'
    with open(fileName, 'w', encoding='utf-8') as fp:
        json.dump(list_data, fp=fp, ensure_ascii=False)
    print('over')