Python crawler: requests

写一篇多根头发

Tags: python

Article link: https://blog.csdn.net/qq_43685335/article/details/108539444

1. Determine the URL
2. UA spoofing (send a browser User-Agent so the request looks like it comes from a browser)
    headers = {
        'User-Agent': ''
    }
3. Send the request: requests.get / requests.post
    requests.get(url, params=params, headers=headers)
    requests.post(url, data=data, headers=headers)
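4. Get the response data
    4.1 response.text (returns the body as a string)
    4.2 response.json() (returns the parsed object, a dict or a list)
    4.3 response.content (returns the body as raw bytes)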
5. Persistent storage
    text
        with open('filename', 'w', encoding='utf-8') as fp:
            fp.write(text)
    json
        with open('filename', 'w', encoding='utf-8') as fp:
            json.dump(data, fp, ensure_ascii=False)
    content
        with open('filename', 'wb') as fp:
            fp.write(content)
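
The three storage patterns in step 5 can be sketched with the standard library alone. The helper names `save_text`, `save_json`, and `save_bytes` are hypothetical (they are not part of requests), and their arguments stand in for what `response.text`, `response.json()`, and `response.content` would return. Treat this as a minimal sketch rather than the only way to do it.

```python
import json


def save_text(path, text, encoding="utf-8"):
    """Write a string (e.g. response.text) to a text file."""
    with open(path, "w", encoding=encoding) as fp:
        fp.write(text)


def save_json(path, data, encoding="utf-8"):
    """Serialize a parsed object (e.g. response.json()) as JSON."""
    with open(path, "w", encoding=encoding) as fp:
        # ensure_ascii=False keeps non-ASCII characters readable in the file.
        json.dump(data, fp, ensure_ascii=False)


def save_bytes(path, content):
    """Write raw bytes (e.g. response.content) in binary mode."""
    with open(path, "wb") as fp:
        fp.write(content)
```

For instance, `save_json('data.json', response.json())` would store a parsed response, and `save_bytes('picture.jpg', response.content)` would store a downloaded image. Using `with` in all three cases means the file is closed even if writing fails.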

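AJAX request: a request that changes the page content while the URL in the address bar stays the same. To capture it, open the browser's developer tools and look for the request under XHR in the Network panel.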
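
Once the XHR request has been found in the Network panel, its full URL usually carries a query string. One way to reuse it, sketched here with only the standard library, is to split it into a base URL and a parameter dictionary that can then be passed as `url` and `params` to `requests.get`. The function name `split_xhr_url` is hypothetical, and the sample URL is simply the Douban endpoint used in this article with its parameters written out as a query string.

```python
from urllib.parse import parse_qsl, urlsplit, urlunsplit


def split_xhr_url(captured_url):
    """Split a captured request URL into (base_url, params)."""
    parts = urlsplit(captured_url)
    # Rebuild the URL without its query string or fragment.
    base_url = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    # keep_blank_values=True preserves empty parameters such as "action=".
    # Note: dict() keeps only the last value if a key is repeated.
    params = dict(parse_qsl(parts.query, keep_blank_values=True))
    return base_url, params


captured = (
    "https://movie.douban.com/j/chart/top_list"
    "?type=24&interval_id=100%3A90&action=&start=0&limit=20"
)
base_url, params = split_xhr_url(captured)
print(base_url)
print(params)
```

Keeping the parameters in a dictionary makes it easy to change a single value, such as `start`, without editing the URL string by hand.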

Example: scraping movie information from Douban

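import requests
import json

if __name__ == '__main__':
    # 1. URL of the AJAX endpoint captured under XHR
    url = 'https://movie.douban.com/j/chart/top_list'

    # Query parameters sent along with the request
    param = {
        'type': '24',
        'interval_id': '100:90',
        'action': '',
        'start': '0',
        'limit': '20'
    }

    # 2. UA spoofing: present the crawler as a regular browser
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.83 Safari/537.36'
    }

    # 3. Send the GET request
    response = requests.get(url=url, params=param, headers=headers)

    # 4. Parse the JSON response body into Python objects
    list_data = response.json()

    # 5. Persist the data; the with block closes the file when done
    fileName = '豆瓣.json'
    with open(fileName, 'w', encoding='utf-8') as fp:
        json.dump(list_data, fp=fp, ensure_ascii=False)

    print('over')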
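
The `start` and `limit` parameters suggest that the endpoint returns results in pages. Assuming `start` is a zero-based offset and `limit` is the page size (an assumption, not something the original confirms), the parameter dictionaries for successive pages could be generated as sketched here. `page_params` is a hypothetical helper; each dictionary it yields would be passed as `params` to `requests.get`.

```python
def page_params(pages, page_size=20, type_id="24", interval_id="100:90"):
    """Yield one query-parameter dict per page.

    Assumes 'start' is a zero-based offset and 'limit' is the page size.
    """
    for page in range(pages):
        yield {
            "type": type_id,
            "interval_id": interval_id,
            "action": "",
            "start": str(page * page_size),
            "limit": str(page_size),
        }


for params in page_params(pages=3):
    print(params["start"], params["limit"])
```

When looping over real requests, it is polite to pause between pages (for example with `time.sleep`) so the crawler does not flood the server.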