Python Web Scraping: AJAX GET and POST Requests

万里顾—程

Article link: https://blog.csdn.net/wpc2018/article/details/125784236

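Some websites load their content with AJAX, the mechanism a front-end page uses to exchange data with the back-end server. An AJAX endpoint usually returns JSON, so sending a GET or POST request directly to the AJAX URL returns the JSON data itself.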
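Once the JSON text has been fetched, it can be turned into Python objects with the standard `json` module. The following is a minimal sketch: the sample body, its field names, and the helper `extract_titles` are made up for illustration and do not reflect the real response of any site used in this article.

```python
import json

# Hypothetical sample of the kind of JSON text an AJAX endpoint might return;
# the field names are assumptions made for this illustration only.
sample_body = '{"data": [{"title": "Movie A", "rate": "8.5"}, {"title": "Movie B", "rate": "7.9"}]}'


def extract_titles(body):
    """Parse a JSON string and return the 'title' of each item under 'data'."""
    payload = json.loads(body)
    return [item["title"] for item in payload.get("data", [])]


titles = extract_titles(sample_body)
print(titles)
```

In a real script, `sample_body` would be replaced by the decoded response text, and the keys would have to match whatever the endpoint actually returns.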

1. GET requests

Example A: fetch the first page of data from the site

import urllib.request

url = 'https://movie.douban.com/j/new_search_subjects?sort=U&range=0,10&tags=&start=0&genres=%E5%8A%A8%E4%BD%9C'

headers = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36 SLBrowser/8.0.0.5261 SLBChan/10'}

# Build a request that carries a browser-like User-Agent header
request = urllib.request.Request(url=url, headers=headers)

# Send the request and decode the response body as UTF-8
with urllib.request.urlopen(request) as response:
    content = response.read().decode('utf-8')

# open() defaults to the platform's locale encoding (gbk on Chinese-language Windows),
# so pass encoding='utf-8' explicitly when the content contains Chinese characters
with open('douban.json', 'w', encoding='utf-8') as file:
    # Write the JSON text to the file
    file.write(content)

Result: the JSON response is saved to douban.json. In PyCharm, press Ctrl+Alt+L to reformat the JSON file.
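The `genres` value in the URL above is percent-encoded text. As a small illustration of how such a value is produced and read back, the sketch below uses `urllib.parse.quote` and `urllib.parse.unquote`; the variable names are chosen for this example only.

```python
import urllib.parse

# Non-ASCII query values are sent as percent-encoded UTF-8 bytes
genre = '动作'
encoded = urllib.parse.quote(genre)

# unquote() reverses the encoding, which helps when reading a URL copied from the browser
decoded = urllib.parse.unquote('%E5%8A%A8%E4%BD%9C')

print(encoded)
print(decoded)
```

This makes it easier to change the genre in the script: write the readable text and let `quote` produce the encoded form instead of editing the escape sequence by hand.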

Example B: fetch the first 10 pages of data from the site

import urllib.parse
import urllib.request


# Send the request and return the response object
def create_response(page):
    url = 'https://movie.douban.com/j/chart/top_list?type=5&interval_id=100%3A90&action='
    # Paging parameters: each page holds 20 items
    data = {
        'start': (page - 1) * 20,
        'limit': 20
    }
    # URL-encode the parameters into a query string
    data = urllib.parse.urlencode(data)
    # Join with '&' so that start and limit are separate parameters
    # instead of becoming part of the action value
    url = url + '&' + data
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36 SLBrowser/8.0.0.5261 SLBChan/10'
    }
    # Build the customized request and return the response object
    request = urllib.request.Request(url=url, headers=headers)
    response = urllib.request.urlopen(request)
    return response


# Save one page of the response to a file
def down_file(response, page):
    # Decode the body as UTF-8 and write it to a file
    content = response.read().decode('utf-8')
    with open('douban_' + str(page) + '.json', 'w', encoding='utf-8') as file:
        file.write(content)


# Program entry point
if __name__ == '__main__':
    start_page = int(input('Enter the start page: '))
    end_page = int(input('Enter the end page: '))
    for page in range(start_page, end_page + 1):
        response = create_response(page)
        down_file(response, page)

Result: one JSON file per page is downloaded successfully.
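The paging logic in Example B maps a 1-based page number to a `start` offset of `(page - 1) * 20`. The sketch below isolates that step so the generated URLs can be inspected without sending any request. The helper name `build_page_url` and its `page_size` parameter are hypothetical, and the `'&'` separator assumes the base URL ends with `action=` as in the example.

```python
import urllib.parse

BASE_URL = 'https://movie.douban.com/j/chart/top_list?type=5&interval_id=100%3A90&action='


def build_page_url(page, page_size=20):
    """Return the request URL for a 1-based page number (hypothetical helper)."""
    params = {
        'start': (page - 1) * page_size,  # page 1 -> 0, page 2 -> 20, ...
        'limit': page_size,
    }
    # '&' keeps start and limit as separate parameters after 'action='
    return BASE_URL + '&' + urllib.parse.urlencode(params)


for page in range(1, 4):
    print(page, build_page_url(page))
```

Printing the URLs first is a cheap way to confirm the offsets before looping over real requests.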

2. POST requests

Example: fetch the first ten pages of data from the site

import urllib.parse
import urllib.request


# Build the customized request, send it, and return the response object
def create_request(page):
    url = 'http://www.kfc.com.cn/kfccda/ashx/GetStoreList.ashx?op=cname'
    # Form fields sent in the POST body
    data = {
        'cname': '广州',
        'pid': '',
        'pageIndex': page,
        'pageSize': 10
    }

    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36 SLBrowser/8.0.0.5261 SLBChan/10'}

    # POST data is encoded in two steps: URL-encode it into a string,
    # then encode that string to bytes
    data = urllib.parse.urlencode(data).encode('utf-8')

    # Passing data makes urllib send the request as a POST
    request = urllib.request.Request(url, data, headers)

    response = urllib.request.urlopen(request)

    return response


# Save one page of the response to a local file
def create_down(response, page):
    content = response.read().decode('utf-8')
    with open('kendeji' + str(page) + '.json', 'w', encoding='utf-8') as file:
        file.write(content)


if __name__ == '__main__':
    start_page = int(input('Enter the start page: '))
    end_page = int(input('Enter the end page: '))
    for page in range(start_page, end_page + 1):
        response = create_request(page)
        create_down(response, page)

Result: the JSON files are downloaded to the local disk successfully.
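The key difference from the GET examples is that the parameters travel in the request body, which `urllib` requires to be bytes. The sketch below builds the request objects without sending them, to show the two encoding steps and how the presence of `data` switches the method to POST. The variable names `form`, `body`, `get_request`, and `post_request` are chosen for this illustration.

```python
import urllib.parse
import urllib.request

url = 'http://www.kfc.com.cn/kfccda/ashx/GetStoreList.ashx?op=cname'
form = {'cname': '广州', 'pid': '', 'pageIndex': 1, 'pageSize': 10}

# Step 1: urlencode() produces a str; step 2: encode() turns it into bytes
body = urllib.parse.urlencode(form).encode('utf-8')

# Without data the request is a GET; with data it becomes a POST
get_request = urllib.request.Request(url)
post_request = urllib.request.Request(url, data=body)

print(body)
print(get_request.get_method())
print(post_request.get_method())
```

Skipping the `.encode('utf-8')` step leaves the body as a `str`, which `urlopen` rejects with a `TypeError` for POST data.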
