Fetching AJAX POST request data with Python

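Contents

Fetching AJAX POST request data with Python

Goal:

1 Analysing the AJAX flow:

1.1 First, open the store's category URL to establish the session:

1.2 Collect the data the AJAX request needs:

2. Analysing the page

2.1 Clicking "next page" changes neither the address-bar URL nor the page itself, which shows that the site loads the results via AJAX

2.2 Get the request URL and the request headers, which are needed later:

2.3 Then identify the data needed from the response:

3. Python code:

4 Results: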


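Tip: dict() cannot convert a string into a dictionary (it raises an error). eval() can do it, and few books mention this, but it executes whatever text it is given and fails on JSON values such as true, false and null, so json.loads() is the safer choice for a server response. Usage of eval():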

https://www.runoob.com/python/python-func-eval.html
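A minimal sketch of the difference between the three approaches, using a made-up JSON string (`sample` is hypothetical, not actual output from the site). `dict()` rejects a string outright, `eval()` fails as soon as the text contains `false`, and `json.loads()` parses it as data.

```python
import json

# Hypothetical JSON text for illustration only; not real output from the site
sample = '{"data": {"productList": [{"prdDtlUrl": "http://example.com/p/1", "soldOut": false}]}}'

# dict() expects a mapping or key/value pairs, so a plain string is rejected
try:
    dict(sample)
    dict_error = None
except ValueError as exc:
    dict_error = exc

# eval() treats the text as Python source; JSON's false is not a Python name
try:
    eval(sample)
    eval_error = None
except NameError as exc:
    eval_error = exc

# json.loads() parses the text as data and never executes it
parsed = json.loads(sample)

print(type(dict_error).__name__, type(eval_error).__name__)
print(parsed["data"]["productList"][0]["prdDtlUrl"])
```

With `requests`, calling `response.json()` performs the same parsing on the response body.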

Goal:

Collect the URL of every product in the store at http://shop.11st.co.kr/stores/522047/category.

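1 Analysing the AJAX flow:

1.1 First, open the store's category URL to establish the session:

session_url=http://shop.11st.co.kr/stores/522047/category

1.2 Collect the data the AJAX request needs:

Request URL: jump_url; form data to send: form_data; method: POST; request headers: headers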

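2. Analysing the page

2.1 Clicking "next page" changes neither the address-bar URL nor the page itself, which shows that the site loads the results via AJAX

method:StoreSearchListingAjax

Press F12, open the Network tab and filter by XHR. Clicking "next page" adds one new XHR entry: http://shop.11st.co.kr/storesAjax/StoreListingAjaxAction.tmall?method=StoreSearchListingAjax

2.2 Get the request URL and the request headers, which are needed later: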

jump_url:

http://shop.11st.co.kr/storesAjax/StoreListingAjaxAction.tmall?method=StoreSearchListingAjax

headers:

POST /storesAjax/StoreListingAjaxAction.tmall?method=StoreSearchListingAjax HTTP/1.1
Host: shop.11st.co.kr
Connection: keep-alive
Content-Length: 137
Accept: application/json, text/javascript, */*; q=0.01
Origin: http://shop.11st.co.kr
X-Requested-With: XMLHttpRequest
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36
Content-Type: application/x-www-form-urlencoded; charset=UTF-8
Referer: http://shop.11st.co.kr/stores/522047/category
Accept-Encoding: gzip, deflate
Accept-Language: zh-CN,zh;q=0.9
Cookie: WMONID=QQRH5vHq0a_; PCID=15879036685077771016115; XSRF-TOKEN=cbfcf578-5b14-5ed1-380f-308df58ad410; TD=TB_ACT_DATA%7C0%3AN%3A-1%3A0%3AN%3A-1; _ga=GA1.3.1915814488.1587903671; _gid=GA1.3.1451948013.1587903671; PCID_FRV=true; RAKE_SID=15879036717188511103499; RAKE_SID_XSITE=15879036717188511103499; TP=scrnChk%7CY%23TB_DATA_CHK%7CN%3AY%23GLOBAL_DOMESTIC_ACCESS%7CY; DMP_UID=(DMPC)4d9b4410-f650-4dc5-bb6c-6f7bb58d62dc; AUID=AUID_iET0kpU0K6EvJpF84MPHow; TT=GLOBAL_CHINESE_IP_YN%7CY%2311ST_EN_CURR%7CCNY%23GLOBAL_DELIVERY%7C222%23GLOBAL_CHARSET%7Czh; JSESSIONID=m1O2eCNosbWjDxfe6IhayxSVgTg2P8w1h_urNbRD-KFHFV8Gn5kP!-1198550641

Form data sent by the AJAX request:

searchKwd: 
storeId: 522047
storeNo: 522047
encSellerNo: 19wqwPhwf0bYTT5rhUwvVA==
sortCd: NP
filter: 
pageNo: 2
pageTypeCd: 02
trTypeCd: STP06
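
These fields travel in the request body as a URL-encoded string, which is what the `Content-Type: application/x-www-form-urlencoded` header announces. The sketch below builds that body from the captured values using only the standard library; `form_fields` and `build_body` are names made up for this illustration. With `pageNo` set to 2 the body is 137 characters long, matching the captured `Content-Length: 137`, and it grows once the page number has two digits, so that header is best left for the HTTP library to compute.

```python
from urllib.parse import urlencode

# Form fields captured from the browser, in the order they were sent
form_fields = [
    ("searchKwd", ""),
    ("storeId", "522047"),
    ("storeNo", "522047"),
    ("encSellerNo", "19wqwPhwf0bYTT5rhUwvVA=="),
    ("sortCd", "NP"),
    ("filter", ""),
    ("pageNo", "2"),
    ("pageTypeCd", "02"),
    ("trTypeCd", "STP06"),
]


def build_body(page_no):
    """Return the URL-encoded POST body for one result page (illustrative helper)."""
    fields = [(key, str(page_no) if key == "pageNo" else value)
              for key, value in form_fields]
    return urlencode(fields)


body = build_body(2)
print(body)
print(len(body))            # 137, the captured Content-Length
print(len(build_body(10)))  # 138, one character longer
```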

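2.3 Then identify the data needed from the response:

The response is JSON. The product entries sit in the productList array under data, and each entry carries its detail-page address in the prdDtlUrl field.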
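
As a rough sketch of that extraction, assuming the response uses the `data`, `productList`, `prdDtlUrl` layout that the code in section 3 relies on. The sample payload, its URLs and the `name` field are invented for illustration, and `extract_product_urls` is a hypothetical helper name.

```python
import json


def extract_product_urls(payload):
    """Return every prdDtlUrl under data -> productList, skipping entries without one."""
    product_list = payload.get("data", {}).get("productList", [])
    return [item["prdDtlUrl"] for item in product_list if "prdDtlUrl" in item]


# Invented sample payload; the real response contains many more fields
sample_text = (
    '{"data": {"productList": ['
    '{"prdDtlUrl": "http://example.com/product/1"}, '
    '{"name": "entry without a URL"}, '
    '{"prdDtlUrl": "http://example.com/product/2"}]}}'
)

urls = extract_product_urls(json.loads(sample_text))
print(urls)
```

Using `.get()` with defaults means an unexpected or empty response yields an empty list instead of a `KeyError`.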

3. Python code:

# -*- coding: utf-8 -*-
import math

import requests

# Headers taken from the browser's XHR request (see section 2.2)
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:28.0) Gecko/20100101 Firefox/28.0',
    'Accept': 'application/json, text/javascript, */*; q=0.01',
    'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
    'X-Requested-With': 'XMLHttpRequest'
}
# Form fields of the AJAX request; pageNo is filled in for each page
form_data = {
    'searchKwd': '',
    'storeId': '522047',
    'storeNo': '522047',
    'encSellerNo': '19wqwPhwf0bYTT5rhUwvVA==',
    'sortCd': 'NP',
    'filter': '',
    'pageNo': '',
    'pageTypeCd': '02',
    'trTypeCd': 'STP06',
}
first_url = "http://shop.11st.co.kr/stores/522047/category"
jump_url = "http://shop.11st.co.kr/storesAjax/StoreListingAjaxAction.tmall?method=StoreSearchListingAjax"

# Open the store page first so the session picks up the site's cookies
s = requests.Session()
s.get(first_url)
prdDtlUrl_list = []


def getProDtlUrl_list(s, prdDtlUrl_list):
    # 9301 items at 30 per page, rounded up to whole pages
    totalpage = math.ceil(9301 / 30)
    # for pageNo in range(1, totalpage + 1):  # all pages
    for pageNo in range(1, 3):  # test with the first 2 pages
        form_data['pageNo'] = str(pageNo)
        response = s.post(jump_url, data=form_data, headers=headers)
        # Parse the JSON body as data instead of running eval() on it
        dict_content = response.json()
        productList = dict_content['data']['productList']
        print(productList)
        for d in productList:
            prdDtlUrl_list.append(d['prdDtlUrl'])
        print(prdDtlUrl_list)


getProDtlUrl_list(s, prdDtlUrl_list)
print('Extracted product URLs =================')
for url in prdDtlUrl_list:
    print(url)
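
The page count in the script comes from dividing 9301 by 30. A formula of the form `int(total / per_page + 1)` gives the right answer for these numbers but overshoots by one page whenever the total divides evenly, whereas rounding up with `math.ceil` does not. The sketch below shows the difference; `page_count` is a name chosen for this illustration, and reading 9301 and 30 as item total and page size is an assumption based on the expression in the script.

```python
import math


def page_count(total_items, per_page):
    """Number of pages needed to show total_items at per_page items each."""
    return math.ceil(total_items / per_page)


print(page_count(9301, 30))   # 311
print(int(9300 / 30 + 1))     # 311, one page too many for 9300 items
print(page_count(9300, 30))   # 310
```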

4 Results:

An alternative using regular expressions:

import requests
import re


process = requests.Session()
url = 'http://shop.11st.co.kr/stores/522047/category'
url1 = 'http://shop.11st.co.kr/storesAjax/StoreListingAjaxAction.tmall?method=StoreSearchListingAjax'
headers = {
    'Accept': 'application/json, text/javascript, */*; q=0.01',
    'Accept-Encoding': 'gzip, deflate',
    'Accept-Language': 'zh-CN,zh;q=0.9',
    'Cache-Control': 'no-cache',
    'Connection': 'keep-alive',
    # Content-Length is omitted: requests computes it from the body for each page
    'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
    'Cookie': 'WMONID=zErTjIsNZ9l; PCID=15879009766040741912504; XSRF-TOKEN=2b308af8-e144-4c2f-216c-c6a9e24bad38; TD=TB_ACT_DATA%7C0%3AN%3A-1%3A0%3AN%3A-1; TP=scrnChk%7CY%23TB_DATA_CHK%7CN%3AY; _ga=GA1.3.406950129.1587900979; _gid=GA1.3.567725348.1587900979; PCID_FRV=true; RAKE_SID=15879009798878793472311; RAKE_SID_XSITE=15879009798878793472311; recopick_uid=41130255.1587901035216; plab.uid=753a5979-7ecb-491e-9bc2-12c09575be23; plab.h.11st_web=; _ascend_uid=3724501418_1587901035:1587901035331; DMP_UID=(DMPC)20a03a6b-3f3c-4e03-b9db-e1eeebe83f29; AUID=AUID_KkOd8Kfs5H-UandUvp-ohg; JSESSIONID=JXi2XhQJcEyhR3CjZDzRtp5E5rk7o1YWd2xdNh9DJP_M1x_aSW_I!-1198550641',
    'Host': 'shop.11st.co.kr',
    'Origin': 'http://shop.11st.co.kr',
    'Pragma': 'no-cache',
    'X-Requested-With': 'XMLHttpRequest',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36',
    'Referer': 'http://shop.11st.co.kr/stores/522047/category',
}
data = {
    'searchKwd': '',
    'storeId': '522047',
    'storeNo': '522047',
    'encSellerNo': '19wqwPhwf0bYTT5rhUwvVA==',
    'sortCd': 'NP',
    'filter': '',
    'pageTypeCd': '02',
    'trTypeCd': 'STP06',
}
# pageNo is added for each request inside get_urls()


# Print the product URLs found on each result page
def get_urls():
    # Load the store page first so the session receives the site's cookies
    response = process.get(url)
    response.encoding = 'utf-8'
    # The prefix stops at "&" because every match already starts with "prdNo="
    prefix = 'http://www.11st.co.kr/product/SellerProductDetail.tmall?method=getSellerProductDetail&'
    for i in range(2, 301):
        data['pageNo'] = str(i)
        response = process.post(url1, data=data, headers=headers)
        urls = re.findall(r'prdNo=.*?&trTypeCd=STP06', response.text)
        for match in urls:
            print(prefix + match)


get_urls()
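
To see what the regular expression matches without contacting the site, here is a small sketch run against an invented snippet of response text (`sample_text` and its product numbers are made up). Each match begins with `prdNo=`, so the prefix joined in front of it should stop at the `&` before that parameter; otherwise the parameter name appears twice in the resulting URL.

```python
import re

# Invented response fragment for illustration; real responses are much longer
sample_text = (
    '{"prdDtlUrl":"http://www.11st.co.kr/product/SellerProductDetail.tmall'
    '?method=getSellerProductDetail&prdNo=111&trTypeCd=STP06"},'
    '{"prdDtlUrl":"http://www.11st.co.kr/product/SellerProductDetail.tmall'
    '?method=getSellerProductDetail&prdNo=222&trTypeCd=STP06"}'
)

PREFIX = ('http://www.11st.co.kr/product/SellerProductDetail.tmall'
          '?method=getSellerProductDetail&')

# Non-greedy match from each "prdNo=" up to the nearest "&trTypeCd=STP06"
matches = re.findall(r'prdNo=.*?&trTypeCd=STP06', sample_text)
product_urls = [PREFIX + match for match in matches]

for product_url in product_urls:
    print(product_url)
```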

 

 

 
