requests_html Chinese documentation: requestshtml cannot find page elements

Latest recommended article published on 2023-01-14 12:34:00

爱生活的马克君

Tags: requests_html Chinese documentation

After logging in with the requests module and BeautifulSoup, you can use the link I suggested in the comments to parse the data you need out of the JSON it returns. The script below should give you the name, quantity, price, and product-page link of each item. It returns only 21 products as written. The JSON content includes a pagination option, and you can use it to fetch all the products.

import json

import requests
from bs4 import BeautifulSoup

baseurl = 'https://www.instacart.com/store/'
data_url = "https://www.instacart.com/v3/retailers/159/module_data/dynamic_item_lists/cart_starters/storefront_canonical?origin_source_type=store_root_department&tracking.page_view_id=b974d56d-eaa4-4ce2-9474-ada4723fc7dc&source=web&cache_key=df535d-6863-f-1cd&per=30"

# Login payload; the CSRF token is filled in after the first request.
data = {"user": {"email": "alexanderjbusch@gmail.com", "password": "password"},
        "authenticity_token": ""}

headers = {
    'user-agent': 'Mozilla/5.0',
    'x-requested-with': 'XMLHttpRequest'
}

with requests.Session() as s:
    # Load the home page and read the CSRF token from its meta tag.
    res = s.get('https://www.instacart.com/', headers={'user-agent': 'Mozilla/5.0'})
    soup = BeautifulSoup(res.text, 'lxml')
    token = soup.select_one("[name='csrf-token']").get('content')
    data["authenticity_token"] = token

    # Log in, then request the JSON endpoint within the same session.
    s.post("https://www.instacart.com/accounts/login", json=data, headers=headers)
    resp = s.get(data_url, headers=headers)

    # Each entry in module_data.items describes one product.
    for item in resp.json()['module_data']['items']:
        name = item['name']
        quantity = item['size']
        price = item['pricing']['price']
        product_page = baseurl + item['click_action']['data']['container']['path']
        print(f'{name}\n{quantity}\n{price}\n{product_page}\n')

Partial output: [not preserved in this copy]
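The answer mentions that the JSON offers pagination but does not show how to use it. The following is only a rough sketch of one way to walk the pages. It assumes the endpoint accepts a `page` query parameter alongside the existing `per` parameter and that a page past the end returns an empty `items` list. Neither assumption is confirmed by the source, so check the actual JSON for its pagination fields before relying on this. The names `with_page`, `fetch_all_items`, `fetch_json`, and `max_pages` are hypothetical helpers introduced here for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def with_page(url, page):
    """Return url with a `page` query parameter set (parameter name is an assumption)."""
    parts = urlparse(url)
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    query["page"] = str(page)
    return urlunparse(parts._replace(query=urlencode(query)))


def fetch_all_items(fetch_json, data_url, max_pages=50):
    """Collect items page by page until a page comes back empty.

    fetch_json is any callable that takes a URL and returns the decoded
    JSON body as a dict. max_pages is a safety cap against endless loops.
    """
    items = []
    for page in range(1, max_pages + 1):
        payload = fetch_json(with_page(data_url, page))
        # Same location the original script reads: module_data -> items.
        batch = payload.get("module_data", {}).get("items", [])
        if not batch:  # assumed stop condition: an empty page means no more data
            break
        items.extend(batch)
    return items
```

Inside the `with requests.Session() as s:` block of the script, after logging in, this could be called as `fetch_all_items(lambda url: s.get(url, headers=headers).json(), data_url)`. Passing the fetcher in as a callable keeps the paging logic separate from the session, so it can be tried against canned responses first.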
