Using XPath in a requests crawler

A requests crawler usually parses responses with BeautifulSoup or regular expressions, but XPath (through lxml) can be used as well. Usage is as follows:
1. Import:

from lxml import etree

2. Parsing:
2.1 Create the etree object

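# response is the object returned by requests.get(); pass its text, not the literal string 'response'
tree=etree.HTML(response.text)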
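To see what this step produces without sending a request, here is a minimal sketch that parses an inline HTML string. The markup in `html_text` is hypothetical and only stands in for the text of a real response.

```python
from lxml import etree

# Hypothetical HTML fragment standing in for response.text
html_text = '<div id="main"><ul><li>Python developer</li></ul></div>'

# etree.HTML parses the string and returns the root element of the tree
tree = etree.HTML(html_text)

# The HTML parser wraps fragments in <html><body>, so the root tag is "html"
root_tag = tree.tag
print(root_tag)

# Serialize the repaired tree back to a string to inspect it
print(etree.tostring(tree, encoding='unicode'))
```

Because the parser adds the missing `<html>` and `<body>` wrappers, XPath expressions that start with `//` still find the original elements.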

2.2 Query the etree object with XPath

work_list=tree.xpath('//*[@id="main"]/div/div[2]/ul')
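`xpath()` always returns a list, and what the list holds depends on how the expression ends. The sketch below shows the three common cases on hypothetical markup (the ids, links, and titles are made up for illustration).

```python
from lxml import etree

# Hypothetical markup, not the structure of any real site
html_text = '''
<div id="main">
  <ul>
    <li><a href="/job/1">Python developer</a></li>
    <li><a href="/job/2">Data engineer</a></li>
  </ul>
</div>
'''
tree = etree.HTML(html_text)

# Expression ends in an element: list of element objects
items = tree.xpath('//*[@id="main"]/ul/li')

# Expression ends in text(): list of strings
titles = tree.xpath('//*[@id="main"]/ul/li/a/text()')

# Expression ends in @attribute: list of attribute values
links = tree.xpath('//*[@id="main"]/ul/li/a/@href')

# No match: an empty list, not an exception
missing = tree.xpath('//*[@id="no-such-id"]')

print(len(items), titles, links, missing)
```

An empty list therefore usually means the expression did not match the HTML that was actually returned, which is worth checking before assuming the page has no data.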

A complete example:

import requests
from lxml import etree

# Fetch the response
url='https://www.zhipin.com/c100010000/?query=python&page=2&ka=page-2'
headers={
    'Cookie': 'sid=sem_pz_bdpc_dasou_title; __g=sem_pz_bdpc_dasou_title; __l=l=%2Fwww.zhipin.com%2F%3Fsid%3Dsem_pz_bdpc_dasou_title&r=https%3A%2F%2Fwww.baidu.com%2Fother.php%3Fsc.K60000aOlgZHDaBZv8QHqwYKQVi9d0wd5nv7u3Bp4Rf4AsBH5WQ_FNT3bZ-qnlXfjPYNcr3J1hfh3pn4oUIeq8Sa67_kWNLhMbnNqQux-URLfx1lrdC3m52cmUMzfFy_EEsx698eOkCguASCCb4XkpKSepqra6Gq8tgsbYY_Xqxk6knHXwQEJVibpdk9ClwHlAMFYfMkbgOV-XITJo9GZLWO8EFH.7D_NR2Ar5Od663rj6t8AGSPticrtXFBPrM-kt5QxIW94UhmLmry6S9wiGyAp7BEIu80.TLFWgv-b5HDkrfK1ThPGujYknHb0THY0IAYqmhq1TqpkkoB4vTL30ZN1ugFxIZ-suHYs0A7bgLw4TARqnsKLULFb5yFETL5y_Tp38IMPS0KzmLmqnfKdThkxpyfqnHR1njmsrH63r0KVINqGujYkPjfLP1DYr0KVgv-b5HDknHRYnWf40AdYTAkxpyfqnHczP1n0TZuxpyfqn0KGuAnqiD4a0ZKGujYY0APGujY4nfKWThnqPW6kP0%26ck%3D842.0.28.335.165.NaN.NaN.0%26dt%3D1643287197%26wd%3Dboss%25E7%259B%25B4%25E8%2581%2598%25E5%25AE%2598%25E7%25BD%2591%26tpl%3Dtpl_12826_26568_0%26l%3D1530609888%26us%3DlinkType%253D&g=%2Fwww.zhipin.com%2F%3Fsid%3Dsem_pz_bdpc_dasou_title&s=3&friend_source=0; Hm_lvt_194df3105ad7148dcf2b98a91b5e727a=1643287225; lastCity=100010000; toUrl=https%3A%2F%2Fwww.zhipin.com%2F; Hm_lpvt_194df3105ad7148dcf2b98a91b5e727a=1643288621; acw_tc=0a099d6e16432893325356865e01a2af12c51f69939635b2f9e20e41108fcb; __zp_sseed__=F/QGiAAzcyE+Ne3Cbi5cdjiUalxhVT1YUgrsdYguALg=; __zp_sname__=67d705e6; __zp_sts__=1643289332544; wd_guid=86b15c28-7f21-423f-9ba8-b70906a950ce; historyState=state; __c=1643287225; __a=66541452.1643287225..1643287225.10.1.10.10; __fid=4a82335ddeb8371b57f51fa9754e5bb9; __zp_stoken__=dc95daWsReXo3fEpGVkVpAFh2XiM2DXNcfxc4aVs6bVpIUmklZS93BX4fekxLMS9NJmRuemY6YAhrVV0NBUcoVnYBdkFaQDQhEBoMQjglUnhSAypZQWhoJkguB1kLAhEGZF1cDjU8EHR1RyE%3D',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36'
}
response=requests.get(url=url,headers=headers)
response.encoding='UTF-8'
# Parse the response
tree=etree.HTML(response.text)
work_list=tree.xpath('.//*[@id="main"]/div/div[2]/ul/li[1]/div/div[1]/div[1]/div/div[1]/span[1]/a/text()')
# Print each extracted title (the original printed an undefined name and shadowed the built-in list)
for work in work_list:
    print(work)
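A long absolute path copied from the browser matches only one exact position in the page. As an alternative sketch, each list item can be selected first and then queried with a relative expression. The class names `job-list` and `job-name` below are hypothetical and would need to be replaced with whatever the target page actually uses.

```python
from lxml import etree

# Hypothetical markup; the second item has no title link on purpose
html_text = '''
<ul class="job-list">
  <li><span class="job-name"><a>Python developer</a></span></li>
  <li><span class="job-name"></span></li>
</ul>
'''
tree = etree.HTML(html_text)

titles = []
for item in tree.xpath('//ul[@class="job-list"]/li'):
    # A leading "./" makes the query relative to this <li> element
    found = item.xpath('./span[@class="job-name"]/a/text()')
    # xpath() returns a list, so guard against an empty result
    titles.append(found[0] if found else None)

print(titles)
```

Querying per item keeps the fields of one entry together and makes a missing field visible as `None` instead of silently shifting the results.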

Note: set the Cookie header to the cookie from your own browser session; the value shown above will not work for you.
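One way to avoid pasting a personal cookie into the script is to read it from an environment variable. This is only a sketch: the helper `build_headers` and the variable name `CRAWLER_COOKIE` are assumptions introduced here, not part of requests or of the original example.

```python
import os

# User-Agent string taken from the example above
DEFAULT_UA = ('Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 '
              '(KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36')


def build_headers(cookie=None, env_var='CRAWLER_COOKIE'):
    """Build request headers; env_var is a hypothetical variable name."""
    headers = {'user-agent': DEFAULT_UA}
    # Prefer an explicit cookie, otherwise fall back to the environment
    cookie = cookie or os.environ.get(env_var)
    if cookie:
        headers['Cookie'] = cookie
    return headers


headers = build_headers('sid=example; lastCity=100010000')
print(headers['Cookie'])
```

The resulting dictionary can be passed as `headers=headers` to `requests.get`, exactly as in the example above.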
