[Help wanted] Zhilian Zhaopin job links scraped with Python are incomplete

Latest recommended article published 2023-05-21 19:03:46

在路上_LL



Category: Python Learning. Tags: web scraping, python

Article link: https://blog.csdn.net/weixin_42459561/article/details/119867051


Extracting the @href attribute

HTML

Partial Python code

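    import requests
    from lxml import etree

    # `url` and `headers_1` are defined earlier in the script (not shown here).
    url_lists = []
    html = requests.get(url, headers=headers_1)
    selector = etree.HTML(html.text)
    # Each matching <div> is one job card on the results page.
    infos = selector.xpath('//div[@class="joblist-box__item clearfix"]')

    for info in infos:
        # xpath() always returns a list, so every entry appended below is itself a list.
        hrefs = info.xpath('a/@href')  # renamed from `list` to avoid shadowing the built-in
        url_lists.append(hrefs)
        print(len(url_lists))
        job_name = info.xpath('./a/div[1]/div[1]/span[1]/span/text()')
        print(job_name)
        print('*' * 50)
    print(url_lists)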
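
A side note on the loop: `xpath('a/@href')` returns a list for every job card, so appending that result builds a list of lists. The sketch below shows one way to collect a flat list of link strings instead. The HTML fragment, the `example.com` URLs, and the helper name `collect_links` are all made up for illustration; only the class name and the XPath expressions come from the post.

```python
from lxml import etree

# Hypothetical HTML fragment standing in for the real results page.
SAMPLE_HTML = """
<html><body>
  <div class="joblist-box__item clearfix">
    <a href="http://example.com/job-1.htm?refcode=4019">Job 1</a>
  </div>
  <div class="joblist-box__item clearfix">
    <a href="http://example.com/job-2.htm?refcode=4019">Job 2</a>
  </div>
</body></html>
"""


def collect_links(html_text):
    """Return a flat list of href strings, one per matching <a> element."""
    selector = etree.HTML(html_text)
    cards = selector.xpath('//div[@class="joblist-box__item clearfix"]')
    links = []
    for card in cards:
        hrefs = card.xpath('a/@href')  # always a list, possibly empty
        links.extend(hrefs)            # extend() adds the strings, not the list
    return links


links = collect_links(SAMPLE_HTML)
print(links)
```

Using `extend` rather than `append` also copes with a card that has no direct `<a>` child, because an empty list simply adds nothing.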

Output from the scraping code

[['http://jobs.zhaopin.com/CC318353680J40161893401.htm?refcode=4019&srccode=&preactionid='], ['http://jobs.zhaopin.com/CC711995980J40155443811.htm?refcode=4019&srccode=&prea
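
The post does not say which part of the link is missing. If "incomplete" refers to the query parameters that appear with no value in the output (an assumption), the sketch below is one way to list them using only the standard library. The link is the first one from the output; the variable names are illustrative.

```python
from urllib.parse import urlsplit, parse_qs

# First link copied from the scraped output.
link = (
    "http://jobs.zhaopin.com/CC318353680J40161893401.htm"
    "?refcode=4019&srccode=&preactionid="
)

parts = urlsplit(link)

# keep_blank_values=True keeps parameters that are present but empty;
# by default parse_qs silently drops them.
params = parse_qs(parts.query, keep_blank_values=True)

# Names of the parameters whose only value is the empty string.
empty_params = [name for name, values in params.items() if values == [""]]

print(parts.path)
print(params)
print(empty_params)
```

If the same parameters carry values when the link is opened in a browser, they may be filled in by the page's JavaScript at click time, which `requests` does not execute. That is a hypothesis to check in the browser's developer tools, not something the post confirms.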