How to Scrape Resumes with Python: A Step-by-Step Guide to Downloading Resume Templates

zhouhongen

Last modified 2024-02-16 14:56:31

First published 2023-09-19 13:23:27

Article link: https://blog.csdn.net/zhouhongen/article/details/133024572

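import os
from urllib.parse import urljoin

import requests
from lxml import etree

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36 Edg/117.0.2045.9'
}
p = input("Enter the page number to scrape (must be >= 2):\n").strip()
if not p.isdigit() or int(p) < 2:
    raise SystemExit("The page number must be an integer >= 2")
url = 'https://sc.chinaz.com/tag_jianli/hulianwang_' + p + '.html'
resp = requests.get(url=url, headers=headers, timeout=10)
# The page is UTF-8, but requests may fall back to ISO-8859-1 and garble the Chinese titles.
# Setting the encoding explicitly has the same effect as .encode('iso-8859-1').decode('utf-8').
resp.encoding = 'utf-8'
items = etree.HTML(resp.text).xpath('//div[@class="sc_warp  mt20"]/div/div/div')
os.makedirs('简历', exist_ok=True)  # output folder ("简历" means "resume") must exist before saving
for s in items:
    # xpath() always returns a list; [0] takes out its first element
    img_alt_ = s.xpath('./a/img/@alt')[0] + '.zip'  # template title used as the file name
    d = s.xpath('./a/@href')[0]  # link to the template's detail page
    detail_url = urljoin(url, d)  # resolves relative and protocol-relative links
    save_path = os.path.join('简历', img_alt_)  # where the archive is stored locally
    detail = requests.get(url=detail_url, headers=headers, timeout=10).text
    # each <li> in the download box holds one download link; the first one is used
    links = etree.HTML(detail).xpath('//div[@class="down_wrap"]/div[2]/ul/li/a/@href')
    if not links:
        print(save_path + " skipped: no download link found")
        continue
    data = requests.get(url=urljoin(detail_url, links[0]), headers=headers, timeout=10).content
    with open(save_path, 'wb') as fp:
        fp.write(data)
    print(save_path + " downloaded")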
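The `.encode('iso-8859-1').decode('utf-8')` trick for garbled titles most likely works because, when the server's `Content-Type` header carries no charset, requests falls back to ISO-8859-1 for `text/*` responses and so misdecodes the UTF-8 bytes. ISO-8859-1 maps every byte to exactly one character, so re-encoding recovers the original bytes without loss. The standalone sketch below reproduces that round trip without any network access; the helper names `garble`/`repair` and the sample title are made up for illustration.

```python
def garble(text: str) -> str:
    """Simulate requests decoding UTF-8 bytes as ISO-8859-1 (mojibake)."""
    return text.encode('utf-8').decode('iso-8859-1')


def repair(garbled: str) -> str:
    """Undo the mojibake: each char maps back to one byte, then decode as UTF-8."""
    return garbled.encode('iso-8859-1').decode('utf-8')


title = '互联网简历模板'  # hypothetical alt text, for illustration only
broken = garble(title)     # unreadable text starting with 'äº'
print(repair(broken))      # → 互联网简历模板
assert repair(broken) == title
```

Setting `resp.encoding = 'utf-8'` (or `resp.encoding = resp.apparent_encoding`) before reading `resp.text` makes the repair step unnecessary.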

The downloaded templates are saved as .zip files.
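Because every download is written under a `.zip` name, it can help to confirm the file really is a zip archive before unpacking it: a blocked request or an error page would otherwise be saved with a `.zip` extension too. A minimal sketch using only the standard library `zipfile` module; the helper `extract_template` and the output-folder layout are hypothetical.

```python
import os
import tempfile
import zipfile


def extract_template(zip_path: str, out_dir: str) -> list[str]:
    """Unpack one downloaded template archive (hypothetical helper).

    Returns the member names, or raises ValueError if the file is not a
    valid zip (e.g. an HTML error page that was saved as .zip).
    """
    if not zipfile.is_zipfile(zip_path):
        raise ValueError(f"{zip_path} is not a zip archive")
    os.makedirs(out_dir, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(out_dir)
        return zf.namelist()


if __name__ == "__main__":
    # Build a tiny archive so the example runs without network access.
    with tempfile.TemporaryDirectory() as tmp:
        demo_zip = os.path.join(tmp, "demo.zip")
        with zipfile.ZipFile(demo_zip, "w") as zf:
            zf.writestr("resume.docx", b"placeholder bytes")
        print(extract_template(demo_zip, os.path.join(tmp, "out")))  # → ['resume.docx']
```

Applied to the scraper's output, this would be called once per file in the `简历` folder.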

That is the complete scraping code.
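One weak point of the script is that the template title from the `alt` attribute becomes the file name as-is. A title containing characters such as `/`, `:` or `?` would either point into a non-existent subfolder or be rejected by Windows. Below is a minimal sketch of a hypothetical `safe_filename` helper that replaces such characters before `.zip` is appended; the forbidden set is Windows' reserved characters plus ASCII control characters, which is an assumption about where the files will be saved.

```python
import re

# Characters Windows forbids in file names, plus ASCII control characters.
_FORBIDDEN = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def safe_filename(title: str, replacement: str = "_", default: str = "untitled") -> str:
    """Turn a scraped title into a usable file name (hypothetical helper)."""
    cleaned = _FORBIDDEN.sub(replacement, title)
    # Windows also rejects names that end in a dot or a space.
    cleaned = cleaned.strip().rstrip(". ")
    return cleaned or default


print(safe_filename('简历/模板: 互联网?'))  # → 简历_模板_ 互联网_
```

In the scraper, the file name would then be built as `safe_filename(title) + '.zip'`.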