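With some time to kill, I was browsing job sites and happened to notice what web-scraping positions pay. The salaries looked tempting, so today I decided to scrape the listings and analyze them.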
First, settle on the target site:
https://jobs.51job.com/pachongkaifa
1. Getting started
Open PyCharm, create a new file, import the required libraries, and add a typical set of request headers.
# Import requests (HTTP client) and lxml's etree (HTML parsing / XPath)
import requests
from lxml import etree

# URL of the first results page
url = "https://jobs.51job.com/pachongkaifa/p1/"

# Request headers copied from a browser session
headers = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
    # Note: requests can only decode "br" responses if a Brotli package is installed
    "Accept-Encoding": "gzip, deflate, br",
    "Accept-Language": "zh-CN,zh;q=0.9",
    "Connection": "keep-alive",
    "Cookie": "guid=7e8a970a750a4e74ce237e74ba72856b; partner=blog_csdn_net",
    "Host": "jobs.51job.com",
    "Sec-Fetch-Dest": "document",
    "Sec-Fetch-Mode": "navigate",
    "Sec-Fetch-Site": "none",
    "Sec-Fetch-User": "?1",
    "Upgrade-Insecure-Requests": "1",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.75 Safari/537.36"
}
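With the URL and headers defined, the next step is to send the request and turn the response into a tree that XPath can query. The following is only a minimal sketch of how that might look, not the code this post goes on to use: `fetch_tree` and `parse_html` are hypothetical helper names, and the timeout value and the encoding handling are assumptions.

```python
import requests
from lxml import etree


def parse_html(text):
    """Parse an HTML string into an lxml element tree."""
    return etree.HTML(text)


def fetch_tree(url, headers=None, timeout=10):
    """Download a page and return it as an lxml element tree.

    The timeout (in seconds) is an assumed value, not one from the post.
    """
    response = requests.get(url, headers=headers, timeout=timeout)
    # Raise an exception on 4xx/5xx instead of parsing an error page
    response.raise_for_status()
    # Assumption: let requests guess the encoding from the body, which
    # helps when a page does not declare its charset in the HTTP headers
    response.encoding = response.apparent_encoding
    return parse_html(response.text)


# Example call (performs a real network request, so it is left commented out):
# tree = fetch_tree("https://jobs.51job.com/pachongkaifa/p1/", headers=headers)
```

Keeping the parsing step in its own function makes it possible to try XPath expressions against a saved copy of the page without hitting the site again.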
2. Inspect the target page's markup. The fields we want (job title, company name, city, salary) all sit inside p tags, as shown below:
<p class="info"&