How to learn web scraping without a network: scraping Python job listings from 智联招聘 (Zhaopin) into an Excel sheet

Latest recommended article published on 2024-05-01 22:08:22

阿优乐扬


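Category: Python实战 (Python in Practice). Tags: learning web scraping without a network, 智联招聘 (Zhaopin), BeautifulSoup in practice, scraping local web pages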

Article link: https://blog.csdn.net/ayouleyang/article/details/96642789


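Can you practice web scraping without a network connection? Of course you can, but you first need to find somewhere that does have a connection, open the page you want to scrape, and locate the content you want to extract. Here I collect information about Python job postings on 智联招聘 (Zhaopin): job title, company name, salary, location, experience, education, company type, number of openings, company benefits, and so on.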

1. Steps before scraping

(1) Find a place with network access and open the page you want to scrape.
(2) Locate the content you need.
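(3) Save the page source to a local file. There is no need to save everything; it is best to keep only the part you need. The Python listings on Zhaopin span two pages, and I saved both together in G:/20190720_zhilianzhaopin.html. You can simply copy and paste the part you need.

(4) Create a new Notepad file and press Ctrl+V to paste the content you just copied.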

<html>
<head>
<title>Zhaopin Python job listings</title>
</head>
<body>
The copied content goes here; several pages can be combined into one HTML file

</body>
</html>
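
Steps (3) and (4) can also be scripted instead of done by hand in Notepad. The following is a minimal sketch, assuming Python 3 (the script in section 2 targets Python 2) and assuming the copied fragments are already available as strings. The helper name `build_local_page`, the sample fragments, and the added `<meta charset>` line are all hypothetical additions, not part of the original workflow.

```python
import io
import os
import tempfile


def build_local_page(fragments, title):
    """Wrap copied HTML fragments in a minimal page skeleton (hypothetical helper)."""
    body = "\n".join(fragments)
    return (
        "<html>\n<head>\n"
        # Assumption: declare UTF-8 so the saved Chinese text is decoded correctly
        '<meta charset="utf-8">\n'
        "<title>{0}</title>\n"
        "</head>\n<body>\n{1}\n</body>\n</html>\n"
    ).format(title, body)


# Two made-up fragments standing in for the two copied result pages
fragments = [
    '<div class="page">content copied from page 1</div>',
    '<div class="page">content copied from page 2</div>',
]
page = build_local_page(fragments, "Zhaopin Python job listings")

# Written to a temporary directory here; the article saves to
# G:/20190720_zhilianzhaopin.html instead
out_path = os.path.join(tempfile.mkdtemp(), "20190720_zhilianzhaopin.html")
with io.open(out_path, "w", encoding="utf-8") as f:
    f.write(page)
```

Keeping every copied page inside one `<body>` means the scraper only has to open a single file.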

(5) Now you can access the local file directly and practice scraping anywhere you like, with no network required!
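
To illustrate why no network is needed: a `file://` URL is resolved from the local disk. Below is a minimal sketch, assuming Python 3, where `urllib.request` takes the place of the `urllib2` module used in section 2. The temporary file and its contents are made up for the demonstration.

```python
import tempfile
from pathlib import Path
from urllib.request import urlopen

# Create a small stand-in page in a temporary directory
tmp_dir = tempfile.mkdtemp()
local_path = Path(tmp_dir) / "demo.html"
local_path.write_text(
    "<html><body><p>offline page</p></body></html>", encoding="utf-8"
)

# Path.as_uri() turns an absolute path into a file:// URL
url = local_path.resolve().as_uri()

# urlopen reads the file straight from disk; no network request is made
with urlopen(url) as response:
    html = response.read().decode("utf-8")

print(url)
print(html)
```

The resulting `html` string can be handed to `BeautifulSoup` in the same way as a page fetched over HTTP.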

2. Scraping code

#!/usr/bin/env python2
# -*- coding: utf-8 -*-
import urllib2
from bs4 import BeautifulSoup
import xlwt

url = 'file:///G:/20190720_zhilianzhaopin.html'  # path to the local web page
html = urllib2.urlopen(url).read()
soup = BeautifulSoup(html,"html.parser")
all_page=[]

# Scraping loop: each matching tag is one job listing
for tag in soup.find_all(attrs = {"class":"contentpile__content__wrapper__item clearfix"}):
    # Start from empty strings so a listing with a missing field does not
    # reuse the previous listing's value or raise a NameError
    gsmc = xzdy = dz = jl = xl = gsxz = zprs = gsfl = u''

    # Job title
    gzmc = tag.span.get('title')
    print u'Job title:',gzmc

    # Company name
    for d in tag.find_all(attrs = {"class":"contentpile__content__wrapper__item__info__box__cname__viplevel is_vipLevel"}):
        gsmc = d.get('alt')
        print u'Company name:',gsmc

    # Salary
    for p in tag.find_all(attrs = {"class":"contentpile__content__wrapper__item__info__box__job__saray"}):
        xzdy = p.get_text()
        print u'Salary:',xzdy

    # Company requirements: location, experience, education
    for ul in tag.find_all(attrs = {"class":"contentpile__content__wrapper__item__info__box__job__demand"}):
        items = ul.find_all('li')
        dz = items[0].get_text()
        jl = items[1].get_text().replace("\n","").replace(" ","")
        xl = items[-1].get_text()
        print u'Location:',dz
        print u'Experience:',jl
        print u'Education:',xl

    # Company type and number of openings
    for comdec in tag.find_all(attrs = {"class":"contentpile__content__wrapper__item__info__box__job__comdec"}):
        spans = comdec.find_all('span')
        gsxz = spans[0].get_text()
        zprs = spans[-1].get_text()
        print u'Company type:',gsxz
        print u'Openings:',zprs

    # Company benefits
    for welfare in tag.find_all(attrs = {"class":"contentpile__content__wrapper__item__info__box__welfare job_welfare"}):
        gsfl = welfare.get_text()
        print u'Benefits:',gsfl
        print " "

    page = [gzmc,gsmc,xzdy,dz,jl,xl,gsxz,zprs,gsfl]
    all_page.append(page)

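# Write the workbook once, after all listings have been collected
book = xlwt.Workbook(encoding='utf-8')
sheet = book.add_sheet('python job listings')
head = ['Job title','Company name','Salary','Location','Experience','Education','Company type','Openings','Benefits']
for h in range(len(head)):
    sheet.write(0,h,head[h])

# Data rows start at row 1, below the header row
for j, row in enumerate(all_page, 1):
    for k, data in enumerate(row):
        sheet.write(j,k,data)
# Unicode literal so the non-ASCII file name is passed through unchanged
book.save(u'D:/Python/智联招聘python就业公司情况.xls')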

Console output: [screenshot]
Excel output: [screenshot]
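
One detail of the scraping loop in section 2 is worth illustrating. When the `class` value passed to `find_all` contains a space, BeautifulSoup compares it against the tag's whole class string, so the class names must appear in exactly that order. The sketch below assumes Python 3 with `bs4` installed and uses a made-up snippet with invented class names. It contrasts the exact-string match with a CSS selector passed to `select`, which accepts the classes in any order.

```python
from bs4 import BeautifulSoup

# Made-up markup: the same two classes, written in a different order
html = """
<div class="item clearfix"><span title="Python Developer">Python Developer</span></div>
<div class="clearfix item"><span title="Data Engineer">Data Engineer</span></div>
"""
soup = BeautifulSoup(html, "html.parser")

# Exact string match on the class attribute: only the first div qualifies
exact = soup.find_all(attrs={"class": "item clearfix"})

# CSS selector: both classes are required, but their order does not matter
any_order = soup.select("div.item.clearfix")

exact_titles = [tag.span.get("title") for tag in exact]
any_order_titles = [tag.span.get("title") for tag in any_order]
print(exact_titles)
print(any_order_titles)
```

If a saved page ever lists its classes in a different order than the live site did, the `select` form is the more forgiving of the two.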
