[Repost] Python Crawler + Visual Analytics: A Job-Posting Scraping, Analysis and Recommendation System for Recruitment Sites

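The program is built mainly with a Python crawler, the Flask framework, HTML and JavaScript. It implements a visual job recommendation and analysis system that supports real-time discovery of job postings, recommendation and search, fast updates, the regional distribution of job types, and keyword share analysis.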

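Program modules

Job scope distribution

[Figure: job scope distribution]

Job distribution by region

[Figure: job distribution by region]

Skills required by position

[Figure: skills required by position]

Recruitment statistics by position

[Figure: recruitment statistics by position]

Recruitment keyword analysis

[Figure: recruitment keyword analysis]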
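
The post does not show how the keyword shares are computed. As a minimal sketch, assuming each posting has a free-text description and that the keywords of interest are known in advance, the share of postings mentioning each keyword could be calculated as follows. The function name `keyword_share` and the sample data are hypothetical and are not taken from the original project.

```python
from collections import Counter


def keyword_share(descriptions, keywords):
    """Return {keyword: fraction of descriptions that mention it}.

    Matching is a case-insensitive substring test, which is an
    illustrative simplification rather than the author's method.
    """
    total = len(descriptions)
    if total == 0:
        return {keyword: 0.0 for keyword in keywords}
    hits = Counter()
    for text in descriptions:
        lowered = text.lower()
        for keyword in keywords:
            if keyword.lower() in lowered:
                hits[keyword] += 1
    return {keyword: hits[keyword] / total for keyword in keywords}


if __name__ == "__main__":
    sample = ["Python and MySQL", "python, Flask", "Java"]
    for keyword, share in keyword_share(sample, ["python", "mysql", "flask"]).items():
        print("{}: {:.0%}".format(keyword, share))
```

Substring matching is cheap but coarse: a keyword such as "java" would also match "javascript", so a real analysis would probably tokenize the descriptions first.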

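Source code link

Python crawler design

In the crawler module of this graduation project, 51Job is the main data source. The Python requests module is used to collect and deduplicate the site's job postings, seed URLs are filtered dynamically, and the results are written to a MySQL database, which completes the collection and analysis of job data.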
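
The deduplication and seed-URL filtering step is not shown in the excerpt. One way it might look, as a sketch under stated assumptions: URLs are normalized by dropping the query string and fragment, only links on one allowed host are kept, and a set remembers what has already been seen. The helper names `normalize_url` and `filter_seed_urls` and the `example.com` hosts are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url):
    """Drop the query string and fragment so equivalent links compare equal."""
    parts = urlsplit(url.strip())
    return urlunsplit((parts.scheme, parts.netloc.lower(), parts.path, "", ""))


def filter_seed_urls(urls, allowed_host, seen=None):
    """Return the URLs on allowed_host that have not been seen yet, in input order.

    `seen` can be passed in and reused across calls so that later pages
    do not re-queue links found on earlier pages.
    """
    seen = set() if seen is None else seen
    fresh = []
    for url in urls:
        normalized = normalize_url(url)
        host = urlsplit(normalized).netloc
        if host != allowed_host or normalized in seen:
            continue
        seen.add(normalized)
        fresh.append(normalized)
    return fresh


if __name__ == "__main__":
    candidates = [
        "https://jobs.example.com/a/1.html?s=01",
        "https://jobs.example.com/a/1.html?s=02",  # same page, different query
        "https://other.example.com/x.html",        # different host, dropped
    ]
    print(filter_seed_urls(candidates, "jobs.example.com"))
```

Whether the query string can safely be dropped depends on the target site, so the normalization rule is the part most likely to need adjusting.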

Crawler implementation

Part of the core code

 
import csv
import os
import time
from queue import Queue

import pymysql
import requests
from bs4 import BeautifulSoup
from gevent.pool import Pool  # assumption: the original comments mention a coroutine pool

# START_URL, HEADERS, POOL_MAXSIZE and logger are defined elsewhere in the project.


class HubTaskWorkSpider:
    """
    Spider for the 51job site.
    """

    def __init__(self):
        self.count = 1  # index of the record currently being crawled
        self.company = []
        self.desc_url_queue = Queue()  # queue of job-detail URLs
        self.pool = Pool(POOL_MAXSIZE)  # worker pool, capped at POOL_MAXSIZE

    def work_spider(self):
        """
        Crawler entry point.
        """
        urls = [START_URL.format(p) for p in range(1, 16)]
        for page, url in enumerate(urls, start=1):
            logger.info("Crawling page {}".format(page))
            html = requests.get(url, headers=HEADERS).content.decode("gbk")
            bs = BeautifulSoup(html, "lxml").find("div", class_="dw_table").find_all(
                "div", class_="el"
            )
            for b in bs:
                try:
                    href, post = b.find("a")["href"], b.find("a")["title"]
                    locate = b.find("span", class_="t3").text
                    salary = b.find("span", class_="t4").text
                    item = {
                        "href": href, "post": post, "locate": locate, "salary": salary
                    }
                    self.desc_url_queue.put(href)  # enqueue the job-detail link
                    self.company.append(item)
                except Exception:
                    # rows missing a link, title or field are skipped
                    pass
        # log the queue length, i.e. the number of job-detail URLs
        logger.info("Queue length is {}".format(self.desc_url_queue.qsize()))

    @staticmethod
    def insert_into_db():
        """
        Insert the data into the database.

        create table jobpost(
            j_salary float(3, 1),
            j_locate text,
            j_post text
        );
        """
        conn = pymysql.connect(
            host="****",
            port=****,  # masked in the original; must be an integer port
            user="root",
            password="****",
            db="AAAA",
            charset="utf8",
        )
        cur = conn.cursor()
        with open(os.path.join("data", "post_salary.csv"), "r", encoding="utf-8") as f:
            f_csv = csv.reader(f)
            sql = "insert into jobpost(j_salary, j_locate, j_post) values(%s, %s, %s)"
            for row in f_csv:
                value = (row[0], row[1], row[2])
                try:
                    cur.execute(sql, value)
                    conn.commit()
                except Exception as e:
                    logger.error(e)
        cur.close()
        conn.close()

    def run(self):
        """
        Crawl the data with multiple workers.
        """
        self.work_spider()
        # post_require (the detail-page worker) is not shown in this excerpt
        self.execute_more_tasks(self.post_require)
        self.desc_url_queue.join()  # block the main thread until the queue is drained

    def execute_more_tasks(self, target):
        """
        Submit request tasks to the pool. Parsing and storage, which are
        slow, could be given their own queues to maximize throughput.

        :param target: task function; POOL_MAXSIZE copies of it are started
        """
        for i in range(POOL_MAXSIZE):
            self.pool.apply_async(target)


if __name__ == "__main__":
    spider = HubTaskWorkSpider()
    start = time.time()
    spider.run()
    logger.info("Total time: {} seconds".format(time.time() - start))
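
The worker passed to `execute_more_tasks` is not included in the excerpt. Because `run` blocks on `desc_url_queue.join()`, each worker has to call `task_done()` once per URL it takes, otherwise the join never returns. The sketch below illustrates that pattern with the standard library's `threading` in place of the pool used above; `drain_queue`, `run_workers` and the `handle` callback are hypothetical names, and the callback stands in for fetching and parsing a detail page.

```python
import threading
from queue import Empty, Queue


def drain_queue(url_queue, handle, results, lock):
    """Worker loop: take URLs until the queue is empty, marking each one done."""
    while True:
        try:
            url = url_queue.get_nowait()
        except Empty:
            return
        try:
            value = handle(url)
            with lock:
                results.append(value)
        finally:
            # Without task_done(), Queue.join() in the caller would block forever.
            url_queue.task_done()


def run_workers(urls, handle, worker_count=4):
    """Process every URL with `handle` on worker_count threads and collect the results."""
    url_queue = Queue()
    for url in urls:
        url_queue.put(url)
    results, lock = [], threading.Lock()
    threads = [
        threading.Thread(target=drain_queue, args=(url_queue, handle, results, lock))
        for _ in range(worker_count)
    ]
    for thread in threads:
        thread.start()
    url_queue.join()  # returns once every queued URL has been marked done
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    # str.upper stands in for "download and parse the detail page".
    print(sorted(run_workers(["a", "b", "c"], str.upper, worker_count=2)))
```

Results arrive in completion order rather than input order, so anything that depends on ordering should carry its own key.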

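The system as a whole is not difficult. It consists of three main steps: collecting job-posting data, organizing the dimensions used for statistical analysis, and using ECharts charts for dynamic display and recommendation. The system is written in Python. The development tools are PyCharm 2022, Visual Studio Code, the online UML tool ProcessOn and MySQL 5.7, and the plugins include ReSharper and SQL Prompt.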
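
The second and third steps can be illustrated with the regional distribution. The sketch below counts postings per city and emits the list of `name`/`value` objects that an ECharts pie series takes as its `data`. It assumes the `locate` field holds a city, optionally followed by `-` and a district. That format, the function name `region_distribution` and the sample records are assumptions, not details from the original project.

```python
import json
from collections import Counter


def region_distribution(items):
    """Count postings per city, using the part of 'locate' before the first '-'."""
    counts = Counter(
        item["locate"].split("-")[0].strip()
        for item in items
        if item.get("locate")
    )
    # most_common() orders cities from the largest count to the smallest.
    return [{"name": city, "value": count} for city, count in counts.most_common()]


if __name__ == "__main__":
    sample = [
        {"locate": "Shanghai-Pudong"},
        {"locate": "Beijing"},
        {"locate": "Shanghai"},
        {"locate": ""},  # postings without a location are ignored
    ]
    # ensure_ascii=False keeps Chinese city names readable in the JSON output.
    print(json.dumps(region_distribution(sample), ensure_ascii=False))
```

A Flask view could return this list as JSON for the front end to pass to the chart's series data.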