Internet Job Posting Data: Analysis, Processing and Visualization


dfesge


Article link: https://blog.csdn.net/qq_44245656/article/details/112103869


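This post analyzes Internet job-posting data for Python and Java positions, covering the number of positions, salary ranges, industry distribution and demand by city. Using data preprocessing and statistical analysis, it examines how work experience, education and company size relate to salary, and how city and company size affect salary and job opportunities. These relationships are also presented with web-based visualizations.

(Summary generated automatically by CSDN.)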

Data Analysis and Processing

Data Preprocessing

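import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

plt.rcParams['font.sans-serif'] = ['SimHei']  # font that can render Chinese characters
plt.rcParams['axes.unicode_minus'] = False    # render the minus sign correctly with that font

# Raw string so the backslashes in the Windows path are not read as escape sequences;
# reading .xls files with pandas requires the xlrd package
df = pd.read_excel(r'D:\大学\2020数据处理综合实训\数据处理综合实训\jobs.xls')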

Data Analysis

Python

df1=df[df['job']=='python']
df1

	id	positionID	positionName	longitude	latitude	workYear	education	salary	city	jobNature	...	industryField	companyShortName	companySize	companyLabelList	positionAdvantage	label_2	label_3	label_4	job	district
0	1	1	python	113.264434	23.129162	1-3年	本科	10-15K	广州	全职	...	消费生活	省省回头车	150-500人	“五险一金年底双薪”	Golang	GO	NaN	NaN	python	广州
1	2	2	python	113.264434	23.129162	3-5年	本科	15-25K	广州	全职	...	消费生活	省省回头车	150-500人	“五险一金年底双薪”	Golang	GO	NaN	NaN	python	广州
2	3	3	Python开发工程师	113.264434	23.129162	1-3年	本科	15-25K	广州	全职	...	移动互联网	悦谦科技	50-150人	“双休，发展空间大，团队氛围好，扁平化管理”	金融	Python	数据挖掘	图像算法	python	广州
3	4	4	Python开发工程师	113.264434	23.129162	3-5年	本科	15-25K	广州	全职	...	移动互联网	悦谦科技	50-150人	“双休扁平化管理发展平台好团队氛围好”	移动互联网	互联网金融	Python	NaN	python	广州
4	5	5	python开发工程师	113.264434	23.129162	1-3年	大专	10-15K	广州	全职	...	移动互联网	广州游爱	500-2000人	“双休,五险一金,包三餐,年底双薪”	后端	Python	NaN	NaN	python	广州
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
38210	38211	38211	高级Python开发工程师	121.473701	31.230416	3-5年	本科	25K以上	上海	全职	...	企业服务	CloudChef	50-150人	“带薪年假,五险一金,团建旅游,补充公积金”	Python	Linux/Unix	MySQL	云计算	python	上海
38211	38212	38212	中级Python开发工程师	121.473701	31.230416	3-5年	本科	15-25K	上海	全职	...	移动互联网	NextTao 互道信息	50-150人	“技术氛围浓郁团队氛围轻松发展空间大”	新零售	企业服务	后端	分布式	python	上海
38212	38213	38213	Python高级开发工程师	121.473701	31.230416	3-5年	本科	25K以上	上海	全职	...	移动互联网	NextTao 互道信息	50-150人	“互联网零售革命的推动者”	新零售	Python	NaN	NaN	python	上海
38213	38214	38214	Python开发工程师	120.155070	30.274084	3-5年	本科	15-25K	杭州	全职	...	移动互联网	智云健康	500-2000人	“前景行业，待遇丰厚”	python爬虫	NaN	NaN	NaN	python	杭州
38214	38215	38215	Python开发工程师（兼职）	113.264434	23.129162	应届	本科	5K以下	广州	兼职	...	移动互联网	微宽信息	15-50人	“兼职”	python爬虫	NaN	NaN	NaN	python	广州

38215 rows × 23 columns

# df1 is a slice of df: take an explicit copy so the assignment below
# modifies df1 itself and does not raise SettingWithCopyWarning
df1 = df1.copy()
# Strip stray spaces so ' 不限', '不限 ' and ' 应届' merge with '不限' and '应届'
df1['workYear'] = df1['workYear'].str.strip()
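Stray spaces are not limited to `workYear`: the company-size counts further down show almost every `companySize` label with a leading space as well. A minimal sketch, assuming the goal is simply to trim surrounding whitespace in every non-numeric column, is below; `strip_text_columns` is a hypothetical helper name, not part of the original notebook.

```python
import pandas as pd


def strip_text_columns(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `frame` with leading/trailing whitespace removed from
    every string cell in non-numeric columns; other cells (NaN, numbers) are kept."""
    out = frame.copy()
    for col in out.columns:
        if pd.api.types.is_numeric_dtype(out[col]):
            continue  # numeric columns cannot contain padded text
        out[col] = out[col].map(lambda v: v.strip() if isinstance(v, str) else v)
    return out
```

Applied as `df1 = strip_text_columns(df1)`, it also removes the leading space from labels such as ' 150-500人'.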

Number of Python positions by city

citys=df1['city'].value_counts()
citys

北京    7341
上海    6391
深圳    5519
成都    4895
广州    3889
      ... 
盐城       1
贵港       1
钦州       1
南通       1
日照       1
Name: city, Length: 70, dtype: int64

plt.figure(figsize=(15,10))
citys=citys[citys.values>300]  # keep only cities with more than 300 postings
citys_num=len(citys)
plt.barh(range(citys_num),citys.values,alpha=0.8)
plt.yticks(range(citys_num),list(citys.index))
plt.title('Python岗位按照城市数量统计')
for x,y in enumerate(citys):
    plt.text(y + 0.2, x - 0.1, '%s' % y)  # write the count at the end of each bar
plt.show()
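The same bar-chart code is repeated for every count in this post. A hypothetical helper, `plot_counts`, that wraps the pattern with matplotlib's object-oriented API is sketched here as a refactoring suggestion; it is not part of the original notebook.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_counts(counts: pd.Series, title: str, min_count: int = 0,
                text_offset: float = 0.2):
    """Horizontal bar chart of a value_counts() result with the count printed
    at the end of each bar. Categories whose count is not above `min_count`
    are dropped, like the `citys.values > 300` filter above."""
    counts = counts[counts > min_count]
    fig, ax = plt.subplots(figsize=(15, 10))
    positions = list(range(len(counts)))
    ax.barh(positions, counts.values, alpha=0.8)
    ax.set_yticks(positions)
    ax.set_yticklabels(list(counts.index))
    ax.set_title(title)
    for x, y in zip(positions, counts.values):
        ax.text(y + text_offset, x - 0.1, str(y))  # label each bar with its count
    return fig, ax
```

For example, `plot_counts(df1['city'].value_counts(), 'Python岗位按照城市数量统计', min_count=300)` followed by `plt.show()`; the later charts would pass `text_offset=20`.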

Number of Python positions by salary range

Fare=df1['salary'].value_counts()
Fare

15-25K    14652
10-15K    11322
25K以上      6929
5-10K      4070
5K以下       1242
Name: salary, dtype: int64
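The salary column holds band labels rather than numbers, which limits what can be computed from it. A sketch of turning labels like the five above into numeric bounds follows. It assumes 'K' means thousand yuan per month and that '以上' (and above) and '以下' (below) are the only open-ended forms; open upper bounds are left as NaN rather than guessed. `parse_salary` and `salary_bounds` are hypothetical helpers.

```python
import math
import re

import pandas as pd

# Assumed label formats: "15-25K", "25K以上" (25K and above), "5K以下" (below 5K);
# K is taken to mean thousand yuan per month.
_RANGE = re.compile(r"^\s*(\d+)\s*-\s*(\d+)\s*K\s*$", re.IGNORECASE)
_ABOVE = re.compile(r"^\s*(\d+)\s*K\s*以上\s*$", re.IGNORECASE)
_BELOW = re.compile(r"^\s*(\d+)\s*K\s*以下\s*$", re.IGNORECASE)


def parse_salary(label):
    """Return (low, high) in K for one salary label; unknown bounds are NaN."""
    if not isinstance(label, str):
        return (math.nan, math.nan)
    m = _RANGE.match(label)
    if m:
        return (float(m.group(1)), float(m.group(2)))
    m = _ABOVE.match(label)
    if m:
        return (float(m.group(1)), math.nan)  # open upper bound is not guessed
    m = _BELOW.match(label)
    if m:
        return (0.0, float(m.group(1)))
    return (math.nan, math.nan)  # unrecognised label


def salary_bounds(salaries: pd.Series) -> pd.DataFrame:
    """Split a column of salary labels into numeric salary_low/salary_high columns."""
    pairs = [parse_salary(s) for s in salaries]
    return pd.DataFrame(pairs, columns=["salary_low", "salary_high"], index=salaries.index)
```

`df1 = df1.join(salary_bounds(df1['salary']))` would add the two numeric columns next to the original labels.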

plt.figure(figsize=(15,10))
Fare_num=len(Fare)
plt.barh(range(Fare_num),Fare.values,alpha=0.8)
plt.yticks(range(Fare_num),list(Fare.index))
plt.title('Python岗位按照薪水范围数量统计')
for x,y in enumerate(Fare):
    plt.text(y + 0.2, x - 0.1, '%s' % y)
plt.show()

Number of Python positions by industry

industryField=df1['industryField'].value_counts()
industryField

移动互联网        15145
企业服务          3624
数据服务          2655
金融            2449
电商            2124
信息安全          1950
文娱            1522
消费生活          1161
人工智能          1087
社交             958
游戏             900
教育             650
医疗             578
其他             544
硬件             544
通讯电子           497
软件开发           388
旅游             209
电子商务           165
物流             164
体育             154
工具             126
汽车             113
大数据             87
广告营销            81
房产家居            72
不限              62
区块链             56
电商、广告营销         53
物联网             40
企业服务、软件开发       13
金融、企业服务         11
数据服务、软件开发       10
物联网、软件开发         4
电商、社交            2
金融、软件开发          2
信息安全、数据服务        2
物联网、教育           2
软件开发、人工智能        1
硬件、通讯电子          1
企业服务、数据服务        1
人工智能、其他          1
社交、软件开发          1
消费生活、电商          1
电商、企业服务          1
数据服务、教育          1
信息安全、人工智能        1
金融、电商            1
软件开发、其他          1
Name: industryField, dtype: int64
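Some `industryField` values combine two fields joined by '、' (for example '电商、广告营销'), so they appear above as separate categories. Assuming '、' is the only separator used, each combined value can be split so that every field it lists is counted once; note that the totals then exceed the number of postings. `count_multi_valued` is a hypothetical helper.

```python
import pandas as pd


def count_multi_valued(values: pd.Series, sep: str = "、") -> pd.Series:
    """Count categories when a cell may list several of them joined by `sep`
    (e.g. "电商、广告营销"); each listed category is counted once per row."""
    return (
        values.dropna()
        .astype(str)
        .str.split(sep)   # one list of fields per row
        .explode()        # one row per field
        .str.strip()
        .value_counts()
    )
```

Usage: `count_multi_valued(df1['industryField'])`.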

plt.figure(figsize=(15,10))
industryField=industryField[industryField.values>1000]
industryField_num=len(industryField)
plt.barh(range(industryField_num),industryField.values,alpha=0.8)
plt.yticks(range(industryField_num),list(industryField.index))
plt.title('Python岗位按照行业领域数量统计')
for x,y in enumerate(industryField):
    plt.text(y + 20, x - 0.1, '%s' % y)
plt.show()

Number of Python positions by company size

companySize=df1['companySize'].value_counts()
companySize

 150-500人      9835
 50-150人       8433
 2000人以上       8064
 500-2000人     5926
 15-50人        4925
 少于15人         1017
 少于50人            9
 2000-5000人       3
5679              2
 1000-9999人       1
Name: companySize, dtype: int64
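These counts mix several problems: every label except `5679` carries a leading space, `5679` is not a size band at all, and three rare labels (`少于50人`, `2000-5000人`, `1000-9999人`) overlap the standard bands. Because `value_counts()` sorts by frequency, the bars are also not in size order. The sketch below strips the spaces and converts the column to an ordered categorical. The six-band order is an assumption read off the labels, and treating the irregular labels as missing (rather than mapping them to a nearby band) is a judgment call.

```python
import pandas as pd

# Assumed ordering of the six bands that account for almost all rows.
SIZE_ORDER = ["少于15人", "15-50人", "50-150人", "150-500人", "500-2000人", "2000人以上"]


def clean_company_size(sizes: pd.Series) -> pd.Series:
    """Strip stray spaces and return an ordered categorical; any label outside
    SIZE_ORDER (e.g. the bare number 5679) becomes NaN."""
    stripped = sizes.map(lambda v: v.strip() if isinstance(v, str) else v)
    return pd.Series(
        pd.Categorical(stripped, categories=SIZE_ORDER, ordered=True),
        index=sizes.index,
        name=sizes.name,
    )
```

`clean_company_size(df1['companySize']).value_counts(sort=False)` then returns the counts in band order, from smallest to largest company.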

plt.figure(figsize=(15,10))
#companySize=companySize[companySize.values>1000]
companySize_num=len(companySize)
plt.barh(range(companySize_num),companySize.values,alpha=0.8)
plt.yticks(range(companySize_num),list(companySize.index))
plt.title('Python岗位按照公司规模数量统计')
for x,y in enumerate(companySize):
    plt.text(y + 20, x - 0.1, '%s' % y)
plt.show()


Number of Python positions by education requirement

education=df1['education'].value_counts()
education

本科    31142
大专     3706
不限     2327
硕士     1009
博士       31
Name: education, dtype: int64

plt.figure(figsize=(15,10))
education_num=len(education)
plt.barh(range(education_num),education.values,alpha=0.8)
plt.yticks(range(education_num),list(education.index))
plt.title('Python岗位按照学历数量统计')
for x,y in enumerate(education):
    plt.text(y + 20, x - 0.1, '%s' % y)
plt.show()


Number of Python positions by required work experience

workYear=df1['workYear'].value_counts()
workYear

3-5年     18069
1-3年     10624
5-10年     3632
不限        3618
应届        2104
1年以下       105
10年以上       63
Name: workYear, dtype: int64

plt.figure(figsize=(15,10))
workYear_num=len(workYear)
plt.barh(range(workYear_num),workYear.values,alpha=0.8)
plt.yticks(range(workYear_num),list(workYear.index))
plt.title('Python岗位按照工作年限数量统计')
for x,y in enumerate(workYear):
    plt.text(y + 20, x - 0.1, '%s' % y)
plt.show()


Number of Python positions by position label

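plt.figure(figsize=(15,10))
# In this dataset the first position label appears to be stored in positionAdvantage,
# with further labels in label_2 to label_4
label_1=df1['positionAdvantage'].value_counts()
label_2=df1['label_2'].value_counts()
label_3=df1['label_3'].value_counts()
label_4=df1['label_4'].value_counts()
# add(..., fill_value=0) keeps labels missing from some columns;
# a plain + would turn them into NaN and drop them from the chart
label=label_1.add(label_2,fill_value=0).add(label_3,fill_value=0).add(label_4,fill_value=0).astype(int)
label=label[label.values>500]
label=label.sort_values()
label_num=len(label)
plt.barh(range(label_num),label.values,alpha=0.8)
plt.yticks(range(label_num),list(label.index))
plt.title('Python岗位按照岗位标签数量统计')
for x,y in enumerate(label):
    plt.text(y + 0.2, x - 0.1, '%s' % y)
plt.show()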
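Summing several `value_counts()` Series with a plain `+` aligns them on the label index, and any label missing from even one of the columns becomes NaN. One way to avoid that is to pool the label columns before counting, since `value_counts()` ignores NaN cells. `count_labels` is a hypothetical helper; the column names are the ones used in this post.

```python
import pandas as pd


def count_labels(frame: pd.DataFrame, columns) -> pd.Series:
    """Count every label across several label columns at once. The columns are
    stacked end to end, so a label only needs to appear in one of them; empty
    (NaN) cells are skipped by value_counts()."""
    pooled = pd.concat([frame[c] for c in columns], ignore_index=True)
    return pooled.value_counts()
```

Usage: `count_labels(df1, ['positionAdvantage', 'label_2', 'label_3', 'label_4'])`.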


Correlation between work experience and salary (Python positions)

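a=df1[['workYear','salary']]
# factorize() codes each category by order of first appearance, so this coefficient is only a rough indicator
a.apply(lambda x: x.factorize()[0]).corr()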

	workYear	salary
workYear	1.000000	0.299888
salary	0.299888	1.000000
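`factorize()` numbers categories in the order they first appear in the data, so the 0.299888 above reflects row order as much as any trend. The sketch below instead ranks work-experience and salary bands in an explicit order and computes a Spearman rank correlation, written as the Pearson correlation of average ranks so that scipy is not needed. Both orderings are assumptions, and '不限' (no requirement) is dropped because it has no natural rank.

```python
import pandas as pd

# Assumed orderings; "不限" (no requirement) has no natural rank and is left out.
WORK_YEAR_ORDER = ["应届", "1年以下", "1-3年", "3-5年", "5-10年", "10年以上"]
SALARY_ORDER = ["5K以下", "5-10K", "10-15K", "15-25K", "25K以上"]


def ordinal_spearman(frame: pd.DataFrame, col: str, order,
                     target: str = "salary", target_order=None) -> float:
    """Spearman rank correlation between two ordinal columns, using explicit
    category orders instead of factorize()'s first-seen codes. Rows whose
    value is not in the given order are dropped."""
    if target_order is None:
        target_order = SALARY_ORDER
    x = frame[col].map({v: i for i, v in enumerate(order)})
    y = frame[target].map({v: i for i, v in enumerate(target_order)})
    pair = pd.DataFrame({"x": x, "y": y}).dropna()
    # Spearman = Pearson correlation of average ranks (ties get their mean rank)
    return pair["x"].rank().corr(pair["y"].rank())
```

Usage: `ordinal_spearman(df1, 'workYear', WORK_YEAR_ORDER)`; for education an order such as `['大专', '本科', '硕士', '博士']` could be passed the same way.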

sns.heatmap(pd.crosstab(a.workYear,a.salary),cmap='Blues')

<AxesSubplot:xlabel='salary', ylabel='workYear'>
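`pd.crosstab` counts postings, so in this heatmap the rows with the most postings (3-5年 and 1-3年) dominate the colour scale, and the salary columns appear in sorted string order ('10-15K' before '5-10K') rather than salary order. A sketch that turns each row into shares summing to 1 and optionally reorders the salary columns follows; `row_share` is a hypothetical helper.

```python
import pandas as pd


def row_share(frame: pd.DataFrame, row: str, col: str = "salary", col_order=None) -> pd.DataFrame:
    """Cross-tabulate two columns and turn each row into shares that sum to 1,
    so heatmap colours compare salary mixes rather than raw posting counts."""
    table = pd.crosstab(frame[row], frame[col], normalize="index")
    if col_order is not None:
        # put the salary bands in the given order; bands absent from the data get 0
        table = table.reindex(columns=col_order, fill_value=0)
    return table
```

Usage: `sns.heatmap(row_share(df1, 'workYear', col_order=['5K以下', '5-10K', '10-15K', '15-25K', '25K以上']), cmap='Blues')`.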


Correlation between education and salary (Python positions)

b=df1[['education','salary']]
b.apply(lambda x: x.factorize()[0]).corr()

	education	salary
education	1.000000	0.022579
salary	0.022579	1.000000

sns.heatmap(pd.crosstab(b.education,b.salary),cmap='Blues')

<AxesSubplot:xlabel='salary', ylabel='education'>


Correlation between company size and salary (Python positions)

c=df1[['companySize','salary']]
c.apply(lambda x: x.factorize()[0]).corr()

	companySize	salary
companySize	1.000000	0.137736
salary	0.137736	1.000000

sns.heatmap(pd.crosstab(c.companySize,c.salary),cmap='Blues')

<AxesSubplot:xlabel='salary', ylabel='companySize'>


Correlation between industry and salary (Python positions)

e=df1[['industryField','salary']]
e.apply(lambda x: x.factorize()[0]).corr()

	industryField	salary
industryField	1.000000	0.060122
salary	0.060122	1.000000

f,ax=plt.subplots(figsize=(10,15))
sns.heatmap(pd.crosstab(e.industryField,e.salary),ax=ax,linewidths=0.01,linecolor='pink',cmap='Blues')

<AxesSubplot:xlabel='salary', ylabel='industryField'>


Correlation between city and salary (Python positions)

d=df1[['city','salary']]
d.apply(lambda x: x.factorize()[0]).corr()

	city	salary
city	1.000000	-0.028087
salary	-0.028087	1.000000
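City is a nominal variable, so the factorize codes behind -0.028087 have no order and a Pearson coefficient computed on them is hard to interpret. Cramér's V is a standard association measure for two nominal variables: V = sqrt(χ² / (n · (min(r, c) - 1))), where χ² is the chi-square statistic of the r × c contingency table and n is the number of rows. A numpy-only sketch without bias correction follows; `cramers_v` is a hypothetical helper.

```python
import numpy as np
import pandas as pd


def cramers_v(x: pd.Series, y: pd.Series) -> float:
    """Cramér's V between two categorical series (0 = no association,
    1 = perfect association), from the chi-square statistic of their
    contingency table, without bias correction."""
    table = pd.crosstab(x, y).to_numpy(dtype=float)
    n = table.sum()
    # expected counts under independence: row total * column total / n
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n
    chi2 = ((table - expected) ** 2 / expected).sum()
    k = min(table.shape) - 1
    return float(np.sqrt(chi2 / (n * k))) if k > 0 else float("nan")
```

Usage: `cramers_v(df1['city'], df1['salary'])`; the same call works for `industryField` or `financeStage`.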

f,ax=plt.subplots(figsize=(10,15))
sns.heatmap(pd.crosstab(d.city,d.salary),ax=ax,linewidths=0.01,linecolor='pink',cmap='Blues')

<AxesSubplot:xlabel='salary', ylabel='city'>


Correlation between company funding stage and salary

f=df1[['financeStage','salary']]
f.apply(lambda x: x.factorize()[0]).corr()

	financeStage	salary
financeStage	1.000000	0.125897
salary	0.125897	1.000000

sns.heatmap(pd.crosstab(f.financeStage,f.salary),cmap='Blues')

<AxesSubplot:xlabel='salary', ylabel='financeStage'>
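The heatmaps show raw counts, so large groups dominate them. A simpler summary, sketched below as a hypothetical helper `top_band_share`, reports for each group how many postings it has and what share of them fall in the top band '25K以上'. The 100-posting cutoff is an arbitrary assumption meant to keep very small groups (such as cities with a single posting) out of the ranking.

```python
import pandas as pd


def top_band_share(frame: pd.DataFrame, group: str, band: str = "25K以上",
                   min_postings: int = 100) -> pd.DataFrame:
    """Per value of `group`: number of postings and the share whose salary
    label equals `band`. Groups with fewer than `min_postings` rows are
    dropped; the rest are sorted by that share, highest first."""
    is_top = frame["salary"].eq(band)
    keys = frame[group]
    stats = pd.DataFrame({
        "postings": is_top.groupby(keys).size(),
        "top_share": is_top.groupby(keys).mean(),  # mean of booleans = share
    })
    return stats[stats["postings"] >= min_postings].sort_values("top_share", ascending=False)
```

Usage: `top_band_share(df1, 'financeStage')` or `top_band_share(df1, 'city')`.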
