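Let's look at a dataset of employee attrition by department and run a simple analysis of whether the attrition rate is related to the department (that is, the kind of work) an employee is in.
The full dataset is too large to post, so only a small excerpt is shown here; if you would like the complete data, leave a comment and I will send it to you.
The data looks like this: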
HR.csv:
satisfaction_level,last_evaluation,number_project,average_monthly_hours,time_spend_company,Work_accident,left,promotion_last_5years,department,salary
0.38,0.53,2,157,3,0,1,0,sales,low
0.8,0.86,5,262,6,0,1,0,sales,medium
0.11,0.88,7,272,4,0,1,0,sales,medium
0.72,0.87,5,223,5,0,1,0,sales,low
0.37,0.52,2,159,3,0,1,0,sales,low
0.41,0.5,2,153,3,0,1,0,sales,low
0.1,0.77,6,247,4,0,1,0,sales,low
0.92,0.85,5,259,5,0,1,0,sales,low
0.89,1,5,224,5,0,1,0,sales,low
0.42,0.53,2,142,3,0,1,0,sales,low
0.45,0.54,2,135,3,0,1,0,sales,low
0.11,0.81,6,305,4,0,1,0,sales,low
0.84,0.92,4,234,5,0,1,0,sales,low
0.41,0.55,2,148,3,0,1,0,sales,low
0.36,0.56,2,137,3,0,1,0,sales,low
0.38,0.54,2,143,3,0,1,0,sales,low
0.45,0.47,2,160,3,0,1,0,sales,low
0.78,0.99,4,255,6,0,1,0,sales,low
0.45,0.51,2,160,3,1,1,1,sales,low
0.76,0.89,5,262,5,0,1,0,sales,low
0.11,0.83,6,282,4,0,1,0,sales,low
0.38,0.55,2,147,3,0,1,0,sales,low
0.09,0.95,6,304,4,0,1,0,sales,low
0.46,0.57,2,139,3,0,1,0,sales,low
0.4,0.53,2,158,3,0,1,0,sales,low
0.89,0.92,5,242,5,0,1,0,sales,low
0.82,0.87,4,239,5,0,1,0,sales,low
0.4,0.49,2,135,3,0,1,0,sales,low
0.41,0.46,2,128,3,0,1,0,accounting,low
0.38,0.5,2,132,3,0,1,0,accounting,low
0.09,0.62,6,294,4,0,1,0,accounting,low
0.45,0.57,2,134,3,0,1,0,hr,low
0.4,0.51,2,145,3,0,1,0,hr,low
0.45,0.55,2,140,3,0,1,0,hr,low
0.84,0.87,4,246,6,0,1,0,hr,low
0.1,0.94,6,255,4,0,1,0,technical,low
0.38,0.46,2,137,3,0,1,0,technical,low
0.45,0.5,2,126,3,0,1,0,technical,low
0.11,0.89,6,306,4,0,1,0,technical,low
0.41,0.54,2,152,3,0,1,0,technical,low
0.87,0.88,5,269,5,0,1,0,technical,low
0.45,0.48,2,158,3,0,1,0,technical,low
0.4,0.46,2,127,3,0,1,0,technical,low
0.1,0.8,7,281,4,0,1,0,technical,low
0.09,0.89,6,276,4,0,1,0,technical,low
0.84,0.74,3,182,4,0,1,0,technical,low
0.4,0.55,2,147,3,0,1,0,support,low
0.57,0.7,3,273,6,0,1,0,support,low
0.4,0.54,2,148,3,0,1,0,support,low
0.43,0.47,2,147,3,0,1,0,support,low
0.13,0.78,6,152,2,0,1,0,support,low
0.44,0.55,2,135,3,0,1,0,support,low
0.38,0.55,2,134,3,0,1,0,support,low
0.39,0.54,2,132,3,0,1,0,support,low
0.1,0.92,7,307,4,0,1,0,support,low
0.37,0.46,2,140,3,0,1,0,support,low
0.11,0.94,7,255,4,0,1,0,support,low
0.1,0.81,6,309,4,0,1,0,technical,low
0.38,0.54,2,128,3,0,1,0,technical,low
0.85,1,4,225,5,0,1,0,technical,low
0.85,0.91,5,226,5,0,1,0,management,medium
0.11,0.93,7,308,4,0,1,0,IT,medium
0.1,0.95,6,244,5,0,1,0,IT,medium
0.36,0.56,2,132,3,0,1,0,IT,medium
0.11,0.94,6,286,4,0,1,0,IT,medium
0.81,0.7,6,161,4,0,1,0,IT,medium
0.43,0.54,2,153,3,0,1,0,product_mng,medium
0.9,0.98,4,264,6,0,1,0,product_mng,medium
0.76,0.86,5,223,5,1,1,0,product_mng,medium
0.43,0.5,2,135,3,0,1,0,product_mng,medium
0.74,0.99,2,277,3,0,1,0,IT,medium
0.09,0.77,5,275,4,0,1,0,product_mng,medium
0.45,0.49,2,149,3,0,1,0,product_mng,high
0.09,0.87,7,295,4,0,1,0,product_mng,low
0.11,0.97,6,277,4,0,1,0,product_mng,medium
0.11,0.79,7,306,4,0,1,0,product_mng,medium
0.1,0.83,6,295,4,0,1,0,product_mng,medium
0.4,0.54,2,137,3,0,1,0,marketing,medium
0.43,0.56,2,157,3,0,1,0,sales,low
0.39,0.56,2,142,3,0,1,0,accounting,low
0.45,0.54,2,140,3,0,1,0,support,low
0.38,0.49,2,151,3,0,1,0,technical,low
0.79,0.59,4,139,3,0,1,1,management,low
0.84,0.85,4,249,6,0,1,0,marketing,low
0.11,0.77,6,291,4,0,1,0,marketing,low
0.11,0.87,6,305,4,0,1,0,marketing,low
0.17,0.84,5,232,3,0,1,0,sales,low
0.44,0.45,2,132,3,0,1,0,sales,low
0.37,0.57,2,130,3,0,1,0,sales,low
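Before running any test, it can help to load a few rows and check how the columns are parsed. The sketch below is only an illustration: it pastes four rows from the excerpt above into a string (the variable names `excerpt` and `sample` are my own) instead of reading the real file. Note that every row in the posted excerpt has `left` equal to 1, so a t-test on the excerpt alone has zero variance in both groups and returns NaN; the full dataset is needed for a meaningful result.

```python
import io
import warnings

import numpy as np
import pandas as pd
import scipy.stats as ss

# Four rows copied from the excerpt, one record per line.
excerpt = """satisfaction_level,last_evaluation,number_project,average_monthly_hours,time_spend_company,Work_accident,left,promotion_last_5years,department,salary
0.38,0.53,2,157,3,0,1,0,sales,low
0.8,0.86,5,262,6,0,1,0,sales,medium
0.41,0.46,2,128,3,0,1,0,accounting,low
0.38,0.5,2,132,3,0,1,0,accounting,low
"""

sample = pd.read_csv(io.StringIO(excerpt))
print(sample.dtypes)                  # numeric columns plus two text columns
print(sample["left"].value_counts())  # only the value 1 appears

sales = sample.loc[sample["department"] == "sales", "left"]
accounting = sample.loc[sample["department"] == "accounting", "left"]

# Both groups are constant, so the t statistic is 0/0 and the p-value is NaN.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    p_value = ss.ttest_ind(sales, accounting).pvalue
print(p_value)
```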
A simple implementation:
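    import matplotlib.pyplot as plt
    import seaborn as sns
    import pandas as pd
    import numpy as np
    import scipy.stats as ss

    # Path to the data file
    df = pd.read_csv("./skip_Data/HR.csv")

    # Goal: compare the distribution of `left` (attrition) across departments.
    # Group by department; `indices` maps each department name to the
    # positional row indices that belong to it.
    df_index = df.groupby(by="department").indices

    # Attrition values for the sales and technical departments
    sales_values = df['left'].iloc[df_index['sales']].values
    technical_values = df['left'].iloc[df_index['technical']].values

    # In Python 3, keys() returns a view, so wrap it in list() to index into it
    dp_keys = list(df_index.keys())

    # Initialize a square matrix with one row and one column per department
    df_t_mat = np.zeros([len(dp_keys), len(dp_keys)])
    for i in range(len(dp_keys)):
        for j in range(len(dp_keys)):
            # Two-sample t-test on `left` for departments i and j; [1] is the p-value
            p_value = ss.ttest_ind(df['left'].iloc[df_index[dp_keys[i]]].values,
                                   df['left'].iloc[df_index[dp_keys[j]]].values)[1]
            # Fill the matrix: -1 marks a significant difference (p < 0.05),
            # otherwise store the p-value itself
            if p_value < 0.05:
                df_t_mat[i][j] = -1
            else:
                df_t_mat[i][j] = p_value

    # Plot the matrix as a heatmap with department names on both axes
    sns.heatmap(df_t_mat, xticklabels=dp_keys, yticklabels=dp_keys)
    plt.savefig("1.jpg")
    plt.show()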
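The pairwise comparison above can also be wrapped in a small function that returns a labeled table, which is easier to inspect than a bare array. The following is a minimal sketch, not the original code: the function name `pairwise_ttest_pvalues` and the `demo` data are hypothetical, and the demo values are made up purely so that each group has some variance. It also prints the per-department attrition rate, which is just the mean of `left` within each group.

```python
import numpy as np
import pandas as pd
import scipy.stats as ss


def pairwise_ttest_pvalues(df, group_col="department", value_col="left"):
    """Return a square DataFrame of two-sample t-test p-values between groups."""
    groups = {name: g[value_col].to_numpy() for name, g in df.groupby(group_col)}
    keys = list(groups)
    mat = pd.DataFrame(np.nan, index=keys, columns=keys)
    for a in keys:
        for b in keys:
            mat.loc[a, b] = ss.ttest_ind(groups[a], groups[b]).pvalue
    return mat


# Made-up demo data (NOT taken from HR.csv): 1 = left, 0 = stayed.
demo = pd.DataFrame({
    "department": ["sales"] * 6 + ["hr"] * 6 + ["IT"] * 6,
    "left": [1, 1, 1, 0, 1, 1,
             0, 0, 1, 0, 0, 0,
             1, 0, 1, 0, 1, 0],
})

# Attrition rate per department: the mean of the 0/1 `left` column.
rates = demo.groupby("department")["left"].mean()
print(rates)

# Pairwise p-values; the diagonal compares a group with itself.
pvals = pairwise_ttest_pvalues(demo)
print(pvals.round(3))
```

With the real data, the same call would be `pairwise_ttest_pvalues(df)`, and the resulting table can be passed straight to `sns.heatmap`, which takes its tick labels from the DataFrame's index and columns.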