
Analyzing Reasons for Employee Attrition

When analyzing employee sentiment data (in our case, an employee exit survey), we have to look at four topics.

  1. Statistical rigor of the survey
  2. Demographic composition of survey respondents
  3. Overall sentiment for defined latent constructs
  4. Sentiment scores by respondent characteristics (e.g. gender, location, department)

First, keeping to this methodology will enable us to determine how well our survey measures what it is meant to measure. Second, by understanding who answered the survey from a respondent-characteristics perspective (e.g. gender, department), we can provide context for our analysis and results. Third, this methodology will help us determine the general sentiment of the respondents. Last but not least, it will help us determine not only which organizational initiatives might be useful for improving sentiment, but also where those initiatives should be implemented.

Dataset

The dataset we’ll be using is a fictional employee exit survey, which asks the employee a series of questions about their organizational demographics (e.g. department) along with 5-point Likert sentiment questions (Strongly Disagree, Disagree, Neutral, Agree, Strongly Agree), such as “The organization offered plenty of promotional opportunities.” No open-ended questions were utilized.
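If the raw responses had arrived as text labels rather than numeric codes, a small mapping would convert them to 1–5 scores. This is a hypothetical sketch (the `likert_map` dictionary and sample responses are made up; the dataset used below already stores numeric codes):

```python
import pandas as pd

# Hypothetical mapping from 5-point Likert labels to numeric codes
likert_map = {'Strongly Disagree': 1, 'Disagree': 2, 'Neutral': 3,
              'Agree': 4, 'Strongly Agree': 5}

# Made-up example responses for one survey item
responses = pd.Series(['Agree', 'Neutral', 'Strongly Agree'])
codes = responses.map(likert_map)
```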

Data Processing

import pandas as pd
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt
import seaborn as sns
import warnings

warnings.filterwarnings('ignore')
pd.set_option('display.max_columns', None)
%matplotlib inline

# The with-block manages the file handle, so no explicit close is needed
with open('exit_data_final.csv') as f:
    df = pd.read_csv(f)

df.info()
[Output of df.info() showing the survey's columns and dtypes]

We have 33 items, or questions, which were asked of the employees. Before we can begin our analysis, we have a bit of data cleaning to perform.

df.drop('Unnamed: 0', axis=1, inplace=True)

Let’s drop this odd “Unnamed” column, as it serves no purpose.

for var in df.columns:
    print(var, df[var].unique())

By examining the unique values for each item we can see a few issues.


  1. Some items have missing values correctly encoded as np.nan, while others contain blank strings.
  2. Based on df.info(), we need to convert the dtypes of our Likert items, which are currently stored as ‘object’.
  3. Finally, we need to transform some values to improve the readability of our visualizations.
# Replacing blank strings with np.nan
for var in df.columns:
    df[var].replace(to_replace=' ', value=np.nan, inplace=True)

# Converting feature types
likert_items = ['promotional_opportunities', 'performance_recognized',
                'feedback_offered', 'coaching_offered', 'mgmt_clear_mission',
                'mgmt_support_me', 'mgmt_support_team', 'mgmt_clear_comm',
                'direct_mgmt_satisfaction', 'job_stimulating', 'initiative_encouraged',
                'skill_variety', 'knowledge_variety', 'task_variety', 'fair_salary',
                'teamwork', 'team_support', 'team_comm', 'team_culture',
                'job_train_satisfaction', 'personal_train_satisfaction', 'org_culture',
                'grievances_resolution', 'co-worker_interaction',
                'workplace_conditions', 'job_stress', 'work/life_balance']

for col in likert_items:
    df[col] = pd.to_numeric(df[col], errors='coerce').astype('float64')

# Discretization of tenure
bins = [0, 4, 9, 14, 19, 24]
labels = ['0-4yrs', '5-9yrs', '10-14yrs', '15-19yrs', '20+yrs']
df['tenure'] = pd.cut(df['tenure'], bins=bins, labels=labels)
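The cleaning steps above can be sanity-checked on a tiny synthetic frame. This is just a sketch (the data is made up; only the column names mirror the survey's):

```python
import numpy as np
import pandas as pd

# Tiny synthetic frame mimicking the raw survey: a blank string for a
# missing value, and Likert answers stored as strings ('object' dtype)
raw = pd.DataFrame({'job_stimulating': ['4', ' ', '5'],
                    'tenure': [2, 11, 23]})

# Blank strings become np.nan, then the item is coerced to float
raw['job_stimulating'] = raw['job_stimulating'].replace(' ', np.nan)
raw['job_stimulating'] = pd.to_numeric(raw['job_stimulating'],
                                       errors='coerce').astype('float64')

# Tenure in years is binned into the same discrete ranges as the article
bins = [0, 4, 9, 14, 19, 24]
labels = ['0-4yrs', '5-9yrs', '10-14yrs', '15-19yrs', '20+yrs']
raw['tenure'] = pd.cut(raw['tenure'], bins=bins, labels=labels)
```

The respondent with 11 years of tenure lands in the '10-14yrs' bucket, and the blank answer surfaces as a single NaN in `job_stimulating`.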

Development of Latent Constructs

In my previous article, we reviewed the process of analyzing the statistical rigor (i.e. validity, reliability, factor analysis) of our survey. Feel free to review it, but let’s quickly recap what latent survey constructs are and how they are derived.

In order to develop survey items which maintain good statistical rigor, we have to begin with the scholarly literature. We want to find a theoretical model that describes the phenomenon we wish to measure. For example, personality surveys very often use the Big-5 model (openness, conscientiousness, extraversion, agreeableness, and neuroticism) to develop the survey items. The survey developer will carefully craft 2–10 items (depending on the length of the survey) for each component of the model. The items meant to assess the same component are said to be measuring a “latent construct”. In other words, we are not measuring “extraversion” explicitly, as that would be an “observed construct”, but indirectly through the individual survey items. The survey is pilot-tested with multiple samples of respondents until a certain level of rigor is attained. Once again, if you’re interested in the statistical analyses used to determine rigor, take a look at my previous article.
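As one concrete illustration of a common reliability check (the previous article covers the full analysis; this is only a sketch, not necessarily its exact method), Cronbach's alpha measures how consistently a set of items taps the same latent construct:

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents, n_items) array.

    alpha = k/(k-1) * (1 - sum(item variances) / variance of totals)
    """
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances / total_variance)

# Four respondents answering two perfectly consistent items -> alpha of 1.0
alpha = cronbach_alpha([[1, 1], [2, 2], [3, 3], [4, 4]])
```

Values around 0.7 or higher are conventionally read as acceptable internal consistency for a construct's items.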

# Calculating latent variables
df['employee_valued'] = np.nanmean(df[['promotional_opportunities',
                                       'performance_recognized',
                                       'feedback_offered',
                                       'coaching_offered']], axis=1)

df['mgmt_sati'] = np.nanmean(df[['mgmt_clear_mission', 'mgmt_support_me',
                                 'mgmt_support_team', 'mgmt_clear_comm',
                                 'direct_mgmt_satisfaction']], axis=1)

df['job_satisfaction'] = np.nanmean(df[['job_stimulating', 'initiative_encouraged',
                                        'skill_variety', 'knowledge_variety',
                                        'task_variety']], axis=1)

df['team_satisfaction'] = np.nanmean(df[['teamwork', 'team_support',
                                         'team_comm', 'team_culture']], axis=1)

df['training_satisfaction'] = np.nanmean(df[['job_train_satisfaction',
                                             'personal_train_satisfaction']], axis=1)

df['org_environment'] = np.nanmean(df[['org_culture', 'grievances_resolution',
                                       'co-worker_interaction',
                                       'workplace_conditions']], axis=1)

df['work_life_balance'] = np.nanmean(df[['job_stress', 'work/life_balance']], axis=1)

df['overall_sati'] = np.nanmean(df[['promotional_opportunities', 'performance_recognized',
                                    'feedback_offered', 'coaching_offered',
                                    'mgmt_clear_mission', 'mgmt_support_me',
                                    'mgmt_support_team', 'mgmt_clear_comm',
                                    'direct_mgmt_satisfaction', 'job_stimulating',
                                    'initiative_encouraged', 'skill_variety',
                                    'knowledge_variety', 'task_variety', 'fair_salary',
                                    'teamwork', 'team_support', 'team_comm',
                                    'team_culture', 'job_train_satisfaction',
                                    'personal_train_satisfaction', 'org_culture',
                                    'grievances_resolution', 'co-worker_interaction',
                                    'workplace_conditions', 'job_stress',
                                    'work/life_balance']], axis=1)

Our exit survey has also been developed to assess certain latent constructs. Each survey item is averaged with the other items belonging to the latent factor it is meant to measure. Finally, we have calculated an “overall_sati” feature, which is the grand average across all items/latent factors for each respondent.
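The use of np.nanmean matters here: a skipped question is simply left out of a respondent's construct score rather than dragging it down. A quick sketch with made-up responses:

```python
import numpy as np

# Two respondents, three items; the first respondent skipped one question
responses = np.array([[4.0, np.nan, 5.0],
                      [3.0, 3.0,    3.0]])

# Row-wise mean ignoring NaN: the first score averages only the two answers
construct_scores = np.nanmean(responses, axis=1)
```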

Below is a list of the survey items and the latent constructs they are meant to measure. Keep in mind each item label has been shortened significantly to help facilitate visualization. You can imagine the items asking questions such as “On a scale of 1–5, I find my job stimulating.”

[Table: survey items grouped by the latent construct they measure]
mappings = {1: '1) Dissatisfied', 2: '1) Dissatisfied', 3: '2) Neutral',
            4: '3) Satisfied', 5: '3) Satisfied'}

likert = ['promotional_opportunities', 'performance_recognized',
          'feedback_offered', 'coaching_offered', 'mgmt_clear_mission',
          'mgmt_support_me', 'mgmt_support_team', 'mgmt_clear_comm',
          'direct_mgmt_satisfaction', 'job_stimulating', 'initiative_encouraged',
          'skill_variety', 'knowledge_variety', 'task_variety', 'fair_salary',
          'teamwork', 'team_support', 'team_comm', 'team_culture',
          'job_train_satisfaction', 'personal_train_satisfaction', 'org_culture',
          'grievances_resolution', 'co-worker_interaction',
          'workplace_conditions', 'job_stress', 'work/life_balance']

# Collapse the 5-point scale into three buckets for cleaner visualizations
for col in likert:
    df[col + '_short'] = df[col].map(mappings)
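With collapsed *_short labels in place, sentiment can be cross-tabulated by respondent characteristics (topic 4 above). A minimal sketch, assuming a hypothetical 'department' demographic column and made-up data:

```python
import pandas as pd

# Hypothetical mini-frame: one demographic column, one collapsed Likert item
sample = pd.DataFrame({
    'department': ['Sales', 'Sales', 'IT', 'IT'],
    'job_stimulating_short': ['1) Dissatisfied', '3) Satisfied',
                              '3) Satisfied', '2) Neutral'],
})

# Share of each sentiment bucket within each department (rows sum to 1)
table = pd.crosstab(sample['department'], sample['job_stimulating_short'],
                    normalize='index')
```

A stacked bar chart of such a table makes it easy to spot which departments would benefit most from targeted initiatives.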