Uda_BA_Project 2_Analyzing Survey Data_2020/09/17

Freefish0704

Project submission file:

Record any data cleaning you did in the spreadsheet workbook

Column	Action
hours per week in consuming learning materials	hours range changed to 3
hours per week in applying what you've learnt	hours range changed to specific number of hours
age	add age column and remove outliers
On average, how many hours of sleep do you get per night?	remove outliers
On average, how many hours do you spend sitting per day?	remove outliers
all	replace spaces in column names with underscores (_) for easier data manipulation in Python
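
The cleaning above was done in the spreadsheet, and the post does not show the steps. As a rough sketch only, the last row of the table (spaces to underscores) could also be done in pandas. The helper name `underscore_columns` and the two-row demo frame are made up for illustration and are not part of the project data.

```python
import pandas as pd

def underscore_columns(frame):
    """Return a copy of frame whose column names use '_' instead of spaces."""
    # rename() with a callable applies it to every column label
    return frame.rename(columns=lambda name: name.strip().replace(" ", "_"))

# Made-up demo frame (NOT the survey data) with spaces in the headers
demo = pd.DataFrame({
    "Are you employed?": [1, 0],
    "Business Analyst": ["Business Analyst", None],
})

cleaned = underscore_columns(demo)
print(list(cleaned.columns))
```

With underscored names, columns can be referenced directly inside `df.query(...)` without backtick quoting, as long as the name has no other special characters.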

Tools used: Excel, Python, PowerPoint
1. How many people are enrolled in each Nanodegree?

import pandas as pd
import numpy as np 
import matplotlib.pyplot as plt
%matplotlib inline
df=pd.read_csv(r'local_path\project data\band-surveydata-2\band-surveydata 2\surveydata.csv')

df1 = df[['Business_Analyst','Data_Analyst','Machine_Learning_Engineer','Artificial_Intelligence', 'Deep_Learning_Foundations','Self-Driving_Car_Engineer', 'Robotics']].count()
df1

Business_Analyst              18
Data_Analyst                 148
Machine_Learning_Engineer    213
Artificial_Intelligence      102
Deep_Learning_Foundations    275
Self-Driving_Car_Engineer     15
Robotics                       8
dtype: int64
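
Why `count()` gives enrollment numbers here: it counts only non-missing cells. The sketch below assumes each program column holds the program name for enrolled respondents and is empty (NaN) otherwise, which is what the filter in question 3 suggests. The toy frame is invented for illustration.

```python
import numpy as np
import pandas as pd

# Toy data (assumption: enrolled -> program name, not enrolled -> NaN)
toy = pd.DataFrame({
    "Business_Analyst": ["Business Analyst", np.nan, np.nan],
    "Data_Analyst": [np.nan, "Data Analyst", "Data Analyst"],
})

# count() skips NaN, so each result is the number of enrollees per program
enrollees = toy[["Business_Analyst", "Data_Analyst"]].count()
print(enrollees)
```

Because one respondent can fill in several program columns, the per-program counts need not add up to the number of respondents.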

# Draw a bar chart, shorten the x-axis tick labels, and add a title and axis labels
fig1, ax1 = plt.subplots()
df1.plot(kind='bar', ax=ax1)  # draw explicitly on ax1
ax1.set_title('No. of Enrollees by Nano Degree')
ax1.set_xticklabels(['Business_Analyst','Data_Analyst','Machine_Learning','Artificial_Intelligence', 'Deep_Learning','Self-Driving_Car', 'Robotics'],rotation=70)
ax1.set_xlabel('Nano Degree')
ax1.set_ylabel('No. of Enrollees')

2. What is the age distribution of Nanodegree enrollees?

df['Age'].describe()

count    701.000000
mean      35.680456
std        8.351426
min       22.000000
25%       30.000000
50%       34.000000
75%       40.000000
max       81.000000
Name: Age, dtype: float64

fig2, ax2 = plt.subplots()
ax2.boxplot(df['Age'], showmeans=True, vert=False)  # show the mean as a point marker
# meanprops = {'marker':'D','markerfacecolor':'yellow'}  # optional: shape and fill color of the mean marker
# medianprops = {'linestyle':'--','color':'red'}  # optional: line style and color of the median line
ax2.set_title('Age Distribution of Nano Degree Enrollees')
ax2.set_xlabel('Age')
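
To read the box plot, it helps to know where the whiskers can end. With matplotlib's default `whis=1.5`, whiskers reach the most extreme data points within 1.5 × IQR of the box, and anything beyond is drawn as an individual point. The short calculation below uses the quartiles printed by `describe()` above; the variable names are only for illustration.

```python
# Quartiles taken from the describe() output above
q1, q3 = 30.0, 40.0

iqr = q3 - q1                  # interquartile range (width of the box)
lower_fence = q1 - 1.5 * iqr   # whisker cannot extend below this value
upper_fence = q3 + 1.5 * iqr   # whisker cannot extend above this value

print(lower_fence, upper_fence)
```

The minimum age (22) lies inside the lower fence, so the left whisker stops at 22. The maximum (81) lies beyond the upper fence, so it appears as a separate outlier point rather than as the whisker end.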

3. What is the industry distribution for the data-analysis-related Nanodegrees (Business Analyst, Data Analyst)?

df2 = df.query('Business_Analyst=="Business Analyst" or Data_Analyst=="Data Analyst"')['What_industry_do_you_work_in?'].value_counts()
df2

Technology & Internet                       29
Education                                   14
Healthcare and Pharmaceuticals              10
Entertainment & Leisure                      7
Insurance                                    7
Retail & Consumer Durables                   6
Advertising & Marketing                      5
Manufacturing                                5
Business Support & Logistics                 4
Telecommunications                           4
Automotive                                   4
Real Estate                                  3
Government                                   3
Electronics                                  2
Transportation & Delivery                    2
Food & Beverages                             2
Utilities, Energy and Extraction             2
Airlines & Aerospace (including Defense)     1
Name: What_industry_do_you_work_in?, dtype: int64
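
For slides, shares are sometimes easier to read than raw counts. Here is a small sketch of the same filter written as a boolean mask, followed by `value_counts(normalize=True)`. It assumes the program columns are NaN for respondents who are not enrolled, and the toy rows are invented, so the numbers are not from the survey.

```python
import numpy as np
import pandas as pd

industry_col = "What_industry_do_you_work_in?"

# Toy data (NOT the survey): 3 of 4 rows are in a data analysis program
toy = pd.DataFrame({
    "Business_Analyst": ["Business Analyst", np.nan, np.nan, np.nan],
    "Data_Analyst": [np.nan, "Data Analyst", "Data Analyst", np.nan],
    industry_col: ["Education", "Technology & Internet",
                   "Technology & Internet", "Insurance"],
})

# Keep rows enrolled in either program (assumes NaN means "not enrolled")
mask = toy["Business_Analyst"].notna() | toy["Data_Analyst"].notna()

# normalize=True returns fractions of the filtered rows instead of counts
shares = toy.loc[mask, industry_col].value_counts(normalize=True)
print(shares)
```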

fig3, ax3 = plt.subplots()
df2.plot(kind="barh", ax=ax3)  # one call is enough; draw on ax3
ax3.set_title("Industry Distribution of Data Analysis Related Nano Degrees")
ax3.set_xlabel('No. of Enrollees')
ax3.set_ylabel('Industry')

4. How does willingness to recommend the course differ by employment status?

fig4=plt.figure(figsize=(10,6))
ax4 = fig4.add_subplot(1,2,1)
ax5 = fig4.add_subplot(1,2,2)
df3 = df[df['Are_you_employed?'] == 1]['How_likely_is_it_that_you_would_recommend_Udacity_to_a_friend_or_colleague?']
df4 = df[df['Are_you_employed?'] == 0]['How_likely_is_it_that_you_would_recommend_Udacity_to_a_friend_or_colleague?']
ax4.hist(df3)
ax5.hist(df4)
ax4.set_title("Employed")
ax4.set_xlabel("Rating of Willingness to Recommend")
ax4.set_ylabel("No. of Enrollees")
ax5.set_title("Unemployed")
ax5.set_xlabel("Rating of Willingness to Recommend")
ax5.set_ylabel("No. of Enrollees")
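
One optional refinement, not used in the original plots: `hist` defaults to 10 equal-width bins over the data range, which does not line up with integer ratings from 0 to 10. Bin edges at half-integers give one bar per rating. The sketch uses `np.histogram` on made-up ratings to show the idea; the same `bins` array can be passed to `ax4.hist(...)` and `ax5.hist(...)`.

```python
import numpy as np

# Made-up ratings for illustration (NOT the survey data)
ratings = np.array([10, 10, 9, 8, 10, 2, 9])

# Edges at -0.5, 0.5, ..., 10.5 give exactly one bin per integer rating 0..10
edges = np.arange(-0.5, 11.5, 1.0)

counts, _ = np.histogram(ratings, bins=edges)
print(counts)  # counts[k] is the number of respondents who answered k
```

Using the same edges in both panels also makes the two histograms directly comparable bar by bar.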

print("employed:\n",df3.describe())
print("unemployed:\n",df4.describe())

employed:
 count    578.000000
mean       9.036332
std        1.254619
min        2.000000
25%        8.000000
50%       10.000000
75%       10.000000
max       10.000000
Name: How_likely_is_it_that_you_would_recommend_Udacity_to_a_friend_or_colleague?, dtype: float64
unemployed:
 count    123.000000
mean       8.918699
std        1.555495
min        0.000000
25%        8.000000
50%        9.000000
75%       10.000000
max       10.000000
Name: How_likely_is_it_that_you_would_recommend_Udacity_to_a_friend_or_colleague?, dtype: float64
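
The two `describe()` printouts can also be put side by side with a single `groupby`, which makes the comparison easier to read. This is a minimal sketch on invented data; the short column name `rating` stands in for the long survey question and is an assumption for readability.

```python
import pandas as pd

# Toy data (NOT the survey): 1 = employed, 0 = unemployed
toy = pd.DataFrame({
    "Are_you_employed?": [1, 1, 1, 0, 0],
    "rating": [10, 9, 8, 10, 6],
})

# One row per employment status, one column per statistic
summary = toy.groupby("Are_you_employed?")["rating"].agg(["count", "mean", "median"])
print(summary)
```

In the survey output above, the means are close (about 9.04 for employed versus 8.92 for unemployed), while the group sizes differ a lot (578 versus 123), so the raw histogram heights should not be compared directly.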

Final PPT
Reference: https://matplotlib.org/api/_as_gen/matplotlib.pyplot.boxplot.html?highlight=boxplot#matplotlib.pyplot.boxplot
