Project submission files:
- Document any data cleaning you performed in the spreadsheet workbook
-
| Column | Action |
| --- | --- |
| hours per week in consuming learning materials | hours range changed to 3 |
| hours per week in applying what you've learnt | hours range changed to specific number of hours |
| age | add age column and remove outliers |
| On average, how many hours of sleep do you get per night? | remove outliers |
| On average, how many hours do you spend sitting per day? | remove outliers |
| all | replace spaces in column names with underscores (_) for easier data manipulation in Python |

- Tools used: Excel, Python, PowerPoint
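The cleaning itself was done in the spreadsheet, and the write-up does not say which outlier rule was applied. As a rough sketch only, the same two kinds of step (renaming columns and dropping outliers) could look like this in pandas. The toy data, the shortened column names, the helper `drop_iqr_outliers`, and the 1.5 × IQR rule are all assumptions made for illustration.

```python
import pandas as pd

# Toy data standing in for the survey export (hypothetical names and values).
raw = pd.DataFrame({
    "hours of sleep per night": [7, 8, 6, 7, 30, 7],
    "hours sitting per day": [8, 9, 7, 10, 8, 9],
})

# Replace spaces in column names with underscores.
cleaned = raw.rename(columns=lambda name: name.strip().replace(" ", "_"))


def drop_iqr_outliers(frame, column, k=1.5):
    """Keep rows whose value lies within [Q1 - k*IQR, Q3 + k*IQR].

    The IQR rule is an assumed choice; the original cleaning rule is not documented.
    """
    q1 = frame[column].quantile(0.25)
    q3 = frame[column].quantile(0.75)
    iqr = q3 - q1
    mask = frame[column].between(q1 - k * iqr, q3 + k * iqr)
    return frame[mask]


cleaned = drop_iqr_outliers(cleaned, "hours_of_sleep_per_night")
print(cleaned)
```

In this toy example the implausible value of 30 hours of sleep is dropped and the other five rows are kept.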
- 1. How many people enrolled in each Nanodegree?
-
    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    %matplotlib inline

    df = pd.read_csv(r'local_path\project data\band-surveydata-2\band-surveydata 2\surveydata.csv')

    df1 = df[['Business_Analyst', 'Data_Analyst', 'Machine_Learning_Engineer',
              'Artificial_Intelligence', 'Deep_Learning_Foundations',
              'Self-Driving_Car_Engineer', 'Robotics']].count()
    df1

    Business_Analyst              18
    Data_Analyst                 148
    Machine_Learning_Engineer    213
    Artificial_Intelligence      102
    Deep_Learning_Foundations    275
    Self-Driving_Car_Engineer     15
    Robotics                       8
    dtype: int64
-
    # Draw a bar chart, shorten the x-axis tick labels, and add a title and axis labels
    fig1, ax1 = plt.subplots()
    df1.plot(kind='bar', ax=ax1)
    ax1.set_title('No. of Enrollees by Nano Degree')
    ax1.set_xticklabels(['Business_Analyst', 'Data_Analyst', 'Machine_Learning',
                         'Artificial_Intelligence', 'Deep_Learning',
                         'Self-Driving_Car', 'Robotics'], rotation=70)
    ax1.set_xlabel('Nano Degree')
    ax1.set_ylabel('No. of Enrollees')
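The enrollment numbers above rely on `count()` tallying only non-missing cells. The sketch below illustrates that behaviour on toy data. It assumes, based on the later `query` on `Business_Analyst=="Business Analyst"`, that each program column holds the program name for enrollees and is empty otherwise; the rows themselves are made up.

```python
import numpy as np
import pandas as pd

# Toy frame: a cell holds the program name if the respondent enrolled, NaN otherwise.
toy = pd.DataFrame({
    "Data_Analyst": ["Data Analyst", np.nan, "Data Analyst"],
    "Robotics": [np.nan, np.nan, "Robotics"],
})

# DataFrame.count() returns the number of non-missing values per column,
# which equals the number of enrollees under the assumption above.
counts = toy[["Data_Analyst", "Robotics"]].count()
print(counts)
```

Because one respondent can enroll in several programs, these per-column counts need not sum to the number of survey rows.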
- 2. What is the age distribution of Nanodegree enrollees?
    df['Age'].describe()

    count    701.000000
    mean      35.680456
    std        8.351426
    min       22.000000
    25%       30.000000
    50%       34.000000
    75%       40.000000
    max       81.000000
    Name: Age, dtype: float64
-
    fig2, ax2 = plt.subplots()
    ax2.boxplot(df['Age'], showmeans=True, vert=False)  # show the mean as a point
    # meanprops = {'marker':'D','markerfacecolor':'yellow'}  # properties of the mean point: marker shape and fill color
    # medianprops = {'linestyle':'--','color':'red'}  # properties of the median line: line style and color
    ax2.set_title('Age Distribution of Nano Degree Enrollees')
    ax2.set_xlabel('Age')
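To help read the box plot, the whisker limits can be worked out by hand from the `describe()` output. This is a small sketch that assumes matplotlib's default whisker setting of 1.5 × IQR, since the `boxplot` call does not override `whis`; the variable names are illustrative.

```python
# Quartiles and extremes copied from the describe() output for Age.
q1, q3 = 30.0, 40.0
age_min, age_max = 22.0, 81.0

iqr = q3 - q1                     # interquartile range: 10 years
lower_fence = q1 - 1.5 * iqr      # 15.0
upper_fence = q3 + 1.5 * iqr      # 55.0

# Points beyond the fences are drawn as individual outlier markers.
has_low_outliers = age_min < lower_fence
has_high_outliers = age_max > upper_fence
print(lower_fence, upper_fence, has_low_outliers, has_high_outliers)
```

Under that assumption, ages above 55 (including the maximum of 81) appear as outlier points, while the youngest respondent at 22 falls inside the lower whisker range.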
- 3. What is the industry distribution for the data-analysis-related Nanodegrees (Business Analyst, Data Analyst)?
-
    df2 = df.query('Business_Analyst=="Business Analyst" or Data_Analyst=="Data Analyst"')['What_industry_do_you_work_in?'].value_counts()
    df2

    Technology & Internet                       29
    Education                                   14
    Healthcare and Pharmaceuticals              10
    Entertainment & Leisure                      7
    Insurance                                    7
    Retail & Consumer Durables                   6
    Advertising & Marketing                      5
    Manufacturing                                5
    Business Support & Logistics                 4
    Telecommunications                           4
    Automotive                                   4
    Real Estate                                  3
    Government                                   3
    Electronics                                  2
    Transportation & Delivery                    2
    Food & Beverages                             2
    Utilities, Energy and Extraction             2
    Airlines & Aerospace (including Defense)     1
    Name: What_industry_do_you_work_in?, dtype: int64
    fig3, ax3 = plt.subplots()
    df2.plot(kind="barh", ax=ax3)
    ax3.set_title("Industry Distribution of Data Analysis Related Nano Degrees")
    ax3.set_xlabel('No. of Enrollees')
    ax3.set_ylabel('Industry')
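The `query` filter used for this question can also be written as an explicit boolean mask, which some readers find easier to follow and which also works for column names that are not valid Python identifiers. The sketch below uses made-up rows; only the column names come from the analysis above.

```python
import numpy as np
import pandas as pd

# Toy rows (hypothetical values) with the same column names as the survey data.
toy = pd.DataFrame({
    "Business_Analyst": ["Business Analyst", np.nan, np.nan, np.nan],
    "Data_Analyst": [np.nan, "Data Analyst", "Data Analyst", np.nan],
    "What_industry_do_you_work_in?": ["Education", "Insurance", "Education", "Retail"],
})

# Keep respondents enrolled in either data-analysis-related program.
mask = (toy["Business_Analyst"] == "Business Analyst") | (toy["Data_Analyst"] == "Data Analyst")

# Count how often each industry occurs among the selected respondents.
industry_counts = toy.loc[mask, "What_industry_do_you_work_in?"].value_counts()
print(industry_counts)
```

The last toy respondent is enrolled in neither program, so their industry is excluded from the counts.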
- 4. How does willingness to recommend the course to others differ by employment status?
-
    fig4 = plt.figure(figsize=(10, 6))
    ax4 = fig4.add_subplot(1, 2, 1)
    ax5 = fig4.add_subplot(1, 2, 2)

    # Recommendation ratings, split by employment status (1 = employed, 0 = unemployed)
    rec_col = 'How_likely_is_it_that_you_would_recommend_Udacity_to_a_friend_or_colleague?'
    df3 = df[df['Are_you_employed?'] == 1][rec_col]
    df4 = df[df['Are_you_employed?'] == 0][rec_col]

    ax4.hist(df3)
    ax5.hist(df4)
    ax4.set_title("Employed")
    ax4.set_xlabel("Rating of Willingness to Recommend")
    ax4.set_ylabel("No. of Enrollees")
    ax5.set_title("Unemployed")
    ax5.set_xlabel("Rating of Willingness to Recommend")
    ax5.set_ylabel("No. of Enrollees")
-
    print("employed:\n", df3.describe())
    print("unemployed:\n", df4.describe())
-
    employed:
     count    578.000000
    mean       9.036332
    std        1.254619
    min        2.000000
    25%        8.000000
    50%       10.000000
    75%       10.000000
    max       10.000000
    Name: How_likely_is_it_that_you_would_recommend_Udacity_to_a_friend_or_colleague?, dtype: float64
    unemployed:
     count    123.000000
    mean       8.918699
    std        1.555495
    min        0.000000
    25%        8.000000
    50%        9.000000
    75%       10.000000
    max       10.000000
    Name: How_likely_is_it_that_you_would_recommend_Udacity_to_a_friend_or_colleague?, dtype: float64
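The two `describe()` calls above can also be condensed into one side-by-side table with `groupby`. This is only a sketch on made-up ratings: the short column name `rating` is a stand-in for the full recommendation question, and the 1/0 coding of `Are_you_employed?` follows the filters used above.

```python
import pandas as pd

# Toy ratings (hypothetical values); 1 = employed, 0 = unemployed.
toy = pd.DataFrame({
    "Are_you_employed?": [1, 1, 1, 0, 0],
    "rating": [10, 8, 9, 10, 6],
})

# One row per employment status, with group size, mean, and median rating.
summary = toy.groupby("Are_you_employed?")["rating"].agg(["count", "mean", "median"])
print(summary)
```

Keeping the group sizes next to the means is useful here because the real groups differ a lot in size (578 employed versus 123 unemployed respondents).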
- Final PPT
- References: https://matplotlib.org/api/_as_gen/matplotlib.pyplot.boxplot.html?highlight=boxplot#matplotlib.pyplot.boxplot