Uda_BA_Project2_Analyzing_Survey_Data_2020/09/17

Project submission files:

  • Document any data cleaning you performed in the spreadsheet workbook
  • Column | Action
    hours per week in consuming learning materials | hours range changed to 3
    hours per week in applying what you've learnt | hours range changed to a specific number of hours
    age | added an Age column and removed outliers
    On average, how many hours of sleep do you get per night? | removed outliers
    On average, how many hours do you spend sitting per day? | removed outliers
    all | replaced spaces in column names with underscores (_) for easier data manipulation in Python
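The cleaning above was done in Excel. As a hedged sketch of two of those steps in pandas, the snippet below renames columns by replacing spaces with underscores and drops outliers from one numeric column. The rows are made-up toy data, `drop_iqr_outliers` is a hypothetical helper, and the 1.5 × IQR cutoff is an assumption, since the cleaning log does not say which outlier rule was used.

```python
import pandas as pd

# Hypothetical toy rows standing in for the survey export (not real responses).
raw = pd.DataFrame({
    "Are you employed?": [1, 0, 1, 1, 0, 1],
    "On average, how many hours of sleep do you get per night?": [7, 6, 8, 7, 6, 40],
})

# Replace every space in the column names with an underscore, as in the cleaning log.
df = raw.rename(columns=lambda name: name.replace(" ", "_"))

SLEEP = "On_average,_how_many_hours_of_sleep_do_you_get_per_night?"


def drop_iqr_outliers(frame, column, k=1.5):
    """Keep rows whose value lies in [Q1 - k*IQR, Q3 + k*IQR].

    Hypothetical helper: the 1.5 * IQR rule is an assumption, because the
    cleaning log only says "remove outliers" without naming a criterion.
    """
    q1, q3 = frame[column].quantile([0.25, 0.75])
    iqr = q3 - q1
    return frame[frame[column].between(q1 - k * iqr, q3 + k * iqr)]


cleaned = drop_iqr_outliers(df, SLEEP)
print(list(df.columns))
print(cleaned[SLEEP].tolist())  # the implausible 40-hour entry is dropped
```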
  • Tools used: Excel, Python, PowerPoint
  • 1. How many people enrolled in each Nanodegree?
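  • import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    %matplotlib inline
    # Load the cleaned survey data (column names already use underscores)
    df = pd.read_csv(r'local_path\project data\band-surveydata-2\band-surveydata 2\surveydata.csv')
    # Each program column holds the program name for enrollees and is empty otherwise,
    # so count() (non-null cells per column) gives the number of enrollees per Nanodegree
    df1 = df[['Business_Analyst', 'Data_Analyst', 'Machine_Learning_Engineer', 'Artificial_Intelligence',
              'Deep_Learning_Foundations', 'Self-Driving_Car_Engineer', 'Robotics']].count()
    df1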
    Business_Analyst              18
    Data_Analyst                 148
    Machine_Learning_Engineer    213
    Artificial_Intelligence      102
    Deep_Learning_Foundations    275
    Self-Driving_Car_Engineer     15
    Robotics                       8
    dtype: int64
  • # Bar chart with shortened x-axis labels, plus a title and axis names
    fig1, ax1 = plt.subplots()
    df1.plot(kind='bar', ax=ax1)  # draw on ax1 explicitly
    ax1.set_title('No. of Enrollees by Nanodegree')
    ax1.set_xticklabels(['Business_Analyst', 'Data_Analyst', 'Machine_Learning', 'Artificial_Intelligence',
                         'Deep_Learning', 'Self-Driving_Car', 'Robotics'], rotation=70)
    ax1.set_xlabel('Nanodegree')
    ax1.set_ylabel('No. of Enrollees')
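`DataFrame.count()` returns the number of non-null cells in each column, which is why it works as an enrollment count here, assuming each program column holds the program name for enrollees and is empty otherwise. A minimal sketch on hypothetical toy data, with the counts sorted so the bar chart reads from largest to smallest:

```python
import pandas as pd

# Hypothetical toy data: a program column holds the program name when the
# respondent is enrolled and None/NaN otherwise. Respondent 0 is in two programs.
survey = pd.DataFrame({
    "Business_Analyst": ["Business Analyst", None, None, None],
    "Data_Analyst": ["Data Analyst", "Data Analyst", None, None],
    "Robotics": [None, None, None, "Robotics"],
})

# count() = number of non-null cells per column = enrollees per Nanodegree.
enrollees = survey.count().sort_values(ascending=False)
print(enrollees)
```

Because a respondent enrolled in several programs is counted once in each program column, the per-program counts need not add up to the number of respondents. Passing the sorted Series to `.plot(kind='bar')` keeps the bars in descending order.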

  • 2. What is the age distribution of Nanodegree enrollees?
    df['Age'].describe()
    count    701.000000
    mean      35.680456
    std        8.351426
    min       22.000000
    25%       30.000000
    50%       34.000000
    75%       40.000000
    max       81.000000
    Name: Age, dtype: float64
  • fig2, ax2 = plt.subplots()
    # Horizontal box plot; showmeans=True also marks the mean as a point
    ax2.boxplot(df['Age'], showmeans=True, vert=False)
    # Optional styling, passed to boxplot() as keyword arguments:
    # meanprops = {'marker':'D','markerfacecolor':'yellow'}  # mean marker: shape and fill color
    # medianprops = {'linestyle':'--','color':'red'}  # median line: line style and color
    ax2.set_title('Age Distribution of Nanodegree Enrollees')
    ax2.set_xlabel('Age')
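With the default `whis=1.5`, matplotlib's `boxplot` draws the box from Q1 to Q3 and extends each whisker to the furthest data point lying within 1.5 × IQR of the box; points beyond that are drawn individually as fliers. Plugging in the quartiles from the `df['Age'].describe()` output above gives the cutoffs:

```python
# Quartiles and extremes copied from the df['Age'].describe() output above.
q1, q3 = 30.0, 40.0
age_min, age_max = 22.0, 81.0

whis = 1.5  # matplotlib's default whisker length, in multiples of the IQR
iqr = q3 - q1
lower_limit = q1 - whis * iqr
upper_limit = q3 + whis * iqr

print(iqr, lower_limit, upper_limit)  # 10.0 15.0 55.0
print(age_min < lower_limit)  # False: no low-side fliers
print(age_max > upper_limit)  # True: the maximum age is drawn as a flier
```

So no enrollee is young enough to be flagged on the low side, while ages above 55, including the maximum of 81, appear as separate points to the right of the box.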
    

  • 3. What is the industry distribution of enrollees in the data-analysis-related Nanodegrees (Business Analyst, Data Analyst)?
  • # Keep respondents enrolled in Business Analyst or Data Analyst, then count their industries
    df2 = df.query('Business_Analyst=="Business Analyst" or Data_Analyst=="Data Analyst"')['What_industry_do_you_work_in?'].value_counts()
    df2
    Technology & Internet                       29
    Education                                   14
    Healthcare and Pharmaceuticals              10
    Entertainment & Leisure                      7
    Insurance                                    7
    Retail & Consumer Durables                   6
    Advertising & Marketing                      5
    Manufacturing                                5
    Business Support & Logistics                 4
    Telecommunications                           4
    Automotive                                   4
    Real Estate                                  3
    Government                                   3
    Electronics                                  2
    Transportation & Delivery                    2
    Food & Beverages                             2
    Utilities, Energy and Extraction             2
    Airlines & Aerospace (including Defense)     1
    Name: What_industry_do_you_work_in?, dtype: int64
  • fig3, ax3 = plt.subplots()
    df2.plot(kind='barh', ax=ax3)  # one horizontal bar chart, drawn on ax3
    ax3.set_title("Industry Distribution of Data Analysis Related Nanodegrees")
    ax3.set_xlabel('No. of Enrollees')
    ax3.set_ylabel('Industry')
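The `.query()` string can also be written as a boolean mask, and `value_counts(normalize=True)` turns the counts into shares, which are easier to compare across groups of different sizes. A sketch on hypothetical toy rows, assuming (as the `.count()` call in question 1 does) that the program columns are empty for non-enrollees:

```python
import pandas as pd

# Hypothetical toy rows; column names follow the underscore-renamed survey.
survey = pd.DataFrame({
    "Business_Analyst": ["Business Analyst", None, None, None, None],
    "Data_Analyst": [None, "Data Analyst", "Data Analyst", None, "Data Analyst"],
    "What_industry_do_you_work_in?": [
        "Education", "Technology & Internet", "Technology & Internet",
        "Retail & Consumer Durables", "Technology & Internet",
    ],
})

# Same selection as the .query() call: enrolled in BA or DA (someone in both counts once).
in_data_programs = survey["Business_Analyst"].notna() | survey["Data_Analyst"].notna()
industry = survey.loc[in_data_programs, "What_industry_do_you_work_in?"]

counts = industry.value_counts()                # absolute numbers, as plotted above
shares = industry.value_counts(normalize=True)  # fraction of the selected respondents
print(counts)
print(shares.round(2))
```

`value_counts()` drops missing answers by default (`dropna=True`), so respondents who skipped the industry question are excluded from both the counts and the shares.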
  • 4. How does willingness to recommend the course to others differ by employment status?
  • fig4 = plt.figure(figsize=(10, 6))
    ax4 = fig4.add_subplot(1, 2, 1)  # left panel: employed
    ax5 = fig4.add_subplot(1, 2, 2)  # right panel: unemployed
    # Split the 0-10 recommendation ratings by employment status (1 = employed, 0 = not employed)
    df3 = df[df['Are_you_employed?'] == 1]['How_likely_is_it_that_you_would_recommend_Udacity_to_a_friend_or_colleague?']
    df4 = df[df['Are_you_employed?'] == 0]['How_likely_is_it_that_you_would_recommend_Udacity_to_a_friend_or_colleague?']
    ax4.hist(df3)
    ax5.hist(df4)
    ax4.set_title("Employed")
    ax4.set_xlabel("Rating of Willingness to Recommend")
    ax4.set_ylabel("No. of Enrollees")
    ax5.set_title("Unemployed")
    ax5.set_xlabel("Rating of Willingness to Recommend")
    ax5.set_ylabel("No. of Enrollees")

  • print("employed:\n",df3.describe())
    print("unemployed:\n",df4.describe())
  • employed:
     count    578.000000
    mean       9.036332
    std        1.254619
    min        2.000000
    25%        8.000000
    50%       10.000000
    75%       10.000000
    max       10.000000
    Name: How_likely_is_it_that_you_would_recommend_Udacity_to_a_friend_or_colleague?, dtype: float64
    unemployed:
     count    123.000000
    mean       8.918699
    std        1.555495
    min        0.000000
    25%        8.000000
    50%        9.000000
    75%       10.000000
    max       10.000000
    Name: How_likely_is_it_that_you_would_recommend_Udacity_to_a_friend_or_colleague?, dtype: float64
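Instead of filtering twice into `df3` and `df4`, a single `groupby` produces the comparison table in one step. A minimal sketch on hypothetical toy rows (the column names match the survey; the values do not):

```python
import pandas as pd

EMPLOYED = "Are_you_employed?"
RECOMMEND = "How_likely_is_it_that_you_would_recommend_Udacity_to_a_friend_or_colleague?"

# Hypothetical toy rows, not the real survey responses.
survey = pd.DataFrame({
    EMPLOYED: [1, 1, 1, 0, 0],
    RECOMMEND: [10, 9, 8, 10, 6],
})

# One groupby replaces the two boolean filters; rows are indexed by employment status.
summary = survey.groupby(EMPLOYED)[RECOMMEND].agg(["count", "mean", "median"])
print(summary)
```

Applied to the real output above, the employed group's mean rating is about 0.12 points higher (9.04 vs 8.92) and its median is 10 versus 9.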
  • Final PPT
  • Reference: https://matplotlib.org/api/_as_gen/matplotlib.pyplot.boxplot.html?highlight=boxplot#matplotlib.pyplot.boxplot