Python3 - Pandas Exercise

This exercise uses the following CSV file, Salaries.csv (16.24 MB). Only part of it is shown:
[Image: partial preview of the CSV file]
Code:

import pandas as pd

# Read Salaries.csv as a dataframe called sal.
sal = pd.read_csv('Salaries.csv')

# Check the head of the dataframe
print("*** First few rows of the CSV file ***")
print(sal.head())
print('\n')

# Use the .info() method to find out how many entries there are.
print("*** Number of entries ***")
sal.info()  # info() prints its summary itself and returns None, so it is not wrapped in print()
print('\n')

# What is the average BasePay?
print("*** Average BasePay ***")
print(sal['BasePay'].mean())
print('\n')

# What is the highest amount of OvertimePay in the dataset?
print("*** Highest OvertimePay ***")
print(sal['OvertimePay'].max())
print('\n')

# What is the job title of JOSEPH DRISCOLL?
print("*** Job title of JOSEPH DRISCOLL ***")
print(sal[sal['EmployeeName'] == 'JOSEPH DRISCOLL']['JobTitle'])
print('\n')

# How much does JOSEPH DRISCOLL make (including benefits)?
print("*** Total pay of JOSEPH DRISCOLL, including benefits ***")
print(sal[sal['EmployeeName'] == 'JOSEPH DRISCOLL']['TotalPayBenefits'])
print('\n')

# What is the name of highest paid person (including benefits)?
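print("*** Highest-paid person ***")
print(sal[sal['TotalPayBenefits'] == sal['TotalPayBenefits'].max()]['EmployeeName'])
print('\n Another approach')
# idxmax() returns the index label of the maximum, so it is paired with .loc
print(sal.loc[sal['TotalPayBenefits'].idxmax()])
print('\n Another approach')
# argmax() returns the integer position of the maximum, so it is paired with .iloc
print(sal.iloc[sal['TotalPayBenefits'].argmax()])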
print('\n')

# What is the name of lowest paid person (including benefits)?
print("*** Lowest-paid person ***")
print(sal.iloc[sal['TotalPayBenefits'].argmin()])
print('\n Another approach')
print(sal[sal['TotalPayBenefits'] == sal['TotalPayBenefits'].min()]['EmployeeName'])
print('\n')

# What was the average (mean) BasePay of all employees per year? (2011 - 2014)?
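print("*** Average BasePay per year, 2011 - 2014 ***")
# numeric_only=True skips text columns such as EmployeeName; recent pandas versions raise an error when asked to average them
print(sal.groupby('Year').mean(numeric_only=True))
print('\n')
print(sal.groupby('Year')['BasePay'].mean())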
print('\n')

# How many unique job titles are there?
print("*** Number of unique job titles ***")
print("Count:", len(sal['JobTitle'].unique()))
print('\n Another way to write it')
print("Count:", sal['JobTitle'].nunique())
print('\n')

# What are the top 5 most common jobs?
print("*** Top 5 most common jobs ***")
print(sal['JobTitle'].value_counts().head(5))
print('\n')

# How many Job Titles were represented by only one person in 2013?
print("*** Job titles held by only one person in 2013 ***")
print((sal[sal['Year'] == 2013]['JobTitle'].value_counts() == 1).sum())
print('\n')

# How many people have the word Chief in their job title?
print('*** Number of people with the word Chief in their job title ***')
def chief_string(title):
    # True when "chief" appears as a whole whitespace-separated word, in any case
    return 'chief' in title.lower().split()

print(sal['JobTitle'].apply(chief_string).sum())

print('\n')

# Is there a correlation between length of the Job Title string and Salary?
print("*** Is job title length correlated with salary? ***")
sal['title_len'] = sal['JobTitle'].apply(len)
# corr() computes the pairwise correlation of the selected columns
print(sal[['TotalPayBenefits', 'title_len']].corr())
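
A small aside on the correlation step that ends the script: DataFrame.corr() returns a full pairwise matrix, while Series.corr() returns the single coefficient directly. The sketch below is only an illustration. It uses a tiny made-up DataFrame (the rows are hypothetical, not taken from Salaries.csv) and checks that both calls agree.

```python
import pandas as pd

# Hypothetical sample rows for illustration only; not from Salaries.csv.
demo = pd.DataFrame({
    'JobTitle': ['CHIEF OF POLICE', 'Clerk', 'Transit Operator', 'Nurse'],
    'TotalPayBenefits': [300000.0, 60000.0, 90000.0, 120000.0],
})
demo['title_len'] = demo['JobTitle'].str.len()

# Pairwise (Pearson) correlation matrix for the two columns.
corr_matrix = demo[['TotalPayBenefits', 'title_len']].corr()

# The same coefficient as a single float.
corr_value = demo['title_len'].corr(demo['TotalPayBenefits'])

print(corr_matrix)
print(corr_value)
```

On the real data, the off-diagonal entry of the matrix is the number to read; values near 0 suggest little linear relationship.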

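The Chief count in the script relies on splitting each title on whitespace, which treats "Chief," (with trailing punctuation) as a different word. As a rough comparison, the sketch below runs three matching strategies on a few made-up titles (hypothetical, not from Salaries.csv): the whitespace split, a plain substring test, and a regex word-boundary test. Which one is right depends on how the question is read.

```python
import pandas as pd

# Hypothetical job titles for illustration only; not from Salaries.csv.
titles = pd.Series([
    'CHIEF OF POLICE',
    'Deputy Chief 3',
    'Chief, Fire Department',
    'Mischief Analyst',
    'Transit Operator',
])

# Whole-word match after splitting on whitespace, as in the script.
by_split = titles.apply(lambda t: 'chief' in t.lower().split())

# Substring match: also counts 'Mischief'.
by_substring = titles.str.contains('chief', case=False)

# Regex word-boundary match: counts 'Chief,' but not 'Mischief'.
by_word = titles.str.contains(r'\bchief\b', case=False, regex=True)

print(by_split.sum(), by_substring.sum(), by_word.sum())  # 2 4 3
```
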
The results are as follows:
[Images: seven screenshots of the script's output]
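
The script finds the highest-paid row in two ways: .loc with idxmax() and .iloc with argmax(). They agree on this dataset because read_csv gives the DataFrame a default 0..n-1 index, so labels and positions coincide. The sketch below uses a made-up DataFrame with a non-default index (hypothetical rows, not from Salaries.csv) to show why each call has to be paired with the matching indexer.

```python
import pandas as pd

# Hypothetical rows with a non-default index; not from Salaries.csv.
demo = pd.DataFrame(
    {
        'EmployeeName': ['A', 'B', 'C'],
        'TotalPayBenefits': [100.0, 300.0, 200.0],
    },
    index=[10, 20, 30],
)

label = demo['TotalPayBenefits'].idxmax()     # index label of the maximum
position = demo['TotalPayBenefits'].argmax()  # integer position of the maximum

top_by_label = demo.loc[label]        # label-based lookup
top_by_position = demo.iloc[position] # position-based lookup

print(label, position, top_by_label['EmployeeName'])  # 20 1 B
```

Mixing them up (for example, demo.iloc[label]) would raise an IndexError here, since position 20 does not exist.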
If you found this useful, please like, follow, or leave a comment.
Thanks!
