This exercise uses the following CSV file (16.24 MB); only part of it is shown here:
Code:
import pandas as pd
# Read Salaries.csv as a dataframe called sal.
sal = pd.read_csv('Salaries.csv')
# Check the head of the dataframe
print("*** First few rows of the CSV file ***")
print(sal.head())
print('\n')
# Use the .info() method to find out how many entries there are.
print("*** Number of entries ***")
sal.info()  # info() prints its summary directly and returns None, so it is not wrapped in print()
print('\n')
# What is the average BasePay?
print("*** Average BasePay ***")
print(sal['BasePay'].mean())
print('\n')
# What is the highest amount of OvertimePay in the dataset?
print("*** Highest OvertimePay ***")
print(sal['OvertimePay'].max())
print('\n')
# What is the job title of JOSEPH DRISCOLL?
print("*** Job title of JOSEPH DRISCOLL ***")
print(sal[sal['EmployeeName'] == 'JOSEPH DRISCOLL']['JobTitle'])
print('\n')
# How much does JOSEPH DRISCOLL make (including benefits)?
print("*** Total pay of JOSEPH DRISCOLL (including benefits) ***")
print(sal[sal['EmployeeName'] == 'JOSEPH DRISCOLL']['TotalPayBenefits'])
print('\n')
# What is the name of highest paid person (including benefits)?
print("*** Highest-paid person (including benefits) ***")
print(sal[sal['TotalPayBenefits'] == sal['TotalPayBenefits'].max()]['EmployeeName'])
print('\nAnother approach')
print(sal.loc[sal['TotalPayBenefits'].idxmax()])
print('\nAnother approach')
print(sal.iloc[sal['TotalPayBenefits'].argmax()])
print('\n')
# What is the name of lowest paid person (including benefits)?
print("*** Lowest-paid person (including benefits) ***")
print(sal.iloc[sal['TotalPayBenefits'].argmin()])
print('\nAnother approach')
print(sal[sal['TotalPayBenefits'] == sal['TotalPayBenefits'].min()]['EmployeeName'])
print('\n')
# What was the average (mean) BasePay of all employees per year? (2011 - 2014)?
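print("*** Average BasePay per year, 2011-2014 ***")
# numeric_only=True skips text columns such as EmployeeName; pandas 2.x raises an error without it.
print(sal.groupby('Year').mean(numeric_only=True))
print('\n')
print(sal.groupby('Year')['BasePay'].mean())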
print('\n')
# How many unique job titles are there?
print("*** Number of unique job titles ***")
print("Count:", len(sal['JobTitle'].unique()))
print('\nAnother way to write it')
print("Count:", sal['JobTitle'].nunique())
print('\n')
# What are the top 5 most common jobs?
print("*** Top 5 most common job titles ***")
print(sal['JobTitle'].value_counts().head(5))
print('\n')
# How many Job Titles were represented by only one person in 2013?
print("*** Job titles held by only one person in 2013 ***")
print(sum(sal[sal['Year']==2013]['JobTitle'].value_counts() == 1))
print('\n')
# How many people have the word Chief in their job title?
print('*** Number of people with Chief in their job title ***')
def chief_string(title):
    # True when 'chief' appears as a whole word in the title (case-insensitive).
    return 'chief' in title.lower().split()
print(sum(sal['JobTitle'].apply(chief_string)))
print('\n')
# Is there a correlation between length of the Job Title string and Salary?
print("*** Is the length of the job title correlated with pay? ***")
sal['title_len'] = sal['JobTitle'].apply(len)
print(sal[['TotalPayBenefits','title_len']].corr()) # corr() computes the pairwise correlation between the selected columns.
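Two steps in the script are easy to misread without the CSV at hand: `idxmax()` returns an index label while `argmax()` returns an integer position, and `chief_string` only counts titles where "chief" is a whole word. Here is a minimal sketch on a made-up four-row DataFrame that shows both. The `toy` frame, its names, and its values are hypothetical and are not taken from Salaries.csv.

```python
import pandas as pd

# Hypothetical toy data standing in for Salaries.csv (all values are made up).
toy = pd.DataFrame(
    {
        "EmployeeName": ["A", "B", "C", "D"],
        "JobTitle": ["Chief of Police", "Mischief Analyst", "Deputy Chief", "Clerk"],
        "TotalPayBenefits": [300.0, 100.0, 250.0, 50.0],
    },
    index=[10, 11, 12, 13],  # non-default labels, so label and position differ
)

# idxmax() gives an index label (pair it with .loc);
# argmax() gives an integer position (pair it with .iloc).
top_label = toy["TotalPayBenefits"].idxmax()
top_pos = toy["TotalPayBenefits"].argmax()
top_by_label = toy.loc[top_label, "EmployeeName"]
top_by_pos = toy.iloc[top_pos]["EmployeeName"]

# Whole-word match, the same rule as chief_string:
# lowercase, split on whitespace, then look for the word 'chief'.
words = toy["JobTitle"].str.lower().str.split()
chief_whole_word = int(words.apply(lambda w: "chief" in w).sum())

# A plain substring match also counts 'Mischief'.
chief_substring = int(toy["JobTitle"].str.contains("chief", case=False).sum())

print(top_label, top_pos, top_by_label, top_by_pos)
print(chief_whole_word, chief_substring)
```

With the default RangeIndex that `pd.read_csv` produces when no index column is given, labels and positions coincide, which is why the `loc`/`idxmax` and `iloc`/`argmax` variants in the script return the same row.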
The results are as follows: