Python3 - Pandas Exercise

Latest recommended article published 2021-10-20 18:58:11

CHNMSCS

Category: Python3. Tags: python, csv, pandas, data science

Article link: https://blog.csdn.net/BSCHN123/article/details/111714750

This exercise uses the CSV file Salaries.csv (16.24 MB); only part of it is shown here:
[Image: preview of the first rows of the CSV file]
Code:

import pandas as pd

# Read Salaries.csv as a dataframe called sal.
sal = pd.read_csv('Salaries.csv')

# Check the head of the dataframe
print("*** First few rows of the CSV file ***")
print(sal.head())
print('\n')

# Use the .info() method to find out how many entries there are.
print("*** Number of entries ***")
sal.info()  # info() prints its own summary and returns None, so it is not wrapped in print()
print('\n')

# What is the average BasePay?
print("*** Average BasePay ***")
print(sal['BasePay'].mean())
print('\n')

# What is the highest amount of OvertimePay in the dataset?
print("*** Highest OvertimePay ***")
print(sal['OvertimePay'].max())
print('\n')

# What is the job title of JOSEPH DRISCOLL?
print("*** Job title of JOSEPH DRISCOLL ***")
print(sal[sal['EmployeeName'] == 'JOSEPH DRISCOLL']['JobTitle'])
print('\n')

# How much does JOSEPH DRISCOLL make (including benefits)?
print("*** Total pay of JOSEPH DRISCOLL, including benefits ***")
print(sal[sal['EmployeeName'] == 'JOSEPH DRISCOLL']['TotalPayBenefits'])
print('\n')

# What is the name of highest paid person (including benefits)?
print("*** Highest-paid person ***")
print(sal[sal['TotalPayBenefits'] == sal['TotalPayBenefits'].max()]['EmployeeName'])
print('\n Another way')
print(sal.loc[sal['TotalPayBenefits'].idxmax()])   # idxmax() returns the index label
print('\n Another way')
print(sal.iloc[sal['TotalPayBenefits'].argmax()])  # argmax() returns the integer position
print('\n')

# What is the name of lowest paid person (including benefits)?
print("*** Lowest-paid person ***")
print(sal.iloc[sal['TotalPayBenefits'].argmin()])
print('\n Another way')
print(sal[sal['TotalPayBenefits'] == sal['TotalPayBenefits'].min()]['EmployeeName'])
print('\n')

# What was the average (mean) BasePay of all employees per year? (2011 - 2014)?
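print("*** Average BasePay per year, 2011 - 2014 ***")
# numeric_only=True skips text columns such as EmployeeName;
# pandas 2.0 and later raise an error when asked to average them.
print(sal.groupby('Year').mean(numeric_only=True))
print('\n')
# Select the column before aggregating so only BasePay is averaged.
print(sal.groupby('Year')['BasePay'].mean())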
print('\n')

# How many unique job titles are there?
print("*** Number of unique job titles ***")
print("Count:", len(sal['JobTitle'].unique()))
print('\n Another way')
print("Count:", sal['JobTitle'].nunique())
print('\n')

# What are the top 5 most common jobs?
print("*** Top 5 most common job titles ***")
print(sal['JobTitle'].value_counts().head(5))
print('\n')

# How many Job Titles were represented by only one person in 2013?
print("*** Job titles held by only one person in 2013 ***")
print(sum(sal[sal['Year']==2013]['JobTitle'].value_counts() == 1))
print('\n')

# How many people have the word Chief in their job title?
print('*** Number of people with "Chief" in their job title ***')
def chief_string(title):
    # True when "chief" appears as a whole word in the title (case-insensitive).
    return 'chief' in title.lower().split()

print(sum(sal['JobTitle'].apply(chief_string)))

print('\n')

# Is there a correlation between length of the Job Title string and Salary?
print("*** Is job title length correlated with pay? ***")
sal['title_len'] = sal['JobTitle'].apply(len)
# corr() computes the pairwise correlation between the selected columns.
print(sal[['TotalPayBenefits', 'title_len']].corr())
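
As a small self-contained illustration of the last exercise, the sketch below computes the same kind of correlation on a five-row toy frame. The job titles and pay figures are invented for illustration and are not rows from Salaries.csv. It uses the vectorized `.str.len()` accessor in place of `apply(len)`, and `Series.corr`, which returns a single Pearson coefficient instead of the full matrix.

```python
import pandas as pd

# Toy data: invented values, not rows from Salaries.csv.
toy = pd.DataFrame({
    'JobTitle': ['Clerk', 'Police Officer', 'Chief of Police', 'Nurse', 'Transit Operator'],
    'TotalPayBenefits': [50000.0, 120000.0, 300000.0, 90000.0, 80000.0],
})

# Vectorized string length; for string values this matches apply(len).
toy['title_len'] = toy['JobTitle'].str.len()

# Pearson correlation between the two columns, as a single float.
r = toy['title_len'].corr(toy['TotalPayBenefits'])

# The same coefficient sits off the diagonal of the pairwise matrix.
matrix = toy[['TotalPayBenefits', 'title_len']].corr()

print(round(r, 3))
print(matrix)
```

A coefficient near 0 suggests little linear relationship, while values near 1 or -1 suggest a strong one; with only five invented rows the number here says nothing about the real dataset.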

Running the full exercise script produces the following output:
[Image: screenshot of the script's output]
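
The highest-paid and lowest-paid exercises in the script pair `idxmax` with `loc` and `argmax` with `iloc`. The two agree on a default 0..n-1 index, but they return different numbers as soon as the index is something else. The sketch below shows this on a toy frame whose names, pay values, and index labels are all invented for illustration.

```python
import pandas as pd

# Toy data with a non-default index (invented values, not from Salaries.csv).
toy = pd.DataFrame(
    {
        'EmployeeName': ['A SMITH', 'B JONES', 'C LEE'],
        'TotalPayBenefits': [100.0, 300.0, 200.0],
    },
    index=[10, 20, 30],
)

label = toy['TotalPayBenefits'].idxmax()     # index label of the maximum
position = toy['TotalPayBenefits'].argmax()  # integer position of the maximum

by_label = toy.loc[label]         # label-based lookup
by_position = toy.iloc[position]  # position-based lookup

print(label, position)
print(by_label['EmployeeName'], by_position['EmployeeName'])
```

Here `label` is 20 and `position` is 1, yet both lookups return the same row. Mixing them up, for example passing the position to `loc`, would raise a `KeyError` on this frame because no row has the label 1.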

If you found this useful, feel free to like, follow, or leave a comment.
Thanks!
