What I learned from pandas_exercises

soul booster

Article link: https://blog.csdn.net/qq_41967784/article/details/108224131

Contents

Preface
Reading data
Computing proportions
The div function

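Preface

I found this pandas practice project on GitHub; these are my notes on what I learned from it.

Project:
https://github.com/guipsamora/pandas_exercises

Download it locally and open the notebooks in Jupyter Notebook.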

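Reading data

Reading data is just a matter of calling the right pandas.read_* function (read_table, read_csv, read_sql, read_excel). The thing to watch is that read_table and read_csv have different defaults for the sep parameter: the former uses \t, the latter uses a comma.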
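
To make the difference in defaults concrete, here is a minimal sketch. The two in-memory strings and the column names are made up for illustration and are not part of the exercises:

```python
import io

import pandas as pd

# Hypothetical sample data: the same table, once tab-separated, once comma-separated.
tsv_text = "name\tage\nAnn\t30\nBob\t25\n"
csv_text = "name,age\nAnn,30\nBob,25\n"

# read_table defaults to sep="\t", read_csv defaults to sep=",".
df_tab = pd.read_table(io.StringIO(tsv_text))
df_csv = pd.read_csv(io.StringIO(csv_text))

# Reading tab-separated text with read_csv and its default sep
# leaves everything in one column.
df_wrong = pd.read_csv(io.StringIO(tsv_text))

# Passing sep explicitly fixes it.
df_fixed = pd.read_csv(io.StringIO(tsv_text), sep="\t")

print(df_wrong.shape, df_fixed.shape)
```

If a file uses some other delimiter, passing sep explicitly to either function works the same way.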

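Computing proportions

When computing a proportion I always end up aggregating with two functions, which adds an extra level to the column names. That gets annoying:

# share of men within each occupation
# (users is the DataFrame loaded in the exercise)
def m_num(x):
    return x[x.values == 'M'].count()
c = users.groupby('occupation').agg({'gender': ['count', m_num]}).droplevel(axis=1, level=0)
c['Male ratio'] = c['m_num'] / c['count']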
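
One possible way to avoid the extra column level in the first place is pandas named aggregation (available since pandas 0.25). The sketch below is an alternative, not what the exercise solution does, and the small users frame is made-up stand-in data:

```python
import pandas as pd

# Hypothetical stand-in for the exercise's users table.
users = pd.DataFrame({
    "occupation": ["engineer", "engineer", "engineer", "artist", "artist"],
    "gender": ["M", "M", "F", "F", "M"],
})

# Named aggregation: each keyword becomes a flat output column,
# so no droplevel call is needed afterwards.
summary = users.groupby("occupation").agg(
    count=("gender", "count"),
    m_num=("gender", lambda s: (s == "M").sum()),
)
summary["male_ratio"] = summary["m_num"] / summary["count"]

print(summary)
```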

A new idea I picked up is to use value_counts and divide one Series by the other directly; the division lines the two up by occupation:

c = users.groupby('occupation').agg({'gender': m_num}).gender
r = c / users.occupation.value_counts()
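
The same idea as a self-contained sketch, on made-up data. The point it tries to show is that Series division aligns on index labels, so it does not matter that value_counts returns the occupations in a different order. The last line is a further shortcut that is not from the exercise: the mean of a boolean Series is already the proportion.

```python
import pandas as pd

# Hypothetical stand-in for the exercise's users table.
users = pd.DataFrame({
    "occupation": ["engineer", "engineer", "engineer", "artist", "artist", "nurse"],
    "gender": ["M", "M", "F", "F", "M", "F"],
})

is_male = users["gender"] == "M"

# Number of men per occupation (0 for occupations without any men).
male_counts = is_male.groupby(users["occupation"]).sum()

# Total number of users per occupation, sorted by frequency.
totals = users["occupation"].value_counts()

# Division matches rows by occupation label, not by position.
ratio = male_counts / totals

# Alternative: the mean of a boolean column is the share of True values.
ratio_alt = is_male.groupby(users["occupation"]).mean()

print(ratio.sort_index())
```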

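The div function

DataFrame.div takes a level argument that broadcasts the division across one level of a MultiIndex, matching on that level's values:

# create a DataFrame with the count per (occupation, gender) pair
gender_ocup = users.groupby(['occupation', 'gender']).agg({'gender': 'count'})

# create a DataFrame with the count per occupation
occup_count = users.groupby(['occupation']).agg('count')

# divide gender_ocup by occup_count and multiply by 100
occup_gender = gender_ocup.div(occup_count, level="occupation") * 100

# show only the 'gender' column; the other columns of occup_count
# have no counterpart in gender_ocup, so they come out as NaN
occup_gender.loc[:, 'gender']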
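
A minimal sketch of the same level-wise division, on made-up data. It uses size() and Series.div instead of the DataFrame version above, which is a simplification on my part: with two Series there are no extra columns to drop afterwards.

```python
import pandas as pd

# Hypothetical stand-in for the exercise's users table.
users = pd.DataFrame({
    "occupation": ["engineer", "engineer", "engineer", "artist", "artist", "nurse"],
    "gender": ["M", "M", "F", "F", "M", "F"],
})

# Count per (occupation, gender) pair: a Series with a two-level MultiIndex.
gender_counts = users.groupby(["occupation", "gender"]).size()

# Count per occupation: a Series indexed by occupation only.
occupation_counts = users.groupby("occupation").size()

# level="occupation" matches each (occupation, gender) row with the
# total for its occupation before dividing.
pct = gender_counts.div(occupation_counts, level="occupation") * 100

print(pct)
```

Within each occupation the percentages should add up to 100, which makes for a quick sanity check: pct.groupby(level="occupation").sum().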