kaggle练习题、pandas练习题(Exercise: Grouping and Sorting)

xxY0_0Yxx

已于 2023-09-05 10:30:32 修改

阅读量147

点赞数

分类专栏：数据分析 python基础文章标签： pandas python

于 2023-09-04 16:46:23 首次发布

本文链接：https://blog.csdn.net/weixin_46395175/article/details/132668114

版权

python基础同时被 2 个专栏收录

6 篇文章 0 订阅

订阅专栏

数据分析

3 篇文章 0 订阅

订阅专栏

group这个函数可以在python中实现类似excel表格中数据透视表的可视化效果

ignore_index和reset_index()的区别：
前者是方法中的一个参数，后者是一个方法。
前者是新dataframe从0开始计数，后者是将分组后的数据(数据透视表形式)划为面板数据
idxmax()这个方法可以找到当前列最大值对应的索引
9.5号发现个set_index()方法，这个方法是设置索引方便join进行表连接。reset_index()和set_index()是完全不同是两个方法。

方法介绍

group函数用法

1.实现value_counts()的效果

reviews.groupby('points').points.count()	#计算不同数据的出现次数

在这里插入图片描述

2.计算分组情况下各组price的最小值是多少

reviews.groupby('points').price.min()

![在这里插入图片描述](https://img-blog.csdnimg.cn/0046276056164db3824d9d1b3346c30d.pngpic_center

3.分组后通过lambda函数计算各组title的第一行数据

reviews.groupby('winery').apply(lambda df: df.title.iloc[0])

在这里插入图片描述

4.分组后通过lambda函数计算各组中points最高的index对应的行

reviews.groupby(['country', 'province']).apply(lambda df: df.loc[df.points.idxmax()])

在这里插入图片描述

5.功能类似于describe()函数的agg()函数的用法

长度、最小值、最大值、分位数等

reviews.groupby(['country']).price.agg([len, min, max])

在这里插入图片描述

6.多重索引以及reset_index()转为面板数据

mutil-index其实就是分组了，通过reset_index()方法可以让分组后(形似数据透视表)的数据转为面板数据

countries_reviewed = reviews.groupby(['country', 'province']).description.agg([len])
countries_reviewed

在这里插入图片描述

mi = countries_reviewed.index
type(mi)
#pandas.core.indexes.multi.MultiIndex
countries_reviewed.reset_index()

在这里插入图片描述

sort_values和sort_index排序函数

countries_reviewed.sort_values(by='len')

在这里插入图片描述

countries_reviewed.sort_values(by='len', ascending=False)

在这里插入图片描述

countries_reviewed.sort_index()

在这里插入图片描述

countries_reviewed.sort_values(by=['country', 'len'])

在这里插入图片描述

习题

1.Who are the most common wine reviewers in the dataset? Create a Series whose index is the taster_twitter_handle category from the dataset, and whose values count how many reviews each person wrote.

reviews_written = reviews.groupby('taster_twitter_handle').size()
#或者
reviews_written = reviews.groupby('taster_twitter_handle').taster_twitter_handle.count()

2.What is the best wine I can buy for a given amount of money? Create a Series whose index is wine prices and whose values is the maximum number of points a wine costing that much was given in a review. Sort the values by price, ascending (so that 4.0 dollars is at the top and 3300.0 dollars is at the bottom).

best_rating_per_price = reviews.groupby('price').apply(lambda df : df.loc[df.points.idxmax()])
best_rating_per_price = pd.Series(best_rating_per_price['points'],index=best_rating_per_price['price']

3.What is the best wine I can buy for a given amount of money? Create a Series whose index is wine prices and whose values is the maximum number of points a wine costing that much was given in a review. Sort the values by price, ascending (so that 4.0 dollars is at the top and 3300.0 dollars is at the bottom).

price_extremes = reviews.groupby('variety').price.agg([min,max])

4.What are the most expensive wine varieties? Create a variable sorted_varieties containing a copy of the dataframe from the previous question where varieties are sorted in descending order based on minimum price, then on maximum price (to break ties).

sorted_varieties = price_extremes.sort_values(by=['min','max'],ascending=False)

5.Create a Series whose index is reviewers and whose values is the average review score given out by that reviewer. Hint: you will need the taster_name and points columns.

reviewer_mean_ratings = reviews.groupby('taster_name').points.apply(lambda x:sum(x)/len(x))
#或者
reviewer_mean_ratings = reviews.groupby('taster_name').points.mean()

6.What combination of countries and varieties are most common? Create a Series whose index is a MultiIndexof {country, variety} pairs. For example, a pinot noir produced in the US should map to {“US”, “Pinot Noir”}. Sort the values in the Series in descending order based on wine count.

country_variety_counts = reviews.groupby(['country','variety']).variety.count().sort_values(ascending = False)

xxY0_0Yxx

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
1
评论
kaggle练习题、pandas练习题(Exercise: Grouping and Sorting)

通过group和sort函数以及apply等函数的叠加使用，可以让python实现类似excel表格中数据透视表的数据分析展示
复制链接

扫一扫