Kaggle pandas exercises (Exercise: Grouping and Sorting)

In Python, groupby() gives a view of the data similar to an Excel pivot table.

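Differences between ignore_index and reset_index():
The former is a parameter accepted by some methods; the latter is a method in its own right.
With ignore_index=True the resulting DataFrame is renumbered from 0; reset_index() turns grouped data (pivot-table-like form) back into a flat table, with the group keys as ordinary columns.
idxmax() returns the index label of the largest value in a column.
Added on Sept. 5: there is also a set_index() method, which sets the index so that join() can line tables up on it. reset_index() and set_index() are two different methods that do roughly opposite things.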
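To make the notes above concrete, here is a minimal sketch on a hypothetical four-row DataFrame (not the Kaggle wine data). The variable names are made up for illustration, and it assumes a pandas version where sort_values() accepts ignore_index (1.0 or later).

```python
import pandas as pd

# Hypothetical toy data, standing in for the real reviews table.
df = pd.DataFrame({
    "country": ["US", "US", "Italy", "Italy"],
    "points": [90, 95, 88, 92],
})

# ignore_index is a parameter: the sorted result is relabelled 0..n-1.
sorted_df = df.sort_values(by="points", ignore_index=True)

# reset_index() is a method: it moves the group keys out of the index
# and back into an ordinary column.
max_points = df.groupby("country").points.max()  # Series indexed by country
flat = max_points.reset_index()                  # columns: country, points

# idxmax() returns the index label of the largest value.
best_label = df.points.idxmax()

# set_index() goes the other way: a column becomes the index,
# which is the shape join() expects.
by_country = flat.set_index("country")
```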

Method overview

Using groupby()

1. Reproducing value_counts()

reviews.groupby('points').points.count()  # count how often each points value occurs
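As a quick check of that equivalence, the sketch below compares the two calls on a hypothetical six-row frame. The counts match; only the ordering differs, since value_counts() sorts by frequency while groupby sorts by key.

```python
import pandas as pd

# Hypothetical toy data, not the Kaggle wine reviews.
reviews = pd.DataFrame({"points": [87, 90, 87, 88, 90, 87]})

by_group = reviews.groupby("points").points.count()
by_value_counts = reviews.points.value_counts()

# Align both results on the index before comparing the counts.
same = by_group.sort_index().tolist() == by_value_counts.sort_index().tolist()
```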

2. Finding the minimum price within each group

reviews.groupby('points').price.min()

3. After grouping, using a lambda to take the first title in each group

reviews.groupby('winery').apply(lambda df: df.title.iloc[0])

4. After grouping, using a lambda to select, in each group, the row whose index label has the highest points

reviews.groupby(['country', 'province']).apply(lambda df: df.loc[df.points.idxmax()])  	
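The same "best row per group" result can also be reached without apply(), which may be easier to follow. The sketch below is an alternative under that assumption, run on hypothetical toy data with a single grouping key: idxmax() is computed per group, and the resulting labels are passed to .loc.

```python
import pandas as pd

# Hypothetical toy data, not the Kaggle wine reviews.
reviews = pd.DataFrame({
    "country": ["US", "US", "Italy", "Italy"],
    "title": ["A", "B", "C", "D"],
    "points": [90, 95, 92, 88],
})

# idxmax() per group gives the row label of each group's best-scoring wine...
best_labels = reviews.groupby("country").points.idxmax()

# ...and .loc pulls those full rows out of the original frame.
best_rows = reviews.loc[best_labels]
```

Unlike the apply() version, the result keeps the original row labels as its index rather than the group keys.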

5. Using agg(), which works somewhat like describe()

It can report the length, minimum, maximum, quantiles, and so on.

reviews.groupby(['country']).price.agg([len, min, max])
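A variant worth knowing is named aggregation, where each output column gets an explicit name. The sketch below assumes hypothetical toy prices, and the column names n, lowest, highest and median are chosen here purely for illustration.

```python
import pandas as pd

# Hypothetical toy data, not the Kaggle wine reviews.
reviews = pd.DataFrame({
    "country": ["US", "US", "US", "Italy", "Italy"],
    "price": [10, 20, 30, 15, 25],
})

# Keyword = output column name, value = aggregation to run on price.
summary = reviews.groupby("country").price.agg(
    n="count", lowest="min", highest="max", median="median"
)
```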

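6. MultiIndex, and flattening it with reset_index()

A MultiIndex is simply what grouping by several keys produces. reset_index() converts the grouped result (which looks like a pivot table) back into a flat table.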

countries_reviewed = reviews.groupby(['country', 'province']).description.agg([len])
countries_reviewed

mi = countries_reviewed.index
type(mi)
#pandas.core.indexes.multi.MultiIndex
countries_reviewed.reset_index()
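Since the screenshots are not reproduced here, the following sketch shows the before and after shapes on hypothetical toy data. It uses the string "count" rather than len, so the count column is named count in this example.

```python
import pandas as pd

# Hypothetical toy data, not the Kaggle wine reviews.
reviews = pd.DataFrame({
    "country": ["US", "US", "US", "Italy"],
    "province": ["California", "California", "Oregon", "Tuscany"],
    "description": ["a", "b", "c", "d"],
})

# Grouping by two keys yields a two-level MultiIndex.
counts = reviews.groupby(["country", "province"]).description.agg(["count"])
is_multi = isinstance(counts.index, pd.MultiIndex)

# reset_index() moves both levels back into ordinary columns.
flat = counts.reset_index()
```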

Sorting with sort_values and sort_index

countries_reviewed.sort_values(by='len')

countries_reviewed.sort_values(by='len', ascending=False)

countries_reviewed.sort_index()

countries_reviewed.sort_values(by=['country', 'len'])
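The four sorting calls above can be tried on a small flat table. The sketch below uses hypothetical rows with the same column names (country, province, len) and assumes the table has already been flattened, so country is an ordinary column.

```python
import pandas as pd

# Hypothetical toy data shaped like the flattened countries_reviewed table.
counts = pd.DataFrame({
    "country": ["US", "Italy", "US", "France"],
    "province": ["Oregon", "Tuscany", "California", "Bordeaux"],
    "len": [1, 3, 2, 3],
})

by_len = counts.sort_values(by="len")                    # ascending by default
by_len_desc = counts.sort_values(by="len", ascending=False)
by_country_then_len = counts.sort_values(by=["country", "len"])

# sort_index() orders by the index labels, which restores the original row order here.
restored = by_len.sort_index()
```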

Exercises

1. Who are the most common wine reviewers in the dataset? Create a Series whose index is the taster_twitter_handle category from the dataset, and whose values count how many reviews each person wrote.

reviews_written = reviews.groupby('taster_twitter_handle').size()
# or
reviews_written = reviews.groupby('taster_twitter_handle').taster_twitter_handle.count()

2. What is the best wine I can buy for a given amount of money? Create a Series whose index is wine prices and whose values is the maximum number of points a wine costing that much was given in a review. Sort the values by price, ascending (so that 4.0 dollars is at the top and 3300.0 dollars is at the bottom).

best_rating_per_price = reviews.groupby('price').apply(lambda df: df.loc[df.points.idxmax()])
best_rating_per_price = best_rating_per_price['points']  # already indexed by price, in ascending order
# or, more directly
best_rating_per_price = reviews.groupby('price').points.max().sort_index()

3. What are the minimum and maximum prices for each variety of wine? Create a DataFrame whose index is the variety category from the dataset and whose values are the min and max values thereof.

price_extremes = reviews.groupby('variety').price.agg([min,max])

4. What are the most expensive wine varieties? Create a variable sorted_varieties containing a copy of the dataframe from the previous question where varieties are sorted in descending order based on minimum price, then on maximum price (to break ties).

sorted_varieties = price_extremes.sort_values(by=['min','max'],ascending=False)
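To see the tie-breaking at work, here is a small sketch on hypothetical prices where two varieties share the same minimum. It passes the aggregation names as strings, which produces the same min and max column names.

```python
import pandas as pd

# Hypothetical toy data: Merlot and Syrah tie on minimum price.
reviews = pd.DataFrame({
    "variety": ["Merlot", "Merlot", "Syrah", "Syrah", "Riesling"],
    "price": [20.0, 50.0, 20.0, 80.0, 15.0],
})

price_extremes = reviews.groupby("variety").price.agg(["min", "max"])

# Descending by min first; max decides between varieties with equal min.
sorted_varieties = price_extremes.sort_values(by=["min", "max"], ascending=False)
```

Syrah comes before Merlot because both have a minimum of 20.0 but Syrah has the higher maximum.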

5. Create a Series whose index is reviewers and whose values is the average review score given out by that reviewer. Hint: you will need the taster_name and points columns.

reviewer_mean_ratings = reviews.groupby('taster_name').points.apply(lambda x: sum(x) / len(x))
# or
reviewer_mean_ratings = reviews.groupby('taster_name').points.mean()

6. What combination of countries and varieties are most common? Create a Series whose index is a MultiIndex of {country, variety} pairs. For example, a pinot noir produced in the US should map to {“US”, “Pinot Noir”}. Sort the values in the Series in descending order based on wine count.

country_variety_counts = reviews.groupby(['country', 'variety']).variety.count().sort_values(ascending=False)