Kaggle3- pandas(2)

Latest recommended article published on 2024-10-11 17:30:36

weixin_30843605

Tags: python

Original link: http://www.cnblogs.com/liu247/p/11115348.html

import pandas as pd

# Set the maximum number of rows to display at once
pd.set_option("display.max_rows", 5)
1. Grouping
# Group by points and count the rows in each group
reviews.groupby('points').points.count()

# Group by points and take the minimum price within each group
reviews.groupby('points').price.min()

points
80      5.0
81      5.0
       ... 
99     44.0
100    80.0
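The two group-wise calls above can be tried on a tiny table. This is a minimal sketch: the toy `reviews` frame below is made up for illustration and only stands in for the real wine review data.

```python
import pandas as pd

# Hypothetical toy data standing in for the wine reviews table
reviews = pd.DataFrame({
    "points": [80, 80, 81, 81, 81],
    "price": [5.0, 12.0, 7.0, None, 9.0],
})

# Number of rows for each score
counts = reviews.groupby("points").points.count()

# Cheapest price for each score; the missing price is skipped by min()
cheapest = reviews.groupby("points").price.min()

print(counts.tolist())    # [2, 3]
print(cheapest.tolist())  # [5.0, 7.0]
```

In both results the index holds the group keys (80 and 81), sorted in ascending order.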

# Get the first wine of each winery --> apply() passes each group to the function as its own DataFrame
reviews.groupby('winery').apply(lambda df: df.title.iloc[0])

# Pick the highest-rated wine in each province of each country (grouping by two keys: country first, then province)

reviews.groupby(['country', 'province']).apply(lambda df: df.loc[df.points.idxmax()])
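The same "best row per group" result can also be reached without `apply`, by asking `idxmax()` for the winning row label of each group and looking those labels up afterwards. This is only a sketch on made-up data, and it is an alternative to the line above rather than the original notes' method. One possible reason to prefer it: some recent pandas releases may emit a warning when `groupby(...).apply` also receives the grouping columns.

```python
import pandas as pd

# Hypothetical toy data; the column names follow the notes above
reviews = pd.DataFrame({
    "country": ["Italy", "Italy", "Italy", "France"],
    "province": ["Tuscany", "Tuscany", "Sicily", "Alsace"],
    "title": ["A", "B", "C", "D"],
    "points": [90, 95, 88, 92],
})

# For each (country, province) group, the row label of the highest score
best_rows = reviews.groupby(["country", "province"]).points.idxmax()

# Look those labels up in the original frame to recover the full rows
best = reviews.loc[best_rows.to_numpy()].set_index(["country", "province"])

print(best.title.tolist())  # ['D', 'C', 'B']
```

The rows come back ordered by group key: (France, Alsace), (Italy, Sicily), (Italy, Tuscany).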

# Notably, agg() lets you apply several functions at once, which makes summary statistics convenient

reviews.groupby(['country']).price.agg([len, min, max])

             len        min    max
country            
Argentina    3800.0    4.0    230.0
Armenia      2.0       14.0   15.0
Australia    2329.0    5.0    850.0
Austria      3345.0    7.0    1100.0
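`agg` also accepts aggregation names as strings. The sketch below uses made-up prices and swaps in `"size"`, `"min"` and `"max"` for the builtins `len`, `min` and `max`; newer pandas versions may warn when Python builtins are passed, so the string form can be the quieter choice.

```python
import pandas as pd

# Hypothetical toy data
reviews = pd.DataFrame({
    "country": ["Armenia", "Armenia", "Austria", "Austria", "Austria"],
    "price": [14.0, 15.0, 7.0, 20.0, 1100.0],
})

# "size" counts every row in the group (like len); "min" and "max"
# are the pandas aggregations of the same name
summary = reviews.groupby("country").price.agg(["size", "min", "max"])
print(summary)
```

The result has one row per country and one column per aggregation, with the columns named after the strings passed in.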

# Reset the index ----> by default the old index is kept as regular columns

countries_reviewed.reset_index()
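The notes do not show how `countries_reviewed` was built. The sketch below assumes it is a per-(country, province) review count, which is a guess, and uses made-up rows to show what `reset_index()` does to a grouped result.

```python
import pandas as pd

# Hypothetical toy data
reviews = pd.DataFrame({
    "country": ["Italy", "Italy", "France"],
    "province": ["Tuscany", "Tuscany", "Alsace"],
    "description": ["x", "y", "z"],
})

# Assumed definition: number of reviews per (country, province) pair,
# stored in a column named "len"
countries_reviewed = (
    reviews.groupby(["country", "province"]).description.agg(len="count")
)

# Index levels become ordinary columns again
flat = countries_reviewed.reset_index()

# With drop=True the old index levels are discarded instead
dropped = countries_reviewed.reset_index(drop=True)

print(list(flat.columns))     # ['country', 'province', 'len']
print(list(dropped.columns))  # ['len']
```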

Sort

# Sort by the given column ---> ascending by default (ascending=True)

countries_reviewed.sort_values(by='len')

# Sort by two columns

countries_reviewed.sort_values(by=['country', 'len'])
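A small sketch of the sorting calls above. The `countries_reviewed` frame here is invented and already flattened, with `len` as a plain column.

```python
import pandas as pd

# Hypothetical toy data
countries_reviewed = pd.DataFrame({
    "country": ["Italy", "France", "Italy", "France"],
    "province": ["Tuscany", "Alsace", "Sicily", "Bordeaux"],
    "len": [30, 10, 20, 40],
})

by_len = countries_reviewed.sort_values(by="len")
by_len_desc = countries_reviewed.sort_values(by="len", ascending=False)

# With two keys, rows are ordered by country first and by len within a country
by_two = countries_reviewed.sort_values(by=["country", "len"])

print(by_len["len"].tolist())       # [10, 20, 30, 40]
print(by_len_desc["len"].tolist())  # [40, 30, 20, 10]
print(by_two["province"].tolist())  # ['Alsace', 'Bordeaux', 'Sicily', 'Tuscany']
```

`sort_values` returns a new frame each time; `countries_reviewed` keeps its original row order.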

# Exercises

best_rating_per_price = reviews.groupby('price')['points'].max().sort_index()

---------------> Group by 'price', take the highest points in each group, then sort by the index (price) in ascending order

price_extremes = reviews.groupby('variety').price.agg([min, max])

---------------> Group by 'variety' and return the minimum and maximum price of each group

country_variety_counts = reviews.groupby(['country','variety']).title.count().sort_values(ascending=False)

----------------> Group by country and variety, count the wines in each group, then sort the counts in descending order
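The first exercise sorts with `sort_index()` and the third with `sort_values()`, and the two are easy to mix up. A minimal sketch on made-up numbers shows the difference: `sort_index()` orders by the group keys, `sort_values()` by the aggregated values.

```python
import pandas as pd

# Hypothetical toy data
reviews = pd.DataFrame({
    "price": [20.0, 4.0, 20.0, 4.0, 10.0],
    "points": [90, 85, 93, 82, 80],
})

# Best score for each price
best = reviews.groupby("price")["points"].max()

by_price = best.sort_index()    # ordered by the price labels
by_points = best.sort_values()  # ordered by the best score itself

print(by_price.index.tolist())   # [4.0, 10.0, 20.0]
print(by_points.index.tolist())  # [10.0, 4.0, 20.0]
```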

Data types and missing data reference

# Check the data type (dtype) of a column

reviews.price.dtype

# Convert the dtype (astype returns a new Series; reviews itself is unchanged)

reviews.points.astype('float64')
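A short sketch, on a made-up two-row frame, of reading a dtype and converting it. It also shows that the conversion has to be assigned back to take effect on the frame.

```python
import pandas as pd

# Hypothetical toy data
reviews = pd.DataFrame({"points": [87, 90], "price": [15.0, None]})

print(reviews.price.dtype)   # float64
print(reviews.points.dtype)  # int64

# astype returns a converted copy
as_float = reviews.points.astype("float64")
print(as_float.dtype)        # float64
print(reviews.points.dtype)  # int64

# Assign the result back to keep the conversion
reviews["points"] = reviews.points.astype("float64")
print(reviews.points.dtype)  # float64
```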

# Select the rows where country is missing

reviews[reviews.country.isnull()]

# Fill NaN values

reviews.region_2.fillna("1")

# Replace a specific value in a column

reviews.taster_name.replace("@kerinokeefe", "@kerino")
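`Series.replace` with two plain strings swaps whole values, not substrings. The sketch below uses an invented list of handles and contrasts it with `Series.str.replace`, which works inside each string.

```python
import pandas as pd

# Hypothetical toy data
taster = pd.Series(["@kerinokeefe", "@vossroger", "@kerinokeefe"])

# Only entries equal to the whole string are replaced
renamed = taster.replace("@kerinokeefe", "@kerino")
print(renamed.tolist())  # ['@kerino', '@vossroger', '@kerino']

# A substring alone matches no whole value, so nothing changes
unchanged = taster.replace("keefe", "")
print(unchanged.tolist())  # ['@kerinokeefe', '@vossroger', '@kerinokeefe']

# Series.str.replace edits substrings instead
partial = taster.str.replace("keefe", "", regex=False)
print(partial.tolist())  # ['@kerino', '@vossroger', '@kerino']
```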

# Fill missing values, then count how often each value occurs

reviews.region_1.fillna('Unknown').value_counts()
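The missing-data calls in this section can be checked together on a small invented frame. Note that `fillna` returns a new Series and leaves `reviews` as it was.

```python
import pandas as pd

# Hypothetical toy data with some missing entries
reviews = pd.DataFrame({
    "country": ["Italy", None, "France"],
    "region_1": ["Etna", None, None],
})

# Boolean mask selects the rows whose country is missing
missing_country = reviews[reviews.country.isnull()]

# Fill first, then count; value_counts sorts by frequency, highest first
region_counts = reviews.region_1.fillna("Unknown").value_counts()

print(len(missing_country))             # 1
print(region_counts.to_dict())          # {'Unknown': 2, 'Etna': 1}
print(reviews.region_1.isnull().sum())  # 2
```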

# Rename columns

reviews.rename(columns={'points': 'score'})

# Second form: rename index labels

reviews.rename(index={0: 'firstEntry', 1: 'secondEntry'})

# Give an axis a name

reviews.rename_axis('wines', axis='rows')
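The three renaming calls do different things: `rename(columns=...)` changes column labels, `rename(index=...)` changes row labels, and `rename_axis` names the axis itself. A sketch on a made-up frame:

```python
import pandas as pd

# Hypothetical toy data
reviews = pd.DataFrame({"points": [87, 90], "price": [15.0, 20.0]})

renamed_cols = reviews.rename(columns={"points": "score"})
renamed_rows = reviews.rename(index={0: "firstEntry", 1: "secondEntry"})
named_axis = reviews.rename_axis("wines", axis="rows")

print(list(renamed_cols.columns))  # ['score', 'price']
print(list(renamed_rows.index))    # ['firstEntry', 'secondEntry']
print(named_axis.index.name)       # wines

# Each call returns a new frame; the original labels are untouched
print(list(reviews.columns))       # ['points', 'price']
```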

# The simplest way to combine: stack the frames with concat

pd.concat([canadian_youtube, british_youtube])
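`pd.concat` stacks frames with the same columns on top of each other. The sketch below uses two invented frames and shows that the original row labels are carried over unless `ignore_index=True` is passed.

```python
import pandas as pd

# Hypothetical toy data sharing the same columns
canadian_youtube = pd.DataFrame({"title": ["a", "b"], "views": [10, 20]})
british_youtube = pd.DataFrame({"title": ["c"], "views": [30]})

# Rows are appended; each frame keeps its own row labels
stacked = pd.concat([canadian_youtube, british_youtube])
print(stacked.index.tolist())  # [0, 1, 0]

# ignore_index=True renumbers the combined rows from 0
renumbered = pd.concat([canadian_youtube, british_youtube], ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2]
```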

# Another way to combine: join on the index. Both frames share column names, so lsuffix and rsuffix tell them apart

left.join(right, lsuffix='_CAN', rsuffix='_UK')
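The notes do not show how `left` and `right` were built. The sketch below assumes both are the video frames re-indexed on `title` and `trending_date`, which is a guess, and uses invented rows. `join` matches rows by index and keeps every row of `left` by default.

```python
import pandas as pd

# Hypothetical toy data; both frames have a "views" column
canadian_youtube = pd.DataFrame({
    "title": ["a", "b"], "trending_date": ["d1", "d1"], "views": [10, 20]})
british_youtube = pd.DataFrame({
    "title": ["a", "c"], "trending_date": ["d1", "d1"], "views": [7, 30]})

# Assumed setup: index both frames on the columns they should match on
left = canadian_youtube.set_index(["title", "trending_date"])
right = british_youtube.set_index(["title", "trending_date"])

# The suffixes disambiguate the overlapping "views" column
joined = left.join(right, lsuffix="_CAN", rsuffix="_UK")

print(list(joined.columns))                 # ['views_CAN', 'views_UK']
print(joined.loc[("a", "d1"), "views_UK"])  # 7.0
print(len(joined))                          # 2
```

Video "b" has no British match, so its `views_UK` entry is NaN, and video "c" is dropped because it is not in `left`.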

Reposted from: https://www.cnblogs.com/liu247/p/11115348.html
