4 pandas Sorting + Questions

Michael_Flemming

Last modified 2022-07-22 18:26:30


First published 2022-07-22 15:09:33

Article link: https://blog.csdn.net/weixin_44360866/article/details/125932797

1. Index sorting

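First use set_index to make one column the index of the DataFrame, then sort by that index.
set_index accepts an inplace parameter; by default it returns a new DataFrame and leaves the original unchanged.
The method for sorting by index is sort_index. Its ascending parameter defaults to True (ascending order).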

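import pandas as pd

df = pd.read_csv(r'data\table.csv')
print(df.head(), '\n')
# 1. Index sorting; set_index supports inplace
print(df.set_index('Math').head(), '\n')  # this only sets the index, nothing is sorted yet
# Sort by the index; ascending defaults to True (ascending order)
print(df.set_index('Math').sort_index(ascending=True).head())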

  School Class    ID Gender   Address  Height  Weight  Math Physics
0    S_1   C_1  1101      M  street_1     173      63  34.0      A+
1    S_1   C_1  1102      F  street_2     192      73  32.5      B+
2    S_1   C_1  1103      M  street_2     186      82  87.2      B+
3    S_1   C_1  1104      F  street_2     167      81  80.4      B-
4    S_1   C_1  1105      F  street_4     159      64  84.8      B+ 

     School Class    ID Gender   Address  Height  Weight Physics
Math                                                            
34.0    S_1   C_1  1101      M  street_1     173      63      A+
32.5    S_1   C_1  1102      F  street_2     192      73      B+
87.2    S_1   C_1  1103      M  street_2     186      82      B+
80.4    S_1   C_1  1104      F  street_2     167      81      B-
84.8    S_1   C_1  1105      F  street_4     159      64      B+ 

     School Class    ID Gender   Address  Height  Weight Physics
Math                                                            
31.5    S_1   C_3  1301      M  street_4     161      68      B+
32.5    S_1   C_1  1102      F  street_2     192      73      B+
32.7    S_2   C_3  2302      M  street_5     171      88       A
33.8    S_1   C_2  1204      F  street_5     162      63       B
34.0    S_1   C_1  1101      M  street_1     173      63      A+
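
The following is a minimal sketch of the same two steps on a small in-memory table, so it can be run without data\table.csv. The sample rows are hypothetical stand-ins, and the names by_math, desc, df2 and result are introduced only for illustration. It also shows what inplace=True changes: the call modifies the object and returns None.

```python
import pandas as pd

# Hypothetical sample data standing in for data\table.csv
df = pd.DataFrame({
    "ID": [1101, 1102, 1103, 1104],
    "Math": [34.0, 32.5, 87.2, 80.4],
})

# By default set_index returns a new DataFrame; df itself is unchanged
by_math = df.set_index("Math")

# Sort by the index in descending order
desc = by_math.sort_index(ascending=False)
print(desc)

# With inplace=True the object is modified and the call returns None
df2 = df.copy()
result = df2.set_index("Math", inplace=True)
print(result)
print(df2.index.name)
```

Because the default call returns a new object, chaining set_index(...).sort_index(...) as in the post is usually the simpler style.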

2. Value sorting

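The method is sort_values.
Single-key sort: pick one column and sort the rows by the values in that column.
Multi-key sort: sort by the first column, then, among rows with equal values in the first column, sort by the second.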

print('============================')
print(df.sort_values(by='Class').head())  # sort by the values in the Class column
# Multi-key sort:
# sort by the first key, then break ties with the second key
print('====================================')
print(df.sort_values(by=['Address', 'Height']).head())

  School Class    ID Gender   Address  Height  Weight  Math Physics
0     S_1   C_1  1101      M  street_1     173      63  34.0      A+
19    S_2   C_1  2105      M  street_4     170      81  34.2       A
18    S_2   C_1  2104      F  street_5     159      97  72.2      B+
16    S_2   C_1  2102      F  street_6     161      61  50.6      B+
15    S_2   C_1  2101      M  street_7     174      84  83.3       C
====================================
   School Class    ID Gender   Address  Height  Weight  Math Physics
0     S_1   C_1  1101      M  street_1     173      63  34.0      A+
11    S_1   C_3  1302      F  street_1     175      57  87.7      A-
23    S_2   C_2  2204      M  street_1     175      74  47.2      B-
33    S_2   C_4  2404      F  street_2     160      84  67.7       B
3     S_1   C_1  1104      F  street_2     167      81  80.4      B-
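
As a small extension of the multi-key sort above, ascending can also be given as a list with one flag per key. The sketch below uses a hypothetical four-row table (not the author's data) and sorts Address ascending but Height descending within each address.

```python
import pandas as pd

# Hypothetical sample rows; only the column names follow the post
df = pd.DataFrame({
    "Address": ["street_2", "street_1", "street_2", "street_1"],
    "Height": [167, 175, 160, 173],
})

# One ascending flag per sort key: Address ascending, Height descending
sorted_df = df.sort_values(by=["Address", "Height"], ascending=[True, False])
print(sorted_df)
```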

Questions

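value_counts() does not count missing values (NaN) by default; pass dropna=False to include them.
If several index labels share the maximum value, idxmax does not return all of them. It returns only the first one.
axis=0 means the operation runs across rows, that is, down each column; axis=1 means it runs across columns. For example, mean defaults to axis=0 and therefore returns one mean per column.
After sorting by value, rows with equal values keep their original relative order only if the sort is stable (for example kind='mergesort'). With the default single-column sort this is not guaranteed: in the Class output above, the tied rows appear in the order 0, 19, 18, 16, 15.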
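
The four points above can be checked on toy data. This is only a sketch: every Series and DataFrame below is made up for illustration, and kind="mergesort" is used as one stable option for sort_values.

```python
import numpy as np
import pandas as pd

# 1. value_counts skips NaN unless dropna=False is passed
s = pd.Series(["a", "b", np.nan, "a", np.nan])
counts = s.value_counts()
counts_with_nan = s.value_counts(dropna=False)

# 2. idxmax returns only the first label that holds the maximum
scores = pd.Series([3, 9, 9, 1], index=["w", "x", "y", "z"])
first_max = scores.idxmax()
# To get every label that holds the maximum, filter explicitly
all_max = scores[scores == scores.max()].index.tolist()

# 3. axis=0 aggregates down each column, axis=1 across each row
df = pd.DataFrame({"A": [1, 2, 3], "B": [10, 20, 30]})
col_means = df.mean(axis=0)
row_means = df.mean(axis=1)

# 4. A stable sort keeps tied rows in their original relative order
ties = pd.DataFrame({"key": [2, 1, 2, 1, 2], "val": list("abcde")})
stable = ties.sort_values(by="key", kind="mergesort")

print(counts, counts_with_nan, first_max, all_max, sep="\n")
print(col_means, row_means, stable, sep="\n")
```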

import pandas as pd
import numpy as np
import operator

game_throne = pd.read_csv(r'data\Game_of_Thrones_Script.csv')
print(game_throne.head(), '\n')
print(game_throne.columns, '\n')
print('gt_shape:', game_throne.shape, '\n')
print("Number of distinct characters:")
print(game_throne['Name'].nunique(), '\n')

print("Character with the most lines:")
# value_counts sorts by count in descending order, so the first label is the most frequent
print(game_throne['Name'].value_counts().index[0],'\n')


# apply is convenient here: first add a column holding the word count of each line
game_throne['new_nwords'] = game_throne['Sentence'].apply(lambda x: len(x.split(' ')))
# Extract the two columns we need
name_words = list(zip(game_throne['Name'], game_throne['new_nwords']))
# Walk through the list and accumulate each character's word count in a dict
words_count = {}
for x in name_words:
    words_count[x[0]] = words_count.get(x[0], 0) + x[1]
# Sort the dict items by word count, descending, and take the top name
words_man = sorted(words_count.items(), key=operator.itemgetter(1), reverse=True)[0][0]
print("Character who speaks the most words:")
print(words_man)

0    2011/4/17  ...  What do you expect? They're savages. One lot s...
1    2011/4/17  ...  I've never seen wildlings do a thing like this...
2    2011/4/17  ...                             How close did you get?
3    2011/4/17  ...                            Close as any man would.
4    2011/4/17  ...                   We should head back to the wall.

[5 rows x 6 columns]

Index(['Release Date', 'Season', 'Episode', 'Episode Title', 'Name',
       'Sentence'],
      dtype='object')

gt_shape: (23911, 6)

Number of distinct characters:
564

Character with the most lines:
tyrion lannister

Character who speaks the most words:
tyrion lannister
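
The per-character word total can also be computed without an explicit loop. The sketch below is an alternative, not the author's method: it uses groupby with sum and idxmax on a hypothetical four-row table that reuses the Name and Sentence column names. The speaker labels and the n_words column are made up for illustration. Note that str.split() with no argument splits on any run of whitespace, which can give slightly different counts than split(' ').

```python
import pandas as pd

# Hypothetical miniature table; speaker labels are placeholders
script = pd.DataFrame({
    "Name": ["speaker_a", "speaker_b", "speaker_a", "speaker_b"],
    "Sentence": [
        "How close did you get?",
        "Close as any man would.",
        "We should head back to the wall.",
        "Yes.",
    ],
})

# Word count per line, computed with vectorized string methods
script["n_words"] = script["Sentence"].str.split().str.len()

# Total words per speaker, then the label with the largest total
words_per_speaker = script.groupby("Name")["n_words"].sum()
top_speaker = words_per_speaker.idxmax()
print(words_per_speaker)
print(top_speaker)
```

As noted in the Questions section, idxmax returns only the first label when several speakers tie for the maximum.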