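pandas grouping:
The groupby method
Groups rows by the specified column; the groups attribute returns the index labels of the rows in each group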
import numpy as np
import pandas as pd

data = {
    'name': ['zhangsan', 'lisi', 'wangwu', 'liliu', 'tom', 'john'],
    'class': ['one', 'two', 'three', 'one', 'two', 'three'],
    'score': [54, 56, 57, 58, 59, 80]
}
df = pd.DataFrame(data)
print(df)
'''
       name  class  score
0  zhangsan    one     54
1      lisi    two     56
2    wangwu  three     57
3     liliu    one     58
4       tom    two     59
5      john  three     80
'''
print(df.groupby('class').groups)
'''
{'one': [0, 3], 'three': [2, 5], 'two': [1, 4]}
'''
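As a complement to the groups attribute, the sketch below shows two other common ways to inspect a grouped object: get_group and direct iteration. It rebuilds the same df as above; the variable names (grouped, group_index, one_rows, group_sizes) are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['zhangsan', 'lisi', 'wangwu', 'liliu', 'tom', 'john'],
    'class': ['one', 'two', 'three', 'one', 'two', 'three'],
    'score': [54, 56, 57, 58, 59, 80],
})

grouped = df.groupby('class')

# .groups maps each group key to the index labels of its rows
group_index = {key: list(idx) for key, idx in grouped.groups.items()}
print(group_index)

# get_group returns the rows of one group as a DataFrame
one_rows = grouped.get_group('one')
print(one_rows)

# Iterating over the grouped object yields (key, sub-DataFrame) pairs
group_sizes = {key: len(sub) for key, sub in grouped}
print(group_sizes)
```

Nothing is computed per group until an aggregation, get_group, or iteration asks for it, so creating the grouped object itself is cheap.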
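The agg() method of the grouped object aggregates each group
# 'name' holds strings, so select the numeric column before taking the mean;
# recent pandas versions raise an error when asked to average a string column
print(df.groupby('class')[['score']].agg('mean'))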
'''
       score
class
one     56.0
three   68.5
two     57.5
'''
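agg can also compute several statistics in one call. The following is a minimal sketch on the same data; the output column names avg_score and top_score are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['zhangsan', 'lisi', 'wangwu', 'liliu', 'tom', 'john'],
    'class': ['one', 'two', 'three', 'one', 'two', 'three'],
    'score': [54, 56, 57, 58, 59, 80],
})

# A list of function names produces one output column per statistic
stats = df.groupby('class')['score'].agg(['mean', 'max', 'min'])
print(stats)

# Named aggregation: keyword = (column, function) sets the output column name
summary = df.groupby('class').agg(
    avg_score=('score', 'mean'),
    top_score=('score', 'max'),
)
print(summary)
```

Passing function names as strings ('mean', 'max') lets pandas use its own optimized group implementations.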
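The transform method: returns aggregated data with the same shape as the original data
# Total score of each class, broadcast back to every row of that class
df['score_percent'] = df.groupby('class')['score'].transform('sum')
# Each student's share of their class total
df['score_percent'] = df['score'] / df['score_percent']
print(df)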
'''
       name  class  score  score_percent
0  zhangsan    one     54       0.482143
1      lisi    two     56       0.486957
2    wangwu  three     57       0.416058
3     liliu    one     58       0.517857
4       tom    two     59       0.513043
5      john  three     80       0.583942
'''
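To make the difference in shape concrete, the sketch below runs agg and transform side by side on the same data. The column name diff_from_class_mean is an illustrative choice, not part of the original notes.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['zhangsan', 'lisi', 'wangwu', 'liliu', 'tom', 'john'],
    'class': ['one', 'two', 'three', 'one', 'two', 'three'],
    'score': [54, 56, 57, 58, 59, 80],
})

by_class = df.groupby('class')['score']

agg_result = by_class.agg('sum')              # one value per group
transform_result = by_class.transform('sum')  # one value per original row

print(agg_result.shape)
print(transform_result.shape)

# transform keeps the original index, so it lines up row by row with df
df['diff_from_class_mean'] = df['score'] - by_class.transform('mean')
print(df)
```

Because the transformed result shares the index of df, it can be assigned as a new column or used in arithmetic with existing columns without any extra join.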
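pandas joining and merging
Joining pandas DataFrames works much like joining tables in a database
Horizontal join
The merge method: the on parameter sets the join key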
# df is the DataFrame built in the grouping section above,
# so it still carries the score_percent column
data1 = {
    'name': ['zhangsan', 'lisi', 'wangwu', 'liliu', 'tom', 'john'],
    'math_score': [80, 80, 87, 56, 90, 70]
}
df1 = pd.DataFrame(data1)
# Join df with df1, using 'name' as the join key
df = df.merge(df1, on='name')
print(df)
'''
       name  class  score  score_percent  math_score
0  zhangsan    one     54       0.482143          80
1      lisi    two     56       0.486957          80
2    wangwu  three     57       0.416058          87
3     liliu    one     58       0.517857          56
4       tom    two     59       0.513043          90
5      john  three     80       0.583942          70
'''
The how parameter of merge selects the join type: 'left', 'right', or 'outer' gives a left, right, or outer join (the default is 'inner')
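The join types only differ when the keys do not fully overlap, which is not the case for df and df1 above. The sketch below therefore uses two small hypothetical frames (left and right) whose names only partly match.

```python
import pandas as pd

# Hypothetical frames: 'zhangsan' appears only on the left, 'tom' only on the right
left = pd.DataFrame({'name': ['zhangsan', 'lisi', 'wangwu'],
                     'score': [54, 56, 57]})
right = pd.DataFrame({'name': ['lisi', 'wangwu', 'tom'],
                      'math_score': [80, 87, 90]})

inner = left.merge(right, on='name')                 # keys present in both
left_join = left.merge(right, on='name', how='left')    # all keys from left
right_join = left.merge(right, on='name', how='right')  # all keys from right
outer = left.merge(right, on='name', how='outer')       # keys from either side

for label, frame in [('inner', inner), ('left', left_join),
                     ('right', right_join), ('outer', outer)]:
    print(label)
    print(frame)
```

Rows without a match on the other side are kept with NaN in the missing columns, which also turns an integer column such as math_score into floats.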
Vertical concatenation:
The concat method
df = pd.concat([df,df])
print(df)
'''
       name  class  score  score_percent  math_score
0  zhangsan    one     54       0.482143          80
1      lisi    two     56       0.486957          80
2    wangwu  three     57       0.416058          87
3     liliu    one     58       0.517857          56
4       tom    two     59       0.513043          90
5      john  three     80       0.583942          70
0  zhangsan    one     54       0.482143          80
1      lisi    two     56       0.486957          80
2    wangwu  three     57       0.416058          87
3     liliu    one     58       0.517857          56
4       tom    two     59       0.513043          90
5      john  three     80       0.583942          70
'''
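The output above repeats the index labels 0 to 5, because concat keeps each input's original index. If a fresh running index is wanted, ignore_index=True is one option. A minimal sketch with two small hypothetical frames (a and b):

```python
import pandas as pd

# Hypothetical frames for illustration
a = pd.DataFrame({'name': ['zhangsan', 'lisi'], 'score': [54, 56]})
b = pd.DataFrame({'name': ['wangwu', 'liliu'], 'score': [57, 58]})

stacked = pd.concat([a, b])                        # index: 0, 1, 0, 1
renumbered = pd.concat([a, b], ignore_index=True)  # index: 0, 1, 2, 3

print(stacked)
print(renumbered)
```

With duplicate labels, a lookup such as stacked.loc[0] returns two rows, so renumbering is usually the safer choice when the old index carries no meaning.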