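Merging data:
join: by default, combines rows that share the same row index. The calling DataFrame's rows are kept (a left join), and rows with no match in the other frame are filled with NaN.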
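import pandas as pd
import numpy as np
# n1: 4 rows, columns a, b, c
n1 = pd.DataFrame(np.arange(12).reshape(4, 3), columns=list('abc'))
# n2: 5 rows, columns X, Y
n2 = pd.DataFrame(np.arange(10).reshape(5, 2), columns=list('XY'))
print(n1)
print('*'*20)
print(n2)
print('*'*20)
# n1.join(n2) keeps n1's row index (0-3)
print(n1.join(n2))
print('*'*20)
# n2.join(n1) keeps n2's row index (0-4); row 4 has no match in n1
print(n2.join(n1))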
   a   b   c
0  0   1   2
1  3   4   5
2  6   7   8
3  9  10  11
********************
   X  Y
0  0  1
1  2  3
2  4  5
3  6  7
4  8  9
********************
   a   b   c  X  Y
0  0   1   2  0  1
1  3   4   5  2  3
2  6   7   8  4  5
3  9  10  11  6  7
********************
   X  Y    a     b     c
0  0  1  0.0   1.0   2.0
1  2  3  3.0   4.0   5.0
2  4  5  6.0   7.0   8.0
3  6  7  9.0  10.0  11.0
4  8  9  NaN   NaN   NaN
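The how argument of join controls which row labels survive. The following is a minimal sketch that rebuilds the same n1 and n2 frames; the names inner and outer are chosen only for this illustration.

```python
import numpy as np
import pandas as pd

n1 = pd.DataFrame(np.arange(12).reshape(4, 3), columns=list('abc'))
n2 = pd.DataFrame(np.arange(10).reshape(5, 2), columns=list('XY'))

# how='inner' keeps only the row labels present in both frames (0 to 3)
inner = n2.join(n1, how='inner')
print(inner)

# how='outer' keeps the union of row labels (0 to 4); missing cells become NaN
outer = n1.join(n2, how='outer')
print(outer)
```

With how='inner' the unmatched row 4 of n2 is dropped instead of being padded with NaN.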
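merge: combines data on the specified columns, using the join type given by how
inner (the default) keeps the intersection of the keys; outer keeps the union, filling gaps with NaN; left keeps every row of the left frame, filling gaps with NaN; right keeps every row of the right frame, filling gaps with NaN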
n3 = n1.merge(n2, left_on='a', right_on='X', how='inner')
print(n3)
   a  b  c  X  Y
0  0  1  2  0  1
1  6  7  8  6  7
n3 = n1.merge(n2, left_on='a', right_on='X', how='outer')
print(n3)
     a     b     c    X    Y
0  0.0   1.0   2.0  0.0  1.0
1  3.0   4.0   5.0  NaN  NaN
2  6.0   7.0   8.0  6.0  7.0
3  9.0  10.0  11.0  NaN  NaN
4  NaN   NaN   NaN  2.0  3.0
5  NaN   NaN   NaN  4.0  5.0
6  NaN   NaN   NaN  8.0  9.0
n3 = n1.merge(n2, left_on='a', right_on='X', how='left')
print(n3)
   a   b   c    X    Y
0  0   1   2  0.0  1.0
1  3   4   5  NaN  NaN
2  6   7   8  6.0  7.0
3  9  10  11  NaN  NaN
n3 = n1.merge(n2, left_on='a', right_on='X', how='right')
print(n3)
     a    b    c  X  Y
0  0.0  1.0  2.0  0  1
1  NaN  NaN  NaN  2  3
2  NaN  NaN  NaN  4  5
3  6.0  7.0  8.0  6  7
4  NaN  NaN  NaN  8  9
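To see which side each merged row came from, merge accepts indicator=True. This is a small sketch under the same n1 and n2 definitions; the names n4 and counts are introduced here only for illustration.

```python
import numpy as np
import pandas as pd

n1 = pd.DataFrame(np.arange(12).reshape(4, 3), columns=list('abc'))
n2 = pd.DataFrame(np.arange(10).reshape(5, 2), columns=list('XY'))

# indicator=True adds a '_merge' column whose values are
# 'both', 'left_only' or 'right_only'
n4 = n1.merge(n2, left_on='a', right_on='X', how='outer', indicator=True)
print(n4[['a', 'X', '_merge']])

# Count how many rows fall into each category
counts = n4['_merge'].value_counts()
print(counts)
```

The rows marked both are exactly the ones an inner merge would return, which makes the intersection/union distinction easy to check.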
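Grouping and aggregating data:
Use .groupby() to group the rows, then .count() to count the non-null values in each group
To group by several keys at once, pass a list: .groupby([...])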
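A short sketch of both forms of grouping. The DataFrame and its column names (city, kind, price) are made up for this example and do not come from the notes above.

```python
import pandas as pd

# Hypothetical data, invented for illustration
df = pd.DataFrame({
    'city': ['A', 'A', 'B', 'B', 'B'],
    'kind': ['x', 'y', 'x', 'x', 'y'],
    'price': [1, 2, 3, 4, 5],
})

# Group by one column, then count the rows in each group
by_city = df.groupby('city')['price'].count()
print(by_city)

# Group by two columns at once; the result has a two-level index
by_both = df.groupby(['city', 'kind'])['price'].count()
print(by_both)
```

Grouping by a list of keys produces a MultiIndex, which is where the index methods listed next become useful.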
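Index methods and attributes:
Get the index: .index
Assign a new index: .index = [...] (the list must have one label per row)
Reindex: .reindex() (returns a new object aligned to the given labels; labels missing from the original get NaN)
Use a column as the index: .set_index()
Get the unique index values: .set_index().index.unique()
Swap the levels of a multi-level index: .swaplevel()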
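The index operations above can be tried on a small frame. This is a sketch with invented data; the column names k1, k2, v and all variable names are assumptions made for the example.

```python
import pandas as pd

# Hypothetical data, invented for illustration
df = pd.DataFrame({'k1': ['a', 'a', 'b'], 'k2': [1, 2, 1], 'v': [10, 20, 30]})

print(df.index)                 # the default integer index

df.index = list('xyz')          # assign new labels, one per row

# reindex returns a new frame; 'w' is not an existing label, so its row is NaN
re = df.reindex(['x', 'z', 'w'])
print(re)

by_k1 = df.set_index('k1')      # use column k1 as the index
uniq = by_k1.index.unique()     # unique labels, in order of first appearance
print(uniq)

multi = df.set_index(['k1', 'k2'])   # two-level index (k1, k2)
swapped = multi.swaplevel()          # levels become (k2, k1)
print(swapped)
```

Note that .index = [...] changes the labels in place, while .reindex(), .set_index() and .swaplevel() each return a new object.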