Python hierarchical indexing
import numpy as np
import pandas as pd

frame=pd.DataFrame({'a':range(7),'b':range(7,0,-1),'c':['one','one','one','two','two','two','two'],'d':[0,1,2,0,1,2,3]})
# set_index: build a hierarchical index (similar to the row labels in a pivot table)
frame1=frame.set_index(['c','d'])
print(frame1)
# Unlike the previous call, drop=False also keeps c and d as regular columns
frame2=frame.set_index(['c','d'],drop=False)
print(frame2)
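# reset_index: the inverse of set_index; it moves the index levels back into columns. It only works on a set_index result created without drop=False (otherwise columns c and d already exist and a ValueError is raised)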
frame3=frame1.reset_index()
print(frame3)
frame
a b c d
0 0 7 one 0
1 1 6 one 1
2 2 5 one 2
3 3 4 two 0
4 4 3 two 1
5 5 2 two 2
6 6 1 two 3
##################
frame1
       a  b
c   d
one 0  0  7
    1  1  6
    2  2  5
two 0  3  4
    1  4  3
    2  5  2
    3  6  1
#################
frame2
       a  b    c  d
c   d
one 0  0  7  one  0
    1  1  6  one  1
    2  2  5  one  2
two 0  3  4  two  0
    1  4  3  two  1
    2  5  2  two  2
    3  6  1  two  3
##################
frame3:
c d a b
0 one 0 0 7
1 one 1 1 6
2 one 2 2 5
3 two 0 3 4
4 two 1 4 3
5 two 2 5 2
6 two 3 6 1
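A quick check of the round trip described above, as a minimal sketch (it rebuilds `frame` with the same values; `restored` and `failed` are illustrative names): `reset_index` undoes `set_index`, a plain `reset_index()` fails on the `drop=False` version because `c` and `d` already exist as columns, and `reset_index(drop=True)` discards the index levels instead.

```python
import pandas as pd

frame = pd.DataFrame({'a': range(7), 'b': range(7, 0, -1),
                      'c': ['one'] * 3 + ['two'] * 4,
                      'd': [0, 1, 2, 0, 1, 2, 3]})

# set_index moves c and d into a MultiIndex; reset_index moves them back (placed first)
frame1 = frame.set_index(['c', 'd'])
restored = frame1.reset_index()[['a', 'b', 'c', 'd']]  # restore the original column order
print(restored.equals(frame))

# With drop=False, c and d are still columns, so reset_index() cannot re-insert them
frame2 = frame.set_index(['c', 'd'], drop=False)
failed = False
try:
    frame2.reset_index()
except ValueError as exc:
    failed = True
    print('reset_index() raised:', exc)

# reset_index(drop=True) discards the index levels instead of turning them into columns
print(frame2.reset_index(drop=True).equals(frame))
```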
Merging on an index
# Merge a column of the left frame against the index of the right frame
left1=pd.DataFrame({'key':['a','b','a','a','b','c'],'value':range(6)})
right1=pd.DataFrame({'group_val':[3.5,7]},index=['a','b'])
print(left1)
print(right1)
pds=pd.merge(left1,right1,left_on='key',right_index=True)
print(pds)
key value
0 a 0
1 b 1
2 a 2
3 a 3
4 b 4
5 c 5
group_val
a 3.5
b 7.0
key value group_val
0 a 0 3.5
2 a 2 3.5
3 a 3 3.5
1 b 1 7.0
4 b 4 7.0
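The merge above is an inner join (the default), which is why key `c`, absent from `right1`'s index, disappears. A sketch of the left-join variant with the same `left1`/`right1`; `DataFrame.join(..., on='key')` joins a column against the other frame's index and defaults to a left join. The names `inner`, `left_join` and `joined` are illustrative.

```python
import pandas as pd

left1 = pd.DataFrame({'key': ['a', 'b', 'a', 'a', 'b', 'c'], 'value': range(6)})
right1 = pd.DataFrame({'group_val': [3.5, 7]}, index=['a', 'b'])

# Default how='inner': rows whose key has no match in right1's index are dropped
inner = pd.merge(left1, right1, left_on='key', right_index=True)
# how='left': every row of left1 is kept; unmatched keys get NaN in group_val
left_join = pd.merge(left1, right1, left_on='key', right_index=True, how='left')
print(left_join)

# join(on=...) is shorthand for the same left join of a column against the other index
joined = left1.join(right1, on='key')
print(joined.equals(left_join))
```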
Combining and merging datasets
df3=pd.DataFrame({'lkey':['b','b','a','c','a','a','b'],'data1':range(7)})
df4=pd.DataFrame({'rkey':['a','b','d'],'data2':range(3)})
print(df3)
print(df4)
# If the key columns have different names in each object, specify them separately
pf5=pd.merge(df3,df4,left_on='lkey',right_on='rkey')
print(pf5)
data1 lkey
0 0 b
1 1 b
2 2 a
3 3 c
4 4 a
5 5 a
6 6 b
data2 rkey
0 0 a
1 1 b
2 2 d
data1 lkey data2 rkey
0 0 b 1 b
1 1 b 1 b
2 6 b 1 b
3 2 a 0 a
4 4 a 0 a
5 5 a 0 a
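Only keys present in both frames survive the default inner join, so `c` (only in `df3`) and `d` (only in `df4`) are missing from the result. A sketch using the `pd.merge` parameters `how='outer'` and `indicator=True` to keep every key and record where each row came from; `outer` is an illustrative name.

```python
import pandas as pd

df3 = pd.DataFrame({'lkey': ['b', 'b', 'a', 'c', 'a', 'a', 'b'], 'data1': range(7)})
df4 = pd.DataFrame({'rkey': ['a', 'b', 'd'], 'data2': range(3)})

# how='outer' keeps unmatched keys from both sides, filling the gaps with NaN;
# indicator=True adds a _merge column with 'both', 'left_only' or 'right_only'
outer = pd.merge(df3, df4, left_on='lkey', right_on='rkey',
                 how='outer', indicator=True)
print(outer)
print(outer['_merge'].value_counts())
```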
Concatenating along an axis
arr=np.arange(12).reshape((3,4))
print(arr)
print(np.concatenate([arr,arr],axis=1))
[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]
[[ 0 1 2 3 0 1 2 3]
[ 4 5 6 7 4 5 6 7]
[ 8 9 10 11 8 9 10 11]]
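`np.concatenate` only handles raw arrays. As a sketch of the axis semantics and of the pandas counterpart `pd.concat` (not used in the original notes), the block below stacks the same array both ways and concatenates two small made-up Series (`s1`, `s2`) along each axis; `keys=` builds a hierarchical index that labels each piece.

```python
import numpy as np
import pandas as pd

arr = np.arange(12).reshape((3, 4))
# axis=0 stacks vertically (more rows); axis=1 places arrays side by side (more columns)
print(np.concatenate([arr, arr], axis=0).shape)  # (6, 4)
print(np.concatenate([arr, arr], axis=1).shape)  # (3, 8)

# pandas counterpart: pd.concat aligns on the labels of the other axis
s1 = pd.Series([0, 1], index=['a', 'b'])
s2 = pd.Series([2, 3, 4], index=['c', 'd', 'e'])
stacked = pd.concat([s1, s2])                     # axis=0: one longer Series
side_by_side = pd.concat([s1, s2], axis=1)        # axis=1: DataFrame, NaN where a label is missing
keyed = pd.concat([s1, s2], keys=['one', 'two'])  # keys add an outer index level per piece
print(side_by_side)
print(keyed)
```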
Reshaping and pivoting
data=pd.DataFrame(np.arange(6).reshape((2,3)),index=pd.Index(['Ohio','Colorado'],name='state'),columns=pd.Index(['one','two','three'],name='number'))
print(data)
result=data.stack()
print(result)
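# stack pivots the column labels into the row index, giving multiple levels of row labels as in a pivot table; unstack is the reverse. You can pass a level name or number to choose which level to operate on; by default the innermost level (closest to the data) is used, and 0 refers to the outermost (first) level.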
print(result.unstack())
print(result.unstack(0))
df=pd.DataFrame({'left':result,'right':result+5},columns=pd.Index(['left','right'],name='side'))
print(df)
print(df.unstack('state'))
print(df.unstack('state').stack('side'))
number    one  two  three
state
Ohio        0    1      2
Colorado    3    4      5
state     number
Ohio      one       0
          two       1
          three     2
Colorado  one       3
          two       4
          three     5
dtype: int32
number    one  two  three
state
Ohio        0    1      2
Colorado    3    4      5
state   Ohio  Colorado
number
one        0         3
two        1         4
three      2         5
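The printed output stops at `result.unstack(0)`; the `df` and `unstack('state')` steps are not shown. A sketch that rebuilds `data` and `df` as above and checks the claims in the comment: `unstack()` works on the innermost level, level `0` and the name `'state'` address the same outer level, and `stack('side')` moves `side` back into the rows. `wide` is an illustrative name; the column order printed for the last step may differ between pandas versions.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(6).reshape((2, 3)),
                    index=pd.Index(['Ohio', 'Colorado'], name='state'),
                    columns=pd.Index(['one', 'two', 'three'], name='number'))
result = data.stack()  # Series with a (state, number) MultiIndex

# unstack() with no argument pivots the innermost level (number) back into columns
print(result.unstack().equals(data))
# Level 0 and the level name 'state' refer to the same (outermost) level
print(result.unstack(0).equals(result.unstack('state')))

df = pd.DataFrame({'left': result, 'right': result + 5},
                  columns=pd.Index(['left', 'right'], name='side'))
wide = df.unstack('state')    # columns become a (side, state) MultiIndex
print(wide.shape)             # (3, 4)
print(wide.stack('side'))     # 'side' moves back into the row index
```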
Converting between wide and long format
df=pd.DataFrame({'key':['foo','bar','baz'],'A':[1,2,3],'B':[4,5,6],'C':[7,8,9]})
print(df)
# pd.melt: combines multiple columns into one, producing a new DataFrame that is longer than the input
melted=pd.melt(df,['key'])
print(melted)
# pivot is the inverse of melt (pandas 2.0+ requires keyword arguments)
reshaped=melted.pivot(index='key',columns='variable',values='value')
print(reshaped)
# One more step: reset_index turns 'key' back into a regular column
print(reshaped.reset_index())
A B C key
0 1 4 7 foo
1 2 5 8 bar
2 3 6 9 baz
key variable value
0 foo A 1
1 bar A 2
2 baz A 3
3 foo B 4
4 bar B 5
5 baz B 6
6 foo C 7
7 bar C 8
8 baz C 9
variable A B C
key
bar 2 5 8
baz 3 6 9
foo 1 4 7
variable key A B C
0 bar 2 5 8
1 baz 3 6 9
2 foo 1 4 7
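To check that `pivot` really inverts `melt`, a sketch that rebuilds `df` and goes wide, long, then wide again. Two details beyond the `reset_index` step above are needed to recover exactly the original frame: `pivot` sorts the keys (`bar`, `baz`, `foo`), so `reindex` restores the original row order, and `rename_axis(columns=None)` drops the leftover `variable` name on the columns. `restored` is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({'key': ['foo', 'bar', 'baz'],
                   'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
melted = pd.melt(df, id_vars=['key'])  # long format: key, variable, value
reshaped = melted.pivot(index='key', columns='variable', values='value')

restored = (reshaped.reindex(df['key'])      # back to the original row order
            .reset_index()                   # 'key' becomes a column again
            .rename_axis(columns=None))      # drop the 'variable' columns name
print(restored)
print(restored.equals(df))
```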
# You can also specify a subset of columns to use as value columns
dd=pd.melt(df,id_vars=['key'],value_vars=['A','B'])
print(dd)
key variable value
0 foo A 1
1 bar A 2
2 baz A 3
3 foo B 4
4 bar B 5
5 baz B 6
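`pd.melt` also accepts `var_name` and `value_name` to rename the two generated columns. A sketch with the same `df`; the names `column` and `reading` are arbitrary choices for illustration. Columns listed in neither `id_vars` nor `value_vars` (here `C`) are dropped, and leaving out `id_vars` keeps no identifier column at all.

```python
import pandas as pd

df = pd.DataFrame({'key': ['foo', 'bar', 'baz'],
                   'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
# Only A and B are unpivoted; C is neither an id column nor a value column, so it is dropped
subset = pd.melt(df, id_vars=['key'], value_vars=['A', 'B'],
                 var_name='column', value_name='reading')
print(subset)

# Without id_vars, nothing identifies the original row; only variable/value remain
no_ids = pd.melt(df, value_vars=['A', 'B'])
print(no_ids)
```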