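These are partial notes from studying Python for Data Analysis (《利用Python进行数据分析》); thanks to the author.
1. Using a DataFrame's columns:
DataFrame's set_index function converts one or more of its columns into the row index and returns a new DataFrame: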
import numpy as np
import pandas as pd
from pandas import DataFrame, Series
frame=DataFrame({'a':range(7),'b':range(7,0,-1),'c':['one','one','one','two','two','two','two'],'d':[0,1,2,0,1,2,3]})
frame
Out[70]:
   a  b    c  d
0  0  7  one  0
1  1  6  one  1
2  2  5  one  2
3  3  4  two  0
4  4  3  two  1
5  5  2  two  2
6  6  1  two  3
frame2=frame.set_index(['c','d'])
frame2
Out[72]:
       a  b
c   d
one 0  0  7
    1  1  6
    2  2  5
two 0  3  4
    1  4  3
    2  5  2
    3  6  1
By default those columns are removed from the DataFrame, but they can also be kept:
frame.set_index(['c','d'],drop=False)
Out[74]:
       a  b    c  d
c   d
one 0  0  7  one  0
    1  1  6  one  1
    2  2  5  one  2
two 0  3  4  two  0
    1  4  3  two  1
    2  5  2  two  2
    3  6  1  two  3
reset_index does the opposite of set_index: the levels of the hierarchical index are moved back into columns:
frame2.reset_index()
Out[75]:
     c  d  a  b
0  one  0  0  7
1  one  1  1  6
2  one  2  2  5
3  two  0  3  4
4  two  1  4  3
5  two  2  5  2
6  two  3  6  1
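The round trip between columns and index levels can be checked with a minimal sketch like the one below. It rebuilds the same frame as above; the names `indexed`, `kept` and `restored` are introduced here only for illustration.

```python
import pandas as pd

frame = pd.DataFrame({
    'a': range(7),
    'b': range(7, 0, -1),
    'c': ['one', 'one', 'one', 'two', 'two', 'two', 'two'],
    'd': [0, 1, 2, 0, 1, 2, 3],
})

# Move columns 'c' and 'd' into a two-level row index.
indexed = frame.set_index(['c', 'd'])
print(list(indexed.index.names))   # ['c', 'd']
print(list(indexed.columns))       # ['a', 'b']

# With drop=False the columns also stay in the frame.
kept = frame.set_index(['c', 'd'], drop=False)
print(list(kept.columns))          # ['a', 'b', 'c', 'd']

# reset_index moves the index levels back into columns, placed first.
restored = indexed.reset_index()
print(list(restored.columns))      # ['c', 'd', 'a', 'b']
```

Note that set_index and reset_index both return new objects here; `frame` itself is left unchanged.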
Concatenating along an axis:
s1=Series([0,1],index=['a','b'])
s2=Series([2,3,4],index=['c','d','e'])
s3=Series([5,6],index=['f','g'])
pd.concat([s1,s2,s3])
Out[79]:
a    0
b    1
c    2
d    3
e    4
f    5
g    6
dtype: int64
By default, concat works along axis=0 and produces a new Series. If you pass axis=1, the result is a DataFrame instead (axis=1 is the column axis):
pd.concat([s1,s2,s3],axis=1)
Out[80]:
     0    1    2
a  0.0  NaN  NaN
b  1.0  NaN  NaN
c  NaN  2.0  NaN
d  NaN  3.0  NaN
e  NaN  4.0  NaN
f  NaN  NaN  5.0
g  NaN  NaN  6.0
Passing join='inner' keeps only the intersection of their index labels:
s4=pd.concat([s1*5,s3])
s4
Out[82]:
a    0
b    5
f    5
g    6
dtype: int64
pd.concat([s1,s4],axis=1,join='inner')
Out[83]:
   0  1
a  0  0
b  1  5
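The three concat variants above can be put side by side in a short sketch. It reuses the same Series as the notes; the result names (`stacked`, `side_by_side`, `inner`, `outer`) are illustrative, and the explicit comparison with the default join='outer' is an addition for contrast.

```python
import pandas as pd

s1 = pd.Series([0, 1], index=['a', 'b'])
s2 = pd.Series([2, 3, 4], index=['c', 'd', 'e'])
s3 = pd.Series([5, 6], index=['f', 'g'])

# axis=0 (the default) joins the values end to end into one Series.
stacked = pd.concat([s1, s2, s3])
print(type(stacked).__name__, stacked.shape)   # Series (7,)

# axis=1 aligns on the index and gives each input its own column.
side_by_side = pd.concat([s1, s2, s3], axis=1)
print(side_by_side.shape)                      # (7, 3)

# join='inner' keeps only the labels present in every input.
s4 = pd.concat([s1 * 5, s3])
inner = pd.concat([s1, s4], axis=1, join='inner')
print(list(inner.index))                       # ['a', 'b']

# The default join='outer' keeps the union of labels instead.
outer = pd.concat([s1, s4], axis=1)
print(list(outer.index))                       # ['a', 'b', 'f', 'g']
```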
stack: rotates (pivots) the columns of the data into rows
unstack: rotates (pivots) the rows of the data into columns
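Since the notes give no example for stack and unstack, here is a minimal sketch. The small frame `wide` and its labels are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical 2x3 frame used only for illustration.
wide = pd.DataFrame(
    [[0, 1, 2], [3, 4, 5]],
    index=['Ohio', 'Colorado'],
    columns=['one', 'two', 'three'],
)

# stack: the column labels become the inner level of the row index,
# so the result is a Series with a two-level index.
long = wide.stack()
print(long.shape)              # (6,)
print(long['Ohio', 'two'])     # 1

# unstack: the inner index level is moved back into the columns.
back = long.unstack()
print(back.shape)              # (2, 3)
print(back.loc['Colorado', 'three'])   # 5
```

By default unstack operates on the innermost index level, which is why it undoes the stack in this sketch.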
Replacing values:
The replace function provides one way to substitute values:
data=Series([1,-999,2,-999,-1000,3])
data.replace(-999,np.nan)
Out[86]:
0       1.0
1       NaN
2       2.0
3       NaN
4   -1000.0
5       3.0
dtype: float64
To use a different replacement for each value, pass a list of the values to replace and a matching list of substitutes:
data.replace([-999,-1000],[np.nan,0])
Out[88]:
0    1.0
1    NaN
2    2.0
3    NaN
4    0.0
5    3.0
dtype: float64
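The calling forms of replace can be compared in a minimal sketch. The same Series is used as in the notes; the multi-value and dict forms are not part of the notes and are shown here only as alternatives, and the result names are illustrative.

```python
import numpy as np
import pandas as pd

data = pd.Series([1, -999, 2, -999, -1000, 3])

# One sentinel value, one replacement.
single = data.replace(-999, np.nan)

# Several sentinel values mapped to the same replacement.
both_nan = data.replace([-999, -1000], np.nan)

# Pairwise replacement: -999 -> NaN, -1000 -> 0.
pairwise = data.replace([-999, -1000], [np.nan, 0])

# The same pairwise mapping written as a dict.
as_dict = data.replace({-999: np.nan, -1000: 0})

print(single.isna().sum())      # 2
print(both_nan.isna().sum())    # 3
print(pairwise.tolist())        # [1.0, nan, 2.0, nan, 0.0, 3.0]
```

A value in the list that does not occur in the data (for example 1000 instead of -1000) is silently ignored, so a sign typo leaves the sentinel in place without raising an error.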
Dummy encoding (indicator variables):
df=DataFrame({'key':['b','b','a','c','a','b'],'data1':range(6)})
df
Out[93]:
   data1 key
0      0   b
1      1   b
2      2   a
3      3   c
4      4   a
5      5   b
pd.get_dummies(df['key'],prefix='key')
Out[95]:
   key_a  key_b  key_c
0    0.0    1.0    0.0
1    0.0    1.0    0.0
2    1.0    0.0    0.0
3    0.0    0.0    1.0
4    1.0    0.0    0.0
5    0.0    1.0    0.0
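A common next step is to attach the indicator columns to the rest of the data. The sketch below assumes the same `df` as above; the names `dummies` and `df_with_dummy` are illustrative. Depending on the pandas version, the indicator values may be displayed as 0.0/1.0, 0/1, or True/False, so the checks avoid relying on the exact dtype.

```python
import pandas as pd

df = pd.DataFrame({'key': ['b', 'b', 'a', 'c', 'a', 'b'],
                   'data1': range(6)})

# One indicator column per distinct value of 'key'.
dummies = pd.get_dummies(df['key'], prefix='key')
print(list(dummies.columns))               # ['key_a', 'key_b', 'key_c']

# Each row has exactly one indicator set.
print((dummies.sum(axis=1) == 1).all())    # True

# Attach the indicators to the remaining data column.
df_with_dummy = df[['data1']].join(dummies)
print(list(df_with_dummy.columns))         # ['data1', 'key_a', 'key_b', 'key_c']
```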