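pandas Data Operations
String Methods
A Series object exposes a set of string-processing methods through its str attribute, which makes it easy to apply them to every element of the array. Missing values (NaN) pass through these methods unchanged.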
import numpy as np
import pandas as pd
t = pd.Series(['a_b_c_d','c_d_e',np.nan,'f_g_h'])
t
0 a_b_c_d
1 c_d_e
2 NaN
3 f_g_h
dtype: object
t.str.cat(['A','B','C','D'],sep=',')
0 a_b_c_d,A
1 c_d_e,B
2 NaN
3 f_g_h,D
dtype: object
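The missing value in row 2 stays NaN after concatenation. As a minimal sketch, assuming the same Series `t` as above, the `na_rep` parameter of `str.cat` substitutes a placeholder for missing values instead. The placeholder `'-'` is an arbitrary choice for illustration.

```python
import numpy as np
import pandas as pd

t = pd.Series(['a_b_c_d', 'c_d_e', np.nan, 'f_g_h'])

# na_rep replaces missing values before joining, so row 2 is no longer NaN.
joined = t.str.cat(['A', 'B', 'C', 'D'], sep=',', na_rep='-')
print(joined.tolist())
# ['a_b_c_d,A', 'c_d_e,B', '-,C', 'f_g_h,D']
```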
t.str.split('_')
0 [a, b, c, d]
1 [c, d, e]
2 NaN
3 [f, g, h]
dtype: object
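By default `str.split` returns one list per element. The sketch below, which assumes the same Series `t`, passes `expand=True` so that the pieces land in separate columns of a DataFrame. Shorter rows are padded with missing values.

```python
import numpy as np
import pandas as pd

t = pd.Series(['a_b_c_d', 'c_d_e', np.nan, 'f_g_h'])

# expand=True spreads the split pieces across columns instead of lists.
parts = t.str.split('_', expand=True)
print(parts.shape)  # (4, 4): four rows, and the longest string has four pieces
print(parts)
```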
t.str.get(0)
0 a
1 c
2 NaN
3 f
dtype: object
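Note that `t.str.get(0)` above is called on the original strings, so it returns the first character of each string rather than the first piece of the split. The two only look identical here because every first piece is a single character. The sketch below uses a hypothetical Series `u` with longer tokens to show the difference.

```python
import numpy as np
import pandas as pd

# Hypothetical data with multi-character tokens, chosen for illustration.
u = pd.Series(['ab_cd', 'ef_gh', np.nan])

first_char = u.str.get(0)                  # first character of each string
first_token = u.str.split('_').str.get(0)  # first element of each split list

print(first_char.tolist()[:2])   # ['a', 'e']
print(first_token.tolist()[:2])  # ['ab', 'ef']
```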
t.str.replace("_", ".")
0 a.b.c.d
1 c.d.e
2 NaN
3 f.g.h
dtype: object
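`str.replace` can treat the pattern either as a literal string or as a regular expression, and the default for the `regex` parameter has differed between pandas versions. For `"_"` the result is the same either way, but for a pattern such as `"."` it is not. The sketch below uses a hypothetical Series `v` and passes `regex` explicitly to make the behaviour unambiguous.

```python
import pandas as pd

# Hypothetical data, chosen so that the pattern is a regex metacharacter.
v = pd.Series(['a.b'])

literal = v.str.replace('.', '-', regex=False)   # replaces only the dot
as_regex = v.str.replace('.', '-', regex=True)   # '.' matches every character

print(literal.tolist())   # ['a-b']
print(as_regex.tolist())  # ['---']
```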
t.str.pad(10, fillchar="?")
0 ???a_b_c_d
1 ?????c_d_e
2 NaN
3 ?????f_g_h
dtype: object
t.str.pad(10, side="right", fillchar="?")
0 a_b_c_d???
1 c_d_e?????
2 NaN
3 f_g_h?????
dtype: object
t.str.center(10, fillchar="?")
0 ?a_b_c_d??
1 ??c_d_e???
2 NaN
3 ??f_g_h???
dtype: object
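The three padding calls above are closely related. As a small sketch, assuming the same Series `t`, `str.pad` with `side="both"` gives the same result as `str.center`, and the `side="left"` and `side="right"` variants correspond to `str.rjust` and `str.ljust`.

```python
import numpy as np
import pandas as pd

t = pd.Series(['a_b_c_d', 'c_d_e', np.nan, 'f_g_h'])

# Padding on the left right-justifies the text, and vice versa.
same_left = t.str.pad(10, side='left', fillchar='?').equals(t.str.rjust(10, fillchar='?'))
same_right = t.str.pad(10, side='right', fillchar='?').equals(t.str.ljust(10, fillchar='?'))
same_both = t.str.pad(10, side='both', fillchar='?').equals(t.str.center(10, fillchar='?'))

print(same_left, same_right, same_both)
```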
t.str.find('d')
0 6.0
1 2.0
2 NaN
3 -1.0
dtype: float64
t.str.rfind('d')
0 6.0
1 2.0
2 NaN
3 -1.0
dtype: float64
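`find` and `rfind` return the same positions above because `'d'` occurs at most once in each string. The results are displayed as floats because the missing value forces a float dtype. The sketch below uses a hypothetical Series `v` with a repeated character to show where the two methods differ: `find` reports the first occurrence, `rfind` the last, and both return -1 when there is no match.

```python
import pandas as pd

# Hypothetical data: 'a' appears twice in the first string, never in the second.
v = pd.Series(['a_b_a', 'xyz'])

first = v.str.find('a')   # index of the first occurrence
last = v.str.rfind('a')   # index of the last occurrence

print(first.tolist())  # [0, -1]
print(last.tolist())   # [4, -1]
```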
Transposing Data (Swapping Rows and Columns)
dates = pd.date_range('20130101',periods=10)
dates
DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
'2013-01-05', '2013-01-06', '2013-01-07', '2013-01-08',
'2013-01-09', '2013-01-10'],
dtype='datetime64[ns]', freq='D')
df = pd.DataFrame(np.random.randn(10,4),index=dates,columns=['A','B','C','D'])
df.head()
| | A | B | C | D |
|---|---|---|---|---|
| 2013-01-01 | -0.665173 | 0.516813 | 0.745156 | -0.303295 |
| 2013-01-02 | -0.953574 | 2.125147 | 0.238382 | -0.400209 |
| 2013-01-03 | -0.233966 | 2.066662 | 0.331000 | -2.802471 |
| 2013-01-04 | 2.038273 | 0.982127 | -1.096000 | -1.051818 |
| 2013-01-05 | -1.438657 | -1.208042 | -0.375673 | 0.384522 |
df.head().T
| | 2013-01-01 00:00:00 | 2013-01-02 00:00:00 | 2013-01-03 00:00:00 | 2013-01-04 00:00:00 | 2013-01-05 00:00:00 |
|---|---|---|---|---|---|
| A | -0.665173 | -0.953574 | -0.233966 | 2.038273 | -1.438657 |
| B | 0.516813 | 2.125147 | 2.066662 | 0.982127 | -1.208042 |
| C | 0.745156 | 0.238382 | 0.331000 | -1.096000 | -0.375673 |
| D | -0.303295 | -0.400209 | -2.802471 | -1.051818 | 0.384522 |
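Because the DataFrame above is filled with random numbers, its values change on every run. The sketch below uses a small hypothetical DataFrame with fixed values to show what `.T` does: the index becomes the columns, the columns become the index, and the shape is reversed.

```python
import pandas as pd

# Hypothetical fixed data so the result is reproducible.
small = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=['x', 'y', 'z'])
transposed = small.T

print(small.shape)               # (3, 2)
print(transposed.shape)          # (2, 3)
print(list(transposed.index))    # ['A', 'B']
print(list(transposed.columns))  # ['x', 'y', 'z']
```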
Applying Functions to Data
df.head().apply(np.cumsum)
| | A | B | C | D |
|---|---|---|---|---|
| 2013-01-01 | -0.665173 | 0.516813 | 0.745156 | -0.303295 |
| 2013-01-02 | -1.618747 | 2.641960 | 0.983537 | -0.703504 |
| 2013-01-03 | -1.852713 | 4.708622 | 1.314537 | -3.505975 |
| 2013-01-04 | 0.185560 | 5.690749 | 0.218537 | -4.557793 |
| 2013-01-05 | -1.253098 | 4.482707 | -0.157135 | -4.173271 |
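`apply` passes each column to the function by default, which is why the cumulative sums above run down each column. The sketch below uses a small hypothetical DataFrame to contrast three cases: a function that returns a column of the same length, a function that reduces each column to one value, and a row-wise call with `axis=1`.

```python
import numpy as np
import pandas as pd

# Hypothetical fixed data so the results are easy to check by hand.
small = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30]})

col_cumsum = small.apply(np.cumsum)                         # running total per column
col_range = small.apply(lambda col: col.max() - col.min())  # one value per column
row_sum = small.apply(lambda row: row.sum(), axis=1)        # one value per row

print(col_cumsum['B'].tolist())  # [10, 30, 60]
print(col_range.tolist())        # [2, 20]
print(row_sum.tolist())          # [11, 22, 33]
```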
Frequency Counts
Count how many times each value occurs, similar to a histogram.
s = pd.Series(np.random.randint(0, 7, size=10))
s
0 3
1 3
2 1
3 6
4 3
5 3
6 5
7 2
8 1
9 0
dtype: int32
s.value_counts()
3 4
1 2
6 1
5 1
2 1
0 1
dtype: int64
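The Series above is random, so the counts differ between runs. The sketch below hard-codes the same ten values shown in the output to make the result reproducible. It also shows two optional variations: `normalize=True` returns relative frequencies instead of counts, and `sort_index()` orders the result by value rather than by count.

```python
import pandas as pd

# The ten values from the output above, fixed for reproducibility.
s = pd.Series([3, 3, 1, 6, 3, 3, 5, 2, 1, 0])

counts = s.value_counts()                # sorted by count, descending
shares = s.value_counts(normalize=True)  # fraction of the total per value
by_value = s.value_counts().sort_index() # sorted by the value itself

print(counts[3])                # 4
print(shares[3])                # 0.4
print(by_value.index.tolist())  # [0, 1, 2, 3, 5, 6]
print(by_value.tolist())        # [1, 2, 1, 4, 1, 1]
```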