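Data in pandas can be held in a Series, a DataFrame, or a Panel, which represent one-, two-, and three-dimensional data respectively. Note that Panel was deprecated and then removed in pandas 0.25, so current versions offer only Series and DataFrame.
When constructing these objects, index specifies the row labels and columns specifies the column labels.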
Series:
Construction
import numpy as np
import pandas as pd
s = pd.Series(data, index=index)  # general form: data plus optional row labels
s = pd.Series(np.random.randn(5), index=['a', 'b', 'c', 'd', 'e'])
Or from a dict, in which case the keys become the index labels:
In [7]: d = {'a' : 0., 'b' : 1., 'c' : 2.}
In [8]: pd.Series(d)
Out[8]:
a 0.0
b 1.0
c 2.0
dtype: float64
In [9]: pd.Series(d, index=['b', 'c', 'd', 'a'])
Out[9]:
b 1.0
c 2.0
d NaN
a 0.0
dtype: float64
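The dict-based construction above can be reproduced as a standalone script. This is only a minimal sketch; the names `s_default`, `aligned`, and `missing` are illustrative and do not appear in the original notes.

```python
import pandas as pd

d = {'a': 0., 'b': 1., 'c': 2.}

# Without an explicit index, the dict keys become the index labels.
s_default = pd.Series(d)

# With an explicit index, values are looked up by label; a label that
# is absent from the dict ('d' here) gets NaN.
aligned = pd.Series(d, index=['b', 'c', 'd', 'a'])

# Collect the labels that ended up without a value.
missing = aligned[aligned.isna()].index.tolist()

print(aligned)
print(missing)
```

The order of the resulting Series follows the index argument, not the order of the dict.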
Selecting from a Series
In [11]: s[0]  # select a single element by position (newer pandas prefers s.iloc[0])
Out[11]: 0.46911229990718628
In [12]: s[:3]  # select from the start through the third row
Out[12]:
a 0.4691
b -0.2829
c -1.5091
dtype: float64
In [13]: s[s > s.median()]  # select by a boolean condition
Out[13]:
a 0.4691
e 1.2121
dtype: float64
In [14]: s[[4, 3, 1]]  # select several positions, in the given order (s.iloc[[4, 3, 1]] in newer pandas)
Out[14]:
e 1.2121
d -1.1356
b -0.2829
dtype: float64
In [15]: np.exp(s)  # NumPy functions apply element-wise and keep the index
Out[15]:
a 1.5986
b 0.7536
c 0.2211
d 0.3212
e 3.3606
dtype: float64
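The selections above can also be written with the explicit indexers `.iloc` (by position) and `.loc` (by label), which avoids any ambiguity between the two. The sketch below assumes fixed values rounded from the session output rather than fresh random numbers, so that the results are reproducible; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Values rounded from the session above, used in place of np.random.randn(5).
s = pd.Series([0.4691, -0.2829, -1.5091, -1.1356, 1.2121],
              index=['a', 'b', 'c', 'd', 'e'])

first = s.iloc[0]                  # single element by position
first_three = s.iloc[:3]           # positions 0, 1, 2
picked = s.iloc[[4, 3, 1]]         # several positions, in the given order
by_label = s.loc['a':'c']          # label slices include both endpoints
above_median = s[s > s.median()]   # boolean mask
exp_s = np.exp(s)                  # element-wise, index is preserved

print(first_three)
print(picked)
print(above_median)
```

Unlike positional slices, a label slice such as `s.loc['a':'c']` includes its end label.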
DataFrame:
Construction
In [32]: d = {'one' : pd.Series([1., 2., 3.], index=['a', 'b', 'c']),
....: 'two' : pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])}
....:
In [33]: df = pd.DataFrame(d)
In [34]: df
Out[34]:
one two
a 1.0 1.0
b 2.0 2.0
c 3.0 3.0
d NaN 4.0
In [35]: pd.DataFrame(d, index=['d', 'b', 'a'])
Out[35]:
one two
d NaN 4.0
b 2.0 2.0
a 1.0 1.0
In [36]: pd.DataFrame(d, index=['d', 'b', 'a'], columns=['two', 'three'])
Out[36]:
two three
d 4.0 NaN
b 2.0 NaN
a 1.0 NaN
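The session above can be condensed into a short script that checks how the row index and the columns are derived. This is a sketch only; `subset` is an illustrative name that is not in the original notes.

```python
import pandas as pd

d = {'one': pd.Series([1., 2., 3.], index=['a', 'b', 'c']),
     'two': pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])}

# The row index is the union of the Series indexes; 'one' has no
# entry for 'd', so that cell is NaN.
df = pd.DataFrame(d)
print(df.index.tolist())
print(df.columns.tolist())

# index and columns select and order the rows and columns; a column
# name that is not a key of the dict ('three') is filled with NaN.
subset = pd.DataFrame(d, index=['d', 'b', 'a'], columns=['two', 'three'])
print(subset)
```

Passing index or columns never raises for unknown labels here; it simply produces missing values.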
Selecting a column, or adding new columns computed from existing ones
In [56]: df['one']
Out[56]:
a 1.0
b 2.0
c 3.0
d NaN
Name: one, dtype: float64
In [57]: df['three'] = df['one'] * df['two']
In [58]: df['flag'] = df['one'] > 2
In [59]: df
Out[59]:
one two three flag
a 1.0 1.0 1.0 False
b 2.0 2.0 4.0 False
c 3.0 3.0 9.0 True
d NaN 4.0 NaN False
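Two details of the output above are worth checking in isolation: NaN propagates through arithmetic, while a comparison against NaN yields False. The sketch below also shows `DataFrame.assign` as a non-mutating alternative; the `ratio` column and the name `df2` are hypothetical additions for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'one': pd.Series([1., 2., 3.], index=['a', 'b', 'c']),
    'two': pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd']),
})

# Assigning to a new key adds a column in place.
df['three'] = df['one'] * df['two']   # NaN in 'one' propagates to 'three'
df['flag'] = df['one'] > 2            # NaN > 2 evaluates to False

# assign returns a new DataFrame and leaves df unchanged.
df2 = df.assign(ratio=df['two'] / df['one'])

print(df)
print(df2.columns.tolist())
```

Because the flag for row d is False rather than missing, a later filter on `flag` silently drops that row; use `df['one'].isna()` if missing values need separate handling.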