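1. Creating a Series (one-dimensional data) from a dictionary
import numpy as np
import pandas as pd
d = {'a': 0, 'b': 1, 'd': 3}
s = pd.Series(d, index=list('abcd'))  # 'c' is missing from d, so s['c'] is NaN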
Features:
Supports positional indexing and slicing (iloc makes the positional intent explicit)
print(s.iloc[0])
print(s.iloc[:2])
print(s.iloc[1:3])
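Output (it appears to come from a run on a different Series, one holding random floats with the default integer index):
-0.887761065211812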
0 -0.887761
1 0.904833
dtype: float64
1 0.904833
2 -0.525255
dtype: float64
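To make the difference between positional and label-based access concrete, here is a minimal sketch on the dictionary-built Series from this section. The variable names (first_by_position, head, label_slice) are illustrative only.

```python
import pandas as pd

# Same construction as in this section: 'c' has no entry in the dict,
# so pandas fills it with NaN and the dtype becomes float64.
s = pd.Series({'a': 0, 'b': 1, 'd': 3}, index=list('abcd'))

first_by_position = s.iloc[0]   # first element, whatever its label is
first_by_label = s.loc['a']     # element whose label is 'a'
head = s.iloc[:2]               # positional slice, end position excluded
label_slice = s.loc['b':'d']    # label slice, both endpoints included

print(first_by_position, first_by_label)
print(head)
print(label_slice)
```

Note that a label slice includes its end label, while a positional slice excludes its end position.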
Label alignment
s1 = pd.Series(np.random.randn(3), index=['a', 'c', 'e'])
s2 = pd.Series(np.random.randn(3), index=['a', 'd', 'e'])
print('{0}\n\n{1}'.format(s1,s2))
a 0.737636
c 1.285566
e -0.011916
dtype: float64
a -1.239792
d 0.393916
e 1.061057
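Computing s1 + s2 aligns the two Series on their labels: a is added to a and e to e, while c and d each appear in only one Series and therefore yield NaN (the values below are from a separate run, so they are not the sums of the numbers printed above)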
a -1.515469
c NaN
d NaN
e -1.398320
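Because the example above uses random numbers, the alignment is easier to verify with fixed values. The following is a small sketch with made-up data; it also shows Series.add with fill_value, which treats a label missing on one side as 0 instead of producing NaN.

```python
import pandas as pd

s1 = pd.Series([1.0, 2.0, 3.0], index=['a', 'c', 'e'])
s2 = pd.Series([10.0, 20.0, 30.0], index=['a', 'd', 'e'])

# Plain addition: the result index is the union of both indexes,
# and labels present on only one side become NaN.
total = s1 + s2

# add() with fill_value=0: a label missing on one side counts as 0.
filled = s1.add(s2, fill_value=0)

print(total)
print(filled)
```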
2. A DataFrame (two-dimensional data) can be thought of as a dictionary of Series: every column is a Series, and so is every row
d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
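Specify the row index (rows are selected by label; a label missing from a Series gives NaN)
df = pd.DataFrame(d, index=['a', 'd', 'c'])
Specify the columns ('four' is not a key of d, so that column is all NaN)
df = pd.DataFrame(d, columns=['two', 'four'])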
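The effect of the index and columns arguments can be checked directly. This is a minimal sketch using the same dictionary d; the names full, rows, and cols are illustrative.

```python
import pandas as pd

d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}

# Without extra arguments the row index is the union of both Series indexes.
full = pd.DataFrame(d)

# index= picks rows by label; 'one' has no 'd', so that cell is NaN.
rows = pd.DataFrame(d, index=['a', 'd', 'c'])

# columns= picks dictionary keys; 'four' does not exist, so it is all NaN.
cols = pd.DataFrame(d, columns=['two', 'four'])

print(full)
print(rows)
print(cols)
```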
Building a DataFrame from a dictionary of lists
All lists must have the same length
d = {'one': [1, 2, 3, 4],
'two': [21, 22, 23, 24]}
df = pd.DataFrame(d)
one two
0 1 21
1 2 22
2 3 23
3 4 24
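The same-length requirement can be seen by passing lists of different lengths. This is a small sketch; the variable name error is illustrative, and the exact wording of the message may differ between pandas versions.

```python
import pandas as pd

error = None
try:
    # 'two' has only three values while 'one' has four.
    pd.DataFrame({'one': [1, 2, 3, 4],
                  'two': [21, 22, 23]})
except ValueError as exc:
    error = str(exc)

print(error)
```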
Building a DataFrame from a list of tuples (each tuple becomes one row)
d = [(1, 2.2, 'Hello'), (2, 3., "world")]
df = pd.DataFrame(d)
0 1 2
0 1 2.2 Hello
1 2 3.0 world
Building a more complex structure (tuple keys produce a MultiIndex on both the rows and the columns)
d = {('a', 'b'): {('A', 'B'): 1, ('A', 'C'): 2},
('a', 'a'): {('A', 'C'): 3, ('A', 'B'): 4},
('a', 'c'): {('A', 'B'): 5, ('A', 'C'): 6},
('b', 'a'): {('A', 'C'): 7, ('A', 'B'): 10}}
df = pd.DataFrame(d)
print(df)
     a         b
     b  a  c   a
A B  1  4  5  10
  C  2  3  6   7
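Once both axes carry a MultiIndex, values are addressed with tuples. The following sketch rebuilds the frame above and shows three common selections; the names sub, cell, and row are illustrative.

```python
import pandas as pd

d = {('a', 'b'): {('A', 'B'): 1, ('A', 'C'): 2},
     ('a', 'a'): {('A', 'C'): 3, ('A', 'B'): 4},
     ('a', 'c'): {('A', 'B'): 5, ('A', 'C'): 6},
     ('b', 'a'): {('A', 'C'): 7, ('A', 'B'): 10}}
df = pd.DataFrame(d)

# Selecting the outer column label returns the sub-frame below it.
sub = df['a']

# A full (row tuple, column tuple) pair addresses a single cell.
cell = df.loc[('A', 'B'), ('b', 'a')]

# A row tuple plus ':' returns that row across all columns.
row = df.loc[('A', 'C'), :]

print(sub)
print(cell)
print(row)
```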
3. Adding, deleting, and inserting columns
df = pd.DataFrame(np.random.randn(6, 4), columns=['one', 'two', 'three', 'four'])
Replace the first column with the sum of the second and fourth columns
df['one'] = df['two'] + df['four']
Delete a column
del df['three']
print(df)
one two four
0 1.051110 1.415449 -0.364339
1 1.344168 0.992644 0.351524
2 -1.459153 -0.801384 -0.657768
3 1.092571 1.420621 -0.328050
4 1.119014 0.678703 0.440311
5 -3.670993 -1.872445 -1.798548
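Add a boolean column that is True where 'one' is greater than 0.2 (the output below is from a fresh df, so 'three' is still present)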
df['flag'] = df['one'] > 0.2
one two three four flag
0 -1.501718 0.178474 2.304388 -1.680193 False
1 1.554432 1.467871 -1.819724 0.086561 True
2 -1.961002 0.095922 -1.456745 -2.056924 False
3 -0.314674 0.066831 -1.394350 -0.381506 False
4 -0.375624 -0.669872 -1.059675 0.294248 False
5 0.528540 0.625411 -1.994278 -0.096871 True
Add a column filled with a single scalar value, which is broadcast to every row
df['five'] = 5
one two three four five
0 -1.692373 -0.793184 -1.022882 -0.899188 5
1 -2.281554 -0.200473 1.978368 -2.081081 5
2 2.077127 0.431241 0.361226 1.645886 5
3 1.991769 1.382926 -0.584086 0.608842 5
4 0.446861 0.076911 -1.591722 0.369950 5
5 1.833202 0.089428 -0.883797 1.743774 5
Delete a column with pop (it removes the column in place and returns it as a Series)
df.pop('four')
one two three
0 2.895448 1.925215 -1.078841
1 -0.204535 -1.123640 0.509472
2 0.542504 0.501191 -0.399165
3 -3.501155 -1.851473 0.259770
4 -0.622400 -0.326245 1.275971
5 2.400736 -0.090141 -0.078183
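The three ways of removing a column differ in what they return and in whether they modify the frame. This is a minimal sketch with made-up data; drop is not used in the text above and is shown only for comparison.

```python
import pandas as pd

df = pd.DataFrame({'one': [1, 2], 'two': [3, 4],
                   'three': [5, 6], 'four': [7, 8]})

# pop: removes the column in place and returns it as a Series.
popped = df.pop('four')

# del: removes the column in place and returns nothing.
del df['three']

# drop: returns a new frame and leaves df itself untouched.
trimmed = df.drop(columns=['two'])

print(popped)
print(df)
print(trimmed)
```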
Insert a column at position 1, so that it becomes the second column
df.insert(1, 'bar', df['one']+df['two'])
print(df)
one bar two three four
0 -0.101852 0.333351 0.435203 2.111786 0.643502
1 -0.123245 0.047569 0.170814 -1.519405 -1.135779
2 -1.913974 -2.248223 -0.334249 0.912003 -0.364429
3 -0.852416 -0.821420 0.030996 0.124609 -0.096879
4 1.445873 2.279212 0.833339 -0.606428 1.170697
5 1.511618 4.677563 3.165944 0.486133 -0.490859
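A short sketch with fixed values shows where insert places the new column. It also shows that insert works in place and refuses a column name that already exists (unless allow_duplicates=True is passed). The variable name duplicate_error is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'one': [1, 2], 'two': [10, 20]})

# Position 1 means "right after the first column"; df is modified in place.
df.insert(1, 'bar', df['one'] + df['two'])
print(df)

duplicate_error = None
try:
    # Inserting a second column named 'bar' is rejected by default.
    df.insert(0, 'bar', 0)
except ValueError as exc:
    duplicate_error = str(exc)
print(duplicate_error)
```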
Add a new column with assign; it returns a copy, and the original df is left unchanged
print(df.assign(Ratio= df['one']/df['two']))
one two three four Ratio
0 1.674716 0.350798 0.561203 0.177807 4.774023
1 0.309234 0.509702 0.394534 -1.411585 0.606696
2 0.534071 -0.457463 0.224305 -1.268856 -1.167464
3 0.442194 0.816737 -0.315137 0.772761 0.541415
4 1.477292 -0.456929 -1.425932 0.172979 -3.233087
5 0.362322 -1.688482 0.160745 2.358249 -0.214584
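Compute the new column with a function (note that, despite its name, Ratio here holds the difference one - two)
print(df.assign(Ratio=lambda x: x.one - x.two))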
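That assign leaves the original frame alone is easy to confirm with fixed values. The sketch below uses made-up data; the column names Diff and Double are illustrative. It also relies on the fact that, within one assign call, a later callable can refer to a column created by an earlier keyword.

```python
import pandas as pd

df = pd.DataFrame({'one': [2.0, 9.0], 'two': [1.0, 3.0]})

# Passing a precomputed Series: the result is a new frame.
with_ratio = df.assign(Ratio=df['one'] / df['two'])

# Passing callables: each receives the frame being built, so 'Double'
# can use the 'Diff' column defined just before it.
with_both = df.assign(Diff=lambda x: x['one'] - x['two'],
                      Double=lambda x: x['Diff'] * 2)

print(with_ratio)
print(with_both)
print(df.columns.tolist())  # still only the original columns
```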
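4. Panel: a three-dimensional data structure
Note: pd.Panel was deprecated in pandas 0.20.0 and removed in pandas 0.25.0, so the code below runs only on older versions.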
d = {'Item1': pd.DataFrame(np.random.randn(4, 3)),
'Item2': pd.DataFrame(np.random.randn(4, 2))}
pn = pd.Panel(d)
print(pn['Item2'])
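On pandas versions that no longer ship pd.Panel, one possible substitute is a single DataFrame with a MultiIndex on the rows, built by passing the same dictionary to pd.concat. This is a sketch of that approach, not the original author's method; the names stacked and item2 are illustrative.

```python
import numpy as np
import pandas as pd

d = {'Item1': pd.DataFrame(np.random.randn(4, 3)),
     'Item2': pd.DataFrame(np.random.randn(4, 2))}

# The dictionary keys become the outer level of the row index.
# Item2 has no column 2, so that column is NaN for its rows.
stacked = pd.concat(d)

# Selecting the outer label recovers one item; dropping all-NaN
# columns restores its original shape.
item2 = stacked.loc['Item2'].dropna(axis=1, how='all')

print(stacked)
print(item2)
```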