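1. Introduction
A DataFrame is a table-like structure made up of multiple Series; inside a DataFrame each Series is called a column. In other words, a DataFrame is a two-dimensional labeled table rather than a one-dimensional array. In the figure, index is the row index, and a and b are columns, i.e. the table is made up of two Series. I am a beginner, so corrections are welcome if I have misunderstood anything.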
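To make the relationship between a DataFrame and its columns concrete, here is a minimal sketch. The data values are made up for illustration; only the column names a and b come from the description above.

```python
import pandas as pd

# Hypothetical two-column table; the values are illustrative only.
df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

col = df['a']                       # selecting one column yields a Series
print(type(col).__name__)           # name of the column's type
print(list(df.columns))             # column labels
print(df.shape)                     # (rows, columns)
print(col.index.equals(df.index))   # the column shares the DataFrame's index
```

Each column carries the same row index as the DataFrame itself, which is what lets pandas line the columns up row by row.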
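2. Selected operations
2.1 Building a DataFrame from numpy arrays gives the same result as building it from pandas Series.
Program:
# coding: utf-8
import pandas as pd
import numpy as np
# Two plain numpy arrays
s1 = np.array([2,3,4,5])
s2 = np.array([5,6,7,8])
# Each array in the list becomes one row
print(pd.DataFrame([s1,s2]))
print('*'*40)
# The same data as pandas Series
t1 = pd.Series([2,3,4,5])
t2 = pd.Series([5,6,7,8])
print(pd.DataFrame([t1,t2]))
Result: both calls print the same table with two rows and four columns, one row per array or Series.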
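Rather than comparing the two printed tables by eye, the values can be compared in code. This is a small sketch of one way to check it; the variable names from_arrays, from_series and same_values are my own.

```python
import numpy as np
import pandas as pd

from_arrays = pd.DataFrame([np.array([2, 3, 4, 5]), np.array([5, 6, 7, 8])])
from_series = pd.DataFrame([pd.Series([2, 3, 4, 5]), pd.Series([5, 6, 7, 8])])

# Compare the underlying values instead of the printed text.
same_values = np.array_equal(from_arrays.values, from_series.values)
print(same_values)
print(from_arrays.shape, from_series.shape)
```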
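2.2 A dict whose values are arrays or Series
Program:
# coding: utf-8
import pandas as pd
import numpy as np
s1 = np.array([2,3,4,5])
s2 = np.array([5,6,7,8])
# Each dict key becomes a column name, each value becomes that column
print(pd.DataFrame({'A':s1,'B':s2}))
Result: a table with four rows (index 0 to 3) and two columns, A and B.
Note: if the Series passed in have different lengths, they are aligned on their index and any position without a value becomes NaN. Plain numpy arrays of different lengths in a dict are not aligned; pandas raises a ValueError instead.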
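The note about different lengths is easier to see with a small example. The sketch below uses made-up data and my own variable names (shorter, longer, raised); it shows the NaN fill for Series and, as far as I know, the ValueError that a dict of unequal numpy arrays triggers.

```python
import numpy as np
import pandas as pd

shorter = pd.Series([2, 3, 4])       # index 0..2
longer = pd.Series([5, 6, 7, 8])     # index 0..3

# Series are aligned on their index; the missing slot in A becomes NaN.
df = pd.DataFrame({'A': shorter, 'B': longer})
print(df)
print(df['A'].isna().tolist())

# Plain arrays carry no index, so unequal lengths cannot be aligned.
raised = False
try:
    pd.DataFrame({'A': np.array([2, 3, 4]), 'B': np.array([5, 6, 7, 8])})
except ValueError as exc:
    raised = True
    print('ValueError:', exc)
```

Note that column A is stored as floats after the fill, because NaN is a floating-point value.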
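2.3 if-then operations (.loc[]):
.loc[condition, columns the "then" assignment applies to]
The original version of this example used .ix[], which was deprecated in pandas 0.20 and removed in pandas 1.0; .loc[] does the same job here.
e.g.:
# coding: utf-8
import pandas as pd
import numpy as np
s1 = np.array([2,3,4,5])
s2 = np.array([5,6,7,8])
m = pd.DataFrame({'A':s1,'B':s2},index=['A','B','C','D'])
# Where column A is greater than 2, set column B to -2
m.loc[m.A>2,'B'] = -2
print(m)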
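A minimal sketch of the same if-then idea extended with an "else" branch, using .loc[] (the label-based indexer in current pandas releases, where .ix[] is no longer available). The name mask and the else value 0 are my own choices for illustration.

```python
import pandas as pd

m = pd.DataFrame({'A': [2, 3, 4, 5], 'B': [5, 6, 7, 8]},
                 index=['A', 'B', 'C', 'D'])

mask = m['A'] > 2          # boolean Series aligned with m's index
m.loc[mask, 'B'] = -2      # "then" branch: rows where A > 2
m.loc[~mask, 'B'] = 0      # "else" branch: all remaining rows
print(m)
```

Storing the condition in a variable avoids evaluating it twice and keeps the two branches visibly complementary.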
2.4 The numpy.where() operation
numpy.where(condition, then, else)
e.g.:
# coding: utf-8
import pandas as pd
import numpy as np
s1 = np.array([2,3,4,5])
s2 = np.array([5,6,7,8])
m = pd.DataFrame({'A':s1,'B':s2},index=['A','B','C','D'])
# New column "then": 5 where A is greater than 2, otherwise 0
m["then"] = np.where(m.A>2,5,0)
print(m)
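The then and else arguments of numpy.where() do not have to be scalars. As a sketch (the column names picked and flag are hypothetical), whole columns can be passed and the choice is made element by element:

```python
import numpy as np
import pandas as pd

m = pd.DataFrame({'A': [2, 3, 4, 5], 'B': [5, 6, 7, 8]},
                 index=['A', 'B', 'C', 'D'])

# Take B where A > 2, otherwise keep A; evaluated row by row.
m['picked'] = np.where(m['A'] > 2, m['B'], m['A'])
# Scalars work as well, as in the example above.
m['flag'] = np.where(m['A'] > 2, 5, 0)
print(m)
```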
2.5 Selecting rows of a DataFrame by condition
2.5.1 Selecting directly
e.g.:
# coding: utf-8
import pandas as pd
import numpy as np
s1 = np.array([2,3,4,5])
s2 = np.array([5,6,7,8])
m = pd.DataFrame({'A':s1,'B':s2},index=['A','B','C','D'])
# Boolean indexing with [] and with .loc[] selects the same rows
t = m[m.A>=3]
s = m.loc[m.A>=3]
print(t)
print('@'*20)
print(s)
s and t show the same result.
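Conditions can also be combined. The following sketch, with variable names of my own (both, either, same), uses & for "and" and | for "or"; each condition needs its own parentheses because these operators bind more tightly than comparisons.

```python
import pandas as pd

m = pd.DataFrame({'A': [2, 3, 4, 5], 'B': [5, 6, 7, 8]},
                 index=['A', 'B', 'C', 'D'])

both = m[(m['A'] >= 3) & (m['B'] < 8)]      # rows meeting both conditions
either = m[(m['A'] < 3) | (m['B'] == 8)]    # rows meeting at least one
print(both)
print(either)

# [] and .loc[] with the same boolean mask select the same rows.
same = m[m['A'] >= 3].equals(m.loc[m['A'] >= 3])
print(same)
```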
2.6 groupby: forming groups
import pandas as pd
d = pd.DataFrame({'animal': 'cat dog cat fish dog cat cat'.split(),
                  'size': list('SSMMMLL'),
                  'weight': [8, 10, 11, 1, 20, 12, 12],
                  'adult': [False] * 5 + [True] * 2})
# For each animal, list the size of the row with the largest weight
group = d.groupby("animal").apply(lambda subf: subf['size'][subf['weight'].idxmax()])
print(group)
df = pd.DataFrame({'animal': 'cat dog cat fish dog cat dog'.split(),
                   'size': list('SSMMMLL'),
                   'weight': [8, 10, 11, 1, 20, 12, 12],
                   'adult': [False] * 4 + [True] * 3})
# Group by animal, then pull out the sub-DataFrame for "dog"
group = df.groupby("animal")
dog = group.get_group("dog")
print(dog)
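The "size of the heaviest animal" lookup can also be written without apply(). This is just one possible alternative, sketched with the same animal data (minus the adult column) and with variable names of my own: idxmax() finds the row label of the maximum weight in each group, and .loc[] then fetches those rows.

```python
import pandas as pd

df = pd.DataFrame({'animal': 'cat dog cat fish dog cat cat'.split(),
                   'size': list('SSMMMLL'),
                   'weight': [8, 10, 11, 1, 20, 12, 12]})

# Row label of the heaviest animal within each group
# (the first one wins when weights tie).
heaviest_rows = df.groupby('animal')['weight'].idxmax()

# Look those rows up and keep the size, indexed by animal.
size_of_heaviest = df.loc[heaviest_rows].set_index('animal')['size']
print(size_of_heaviest)

# A plain per-group aggregate for comparison.
max_weight = df.groupby('animal')['weight'].max()
print(max_weight)
```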