DataFrame

Official DataFrame API reference

DataFrame

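A DataFrame is a tabular data structure containing an ordered collection of columns, each of which can hold a different value type (numeric, string, Boolean, and so on).
A DataFrame has both a row index and a column index; it can be thought of as a dictionary of Series that all share the same index.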
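
To make the "dictionary of Series" view concrete, here is a minimal sketch. The names s1, s2, and df and their values are illustrative assumptions, not part of the original example; the sketch only shows that columns built from Series end up sharing one row index.

```python
import pandas as pd

# Two Series with partly different labels (illustrative values).
s1 = pd.Series([1.0, 2.0, 3.0], index=['a', 'b', 'c'])
s2 = pd.Series([10.0, 20.0], index=['b', 'c'])

# Each dict key becomes a column; the Series are aligned on the union of their labels.
df = pd.DataFrame({'x': s1, 'y': s2})

# Selecting a column returns a Series that carries the shared row index.
col = df['y']
print(df)
print(col.index.equals(df.index))
```

Label 'a' has no entry in s2, so that cell of column 'y' is filled with NaN.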

Creating a DataFrame

from pandas import DataFrame
data = {'state':['Ohio','Ohio','Ohio','Nevada','Nevada'],
        'year':[2000,2001,2002,2001,2002],
        'pop':[1.5,1.7,3.6,2.4,2.9]}
frame = DataFrame(data)
frame
   pop   state  year
0  1.5    Ohio  2000
1  1.7    Ohio  2001
2  3.6    Ohio  2002
3  2.4  Nevada  2001
4  2.9  Nevada  2002

Arranging the DataFrame's columns in a specified order

DataFrame(data,columns=['year','state','pop'])
   year   state  pop
0  2000    Ohio  1.5
1  2001    Ohio  1.7
2  2002    Ohio  3.6
3  2001  Nevada  2.4
4  2002  Nevada  2.9

Specifying custom index labels

DataFrame(data,columns=['year','state','pop'],index=['one','two','three','four','five'])
       year   state  pop
one    2000    Ohio  1.5
two    2001    Ohio  1.7
three  2002    Ohio  3.6
four   2001  Nevada  2.4
five   2002  Nevada  2.9

Creating an empty column

If a requested column is not found in the data, it is filled with NaN values.

DataFrame(data,columns=['year','state','pop','debt'],index=['one','two','three','four','five'])
       year   state  pop debt
one    2000    Ohio  1.5  NaN
two    2001    Ohio  1.7  NaN
three  2002    Ohio  3.6  NaN
four   2001  Nevada  2.4  NaN
five   2002  Nevada  2.9  NaN

Creating a Boolean column from a comparison on a column

frame = DataFrame(data,columns=['year','state','pop','debt'],index=['one','two','three','four','five'])
frame['eastern'] = frame.state == 'Ohio'
frame
       year   state  pop debt  eastern
one    2000    Ohio  1.5  NaN     True
two    2001    Ohio  1.7  NaN     True
three  2002    Ohio  3.6  NaN     True
four   2001  Nevada  2.4  NaN    False
five   2002  Nevada  2.9  NaN    False

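Creating from nested dictionaries (a dict of dicts)

The outer dictionary's keys become the columns, and the inner keys become the row index.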

pop = {'Nevada':{2001:2.4,2002:2.9},
      'Ohio':{2000:1.5,2001:1.7,2002:3.6}}
frame = DataFrame(pop)
frame
      Nevada  Ohio
2000     NaN   1.5
2001     2.4   1.7
2002     2.9   3.6
pop = {'Nevada':{2001:2.4,2002:2.9},
      'Ohio':{2000:1.5,2001:1.7,2002:3.6}}
frame = DataFrame(pop,index=[2001,2002,2003])
frame
      Nevada  Ohio
2001     2.4   1.7
2002     2.9   3.6
2003     NaN   NaN
pop = {'Nevada':{2001:2.4,2002:2.9},
      'Ohio':{2000:1.5,2001:1.7,2002:3.6}}
frame = DataFrame(pop)
pdata = {'Ohio':frame['Ohio'][:-1],
         'Nevada':frame['Nevada'][:2]}
DataFrame(pdata)
      Nevada  Ohio
2000     NaN   1.5
2001     2.4   1.7

Naming the index

pop = {'Nevada':{2001:2.4,2002:2.9},
      'Ohio':{2000:1.5,2001:1.7,2002:3.6}}
frame = DataFrame(pop)
frame.index.name = 'year'
frame
      Nevada  Ohio
year
2000     NaN   1.5
2001     2.4   1.7
2002     2.9   3.6

Naming the columns

frame.columns.name = 'state'
frame
state  Nevada  Ohio
year
2000      NaN   1.5
2001      2.4   1.7
2002      2.9   3.6

Transposing

pop = {'Nevada':{2001:2.4,2002:2.9},
      'Ohio':{2000:1.5,2001:1.7,2002:3.6}}
frame = DataFrame(pop)
frame.T
        2000  2001  2002
Nevada   NaN   2.4   2.9
Ohio     1.5   1.7   3.6

The .values attribute returns the data in a DataFrame as a two-dimensional ndarray

pop = {'Nevada':{2001:2.4,2002:2.9},
      'Ohio':{2000:1.5,2001:1.7,2002:3.6}}
frame = DataFrame(pop)
frame.values
array([[nan, 1.5],
       [2.4, 1.7],
       [2.9, 3.6]])
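
As a small supplementary sketch (the frames homogeneous and mixed are made-up examples, not from the original post): when every column shares a numeric type, .values keeps that type, whereas columns of different types are returned as an object-dtype array.

```python
import numpy as np
import pandas as pd

# All columns are floats, so the resulting array is float64.
homogeneous = pd.DataFrame({'Nevada': [np.nan, 2.4, 2.9],
                            'Ohio': [1.5, 1.7, 3.6]})

# A string column next to a float column forces a common object dtype.
mixed = pd.DataFrame({'state': ['Ohio', 'Nevada'],
                      'pop': [1.5, 2.4]})

print(homogeneous.values.dtype)
print(mixed.values.dtype)
```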

Deleting a column with del

pop = {'Nevada':{2001:2.4,2002:2.9},
      'Ohio':{2000:1.5,2001:1.7,2002:3.6}}
frame = DataFrame(pop)
del frame['Ohio']
frame
      Nevada
2000     NaN
2001     2.4
2002     2.9

Retrieving data

Retrieving a column

frame = DataFrame(data,columns=['year','state','pop','debt'],index=['one','two','three','four','five'])
frame['state']

one        Ohio
two        Ohio
three      Ohio
four     Nevada
five     Nevada
Name: state, dtype: object
frame.year
one      2000
two      2001
three    2002
four     2001
five     2002
Name: year, dtype: int64

Getting all column names with .columns

frame.columns 
Index(['year', 'state', 'pop', 'debt'], dtype='object')

Getting all index labels with .index

frame.index
Index(['one', 'two', 'three', 'four', 'five'], dtype='object')

Retrieving a row

frame = DataFrame(data,columns=['year','state','pop','debt'],index=['one','two','three','four','five'])
# .ix was deprecated and then removed in pandas 1.0; use .loc for label-based indexing
frame.loc['three']
year     2002
state    Ohio
pop       3.6
debt      NaN
Name: three, dtype: object
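
Rows can be selected either by label with .loc or by integer position with .iloc. The following is a minimal sketch that rebuilds the same frame (without the empty debt column, an assumption made to keep it short) and shows both accessors returning the same row.

```python
import pandas as pd

data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
        'year': [2000, 2001, 2002, 2001, 2002],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.9]}
frame = pd.DataFrame(data, columns=['year', 'state', 'pop'],
                     index=['one', 'two', 'three', 'four', 'five'])

by_label = frame.loc['three']   # label-based lookup
by_position = frame.iloc[2]     # position-based lookup (third row)

print(by_label)
print(by_label.equals(by_position))
```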

Assigning values

frame = DataFrame(data,columns=['year','state','pop','debt'],index=['one','two','three','four','five'])
frame['debt'] = 16.5
frame
       year   state  pop  debt
one    2000    Ohio  1.5  16.5
two    2001    Ohio  1.7  16.5
three  2002    Ohio  3.6  16.5
four   2001  Nevada  2.4  16.5
five   2002  Nevada  2.9  16.5
import numpy as np
frame['debt']=np.arange(5)
frame
       year   state  pop  debt
one    2000    Ohio  1.5     0
two    2001    Ohio  1.7     1
three  2002    Ohio  3.6     2
four   2001  Nevada  2.4     3
five   2002  Nevada  2.9     4

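Assigning a Series to a DataFrame column

When the assigned value is a Series, its labels are aligned exactly with the DataFrame's index, and every position without a matching label is filled with a missing value.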

from pandas import Series 
val = Series([-1.2,-1.5,-1.7],index=['two','four','five'])
frame['debt'] = val
frame
       year   state  pop  debt
one    2000    Ohio  1.5   NaN
two    2001    Ohio  1.7  -1.2
three  2002    Ohio  3.6   NaN
four   2001  Nevada  2.4  -1.5
five   2002  Nevada  2.9  -1.7
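
Alignment also works in the other direction: labels in the Series that do not exist in the DataFrame's index are simply left out. A minimal sketch, using a small made-up frame and a hypothetical extra label 'six':

```python
import pandas as pd

frame = pd.DataFrame({'pop': [1.5, 1.7, 3.6]},
                     index=['one', 'two', 'three'])

# 'six' is not a row label of frame, so its value is not assigned anywhere.
val = pd.Series([-1.2, -9.9], index=['two', 'six'])

frame['debt'] = val
print(frame)
```

The frame keeps its original three rows; only 'two' receives a value and the remaining rows hold NaN.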

Index objects

Index objects are immutable, which is what allows them to be shared safely among multiple data structures.

from pandas import Series
obj = Series(range(3),index=['a','b','c'])
obj
a    0
b    1
c    2
dtype: int64
index = obj.index 
index
Index(['a', 'b', 'c'], dtype='object')
index[1:]
Index(['b', 'c'], dtype='object')
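
A short sketch of what immutability means in practice: item assignment on an Index raises a TypeError, while replacing the whole index of a Series is allowed. The flag name raised and the new labels are illustrative assumptions.

```python
import pandas as pd

obj = pd.Series(range(3), index=['a', 'b', 'c'])
index = obj.index

raised = False
try:
    index[1] = 'd'            # an Index does not support item assignment
except TypeError as exc:
    raised = True
    print('TypeError:', exc)

# To change labels, assign a complete new index to the Series instead.
obj.index = ['x', 'y', 'z']
print(obj)
```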

pd.Index()

The most general Index object; it represents axis labels as a NumPy array of Python objects.

import numpy as np
import pandas as pd
pd.Index(np.arange(3))
Int64Index([0, 1, 2], dtype='int64')
index = pd.Index(np.arange(3))
obj = Series([1.5,-2.5,0],index=index)
obj
0    1.5
1   -2.5
2    0.0
dtype: float64
obj.index is index
True

Using Boolean expressions to test whether a label is in an index

from pandas import DataFrame,Series
pop = {'Nevada':{2001:2.4,2002:2.9},
      'Ohio':{2000:1.5,2001:1.7,2002:3.6}}
frame = DataFrame(pop)
frame.index.name = 'year'
frame.columns.name = 'state'
frame
state  Nevada  Ohio
year
2000      NaN   1.5
2001      2.4   1.7
2002      2.9   3.6
'Ohio' in frame.columns
True
2003 in frame.index
False

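Basic functionality

.reindex

reindex creates a new object whose data conforms to a new index.
Parameters

index: the new sequence to use as the index. It can be an Index instance or any other sequence-like Python data structure. An Index is used exactly as is, without any copying.
method: interpolation (fill) method
fill_value: substitute value to use when reindexing introduces missing data
limit: maximum number of elements to fill when forward- or backward-filling
level: match a simple Index on the specified level of a MultiIndex; otherwise select a subset
copy: defaults to True, which always copies; if False, the data is not copied when the new index is equivalent to the old one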
obj = Series([4.5,7.2,-5.3,3.6],index=['d','b','a','c'])
obj
d    4.5
b    7.2
a   -5.3
c    3.6
dtype: float64
obj = obj.reindex(['a','b','c','d','e'])
obj
a   -5.3
b    7.2
c    3.6
d    4.5
e    NaN
dtype: float64
The fill_value parameter
obj = Series([4.5,7.2,-5.3,3.6],index=['d','b','a','c'])
obj.reindex(['a','b','c','d','e'],fill_value=0)
a   -5.3
b    7.2
c    3.6
d    4.5
e    0.0
dtype: float64
The method parameter

ffill or pad: fill (carry) values forward

bfill or backfill: fill (carry) values backward

obj = Series(['blue','purple','yellow'],index=[0,2,4])
obj
0      blue
2    purple
4    yellow
dtype: object
obj.reindex(range(6),method='ffill')
0      blue
1      blue
2    purple
3    purple
4    yellow
5    yellow
dtype: object
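
For comparison, here is a minimal sketch of the other fill options described above, using the same Series. The variable names backward and limited and the target range(8) are assumptions chosen for illustration.

```python
import pandas as pd

obj = pd.Series(['blue', 'purple', 'yellow'], index=[0, 2, 4])

# bfill: each new label takes the next known value; label 5 has none after it.
backward = obj.reindex(range(6), method='bfill')

# limit=1: at most one consecutive missing label is forward-filled,
# so labels 6 and 7 (beyond one step past label 4) stay missing.
limited = obj.reindex(range(8), method='ffill', limit=1)

print(backward)
print(limited)
```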
frame = DataFrame(np.arange(9).reshape((3,3)),
                  columns=['Ohio','Texas','California'],index=['a','c','d'])
frame
   Ohio  Texas  California
a     0      1           2
c     3      4           5
d     6      7           8
frame.reindex(['a','b','c','d'])
   Ohio  Texas  California
a   0.0    1.0         2.0
b   NaN    NaN         NaN
c   3.0    4.0         5.0
d   6.0    7.0         8.0
states = ['Texas','Utah','California']
frame.reindex(columns=states)
   Texas  Utah  California
a      1   NaN           2
c      4   NaN           5
d      7   NaN           8
frame = DataFrame(np.arange(9).reshape((3,3)),
                  columns=['Ohio','Texas','California'],index=['a','c','d'])
states = ['Texas','Utah','California']
frame = frame.reindex(columns=states)
frame.reindex(index=['a','b','c','d'],method='ffill',columns=states)
   Texas  Utah  California
a      1   NaN           2
b      1   NaN           2
c      4   NaN           5
d      7   NaN           8
Reindexing rows and columns in one call (used here in place of the deprecated .ix)
frame = DataFrame(np.arange(9).reshape((3,3)),
                  columns=['Ohio','Texas','California'],index=['a','c','d'])
states = ['Texas','Utah','California']
frame = frame.reindex(columns=states)
frame.reindex(index=['a','b','c','d'],columns=states)
   Texas  Utah  California
a    1.0   NaN         2.0
b    NaN   NaN         NaN
c    4.0   NaN         5.0
d    7.0   NaN         8.0

Dropping entries from an axis

The drop method returns a new object with the specified values removed from the given axis.

obj = Series(np.arange(5),index=['a','b','c','d','e'])
obj
a    0
b    1
c    2
d    3
e    4
dtype: int64
obj.drop('c')
a    0
b    1
d    3
e    4
dtype: int64
obj.drop(['d','c'])
a    0
b    1
e    4
dtype: int64
data = DataFrame(np.arange(16).reshape((4,4)),
                index=['Ohio','Colorado','Utah','New York'],columns=['one','two','three','four'])
data
          one  two  three  four
Ohio        0    1      2     3
Colorado    4    5      6     7
Utah        8    9     10    11
New York   12   13     14    15

Dropping rows

data.drop(['Colorado','Ohio'])
          one  two  three  four
Utah        8    9     10    11
New York   12   13     14    15

Dropping columns

data.drop('two',axis=1)
          one  three  four
Ohio        0      2     3
Colorado    4      6     7
Utah        8     10    11
New York   12     14    15
data.drop(['two','four'],axis=1)
          one  three
Ohio        0      2
Colorado    4      6
Utah        8     10
New York   12     14
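
Because drop returns a new object, the original frame is left unchanged. The sketch below also uses the index= and columns= keywords as an alternative to axis=, removing a row and a column in one call; the name trimmed is an illustrative assumption.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])

# Remove one row and one column; the result is a new DataFrame.
trimmed = data.drop(index=['Ohio'], columns=['two'])

print(trimmed)
print(data.shape)   # the original frame still has all rows and columns
```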