Python data analysis with pandas: using DataFrame (part 2)


weixin_40006965


This section covers the basic ways of working with the data held in Series and DataFrame objects.

1. Reindexing

An important method on pandas objects is reindex, which creates a new object whose data conforms to a new index.

Example: reindex

'''Created on 2016-8-10

@author: xuzhengzhu'''

import pandas as pd

print("--------------obj result:-----------------")
obj = pd.Series([4.5, 7.2, -5.3, 3.6], index=['d', 'b', 'a', 'c'])
print(obj)

print("--------------obj2 result:-----------------")
# 'e' is not in the original index, so it is added with a missing value
obj2 = obj.reindex(['a', 'b', 'c', 'd', 'e'])
print(obj2)

print("--------------obj3 result:-----------------")
# fill_value replaces the missing values introduced by reindexing
obj3 = obj.reindex(['a', 'b', 'c', 'd', 'e'], fill_value=0)
print(obj3)

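reindex rearranges the data to match the new index. If a label in the new index does not exist in the current one, a missing value (NaN) is introduced for it.

Passing fill_value=0 substitutes 0 for those missing values.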

Result of reindexing the index:

--------------obj result:-----------------
d    4.5
b    7.2
a   -5.3
c    3.6
dtype: float64
--------------obj2 result:-----------------
a   -5.3
b    7.2
c    3.6
d    4.5
e    NaN
dtype: float64
--------------obj3 result:-----------------
a   -5.3
b    7.2
c    3.6
d    4.5
e    0.0
dtype: float64
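Since reindex is described as creating a new object, a small check can make that visible. The sketch below reuses the values from the example and assumes a current pandas release; the variable name reindexed is only illustrative.

```python
import pandas as pd

obj = pd.Series([4.5, 7.2, -5.3, 3.6], index=['d', 'b', 'a', 'c'])

# reindex returns a new Series; obj itself is left untouched
reindexed = obj.reindex(['a', 'b', 'c', 'd', 'e'])

print(list(obj.index))          # the original order is kept
print(list(reindexed.index))    # the new object follows the requested order
print(reindexed.isna().sum())   # number of labels that had no data
```

If the missing entries should hold a default instead of NaN, the fill_value argument shown in the example covers that case.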

2. Interpolation

For ordered data such as time series, you may want to fill in values while reindexing. The method option does this:

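Values of the method parameter

Value                Description
ffill or pad         Forward fill: carry the previous value forward
bfill or backfill    Backward fill: carry the next value backward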

Example: forward fill with ffill

import pandas as pd

print("--------------obj3 result:-----------------")
obj3 = pd.Series(['blue', 'red', 'yellow'], index=[0, 2, 4])
print(obj3)

print("--------------obj4 result:-----------------")
# Forward fill: each new label takes the value of the closest preceding label
obj4 = obj3.reindex(range(6), method='ffill')
print(obj4)

Result of ffill:

--------------obj3 result:-----------------
0      blue
2       red
4    yellow
dtype: object
--------------obj4 result:-----------------
0      blue
1      blue
2       red
3       red
4    yellow
5    yellow
dtype: object
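The table also lists bfill, which the example does not exercise. As a minimal sketch on the same data (the names colors and filled are illustrative), backward fill looks like this:

```python
import pandas as pd

# Same data as in the ffill example
colors = pd.Series(['blue', 'red', 'yellow'], index=[0, 2, 4])

# Backward fill: each new label takes the value of the next existing label
filled = colors.reindex(range(6), method='bfill')
print(filled)
```

Label 5 has no later value to borrow from, so it stays missing, whereas with ffill it is label 0 that would stay missing if the original index started later.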

For a DataFrame, reindex can change the row index, the column index, or both. When only a single sequence is passed, the rows are reindexed:

Example: reindexing a DataFrame

import numpy as np
import pandas as pd

print("--------------frame result:-----------------")
frame = pd.DataFrame(np.arange(9).reshape((3, 3)),
                     index=['a', 'c', 'd'],
                     columns=['ohio', 'texas', 'california'])
print(frame)

print("--------------frame2 result:-----------------")
# A single sequence reindexes the rows
frame2 = frame.reindex(['a', 'b', 'c', 'd'])
print(frame2)

print("--------------frame3 result:-----------------")
# The columns keyword reindexes the columns
frame3 = frame.reindex(columns=['texas', 'utah', 'california'])
print(frame3)

print("--------------frame4 result:-----------------")
# Rows and columns together. The original used frame.ix[...], which was
# removed in pandas 1.0; reindex with both keywords gives the same result.
frame4 = frame.reindex(index=['a', 'b', 'c', 'd'],
                       columns=['texas', 'utah', 'california'])
print(frame4)

Result of reindex:

--------------frame result:-----------------
   ohio  texas  california
a     0      1           2
c     3      4           5
d     6      7           8
--------------frame2 result:-----------------
   ohio  texas  california
a   0.0    1.0         2.0
b   NaN    NaN         NaN
c   3.0    4.0         5.0
d   6.0    7.0         8.0
--------------frame3 result:-----------------
   texas  utah  california
a      1   NaN           2
c      4   NaN           5
d      7   NaN           8
--------------frame4 result:-----------------
   texas  utah  california
a    1.0   NaN         2.0
b    NaN   NaN         NaN
c    4.0   NaN         5.0
d    7.0   NaN         8.0
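Older pandas code often performs this combined row-and-column reindexing with the ix indexer, which was removed in pandas 1.0. The sketch below shows one way to get the same shape with reindex alone and additionally passes fill_value, which is an optional choice here rather than something the example requires.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(9).reshape((3, 3)),
                     index=['a', 'c', 'd'],
                     columns=['ohio', 'texas', 'california'])

# Reindex rows and columns in one call; new cells get 0 instead of NaN
both = frame.reindex(index=['a', 'b', 'c', 'd'],
                     columns=['texas', 'utah', 'california'],
                     fill_value=0)
print(both)
```

Because no NaN is introduced, the values can stay integers instead of being converted to floats.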

3. Dropping entries from an axis

Example: dropping entries from an axis

import numpy as np
import pandas as pd

print("--------------Series drop item by index:-----------------")
obj = pd.Series(np.arange(3, 8), index=['a', 'b', 'c', 'd', 'e'])
print(obj)

# drop returns a new object without the given label
obj1 = obj.drop('c')
print(obj1)

print("--------------DataFrame drop item by index :-----------------")
frame = pd.DataFrame(np.arange(9).reshape((3, 3)),
                     index=['a', 'c', 'd'],
                     columns=['ohio', 'texas', 'california'])
print(frame)

# axis=1 drops a column instead of a row
frame1 = frame.drop(['ohio'], axis=1)
print(frame1)

Result of drop:

--------------Series drop item by index:-----------------
a    3
b    4
c    5
d    6
e    7
dtype: int32
a    3
b    4
d    6
e    7
dtype: int32
--------------DataFrame drop item by index :-----------------
   ohio  texas  california
a     0      1           2
c     3      4           5
d     6      7           8
   texas  california
a      1           2
c      4           5
d      7           8

For a DataFrame, index values can be dropped from either axis.
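To illustrate dropping from either axis, here is a short sketch using the index and columns keywords of drop, which recent pandas versions accept as an alternative to axis. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(9).reshape((3, 3)),
                     index=['a', 'c', 'd'],
                     columns=['ohio', 'texas', 'california'])

# Drop a row by label (equivalent to axis=0)
without_row = frame.drop(index='c')

# Drop several columns by label (equivalent to axis=1)
without_cols = frame.drop(columns=['ohio', 'texas'])

print(without_row)
print(without_cols)
print(frame.shape)  # the original frame is unchanged
```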

4. Indexing, selection, and filtering

Slicing a Series by label differs from ordinary Python slicing in that the end point is included.

Indexing into a DataFrame retrieves one or more columns.
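The inclusive end point is easiest to see next to a positional slice. The following is a minimal sketch with a made-up Series s; it is not taken from the original examples.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(4.0), index=['a', 'b', 'c', 'd'])

# Label slice: the end label 'c' is part of the result
by_label = s['b':'c']

# Positional slice: the end position 2 is excluded, as in plain Python
by_position = s.iloc[1:2]

print(by_label)
print(by_position)
```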

Example: indexing, selection, and filtering

import numpy as np
import pandas as pd

print("--------------DataFrame drop item by index :-----------------")
frame = pd.DataFrame(np.arange(9).reshape((3, 3)),
                     index=['a', 'c', 'd'],
                     columns=['ohio', 'texas', 'california'])
print(frame)

frame1 = frame.drop(['ohio'], axis=1)
print(frame1)

print("--------------DataFrame filter item by index :-----------------")
# Besides selecting a column, rows can be selected with a slice or a boolean array
print(frame['ohio'])
print(frame[:2])
print(frame[frame['ohio'] >= 3])

print("--------------DataFrame filter item by index :-----------------")
# Label-based indexing on a DataFrame: row labels first, column labels second.
# The original used frame.ix, which was removed in pandas 1.0; loc replaces it.
print(frame.loc['a', ['ohio', 'texas']])

Result:

--------------DataFrame drop item by index :-----------------
   ohio  texas  california
a     0      1           2
c     3      4           5
d     6      7           8
   texas  california
a      1           2
c      4           5
d      7           8
--------------DataFrame filter item by index :-----------------
a    0
c    3
d    6
Name: ohio, dtype: int32
   ohio  texas  california
a     0      1           2
c     3      4           5
   ohio  texas  california
c     3      4           5
d     6      7           8
--------------DataFrame filter item by index :-----------------
ohio     0
texas    1
Name: a, dtype: int32
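In current pandas, label-based and position-based selection are split between loc and iloc. The sketch below applies both to the same frame; the variable names are illustrative and the choice of rows and columns is arbitrary.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(9).reshape((3, 3)),
                     index=['a', 'c', 'd'],
                     columns=['ohio', 'texas', 'california'])

# loc selects by label: row labels first, column labels second
row_a = frame.loc['a', ['ohio', 'texas']]

# A label slice in loc includes its end point, so rows 'a' and 'c' are returned
rows_a_to_c = frame.loc['a':'c', 'texas']

# A boolean mask can be combined with a column selection
big_rows = frame.loc[frame['ohio'] >= 3, ['texas', 'california']]

# iloc selects by integer position instead of by label
first_row = frame.iloc[0, [0, 1]]

print(row_a, rows_a_to_c, big_rows, first_row, sep='\n')
```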

5. Arithmetic and data alignment

Example: arithmetic and data alignment

import pandas as pd

print("--------------DataFrame drop item by index :-----------------")
s1 = pd.Series([7.3, -2.5, 3.4, 1.5], index=['a', 'c', 'd', 'e'])
s2 = pd.Series([-2.1, 3.6, -1.5, 4, 3.1], index=['a', 'c', 'e', 'f', 'g'])

# Values are added label by label; labels found in only one Series give NaN
print(s1 + s2)

Result:

--------------DataFrame drop item by index :-----------------
a    5.2
c    1.1
d    NaN
e    0.0
f    NaN
g    NaN
dtype: float64
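The NaN entries appear because labels d, f, and g exist in only one of the two Series. As a hedged sketch of one way to avoid them, the add method accepts a fill_value that stands in for the missing side; the name total is illustrative.

```python
import pandas as pd

s1 = pd.Series([7.3, -2.5, 3.4, 1.5], index=['a', 'c', 'd', 'e'])
s2 = pd.Series([-2.1, 3.6, -1.5, 4, 3.1], index=['a', 'c', 'e', 'f', 'g'])

# Treat a label missing from one Series as 0 before adding
total = s1.add(s2, fill_value=0)
print(total)
```

Whether 0 is a sensible stand-in depends on the data; for sums it usually is, for ratios it usually is not.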

Example: alignment

import numpy as np
import pandas as pd

print("--------------DataFrame drop item by index :-----------------")
df1 = pd.DataFrame(np.arange(9).reshape((3, 3)), columns=list('bcd'),
                   index=['ohio', 'texas', 'colorado'])
df2 = pd.DataFrame(np.arange(12).reshape((4, 3)), columns=list('bde'),
                   index=['utah', 'ohio', 'texas', 'oregon'])
print(df1)
print("--------------------")
print(df2)

# Only cells whose row and column labels exist in both frames get a value
print("-------df1+df2-------------")
print(df1 + df2)

# When a label on either axis is missing from one of the two objects,
# fill_value is used in its place
print("-------df3-------------")
df3 = df1.add(df2, fill_value=0)
print(df3)

Result:

--------------DataFrame drop item by index :-----------------
          b  c  d
ohio      0  1  2
texas     3  4  5
colorado  6  7  8
--------------------
        b   d   e
utah    0   1   2
ohio    3   4   5
texas   6   7   8
oregon  9  10  11
-------df1+df2-------------
            b   c     d   e
colorado  NaN NaN   NaN NaN
ohio      3.0 NaN   6.0 NaN
oregon    NaN NaN   NaN NaN
texas     9.0 NaN  12.0 NaN
utah      NaN NaN   NaN NaN
-------df3-------------
            b    c     d     e
colorado  6.0  7.0   8.0   NaN
ohio      3.0  1.0   6.0   5.0
oregon    9.0  NaN  10.0  11.0
texas     9.0  4.0  12.0   8.0
utah      0.0  NaN   1.0   2.0
