python中frameshape_python --panda(二)---DataFrame结构及日常操作

最新推荐文章于 2024-09-28 17:31:20 发布

weixin_39687990

最新推荐文章于 2024-09-28 17:31:20 发布

阅读量539

点赞数

文章标签： python中frameshape

继上一篇文章，这篇文章介绍一下Pandas模块里面的DataFrame结构

1. 介绍

DataFrame unifies two or more Series into a single data structure.Each Series then represents a named column of the DataFrame, and instead of each column having its own index, the DataFrame provides a single index and the data in all columns is aligned to the master index of the DataFrame.

这段话的意思是，DataFrame提供的是一个类似表的结构，由多个Series组成，而Series在DataFrame中叫columns(理解有错请指出，(逃~

2. 相关操作

a.create

pd.DataFrame()

参数：

1、二维array；

2、Series 列表；

3、value为Series的字典；

a.1、二维array

import pandas as pd

import numpy as np

s1=np.array([1,2,3,4])

s2=np.array([5,6,7,8])

df=pd.DataFrame([s1,s2])

print df

a.2、Series列表(效果与二维array相同)

import pandas as pd

import numpy as np

s1=pd.Series(np.array([1,2,3,4]))

s2=pd.Series(np.array([5,6,7,8]))

df=pd.DataFrame([s1,s2])

print df

a.3、value为Series的字典结构；

import pandas as pd

import numpy as np

s1=pd.Series(np.array([1,2,3,4]))

s2=pd.Series(np.array([5,6,7,8]))

df=pd.DataFrame({"a":s1,"b":s2});

print df

注：若创建使用的参数中，array、Series长度不一样时，对应index的value值若不存在则为NaN

b.属性

b.1 .columns :每个columns对应的keys

b.2 .shape:形状，(a，b),index长度为a,columns数为b

b.3 .index;.values:返回index列表；返回value二维array

b.4 .head();.tail();

c.if-then 操作

c.1使用.ix[]

df=pd.DataFrame({"A":[1,2,3,4],"B":[5,6,7,8],"C":[1,1,1,1]})

df.ix[df.A>1,'B']= -1

print df

df.ix[条件，then操作区域]

c.2使用numpy.where

df=pd.DataFrame({"A":[1,2,3,4],"B":[5,6,7,8],"C":[1,1,1,1]})

df["then"]=np.where(df.A<3,1,0)

print df

np.where(条件，then，else)

d.根据条件选择取DataFrame

d.1 直接取值df.[]

df=pd.DataFrame({"A":[1,2,3,4],"B":[5,6,7,8],"C":[1,1,1,1]})

df=df[df.A>=2]

print df

d.2 使用.loc[]

df=pd.DataFrame({"A":[1,2,3,4],"B":[5,6,7,8],"C":[1,1,1,1]})

df=df.loc[df.A>2]

print df

(还有很多种方法就不一一列举了)

e.Grouping

e.1groupby 形成group

df = pd.DataFrame({'animal': 'cat dog cat fish dog cat cat'.split(),

'size': list('SSMMMLL'),

'weight': [8, 10, 11, 1, 20, 12, 12],

'adult' : [False] * 5 + [True] * 2});

#列出动物中weight最大的对应size

group=df.groupby("animal").apply(lambda subf: subf['size'][subf['weight'].idxmax()])

print group

e.2 使用get_group 取出其中一分组

df = pd.DataFrame({'animal': 'cat dog cat fish dog cat cat'.split(),

'size': list('SSMMMLL'),

'weight': [8, 10, 11, 1, 20, 12, 12],

'adult' : [False] * 5 + [True] * 2});

group=df.groupby("animal")

cat=group.get_group("cat")

print cat

其他具体操作请参考CookBook

weixin_39687990

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
复制链接

分享到 QQ

分享到新浪微博

扫一扫

评论

被折叠的条评论为什么被折叠?

到【灌水乐园】发言

查看更多评论

添加红包

成就一亿技术人!

hope_wisdom

发出的红包

实付元

使用余额支付

点击重新获取

扫码支付

钱包余额 0

抵扣说明：

1.余额是钱包充值的虚拟货币，按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载，可以购买VIP、付费专栏及课程。