Pandas: DataFrame

1. A DataFrame can be built by passing dict-form data to the DataFrame constructor:

import pandas as pd
import numpy as np
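# A DataFrame represents a rectangular table of data with an ordered collection of columns;
# each column can hold a different value type (numeric, string, boolean, etc.)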
data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada', 'Nevada'],
        'years': [2000, 2001, 2002, 2001, 2000, 2009],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.9, 3.2]}
frame = pd.DataFrame(data)
print(frame)
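print(frame.head())  # head() returns the first five rows by default
# Specify the column order; a column name missing from the data ('debt') is filled with NaN
frame2 = pd.DataFrame(data, columns=['years', 'state', 'pop', 'debt'],
                      index=['one', 'two', 'three', 'four', 'five', 'six'])
# Select a row by its index label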
print(frame2.loc['three'])
# Assign to a column: a scalar is broadcast to every row, an array must match the frame's length
frame2['debt'] = 16.5
print(frame2)
frame2['debt'] = np.arange(6)
print(frame2)
# When a Series is assigned to a column, it is aligned on the index; labels missing from the Series get NaN
val = pd.Series([-1.2, -1.5, -1.7], index=['two', 'four', 'five'])
frame2['debt'] = val
print(frame2)

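Output (from an older pandas release that sorted dict keys alphabetically, hence the pop, state, years column order in the first two tables; current releases keep the dict's insertion order):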

   pop   state  years
0  1.5    Ohio   2000
1  1.7    Ohio   2001
2  3.6    Ohio   2002
3  2.4  Nevada   2001
4  2.9  Nevada   2000
5  3.2  Nevada   2009
   pop   state  years
0  1.5    Ohio   2000
1  1.7    Ohio   2001
2  3.6    Ohio   2002
3  2.4  Nevada   2001
4  2.9  Nevada   2000
years    2002
state    Ohio
pop       3.6
debt      NaN
Name: three, dtype: object
       years   state  pop  debt
one     2000    Ohio  1.5  16.5
two     2001    Ohio  1.7  16.5
three   2002    Ohio  3.6  16.5
four    2001  Nevada  2.4  16.5
five    2000  Nevada  2.9  16.5
six     2009  Nevada  3.2  16.5
       years   state  pop  debt
one     2000    Ohio  1.5     0
two     2001    Ohio  1.7     1
three   2002    Ohio  3.6     2
four    2001  Nevada  2.4     3
five    2000  Nevada  2.9     4
six     2009  Nevada  3.2     5
       years   state  pop  debt
one     2000    Ohio  1.5   NaN
two     2001    Ohio  1.7  -1.2
three   2002    Ohio  3.6   NaN
four    2001  Nevada  2.4  -1.5
five    2000  Nevada  2.9  -1.7
six     2009  Nevada  3.2   NaN
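
The index alignment seen in the last table can be isolated in a smaller sketch. The three-row frame and the extra label 'four' below are illustrative and not part of the example above; the point is that a Series is matched by label, while a plain list has to match the frame's length.

```python
import pandas as pd

# Small illustrative frame (not the one from the example above)
frame = pd.DataFrame(
    {"state": ["Ohio", "Ohio", "Nevada"], "pop": [1.5, 1.7, 2.4]},
    index=["one", "two", "three"],
)

# A Series is aligned by index label, not by position
val = pd.Series([-1.2, -1.5], index=["two", "four"])
frame["debt"] = val

# 'two' receives -1.2; 'one' and 'three' have no match and become NaN;
# 'four' does not exist in the frame, so its value is discarded
print(frame)

# A plain list carries no labels, so its length must equal the number of rows
try:
    frame["debt"] = [1, 2]
except ValueError as exc:
    print("length mismatch:", exc)
```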

2. Deleting a DataFrame column with del:

# First add a column; here it is a boolean column that is True where state equals 'Ohio'
data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada', 'Nevada'],
        'years': [2000, 2001, 2002, 2001, 2000, 2009],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.9, 3.2]}
frame2 = pd.DataFrame(data, columns=['years', 'state', 'pop', 'debt'], index=['one', 'two', 'three', 'four', 'five', 'six'])
frame2['eastern'] = frame2.state == 'Ohio'
print(frame2)

# Remove the column with the del keyword
del frame2['eastern']
print(frame2.columns)

Output:

       years   state  pop debt eastern
one     2000    Ohio  1.5  NaN    True
two     2001    Ohio  1.7  NaN    True
three   2002    Ohio  3.6  NaN    True
four    2001  Nevada  2.4  NaN   False
five    2000  Nevada  2.9  NaN   False
six     2009  Nevada  3.2  NaN   False
Index(['years', 'state', 'pop', 'debt'], dtype='object')

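In the pandas versions this post was written against, a column selected from a DataFrame is a view on the underlying data, not a copy, so in-place changes to the selected Series also show up in the DataFrame. Call the Series' copy() method when an independent copy is needed. With pandas' Copy-on-Write mode enabled, such changes no longer propagate back to the DataFrame.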
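
Because view-versus-copy behavior depends on the pandas version and settings, the sketch below sticks to two operations whose results do not: an explicit copy(), and drop(), which returns a new frame instead of modifying the original the way del does. The two-row frame is illustrative only.

```python
import pandas as pd

# Small illustrative frame (not the one from the example above)
frame = pd.DataFrame({"state": ["Ohio", "Nevada"], "pop": [1.5, 2.4]})

# An explicit copy is independent of the frame it came from
pop_copy = frame["pop"].copy()
pop_copy.iloc[0] = 99.0
print(frame["pop"].tolist())  # [1.5, 2.4]: the frame is unchanged

# drop() returns a new frame without the column; the original keeps it
trimmed = frame.drop(columns=["pop"])
print("pop" in trimmed.columns, "pop" in frame.columns)  # False True
```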

3. Another common input form is a nested dict of dicts. If a nested dict is passed to DataFrame, pandas uses the outer keys as the columns and the inner keys as the row index.

pop = {'Nevada': {2001: 2.4, 2002: 2.9},
       'Ohio': {2000: 1.5, 2001: 1.7, 2002: 3.6}}
frame3 = pd.DataFrame(pop)
print(frame3)
# Transpose the frame (swap rows and columns) with the same .T syntax as a NumPy array
print(frame3.T)

Output:

      Nevada  Ohio
2000     NaN   1.5
2001     2.4   1.7
2002     2.9   3.6
        2000  2001  2002
Nevada   NaN   2.4   2.9
Ohio     1.5   1.7   3.6
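
As a small extension of the nested-dict case, the constructor also accepts an explicit index. A minimal sketch, assuming the same pop dict as above; the label 2003 is added purely for illustration and has no data behind it:

```python
import pandas as pd

pop = {"Nevada": {2001: 2.4, 2002: 2.9},
       "Ohio": {2000: 1.5, 2001: 1.7, 2002: 3.6}}

# With an explicit index, only the listed labels are kept:
# 2000 is left out, and 2003 (absent from both inner dicts) becomes a row of NaN
frame = pd.DataFrame(pop, index=[2001, 2002, 2003])
print(frame)

# Transposing swaps the two axes: 3 rows x 2 columns becomes 2 x 3
print(frame.T.shape)  # (2, 3)
```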

4. A dict of Series can also be used to construct a DataFrame:

pop = {'Nevada': {2001: 2.4, 2002: 2.9},
       'Ohio': {2000: 1.5, 2001: 1.7, 2002: 3.6}}
frame3 = pd.DataFrame(pop)

# Positional slices: every Ohio row except the last, and the first two Nevada rows
pdata = {'Ohio': frame3['Ohio'].iloc[:-1],
         'Nevada': frame3['Nevada'].iloc[:2]}
print(pd.DataFrame(pdata))

Output:

      Nevada  Ohio
2000     NaN   1.5
2001     2.4   1.7
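
When the Series in the dict do not share the same index, the resulting frame uses the union of their labels and fills the gaps with NaN. A minimal sketch with two hand-built Series (the names ohio and nevada are illustrative):

```python
import pandas as pd

# Two Series whose indexes only partly overlap
ohio = pd.Series([1.5, 1.7], index=[2000, 2001])
nevada = pd.Series([2.4, 2.9], index=[2001, 2002])

frame = pd.DataFrame({"Ohio": ohio, "Nevada": nevada})

# The row index is the union of both indexes; missing combinations are NaN
print(frame.sort_index())
```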

 

5. The values attribute of a DataFrame returns the data it contains as a two-dimensional NumPy array:

# frame3 is the nested-dict DataFrame built in the previous examples
print(frame3.values)

Output:

[[ nan  1.5]
 [ 2.4  1.7]
 [ 2.9  3.6]]
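
The dtype of the returned array depends on the columns. The sketch below uses to_numpy(), the method counterpart of values, on two small illustrative frames: one that is all floats and one that mixes strings with floats.

```python
import pandas as pd

# Illustrative frames: one purely numeric, one with mixed column types
numeric = pd.DataFrame({"Nevada": [2.4, 2.9], "Ohio": [1.7, 3.6]})
mixed = pd.DataFrame({"state": ["Ohio", "Nevada"], "pop": [1.5, 2.4]})

# All-float columns give a float64 array
print(numeric.to_numpy().dtype)  # float64

# Columns of different types are accommodated by an object array
print(mixed.to_numpy().dtype)  # object
```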

 

 

 
