1. A DataFrame can be built by passing data in dict form to DataFrame:
import pandas as pd
import numpy as np
# A DataFrame is a rectangular table of data with an ordered collection of columns; each column can hold a different value type (numeric, string, boolean, etc.)
data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada', 'Nevada'],
        'years': [2000, 2001, 2002, 2001, 2000, 2009],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.9, 3.2]}
frame = pd.DataFrame(data)
print(frame)
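print(frame.head())  # head() returns the first five rows by default
# Passing columns fixes the column order; a column that is not in data ('debt') is filled with NaN
frame2 = pd.DataFrame(data, columns=['years', 'state', 'pop', 'debt'],
                      index=['one', 'two', 'three', 'four', 'five', 'six'])
# Select a row by its label
print(frame2.loc['three'])
# Modify a column by assignment (a scalar, or an array whose length matches the DataFrame)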
frame2['debt'] = 16.5
print(frame2)
frame2['debt'] = np.arange(6)
print(frame2)
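# When a Series is assigned to a column, its labels are aligned with the DataFrame's index; labels missing from the Series become NaN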
val = pd.Series([-1.2, -1.5, -1.7], index=['two', 'four', 'five'])
frame2['debt'] = val
print(frame2)
Result:
   pop   state  years
0  1.5    Ohio   2000
1  1.7    Ohio   2001
2  3.6    Ohio   2002
3  2.4  Nevada   2001
4  2.9  Nevada   2000
5  3.2  Nevada   2009
   pop   state  years
0  1.5    Ohio   2000
1  1.7    Ohio   2001
2  3.6    Ohio   2002
3  2.4  Nevada   2001
4  2.9  Nevada   2000
years    2002
state    Ohio
pop       3.6
debt      NaN
Name: three, dtype: object
       years   state  pop  debt
one     2000    Ohio  1.5  16.5
two     2001    Ohio  1.7  16.5
three   2002    Ohio  3.6  16.5
four    2001  Nevada  2.4  16.5
five    2000  Nevada  2.9  16.5
six     2009  Nevada  3.2  16.5
       years   state  pop  debt
one     2000    Ohio  1.5     0
two     2001    Ohio  1.7     1
three   2002    Ohio  3.6     2
four    2001  Nevada  2.4     3
five    2000  Nevada  2.9     4
six     2009  Nevada  3.2     5
       years   state  pop  debt
one     2000    Ohio  1.5   NaN
two     2001    Ohio  1.7  -1.2
three   2002    Ohio  3.6   NaN
four    2001  Nevada  2.4  -1.5
five    2000  Nevada  2.9  -1.7
six     2009  Nevada  3.2   NaN
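The label alignment seen in the last table can be checked on a smaller case. The following is a minimal sketch, not part of the original notes: the three-row `frame` and the column name `bad` are hypothetical, chosen only to show that a Series is matched by label while a plain list must match the row count.

```python
import pandas as pd

# Hypothetical three-row frame, smaller than the one used in the text.
frame = pd.DataFrame(
    {"years": [2000, 2001, 2002], "pop": [1.5, 1.7, 3.6]},
    index=["one", "two", "three"],
)

# The Series lists its labels in a different order and has no entry for "one".
val = pd.Series([-1.7, -1.2], index=["three", "two"])
frame["debt"] = val  # values are matched by label, not by position

# A plain list carries no labels, so its length must equal the number of rows.
try:
    frame["bad"] = [1, 2]
    length_error = None
except ValueError as exc:
    length_error = exc

print(frame)
print("length mismatch rejected:", length_error is not None)
```

Row "one" ends up as NaN because the Series has no value for that label, which is the same effect as rows one, three and six above.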
2. Example of del on a DataFrame:
# First add a column; the new column is boolean and indicates whether the state column equals 'Ohio'
data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada', 'Nevada'],
        'years': [2000, 2001, 2002, 2001, 2000, 2009],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.9, 3.2]}
frame2 = pd.DataFrame(data, columns=['years', 'state', 'pop', 'debt'],
                      index=['one', 'two', 'three', 'four', 'five', 'six'])
frame2['eastern'] = frame2.state == 'Ohio'
print(frame2)
# Remove the column with the del keyword
del frame2['eastern']
print(frame2.columns)
Result:
       years   state  pop debt  eastern
one     2000    Ohio  1.5  NaN     True
two     2001    Ohio  1.7  NaN     True
three   2002    Ohio  3.6  NaN     True
four    2001  Nevada  2.4  NaN    False
five    2000  Nevada  2.9  NaN    False
six     2009  Nevada  3.2  NaN    False
Index(['years', 'state', 'pop', 'debt'], dtype='object')
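A column selected from a DataFrame by indexing is a view on the underlying data, not a copy, so in-place modifications to that Series may be reflected in the DataFrame. Use the Series's copy method to get an explicit copy. (With pandas' Copy-on-Write mode enabled, the selection behaves like a copy instead.)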
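For comparison with del, here is a minimal sketch of two related operations: `drop(columns=...)`, which returns a new DataFrame instead of modifying the original, and an explicit `copy()` of a column. The small two-row `frame` and the names `trimmed` and `pop_copy` are hypothetical and only serve the illustration.

```python
import pandas as pd

# Hypothetical two-row frame used only for this illustration.
frame = pd.DataFrame({"state": ["Ohio", "Nevada"], "pop": [1.5, 2.4]})
frame["eastern"] = frame["state"] == "Ohio"

# drop returns a new DataFrame and leaves `frame` untouched.
trimmed = frame.drop(columns="eastern")
still_has_eastern = "eastern" in frame.columns

# del removes the column from `frame` itself.
del frame["eastern"]

# An explicit copy is independent of the DataFrame it came from.
pop_copy = frame["pop"].copy()
pop_copy.iloc[0] = 99.0

print(trimmed.columns.tolist())
print(frame)
```

Changing `pop_copy` leaves `frame` as it was, so `copy()` is the safe choice whenever a column will be modified separately.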
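3. Another common form of data for a DataFrame is a nested dict of dicts: if the nested dict is passed to DataFrame, pandas uses the outer dict keys as the columns and the inner dict keys as the row index.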
pop = {'Nevada': {2001: 2.4, 2002: 2.9},
'Ohio': {2000: 1.5, 2001: 1.7, 2002: 3.6}}
frame3 = pd.DataFrame(pop)
print(frame3)
# A DataFrame can be transposed (rows and columns swapped) with syntax similar to a NumPy array
print(frame3.T)
Result:
      Nevada  Ohio
2000     NaN   1.5
2001     2.4   1.7
2002     2.9   3.6
        2000  2001  2002
Nevada   NaN   2.4   2.9
Ohio     1.5   1.7   3.6
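The row index built from the inner keys can also be controlled explicitly. The sketch below reuses the `pop` dict from the text; the names `frame_idx`, `frame_sorted` and `transposed` are hypothetical. It assumes that the row order of the combined inner keys may differ between pandas versions, which is why it sorts the index rather than relying on a default order.

```python
import pandas as pd

pop = {'Nevada': {2001: 2.4, 2002: 2.9},
       'Ohio': {2000: 1.5, 2001: 1.7, 2002: 3.6}}

# With an explicit index, only the listed labels become rows:
# 2000 is left out, and 2003 (present in neither inner dict) is all NaN.
frame_idx = pd.DataFrame(pop, index=[2001, 2002, 2003])
print(frame_idx)

# Sorting the index gives a predictable row order.
frame_sorted = pd.DataFrame(pop).sort_index()
print(frame_sorted)

# Transposing swaps the axes: states become rows, years become columns.
transposed = frame_idx.T
print(transposed)
```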
4. A dict of Series can also be used to construct a DataFrame:
pop = {'Nevada': {2001: 2.4, 2002: 2.9},
'Ohio': {2000: 1.5, 2001: 1.7, 2002: 3.6}}
frame3 = pd.DataFrame(pop)
# Positional slices: every Ohio row except the last, and the first two Nevada rows
pdata = {'Ohio': frame3['Ohio'].iloc[:-1],
         'Nevada': frame3['Nevada'].iloc[:2]}
print(pd.DataFrame(pdata))
Result:
      Nevada  Ohio
2000     NaN   1.5
2001     2.4   1.7
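To make the alignment step visible, the sketch below builds the two Series directly instead of slicing them out of frame3. The names `ohio`, `nevada` and `combined` are hypothetical. The resulting rows are the union of the two Series indexes, and a label missing from one Series yields NaN in that column.

```python
import pandas as pd

# Two Series whose indexes overlap only partly.
ohio = pd.Series({2000: 1.5, 2001: 1.7})
nevada = pd.Series({2001: 2.4, 2002: 2.9})

# Each dict key becomes a column; rows are the union of both indexes.
combined = pd.DataFrame({"Ohio": ohio, "Nevada": nevada})
print(combined)

# Count the missing values per column after alignment.
print(combined.isna().sum())
```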
5. The values attribute returns the data contained in a DataFrame as a two-dimensional ndarray
# frame3 is the DataFrame built from the nested dict above
print(frame3.values)
Result:
[[ nan  1.5]
 [ 2.4  1.7]
 [ 2.9  3.6]]
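The dtype of the returned array depends on the columns. The sketch below uses two hypothetical frames, `numeric` and `mixed`, and calls `to_numpy()`, the method the pandas documentation recommends over `values`. An all-float frame yields a float64 array, while a frame that mixes strings and floats falls back to the object dtype so that every column fits.

```python
import numpy as np
import pandas as pd

# All-float columns: NaN is itself a float, so the array stays float64.
numeric = pd.DataFrame({"Nevada": [np.nan, 2.4], "Ohio": [1.5, 1.7]})

# Strings and floats together: the array dtype must accommodate both.
mixed = pd.DataFrame({"state": ["Ohio", "Nevada"], "pop": [1.5, 2.4]})

numeric_arr = numeric.to_numpy()
mixed_arr = mixed.to_numpy()

print(numeric_arr.dtype, numeric_arr.shape)
print(mixed_arr.dtype, mixed_arr.shape)
```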