The DataFrame Data Structure
>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame(data=np.array([(x, x+1, x+2, x+3, x+4) for x in range(0, 25, 5)],
...     dtype=[('col1', 'i4'), ('col2', 'i8'), ('col3', 'i4'), ('col4', 'f8'), ('col5', 'b')]))
>>> df
   col1  col2  col3  col4  col5
0     0     1     2   3.0     4
1     5     6     7   8.0     9
2    10    11    12  13.0    14
3    15    16    17  18.0    19
4    20    21    22  23.0    24
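The constructor call above passes a NumPy structured array, so each named field becomes a column that keeps its own dtype. As a minimal sketch (the helper name `build_df` is an illustrative assumption, not part of pandas), the resulting column dtypes can be inspected like this:

```python
import numpy as np
import pandas as pd


def build_df():
    # Hypothetical helper: rebuilds the example frame from a structured array.
    records = np.array(
        [(x, x + 1, x + 2, x + 3, x + 4) for x in range(0, 25, 5)],
        dtype=[('col1', 'i4'), ('col2', 'i8'), ('col3', 'i4'),
               ('col4', 'f8'), ('col5', 'b')],
    )
    return pd.DataFrame(data=records)


df = build_df()

# Map each column label to the name of its dtype.
dtypes = {name: str(dtype) for name, dtype in df.dtypes.items()}
print(dtypes)
```

The type code 'b' is a signed 8-bit integer, which is why col5 is reported as int8 later in this article.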
1. DataFrame Indexing
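DataFrame.get(key): returns the column stored under key, where key is a column label.
DataFrame.at: accesses a single element by one row/column label pair.
DataFrame.loc: like DataFrame.at, but also accepts lists of labels, boolean lists, slices, and so on.
DataFrame.iat: accesses a single element by one pair of integer positions.
DataFrame.iloc: like DataFrame.iat, but also accepts lists of integers, boolean lists, slices, and so on.
DataFrame.head(n): returns the first n rows.
DataFrame.tail(n): returns the last n rows.
DataFrame.pop(item): returns the given column and removes it from the DataFrame.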
>>> df.at[2, 'col3']
12
>>> df.loc[[2, 4], ['col1', 'col4']]
   col1  col4
2    10  13.0
4    20  23.0
>>> df.iat[1, 3]
8.0
>>> df.iloc[[0, 1, 2], [2, 4]]
   col3  col5
0     2     4
1     7     9
2    12    14
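The loc and iloc examples above only use lists. The sketch below illustrates the slice and boolean forms mentioned in the method list. It assumes a simplified all-integer copy of the example frame, built from a dict, so that the block is self-contained.

```python
import numpy as np
import pandas as pd

# Assumption: a simplified copy of the example data (every column is an integer column).
df = pd.DataFrame({f'col{i + 1}': [x + i for x in range(0, 25, 5)] for i in range(5)})

# Label slices with .loc include BOTH endpoints.
by_label = df.loc[1:3, 'col2':'col4']      # rows 1, 2, 3 and columns col2..col4

# Position slices with .iloc exclude the stop position.
by_position = df.iloc[1:3, 1:4]            # rows 1, 2 and columns col2..col4

# A boolean Series selects the rows where the condition holds.
mask = df['col1'] > 5
filtered = df.loc[mask, ['col1', 'col5']]  # rows 2, 3, 4

# .iloc accepts a boolean array as well (one flag per row).
flags = np.array([True, False, True, False, False])
picked = df.iloc[flags, [0, 4]]            # rows 0 and 2, first and last column

print(by_label)
print(by_position)
print(filtered)
print(picked)
```

The difference in slice endpoints is the main thing to watch: the same 1:3 yields three rows with loc but two rows with iloc.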
>>> df.pop('col5')
0     4
1     9
2    14
3    19
4    24
Name: col5, dtype: int8
>>> df
   col1  col2  col3  col4
0     0     1     2   3.0
1     5     6     7   8.0
2    10    11    12  13.0
3    15    16    17  18.0
4    20    21    22  23.0
>>> # insert a new column named 'col5' at column position 4
>>> df.insert(4, column='col5', value=np.array([1, 2, 3, 4, 5]))
>>> df
   col1  col2  col3  col4  col5
0     0     1     2   3.0     1
1     5     6     7   8.0     2
2    10    11    12  13.0     3
3    15    16    17  18.0     4
4    20    21    22  23.0     5
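get, head and tail are listed at the start of this section but not demonstrated. A minimal sketch follows, again assuming a simplified all-integer copy of the example data. The label 'col9' is a deliberately missing column chosen for illustration.

```python
import pandas as pd

# Assumption: a simplified copy of the example data (every column is an integer column).
df = pd.DataFrame({f'col{i + 1}': [x + i for x in range(0, 25, 5)] for i in range(5)})

# get() returns the column as a Series when the label exists.
col3 = df.get('col3')

# For a missing label, get() returns the default (None unless specified)
# instead of raising KeyError the way df['col9'] would.
missing = df.get('col9')
fallback = df.get('col9', default='n/a')

# head(n) and tail(n) return the first and last n rows.
first_two = df.head(2)
last_two = df.tail(2)

print(col3.tolist())
print(missing, fallback)
print(first_two)
print(last_two)
```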
2. DataFrame Iteration
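DataFrame.__iter__(): iterates over the column labels.
DataFrame.items(): iterates over (column name, Series) pairs.
DataFrame.keys(): returns the column index, which can be iterated over.
DataFrame.iterrows(): iterates over the rows as (index, Series) pairs. Each row is packed into a single Series, so dtypes are not preserved across columns.
DataFrame.itertuples(): iterates over the rows as named tuples.
The examples below iterate over the original df built at the top of this article, in which col5 still holds 4, 9, 14, 19, 24.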
>>> for item in iter(df):
...     print(item)
...
col1
col2
col3
col4
col5
>>> for column, series in df.items():
...     print(column, ':\n', series)
...
col1 :
0 0
1 5
2 10
3 15
4 20
Name: col1, dtype: int32
col2 :
0 1
1 6
2 11
3 16
4 21
Name: col2, dtype: int64
col3 :
0 2
1 7
2 12
3 17
4 22
Name: col3, dtype: int32
col4 :
0 3.0
1 8.0
2 13.0
3 18.0
4 23.0
Name: col4, dtype: float64
col5 :
0 4
1 9
2 14
3 19
4 24
Name: col5, dtype: int8
>>> for index in df.keys():
...     print(index)
...
col1
col2
col3
col4
col5
>>> for index, row in df.iterrows():
...     print(index, ":\n", row)
...
0 :
col1 0.0
col2 1.0
col3 2.0
col4 3.0
col5 4.0
Name: 0, dtype: float64
1 :
col1 5.0
col2 6.0
col3 7.0
col4 8.0
col5 9.0
Name: 1, dtype: float64
2 :
col1 10.0
col2 11.0
col3 12.0
col4 13.0
col5 14.0
Name: 2, dtype: float64
3 :
col1 15.0
col2 16.0
col3 17.0
col4 18.0
col5 19.0
Name: 3, dtype: float64
4 :
col1 20.0
col2 21.0
col3 22.0
col4 23.0
col5 24.0
Name: 4, dtype: float64
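Every row printed above has dtype float64 even though four of the five columns hold integers. iterrows packs each row into one Series, and a Series has a single dtype, so the integers are upcast to match the float column. The sketch below reproduces this on a small two-column frame whose column names ('ints', 'floats') are made up for illustration, and contrasts it with itertuples.

```python
import pandas as pd

# Assumption: a tiny illustrative frame with one integer and one float column.
df = pd.DataFrame({'ints': [1, 2], 'floats': [0.5, 1.5]})

# iterrows: the row Series needs one common dtype, so 'ints' is upcast.
_, first_row = next(df.iterrows())
row_dtype = str(first_row.dtype)            # 'float64'
int_became_float = isinstance(first_row['ints'], float)

# itertuples: each field keeps the type of its own column.
first_tuple = next(df.itertuples())
tuple_int_is_float = isinstance(first_tuple.ints, float)

print(row_dtype, int_became_float, tuple_int_is_float)
```

When the original column types matter, itertuples is therefore usually the safer choice, and it is generally faster as well.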
>>> for row in df.itertuples():
...     print(row)
...
Pandas(Index=0, col1=0, col2=1, col3=2, col4=3.0, col5=4)
Pandas(Index=1, col1=5, col2=6, col3=7, col4=8.0, col5=9)
Pandas(Index=2, col1=10, col2=11, col3=12, col4=13.0, col5=14)
Pandas(Index=3, col1=15, col2=16, col3=17, col4=18.0, col5=19)
Pandas(Index=4, col1=20, col2=21, col3=22, col4=23.0, col5=24)
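The tuples printed above are named tuples, so fields can be read by attribute. itertuples also takes index and name parameters that control whether the index is included and what the tuple class is called. A minimal sketch on a two-row subset of the example data (the class name 'Row' is an arbitrary choice):

```python
import pandas as pd

# Assumption: a two-row, two-column subset of the example data.
df = pd.DataFrame({'col1': [0, 5], 'col4': [3.0, 8.0]})

# index=False drops the Index field; name sets the namedtuple class name.
rows = list(df.itertuples(index=False, name='Row'))

# Fields are available as attributes named after the columns.
total = sum(r.col1 + r.col4 for r in rows)   # 0 + 3.0 + 5 + 8.0

# name=None yields plain tuples instead of namedtuples.
plain = list(df.itertuples(index=False, name=None))

print(rows)
print(total)
print(plain)
```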