Introduction to Pandas Data Structures: DataFrame Indexing and Iteration

峡谷的小鱼

Category: Data Analysis, pandas. Tags: machine learning, python, data analysis, pytorch

Article link: https://blog.csdn.net/weixin_43276033/article/details/124065867

The DataFrame Data Structure

# Before we begin, define a DataFrame object from a NumPy structured array.
>>> import numpy as np
>>> import pandas as pd


>>> df = pd.DataFrame(data=np.array([(x, x+1, x+2, x+3, x+4) for x in range(0,25,5)],
                                dtype=[('col1', 'i4'), ('col2', 'i8'), ('col3', 'i4'), ('col4','f8'), ('col5', 'b')]))
>>> df
	col1	col2	col3	col4	col5
0	0	1	2	3.0	4
1	5	6	7	8.0	9
2	10	11	12	13.0	14
3	15	16	17	18.0	19
4	20	21	22	23.0	24
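As a quick check on the definition above, the sketch below rebuilds the same frame and prints the dtype of each column. The variable names `dtype`, `records`, and `dtypes` are introduced here for illustration only. The type codes map as follows: `i4` is int32, `i8` is int64, `f8` is float64, and `b` is a signed byte (int8).

```python
import numpy as np
import pandas as pd

# Same structured dtype as in the article: one (name, type) pair per column.
dtype = [('col1', 'i4'), ('col2', 'i8'), ('col3', 'i4'), ('col4', 'f8'), ('col5', 'b')]
records = np.array([(x, x + 1, x + 2, x + 3, x + 4) for x in range(0, 25, 5)], dtype=dtype)

df = pd.DataFrame(records)

# Each field of the structured array becomes a column with its own dtype.
dtypes = {name: str(t) for name, t in df.dtypes.items()}
print(dtypes)
```

This is why `col4` prints as `3.0`, `8.0`, and so on, while the other columns print as integers.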

1. DataFrame Indexing

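DataFrame.get(key)
Return the column for the given key (a column label); if the key is missing, a default (None unless specified) is returned instead of raising an error.
DataFrame.at
Access a single value by a row/column label pair.
DataFrame.loc
Label-based like DataFrame.at, but also accepts lists of labels, boolean arrays, slices, and so on.
DataFrame.iat
Access a single value by an integer row/column position pair.
DataFrame.iloc
Position-based like DataFrame.iat, but also accepts lists of integers, boolean arrays, slices, and so on.
DataFrame.head(n)
Return the first n rows.
DataFrame.tail(n)
Return the last n rows.
DataFrame.pop(item)
Return the given column and remove it from the DataFrame in place.
DataFrame.insert(loc, column, value)
Insert a column at integer position loc, in place.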

>>> df.at[2, 'col3']
12
>>> df.loc[[2, 4], ['col1', 'col4']]
	col1	col4
2	10	13.0
4	20	23.0
>>> df.iat[1, 3]
8.0
>>> df.iloc[[0, 1, 2], [2, 4]]
	col3	col5
0	2	4
1	7	9
2	12	14
>>> df.pop('col5')
0     4
1     9
2    14
3    19
4    24
Name: col5, dtype: int8
>>> df
	col1	col2	col3	col4
0	0	1	2	3.0
1	5	6	7	8.0
2	10	11	12	13.0
3	15	16	17	18.0
4	20	21	22	23.0
>>> df.insert(4, column='col5', value=np.array([1, 2, 3, 4, 5]))
>>> df
	col1	col2	col3	col4	col5
0	0	1	2	3.0	1
1	5	6	7	8.0	2
2	10	11	12	13.0	3
3	15	16	17	18.0	4
4	20	21	22	23.0	5
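The session above does not exercise `get`, `head`, `tail`, or the boolean and slice forms of `loc`/`iloc` mentioned in the list. The following is a minimal sketch of those, assuming a frame with the same values as the original `df` (rebuilt here from a dict, so the dtypes differ from the structured-array version). The variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild a small frame with the same values as the article's original df.
base = np.arange(0, 25, 5)
df = pd.DataFrame({f'col{i + 1}': base + i for i in range(5)})

# get() returns the column, or a default instead of raising KeyError.
col2 = df.get('col2')
missing = df.get('no_such_column', default=None)

# loc selects by label: a boolean mask for rows, a label slice for columns.
# Label slices include BOTH endpoints.
by_label = df.loc[df['col1'] > 5, 'col2':'col4']

# iloc selects by position: integer slices exclude the end, as in plain Python.
by_position = df.iloc[1:3, 0:2]

# head()/tail() return the first/last n rows (n defaults to 5).
first_two = df.head(2)
last_two = df.tail(2)

print(by_label)
print(by_position)
```

The difference in slice semantics is the usual source of off-by-one surprises: `'col2':'col4'` keeps three columns, whereas `0:2` keeps two.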

2. DataFrame Iteration

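DataFrame.__iter__()
Iterate over the column labels.
DataFrame.items()
Iterate over (column name, Series) pairs.
DataFrame.keys()
Return the column labels (the columns Index).
DataFrame.iterrows()
Iterate over rows as (index, Series) pairs. Each row is returned as a single Series, so values from columns of different dtypes are upcast to a common dtype.
DataFrame.itertuples()
Iterate over rows as named tuples.

Note: the outputs in this section come from the original df defined at the start (col5 = 4, 9, 14, 19, 24 with dtype int8), not the version modified by pop and insert in the previous section.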

>>> for item in iter(df):
...     print(item)
col1
col2
col3
col4
col5

>>> for column, series in df.items():
...     print(column, ':\n', series)
col1 :
 0     0
1     5
2    10
3    15
4    20
Name: col1, dtype: int32
col2 :
 0     1
1     6
2    11
3    16
4    21
Name: col2, dtype: int64
col3 :
 0     2
1     7
2    12
3    17
4    22
Name: col3, dtype: int32
col4 :
 0     3.0
1     8.0
2    13.0
3    18.0
4    23.0
Name: col4, dtype: float64
col5 :
 0     4
1     9
2    14
3    19
4    24
Name: col5, dtype: int8

>>> for index in df.keys():
...     print(index)
col1
col2
col3
col4
col5
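The three column-wise iterators above behave much like their dict counterparts. A small sketch on a hypothetical three-column frame (not the article's `df`) makes the relationship explicit:

```python
import pandas as pd

df = pd.DataFrame({'col1': [0, 5], 'col2': [1, 6], 'col3': [2, 7]})

# Iterating a DataFrame yields column labels, much like iterating a dict yields keys.
labels_from_iter = list(df)

# keys() returns the columns Index, so it produces the same labels.
labels_from_keys = list(df.keys())

# items() pairs each label with its column as a Series.
column_sums = {label: int(series.sum()) for label, series in df.items()}

print(labels_from_iter)
print(column_sums)
```

In practice `for name in df` and `for name in df.keys()` are interchangeable, and `items()` is the one to reach for when the column data is needed as well.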

>>> for index, row in df.iterrows():
...     print(index, ":\n", row)
0 :
 col1    0.0
col2    1.0
col3    2.0
col4    3.0
col5    4.0
Name: 0, dtype: float64
1 :
 col1    5.0
col2    6.0
col3    7.0
col4    8.0
col5    9.0
Name: 1, dtype: float64
2 :
 col1    10.0
col2    11.0
col3    12.0
col4    13.0
col5    14.0
Name: 2, dtype: float64
3 :
 col1    15.0
col2    16.0
col3    17.0
col4    18.0
col5    19.0
Name: 3, dtype: float64
4 :
 col1    20.0
col2    21.0
col3    22.0
col4    23.0
col5    24.0
Name: 4, dtype: float64
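Note that every value in the output above is a float, even though only `col4` was defined as one. The sketch below isolates that effect on a hypothetical two-column frame (the names `qty` and `ratio` are made up for illustration): `iterrows()` packs each row into one Series, and a Series has a single dtype.

```python
import pandas as pd

df = pd.DataFrame({'qty': [1, 2], 'ratio': [0.5, 1.5]})

# Before iteration, 'qty' is an integer column and 'ratio' a float column.
print(df.dtypes)

# Each row comes back as one Series; mixing int and float columns upcasts to float64.
index, row = next(df.iterrows())
print(row.dtype)
print(type(row['qty']))
```

If the original per-column types matter inside the loop, `itertuples()` is usually the better choice.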

>>> for row in df.itertuples():
...     print(row)
Pandas(Index=0, col1=0, col2=1, col3=2, col4=3.0, col5=4)
Pandas(Index=1, col1=5, col2=6, col3=7, col4=8.0, col5=9)
Pandas(Index=2, col1=10, col2=11, col3=12, col4=13.0, col5=14)
Pandas(Index=3, col1=15, col2=16, col3=17, col4=18.0, col5=19)
Pandas(Index=4, col1=20, col2=21, col3=22, col4=23.0, col5=24)
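The named tuples shown above can be read by attribute, and `itertuples()` takes two optional parameters, `index` and `name`, that control their shape. A minimal sketch on a hypothetical frame (`qty`, `ratio`, and the type name `Record` are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'qty': [1, 2], 'ratio': [0.5, 1.5]})

# Fields are accessed by column name; integer values stay integers here.
rows = list(df.itertuples())
first = rows[0]
print(first.Index, first.qty, first.ratio)

# index=False drops the Index field; name sets the namedtuple's type name.
for rec in df.itertuples(index=False, name='Record'):
    print(rec)

# name=None yields plain tuples instead of namedtuples.
plain = list(df.itertuples(index=False, name=None))
print(plain)
```

Compare this with `iterrows()`, where the same `qty` value would come back as `1.0`.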