8 - Pandas Iteration

qwy715229258163

Published 2024-06-30 18:39:30

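Iteration is a basic operation in most programming languages; Python, for example, uses a for loop to traverse a list. How does Pandas iterate over Series and DataFrame objects? Because the two are different data structures, they iterate differently. A Series can be traversed like a one-dimensional array, while iterating over a DataFrame, a two-dimensional table, behaves like iterating over a Python dictionary.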

Pandas also uses the for loop for iteration. Iterating over a Series yields its values directly, whereas iterating over a DataFrame yields its column labels. For example:

import pandas as pd
import numpy as np

N = 20
# Build a sample DataFrame with date, numeric, and categorical columns.
df = pd.DataFrame({
    'A': pd.date_range(start='2016-01-01', periods=N, freq='D'),
    'x': np.linspace(0, stop=N - 1, num=N),
    'y': np.random.rand(N),
    'C': np.random.choice(['Low', 'Medium', 'High'], N).tolist(),
    'D': np.random.normal(100, 10, size=N).tolist()
})
print(df)
print("Iterating over all column labels:")
# Iterating over a DataFrame directly yields its column labels.
for col in df:
    print(col)

Output:

            A     x         y       C           D
0  2016-01-01   0.0  0.306298    High   99.538774
1  2016-01-02   1.0  0.350768    High  111.734390
2  2016-01-03   2.0  0.912953     Low   90.404414
3  2016-01-04   3.0  0.553158    High   92.537805
4  2016-01-05   4.0  0.045641  Medium   92.612460
5  2016-01-06   5.0  0.289502    High   94.824675
6  2016-01-07   6.0  0.247479    High   99.026844
7  2016-01-08   7.0  0.648031    High  126.798087
8  2016-01-09   8.0  0.675396     Low  106.342780
9  2016-01-10   9.0  0.599190  Medium   92.860987
10 2016-01-11  10.0  0.394856     Low   98.485523
11 2016-01-12  11.0  0.300833  Medium   87.875087
12 2016-01-13  12.0  0.018943     Low   94.117690
13 2016-01-14  13.0  0.451572     Low  119.475830
14 2016-01-15  14.0  0.972835    High   91.034207
15 2016-01-16  15.0  0.645414    High   89.636694
16 2016-01-17  16.0  0.467082    High   99.775743
17 2016-01-18  17.0  0.108793     Low   78.775024
18 2016-01-19  18.0  0.592192    High  107.170954
19 2016-01-20  19.0  0.568169    High   99.094213
Iterating over all column labels:
A
x
y
C
D
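
The example above only covers the DataFrame case. As a minimal sketch of the Series side of the comparison, the snippet below iterates over both structures; the names s and df and their contents are hypothetical sample data, not part of the original example.

```python
import pandas as pd

# Hypothetical sample data, chosen only for illustration.
s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
df = pd.DataFrame({'x': [1, 2], 'y': [3.0, 4.0]})

# Iterating over a Series yields its values, not its index labels.
series_values = [value for value in s]

# Iterating over a DataFrame yields its column labels, like dict keys.
column_labels = [col for col in df]

print(series_values)   # [10, 20, 30]
print(column_labels)   # ['x', 'y']
```

Note that the index labels 'a', 'b', 'c' never appear when looping over the Series; only the stored values do.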

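Built-in iteration methods

To iterate over the rows of a DataFrame, we can use the following methods:

iterrows(): iterates over the rows as (row_index, row) pairs;
itertuples(): iterates over the rows as named tuples.

Each method is briefly introduced below: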

1) iterrows()

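iterrows() iterates over the DataFrame row by row and yields (index, row) pairs: the key is the row's index label, and the value is the row's data as a Series indexed by the column labels.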

import pandas as pd
import numpy as np

N = 20
# Build a sample DataFrame with date, numeric, and categorical columns.
df = pd.DataFrame({
    'A': pd.date_range(start='2016-01-01', periods=N, freq='D'),
    'x': np.linspace(0, stop=N - 1, num=N),
    'y': np.random.rand(N),
    'C': np.random.choice(['Low', 'Medium', 'High'], N).tolist(),
    'D': np.random.normal(100, 10, size=N).tolist()
})

# key is the row's index label; value is the row as a Series.
for key, value in df.iterrows():
    print(key, value)

Output:

0 A    2016-01-01 00:00:00
x                    0.0
y               0.376904
C                   High
D              85.622403
Name: 0, dtype: object
1 A    2016-01-02 00:00:00
x                    1.0
y               0.740229
C                 Medium
D              78.572574
Name: 1, dtype: object
2 A    2016-01-03 00:00:00
x                    2.0
y               0.672089
C                 Medium
D             101.784087
Name: 2, dtype: object
......
Name: 19, dtype: object
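
Each row in the output above is reported with dtype: object, because a row Series has to hold values from columns of different types. The sketch below, using a small hypothetical DataFrame with columns n and ratio, shows how individual fields can be read from each row and how this single-dtype behavior affects an integer column.

```python
import pandas as pd

# Hypothetical sample data with an integer column and a float column.
df = pd.DataFrame({'n': [1, 2], 'ratio': [0.5, 1.5]})

row_labels = []
for idx, row in df.iterrows():
    # idx is the row's index label; row is a Series keyed by column label.
    row_labels.append(idx)
    print(idx, row['n'], row['ratio'])

# Each row Series has a single dtype, so the integers in 'n' come back as floats.
first_row = next(df.iterrows())[1]
print(df['n'].dtype, first_row.dtype)
```

If the original column types matter, itertuples() is usually the safer choice, since it does not pack each row into a single Series.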

2) itertuples()

itertuples() also returns an iterator; it yields each row of the DataFrame as a named tuple. For example:

import pandas as pd
import numpy as np

N = 20
# Build a sample DataFrame with date, numeric, and categorical columns.
df = pd.DataFrame({
    'A': pd.date_range(start='2016-01-01', periods=N, freq='D'),
    'x': np.linspace(0, stop=N - 1, num=N),
    'y': np.random.rand(N),
    'C': np.random.choice(['Low', 'Medium', 'High'], N).tolist(),
    'D': np.random.normal(100, 10, size=N).tolist()
})

# Each row is a named tuple whose first field is the index.
for row in df.itertuples():
    print(row)

Output:

Pandas(Index=0, A=Timestamp('2016-01-01 00:00:00'), x=0.0, y=0.0073833717186708725, C='Medium', D=102.21685856034675)
Pandas(Index=1, A=Timestamp('2016-01-02 00:00:00'), x=1.0, y=0.7570282079756047, C='Medium', D=77.88547775291684)
Pandas(Index=2, A=Timestamp('2016-01-03 00:00:00'), x=2.0, y=0.039159500841185246, C='Medium', D=90.60034318698546)
Pandas(Index=3, A=Timestamp('2016-01-04 00:00:00'), x=3.0, y=0.5777131686110479, C='Medium', D=108.45249228376123)
Pandas(Index=4, A=Timestamp('2016-01-05 00:00:00'), x=4.0, y=0.4726895679114832, C='High', D=102.3053880413406)
Pandas(Index=5, A=Timestamp('2016-01-06 00:00:00'), x=5.0, y=0.9181876349067116, C='High', D=88.77667424669386)
Pandas(Index=6, A=Timestamp('2016-01-07 00:00:00'), x=6.0, y=0.352008513872231, C='Low', D=94.1640236552118)
Pandas(Index=7, A=Timestamp('2016-01-08 00:00:00'), x=7.0, y=0.5722692889700786, C='Medium', D=91.32266564519188)
Pandas(Index=8, A=Timestamp('2016-01-09 00:00:00'), x=8.0, y=0.18340633936165507, C='Medium', D=91.40118820334366)
Pandas(Index=9, A=Timestamp('2016-01-10 00:00:00'), x=9.0, y=0.5822548446901658, C='Medium', D=105.26907848666296)
Pandas(Index=10, A=Timestamp('2016-01-11 00:00:00'), x=10.0, y=0.40705596480000217, C='High', D=85.52555287827161)
Pandas(Index=11, A=Timestamp('2016-01-12 00:00:00'), x=11.0, y=0.9525667200400463, C='High', D=107.35261261096153)
Pandas(Index=12, A=Timestamp('2016-01-13 00:00:00'), x=12.0, y=0.44425664486730154, C='Medium', D=92.55767916353153)
Pandas(Index=13, A=Timestamp('2016-01-14 00:00:00'), x=13.0, y=0.5468369154349298, C='High', D=87.74208234902464)
Pandas(Index=14, A=Timestamp('2016-01-15 00:00:00'), x=14.0, y=0.4727283165059927, C='Low', D=107.5236125991258)
Pandas(Index=15, A=Timestamp('2016-01-16 00:00:00'), x=15.0, y=0.990707163043359, C='Low', D=95.76090795914205)
Pandas(Index=16, A=Timestamp('2016-01-17 00:00:00'), x=16.0, y=0.6243139269960055, C='Low', D=101.45573754665573)
Pandas(Index=17, A=Timestamp('2016-01-18 00:00:00'), x=17.0, y=0.6146066882888525, C='High', D=99.43866726961795)
Pandas(Index=18, A=Timestamp('2016-01-19 00:00:00'), x=18.0, y=0.6001033142743434, C='Low', D=117.15405644081103)
Pandas(Index=19, A=Timestamp('2016-01-20 00:00:00'), x=19.0, y=0.06108299134959061, C='Medium', D=102.41567398727766)
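
The named tuples printed above expose each column as an attribute, with the row label stored in the Index field. The following sketch, on a small hypothetical DataFrame, shows attribute access and the index and name parameters of itertuples(), which control whether the index is included and what kind of tuple is returned.

```python
import pandas as pd

# Hypothetical sample data, chosen only for illustration.
df = pd.DataFrame({'x': [1, 2], 'C': ['Low', 'High']})

# Default: named tuples called "Pandas", with the index as the first field.
for row in df.itertuples():
    print(row.Index, row.x, row.C)

# index=False drops the index field; name=None yields plain tuples.
plain_rows = list(df.itertuples(index=False, name=None))
print(plain_rows)
```

Plain tuples are convenient when the rows are only unpacked positionally and field names are not needed.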