[Pandas] Iterating Over Data

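This article walks through several ways to iterate over a Pandas DataFrame: iterating a Series, df.iterrows(), df.itertuples(), and df.items(). Examples show how to traverse a DataFrame's index, column names, and row data, and how to iterate by column or pull out specific columns. For large data sets, the faster df.itertuples() is recommended.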

1. Iterating over a Series

A Series is itself an iterable object, so a for statement can loop over its values directly.

import pandas as pd

df = pd.DataFrame([['liver','E',89,21,24,64],
                   ['Arry','C',36,37,37,57],
                   ['Ack','A',57,60,18,84],
                   ['Eorge','C',93,96,71,78],
                   ['Oah','D',65,49,61,86]
                  ], 
                   columns = ['name','team','Q1','Q2','Q3','Q4'])
# Iterate over one column
for i in df.name:
    print(i)

# Same result as above:
# df.name.values returns a NumPy array, which is also iterable
for i in df.name.values:
    print(i)

# Output:
# liver
# Arry
# Ack
# Eorge
# Oah
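
When the index label is needed along with each value, Series.items() may be handier than iterating the Series alone. The sketch below rebuilds the name column as a standalone Series (the variable names `names` and `pairs` are only for illustration) and collects the (index, value) pairs it yields.

```python
import pandas as pd

# Same sample names as in the article, built directly as a Series.
names = pd.Series(['liver', 'Arry', 'Ack', 'Eorge', 'Oah'], name='name')

# Series.items() yields (index label, value) pairs.
pairs = []
for idx, value in names.items():
    pairs.append((idx, value))
    print(idx, value)
```

With the default integer index this behaves much like enumerate(), but it keeps working when the index holds other labels such as strings or dates.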

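To iterate over the index together with several chosen columns, use Python's built-in zip function to pack them into a single iterable zip object.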

import pandas as pd

df = pd.DataFrame([['liver','E',89,21,24,64],
                   ['Arry','C',36,37,37,57],
                   ['Ack','A',57,60,18,84],
                   ['Eorge','C',93,96,71,78],
                   ['Oah','D',65,49,61,86]
                  ], 
                   columns = ['name','team','Q1','Q2','Q3','Q4'])

# Iterate over the index and two chosen columns
for i, n, q in zip(df.index, df.name, df.Q1):
    print(i, n, q)

# Output:
# 0 liver 89
# 1 Arry 36
# 2 Ack 57
# 3 Eorge 93
# 4 Oah 65
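
Because zip simply pairs the columns element by element, its result can also be handed to other constructors instead of a for loop. The following is a small sketch, using a shortened version of the sample data and a hypothetical variable name `q1_by_name`, that builds a name-to-score lookup from two columns.

```python
import pandas as pd

df = pd.DataFrame({'name': ['liver', 'Arry', 'Ack'],
                   'Q1': [89, 36, 57]})

# zip pairs the two columns element by element; dict() collects the pairs.
q1_by_name = dict(zip(df['name'], df['Q1']))
print(q1_by_name)
```

If the key column contains duplicates, later rows overwrite earlier ones, so this pattern fits best when the keys are unique.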

2. df.iterrows() 

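df.iterrows() returns an iterator that yields each DataFrame row as an (index, row) pair, where the row data is a Series. The for statement therefore needs two variables: one for the index, even when the index is not used in the loop body (in that case a throwaway name such as _ or useless will do), and one for the row data. A specific column can be read from the row with dictionary-style access (row['Q1']) or attribute access (row.Q1). Note that a column called name must be read as row['name'], because row.name is the Series' own name, which here is the row's index label.

df.iterrows() is a widely used and convenient way to iterate row by row.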

import pandas as pd
df = pd.DataFrame([['liver','E',89,21,24,64],
                   ['Arry','C',36,37,37,57],
                   ['Ack','A',57,60,18,84],
                   ['Eorge','C',93,96,71,78],
                   ['Oah','D',65,49,61,86]
                  ], 
                   columns = ['name','team','Q1','Q2','Q3','Q4'])
# Iterate, using the name and Q1 values
for index, row in df.iterrows():
    print(index, row['name'], row.Q1)

# Output:
# 0 liver 89
# 1 Arry 36
# 2 Ack 57
# 3 Eorge 93
# 4 Oah 65
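
One thing to keep in mind with df.iterrows() is that each row comes back as a single Series, so the row has one dtype for all of its values. The sketch below uses a small made-up frame (the column names `count` and `ratio` are assumptions, not part of the article's data) to show an integer column being upcast to float inside the row.

```python
import pandas as pd

df = pd.DataFrame({'count': [1, 2], 'ratio': [0.5, 1.5]})
print(df.dtypes)   # 'count' is an integer column, 'ratio' a float column

for index, row in df.iterrows():
    # The row Series holds both values under one dtype, so 'count' becomes a float.
    print(index, row['count'], type(row['count']))

# Grab the first row on its own to inspect its dtype.
first_index, first_row = next(df.iterrows())
print(first_row.dtype)
```

When the original dtypes matter, df.itertuples() or zip over the columns avoids building a Series for every row.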

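3. df.itertuples()

df.itertuples() yields each row as a namedtuple. The tuple type is named Pandas by default, and a different name can be given through the name parameter.

Compared with df.iterrows(), df.itertuples() generally runs faster, so prefer it when the data set is large.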

import pandas as pd

df = pd.DataFrame([['liver','E',89,21,24,64],
                   ['Arry','C',36,37,37,57],
                   ['Ack','A',57,60,18,84],
                   ['Eorge','C',93,96,71,78],
                   ['Oah','D',65,49,61,86]
                  ], 
                   columns = ['name','team','Q1','Q2','Q3','Q4'])

for row in df.itertuples():
    print(row)

# Output:
# Pandas(Index=0, name='liver', team='E', Q1=89, Q2=21, Q3=24, Q4=64)
# Pandas(Index=1, name='Arry', team='C', Q1=36, Q2=37, Q3=37, Q4=57)
# Pandas(Index=2, name='Ack', team='A', Q1=57, Q2=60, Q3=18, Q4=84)
# Pandas(Index=3, name='Eorge', team='C', Q1=93, Q2=96, Q3=71, Q4=78)
# Pandas(Index=4, name='Oah', team='D', Q1=65, Q2=49, Q3=61, Q4=86)
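
The speed difference can be checked with a rough timing sketch like the one below. It is only an illustration: the 10,000-row frame `big` and the two helper functions are made up for this comparison, and the measured times will vary with the machine and the pandas version.

```python
import time
import pandas as pd

# Hypothetical test data: 10,000 rows with two score columns.
big = pd.DataFrame({'Q1': range(10_000), 'Q2': range(10_000)})

def total_with_iterrows(frame):
    """Sum Q1 + Q2 over all rows using iterrows()."""
    total = 0
    for _, row in frame.iterrows():
        total += row['Q1'] + row['Q2']
    return total

def total_with_itertuples(frame):
    """Sum Q1 + Q2 over all rows using itertuples()."""
    total = 0
    for row in frame.itertuples(index=False):
        total += row.Q1 + row.Q2
    return total

# Time each approach once and print the result alongside the elapsed time.
for func in (total_with_iterrows, total_with_itertuples):
    start = time.perf_counter()
    result = func(big)
    elapsed = time.perf_counter() - start
    print(f'{func.__name__}: total={result}, {elapsed:.4f}s')
```

Both functions compute the same total; only the per-row overhead differs, since iterrows() constructs a Series for every row while itertuples() builds a lightweight namedtuple.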

Some usage examples follow:

import pandas as pd

df = pd.DataFrame([['liver','E',89,21,24,64],
                   ['Arry','C',36,37,37,57],
                   ['Ack','A',57,60,18,84],
                   ['Eorge','C',93,96,71,78],
                   ['Oah','D',65,49,61,86]
                  ], 
                   columns = ['name','team','Q1','Q2','Q3','Q4'])

# Leave out the index
for row in df.itertuples(index=False):
    print(row)

# Pandas(name='liver', team='E', Q1=89, Q2=21, Q3=24, Q4=64)
# Pandas(name='Arry', team='C', Q1=36, Q2=37, Q3=37, Q4=57)
# Pandas(name='Ack', team='A', Q1=57, Q2=60, Q3=18, Q4=84)
# Pandas(name='Eorge', team='C', Q1=93, Q2=96, Q3=71, Q4=78)
# Pandas(name='Oah', team='D', Q1=65, Q2=49, Q3=61, Q4=86)

# Give the namedtuple type a custom name
for row in df.itertuples(index=False, name='Hudas'):
    print(row)

# Hudas(name='liver', team='E', Q1=89, Q2=21, Q3=24, Q4=64)
# Hudas(name='Arry', team='C', Q1=36, Q2=37, Q3=37, Q4=57)
# Hudas(name='Ack', team='A', Q1=57, Q2=60, Q3=18, Q4=84)
# Hudas(name='Eorge', team='C', Q1=93, Q2=96, Q3=71, Q4=78)
# Hudas(name='Oah', team='D', Q1=65, Q2=49, Q3=61, Q4=86)

# Use the data through named fields
for row in df.itertuples():
    print(row.Index, row.name)

# 0 liver
# 1 Arry
# 2 Ack
# 3 Eorge
# 4 Oah
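
Attribute access such as row.name relies on the column names being valid Python identifiers. As a sketch of what happens otherwise, the frame below uses a made-up column called 'my col' (an assumption for illustration); pandas falls back to a positional field name for it, which can be inspected through the namedtuple's _fields attribute.

```python
import pandas as pd

df = pd.DataFrame({'my col': [1, 2], 'Q1': [89, 36]})

for row in df.itertuples():
    # 'my col' is not a valid identifier, so it gets a positional field name.
    print(row._fields)
    # Positional indexing still works for the renamed field.
    print(row[1], row.Q1)

# Keep the first tuple around to look at its field names.
first = next(df.itertuples())
print(first._fields)
```

Renaming such columns before iterating (for example to my_col) keeps attribute access readable.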

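4. df.items()

df.items() yields (column name, Series for that column) pairs, which lets you iterate over a DataFrame column by column. The older df.iteritems() did the same thing, but it was deprecated in pandas 1.5 and removed in pandas 2.0, so use df.items().

To iterate further over the values of each Series, nest a second for loop.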

import pandas as pd
df = pd.DataFrame([['liver','E',89,21,24,64],
                   ['Arry','C',36,37,37,57],
                   ['Ack','A',57,60,18,84],
                   ['Eorge','C',93,96,71,78],
                   ['Oah','D',65,49,61,86]
                  ], 
                   columns = ['name','team','Q1','Q2','Q3','Q4'])

# Print the first three values of each Series
for label, ser in df.items():
    print(label)
    print(ser[:3], end='\n\n')

# Output: --------------------------------------------------------------------------------
name
0    liver
1     Arry
2      Ack
Name: name, dtype: object

team
0    E
1    C
2    A
Name: team, dtype: object

Q1
0    89
1    36
2    57
Name: Q1, dtype: int64

Q2
0    21
1    37
2    60
Name: Q2, dtype: int64

Q3
0    24
1    37
2    18
Name: Q3, dtype: int64

Q4
0    64
1    57
2    84
Name: Q4, dtype: int64

#----------------------------------------------------------------------------------------
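
The nested loop mentioned above can be sketched as follows, using a shortened version of the sample data. The outer loop walks the columns with df.items() and the inner loop walks each column's values with Series.items(); the list `cells` is a hypothetical name used only to collect the results.

```python
import pandas as pd

df = pd.DataFrame({'name': ['liver', 'Arry'], 'Q1': [89, 36]})

cells = []
for label, ser in df.items():          # outer loop: one column at a time
    for idx, value in ser.items():     # inner loop: (index label, value) pairs
        cells.append((label, idx, value))
        print(label, idx, value)
```

This visits the data column by column, so all values of name appear before any value of Q1.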

5. Iterating by column

Besides df.items(), you can iterate over a DataFrame directly to loop over its columns; each iteration yields a column name.

import pandas as pd

df = pd.DataFrame([['liver','E',89,21,24,64],
                   ['Arry','C',36,37,37,57],
                   ['Ack','A',57,60,18,84],
                   ['Eorge','C',93,96,71,78],
                   ['Oah','D',65,49,61,86]
                  ], 
                   columns = ['name','team','Q1','Q2','Q3','Q4'])

# Iterate over the DataFrame directly
for column in df:
    print(column)

# Output:
# name
# team
# Q1
# Q2
# Q3
# Q4

# Then use df[column name] to fetch each column in turn
for column in df:
    print(df[column])

# Output: --------------------------------------------------------------------------------
0    liver
1     Arry
2      Ack
3    Eorge
4      Oah
Name: name, dtype: object
0    E
1    C
2    A
3    C
4    D
Name: team, dtype: object
0    89
1    36
2    57
3    93
4    65
Name: Q1, dtype: int64
0    21
1    37
2    60
3    96
4    49
Name: Q2, dtype: int64
0    24
1    37
2    18
3    71
4    61
Name: Q3, dtype: int64
0    64
1    57
2    84
3    78
4    86
Name: Q4, dtype: int64

#----------------------------------------------------------------------------------------

# The contents of each column can be iterated as well:
for column in df:
    for i in df[column]:
        print(i)

# Output: --------------------------------------------------------------------------------
liver
Arry
Ack
Eorge
Oah
E
C
A
C
D
89
36
57
93
65
21
37
60
96
49
24
37
18
71
61
64
57
84
78
86

#----------------------------------------------------------------------------------------

# A single chosen column can be iterated
for i in df.name:
    print(i)

# Output:
# liver
# Arry
# Ack
# Eorge
# Oah

# Iterate only over the columns you want
l = ['name','Q1']
cols = df.columns.intersection(l)
for col in cols:
    print(col)

# Output:
# name
# Q1
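
The intersection approach above silently skips requested names that are not columns of the DataFrame. A plain list comprehension does the same while making the resulting order explicit. The sketch below is one possible variant, with a deliberately missing column 'Q9' and the hypothetical names `wanted`, `present`, and `selected`; it also fetches the values of each selected column.

```python
import pandas as pd

df = pd.DataFrame({'name': ['liver', 'Arry'],
                   'team': ['E', 'C'],
                   'Q1': [89, 36]})

wanted = ['name', 'Q1', 'Q9']   # 'Q9' is a hypothetical column that does not exist

# Keep only the requested columns that are actually present, in the requested order.
present = [col for col in wanted if col in df.columns]

selected = {}
for col in present:
    # tolist() converts the column's values to a plain Python list.
    selected[col] = df[col].tolist()
    print(col, selected[col])
```

If a missing column should be treated as an error instead, indexing with df[wanted] raises a KeyError that names the missing label.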
