Ways to iterate over and access a DataFrame

1. Data preparation

(1) Test data

Build a DataFrame with a date index and four columns of random values.

import numpy as np
import pandas as pd

ts = pd.Series(np.random.randn(10), index=pd.date_range('2020-1-1', periods=10))
df = pd.DataFrame(np.random.randn(10, 4), index=ts.index, columns=list('ABCD')) 
df

(2) pandas version

Check the pandas version.

print(pd.__version__)

2.0.3

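2. Access methods

There are five commonly used ways to iterate over the data in a DataFrame.

(1) iterrows

The iterrows method yields the index label and the row record for each row.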

for index, row in df.iterrows():
    print(index, row['A'], row['D'])

2020-01-01 00:00:00 0.3641823474478886 0.7420267293577939
2020-01-02 00:00:00 -0.9086858514122141 -0.21529516253391381
2020-01-03 00:00:00 1.0707335521425283 -0.8495555020555525
2020-01-04 00:00:00 -0.9104436159077746 -1.7704251732279581
2020-01-05 00:00:00 1.6091084193842462 0.5594481402153169
2020-01-06 00:00:00 0.04828934029765889 -2.078443945278677
2020-01-07 00:00:00 -0.7111418530010771 -1.29587734532037
2020-01-08 00:00:00 0.20754578301393778 -0.39078747556747734
2020-01-09 00:00:00 1.0997255380859803 0.4272308690661768
2020-01-10 00:00:00 0.28544790543277 -0.37501666198259165

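Looking at the types: index is a pandas Timestamp (a subclass of datetime.datetime),
and row is a Series, so its values can be accessed by column name.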

print(type(index))
print(type(row['A']))
print(type(row))

<class 'pandas._libs.tslibs.timestamps.Timestamp'>
<class 'numpy.float64'>
<class 'pandas.core.series.Series'>
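
Because each row comes back as a single Series, a row that spans columns of different dtypes is converted to one common dtype. The sketch below illustrates this on a small hypothetical frame (`mixed` is not the article's `df`, whose columns are all float64) and contrasts it with itertuples, which keeps each value's own type.

```python
import pandas as pd

# Hypothetical two-column frame with mixed dtypes (illustration only).
mixed = pd.DataFrame({"n": [1, 2], "x": [0.5, 1.5]})

# iterrows packs each row into one Series, so the row has a single dtype.
_, first_row = next(mixed.iterrows())
row_dtype = first_row.dtype
print(row_dtype)              # the integer column is upcast alongside the float column
print(type(first_row["n"]))

# itertuples builds the row from per-column values, so the integer stays an integer.
first_tup = next(mixed.itertuples())
print(type(first_tup.n))
```

If the original dtypes matter while looping, itertuples is usually the safer choice.
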
(2) loc

Iterate over the index labels and use loc with a column name to access each value.

# loc takes the row label and the column name in a single call
for idx in df.index:
    print(df.loc[idx, 'A'])

0.3641823474478886
-0.9086858514122141
1.0707335521425283
-0.9104436159077746
1.6091084193842462
0.04828934029765889
-0.7111418530010771
0.20754578301393778
1.0997255380859803
0.28544790543277    
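
As a small sketch of label-based scalar access, the example below compares `loc` with `at`, the pandas accessor intended for reading a single value by row label and column name. The frame `demo` and its values are made up for illustration.

```python
import pandas as pd

# Hypothetical small frame with a date index (illustration only).
demo = pd.DataFrame(
    {"A": [1.0, 2.0, 3.0], "B": [10.0, 20.0, 30.0]},
    index=pd.date_range("2020-01-01", periods=3),
)

seen = []
for label in demo.index:
    via_loc = demo.loc[label, "A"]   # row label and column name in one call
    via_at = demo.at[label, "A"]     # scalar-only accessor, same lookup
    seen.append((via_loc, via_at))
    print(label.date(), via_loc, via_at)
```

Both forms return the same value; `at` only accepts a single row label and a single column label.
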
(3) iloc

Get the number of rows from shape, then use iloc with the row position, combined with a column name, to walk through the data.

for row_id in range(df.shape[0]):
    print(df.iloc[row_id]['B'])

0.2437495579604519
0.2828630441432169
0.5036532101096077
-0.9921045754369142
-0.18953453071322154
-0.17631832794049856
-1.1557403411733949
-1.9230766108049244
0.9827603665898592
1.5838796545007081
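
The loop above mixes a positional row lookup with a column name. A purely positional variant can be sketched as follows, assuming a hypothetical frame `demo`: `columns.get_loc` translates the column name into its position, and `iat` reads one value by row and column position.

```python
import pandas as pd

# Hypothetical small frame (illustration only).
demo = pd.DataFrame(
    {"A": [1.0, 2.0, 3.0], "B": [10.0, 20.0, 30.0]},
    index=pd.date_range("2020-01-01", periods=3),
)

col_pos = demo.columns.get_loc("B")   # position of column 'B'

collected = []
for row_pos in range(len(demo)):
    value = demo.iat[row_pos, col_pos]   # scalar access by integer positions
    collected.append(value)
    print(value)
```
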
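(4) itertuples

The itertuples method converts each row into a named tuple, which can then be accessed by position.
Position 0 is the index, position 1 corresponds to column A, and position 3 to column C.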

for tup in df.itertuples():
    print(tup[0],tup[1],tup[3])

2020-01-01 00:00:00 0.3641823474478886 -0.5538779087811666
2020-01-02 00:00:00 -0.9086858514122141 -1.7114951319715501
2020-01-03 00:00:00 1.0707335521425283 -0.48885052901155274
2020-01-04 00:00:00 -0.9104436159077746 -0.9516150263977505
2020-01-05 00:00:00 1.6091084193842462 -1.0851994280481798
2020-01-06 00:00:00 0.04828934029765889 0.9085265155873162
2020-01-07 00:00:00 -0.7111418530010771 2.1446364650140746
2020-01-08 00:00:00 0.20754578301393778 0.4748462568719993
2020-01-09 00:00:00 1.0997255380859803 -1.0555296783745742
2020-01-10 00:00:00 0.28544790543277 2.288507229443556

Printing the tuples directly gives the following:

for tup in df.itertuples():
    print(tup)

Pandas(Index=Timestamp('2020-01-01 00:00:00'), A=0.3641823474478886, B=0.2437495579604519, C=-0.5538779087811666, D=0.7420267293577939)
Pandas(Index=Timestamp('2020-01-02 00:00:00'), A=-0.9086858514122141, B=0.2828630441432169, C=-1.7114951319715501, D=-0.21529516253391381)
Pandas(Index=Timestamp('2020-01-03 00:00:00'), A=1.0707335521425283, B=0.5036532101096077, C=-0.48885052901155274, D=-0.8495555020555525)
Pandas(Index=Timestamp('2020-01-04 00:00:00'), A=-0.9104436159077746, B=-0.9921045754369142, C=-0.9516150263977505, D=-1.7704251732279581)
Pandas(Index=Timestamp('2020-01-05 00:00:00'), A=1.6091084193842462, B=-0.18953453071322154, C=-1.0851994280481798, D=0.5594481402153169)
Pandas(Index=Timestamp('2020-01-06 00:00:00'), A=0.04828934029765889, B=-0.17631832794049856, C=0.9085265155873162, D=-2.078443945278677)
Pandas(Index=Timestamp('2020-01-07 00:00:00'), A=-0.7111418530010771, B=-1.1557403411733949, C=2.1446364650140746, D=-1.29587734532037)
Pandas(Index=Timestamp('2020-01-08 00:00:00'), A=0.20754578301393778, B=-1.9230766108049244, C=0.4748462568719993, D=-0.39078747556747734)
Pandas(Index=Timestamp('2020-01-09 00:00:00'), A=1.0997255380859803, B=0.9827603665898592, C=-1.0555296783745742, D=0.4272308690661768)
Pandas(Index=Timestamp('2020-01-10 00:00:00'), A=0.28544790543277, B=1.5838796545007081, C=2.288507229443556, D=-0.37501666198259165)    
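
Since the rows are named tuples, the fields can also be read by attribute name instead of position. The sketch below uses a hypothetical frame `demo` and also shows the `index` and `name` parameters of itertuples, which drop the index field and return plain tuples respectively.

```python
import pandas as pd

# Hypothetical small frame (illustration only).
demo = pd.DataFrame(
    {"A": [1.0, 2.0, 3.0], "B": [10.0, 20.0, 30.0]},
    index=pd.date_range("2020-01-01", periods=3),
)

# Attribute access: Index holds the row label, A and B hold the column values.
for tup in demo.itertuples():
    print(tup.Index, tup.A, tup.B)

# index=False leaves out the index; name=None yields ordinary tuples.
plain = list(demo.itertuples(index=False, name=None))
print(plain)
```
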
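(5) values

Access the data through the DataFrame's values attribute.
Positions 0, 1, 2 and 3 correspond to columns A, B, C and D. The result looks like this: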

for row in df.values:
    print(row[0], '  ', row[1], '  ', row[2], '  ', row[3])

0.3641823474478886    0.2437495579604519    -0.5538779087811666    0.7420267293577939
-0.9086858514122141    0.2828630441432169    -1.7114951319715501    -0.21529516253391381
1.0707335521425283    0.5036532101096077    -0.48885052901155274    -0.8495555020555525
-0.9104436159077746    -0.9921045754369142    -0.9516150263977505    -1.7704251732279581
1.6091084193842462    -0.18953453071322154    -1.0851994280481798    0.5594481402153169
0.04828934029765889    -0.17631832794049856    0.9085265155873162    -2.078443945278677
-0.7111418530010771    -1.1557403411733949    2.1446364650140746    -1.29587734532037
0.20754578301393778    -1.9230766108049244    0.4748462568719993    -0.39078747556747734
1.0997255380859803    0.9827603665898592    -1.0555296783745742    0.4272308690661768
0.28544790543277    1.5838796545007081    2.288507229443556    -0.37501666198259165 

Note: row is not a list; it is a numpy.ndarray.

for row in df.values:
    print(row)
print(type(row))   

[ 0.36418235  0.24374956 -0.55387791  0.74202673]
[-0.90868585  0.28286304 -1.71149513 -0.21529516]
[ 1.07073355  0.50365321 -0.48885053 -0.8495555 ]
[-0.91044362 -0.99210458 -0.95161503 -1.77042517]
[ 1.60910842 -0.18953453 -1.08519943  0.55944814]
[ 0.04828934 -0.17631833  0.90852652 -2.07844395]
[-0.71114185 -1.15574034  2.14463647 -1.29587735]
[ 0.20754578 -1.92307661  0.47484626 -0.39078748]
[ 1.09972554  0.98276037 -1.05552968  0.42723087]
[ 0.28544791  1.58387965  2.28850723 -0.37501666]
<class 'numpy.ndarray'>
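
The pandas documentation recommends `DataFrame.to_numpy()` over the `values` attribute for getting the underlying array. A minimal sketch on a hypothetical frame `demo`, including a conversion of one row to a real Python list with `tolist()`:

```python
import numpy as np
import pandas as pd

# Hypothetical small frame (illustration only).
demo = pd.DataFrame(
    {"A": [1.0, 2.0, 3.0], "B": [10.0, 20.0, 30.0]},
    index=pd.date_range("2020-01-01", periods=3),
)

arr = demo.to_numpy()        # 2-D ndarray; index and column labels are not included
for row in arr:
    print(row[0], row[1])    # positions 0 and 1 correspond to columns A and B

as_list = arr[0].tolist()    # convert one row to a plain Python list when needed
print(type(arr), type(as_list))
```
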
(6) iteritems

Many online tutorials also describe iterating with the iteritems method, but on this version it raises an error.

for index, col in df.iteritems():
    print(index,col.iloc[0])

The error message is:
AttributeError: 'DataFrame' object has no attribute 'iteritems'

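The reason is that iteritems only exists in older pandas versions. It was deprecated in pandas 1.5.0 and removed in 2.0.0, so the 2.0.3 release used here does not support it.
The replacement is DataFrame.items(), which iterates over (column name, column Series) pairs.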
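
A minimal sketch of the same column-wise loop written with `items()`, on a hypothetical frame `demo`. Note that this iterates over columns, not rows: each step yields a column name and that column as a Series.

```python
import pandas as pd

# Hypothetical small frame (illustration only).
demo = pd.DataFrame(
    {"A": [1.0, 2.0, 3.0], "B": [10.0, 20.0, 30.0]},
    index=pd.date_range("2020-01-01", periods=3),
)

col_firsts = {}
for name, col in demo.items():
    col_firsts[name] = col.iloc[0]   # first value of each column
    print(name, col.iloc[0])
```
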
