Iterating Over and Accessing DataFrame Values

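When iterating over a DataFrame to read or modify its element values, elements are usually located with loc and iloc. This post compares how loc and iloc are used in such loops and how they differ.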

1. Using loc and iloc

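loc: selects rows by the values (labels) in the row index and columns by column name; it also accepts boolean conditions.
iloc: selects rows and columns by integer position (row number and column number); column names cannot be used.

Note: loc stands for "location", and the i in iloc stands for "integer": iloc addresses rows and columns by integer position rather than by label.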
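A minimal sketch of the difference, using a small made-up DataFrame with the string index 'x', 'y', 'z' (the data is hypothetical, not from the post): the same cell is reached through labels with loc and through positions with iloc, and iloc refuses a column name.

```python
import pandas as pd

# Hypothetical example frame with a non-integer row index
df = pd.DataFrame({"A": [10, 20, 30], "B": [1.5, 2.5, 3.5]}, index=["x", "y", "z"])

# loc: row label 'y', column label 'B'
by_label = df.loc["y", "B"]
# iloc: row position 1, column position 1 (the same cell)
by_position = df.iloc[1, 1]
print(by_label, by_position)  # 2.5 2.5

# iloc does not accept labels: a column name raises an error
rejected = False
try:
    df.iloc[1, "B"]
except (ValueError, TypeError):
    rejected = True
print("iloc rejected the label:", rejected)
```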

From the official documentation for loc, which is primarily label based (for rows as well as columns):

.loc is primarily label based, but may also be used with a boolean array. Allowed inputs are:

  • A single label, e.g. 5 or 'a' (Note that 5 is interpreted as a label of the index. This use is not an integer position along the index.).
  • A list or array of labels ['a', 'b', 'c'].
  • A slice object with labels 'a':'f' (Note that contrary to usual Python slices, both the start and the stop are included, when present in the index! See Slicing with labels and Endpoints are inclusive.)
  • A boolean array (any NA values will be treated as False).
  • A callable function with one argument (the calling Series or DataFrame) and that returns valid output for indexing (one of the above).
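Each kind of loc input listed above can be tried on a small frame. This is a sketch with hypothetical data (rows labelled 'a' to 'f'); note that the label slice includes its stop label.

```python
import pandas as pd

# Hypothetical frame with string row labels 'a'..'f'
df = pd.DataFrame({"A": range(6), "B": range(10, 16)}, index=list("abcdef"))

single = df.loc["b"]                           # single label -> one row as a Series
rows = df.loc[["a", "c"]]                      # list of labels -> DataFrame
sliced = df.loc["b":"d"]                       # label slice: 'b', 'c' and 'd' (stop included)
mask = df.loc[(df["A"] % 2 == 0).to_numpy()]   # boolean array -> rows a, c, e
picked = df.loc[lambda d: d["B"] > 13]         # callable returning a boolean Series

print(list(sliced.index))  # ['b', 'c', 'd']
print(list(mask.index))    # ['a', 'c', 'e']
print(list(picked.index))  # ['e', 'f']
```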

From the official documentation for iloc, which is primarily integer-position based:

.iloc is primarily integer position based (from 0 to length-1 of the axis), but may also be used with a boolean array. .iloc will raise IndexError if a requested indexer is out-of-bounds, except slice indexers which allow out-of-bounds indexing. (this conforms with Python/NumPy slice semantics). Allowed inputs are:

  • An integer e.g. 5.
  • A list or array of integers [4, 3, 0].
  • A slice object with ints 1:7.
  • A boolean array (any NA values will be treated as False).
  • A callable function with one argument (the calling Series or DataFrame) and that returns valid output for indexing (one of the above).
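The iloc inputs can be exercised the same way. A sketch with hypothetical data; it also shows the out-of-bounds rule quoted above: a too-large slice stop is tolerated, a too-large integer raises IndexError.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with the default RangeIndex 0..4
df = pd.DataFrame({"A": range(5), "B": range(10, 15)})

one = df.iloc[3]                     # integer -> row at position 3 (Series)
rows = df.iloc[[4, 3, 0]]            # list of integers, order is kept
sliced = df.iloc[1:3]                # positions 1 and 2 (stop excluded)
mask = df.iloc[np.array([True, False, True, False, False])]  # boolean array
picked = df.iloc[lambda d: [0, 1]]   # callable returning a list of positions

# Out-of-bounds slices are allowed, out-of-bounds integers are not
tail = df.iloc[3:100]                # only rows 3 and 4 exist
raised = False
try:
    df.iloc[100]
except IndexError:
    raised = True
print("df.iloc[100] raises IndexError:", raised)
```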

2. Iterating over values with loc

(1) With a date index

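import numpy as np
import pandas as pd

# Use dates as the index
df = pd.DataFrame(np.abs(np.random.randn(10, 4)),
                  index=pd.date_range('1/1/2023', periods=10),
                  columns=list('ABCD'))
df.index.name = 'date'
i = 1
# Iterate over the date index
for d in df.index:
    # Set column A to 0 on even rows (2nd, 4th, ...; i counts from 1)
    if i % 2 == 0:
        df.loc[df.index == d, 'A'] = 0
    i += 1
# Set column A to 100 on the row with the smallest index value
df.loc[df.index == df.index.min(), ['A']] = 100

print(df)
pre = 100
# Where column A is 0, fill it downward with the previous row's value
for d in df.index:
    if df.loc[d, 'A'] == 0.0:
        df.loc[d, 'A'] = pre
    pre = df.loc[d, 'A']

print(df)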

Output:

                     A         B         C         D
date                                                
2023-01-01  100.000000  1.532303  1.667700  0.799870
2023-01-02    0.000000  0.183089  1.239692  1.321370
2023-01-03    0.741102  0.046467  1.132106  0.019921
2023-01-04    0.000000  1.051709  0.236322  1.521744
2023-01-05    0.385593  0.533345  0.762100  0.654683
2023-01-06    0.000000  1.244160  0.433445  1.050108
2023-01-07    0.243023  2.255967  0.165955  0.287973
2023-01-08    0.000000  0.454799  1.382565  0.732341
2023-01-09    0.497839  0.371737  0.366683  0.524772
2023-01-10    0.000000  0.677077  0.542580  1.384272
                     A         B         C         D
date                                                
2023-01-01  100.000000  1.532303  1.667700  0.799870
2023-01-02  100.000000  0.183089  1.239692  1.321370
2023-01-03    0.741102  0.046467  1.132106  0.019921
2023-01-04    0.741102  1.051709  0.236322  1.521744
2023-01-05    0.385593  0.533345  0.762100  0.654683
2023-01-06    0.385593  1.244160  0.433445  1.050108
2023-01-07    0.243023  2.255967  0.165955  0.287973
2023-01-08    0.243023  0.454799  1.382565  0.732341
2023-01-09    0.497839  0.371737  0.366683  0.524772
2023-01-10    0.497839  0.677077  0.542580  1.384272
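For comparison, here is a sketch of the same two steps without a row-by-row loop, using pandas' `mask` and `ffill`. This vectorized variant is an alternative technique, not the method of the post, and it uses fixed numbers instead of random ones so the loop and the vectorized result can be checked against each other. Like the loop, it also overwrites any value that was already 0.

```python
import numpy as np
import pandas as pd

# Fixed (hypothetical) data instead of random numbers so results are reproducible
idx = pd.date_range("1/1/2023", periods=6, name="date")
df = pd.DataFrame({"A": np.arange(1.0, 7.0), "B": np.arange(6.0)}, index=idx)

# Loop version, as in the post, using scalar access df.loc[label, column]
looped = df.copy()
for i, d in enumerate(looped.index, start=1):
    if i % 2 == 0:                          # 2nd, 4th, ... row
        looped.loc[d, "A"] = 0
looped.loc[looped.index.min(), "A"] = 100
pre = 100
for d in looped.index:
    if looped.loc[d, "A"] == 0.0:
        looped.loc[d, "A"] = pre
    pre = looped.loc[d, "A"]

# Vectorized version: the same result without iterating row by row
vec = df.copy()
vec.iloc[1::2, vec.columns.get_loc("A")] = 0      # every 2nd row, by position
vec.loc[vec.index.min(), "A"] = 100
vec["A"] = vec["A"].mask(vec["A"] == 0).ffill()   # 0 -> NaN -> previous value

print(looped["A"].tolist())  # [100.0, 100.0, 3.0, 3.0, 5.0, 5.0]
```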

(2) With the default index

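import numpy as np
import pandas as pd

df = pd.DataFrame(np.abs(np.random.randn(10, 4)), columns=list('ABCD'))
print(df)
df.index.name = 'no'
# Default integer index: set columns A and C to the index value
for i in df.index:
    df.loc[df.index == i, ['A', 'C']] = i

# Set column D to missing values (np.nan; the np.NaN alias was removed in NumPy 2.0)
df['D'] = np.nan
# Assign D on rows with odd index labels (1, 3, 5, ..., i.e. the 2nd, 4th, ... rows)
for i in df.index:
    if i % 2:
        df.loc[df.index == i, ['D']] = i

print(df)
# Backward fill: each NaN takes the next valid value below it
# (fillna(method='bfill') is deprecated in recent pandas; bfill() does the same)
df1 = df['D'].bfill()
print(df1)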

Output:

         A         B         C         D
0  1.355061  1.784947  0.530280  0.343836
1  0.591961  1.587958  0.700280  0.096845
2  0.945876  1.036163  0.903821  0.161356
3  1.144042  1.162818  0.148023  1.971303
4  0.424846  0.960678  0.891586  1.687668
5  0.441317  2.275049  0.168477  0.297483
6  0.791475  0.894168  1.309116  1.826531
7  0.349400  0.878078  1.748874  2.238486
8  0.501033  0.608020  0.346233  2.553355
9  0.795990  1.267664  0.565392  1.510390
      A         B    C    D
no                         
0   0.0  1.784947  0.0  NaN
1   1.0  1.587958  1.0  1.0
2   2.0  1.036163  2.0  NaN
3   3.0  1.162818  3.0  3.0
4   4.0  0.960678  4.0  NaN
5   5.0  2.275049  5.0  5.0
6   6.0  0.894168  6.0  NaN
7   7.0  0.878078  7.0  7.0
8   8.0  0.608020  8.0  NaN
9   9.0  1.267664  9.0  9.0
no
0    1.0
1    1.0
2    3.0
3    3.0
4    5.0
5    5.0
6    7.0
7    7.0
8    9.0
9    9.0
Name: D, dtype: float64
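The same result can also be built without loops. The sketch below is an alternative technique (column assignment with `Index.to_numpy` and `np.where`), not the post's method, run on hypothetical fixed data so it can be compared with the loop version.

```python
import numpy as np
import pandas as pd

# Hypothetical fixed data with the default integer index
df = pd.DataFrame(np.ones((6, 4)), columns=list("ABCD"))
df.index.name = "no"

# Loop version, as in the post
looped = df.copy()
for i in looped.index:
    looped.loc[looped.index == i, ["A", "C"]] = i
looped["D"] = np.nan
for i in looped.index:
    if i % 2:                                # odd labels: 1, 3, 5
        looped.loc[looped.index == i, ["D"]] = i

# Vectorized version: whole columns at once
vec = df.copy()
vec["A"] = vec.index.to_numpy(dtype=float)
vec["C"] = vec.index.to_numpy(dtype=float)
vec["D"] = np.where(vec.index % 2 == 1, vec.index, np.nan)

print(vec["D"].bfill().tolist())  # [1.0, 1.0, 3.0, 3.0, 5.0, 5.0]
```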

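3. Iterating over values with iloc

(1) With a date index or the default index

iloc uses only positions, so it makes no difference whether the index holds dates or the default integers.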

import numpy as np
import pandas as pd

df = pd.DataFrame(np.abs(np.random.randn(10, 4)),
                  index=pd.date_range('1/1/2023', periods=10),
                  columns=list('ABCD'))
df.index.name = 'date'
# Loop over row positions
for i in range(df.shape[0]):
    # Even positions (0, 2, 4, ...); column positions 2 and 3 are C and D
    if i % 2 == 0:
        df.iloc[[i], [2, 3]] = 0

# Set columns C and D of the first row to 100
df.iloc[[0], [2, 3]] = 100
print(df)

for i in range(df.shape[0]):
    # Column C: replace a 0 with the value from the row above
    # (iloc[i, 2] with plain integers returns a scalar, not a 1x1 DataFrame)
    if df.iloc[i, 2] == 0.0:
        df.iloc[i, 2] = pre_c
    pre_c = df.iloc[i, 2]

    # Column D: the same downward fill
    if df.iloc[i, 3] == 0.0:
        df.iloc[i, 3] = pre_d
    pre_d = df.iloc[i, 3]

print(df)

Output:

                   A         B           C           D
date                                                  
2023-01-01  1.040645  0.369780  100.000000  100.000000
2023-01-02  1.850851  1.422875    0.066909    1.137934
2023-01-03  0.321779  0.376273    0.000000    0.000000
2023-01-04  0.316248  1.198039    1.707555    0.539617
2023-01-05  0.350327  0.144577    0.000000    0.000000
2023-01-06  0.396593  1.054268    0.791154    0.898749
2023-01-07  0.685409  1.286553    0.000000    0.000000
2023-01-08  0.366570  0.997236    1.534733    0.689972
2023-01-09  0.417907  0.823729    0.000000    0.000000
2023-01-10  1.316604  0.867192    0.514058    0.945503
                   A         B           C           D
date                                                  
2023-01-01  1.040645  0.369780  100.000000  100.000000
2023-01-02  1.850851  1.422875    0.066909    1.137934
2023-01-03  0.321779  0.376273    0.066909    1.137934
2023-01-04  0.316248  1.198039    1.707555    0.539617
2023-01-05  0.350327  0.144577    1.707555    0.539617
2023-01-06  0.396593  1.054268    0.791154    0.898749
2023-01-07  0.685409  1.286553    0.791154    0.898749
2023-01-08  0.366570  0.997236    1.534733    0.689972
2023-01-09  0.417907  0.823729    1.534733    0.689972
2023-01-10  1.316604  0.867192    0.514058    0.945503
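To check that iloc does not care about the index type, the sketch below runs the same positional fill on a date-indexed frame and on a default-indexed frame built from identical, hypothetical values, then compares the results. The helper name `fill_even_rows` is made up for this illustration.

```python
import numpy as np
import pandas as pd

values = np.arange(1.0, 13.0).reshape(6, 2)
by_date = pd.DataFrame(values, columns=["C", "D"],
                       index=pd.date_range("1/1/2023", periods=6, name="date"))
by_default = pd.DataFrame(values, columns=["C", "D"])

def fill_even_rows(df):
    """Hypothetical helper: zero the even positions (0, 2, 4, ...), set row 0
    to 100, then copy each zero from the row above, using positions only."""
    df = df.copy()
    for i in range(0, df.shape[0], 2):
        df.iloc[i, [0, 1]] = 0
    df.iloc[0, [0, 1]] = 100
    for i in range(1, df.shape[0]):
        for j in range(df.shape[1]):
            if df.iloc[i, j] == 0.0:
                df.iloc[i, j] = df.iloc[i - 1, j]
    return df

a = fill_even_rows(by_date)
b = fill_even_rows(by_default)
print(np.array_equal(a.to_numpy(), b.to_numpy()))  # True
```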

Note: the types of df.iloc[[0],[3]] and of the values extracted from it:

print(type(df.iloc[[0],[3]])) 
print(type(df.iloc[[0],[3]].values[0])) 
print(type(df.iloc[[0],[3]].values[0][0])) 

They are a DataFrame, a NumPy array, and a NumPy float, respectively.

<class 'pandas.core.frame.DataFrame'>
<class 'numpy.ndarray'>
<class 'numpy.float64'>
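Passing lists to iloc is what produces the 1x1 DataFrame; passing plain integers returns the scalar directly, and `iat` is the dedicated positional scalar accessor. A short sketch on hypothetical data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(8.0).reshape(2, 4), columns=list("ABCD"))

frame = df.iloc[[0], [3]]   # lists -> 1x1 DataFrame
scalar = df.iloc[0, 3]      # plain integers -> scalar
fast = df.iat[0, 3]         # iat: scalar access by position only

print(type(frame).__name__, type(scalar).__name__, type(fast).__name__)
# DataFrame float64 float64
```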

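4. Summary and comparison

(1) loc

  • Rows are selected by row label, and can be filtered with conditions on the index; columns are selected by column label.
  • To select all rows or all columns, use : in that position.
  • When slicing by label, both endpoints are included: with the default integer index, [0:3] selects labels 0, 1, 2 and 3, i.e. 4 rows.
  • Naming columns by label makes the code easy to read.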
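A sketch of these loc points on hypothetical date-indexed data: a date range selected by labels (both ends included), a condition on the index, `:` for all rows, and the inclusive 0:3 slice on the default integer index.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(20.0).reshape(5, 4), columns=list("ABCD"),
                  index=pd.date_range("1/1/2023", periods=5, name="date"))

# Date range by label: both endpoints are included
window = df.loc["2023-01-02":"2023-01-04", ["C", "D"]]
print(len(window))  # 3

# Condition on the index combined with a column label
after = df.loc[df.index > "2023-01-03", "A"]
print(after.tolist())  # [12.0, 16.0]

# ':' selects every row
print(df.loc[:, "B"].shape)  # (5,)

# With the default integer index, 0:3 is inclusive as well -> 4 rows
print(len(df.reset_index().loc[0:3]))  # 4
```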

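(2) iloc

  • iloc is position based; both rows and columns are numbered from 0.
  • iloc cannot select by index labels: with a date index, for example, you cannot pass it a date range, and a condition on the index must first be turned into a plain boolean array.
  • Positional slices include the start and exclude the end.
  • When a DataFrame has many columns, picking them by position is inconvenient and makes the code harder to read.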
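A sketch of the iloc points on hypothetical data: the half-open slice, converting column names to positions with `Index.get_indexer` to keep iloc calls readable, and passing a date condition to iloc as a plain NumPy boolean array.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(20.0).reshape(5, 4), columns=list("ABCD"),
                  index=pd.date_range("1/1/2023", periods=5, name="date"))

# Half-open positional slice: rows 0 and 1, columns 1 and 2 (B, C)
block = df.iloc[0:2, 1:3]
print(block.columns.tolist())  # ['B', 'C']

# Translate column names to positions so the iloc call stays readable
cols = df.columns.get_indexer(["C", "D"])   # positions 2 and 3
print(df.iloc[0:2, cols].shape)  # (2, 2)

# iloc takes no dates, but it does accept a NumPy boolean array
in_range = (df.index >= "2023-01-02") & (df.index <= "2023-01-04")
print(df.iloc[in_range, 0].tolist())  # [4.0, 8.0, 12.0]
```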

Comparison, using the default-index df from section 2 (2):

print(df.iloc[0:2,1:3])
print(df.loc[0:2,['C','D']])

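iloc: rows 0:2 exclude position 2, so only rows 0 and 1 are returned; columns 1:3 cover positions 1 and 2 (B and C), and position 3 (D) is excluded.
loc: rows 0:2 include label 2.
Result: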

           B    C
no               
0   1.141945  0.0
1   1.010452  1.0
      C    D
no          
0   0.0  NaN
1   1.0  1.0
2   2.0  NaN