Methods for Iterating Over and Accessing DataFrame Values

六月闻君

Last modified: 2023-09-12 08:59:09

First published: 2023-09-12 08:43:53

Article link: https://blog.csdn.net/qq_39065491/article/details/132822000

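To read or modify the elements of a DataFrame while iterating over it, the loc and iloc indexers are the usual way to locate each element. This article compares how the two are used in iteration and how they differ.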

1. Using loc and iloc

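loc: selects rows by the label values in the row index; columns are addressed by name, and rows can also be selected with a boolean condition.
iloc: selects rows by integer position; rows and columns are both addressed by position, and column names cannot be used.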

Note: loc stands for "location"; the i in iloc stands for "integer", so iloc takes integer positions (or a boolean array) rather than labels.
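To make the difference concrete, here is a minimal sketch. The frame is hypothetical: its index labels are integers chosen so that they do not coincide with the row positions.

```python
import pandas as pd

# Hypothetical frame: index labels (10, 20, 30) differ from positions (0, 1, 2)
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=[10, 20, 30])

by_label = df.loc[20, 'A']     # row labelled 20, column named 'A'
by_position = df.iloc[1, 0]    # second row, first column

print(by_label, by_position)   # both expressions address the same cell

# 1 is a valid position but not a label in this index, so loc raises KeyError
try:
    df.loc[1, 'A']
    label_found = True
except KeyError:
    label_found = False
print(label_found)
```

With an integer index like this one, mixing up the two indexers silently picks the wrong row or fails, which is why the distinction matters in loops.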

The official documentation describes loc as primarily label-based, for rows and columns alike:

.loc is primarily label based, but may also be used with a boolean array.

A single label, e.g. 5 or 'a' (Note that 5 is interpreted as a label of the index. This use is not an integer position along the index.).
A list or array of labels ['a', 'b', 'c'].
A slice object with labels 'a':'f' (Note that contrary to usual Python slices, both the start and the stop are included, when present in the index! See Slicing with labels and Endpoints are inclusive.)
A boolean array (any NA values will be treated as False).
A callable function with one argument (the calling Series or DataFrame) and that returns valid output for indexing (one of the above).
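The input kinds listed above can be tried on a small frame. This is only an illustrative sketch; the frame, its labels, and the variable names are invented here.

```python
import pandas as pd

# Hypothetical frame with string labels
df = pd.DataFrame({'x': [1, 2, 3, 4]}, index=list('abcd'))

single = df.loc['b', 'x']                        # single label -> scalar
listed = df.loc[['a', 'c'], 'x']                 # list of labels -> Series
sliced = df.loc['a':'c', 'x']                    # label slice, both ends included
masked = df.loc[df['x'] > 2, 'x']                # boolean array
called = df.loc[lambda d: d['x'] % 2 == 0, 'x']  # callable returning a boolean Series

print(list(sliced.index))   # ['a', 'b', 'c']
```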

The official documentation describes iloc as primarily integer-position based:

.iloc is primarily integer position based (from 0 to length-1 of the axis), but may also be used with a boolean array. .iloc will raise IndexError if a requested indexer is out-of-bounds, except slice indexers which allow out-of-bounds indexing. (this conforms with Python/NumPy slice semantics). Allowed inputs are:

An integer e.g. 5.
A list or array of integers [4, 3, 0].
A slice object with ints 1:7.
A boolean array (any NA values will be treated as False).
A callable function with one argument (the calling Series or DataFrame) and that returns valid output for indexing (one of the above).
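A short sketch of the positional inputs and of the out-of-bounds rule quoted above, again on an invented frame:

```python
import pandas as pd

# Hypothetical frame; the labels are irrelevant to iloc
df = pd.DataFrame({'x': [10, 20, 30, 40]}, index=list('abcd'))

single = df.iloc[1, 0]        # integer -> scalar
listed = df.iloc[[3, 0], 0]   # list of integers, in the order given
sliced = df.iloc[1:3, 0]      # positions 1 and 2; the stop is excluded
beyond = df.iloc[2:100, 0]    # an out-of-bounds slice is truncated, no error

# A single out-of-bounds integer, by contrast, raises IndexError
try:
    df.iloc[100, 0]
    raised = False
except IndexError:
    raised = True
print(raised)
```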

2. Iterating over values with loc

(1) Date index

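import numpy as np
import pandas as pd

# Use dates as the index
df = pd.DataFrame(abs(np.random.randn(10, 4)), index=pd.date_range('1/1/2023', periods=10),
                  columns=list('ABCD'))
df.index.name = 'date'
# Iterate over the date index; i counts rows starting from 1
for i, d in enumerate(df.index, start=1):
    # Set column A to 0 on every even-numbered row
    if i % 2 == 0:
        df.loc[d, 'A'] = 0

# Set column A to 100 on the row with the earliest date
df.loc[df.index.min(), 'A'] = 100

print(df)
pre = 100
# Where column A is 0, carry the previous row's value down
for d in df.index:
    if df.loc[d, 'A'] == 0.0:
        df.loc[d, 'A'] = pre
    pre = df.loc[d, 'A']

print(df)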

Output:

                     A         B         C         D
date                                                
2023-01-01  100.000000  1.532303  1.667700  0.799870
2023-01-02    0.000000  0.183089  1.239692  1.321370
2023-01-03    0.741102  0.046467  1.132106  0.019921
2023-01-04    0.000000  1.051709  0.236322  1.521744
2023-01-05    0.385593  0.533345  0.762100  0.654683
2023-01-06    0.000000  1.244160  0.433445  1.050108
2023-01-07    0.243023  2.255967  0.165955  0.287973
2023-01-08    0.000000  0.454799  1.382565  0.732341
2023-01-09    0.497839  0.371737  0.366683  0.524772
2023-01-10    0.000000  0.677077  0.542580  1.384272
                     A         B         C         D
date                                                
2023-01-01  100.000000  1.532303  1.667700  0.799870
2023-01-02  100.000000  0.183089  1.239692  1.321370
2023-01-03    0.741102  0.046467  1.132106  0.019921
2023-01-04    0.741102  1.051709  0.236322  1.521744
2023-01-05    0.385593  0.533345  0.762100  0.654683
2023-01-06    0.385593  1.244160  0.433445  1.050108
2023-01-07    0.243023  2.255967  0.165955  0.287973
2023-01-08    0.243023  0.454799  1.382565  0.732341
2023-01-09    0.497839  0.371737  0.366683  0.524772
2023-01-10    0.497839  0.677077  0.542580  1.384272
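The second loop above carries the previous value of column A down into each zero, one row at a time. As a sketch of a loop-free alternative (not the approach this article uses), the zeros can be masked to NaN and then filled with ffill. The sample values below are made up, and like the loop, this treats every zero as a gap.

```python
import pandas as pd

# Hypothetical column shaped like the example: 100 first, then zeros on even rows
df = pd.DataFrame({'A': [100.0, 0.0, 0.74, 0.0, 0.39, 0.0]},
                  index=pd.date_range('2023-01-01', periods=6))

# Turn zeros into NaN, then carry the last non-missing value down
df['A'] = df['A'].mask(df['A'] == 0).ffill()

print(df['A'].tolist())   # [100.0, 100.0, 0.74, 0.74, 0.39, 0.39]
```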

(2) Default index

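import numpy as np
import pandas as pd

df = pd.DataFrame(abs(np.random.randn(10, 4)), columns=list('ABCD'))
print(df)
df.index.name = 'no'
# Default integer index: write the index value into columns A and C
for i in df.index:
    df.loc[i, ['A', 'C']] = i

# Set column D to missing values (np.NaN was removed in NumPy 2.0; use np.nan)
df['D'] = np.nan
# Fill column D on the odd-numbered index values
for i in df.index:
    if i % 2:
        df.loc[i, 'D'] = i

print(df)
# Backward-fill the gaps: each NaN takes the next valid value below it
# (fillna(method='bfill') is deprecated since pandas 2.1; use bfill())
df1 = df['D'].bfill()
print(df1)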

Output:

         A         B         C         D
0  1.355061  1.784947  0.530280  0.343836
1  0.591961  1.587958  0.700280  0.096845
2  0.945876  1.036163  0.903821  0.161356
3  1.144042  1.162818  0.148023  1.971303
4  0.424846  0.960678  0.891586  1.687668
5  0.441317  2.275049  0.168477  0.297483
6  0.791475  0.894168  1.309116  1.826531
7  0.349400  0.878078  1.748874  2.238486
8  0.501033  0.608020  0.346233  2.553355
9  0.795990  1.267664  0.565392  1.510390
      A         B    C    D
no                         
0   0.0  1.784947  0.0  NaN
1   1.0  1.587958  1.0  1.0
2   2.0  1.036163  2.0  NaN
3   3.0  1.162818  3.0  3.0
4   4.0  0.960678  4.0  NaN
5   5.0  2.275049  5.0  5.0
6   6.0  0.894168  6.0  NaN
7   7.0  0.878078  7.0  7.0
8   8.0  0.608020  8.0  NaN
9   9.0  1.267664  9.0  9.0
no
0    1.0
1    1.0
2    3.0
3    3.0
4    5.0
5    5.0
6    7.0
7    7.0
8    9.0
9    9.0
Name: D, dtype: float64
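In the output above, each missing value in column D takes the value from the row below it, which is backward filling. The following sketch contrasts backward and forward filling on a small invented Series, to show which direction each one copies from.

```python
import numpy as np
import pandas as pd

# Hypothetical Series with gaps at the start, middle, and end
s = pd.Series([np.nan, 1.0, np.nan, 3.0, np.nan])

back = s.bfill()   # each gap takes the next valid value; a trailing gap stays NaN
fwd = s.ffill()    # each gap takes the previous valid value; a leading gap stays NaN

print(back.tolist())   # [1.0, 1.0, 3.0, 3.0, nan]
print(fwd.tolist())    # [nan, 1.0, 1.0, 3.0, 3.0]
```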

3. Iterating over values with iloc

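(1) Date index and default index

Because iloc takes only positions, it behaves the same whether the DataFrame has a date index or the default index.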

import numpy as np
import pandas as pd

df = pd.DataFrame(abs(np.random.randn(10, 4)), index=pd.date_range('1/1/2023', periods=10),
                  columns=list('ABCD'))
df.index.name = 'date'
# Loop over row positions
for i in range(df.shape[0]):
    # At even positions, zero columns 2 and 3 (C and D)
    if i % 2 == 0:
        df.iloc[i, [2, 3]] = 0

# Set C and D in the first row to 100
df.iloc[0, [2, 3]] = 100
print(df)

# Start from the first row's values so pre_c and pre_d are always defined
pre_c, pre_d = df.iloc[0, 2], df.iloc[0, 3]
for i in range(df.shape[0]):
    # Where C is 0, carry the previous row's value down
    if df.iloc[i, 2] == 0.0:
        df.iloc[i, 2] = pre_c
    pre_c = df.iloc[i, 2]

    # Where D is 0, carry the previous row's value down
    if df.iloc[i, 3] == 0.0:
        df.iloc[i, 3] = pre_d
    pre_d = df.iloc[i, 3]

print(df)

Output:

                   A         B           C           D
date                                                  
2023-01-01  1.040645  0.369780  100.000000  100.000000
2023-01-02  1.850851  1.422875    0.066909    1.137934
2023-01-03  0.321779  0.376273    0.000000    0.000000
2023-01-04  0.316248  1.198039    1.707555    0.539617
2023-01-05  0.350327  0.144577    0.000000    0.000000
2023-01-06  0.396593  1.054268    0.791154    0.898749
2023-01-07  0.685409  1.286553    0.000000    0.000000
2023-01-08  0.366570  0.997236    1.534733    0.689972
2023-01-09  0.417907  0.823729    0.000000    0.000000
2023-01-10  1.316604  0.867192    0.514058    0.945503
                   A         B           C           D
date                                                  
2023-01-01  1.040645  0.369780  100.000000  100.000000
2023-01-02  1.850851  1.422875    0.066909    1.137934
2023-01-03  0.321779  0.376273    0.066909    1.137934
2023-01-04  0.316248  1.198039    1.707555    0.539617
2023-01-05  0.350327  0.144577    1.707555    0.539617
2023-01-06  0.396593  1.054268    0.791154    0.898749
2023-01-07  0.685409  1.286553    0.791154    0.898749
2023-01-08  0.366570  0.997236    1.534733    0.689972
2023-01-09  0.417907  0.823729    1.534733    0.689972
2023-01-10  1.316604  0.867192    0.514058    0.945503

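Note: with list indexers, df.iloc[[0],[3]] returns a DataFrame rather than a scalar, so extracting the value takes two further steps: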

print(type(df.iloc[[0],[3]])) 
print(type(df.iloc[[0],[3]].values[0])) 
print(type(df.iloc[[0],[3]].values[0][0]))

The three types are a DataFrame, an array, and a float, respectively:

<class 'pandas.core.frame.DataFrame'>
<class 'numpy.ndarray'>
<class 'numpy.float64'>
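What iloc returns depends on whether each indexer is a list or a scalar. The sketch below uses a small invented frame to show the three cases, plus iat, the pandas accessor for a single value by position.

```python
import pandas as pd

# Hypothetical two-row frame
df = pd.DataFrame({'C': [1.5, 2.5], 'D': [3.5, 4.5]})

frame = df.iloc[[0], [1]]   # list, list     -> 1x1 DataFrame
series = df.iloc[0, [1]]    # scalar, list   -> Series
value = df.iloc[0, 1]       # scalar, scalar -> the element itself
fast = df.iat[0, 1]         # iat: positional access to a single value

print(type(frame).__name__, type(series).__name__, value, fast)
```

When a loop only needs one cell at a time, scalar indexers avoid the .values[0][0] unwrapping step.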

4. Summary and comparison

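(1) loc

Rows are selected by row label and can also be filtered by a condition on the index; columns are selected by column label.
To select all rows or all columns, use : in that position.
Row label slices include both endpoints: with the default index, [0:3] means rows 0, 1, 2 and 3, four rows in total.
Addressing columns by label keeps the code readable.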

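(2) iloc

iloc indexes by position; rows and columns both start at 0.
iloc does not select by index label: with a date index, for example, a date range cannot be passed to it directly.
Row slices include the start position and exclude the stop position.
When a DataFrame has many columns, working out each column's position is inconvenient and the code is harder to read.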
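As a sketch of the points above, the example below selects the same cells twice: once with loc and a date range, and once with iloc, looking the column position up by name with Index.get_loc instead of hard-coding it. The frame and its values are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: 5 dated rows, columns A to D, values 0..19
df = pd.DataFrame(np.arange(20).reshape(5, 4),
                  index=pd.date_range('2023-01-01', periods=5),
                  columns=list('ABCD'))

# loc: select a date range by label; both endpoints are included
by_dates = df.loc['2023-01-02':'2023-01-04', 'C']

# iloc: find the position of column 'C' by name, then slice rows by position
c_pos = df.columns.get_loc('C')
by_pos = df.iloc[1:4, c_pos]

print(by_dates.equals(by_pos))
```

Note that the iloc row slice needs 1:4 to cover the same three rows that the inclusive label slice covers.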

Comparison, using the default-index DataFrame from section 2 (2):

print(df.iloc[0:2,1:3])
print(df.loc[0:2,['C','D']])

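iloc: the row slice 0:2 excludes position 2; the column slice 1:3 covers positions 1 and 2 (columns B and C) and excludes column D at position 3.
loc: the row slice 0:2 includes the row labelled 2.
Result: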

           B    C
no               
0   1.141945  0.0
1   1.010452  1.0
      C    D
no          
0   0.0  NaN
1   1.0  1.0
2   2.0  NaN
