Modifying every element of a DataFrame column row by row (iterating over the elements of a DataFrame)

Article link: https://blog.csdn.net/weixin_43789661/article/details/125696496

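There are two ways to traverse a DataFrame: one is df_main.loc, the other is df_main.iterrows(). My goal is to modify every element of one column of the DataFrame, either unconditionally or depending on a condition, for example keeping only the first four characters of each element. Selecting with loc only pulls out the requested rows or columns as a whole, in a single step, so by itself it does not walk through the elements one at a time. This example therefore uses iterrows, which iterates row by row. Iterating column by column is also possible with iteritems (items() in pandas 2.0 and later, where iteritems was removed); there are many usages, so search for the details yourself. It is all very simple, and this post only serves as a record of the code.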
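
As a minimal sketch of the row-by-row pattern described above, the snippet below runs iterrows over a small in-memory DataFrame. The sample values are hypothetical and only the column names mirror the ones used in this post. It shows one unconditional change and one conditional change; writes go through df.loc because the row object handed out by iterrows is a copy, so assigning to it may not affect the DataFrame.

```python
import pandas as pd

# Hypothetical sample data; only the column names follow this post.
df = pd.DataFrame({
    "time": ["10:00:01", "10:00:02", "10:01:03"],
    "cpu": [1.5, 2.0, 3.0],
})

# iterrows() yields (index label, row as a Series) pairs, one row at a time.
for index, row in df.iterrows():
    # Unconditional change: keep only the first five characters of time.
    df.loc[index, "time"] = row["time"][:5]
    # Conditional change: cap cpu at 2.0 only when it exceeds that value.
    if row["cpu"] > 2.0:
        df.loc[index, "cpu"] = 2.0

print(df)
```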

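The code below takes only the slice [0, 5] of each element in the time column (by dropping the last three characters) and stores the result in a new count column: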

import pandas as pd

# Read the file
datafile = r'C:\Users\Administrator\Desktop\5.csv'
data = pd.read_csv(datafile, encoding='gb2312')
print(data)

# No quotes are needed when read_csv parsed pid as integers;
# quote the value if the column holds strings.
df_main = data[data['pid'] == 279066].reset_index(drop=True)
print('----------------------')
print(df_main)
print('----------------------')

'''# Average per timestamp. loc takes the whole column in one go; it does not modify the values one by one
t = []
t = df_main.loc[:, 'time']
print(t[0:3])'''

for index, row in df_main.iterrows():
    # The time values must be strings for slicing to work
    t = row['time'][0:-3]  # drop the last three characters
    print(t)
    # Store the trimmed value in a new count column
    df_main.loc[index, 'count'] = t
print(df_main)
df_main = df_main.drop(["vsz","mem","pid","agent"], axis=1)

# df_main = df_main.loc[:, ["data", "cpu", "count"]]
print(df_main)

# Aggregate cpu per count value (note: .sum() totals the values; .mean() would average them)
df_mean = df_main.groupby(['count'], as_index=False)['cpu'].sum()
print(df_mean)

# Write the result to a CSV file
df_mean.to_csv(r'C:\cpusum.csv', index=False)
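
For comparison, the same trimming and aggregation could probably be written without an explicit loop by using the .str accessor on the column. This is an alternative technique, not what the code above does, and the sample DataFrame here is a hypothetical stand-in for the filtered df_main (the real data comes from 5.csv). The name df_sum is also introduced only for this sketch.

```python
import pandas as pd

# Hypothetical stand-in for the filtered df_main used in this post.
df_main = pd.DataFrame({
    "time": ["10:00:01", "10:00:02", "10:01:03"],
    "cpu": [1.5, 2.0, 3.0],
})

# .str[:-3] drops the last three characters of every element in one step.
df_main["count"] = df_main["time"].str[:-3]

# Sum cpu for each distinct count value, as in the loop-based version.
df_sum = df_main.groupby("count", as_index=False)["cpu"].sum()
print(df_sum)
```

Whether this fits depends on the data: it assumes every time value is a string, just as the loop version does.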