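Reading a CSV file with pandas, skipping the irregular non-numeric lines, and computing the mean of a DataFrame column
While processing CSV files with pandas recently, I ran into an annoying problem: the CSV files exported by the system contain some unneeded lines before the header row, and the number of those lines differs from one CSV file to the next.
1. The network element CSV log file to process
Directory: Lange_N41_RSRP0309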
CDL-A CASE0.csv
网元名称,BJIGNB01_turn
任务类型,性能监测-小区性能监测
保存时间,2022-03-09 11:01:23
网元版本,BTS5900 V100R017C00SPC100
时间,"NR DU小区标识","下行RLC总吞吐率(bps)","上行RLC总吞吐率(bps)","下行MAC总吞吐率(bps)","上行MAC总吞吐率(bps)"
03-09 11:01:21(949),"21","1101284672","964120","1219751160","1149152"
03-09 11:01:22(969),"21","1086584088","914360","1207563360","1206648"
03-09 11:01:23(949),"21","1093880872","924128","1216729816","1164952"
03-09 11:01:24(949),"21","1081807848","934496","1204736448","1252160"
03-09 11:01:25(969),"21","1054864328","904768","1167371600","1196128"
03-09 11:01:26(939),"21","998016480","976184","1112138088","1240240"
03-09 11:01:27(949),"21","978282432","910072","1096194072","1166848"
03-09 11:01:28(939),"21","976951624","841608","1077764224","1134752"
03-09 11:01:29(938),"21","1026227488","932736","1153665256","1188672"
03-09 11:01:30(939),"21","1022991936","967576","1141611488","1231488"
03-09 11:01:31(949),"21","1038911560","896408","1150961320","1179952"
03-09 11:01:32(969),"21","1078508792","902184","1205576336","1201392"
03-09 11:01:33(966),"21","1056336608","923544","1196652776","1211680"
03-09 11:01:34(966),"21","1067465240","1009912","1166485264","1281136"
03-09 11:01:35(949),"21","1096801368","943936","1210285464","1221208"
03-09 11:01:36(959),"21","1092690616","926336","1218678328","1203920"
03-09 11:01:37(959),"21","1070899552","907096","1195140520","1192960"
03-09 11:01:38(969),"21","1071070040","928384","1185417888","1202424"
03-09 11:01:39(949),"21","1073769536","939680","1188301792","1211008"
03-09 11:01:40(949),"21","1024114560","920208","1142975656","1174312"
03-09 11:01:41(969),"21","978075368","913864","1096556272","1192792"
03-09 11:01:42(936),"21","959354232","909072","1068592088","1172376"
03-09 11:01:43(946),"21","959739176","926232","1061650296","1177160"
03-09 11:01:44(949),"21","942416528","874848","1071083096","1135632"
03-09 11:01:45(938),"21","879348336","887456","974437352","1121584"
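To make the problem concrete, here is a minimal sketch of what happens when a file laid out like this is passed straight to pd.read_csv. The excerpt is shortened (two metadata lines, three columns) and held in memory through io.StringIO, both of which are my own simplifications for illustration. pandas treats the first line as the header, so it expects two fields per row and raises an error at the real header row.

```python
import io

import pandas as pd

# Shortened excerpt of the export above: metadata lines, header row, two data rows.
raw = (
    "网元名称,BJIGNB01_turn\n"
    "任务类型,性能监测-小区性能监测\n"
    '时间,"NR DU小区标识","下行MAC总吞吐率(bps)"\n'
    '03-09 11:01:21(949),"21","1219751160"\n'
    '03-09 11:01:22(969),"21","1207563360"\n'
)

try:
    pd.read_csv(io.StringIO(raw))
    failed = False
except pd.errors.ParserError as exc:
    # The first line has 2 fields, so the parser rejects the wider rows below it.
    failed = True
    print("read_csv failed:", exc)
```

This is why the metadata lines have to be skipped before pandas sees the file.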
2. Skipping the irregular lines at the top of the CSV
read_csv_pandas_avg.py
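# -*- coding: UTF-8 -*-
# Script that automatically reads CSV logs in the web-export format
import os
import pandas as pd

# Path of the CSV file to read (raw string, so single backslashes)
csv_filepath = r"D:\myproject\read_csv_calculate_avg\report\Lange_N41_RSRP0309\CDL-A CASE0.csv"
# df1 = df.iloc[:, 0:4]  # read columns 1 through 4

# 1. Function that skips the unneeded lines at the start of each CSV file
def skip_to(fle, encoding=None, **kwargs):
    if os.stat(fle).st_size == 0:
        raise ValueError("File is empty")
    # Decoding happens here, because read_csv receives an already-open text handle
    with open(fle, encoding=encoding) as f:
        pos = 0
        cur_line = f.readline()
        # Advance to the header row, remembering where each line starts
        while '下行MAC总吞吐率(bps)","上行MAC总吞吐率(bps)' not in cur_line:
            if not cur_line:  # reached end of file without finding the header
                raise ValueError("Header line not found")
            pos = f.tell()
            cur_line = f.readline()
        f.seek(pos)  # rewind to the start of the header row
        return pd.read_csv(f, **kwargs)

# 2. Read the CSV file into memory
df = skip_to(csv_filepath, encoding='gbk')  # read the CSV through skip_to()
df = df["下行MAC总吞吐率(bps)"]
avg = df.mean()  # mean of the column
print(df)
print(avg)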
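Since the number of leading lines differs per file, the same idea can be applied to a whole batch. The following is only a sketch of an alternative, not the script above: it finds the index of the header row and hands it to read_csv through skiprows. The helper names find_header_row and column_mean, the two demo files, and their contents are all made up for illustration, and the sketch assumes the column name appears only in the header row.

```python
import os
import tempfile

import pandas as pd

HEADER_MARKER = "下行MAC总吞吐率(bps)"  # assumed to appear only in the header row


def find_header_row(path, marker, encoding="gbk"):
    """Return the 0-based index of the first line containing marker."""
    with open(path, encoding=encoding) as f:
        for index, line in enumerate(f):
            if marker in line:
                return index
    raise ValueError(f"No header line containing {marker!r} in {path}")


def column_mean(path, column, encoding="gbk"):
    """Skip the leading lines via skiprows, then average one column."""
    header_row = find_header_row(path, column, encoding)
    df = pd.read_csv(path, skiprows=header_row, encoding=encoding)
    return df[column].mean()


# Demo with two made-up files whose leading sections differ in length.
header = '时间,"NR DU小区标识","下行MAC总吞吐率(bps)"\n'
samples = {
    "a.csv": "网元名称,DEMO\n" + header + 't1,"21","100"\nt2,"21","300"\n',
    "b.csv": "网元名称,DEMO\n任务类型,demo\n保存时间,demo\n" + header + 't1,"21","10"\n',
}

means = {}
with tempfile.TemporaryDirectory() as tmp:
    for name, text in samples.items():
        path = os.path.join(tmp, name)
        with open(path, "w", encoding="gbk") as f:
            f.write(text)
        means[name] = column_mean(path, HEADER_MARKER)

print(means)
```

For real exports, the loop would iterate over the files in the report directory instead of the temporary demo files.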