Extracting specific strings from a txt file with Python

qq_29707567

Last modified: 2022-06-23 10:48:39

First published: 2022-06-23 09:43:18

Article link: https://blog.csdn.net/qq_29707567/article/details/125420854

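Method 1:

Approach: read the txt file into a single string and match against the whole string. Each match returns a list, and the value is taken from the corresponding position in that list.

txt sample: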

------begin checkaccuracy---------
Validating batch 10
Validating batch 20
Validating batch 30
Validating batch 40
Validating batch 50
Validating batch 60
Total Top1 Accuracy: 70.40%
Total Top5 Accuracy: 89.20%
FPS is: 1144.3161883555188
Run with precision fp32, batchsize 16
---------202206221903 ---> 20220622-190406 Total:42 seconds

Goal: extract Top1, Top5, FPS, and the duration from the last line.

Code:

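import re

# content is the whole txt file read into one string, e.g. content = f.read()
# (.+) stops at the end of the line, so the last line matches even without a trailing "\n"
top1 = re.findall(r"Total Top1 Accuracy: (.+)", content)[0]
top5 = re.findall(r"Total Top5 Accuracy: (.+)", content)[0]
fps = re.findall(r"FPS is: (.+)", content)[0]
time = re.findall(r"Total:(.+)", content)[0]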
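
Some logs may not contain these lines (the results include rows marked not_completed), and indexing `[0]` on an empty match list raises `IndexError`. The following is a minimal sketch of one way to handle that. The helper `extract_metrics`, the `PATTERNS` table, and the `not_completed` fallback are assumptions for illustration and do not come from the original code.

```python
import re

# Hypothetical helper, not from the original post: maps each column name
# to a pattern and falls back to a placeholder when the line is absent.
PATTERNS = {
    "FPS": r"FPS is: (.+)",
    "TOP1": r"Total Top1 Accuracy: (.+)",
    "TOP5": r"Total Top5 Accuracy: (.+)",
    "time": r"Total:(.+)",
}

def extract_metrics(content, default="not_completed"):
    """Return a dict of metric name -> captured text, or `default` if missing."""
    result = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, content)
        result[name] = match.group(1).strip() if match else default
    return result

sample = """Total Top1 Accuracy: 70.40%
Total Top5 Accuracy: 89.20%
FPS is: 1144.3161883555188
Run with precision fp32, batchsize 16
---------202206221903 ---> 20220622-190406 Total:42 seconds"""

print(extract_metrics(sample))
print(extract_metrics("------begin checkaccuracy---------"))
```

`re.search` returns `None` when nothing matches, so a missing line can be detected without catching an exception.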

Result (written to Excel):

FPS	TOP1	TOP5	time
1144.3161883555188	70.40%	89.20%	42 seconds
1038.7133196505442	70.40%	89.20%	46 seconds
1088.510786242209	70.40%	89.20%	75 seconds
not_completed	not_completed	not_completed	not_completed
not_completed	not_completed	not_completed	not_completed
74.5299639559841	0.099	0.263	35 seconds
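
The post does not show how these rows were written to Excel. As a stand-in, here is a minimal sketch that serializes the same four columns to CSV, which Excel can open, using only the standard library `csv` module. This is a swapped-in technique, not the author's method, and `rows_to_csv` and `COLUMNS` are illustrative names.

```python
import csv
import io

# Column order taken from the results table above.
COLUMNS = ["FPS", "TOP1", "TOP5", "time"]

def rows_to_csv(rows):
    """Serialize a list of metric dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

rows = [
    {"FPS": "1144.3161883555188", "TOP1": "70.40%", "TOP5": "89.20%", "time": "42 seconds"},
    {"FPS": "not_completed", "TOP1": "not_completed", "TOP5": "not_completed", "time": "not_completed"},
]
csv_text = rows_to_csv(rows)
print(csv_text)
# To keep the result, write csv_text to a file such as results.csv (hypothetical name).
```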

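Method 2:

Approach: read the txt file line by line, split each line on a delimiter that appears in the data (such as a colon), and take the part at the corresponding position. If the value contains unwanted characters, replace them with an empty string.

txt sample: same as above

Code: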

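with open(file, 'r') as f:
    lines = f.readlines()  # read all lines
first_line = lines[0]  # first line
last_line = lines[-1]  # last line
last5_line = lines[-5]  # fifth line from the end
# Note: these offsets expect the FPS line to be fifth from the end. In the sample
# as shown above it is third from the end (lines[-3]), so adjust them to your files.
keyword = "FPS"  # renamed from str to avoid shadowing the built-in
if keyword in last5_line:
    FPS = last5_line.split(": ")[1].strip()  # strip() drops the trailing newline
    top1 = lines[-7].split(": ")[-1].strip()
    top5 = lines[-6].split(": ")[-1].strip()
    #top1 = last_line.split(' ')[-2].split(":")[1].replace('Prec', '')
    #top5 = last_line.split(' ')[-1].split(":")[-1]
    # sheet and i are defined elsewhere (not shown); presumably the worksheet and row index
    sheet.write(i, 3, FPS)
    sheet.write(i, 4, top1)
    sheet.write(i, 5, top5)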
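
Fixed offsets such as `lines[-5]` break as soon as a log gains or loses a trailing line. A minimal sketch of a variant that finds each line by its prefix and then splits on the delimiter is shown below. The helper `parse_lines` is a hypothetical name, and the `replace` call illustrates the "replace unwanted characters with an empty string" step, here by dropping the " seconds" suffix.

```python
def parse_lines(lines):
    """Scan the lines and split on ': ' or 'Total:' to pull out each value."""
    values = {}
    for line in lines:
        line = line.strip()  # drop the trailing newline
        if line.startswith("Total Top1 Accuracy"):
            values["top1"] = line.split(": ")[-1]
        elif line.startswith("Total Top5 Accuracy"):
            values["top5"] = line.split(": ")[-1]
        elif line.startswith("FPS is"):
            values["FPS"] = line.split(": ")[1]
        elif "Total:" in line:
            # Remove the unit so only the number remains.
            values["time"] = line.split("Total:")[-1].replace(" seconds", "")
    return values

sample_lines = [
    "Validating batch 60\n",
    "Total Top1 Accuracy: 70.40%\n",
    "Total Top5 Accuracy: 89.20%\n",
    "FPS is: 1144.3161883555188\n",
    "Run with precision fp32, batchsize 16\n",
    "---------202206221903 ---> 20220622-190406 Total:42 seconds\n",
]
print(parse_lines(sample_lines))
```

Keys that never appear in the file are simply absent from the returned dict, so the caller can decide what to write for an incomplete run.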

Result: same as above

Method 3: regex matching -- todo