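Extracting acc and loss from a log with Linux commands

Everyone's logs look different, and the commands always need adjusting on the spot, but they almost always come down to cat, grep, and awk.

What the log looks like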

2020-12-04 06:02:47 INFO - 12/04/20 14:02:46 - 3 days, 3:31:31 -  360000 -  112.63 sent/s -  4279.79 words/s - MLM-ar:  2.3477 || MLM-de:  1.7728 || MLM-el:  1.5244 || MLM-en:  1.7175 || MLM-es:  1.7692 || MLM-fr:  1.4595 || MLM-ru:  1.4033 || MLM-th:  2.5566 -  - model LR: 9.4301e-06
2020-12-04 06:02:47 INFO - 12/04/20 14:02:46 - 3 days, 3:31:29 -  360000 -  112.62 sent/s -  4266.22 words/s - MLM-ar:  2.1328 || MLM-de:  1.7043 || MLM-el:  1.7100 || MLM-en:  1.5283 || MLM-es:  1.5969 || MLM-fr:  1.5846 || MLM-hi:  2.2598 || MLM-ru:  1.4770 || MLM-zh:  2.4414 -  - model LR: 9.4301e-06
2020-12-04 06:02:47 INFO - 12/04/20 14:02:46 - 3 days, 3:31:28 -  360000 -  112.63 sent/s -  4273.08 words/s - MLM-ar:  2.2959 || MLM-de:  1.6400 || MLM-en:  1.5591 || MLM-es:  1.6094 || MLM-fr:  1.6204 || MLM-ru:  1.7168 || MLM-tr:  2.2637 || MLM-vi:  2.0249 -  - model LR: 9.4301e-06
2020-12-04 06:02:47 INFO - 12/04/20 14:02:46 - 3 days, 3:31:32 -  360000 -  112.64 sent/s -  4302.83 words/s - MLM-ar:  2.2988 || MLM-bg:  1.7324 || MLM-de:  1.8081 || MLM-el:  1.9897 || MLM-en:  1.6924 || MLM-es:  1.6566 || MLM-fr:  1.4469 || MLM-hi:  2.2031 || MLM-ru:  1.5732 || MLM-sw:  2.3613 || MLM-tr:  2.0488 || MLM-ur:  2.5059 || MLM-vi:  1.9072 -  - model LR: 9.4301e-06
2020-12-04 06:02:47 INFO - 12/04/20 14:02:46 - 3 days, 3:31:30 - ============ End of epoch 35 ============
2020-12-04 06:03:54 INFO - 12/04/20 14:03:52 - 3 days, 3:32:35 - epoch -> 35.000000
2020-12-04 06:03:54 INFO - 12/04/20 14:03:52 - 3 days, 3:32:35 - valid_tr_mlm_loss -> 19798.168945
2020-12-04 06:03:54 INFO - 12/04/20 14:03:52 - 3 days, 3:32:35 - valid_tr_mlm_ppl -> 10.865678
2020-12-04 06:03:54 INFO - 12/04/20 14:03:52 - 3 days, 3:32:35 - valid_tr_mlm_acc -> 55.536812
2020-12-04 06:03:54 INFO - 12/04/20 14:03:52 - 3 days, 3:32:35 - valid_ur_mlm_loss -> 17430.359924
2020-12-04 06:03:54 INFO - 12/04/20 14:03:52 - 3 days, 3:32:35 - valid_ur_mlm_ppl -> 11.203040
2020-12-04 06:03:54 INFO - 12/04/20 14:03:52 - 3 days, 3:32:35 - valid_ur_mlm_acc -> 54.893263
2020-12-04 06:03:54 INFO - 12/04/20 14:03:52 - 3 days, 3:32:35 - valid_vi_mlm_loss -> 14435.647949
2020-12-04 06:03:54 INFO - 12/04/20 14:03:52 - 3 days, 3:32:35 - valid_vi_mlm_ppl -> 6.984347
2020-12-04 06:03:54 INFO - 12/04/20 14:03:52 - 3 days, 3:32:35 - valid_vi_mlm_acc -> 60.818635
2020-12-04 06:03:54 INFO - 12/04/20 14:03:52 - 3 days, 3:32:35 - valid_zh_mlm_loss -> 24963.236816
2020-12-04 06:03:54 INFO - 12/04/20 14:03:52 - 3 days, 3:32:35 - valid_zh_mlm_ppl -> 11.896758
2020-12-04 06:03:54 INFO - 12/04/20 14:03:52 - 3 days, 3:32:35 - valid_zh_mlm_acc -> 53.506597
2020-12-04 06:03:54 INFO - 12/04/20 14:03:52 - 3 days, 3:32:35 - valid_mlm_loss -> 19229.647253
2020-12-04 06:03:54 INFO - 12/04/20 14:03:52 - 3 days, 3:32:35 - valid_mlm_ppl -> 8.562056
2020-12-04 06:03:54 INFO - 12/04/20 14:03:52 - 3 days, 3:32:35 - valid_mlm_acc -> 59.772871
2020-12-04 06:03:54 INFO - 12/04/20 14:03:52 - 3 days, 3:32:35 - test_tr_mlm_loss -> 10030.201538
2020-12-04 06:03:54 INFO - 12/04/20 14:03:52 - 3 days, 3:32:35 - test_tr_mlm_ppl -> 10.662468
2020-12-04 06:03:54 INFO - 12/04/20 14:03:52 - 3 days, 3:32:35 - test_tr_mlm_acc -> 55.356300
2020-12-04 06:03:54 INFO - 12/04/20 14:03:52 - 3 days, 3:32:35 - test_ur_mlm_loss -> 9371.606445
2020-12-04 06:03:54 INFO - 12/04/20 14:03:52 - 3 days, 3:32:35 - test_ur_mlm_ppl -> 12.163368
2020-12-04 06:03:54 INFO - 12/04/20 14:03:52 - 3 days, 3:32:35 - test_ur_mlm_acc -> 55.238603
2020-12-04 06:03:54 INFO - 12/04/20 14:03:52 - 3 days, 3:32:35 - test_vi_mlm_loss -> 7488.517090
2020-12-04 06:03:54 INFO - 12/04/20 14:03:52 - 3 days, 3:32:35 - test_vi_mlm_ppl -> 7.186609
2020-12-04 06:03:54 INFO - 12/04/20 14:03:52 - 3 days, 3:32:35 - test_vi_mlm_acc -> 61.706610
2020-12-04 06:03:54 INFO - 12/04/20 14:03:52 - 3 days, 3:32:35 - test_zh_mlm_loss -> 12373.904297
2020-12-04 06:03:54 INFO - 12/04/20 14:03:52 - 3 days, 3:32:35 - test_zh_mlm_ppl -> 11.967894
2020-12-04 06:03:54 INFO - 12/04/20 14:03:52 - 3 days, 3:32:35 - test_zh_mlm_acc -> 53.941825
2020-12-04 06:03:54 INFO - 12/04/20 14:03:52 - 3 days, 3:32:35 - test_mlm_loss -> 9702.032363
2020-12-04 06:03:54 INFO - 12/04/20 14:03:52 - 3 days, 3:32:35 - test_mlm_ppl -> 8.654818
2020-12-04 06:03:54 INFO - 12/04/20 14:03:52 - 3 days, 3:32:35 - test_mlm_acc -> 60.052012
2020-12-04 06:03:54 INFO - 12/04/20 14:03:52 - 3 days, 3:32:35 - New best validation score: 8.562056

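The log above has been trimmed; it is mainly there to show how messy the structure is.

Goal

The goal is to pull out the final test_mlm_acc of each epoch and save it somewhere else.

Command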

cat baseline.txt | grep 'test_mlm_acc -> ' | awk  '{print $NF}' | uniq >baseline_acc.txt

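Here baseline.txt is the original log file. cat (short for concatenate) prints the whole file, and the first grep picks out the lines containing test_mlm_acc (grep keeps only the lines that match a pattern). Those lines are passed to awk, where $NF means the last field, which is the number in the log. Because this was a multi-GPU run, the output is duplicated (next time I should have only the master process print results...), so uniq is needed to remove the duplicate lines. Note that uniq only collapses adjacent duplicates; to remove all duplicates you need to sort first, which also changes the order of the values. Finally, the > redirects the output into this file.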
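The steps can be tried on a tiny sample before running them on a real log. This is only a sketch: the function names extract_acc and extract_acc_unique are made up here, and the sample reuses lines from the log above, duplicated the way a multi-GPU run would print them. The second function is one possible way to drop non-adjacent duplicates without sorting, so the epoch order is kept.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; the names are not from any tool.

# extract_acc FILE: same steps as the one-liner, without the cat.
# grep -F treats the pattern as a fixed string rather than a regex.
extract_acc() {
    grep -F 'test_mlm_acc -> ' "$1" | awk '{print $NF}' | uniq
}

# extract_acc_unique FILE: keep only the first occurrence of each value,
# in the order the values first appear, so no sort is needed.
# Caveat: two epochs with exactly the same value would be merged as well.
extract_acc_unique() {
    grep -F 'test_mlm_acc -> ' "$1" | awk '!seen[$NF]++ {print $NF}'
}

# Small sample: one ppl line plus the same acc line printed twice.
sample=$(mktemp)
cat > "$sample" <<'EOF'
2020-12-04 06:03:54 INFO - 12/04/20 14:03:52 - 3 days, 3:32:35 - test_mlm_ppl -> 8.654818
2020-12-04 06:03:54 INFO - 12/04/20 14:03:52 - 3 days, 3:32:35 - test_mlm_acc -> 60.052012
2020-12-04 06:03:54 INFO - 12/04/20 14:03:52 - 3 days, 3:32:35 - test_mlm_acc -> 60.052012
EOF

extract_acc "$sample"
rm -f "$sample"
```

The pattern 'test_mlm_acc -> ' does not match the per-language lines such as test_tr_mlm_acc, so only the overall score is kept.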

Other

Later I ran into a few other needs and couldn't be bothered to start a new post, so I'm pasting them all here:

Get the last character of line N of a file (replace N with the line number):

head -n N filename.txt | tail -n 1 | awk -F '' '{print $NF}'
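The awk -F '' trick relies on an empty field separator, which POSIX leaves unspecified, so it may not behave the same in every awk. A sketch of an alternative that uses only sed; last_char_of_line is a made-up helper name, and the sample file content is invented for the demo.

```shell
#!/bin/sh
# last_char_of_line N FILE (hypothetical helper name):
# print the last character of line N of FILE.
last_char_of_line() {
    # The first sed prints only line N; the second keeps the final character.
    # An empty line stays empty because the pattern needs at least one character.
    sed -n "${1}p" "$2" | sed 's/.*\(.\)$/\1/'
}

# Invented sample file with three short lines.
demo=$(mktemp)
printf '%s\n' 'abc' 'xy0' 'hello' > "$demo"
last_char_of_line 2 "$demo"
rm -f "$demo"
```

This also handles a line that ends in 0, which is easy to lose when an awk expression is used as the print condition.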

Delete the last character of every line in a file:

cat fname.txt | sed 's/.$//' > filename.txt

Delete the lines in a file that contain "inside apex":

cat single_data.txt | sed '/inside apex/d' > s_data.txt
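Dropping lines that contain a fixed string can also be done with grep -v. A small sketch, where drop_lines_containing is a made-up name and the sample lines are invented:

```shell
#!/bin/sh
# drop_lines_containing TEXT FILE (hypothetical helper name):
# print FILE without the lines that contain TEXT.
drop_lines_containing() {
    # -v inverts the match, -F treats TEXT as a fixed string, not a regex.
    grep -vF -- "$1" "$2"
}

# Invented sample: one noisy line between two useful ones.
demo=$(mktemp)
printf '%s\n' 'step 1 loss 2.5' 'warning: inside apex something' 'step 2 loss 2.4' > "$demo"
drop_lines_containing 'inside apex' "$demo"
rm -f "$demo"
```

Because -F disables regex matching, characters such as . or * in the text are matched literally.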

Get characters 54 through 58 of each line:

cat funnel-codebase.txt | grep 'nll_loss=' | cut -c54-58 >funnelcodebase_acc.txt
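Cutting by column only works while every line has exactly the same layout. If the columns shift, picking a value by its key name is one possible fallback. This is a sketch under assumptions: the sample line format below is invented (the real funnel-codebase.txt is not shown here), nll_loss is used only as an example key, and extract_nll_loss is a made-up name.

```shell
#!/bin/sh
# extract_nll_loss FILE (hypothetical helper name):
# print the number that follows "nll_loss=" on every line that has it.
extract_nll_loss() {
    # -n with the p flag prints only the lines where the substitution succeeded.
    sed -n 's/.*nll_loss=\([0-9.eE+-]*\).*/\1/p' "$1"
}

# Invented sample lines; the real log format may differ.
demo=$(mktemp)
printf '%s\n' 'epoch 1 | loss=5.1 | nll_loss=4.25 | ppl=19.0' 'some unrelated line' > "$demo"
extract_nll_loss "$demo"
rm -f "$demo"
```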
