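Everyone's logs look different, and the commands always need adjusting on the spot, but it almost always comes down to three commands: cat, grep, and awk.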
What the log looks like
2020-12-04 06:02:47 INFO - 12/04/20 14:02:46 - 3 days, 3:31:31 - 360000 - 112.63 sent/s - 4279.79 words/s - MLM-ar: 2.3477 || MLM-de: 1.7728 || MLM-el: 1.5244 || MLM-en: 1.7175 || MLM-es: 1.7692 || MLM-fr: 1.4595 || MLM-ru: 1.4033 || MLM-th: 2.5566 - - model LR: 9.4301e-06
2020-12-04 06:02:47 INFO - 12/04/20 14:02:46 - 3 days, 3:31:29 - 360000 - 112.62 sent/s - 4266.22 words/s - MLM-ar: 2.1328 || MLM-de: 1.7043 || MLM-el: 1.7100 || MLM-en: 1.5283 || MLM-es: 1.5969 || MLM-fr: 1.5846 || MLM-hi: 2.2598 || MLM-ru: 1.4770 || MLM-zh: 2.4414 - - model LR: 9.4301e-06
2020-12-04 06:02:47 INFO - 12/04/20 14:02:46 - 3 days, 3:31:28 - 360000 - 112.63 sent/s - 4273.08 words/s - MLM-ar: 2.2959 || MLM-de: 1.6400 || MLM-en: 1.5591 || MLM-es: 1.6094 || MLM-fr: 1.6204 || MLM-ru: 1.7168 || MLM-tr: 2.2637 || MLM-vi: 2.0249 - - model LR: 9.4301e-06
2020-12-04 06:02:47 INFO - 12/04/20 14:02:46 - 3 days, 3:31:32 - 360000 - 112.64 sent/s - 4302.83 words/s - MLM-ar: 2.2988 || MLM-bg: 1.7324 || MLM-de: 1.8081 || MLM-el: 1.9897 || MLM-en: 1.6924 || MLM-es: 1.6566 || MLM-fr: 1.4469 || MLM-hi: 2.2031 || MLM-ru: 1.5732 || MLM-sw: 2.3613 || MLM-tr: 2.0488 || MLM-ur: 2.5059 || MLM-vi: 1.9072 - - model LR: 9.4301e-06
2020-12-04 06:02:47 INFO - 12/04/20 14:02:46 - 3 days, 3:31:30 - ============ End of epoch 35 ============
2020-12-04 06:03:54 INFO - 12/04/20 14:03:52 - 3 days, 3:32:35 - epoch -> 35.000000
2020-12-04 06:03:54 INFO - 12/04/20 14:03:52 - 3 days, 3:32:35 - valid_tr_mlm_loss -> 19798.168945
2020-12-04 06:03:54 INFO - 12/04/20 14:03:52 - 3 days, 3:32:35 - valid_tr_mlm_ppl -> 10.865678
2020-12-04 06:03:54 INFO - 12/04/20 14:03:52 - 3 days, 3:32:35 - valid_tr_mlm_acc -> 55.536812
2020-12-04 06:03:54 INFO - 12/04/20 14:03:52 - 3 days, 3:32:35 - valid_ur_mlm_loss -> 17430.359924
2020-12-04 06:03:54 INFO - 12/04/20 14:03:52 - 3 days, 3:32:35 - valid_ur_mlm_ppl -> 11.203040
2020-12-04 06:03:54 INFO - 12/04/20 14:03:52 - 3 days, 3:32:35 - valid_ur_mlm_acc -> 54.893263
2020-12-04 06:03:54 INFO - 12/04/20 14:03:52 - 3 days, 3:32:35 - valid_vi_mlm_loss -> 14435.647949
2020-12-04 06:03:54 INFO - 12/04/20 14:03:52 - 3 days, 3:32:35 - valid_vi_mlm_ppl -> 6.984347
2020-12-04 06:03:54 INFO - 12/04/20 14:03:52 - 3 days, 3:32:35 - valid_vi_mlm_acc -> 60.818635
2020-12-04 06:03:54 INFO - 12/04/20 14:03:52 - 3 days, 3:32:35 - valid_zh_mlm_loss -> 24963.236816
2020-12-04 06:03:54 INFO - 12/04/20 14:03:52 - 3 days, 3:32:35 - valid_zh_mlm_ppl -> 11.896758
2020-12-04 06:03:54 INFO - 12/04/20 14:03:52 - 3 days, 3:32:35 - valid_zh_mlm_acc -> 53.506597
2020-12-04 06:03:54 INFO - 12/04/20 14:03:52 - 3 days, 3:32:35 - valid_mlm_loss -> 19229.647253
2020-12-04 06:03:54 INFO - 12/04/20 14:03:52 - 3 days, 3:32:35 - valid_mlm_ppl -> 8.562056
2020-12-04 06:03:54 INFO - 12/04/20 14:03:52 - 3 days, 3:32:35 - valid_mlm_acc -> 59.772871
2020-12-04 06:03:54 INFO - 12/04/20 14:03:52 - 3 days, 3:32:35 - test_tr_mlm_loss -> 10030.201538
2020-12-04 06:03:54 INFO - 12/04/20 14:03:52 - 3 days, 3:32:35 - test_tr_mlm_ppl -> 10.662468
2020-12-04 06:03:54 INFO - 12/04/20 14:03:52 - 3 days, 3:32:35 - test_tr_mlm_acc -> 55.356300
2020-12-04 06:03:54 INFO - 12/04/20 14:03:52 - 3 days, 3:32:35 - test_ur_mlm_loss -> 9371.606445
2020-12-04 06:03:54 INFO - 12/04/20 14:03:52 - 3 days, 3:32:35 - test_ur_mlm_ppl -> 12.163368
2020-12-04 06:03:54 INFO - 12/04/20 14:03:52 - 3 days, 3:32:35 - test_ur_mlm_acc -> 55.238603
2020-12-04 06:03:54 INFO - 12/04/20 14:03:52 - 3 days, 3:32:35 - test_vi_mlm_loss -> 7488.517090
2020-12-04 06:03:54 INFO - 12/04/20 14:03:52 - 3 days, 3:32:35 - test_vi_mlm_ppl -> 7.186609
2020-12-04 06:03:54 INFO - 12/04/20 14:03:52 - 3 days, 3:32:35 - test_vi_mlm_acc -> 61.706610
2020-12-04 06:03:54 INFO - 12/04/20 14:03:52 - 3 days, 3:32:35 - test_zh_mlm_loss -> 12373.904297
2020-12-04 06:03:54 INFO - 12/04/20 14:03:52 - 3 days, 3:32:35 - test_zh_mlm_ppl -> 11.967894
2020-12-04 06:03:54 INFO - 12/04/20 14:03:52 - 3 days, 3:32:35 - test_zh_mlm_acc -> 53.941825
2020-12-04 06:03:54 INFO - 12/04/20 14:03:52 - 3 days, 3:32:35 - test_mlm_loss -> 9702.032363
2020-12-04 06:03:54 INFO - 12/04/20 14:03:52 - 3 days, 3:32:35 - test_mlm_ppl -> 8.654818
2020-12-04 06:03:54 INFO - 12/04/20 14:03:52 - 3 days, 3:32:35 - test_mlm_acc -> 60.052012
2020-12-04 06:03:54 INFO - 12/04/20 14:03:52 - 3 days, 3:32:35 - New best validation score: 8.562056
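The log above has been trimmed; it is mainly here to show how messy the structure is.
Goal
The goal is to pull out the final test_mlm_acc of each epoch and save it somewhere else.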
Command
cat baseline.txt | grep 'test_mlm_acc -> ' | awk '{print $NF}' | uniq >baseline_acc.txt
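Here baseline.txt is the original log file. cat (short for "concatenate") writes the whole file to standard output and hands it to grep, which keeps only the lines containing test_mlm_acc (grep prints the lines that match a pattern). Those lines are then passed to awk, where $NF refers to the last field, which is the numeric value in the log. Because this was a multi-GPU run, every process wrote the same results, so the output is duplicated (next time I should have only the master process write the results…), and uniq is needed to remove the duplicate lines. Note that uniq only removes duplicates on adjacent lines; to remove all duplicates you need to sort first. Finally, the > redirects the output into the target file.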
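To make the difference between uniq and sort-then-deduplicate concrete, here is a minimal sketch on a made-up stand-in log. The file name sample.log and the variable names are hypothetical, and the values on the last two log lines are invented for illustration; only the first three lines are copied from the log shown earlier.

```shell
# Build a tiny stand-in log in a temp directory (not the real baseline.txt).
tmpdir=$(mktemp -d)
log="$tmpdir/sample.log"
cat > "$log" <<'EOF'
2020-12-04 06:03:54 INFO - 12/04/20 14:03:52 - 3 days, 3:32:35 - test_mlm_ppl -> 8.654818
2020-12-04 06:03:54 INFO - 12/04/20 14:03:52 - 3 days, 3:32:35 - test_mlm_acc -> 60.052012
2020-12-04 06:03:54 INFO - 12/04/20 14:03:52 - 3 days, 3:32:35 - test_mlm_acc -> 60.052012
2020-12-05 07:10:11 INFO - 12/05/20 15:10:09 - 4 days, 4:38:52 - test_mlm_acc -> 60.310000
2020-12-06 08:15:20 INFO - 12/06/20 16:15:18 - 5 days, 5:44:01 - test_mlm_acc -> 60.052012
EOF

# grep can read the file directly, so the leading cat is optional.
# uniq only collapses duplicates that sit on adjacent lines.
adjacent=$(grep 'test_mlm_acc -> ' "$log" | awk '{print $NF}' | uniq)

# sort -u removes every duplicate, but it also reorders the values.
all_unique=$(grep 'test_mlm_acc -> ' "$log" | awk '{print $NF}' | sort -u)

printf 'uniq only:\n%s\n' "$adjacent"
printf 'sort -u:\n%s\n' "$all_unique"

rm -r "$tmpdir"
```

With uniq alone, the value that reappears in a later epoch is kept and the epoch order is preserved, which is usually what you want for a per-epoch curve. sort -u would silently drop that later epoch and lose the ordering.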
Other
Later I ran into a few other needs and couldn't be bothered to start a new post, so they are all pasted here:
Get the last character of line N of a file (replace N with the line number):
head -n N filename.txt | tail -n 1 | awk -F '' '{print $NF}'
Delete the last character of every line in a file:
cat fname.txt | sed 's/.$//' > filename.txt
Delete the lines in a file that contain "inside apex":
cat single_data.txt | sed '/inside apex/d' > s_data.txt
Get characters 54 through 58 of each line:
cat funnel-codebase.txt | grep 'nll_loss=' | cut -c54-58 >funnelcodebase_acc.txt
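The recipes in this section can be tried out on a throwaway file before pointing them at a real log. The sketch below is only an illustration: demo.txt, its contents, and the variable names are all made up. For the last-character recipe it uses substr rather than an empty field separator, since POSIX leaves the behavior of an empty FS unspecified.

```shell
# Create a small throwaway file (hypothetical contents).
tmpdir=$(mktemp -d)
f="$tmpdir/demo.txt"
printf '%s\n' 'alpha0' 'inside apex warning' 'gamma2' > "$f"

# Last character of line n_line: sed -n 'Np' prints only line N,
# and substr($0, length($0)) takes the final character of that line.
n_line=1
last_char=$(sed -n "${n_line}p" "$f" | awk '{print substr($0, length($0))}')

# Drop the last character of every line.
trimmed=$(sed 's/.$//' "$f")

# Delete every line containing "inside apex".
filtered=$(sed '/inside apex/d' "$f")

# Characters 2 through 4 of each line (cut -c counts from 1).
slice=$(cut -c2-4 "$f")

printf 'last_char=%s\n' "$last_char"
printf 'trimmed:\n%s\n' "$trimmed"
printf 'filtered:\n%s\n' "$filtered"
printf 'slice:\n%s\n' "$slice"

rm -r "$tmpdir"
```

Note that cut -c slices by fixed position, so it only works while the value sits in the same columns on every line; when the line layout can shift, picking a field with awk is the safer choice.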