1. The text file looks like this:
[root@localhost log]# cat a.txt
b99c9f7c-0c45-11ef-bcda-6d19d88a8c64 2024-05-07 07:45:07.803825 60.10% [NOTICE] switch_channel.c:1123 New Channel sofia/external/12abc34@44.231.169.3 [b99c9f7c-0c45-11ef-bcda-6d19d88a8c64]
b99c9f7c-0c45-11ef-bcda-6d19d88a8c64 2024-05-07 07:45:07.803825 60.10% [INFO] sofia.c:10462 sofia/external/12abc34@44.231.169.3 receiving invite from 44.231.169.3:9080 version: 1.10.8-dev git f9bb894 2021-12-29 20:30:18Z 64bit call-id: b99c74ca-0c45-11ef-8346-dd7497e2c05e
b99c9f7c-0c45-11ef-bcda-6d19d88a8c64 2024-05-07 07:45:07.803825 60.10% [INFO] mod_dialplan_xml.c:639 Processing 12abc34 <12abc34>->188000011408499 in context public
b99c9f7c-0c45-11ef-bcda-6d19d88a8c64 2024-05-07 07:45:07.803825 60.10% [INFO] switch_cpp.cpp:731 Sending early media
b99c9f7c-0c45-11ef-bcda-6d19d88a8c64 2024-05-07 07:45:07.803825 60.10% [NOTICE] sofia_media.c:92 Pre-Answer sofia/external/12abc34@44.231.169.3!
b99c9f7c-0c45-11ef-bcda-6d19d88a8c64 2024-05-07 07:45:09.003806 59.97% [NOTICE] mod_dptools.c:1387 Hangup sofia/external/12abc34@44.231.169.3 [CS_EXECUTE] [CALL_REJECTED]
b99c9f7c-0c45-11ef-bcda-6d19d88a8c64 2024-05-07 07:45:09.003806 59.97% [NOTICE] switch_core_session.c:1771 Session 352443 (sofia/external/12abc34@44.231.169.3) Ended
b99c9f7c-0c45-11ef-bcda-6d19d88a8c64 2024-05-07 07:45:09.003806 59.97% [NOTICE] switch_core_session.c:1775 Close Channel sofia/external/12abc34@44.231.169.3 [CS_DESTROY]
2. I used awk's print to concatenate $1 with some text of my own. The result:
[root@localhost log]# cat a.txt |awk '{print "字段$1="$1"#"}'
b99c9f7c-0c45-11ef-bcda-6d19d88a8c64#
字段$1=b99c9f7c-0c45-11ef-bcda-6d19d88a8c64#
字段$1=b99c9f7c-0c45-11ef-bcda-6d19d88a8c64#
字段$1=b99c9f7c-0c45-11ef-bcda-6d19d88a8c64#
字段$1=b99c9f7c-0c45-11ef-bcda-6d19d88a8c64#
b99c9f7c-0c45-11ef-bcda-6d19d88a8c64#
字段$1=b99c9f7c-0c45-11ef-bcda-6d19d88a8c64#
字段$1=b99c9f7c-0c45-11ef-bcda-6d19d88a8c64#
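3. In the output for lines 1 and 6, the string "字段$1=" (字段 means "field") has apparently been wiped out!!!
4. Suspecting awk's string concatenation, I tried joining the values with commas instead: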
[root@localhost log]# cat a.txt |awk '{print "字段$1=",$1,"#"}'
b99c9f7c-0c45-11ef-bcda-6d19d88a8c64 #
字段$1= b99c9f7c-0c45-11ef-bcda-6d19d88a8c64 #
字段$1= b99c9f7c-0c45-11ef-bcda-6d19d88a8c64 #
字段$1= b99c9f7c-0c45-11ef-bcda-6d19d88a8c64 #
字段$1= b99c9f7c-0c45-11ef-bcda-6d19d88a8c64 #
b99c9f7c-0c45-11ef-bcda-6d19d88a8c64 #
字段$1= b99c9f7c-0c45-11ef-bcda-6d19d88a8c64 #
字段$1= b99c9f7c-0c45-11ef-bcda-6d19d88a8c64 #
5. Suspecting the cat pipe, I had awk read the file directly:
[root@localhost log]# awk '{print "字段$1=",$1,"#"}' a.txt
b99c9f7c-0c45-11ef-bcda-6d19d88a8c64 #
字段$1= b99c9f7c-0c45-11ef-bcda-6d19d88a8c64 #
字段$1= b99c9f7c-0c45-11ef-bcda-6d19d88a8c64 #
字段$1= b99c9f7c-0c45-11ef-bcda-6d19d88a8c64 #
字段$1= b99c9f7c-0c45-11ef-bcda-6d19d88a8c64 #
b99c9f7c-0c45-11ef-bcda-6d19d88a8c64 #
字段$1= b99c9f7c-0c45-11ef-bcda-6d19d88a8c64 #
字段$1= b99c9f7c-0c45-11ef-bcda-6d19d88a8c64 #
6. Suspecting the file format, I converted the file with dos2unix:
[root@localhost log]# dos2unix a.txt
dos2unix: converting file a.txt to Unix format ...
[root@localhost log]# awk '{print "字段$1=",$1,"#"}' a.txt
b99c9f7c-0c45-11ef-bcda-6d19d88a8c64 #
字段$1= b99c9f7c-0c45-11ef-bcda-6d19d88a8c64 #
字段$1= b99c9f7c-0c45-11ef-bcda-6d19d88a8c64 #
字段$1= b99c9f7c-0c45-11ef-bcda-6d19d88a8c64 #
字段$1= b99c9f7c-0c45-11ef-bcda-6d19d88a8c64 #
b99c9f7c-0c45-11ef-bcda-6d19d88a8c64 #
字段$1= b99c9f7c-0c45-11ef-bcda-6d19d88a8c64 #
字段$1= b99c9f7c-0c45-11ef-bcda-6d19d88a8c64 #
7. All this time I assumed the problem lay in how awk's print displays its output, because a plain "print $1" showed the value of $1 exactly as expected:
[root@localhost log]# awk '{print $1}' a.txt
b99c9f7c-0c45-11ef-bcda-6d19d88a8c64
b99c9f7c-0c45-11ef-bcda-6d19d88a8c64
b99c9f7c-0c45-11ef-bcda-6d19d88a8c64
b99c9f7c-0c45-11ef-bcda-6d19d88a8c64
b99c9f7c-0c45-11ef-bcda-6d19d88a8c64
b99c9f7c-0c45-11ef-bcda-6d19d88a8c64
b99c9f7c-0c45-11ef-bcda-6d19d88a8c64
b99c9f7c-0c45-11ef-bcda-6d19d88a8c64
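8. Even ChatGPT 4.0 could not solve it. I went as far as suspecting a bug in this version of awk and read through the issue descriptions for the awk source.
9. Along the way I spent more than a day relearning all of awk, including reading the English-language awk programming documentation.
10. Just as I was about to give up on awk and parse the text some other way, on a whim I used awk's built-in function length to print the length of $1: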
[root@localhost log]# awk '{print $1," ==> ",length($1)}' a.txt
b99c9f7c-0c45-11ef-bcda-6d19d88a8c64 ==> 37
b99c9f7c-0c45-11ef-bcda-6d19d88a8c64 ==> 36
b99c9f7c-0c45-11ef-bcda-6d19d88a8c64 ==> 36
b99c9f7c-0c45-11ef-bcda-6d19d88a8c64 ==> 36
b99c9f7c-0c45-11ef-bcda-6d19d88a8c64 ==> 36
b99c9f7c-0c45-11ef-bcda-6d19d88a8c64 ==> 37
b99c9f7c-0c45-11ef-bcda-6d19d88a8c64 ==> 36
b99c9f7c-0c45-11ef-bcda-6d19d88a8c64 ==> 36
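11. That was the breakthrough: $1 is 37 characters long on lines 1 and 6, but 36 on every other line.
12. It occurred to me that $1 might contain an invisible control character, so I displayed hidden characters in vi:
Sure enough, lines 1 and 6 begin with the special symbol "^M", which stands for the carriage return "\r".
13. After removing the carriage return with awk's sub, the output is as expected: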
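The same check works without opening an editor. Below is a minimal sketch using `od -c`, which prints each byte of its input so that a carriage return shows up as `\r`. The sample record is shortened and invented for illustration. It also suggests why the prefix seemed to disappear: awk did write it, but the `\r` that follows sends the terminal cursor back to column 0, and the longer field is then drawn over the prefix.

```shell
# A made-up record whose first field starts with a carriage return,
# mimicking lines 1 and 6 of a.txt.
line=$(printf '\rb99c9f7c rest')

# What awk really writes: the prefix, then \r, then the field, then "#".
# -F'[ ]' splits on single spaces only, so the \r stays inside $1.
out=$(printf '%s\n' "$line" | awk -F'[ ]' '{ print "field=" $1 "#" }')

# On a terminal the prefix would be overwritten by the field that follows \r.
# od -c shows every byte, including the \r sitting between "=" and "b".
printf '%s\n' "$out" | od -c
```

With GNU coreutils, `cat -A a.txt` gives a similar view and marks each carriage return as `^M`.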
[root@localhost log]# awk '{sub("\r","");print $1," ==> ",length($1)}' a.txt
b99c9f7c-0c45-11ef-bcda-6d19d88a8c64 ==> 36
b99c9f7c-0c45-11ef-bcda-6d19d88a8c64 ==> 36
b99c9f7c-0c45-11ef-bcda-6d19d88a8c64 ==> 36
b99c9f7c-0c45-11ef-bcda-6d19d88a8c64 ==> 36
b99c9f7c-0c45-11ef-bcda-6d19d88a8c64 ==> 36
b99c9f7c-0c45-11ef-bcda-6d19d88a8c64 ==> 36
b99c9f7c-0c45-11ef-bcda-6d19d88a8c64 ==> 36
b99c9f7c-0c45-11ef-bcda-6d19d88a8c64 ==> 36
[root@localhost log]# cat a.txt |awk '{sub("\r","");print "字段$1="$1"#"}'
字段$1=b99c9f7c-0c45-11ef-bcda-6d19d88a8c64#
字段$1=b99c9f7c-0c45-11ef-bcda-6d19d88a8c64#
字段$1=b99c9f7c-0c45-11ef-bcda-6d19d88a8c64#
字段$1=b99c9f7c-0c45-11ef-bcda-6d19d88a8c64#
字段$1=b99c9f7c-0c45-11ef-bcda-6d19d88a8c64#
字段$1=b99c9f7c-0c45-11ef-bcda-6d19d88a8c64#
字段$1=b99c9f7c-0c45-11ef-bcda-6d19d88a8c64#
字段$1=b99c9f7c-0c45-11ef-bcda-6d19d88a8c64#
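Two details are worth spelling out. First, `sub` replaces only the first match in each record, which is enough here because every affected line has a single `\r` at its start; `gsub` removes every occurrence. Second, as far as I know, dos2unix in its default mode only converts CR LF line endings, and a `\r` at the start of a line is not followed by a line feed, which would explain why step 6 changed nothing. A minimal sketch of the `sub` versus `gsub` difference, with invented sample records:

```shell
# Two made-up records: one with a leading \r, one with a \r at each end.
sample=$(mktemp)
printf '\rb99c9f7c first\n'    >  "$sample"
printf '\rb99c9f7c second\r\n' >> "$sample"

# sub() strips only the first \r in each record, so record 2 keeps its trailing one.
# tr -cd '\r' deletes everything except carriage returns; wc -c counts what is left.
after_sub=$(awk '{ sub(/\r/, ""); print }' "$sample" | tr -cd '\r' | wc -c)

# gsub() strips every \r, wherever it sits in the record.
after_gsub=$(awk '{ gsub(/\r/, ""); print }' "$sample" | tr -cd '\r' | wc -c)

printf 'carriage returns left after sub:  %d\n' "$after_sub"
printf 'carriage returns left after gsub: %d\n' "$after_gsub"
rm -f "$sample"
```

On this sample the first count is 1 and the second is 0, so `gsub` is the safer choice when it is not known where the stray characters sit.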
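So when processing text, always watch out for
!!!invisible control characters!!!
!!!invisible control characters!!!
!!!invisible control characters!!!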
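One way to act on this is to scan a file for carriage returns before parsing it. A minimal sketch: the function name `report_cr` is made up for illustration, and it only looks for `\r`, not for other control characters.

```shell
# report_cr FILE: print one line for every input line that contains a \r.
# Exit status is 0 if any carriage return was found, 1 if the file is clean.
report_cr() {
    awk '
        {
            n = gsub(/\r/, "")   # gsub returns the number of replacements made
            if (n > 0) {
                printf "line %d: %d carriage return(s)\n", NR, n
                found = 1
            }
        }
        END { exit(found ? 0 : 1) }
    ' "$1"
}

# Usage on a throwaway sample file whose first line starts with \r.
sample=$(mktemp)
printf '\rb99c9f7c first\nb99c9f7c second\n' > "$sample"
report_cr "$sample"
rm -f "$sample"
```

Because the exit status reflects whether anything was found, the function can guard a parsing step, for example `if report_cr a.txt; then echo "clean the file first"; fi`.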