The following single line of text needs to be cleaned up and filtered:
{"atime": "20180830 , \u8fd0\u8425\u5546\u6bcf\u5929\u540c\u6b65\uff0c\u5982\u679c\u540c\u6b65\u4e0d\u53ca\u65f6\uff0c\u4f1a\u5f71\u54cd\u7528\u6237\u7684\u64ad\u653e\u4f53\u9a8c\uff0c\u6700\u7ec8\u7528\u6237\u6d41\u5931\uff01\uff01\uff01", "data": {"hdoma.qq.com.": ["ioma.qq.com."], "gz.pttc.cp81.ott.cibntv.net": ["182.254.57.25", "111.30.159.191", "58.251.81.53", "121.51.8.64", "183.232.119.146", "121.51.139.202", "182.254.21.68", "112.90.78.145", "219.133.60.190", "111.30.137.105", "14.215.158.41"], "ck.video.qq.com.": ["58.251.139.196"], "oma.qq.com.": ["aoma.qq.com."], "base.music.qq.com.": ["182.254.33.122", "163.177.68.177", "203.205.151.23", "120.198.199.168", "103.7.30.89", "14.17.32.228"], "bk-info-zb.play.t002.ottcn.com": ["tv.t002.ottcn.com"], "auto.cgiaccess.tc.qq.com.": ["auto.cgiaccess.tcdn.qq.com."], "poma.qq.com.": ["aoma.qq.com."], "bkupdate.video.qq.com.": ["203.205.151.47", "182.254.86.185", "58.247.206.178", "101.226.233.180"], "bkomaios.video.qq.com.": ["ioma.qq.com."], "info-zb.play.t002.ottcn.com": ["tv.t002.ottcn.com"], "bkinfo.zb.qq.com.": ,
....
....
"omgmta1.qq.com.": ["pingma.qq.com."], "vv.video.qq.com.trp.tc.qq.com.": ["cgiaccess.tc.qq.com."], "av.video.qq.com.": ["h5vv.video.qq.com."]}}
The goal is to reorganize it into the following format:
hdoma.qq.com,ioma.qq.com,gz.pttc.cp81.ott.cibntv.net:182.254.57.25,111.30.159.191,58.251.81.53,121.51.8.64,183.232.119.146,121.51.139.202,182.254.21.68,112.90.78.145,219.133.60.190,111.30.137.105,14.215.158.41
ck.video.qq.com:58.251.139.196
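Then, in Excel, split each line into two columns using “:” as the delimiter.
Procedure: several rounds of character matching and replacement
1. Trim the head and tail: delete the irrelevant information at the beginning and end of the line.
2. Initial cleanup
In Notepad++, open the Find/Replace dialog, then:
(1) Set the Search Mode to Normal;
(2) Replace “."”, “"”, “[”, “]”, and “ ” (a space) with nothing, in that order, so that the trailing dot of a name is removed together with its closing quote;
(3) Replace “:” with “,”.
At this point the text contains only digits, letters, dots, hyphens, and commas: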
hdoma.qq.com,ioma.qq.com,gz.pttc.cp81.ott.cibntv.net,182.254.57.25,111.30.159.191,58.251.81.53,121.51.8.64,183.232.119.146,121.51.139.202,182.254.21.68,112.90.78.145,219.133.60.190,111.30.137.105,14.215.158.41,ck.video.qq.com,58.251.139.196,oma.qq.com,aoma.qq.com,base.music.qq.com,182.254.33.122,163.177.68.177,203.205.151.23,120.198.199.168,103.7.30.89,14.17.32.228,bk-info-zb.play.t002.ottcn.com,tv.t002.ottcn.com,auto.cgiaccess.tc.qq.com,auto.cgiaccess.tcdn.qq.com,poma.qq.com,aoma.qq.com,bkupdate.video.qq.com,203.205.151.47,182.254.86.185,58.247.206.178,101.226.233.180,bkomaios.video.qq.com,ioma.qq.com,info-zb.play.t002.ottcn.com,tv.t002.ottcn.com,bkinfo.zb.qq.com,
....
....
omgmta1.qq.com,pingma.qq.com,vv.video.qq.com.trp.tc.qq.com,cgiaccess.tc.qq.com,av.video.qq.com,h5vv.video.qq.com
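The literal replacements of step 2 can also be sketched outside Notepad++. The following is a minimal Python illustration, not part of the original procedure; the helper name `initial_cleanup` and the shortened sample string are assumptions made for the example, and the sample is assumed to have already had its head and tail trimmed as in step 1.

```python
# Shortened sample of the trimmed input (outer braces and the "atime" field
# are assumed to have been removed already, as in step 1).
raw = ('"hdoma.qq.com.": ["ioma.qq.com."], '
       '"ck.video.qq.com.": ["58.251.139.196"], '
       '"oma.qq.com.": ["aoma.qq.com."]')


def initial_cleanup(text: str) -> str:
    """Hypothetical helper mirroring the Normal-mode replacements of step 2."""
    # Order matters: '."' must be removed before '"', otherwise the
    # trailing dot of each name would be left behind.
    for token in ('."', '"', '[', ']', ' '):
        text = text.replace(token, '')
    # Finally turn the key/value colon into a comma.
    return text.replace(':', ',')


cleaned = initial_cleanup(raw)
print(cleaned)
# hdoma.qq.com,ioma.qq.com,ck.video.qq.com,58.251.139.196,oma.qq.com,aoma.qq.com
```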
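3. Replace commas with other symbols to act as delimiters
(1) Inspection shows split points such as “1,c”, “6,o”, and “8,b”: every split point is a “digit + comma + letter” combination;
(2) Set the Search Mode to Regular expression;
(3) Search for the pattern “[[:digit:]],[[:alpha:]]”, where [[:digit:]] matches a digit and [[:alpha:]] matches an uppercase or lowercase letter;
(4) Attempt the replacement: replacing “[[:digit:]],[[:alpha:]]” with “[[:digit:]];[[:alpha:]]” does not work, because the replacement text is inserted literally, so the characters on either side of the comma are lost instead of being kept. For example, “14.17.32.228,bk-info-zb.play.t002.ottcn.com” becomes “14.17.32.22[[:digit:]];[[:alpha:]]k-info-zb.play.t002.ottcn.com”.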
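The failure in step 3 can be reproduced with a short Python sketch. This is only an illustration under one stated assumption: Python's `re` module has no POSIX classes, so `[0-9]` and `[A-Za-z]` stand in for `[[:digit:]]` and `[[:alpha:]]`. As in Notepad++, the replacement string is inserted as plain text rather than being interpreted as a pattern.

```python
import re

sample = "14.17.32.228,bk-info-zb.play.t002.ottcn.com"

# Stand-ins for [[:digit:]] and [[:alpha:]] (assumption: ASCII input only).
pattern = r"[0-9],[A-Za-z]"

# The whole match "8,b" is replaced by the literal replacement text,
# so the digit and the letter around the comma are lost.
broken = re.sub(pattern, "[[:digit:]];[[:alpha:]]", sample)
print(broken)
# 14.17.32.22[[:digit:]];[[:alpha:]]k-info-zb.play.t002.ottcn.com
```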
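4. Working out how to keep both ends unchanged and replace only the comma in the middle
(1) The match can be divided into marked groups, so that only the relevant group is replaced.
(2) Change the search pattern to “([[:digit:]])(,)([[:alpha:]])”, using “()” to split the match into three groups;
(3) Replace “([[:digit:]])(,)([[:alpha:]])” with “\1\n\3”: groups 1 and 3 are kept unchanged, and the comma in group 2 is replaced with a line break.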
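The capture-group replacement of step 4 has a direct equivalent in Python's `re.sub`, shown here as a minimal sketch. The sample string is a fragment of the cleaned text, and `[0-9]` and `[A-Za-z]` are again assumed stand-ins for the POSIX classes, which Python's `re` does not support.

```python
import re

cleaned = ("base.music.qq.com,103.7.30.89,14.17.32.228,"
           "bk-info-zb.play.t002.ottcn.com,tv.t002.ottcn.com")

# Three groups: the digit, the comma, the letter.
# \1 and \3 put the digit and the letter back; the comma becomes a newline.
split_lines = re.sub(r"([0-9])(,)([A-Za-z])", r"\1\n\3", cleaned)
print(split_lines)
# base.music.qq.com,103.7.30.89,14.17.32.228
# bk-info-zb.play.t002.ottcn.com,tv.t002.ottcn.com
```

Only the “digit, comma, letter” boundary is touched; commas between two IP addresses or between two host names are left alone.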
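5. Following the same approach as step 4, replace “([[:alpha:]])(,)([[:digit:]])” with “\1:\3”: groups 1 and 3 are kept unchanged, and the comma in group 2 is replaced with a colon.
The final result is as follows: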
hdoma.qq.com,ioma.qq.com,gz.pttc.cp81.ott.cibntv.net:182.254.57.25,111.30.159.191,58.251.81.53,121.51.8.64,183.232.119.146,121.51.139.202,182.254.21.68,112.90.78.145,219.133.60.190,111.30.137.105,14.215.158.41
ck.video.qq.com:58.251.139.196
oma.qq.com,aoma.qq.com,base.music.qq.com:182.254.33.122,163.177.68.177,203.205.151.23,120.198.199.168,103.7.30.89,14.17.32.228
bk-info-zb.play.t002.ottcn.com,tv.t002.ottcn.com,auto.cgiaccess.tc.qq.com,auto.cgiaccess.tcdn.qq.com,poma.qq.com,aoma.qq.com,bkupdate.video.qq.com:203.205.151.47,182.254.86.185,58.247.206.178,101.226.233.180
……
……
vd.l.qq.com:163.177.84.30,58.247.206.177,111.30.144.38,123.151.79.46,183.192.202.182,183.232.91.20,203.205.128.160,183.61.51.40,101.226.233.179,182.254.86.184,182.254.106.36,182.254.50.27,125.39.133.49
omgmta1.qq.com,pingma.qq.com,vv.video.qq.com.trp.tc.qq.com,cgiaccess.tc.qq.com,av.video.qq.com,h5vv.video.qq.com
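Steps 4 and 5 together can be sketched as two chained substitutions. This is a rough Python illustration rather than the Notepad++ procedure itself; the function name `to_excel_lines` is hypothetical, the sample is a shortened piece of the cleaned text, and `[0-9]` and `[A-Za-z]` are assumed stand-ins for `[[:digit:]]` and `[[:alpha:]]`.

```python
import re


def to_excel_lines(cleaned: str) -> str:
    """Hypothetical helper combining steps 4 and 5."""
    # Step 4: digit,letter -> line break between records.
    with_breaks = re.sub(r"([0-9])(,)([A-Za-z])", r"\1\n\3", cleaned)
    # Step 5: letter,digit -> colon between the names and the IP list.
    return re.sub(r"([A-Za-z])(,)([0-9])", r"\1:\3", with_breaks)


cleaned = ("hdoma.qq.com,ioma.qq.com,ck.video.qq.com,58.251.139.196,"
           "oma.qq.com,aoma.qq.com,base.music.qq.com,182.254.33.122,163.177.68.177")

result = to_excel_lines(cleaned)
print(result)
# hdoma.qq.com,ioma.qq.com,ck.video.qq.com:58.251.139.196
# oma.qq.com,aoma.qq.com,base.music.qq.com:182.254.33.122,163.177.68.177
```

As in the result above, entries whose values are host names rather than IP addresses contain no “digit, comma, letter” or “letter, comma, digit” boundary of their own, so they stay joined to the next entry on the same line.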
References and related links:
https://www.cnblogs.com/BTMaster/p/3533583.html
https://blog.csdn.net/wangkai_123456/article/details/55254598