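Python: using the re module to extract the N characters that follow a specific string X in a text; extracting the specific string together with the N characters after it; deduplicating list data; and extracting every line that contains a given string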
Sample file to extract from: PMSWeb.2017-12-04.log.1 (a very large file, 129 MB)
2017-12-04 13:52:21,062 [http-apr-9080-exec-29] [INFO]-[com.*****.*****.*****.service.member.impl.MemberControlServiceImpl queryMemberControl 155]-查询会员扩展性控制记录!
2017-12-04 13:52:21,076 [http-apr-9080-exec-38] [INFO]-[com.*****.*****.*****.web.action.drawtransfer.DrawTransferAction getDrawTransferTitle 402]-SessionId=933B6DF242DC88D186848CD5B509D5EC,DrawTransferAction getDrawTransferTitle start
.............................
..........................
...........................
2017-12-04 15:57:49,472 [http-apr-9080-exec-7] [INFO]-[com.*****.*****.*****.comm.ManageFilter blacklistValidate 386]-ManageFilter.blacklistValidate end
2017-12-04 15:57:49,474 [http-apr-9080-exec-7] [INFO]-[com.*****.*****.*****.web.action.member.MemberAction isSupportFinance 3675]-SessionId=7B29CCCBDB2F2DFDF8EAE6D2BA3BB929,MemberAction.isSupportFinance start
Example 1: using the re module to extract the N characters that follow a specific string X in a text
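A minimal sketch of the idea on a single log line. The pattern `SessionId=(.{32})` matches the literal prefix and captures the 32 characters after it. Here `sample_line` is a shortened, illustrative copy of a line from the sample above, and the snippet assumes Python 3.

```python
import re

# Shortened copy of a log line from the sample above (illustrative input).
sample_line = (
    "[INFO]-SessionId=7B29CCCBDB2F2DFDF8EAE6D2BA3BB929,"
    "MemberAction.isSupportFinance start"
)

# The parentheses form a capture group: only the 32 characters
# after "SessionId=" are captured, not the prefix itself.
pattern = re.compile(r"SessionId=(.{32})")

match = pattern.search(sample_line)
session_id = match.group(1)   # the 32 characters after the prefix
with_prefix = match.group(0)  # the prefix plus the 32 characters
print(session_id)
print(with_prefix)
```

`group(1)` returns only the captured characters, while `group(0)` returns the whole match including `SessionId=`. `re.findall` with a single capture group returns the `group(1)` values for every match.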
Script file
root@kali:~/python/dinpay# cat findlogsessionid.py
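#!/usr/bin/python
# -*- coding: utf-8 -*-
import re

# Read the whole log file into memory and decode it as UTF-8 (Python 2)
sourcesessionis = open("/root/python/dinpay/PMSWeb.2017-12-04.log.1").read()
temp = sourcesessionis.decode("utf8")
reg = r'SessionId=(.{32})'  # capture only the 32 characters after "SessionId="
wordreg = re.compile(reg)
wordreglist = re.findall(wordreg, temp)
# Print every captured session ID, one per line (duplicates included)
for word in wordreglist:
    print word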
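Because the sample log is 129 MB, a single read() holds the whole file in memory. One possible alternative is to scan the file line by line. The sketch below assumes Python 3 rather than the Python 2 of the script. The helper name `iter_session_ids` and the in-memory demo log are illustrative assumptions, not part of the original script.

```python
import io
import re

SESSION_ID = re.compile(r"SessionId=(.{32})")

def iter_session_ids(lines):
    """Yield every 32-character SessionId value found in an iterable of lines."""
    for line in lines:
        for session_id in SESSION_ID.findall(line):
            yield session_id

# Demo on an in-memory stand-in for the log file (hypothetical content).
demo_log = io.StringIO(
    "a SessionId=933B6DF242DC88D186848CD5B509D5EC,start\n"
    "no id on this line\n"
    "b SessionId=7B29CCCBDB2F2DFDF8EAE6D2BA3BB929,start\n"
)
found = list(iter_session_ids(demo_log))
print(found)
```

For the real log, an open file object can be passed in place of `demo_log`, for example one returned by `open(path, encoding="utf-8")`. Iterating over a file object reads one line at a time. Since a SessionId match cannot span a line break (`.` does not match a newline by default), this should yield the same IDs in the same order as scanning the whole text at once.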
Script output
6767B50A9DB30F556302EB5E21C05239
9A8F272AAD8A681E4FB598D587077CF1
9A8F272AAD8A681E4FB598D587077CF1
9A8F272AAD8A681E4FB598D587077CF1
6767B50A9DB30F556302EB5E21C05239
9A8F272AAD8A681E4FB598D587077CF1
9A8F272AAD8A681E4FB598D587077CF1
9A8F272AAD8A681E4FB598D587077CF1
9A8F272AAD8A681E4FB598D587077CF1
9A8F272AAD8A681