Analyzing problem messages in the Postfix queue with Python regular expressions (a script that puts several regex techniques together)

南星叨叨

Article link: https://blog.csdn.net/hans99812345/article/details/115466624

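This article shows how to use Python regular expressions to deal with problem messages in the Postfix queue, covering multiple recipients, blank lines, the date/time format, and extracting (and fixing the extraction of) the recipient and reason fields. The text is first preprocessed with shell commands, then regexes pull out key fields such as the queue_id, time, sender and recipients, which makes later database storage or email handling easier.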

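First pull the problem messages out of the queue, then normalize the text a little. That makes the regexes later on much easier; otherwise you're only making trouble for yourself.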

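Update
I found that some queue entries have several recipients, with a blank line between one recipient and the next, which caused a bug. So I added a shell step that preprocesses the text, which makes the queue easier to handle in Python afterwards.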

/home/mail/postfix/usr/sbin/postqueue -p >  20210406.txt 
sed -i '1d' 20210406.txt
sed -i '$d' 20210406.txt

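I'm noting the shell commands down here so they're easy to turn into a script later. sed -i '1d' deletes the first line of the dump (the column header) and sed -i '$d' deletes the last line (the summary line).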
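If you'd rather keep the whole pipeline in Python, the same preprocessing can be sketched with subprocess. This assumes postqueue -p prints one column-header line first and one summary line (something like "-- 82 Kbytes in 1 Request.") last, which is what the two sed commands remove. The helper names strip_header_footer and dump_queue are made up for illustration; the postqueue path and output filename are the ones from this post, and capture_output/text need Python 3.7 or later.

```python
import subprocess

# postqueue path as used in this post; adjust for your installation
POSTQUEUE = '/home/mail/postfix/usr/sbin/postqueue'


def strip_header_footer(text):
    """Drop the first line (column header) and the last line (summary),
    the same as running sed -i '1d' and then sed -i '$d'."""
    lines = text.splitlines()
    return '\n'.join(lines[1:-1])


def dump_queue(path='20210406.txt'):
    """Hypothetical helper: run postqueue -p and save the trimmed listing."""
    result = subprocess.run([POSTQUEUE, '-p'], capture_output=True,
                            text=True, check=True)
    with open(path, 'w') as fh:
        fh.write(strip_header_footer(result.stdout) + '\n')
```

Keeping the trimming in its own function means it can be checked against a saved dump without touching the live queue.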

Next, analyze this text. Its format looks roughly like this:

D30CD1446347    83972 Fri Apr  2 10:50:46  fuck@vip.fuck.com
(host eu-smtp-inbound-2.mimecast.com[195.130.217.201] said: 451 Invalid Recipient - https://community.mimecast.com/docs/DOC-1369#451 [7QHB3u94MH61BkjjUv3XWg.uk80] (in repl
y to RCPT TO command))
                                         xxoo@sex.com


D4E10144604A    92877 Fri Apr  2 10:59:05  fuck@vip.fuck.com
(host mail.sapidshpg.com[84.47.227.24] said: 451 MAIL lookup failed (in reply to MAIL FROM command))
                                         xxoo@sex.com

1532A1446387     5641 Fri Apr  2 09:24:31  fuck@vip.fuck.com
(host mail.rainbow.ne.jp[180.222.181.57] said: 451 Mailbox full (#4.2.2) (in reply to end of DATA command))
                                         xxoo@sex.com

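In each entry:
the first line holds the queue ID, the message size, the arrival time and the sender;
the second line is the reason the message could not be delivered (most messages like these are spam);
the third line is the recipient.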
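Since every entry has this shape, the three lines can also be pulled apart with one pattern using named groups. This is an alternative sketch, not what the script below does (it extracts each field with its own regex), and it assumes an entry with exactly one reason and one recipient; ENTRY_RE is a made-up name.

```python
import re

# One pattern for a whole entry: header line, reason in parentheses, indented recipient.
# re.DOTALL lets the reason run across a line break.
ENTRY_RE = re.compile(
    r'(?P<queue_id>\S+)\s+(?P<size>\d+)\s+'
    r'(?P<time>\w{3}\s+\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+(?P<sender>\S+)\n'
    r'(?P<reason>\(.*\))\n'
    r'[ \t]+(?P<recipient>\S+)',
    re.DOTALL,
)

entry = (
    'D4E10144604A    92877 Fri Apr  2 10:59:05  fuck@vip.fuck.com\n'
    '(host mail.sapidshpg.com[84.47.227.24] said: 451 MAIL lookup failed '
    '(in reply to MAIL FROM command))\n'
    '                                         xxoo@sex.com'
)
fields = ENTRY_RE.match(entry).groupdict()
print(fields['queue_id'], fields['time'], fields['recipient'])
```

groupdict() hands back a dict keyed by the group names, which is close to the mail dict the full script builds by hand.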

These are the pieces we want to extract.

Regex techniques used
1. Matching across line breaks

common = re.compile(r'\(.*\)', re.DOTALL)  # raw string: '\(' in a normal string triggers an invalid-escape warning

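Because a reason can wrap onto a second line, the match has to be able to cross a line break; and since the reason is the text inside the parentheses, the pattern is escaped parentheses with .* in between. Note that .* is greedy, so it runs from the first ( to the last ) in the text it is given.

The first argument is the pattern to match. The second, re.DOTALL, makes . match newline characters as well. (This is often loosely called multiline mode, but Python's re.MULTILINE is a different flag: it makes ^ and $ match at the start and end of every line.)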
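A quick way to see what re.DOTALL changes, using the wrapped reason from the first sample entry above:

```python
import re

# The reason from the first sample entry, wrapped onto two lines as in the queue dump
reason = ('(host eu-smtp-inbound-2.mimecast.com[195.130.217.201] said: 451 Invalid Recipient'
          ' - https://community.mimecast.com/docs/DOC-1369#451 [7QHB3u94MH61BkjjUv3XWg.uk80] (in repl\n'
          'y to RCPT TO command))')

# Without DOTALL, '.' stops at the newline; no ')' is reachable from either '(' on line 1
print(re.findall(r'\(.*\)', reason))                        # []
# With DOTALL, '.' also matches '\n', so the whole two-line reason is captured
print(re.findall(r'\(.*\)', reason, re.DOTALL) == [reason])  # True
```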

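2. Splitting on several delimiters at once
In Python, if you want to split on several delimiters in one go, for example cutting a string at every comma and every period, the string method split won't do, because it accepts only a single separator. Use re.split() from the re module instead.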
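For example, splitting on commas and periods, including the full-width Chinese ， and 。, with one call (the sample string is made up):

```python
import re

text = 'queue,reason.sender，recipient。time'
# str.split takes one separator; re.split takes a pattern, so a single
# character class covers ASCII and full-width commas and periods at once
parts = re.split(r'[,.，。]', text)
print(parts)  # ['queue', 'reason', 'sender', 'recipient', 'time']
```

Alternation works as well, e.g. re.split(r',|\.', text) for the ASCII pair; inside a character class the period is already literal and needs no backslash.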

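The idea here is to split the text at blank lines: entries separated by one blank line are split with split('\n\n'), and entries separated by two blank lines with split('\n\n\n').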

So the code is as follows (this was fixed later on):

f = re.split('\n\n\n|\n\n',q)
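The order of the alternatives in that pattern matters: re tries them left to right, so with '\n\n|\n\n\n' a double blank line would be cut after two newlines and the third would stay at the start of the next chunk, where it stops re.match from finding the queue ID. One blank line or several can also be folded into a single pattern, \n{2,}. The sketch below goes one step further for the multiple-recipient problem mentioned in the first update: assuming a chunk that does not start with a hexadecimal queue ID is a leftover recipient from the entry before it, it glues that chunk back on. The entry-start rule, ENTRY_START and split_entries are assumptions for illustration, not part of the original script.

```python
import re

# Hypothetical rule: an entry starts with a hex queue ID, an optional * or !, then whitespace
ENTRY_START = re.compile(r'[0-9A-F]+[*!]?\s')


def split_entries(text):
    # \n{2,} splits on one or more blank lines, so one pattern covers both cases
    chunks = [c for c in re.split(r'\n{2,}', text.strip('\n')) if c.strip()]
    entries = []
    for chunk in chunks:
        if ENTRY_START.match(chunk) or not entries:
            entries.append(chunk)
        else:
            # Assumed to be a leftover recipient separated by a blank line:
            # glue it back onto the entry before it
            entries[-1] += '\n' + chunk
    return entries


sample = (
    'AAAA11112222   1234 Fri Apr  2 10:50:46  a@example.com\n'
    '(reason one)\n'
    '                                         r1@example.com\n'
    '\n'
    '                                         r2@example.com\n'
    '\n\n'
    'BBBB33334444*  5678 Fri Apr  2 10:59:05  b@example.com\n'
    '(reason two)\n'
    '                                         r3@example.com\n'
)
entries = split_entries(sample)
print(len(entries))                          # 2
print(entries[0].splitlines()[-1].strip())   # r2@example.com
```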

Update
Added matching of the date and time

time_list = re.findall(r'[A-Za-z]{3}\s+[0-9]{1,2}\s[0-9]{2}:[0-9]{2}:[0-9]{2}', i)  # \s+ so both "Apr  2" and "Feb 21" match

This matches date/time strings such as Apr  2 09:24:31.
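The Feb  21 sample later in this post shows two spaces, but if the arrival time is printed in C asctime() layout, the day is right-aligned in a two-character field: a one-digit day gets two spaces (Apr  2) and a two-digit day only one (Feb 21). Under that assumption an exact \s\s misses two-digit days while \s+ accepts both. The second sample line below is an assumed example, not copied from a real queue.

```python
import re

STRICT = re.compile(r'[A-Za-z]{3}\s\s[0-9]{1,2}\s[0-9]{2}:[0-9]{2}:[0-9]{2}')  # exactly two spaces
LOOSE = re.compile(r'[A-Za-z]{3}\s+[0-9]{1,2}\s[0-9]{2}:[0-9]{2}:[0-9]{2}')   # one or more spaces

one_digit_day = 'D30CD1446347    83972 Fri Apr  2 10:50:46  fuck@vip.fuck.com'
two_digit_day = '728621445F47   552280 Fri Feb 21 11:09:53  fuck@vip.fuck.com'  # asctime-style padding assumed

print(STRICT.findall(one_digit_day), STRICT.findall(two_digit_day))
# ['Apr  2 10:50:46'] []
print(LOOSE.findall(one_digit_day), LOOSE.findall(two_digit_day))
# ['Apr  2 10:50:46'] ['Feb 21 11:09:53']
```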

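Update

Fixed a bug where the reason text itself contains the recipient's address, which made the recipient match go wrong. For example: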

728621445F47   552280 Fri Feb  21 11:09:53  fuck@vip.fuck.com
(host mx2.partnerconsole.net[202.124.241.197] said: 451 "Mailbox qwer@luminousspace.com.au is full" (in reply to RCPT TO command))
                                         qwer@luminousspace.com.au

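The fix: collect the sender and recipient addresses in a list, and skip any match that contains "(", since such a match came from the reason line: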

mail_user_list = []
sender_receiver_list = re.findall(r'[^\._-][\w\.-]+@.*', i)
for user in sender_receiver_list:
    if '(' not in user:  # a match containing '(' comes from the reason line, so skip it
        mail_user_list.append(user.strip())  # strip each address, not just the joined string
sender = mail_user_list[0]
recipient = mail_user_list[1:]
new_recipient = ','.join(recipient)
mail_user_list.clear()
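Another way to keep addresses in the reason text out of the result is to rely on the layout instead of filtering matches: recipient lines are indented and hold nothing but an address, while reason lines start with "(". This sketch uses re.MULTILINE (the flag that makes ^ and $ match at every line) and assumes the sender is the last field of the first line; RCPT_RE and split_addresses are hypothetical names.

```python
import re

# Under re.MULTILINE, ^ and $ match at every line boundary; [ \t]+ rather than \s+
# so a match never runs across a line break
RCPT_RE = re.compile(r'^[ \t]+(\S+@\S+)[ \t]*$', re.MULTILINE)


def split_addresses(entry):
    first_line = entry.split('\n', 1)[0]
    sender = first_line.split()[-1]       # assumes a non-empty sender at the end of line 1
    recipients = RCPT_RE.findall(entry)   # reason lines start with '(' and never match
    return sender, recipients


entry = (
    '728621445F47   552280 Fri Feb  21 11:09:53  fuck@vip.fuck.com\n'
    '(host mx2.partnerconsole.net[202.124.241.197] said: 451 "Mailbox '
    'qwer@luminousspace.com.au is full" (in reply to RCPT TO command))\n'
    '                                         qwer@luminousspace.com.au'
)
print(split_addresses(entry))
# ('fuck@vip.fuck.com', ['qwer@luminousspace.com.au'])
```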

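Update

Today I noticed that some queue IDs have an asterisk after the letters and digits and some don't, so I changed the match condition. (In Postfix's queue listing, a trailing * means the message is in the active queue, and ! means it is on hold.)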

queue_id_list = re.match(r'[A-Z0-9]{12}\*?', i)  # \*? matches a literal asterisk zero or one time
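The character after the queue ID is a status flag (* for the active queue, ! for the hold queue), so a capturing group can keep it separate from the ID instead of gluing it on. This sketch also assumes hexadecimal short queue IDs of any length rather than exactly 12 characters, since those IDs are not guaranteed to be 12 long; QID_RE, FLAG_MEANING and parse_queue_id are hypothetical names.

```python
import re

# Queue ID (hex digits), an optional status flag, then whitespace
QID_RE = re.compile(r'([0-9A-F]+)([*!]?)\s')
FLAG_MEANING = {'*': 'active', '!': 'hold', '': 'other'}  # '' = neither active nor on hold


def parse_queue_id(line):
    m = QID_RE.match(line)
    if m is None:
        return None  # reason or recipient line, not an entry header
    queue_id, flag = m.groups()
    return queue_id, FLAG_MEANING[flag]


print(parse_queue_id('1532A1446387*     5641 Fri Apr  2 09:24:31  fuck@vip.fuck.com'))
# ('1532A1446387', 'active')
print(parse_queue_id('D30CD1446347    83972 Fri Apr  2 10:50:46  fuck@vip.fuck.com'))
# ('D30CD1446347', 'other')
```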

The complete code:

import json
import re

# re.DOTALL lets '.' match newlines too, so a reason that wraps onto a second line is still captured
common = re.compile(r'\(.*\)', re.DOTALL)

with open('test_queue.txt', 'r') as in_file, open('error_queue.txt', 'a') as out_file:
    q = in_file.read()
    f = re.split('\n\n', q)  # in theory re.split('\n\n', q) is enough, since the double blank lines were dealt with earlier
    for i in f:
        i = i.strip('\n')  # a leftover leading newline would stop re.match from finding the queue ID
        if not i:
            continue  # skip empty chunks, e.g. from a trailing blank line
        #queue_id_list = re.match(r'[A-Z0-9]{12}', i)
        queue_id_list = re.match(r'[A-Z0-9]{12}\*?', i)  # \*? matches the asterisk zero or one time
        time_list = re.findall(r'[A-Za-z]{3}\s+[0-9]{1,2}\s[0-9]{2}:[0-9]{2}:[0-9]{2}', i)
        reason_list = common.findall(i)
        mail_user_list = []
        for user in re.findall(r'[^\._-][\w\.-]+@.*', i):
            if '(' not in user:  # matches containing '(' come from the reason line
                mail_user_list.append(user.strip())
        if not (queue_id_list and time_list and reason_list and mail_user_list):
            continue  # incomplete entry: skip it instead of crashing with TypeError/IndexError
        mail = {
            'queue_id': queue_id_list[0],
            'time': time_list[0],
            'sender': mail_user_list[0],
            'recipient': ','.join(mail_user_list[1:]),
            'reason': reason_list[0],
        }
        out_file.write(json.dumps(mail) + '\n')  # write one JSON object per line
        # print(mail)

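Printing each mail dict (the commented-out print(mail)) gives output like the following; error_queue.txt holds the same records as json.dumps output, one JSON object per line: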

{'queue_id': '99AC11446067', 'time': 'Feb  21 09:14:50', 'sender': 'fuck@vip.fuck.com', 'recipient': 'xxoo@sex.com', 'reason': '(Host or domain name not found. Name service error for name=ma4.justnet.ne.jp type=MX: Host not found, try again)'}
{'queue_id': '97E6A14462A1', 'time': 'Feb  21 11:10:37', 'sender': 'fuck@vip.fuck.com', 'recipient': 'xxoo@sex.com', 'reason': '(host mail.SmartJan.com[69.72.149.90] said: 451 Temporarily unable to process your email. Please try again later. (in reply to RCPT TO command))'}

That covers the main functionality. From here the results can easily be stored in a database or sent out by email.
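For the database step, here is a sketch with Python's built-in sqlite3 module that reads error_queue.txt line by line (each line is one json.dumps record). The table name error_queue, its columns and the function name load_jsonl_into_sqlite are assumptions chosen to mirror the dict keys, not something from the original script.

```python
import json
import sqlite3


def load_jsonl_into_sqlite(lines, conn):
    """Load one-JSON-object-per-line records into a (hypothetical) error_queue table."""
    conn.execute(
        'CREATE TABLE IF NOT EXISTS error_queue '
        '(queue_id TEXT, time TEXT, sender TEXT, recipient TEXT, reason TEXT)'
    )
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        m = json.loads(line)
        rows.append((m['queue_id'], m['time'], m['sender'], m['recipient'], m['reason']))
    # parameter placeholders let sqlite3 handle quoting of the reason text
    conn.executemany('INSERT INTO error_queue VALUES (?, ?, ?, ?, ?)', rows)
    conn.commit()
    return len(rows)
```

With a real file this would be called as load_jsonl_into_sqlite(open('error_queue.txt'), sqlite3.connect('queue.db')), preferably inside a with block so the file gets closed.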
