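01. Scenario
Your boss asks you to run a survey. You send out an Excel sheet, and several hundred reply emails come back. Opening each one and saving its attachment by hand would mean working overtime, so let Python help.
02. Batch download
Python's standard library covers this need: poplib talks to the mail server and the email package parses the messages. Both are complete and easy to use, so the job is a piece of cake.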
import poplib

# Account information
email_addr = 'xxx@chinastock.com.cn'
password = 'xxx'
pop3_server = 'mail.xxx.com.cn'
# Connect to the POP3 server over SSL:
server = poplib.POP3_SSL(pop3_server)
# Debug output can be switched on (1) or off (0):
server.set_debuglevel(0)
# The POP3 server's welcome message:
print(server.getwelcome())
# Authenticate:
server.user(email_addr)
server.pass_(password)
# stat() returns the number of messages and the space they occupy:
msg_count, msg_size = server.stat()
print('message count:', msg_count)
print('message size:', msg_size, 'bytes')
Run the code above. If the connection works, you should see the number of messages in the mailbox and their total size in bytes.
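The snippet above keeps the password in the source file. As a minimal sketch of one alternative, the settings could be read from environment variables instead. The variable names POP3_USER, POP3_PASSWORD and POP3_SERVER, and the helper load_credentials(), are hypothetical and not part of the original script.

```python
import os


def load_credentials(environ=None):
    """Read POP3 settings from environment variables.

    The names POP3_USER, POP3_PASSWORD and POP3_SERVER are hypothetical;
    use whatever fits your own setup.
    """
    environ = os.environ if environ is None else environ
    names = ('POP3_USER', 'POP3_PASSWORD', 'POP3_SERVER')
    # Fail early with a clear message if any setting is absent or empty
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise KeyError('missing settings: ' + ', '.join(missing))
    return tuple(environ[name] for name in names)
```

The three returned values can then be passed to poplib.POP3_SSL(), server.user() and server.pass_() in place of the literals.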
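To keep things simple, we won't worry about filtering messages here. The goal is to download the attachments of every message in the inbox to a local directory.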
# Message numbers run from 1 to msg_count inclusive
for i in range(1, msg_count + 1):
    resp, byte_lines, octets = server.retr(i)
    # Decode each line from bytes to str
    str_lines = []
    for x in byte_lines:
        str_lines.append(x.decode())
    # Join the lines into the full message text
    msg_content = '\n'.join(str_lines)
    # Parse the text into a Message object
    msg = Parser().parsestr(msg_content)
    headers = get_email_headers(msg)
    attachments = get_email_content(msg, r'E:\py\sendmail\attach')
    # Print a summary; a header missing from the message prints as empty
    print('subject:', headers.get('Subject', ''))
    print('from:', headers.get('From', ''))
    print('to:', headers.get('To', ''))
    if 'Cc' in headers:
        print('cc:', headers['Cc'])
    print('date:', headers.get('Date', ''))
    print('attachments: ', attachments)
    print('-----------------------------')
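retr() takes a message number. Numbering starts at 1 and there are msg_count messages in total, so we loop from 1 through msg_count, parse each message, and save its attachments locally.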
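Filtering was set aside above, but if only some replies are wanted, one possible approach is to fetch just the headers with top() before deciding whether to call retr(). This is a sketch under assumptions: the helpers subject_of() and matching_numbers() and the keyword match on the subject are hypothetical, and unusual charsets in a subject may need extra handling.

```python
from email.header import decode_header, make_header
from email.parser import BytesParser


def subject_of(header_lines):
    """Parse raw header lines (a list of bytes) and return the decoded Subject."""
    raw_headers = b'\r\n'.join(header_lines)
    msg = BytesParser().parsebytes(raw_headers, headersonly=True)
    raw_subject = msg.get('Subject', '')
    return str(make_header(decode_header(raw_subject)))


def matching_numbers(server, msg_count, keyword):
    """Return the message numbers whose Subject contains keyword.

    server.top(i, 0) requests the headers of message i plus 0 body lines,
    so the (possibly large) attachments are not transferred.
    """
    wanted = []
    for i in range(1, msg_count + 1):
        resp, header_lines, octets = server.top(i, 0)
        if keyword in subject_of(header_lines):
            wanted.append(i)
    return wanted
```

The download loop could then iterate over the returned numbers instead of the whole range.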
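byte_lines is a list whose elements are bytes, so each one has to go through decode(). The decoded lines are joined with the newline character \n into one string, which is passed to the parser to build a Message object.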
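One caveat: decode() with no argument assumes UTF-8 and raises UnicodeDecodeError if a line contains other 8-bit bytes, for example raw GBK text. As a hedged alternative, the standard library's BytesParser can parse the bytes directly, which skips the decode-and-join step. The helper name parse_message() is hypothetical.

```python
from email.parser import BytesParser


def parse_message(byte_lines):
    """Build a Message from the list of bytes lines returned by retr()."""
    # Rejoin the lines with CRLF, the line ending used on the wire
    raw = b'\r\n'.join(byte_lines)
    return BytesParser().parsebytes(raw)
```

With this helper, `msg = parse_message(byte_lines)` would stand in for the decode loop, the join, and the Parser().parsestr() call.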
With a Message object in hand, we can parse it to get the message's headers and content.
Headers first.
def get_email_headers(msg):
    headers = {}
    for header in ['From', 'To', 'Cc', 'Subject', 'Date']:
        value = msg.get(header, '')
        if value:
            if header == 'Date':
                headers['Date'] = value
            elif header == 'Subject':
                headers['Subject'] = decode_str(value)
            elif header == 'From':
                hdr, addr = parseaddr(value)
                name = decode_str(hdr)
                headers['From'] = '%s <%s>' % (name, addr)
            elif header == 'To':
                # There may be several recipients, separated by commas
                to = []
                for x in value.split(','):
                    hdr, addr = parseaddr(x)
                    name = decode_str(hdr)
                    to.append('%s <%s>' % (name, addr))
                headers['To'] = ','.join(to)
            elif header == 'Cc':
                # Same handling for the Cc list
                cc = []
                for x in value.split(','):
                    hdr, addr = parseaddr(x)
                    name = decode_str(hdr)
                    cc.append('%s <%s>' % (name, addr))
                headers['Cc'] = ','.join(cc)
    return headers
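Headers are just key-value pairs. The ones we care about are From, To, Cc, Subject and Date, and the Message object's get() method returns them directly. To and Cc may each hold several addresses, so every address is converted separately. decode_str(), defined in the complete code at the end, decodes encoded header text so that Chinese characters do not come out garbled.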
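Splitting To and Cc on commas works for plain address lists, but it breaks when a display name itself contains a comma, as in "Zhang, San" <zs@example.com>. A minimal sketch of an alternative uses email.utils.getaddresses() for the splitting and make_header() to decode every encoded fragment rather than only the first. The helper names decode_all() and format_addresses() and the example addresses are hypothetical.

```python
from email.header import decode_header, make_header
from email.utils import getaddresses


def decode_all(value):
    """Decode every encoded fragment of a header value, not just the first."""
    return str(make_header(decode_header(value)))


def format_addresses(header_value):
    """Turn a To/Cc header value into a list of 'Name <addr>' strings."""
    result = []
    # getaddresses() respects quoted display names that contain commas
    for name, addr in getaddresses([header_value]):
        name = decode_all(name)
        result.append('%s <%s>' % (name, addr) if name else addr)
    return result
```

For example, `','.join(format_addresses(msg.get('To', '')))` would produce the same kind of string that get_email_headers() stores under 'To'.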
The next function, get_email_content(), downloads the attachments.
def get_email_content(message, savepath):
    attachments = []
    # walk() visits every part of the message in turn
    for part in message.walk():
        filename = part.get_filename()
        if filename:
            filename = decode_str(filename)
            # decode=True undoes the transfer encoding (e.g. base64)
            data = part.get_payload(decode=True)
            abs_filename = os.path.join(savepath, filename)
            # The with block closes the file even if the write fails
            with open(abs_filename, 'wb') as attach:
                attach.write(data)
            attachments.append(filename)
    return attachments
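A Message may contain several MIMEBase objects, that is, several parts, and each part may carry an attachment. message.walk() visits these parts one by one. The function saves every attachment under savepath and does not handle duplicate attachment names: a later file with the same name overwrites the earlier one.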
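With hundreds of replies to the same survey, duplicate attachment names are likely. As a hedged sketch, a small helper could pick a free file name before writing. The helper unique_path() and its "_1", "_2" suffix scheme are assumptions, not part of the original script.

```python
import os


def unique_path(savepath, filename):
    """Return a path under savepath that does not overwrite an existing file.

    Any directory part in the attachment name is dropped, so a name such
    as '../../x.txt' cannot escape savepath.
    """
    filename = os.path.basename(filename.replace('\\', '/')) or 'attachment'
    stem, ext = os.path.splitext(filename)
    candidate = os.path.join(savepath, filename)
    counter = 1
    # Append _1, _2, ... before the extension until the name is free
    while os.path.exists(candidate):
        candidate = os.path.join(savepath, '%s_%d%s' % (stem, counter, ext))
        counter += 1
    return candidate
```

In get_email_content(), `abs_filename = unique_path(savepath, filename)` could replace the os.path.join() call.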
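03. Summary
This article used a fairly short script to show how to batch-download email attachments with Python. If your work depends heavily on email, this approach can save you a lot of repetitive clicking.
Hope it helps!
Complete code: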
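# -*- coding: utf-8 -*-
import poplib
import os
from email.parser import Parser
from email.header import decode_header
from email.utils import parseaddr


def decode_str(s):
    # decode_header() returns (value, charset) pairs; use the first one
    value, charset = decode_header(s)[0]
    if charset:
        # gb18030 is a superset of gb2312 and decodes more characters
        if charset == 'gb2312':
            charset = 'gb18030'
        value = value.decode(charset)
    elif isinstance(value, bytes):
        # Unencoded fragment of a mixed header: plain ASCII
        value = value.decode('ascii', 'replace')
    return value

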
def get_email_headers(msg):
    headers = {}
    for header in ['From', 'To', 'Cc', 'Subject', 'Date']:
        value = msg.get(header, '')
        if value:
            if header == 'Date':
                headers['Date'] = value
            elif header == 'Subject':
                headers['Subject'] = decode_str(value)
            elif header == 'From':
                hdr, addr = parseaddr(value)
                name = decode_str(hdr)
                headers['From'] = '%s <%s>' % (name, addr)
            elif header == 'To':
                # There may be several recipients, separated by commas
                to = []
                for x in value.split(','):
                    hdr, addr = parseaddr(x)
                    name = decode_str(hdr)
                    to.append('%s <%s>' % (name, addr))
                headers['To'] = ','.join(to)
            elif header == 'Cc':
                # Same handling for the Cc list
                cc = []
                for x in value.split(','):
                    hdr, addr = parseaddr(x)
                    name = decode_str(hdr)
                    cc.append('%s <%s>' % (name, addr))
                headers['Cc'] = ','.join(cc)
    return headers


def get_email_content(message, savepath):
    attachments = []
    # walk() visits every part of the message in turn
    for part in message.walk():
        filename = part.get_filename()
        if filename:
            filename = decode_str(filename)
            # decode=True undoes the transfer encoding (e.g. base64)
            data = part.get_payload(decode=True)
            abs_filename = os.path.join(savepath, filename)
            # The with block closes the file even if the write fails
            with open(abs_filename, 'wb') as attach:
                attach.write(data)
            attachments.append(filename)
    return attachments


if __name__ == '__main__':
    # Account information
    email_addr = 'xxx@xxx.com.cn'
    password = 'xxx'
    pop3_server = 'xxx.xxx.com.cn'
    # Connect to the POP3 server over SSL:
    server = poplib.POP3_SSL(pop3_server)
    # Debug output can be switched on (1) or off (0):
    server.set_debuglevel(0)
    # The POP3 server's welcome message:
    print(server.getwelcome())
    # Authenticate:
    server.user(email_addr)
    server.pass_(password)
    # stat() returns the number of messages and the space they occupy:
    msg_count, msg_size = server.stat()
    print('message count:', msg_count)
    print('message size:', msg_size, 'bytes')
    # list() returns the response line (here b'+OK 237 174238271'),
    # a list with one b'<number> <size>' entry per message, and the response size
    resp, mails, octets = server.list()
    # Message numbers run from 1 to msg_count inclusive
    for i in range(1, msg_count + 1):
        resp, byte_lines, octets = server.retr(i)
        # Decode each line from bytes to str
        str_lines = []
        for x in byte_lines:
            str_lines.append(x.decode())
        # Join the lines into the full message text
        msg_content = '\n'.join(str_lines)
        # Parse the text into a Message object
        msg = Parser().parsestr(msg_content)
        headers = get_email_headers(msg)
        attachments = get_email_content(msg, r'E:\py\sendmail\attach')
        # Print a summary; a header missing from the message prints as empty
        print('subject:', headers.get('Subject', ''))
        print('from:', headers.get('From', ''))
        print('to:', headers.get('To', ''))
        if 'Cc' in headers:
            print('cc:', headers['Cc'])
        print('date:', headers.get('Date', ''))
        print('attachments: ', attachments)
        print('-----------------------------')
    # Close the connection
    server.quit()