Python EML parsing: how do you read an .eml file in Python?

from email import message_from_file
import os

# Path to the directory where attachments will be stored:
path = "./msgfiles"

# To have attachments extracted into memory instead, change the behaviour of
# the two functions below.

def file_exists(f):
    """Checks whether the file was already extracted."""
    return os.path.exists(os.path.join(path, f))

def save_file(fn, cont):
    """Saves cont (bytes) to a file named fn inside path."""
    os.makedirs(path, exist_ok=True)
    with open(os.path.join(path, fn), "wb") as file:
        file.write(cont)

def construct_name(id, fn):
    """Constructs a file name out of the message ID and the packed file name."""
    # Glue the first two dot-separated pieces of the ID together;
    # an ID without a dot is used unchanged.
    id = "".join(id.split(".")[:2])
    return id + "." + fn

def disqo(s):
    """Removes double or single quotations."""
    s = s.strip()
    if s.startswith("'") and s.endswith("'"):
        return s[1:-1]
    if s.startswith('"') and s.endswith('"'):
        return s[1:-1]
    return s

def disgra(s):
    """Removes < and > from an HTML-like tag, e-mail address or e-mail ID."""
    s = s.strip()
    if s.startswith("<") and s.endswith(">"):
        return s[1:-1]
    return s

def decode_text(m):
    """Decodes the payload of a text part into str.

    get_payload(decode=True) returns bytes in Python 3, so the part's
    charset is applied here (UTF-8 is assumed when none is declared).
    """
    payload = m.get_payload(decode=True) or b""
    return payload.decode(m.get_content_charset() or "utf-8", errors="replace")

def pullout(m, key):
    """Extracts content from an e-mail message.

    This works for multipart and nested multipart messages too.

    m -- email.Message() or mailbox.Message()
    key -- Initial message ID (some string)

    Returns tuple(Text, Html, Files, Parts)

    Text -- All text from all parts.
    Html -- All HTML from all parts.
    Files -- Dictionary mapping each extracted file name to a tuple
             (name on disk, content ID or None).
    Parts -- Number of parts in the original message.
    """
    Html = ""
    Text = ""
    Files = {}
    Parts = 0
    if not m.is_multipart():
        if m.get_filename():  # It's an attachment
            fn = m.get_filename()
            cfn = construct_name(key, fn)
            Files[fn] = (cfn, None)
            if file_exists(cfn):
                return Text, Html, Files, 1
            save_file(cfn, m.get_payload(decode=True))
            return Text, Html, Files, 1
        # Not an attachment!
        # See where this belongs: Text, Html or some other data.
        cp = m.get_content_type()
        if cp == "text/plain":
            Text += decode_text(m)
        elif cp == "text/html":
            Html += decode_text(m)
        else:
            # Something else!
            # Extract a message ID and a file name if there is one.
            # This is some packed file whose name is given in the content-type
            # header instead of the content-disposition header.
            cp = m.get("content-type", "")
            cid = m.get("content-id")
            id = disgra(cid) if cid else None
            # Find the file name:
            o = cp.find("name=")
            if o == -1:
                return Text, Html, Files, 1
            ox = cp.find(";", o)
            if ox == -1:
                ox = None
            o += 5
            fn = disqo(cp[o:ox])
            cfn = construct_name(key, fn)
            Files[fn] = (cfn, id)
            if file_exists(cfn):
                return Text, Html, Files, 1
            save_file(cfn, m.get_payload(decode=True))
        return Text, Html, Files, 1
    # This IS a multipart message.
    # Its payload is a list of Message objects, so we iterate over it
    # and call pullout() recursively for each part.
    for pl in m.get_payload():
        t, h, f, p = pullout(pl, key)
        Text += t
        Html += h
        Files.update(f)
        Parts += p
    return Text, Html, Files, Parts

def extract(msgfile, key):
    """Extracts all data from an e-mail, including From, To, etc., and returns it as a dictionary.

    msgfile -- A file-like readable object (opened in text mode)
    key -- Some ID string for that particular message. Can be a file name or anything.

    Returns dict()
    Keys: from, to, subject, date, text, html, parts[, files]
    Key files will be present only when the message contained binary files.

    For more see __doc__ for the pullout() and caption() functions.
    """
    m = message_from_file(msgfile)
    From, To, Subject, Date = caption(m)
    Text, Html, Files, Parts = pullout(m, key)
    Text = Text.strip()
    Html = Html.strip()
    msg = {"subject": Subject, "from": From, "to": To, "date": Date,
           "text": Text, "html": Html, "parts": Parts}
    if Files:
        msg["files"] = Files
    return msg

def caption(origin):
    """Extracts To, From, Subject and Date from email.Message() or mailbox.Message()

    origin -- Message() object

    Returns tuple(From, To, Subject, Date)

    If the message doesn't contain one or more of them, empty strings are returned.
    """
    # Message.has_key() no longer exists in Python 3; use the "in" operator.
    Date = ""
    if "date" in origin:
        Date = origin["date"].strip()
    From = ""
    if "from" in origin:
        From = origin["from"].strip()
    To = ""
    if "to" in origin:
        To = origin["to"].strip()
    Subject = ""
    if "subject" in origin:
        Subject = origin["subject"].strip()
    return From, To, Subject, Date

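If you only need the headers, the bodies and the attachment names, the recursion that pullout() performs by hand can also be left to the standard library. The following is a minimal sketch of that alternative, not the listing above: it relies on `Message.walk()` to visit nested parts, and the names `summarize_eml` and `build_sample_eml`, the addresses, and the sample message are all made up for illustration. It assumes UTF-8 when a text part declares no charset.

```python
import io
from email import message_from_file
from email.message import EmailMessage


def summarize_eml(fp):
    """Return headers, plain text, HTML and attachment names from an open .eml file."""
    msg = message_from_file(fp)
    summary = {
        "subject": (msg["subject"] or "").strip(),
        "from": (msg["from"] or "").strip(),
        "to": (msg["to"] or "").strip(),
        "text": "",
        "html": "",
        "attachments": [],
    }
    # walk() yields the message itself and then every nested part, depth first.
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no content of their own
        filename = part.get_filename()
        if filename:
            summary["attachments"].append(filename)
            continue
        # decode=True undoes base64/quoted-printable and returns bytes.
        payload = part.get_payload(decode=True) or b""
        charset = part.get_content_charset() or "utf-8"  # assumption: UTF-8 default
        body = payload.decode(charset, errors="replace")
        if part.get_content_type() == "text/plain":
            summary["text"] += body
        elif part.get_content_type() == "text/html":
            summary["html"] += body
    return summary


def build_sample_eml():
    """Build a small multipart message in memory (purely illustrative data)."""
    msg = EmailMessage()
    msg["Subject"] = "Demo"
    msg["From"] = "alice@example.com"
    msg["To"] = "bob@example.com"
    msg.set_content("Hello in plain text.")
    msg.add_alternative("<p>Hello in HTML.</p>", subtype="html")
    msg.add_attachment(b"\x00\x01", maintype="application",
                       subtype="octet-stream", filename="data.bin")
    return msg.as_string()


if __name__ == "__main__":
    print(summarize_eml(io.StringIO(build_sample_eml())))
```

For a real file, pass `open("message.eml")` instead of the `StringIO` object. Note that `get_filename()` already falls back to the `name` parameter of the Content-Type header when Content-Disposition carries no file name, which covers the case the long listing handles by searching for `name=` manually.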