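Notes on Email Protocols
Preface
Email was originally transferred using SMTP alone, but for historical reasons many gateways on the Internet could not correctly carry 8-bit characters such as Chinese characters. This created the need to encode mail content. So, alongside SMTP and POP, the mail protocols gained MIME, which deals with encoding.
In short, SMTP and POP deal with sending and receiving mail; these two are responsible for transport. MIME deals with mail content (here meaning sender information, recipient and Cc information, the message text, and attachments) and defines the format of the message being transported. Put another way, SMTP and POP do the postman's job, while MIME settles the format of the letter (including the envelope). Before MIME, the postman could only deliver mail to Americans; with MIME, the postman can offer an international express service.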
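1. SMTP
SMTP (Simple Mail Transfer Protocol) is a set of rules for transferring mail from a source address to a destination address, and it governs how messages are relayed. SMTP belongs to the TCP/IP protocol suite; it helps each computer find the next hop when sending or relaying a message.
For details on SMTP, see RFC 821, http://tools.ietf.org/html/rfc821
and RFC 2821, http://tools.ietf.org/html/rfc2821
Authentication exchange (">:" marks a line sent by the client, "<:" a line sent by the server)
>:auth login --- request user authentication
<:334 VXNlcm5hbWU6 --- Base64 for "Username:"
>:Y29zdGFAYW1heGl0Lm5ldA== --- client sends the Base64-encoded user name
<:334 UGFzc3dvcmQ6 --- Base64 for "Password:"
>:MTk4MjIxNA== --- client sends the Base64-encoded password
<:235 auth successfully --- success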
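Client commands:
HELO/EHLO   Greets the server and opens the session
AUTH LOGIN  Authenticates the user
MAIL FROM:  Sender address
RCPT TO:    Recipient address; tells the server who the mail is for.
            May be repeated to name several recipients
DATA        Message content
QUIT        Ends the session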
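The Base64 strings in the AUTH LOGIN exchange can be reproduced with Python's standard base64 module. The following is a minimal sketch: the helper names b64 and auth_login_lines and the sample credentials are illustrative assumptions, not part of any mail library.

```python
import base64

def b64(text):
    """Base64-encode a text string the way the AUTH LOGIN exchange shows it."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")

def auth_login_lines(username, password):
    """Return the three lines a client sends during AUTH LOGIN (hypothetical helper)."""
    return ["AUTH LOGIN", b64(username), b64(password)]

# The two 334 prompts from the server are just Base64 text.
prompts = [base64.b64decode(p).decode("ascii")
           for p in ("VXNlcm5hbWU6", "UGFzc3dvcmQ6")]

# Hypothetical credentials, for illustration only.
lines = auth_login_lines("user@example.com", "secret")
```

Base64 is an encoding, not encryption, so this exchange is only as private as the connection that carries it.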
Server replies:
220 <domain> Service ready
221 <domain> Service closing transmission channel
250 Requested mail action okay, completed
354 Start mail input; end with <CRLF>.<CRLF> (reply to the DATA command)
For other codes, see RFC 821 and RFC 2821.
Example:
R: 220 USC-ISI.ARPA Simple Mail Transfer Service Ready
S: HELO LBL-UNIX.ARPA
R: 250 USC-ISI.ARPA
S: MAIL FROM:<mo@LBL-UNIX.ARPA>
R: 250 OK
S: RCPT TO:<Jones@USC-ISI.ARPA>
R: 250 OK
S: DATA
R: 354 Start mail input; end with <CRLF>.<CRLF>
S: Blah blah blah...
S: ...etc. etc. etc.
S: .
R: 250 OK
S: QUIT
R: 221 USC-ISI.ARPA Service closing transmission channel
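[Note] After the DATA command, a 354 reply from the mail server means it is ready to receive data. The client then sends the message data as one continuous stream and ends it with <CRLF>.<CRLF>. To keep a message line that consists of a single "." from being mistaken for this terminator, the sender prepends an extra "." to every line that begins with "." and the receiver strips it again (RFC 821, section 4.5.2). Base64-encoded MIME content never contains "." at all, so it cannot collide with the terminator either.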
The mail data may contain any of the 128 ASCII character codes, although experience has indicated that use of control characters other than SP, HT, CR, and LF may cause problems and SHOULD be avoided when possible.
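The way the DATA terminator is kept unambiguous (the "transparency" rule of RFC 821, section 4.5.2) can be sketched as follows. The function names dot_stuff and dot_unstuff are assumptions made for this illustration; real SMTP libraries do this internally.

```python
def dot_stuff(message: bytes) -> bytes:
    """Prepare CRLF-delimited message bytes for the DATA phase."""
    lines = message.split(b"\r\n")
    # A trailing CRLF leaves an empty last element; drop it so no extra
    # blank line is sent before the terminator.
    if lines and lines[-1] == b"":
        lines.pop()
    # Any line starting with "." gets one more "." in front.
    stuffed = [b"." + line if line.startswith(b".") else line for line in lines]
    return b"\r\n".join(stuffed) + b"\r\n.\r\n"

def dot_unstuff(data: bytes) -> bytes:
    """Reverse dot_stuff: drop the terminator and the extra leading dots."""
    terminator = b"\r\n.\r\n"
    if not data.endswith(terminator):
        raise ValueError("data does not end with <CRLF>.<CRLF>")
    lines = data[:-len(terminator)].split(b"\r\n")
    restored = [line[1:] if line.startswith(b".") else line for line in lines]
    return b"\r\n".join(restored) + b"\r\n"

wire = dot_stuff(b"first line\r\n.\r\nlast line\r\n")
```

In this sketch the lone "." in the middle of the message travels as "..", so only the final line can be read as the end of the data.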
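2. POP
POP stands for Post Office Protocol. It is used to retrieve email, and the current version, POP3, uses TCP port 110.
See RFC 1939, http://tools.ietf.org/html/rfc1939
Common commands
Most mail servers authenticate with a plaintext user name and password.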
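Command  Arguments    State          Description
------------------------------------------
USER     username     Authorization  Used with PASS below; if both succeed, the session changes state
PASS     password     Authorization
APOP     name digest  Authorization  digest is an MD5 message digest
------------------------------------------
STAT     none         Transaction    Asks the server for mailbox statistics: the number of messages and their total size in bytes
UIDL     [msg#]       Transaction    Returns the unique identifier of a message; identifiers are unique within the mailbox
LIST     [msg#]       Transaction    Returns the number of messages and the size of each one
RETR     msg#         Transaction    Returns the full text of the given message
DELE     msg#         Transaction    Marks the given message as deleted; the deletion is carried out by QUIT
RSET     none         Transaction    Clears all deletion marks; used to undo DELE
TOP      msg# n       Transaction    Returns the header and the first n body lines of the given message; n must be a non-negative integer
NOOP     none         Transaction    The server returns a positive response
------------------------------------------
QUIT     none         Update         Removes the messages marked as deleted and closes the session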
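The digest argument of APOP is, per RFC 1939, the MD5 hash of the timestamp from the server greeting followed by a secret shared between client and server. A minimal sketch, in which the timestamp, user name, and secret are made-up values:

```python
import hashlib

def apop_digest(timestamp: str, secret: str) -> str:
    """MD5 over the greeting timestamp (angle brackets included) plus the secret."""
    return hashlib.md5((timestamp + secret).encode("ascii")).hexdigest()

# Hypothetical values: a server greeting such as
#   +OK POP3 server ready <1234.5678@pop.example.com>
# supplies the timestamp; "alice" and "secret" are placeholders.
greeting_timestamp = "<1234.5678@pop.example.com>"
command = "APOP alice " + apop_digest(greeting_timestamp, "secret")
```

Because the timestamp changes with every connection, the digest differs each time and the password itself never crosses the wire, unlike with USER and PASS.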
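[Note] Messages marked as deleted are only actually removed after the QUIT command is issued. If the connection is interrupted and QUIT is never sent, the messages are not deleted even though DELE was executed.
After the client issues a command such as RETR 305, the server returns the data immediately; the data may be split across several packets sent back to back. The message content ends with <CRLF>.<CRLF>.
For example: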
+OK 2281 octets
Received: from mail-pz0-f178.google.com ([209.85.222.178])
by oa.legendsec.com (Lotus Domino Release 6.5.3)
with ESMTP id 2009063010503284-48548 ;
Tue, 30 Jun 2009 10:50:32 +0800
Received: by pzk8 with SMTP id 8so621168pzk.28
for <gaoxl@legendsec.com>; Mon, 29 Jun 2009 19:50:21 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
.............
MIME-Version: 1.0
Received: by 10.142.139.9 with SMTP id m9mr316739wfd.174.1246330221459; Mon,
29 Jun 2009 19:50:21 -0700 (PDT)
Date: Tue, 30 Jun 2009 10:50:21 +0800
Message-ID: <3627518b0906291950v104c242
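The message content has to be parsed out of the returned message data.
The message format is the same as for mail sent over SMTP and is described in the MIME section below.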
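Pulling the message out of a RETR response can be sketched like this: check the +OK status line, stop at the terminating "." line, undo the byte-stuffing that RFC 1939 prescribes for lines starting with ".", and hand the rest to Python's email parser. The function name parse_retr_response and the sample response are assumptions for illustration.

```python
from email import message_from_bytes

def parse_retr_response(raw: bytes):
    """Return (status line, parsed message) from a complete RETR response."""
    lines = raw.split(b"\r\n")
    status = lines[0].decode("ascii", errors="replace")
    if not status.startswith("+OK"):
        raise ValueError("server reported an error: " + status)
    content = []
    for line in lines[1:]:
        if line == b".":              # terminating line of the response
            break
        if line.startswith(b"."):     # byte-stuffed line: drop the extra dot
            line = line[1:]
        content.append(line)
    return status, message_from_bytes(b"\r\n".join(content) + b"\r\n")

# Hypothetical response, much shorter than a real one.
sample = (b"+OK 120 octets\r\n"
          b"Subject: test\r\n"
          b"MIME-Version: 1.0\r\n"
          b"\r\n"
          b"line one\r\n"
          b"..hidden dot\r\n"
          b".\r\n")
status, msg = parse_retr_response(sample)
```

The sketch assumes the whole response has already been read from the socket; a real client would keep reading packets until it sees the terminating line.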
3. MIME
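MIME is described in detail in the RFC documents (RFC 2045 through RFC 2049).
3.1. MIME message format
See:
RFC 4021, Registration of Mail and MIME Header Fields,
http://www.apps.ietf.org/rfc/rfc4021.html
Overall, a MIME message consists of two major parts: a header and a body. Here we call them the message header and the message body.
3.1.1. Message header
The message header carries important information such as the sender, recipients, subject, date, MIME version, and content type. Each piece of information is called a field and consists of the field name followed by ": " and the field content. A field can fit on one line, and a long one can span several lines. The first line of a field must start at the left margin, with no leading whitespace (space or tab). Continuation lines must begin with a whitespace character. When a field is unfolded, only the line break in front of that whitespace is removed (RFC 2822, section 2.2.3).
Blank lines are not allowed inside the message header. Some messages cannot be recognized by mail client software, which then shows the raw source, precisely because their first line is blank.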
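Reading folded header fields can be sketched as below. RFC 2822 defines unfolding as removing any line break that is immediately followed by whitespace; the function name unfold_headers and the sample header block are assumptions for this illustration.

```python
import re

def unfold_headers(header_block: str) -> dict:
    """Unfold continuation lines, then split each field at its first colon.

    Repeated fields (for example Received) overwrite each other in this
    simplified sketch; a real parser keeps all of them.
    """
    # Remove every line break that is directly followed by a space or tab.
    unfolded = re.sub(r"\r?\n(?=[ \t])", "", header_block)
    fields = {}
    for line in unfolded.splitlines():
        if not line:
            break  # the first blank line ends the header
        name, _, value = line.partition(":")
        fields[name.strip().lower()] = value.strip()
    return fields

block = ('Subject: attach\r\n'
         'Content-Type: multipart/mixed;\r\n'
         '\tboundary="b1"\r\n'
         '\r\n'
         'body text')
fields = unfold_headers(block)
```

Because parsing stops at the first blank line, a message whose very first line is blank yields no fields at all, which matches the symptom of clients showing the raw source.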
For example, common header fields look like this:
Date: Mon, 29 Jun 2009 18:39:03 +0800
From: "=?gb2312?B?26zQocHB?=" <gaoxl@legendsec.com>
To: "moreorless" <moreorless@live.cn>
Cc: "gxl0620" <gxl0620@163.com>
BCC: "=?gb2312?B?26zQocHB?=" <venus.oso@gmail.com>
Subject: attach
Message-ID: <200906291839032504254@legendsec.com>
X-mailer: Foxmail 6, 15, 201, 21 [cn]
Mime-Version: 1.0
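Date:     date
From:     sender
To:       recipients
Cc:       carbon-copy recipients
Bcc:      blind-carbon-copy recipients
Subject:  subject
X-mailer: name of the client program
Non-standard, user-defined field names all begin with X-, for example X-Mailer and X-MSMail-Priority. Their meaning is usually understood only when the same program both sends and receives the message.
On blind carbon copies, there are three ways to handle the Bcc field:
1. Before the message is sent, the Bcc line is removed from every copy, including the copies for the Bcc recipients.
2. Before the message is sent, the Bcc line is removed from the copies for the To and Cc recipients; only the Bcc recipients receive a copy containing the field. With several Bcc recipients, the field may list all Bcc addresses or only the recipient's own address.
3. The Bcc field is sent without any addresses, which only tells the recipients that blind copies went to someone.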
The "Bcc:" field (where the "Bcc" means "Blind Carbon Copy") contains addresses of recipients of the message whose addresses are not to be revealed to other recipients of the message. There are three ways in which the "Bcc:" field is used.
In the first case, when a message containing a "Bcc:" field is prepared to be sent, the "Bcc:" line is removed even though all of the recipients (including those specified in the "Bcc:" field) are sent a copy of the message.
In the second case, recipients specified in the "To:" and "Cc:" lines each are sent a copy of the message with the "Bcc:" line removed as above, but the recipients on the "Bcc:" line get a separate copy of the message containing a "Bcc:" line. (When there are multiple recipient addresses in the "Bcc:" field, some implementations actually send a separate copy of the message to each recipient with a "Bcc:" containing only the address of that particular recipient.)
Finally, since a "Bcc:" field may contain no addresses, a "Bcc:" field can be sent without any addresses indicating to the recipients that blind copies were sent to someone. Which method to use with "Bcc:" fields is implementation dependent, but refer to the "Security Considerations" section of this document for a discussion of each.
(Source: http://www.apps.ietf.org/rfc/rfc2822.html#sec-3.6.3)
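3.1.2. Message body
Within the message body, roughly the following fields are used:
Field name                 Meaning
Content-Type               Type of the part body
Content-Transfer-Encoding  Transfer encoding of the part body
Content-Disposition        Disposition of the part body
Content-ID                 ID of the part body
Content-Location           Location (path) of the part body
Content-Base               Base location of the part body
Some fields carry parameters in addition to their value. The value and the parameters, and the parameters among themselves, are separated by ";". A parameter name and its value are separated by "=".
The message body holds the content of the message; its type is given by the Content-Type field in the message header. Common simple types are text/plain (plain text) and text/html (hypertext).
The multipart types are the heart of MIME mail. The message body is divided into several parts, and each part in turn has a part header and a part body, which are also separated by a blank line. Three multipart types are common: multipart/mixed, multipart/related, and multipart/alternative. Their names suggest what each one means and is used for. Their nesting can be summarized as shown in the figure below:
As can be seen, a message with attachments must define a multipart/mixed part; a message with inline (embedded) resources must define at least a multipart/related part; and a message that carries both plain text and hypertext must define at least a multipart/alternative part.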
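The nesting of multipart types can be observed by building a small message with Python's standard email package. This is a sketch under assumed content: the subject, body strings, and attachment bytes are placeholders, and multipart/related is left out to keep it short.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "attach"
# Start with a plain-text body.
msg.set_content("plain text body")
# Adding an HTML version wraps both bodies in multipart/alternative.
msg.add_alternative("<p>HTML body</p>", subtype="html")
# Adding an attachment wraps everything in multipart/mixed.
msg.add_attachment(b"abcdedf", maintype="application",
                   subtype="octet-stream", filename="readme.txt")

# Walk the tree depth-first and record each part's content type.
structure = [part.get_content_type() for part in msg.walk()]
attachment_names = [part.get_filename() for part in msg.iter_attachments()]
```

Here structure lists multipart/mixed first, then multipart/alternative with its text/plain and text/html children, and finally the application/octet-stream attachment.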
Message text
Content-Type: text/plain;
charset="gb2312"
Content-Transfer-Encoding: base64
DQoNCjIwMDktMDctMDEgDQoNCg0KDQrbrNChwcEgDQo=
The message text above uses the gb2312 character set and base64 encoding.
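Decoding such a part takes two steps: undo the transfer encoding, then apply the character set. A minimal sketch with Python's email package, feeding it the fragment shown above with the blank line between part header and part body written out explicitly:

```python
from email import message_from_string

fragment = (
    'Content-Type: text/plain;\n'
    '\tcharset="gb2312"\n'
    'Content-Transfer-Encoding: base64\n'
    '\n'
    'DQoNCjIwMDktMDctMDEgDQoNCg0KDQrbrNChwcEgDQo=\n'
)
part = message_from_string(fragment)

content_type = part.get_content_type()   # the field value, "text/plain"
charset = part.get_param("charset")      # the parameter after ";"
raw = part.get_payload(decode=True)      # bytes after undoing base64
text = raw.decode(charset)               # text after applying gb2312
```

The decoded text holds a date line followed by a short Chinese signature, separated by blank lines.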
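Handling attachments
multipart/mixed: the parts of the document are mixed, which describes the relationship between the message text and its attachments. A MIME type of multipart/mixed means that the message carries attachments.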
Content-Disposition Intended content disposition and file name
Indicates whether a MIME body part is to be shown inline or is an attachment; can also indicate a suggested filename for use when saving an attachment to a file.
Examples:
1. Attachment name: readme.txt
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
filename="readme.txt"
2. Attachment name: 邮件内容.txt (a Chinese file name)
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
filename="=?gb2312?B?08q8/sTayN0udHh0?="
The value after filename is the encoded attachment name, here an encoded-word using gb2312 and Base64.
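3.2. MIME encoding
See RFC 2047, MIME Part Three: Message Header Extensions for Non-ASCII Text
http://tools.ietf.org/html/rfc2047
The two MIME encoding methods:
Mail was first encoded because many gateways on the Internet could not correctly carry 8-bit characters such as Chinese characters. Encoding works by converting 8-bit content into a 7-bit form that can be transferred correctly; the receiver then restores it to the original 8-bit content.
MIME stands for Multipurpose Internet Mail Extensions. Before MIME, mail was encoded with schemes such as UUENCODE, but because MIME's algorithms are simple and easy to extend, it has become the mainstream way to encode mail. It is used not only to carry 8-bit characters but also to carry binary files, such as images and audio in attachments, and many applications have been built on top of MIME.
In terms of encoding, MIME defines two methods: Base64 and QP (Quoted-Printable).
3.2.1. Base64
Base64 is a general-purpose method with a simple principle: every three bytes of data are represented by four bytes. Each of those four bytes carries only 6 bits of data, so the restriction to 7-bit transport no longer matters. Base64 is usually abbreviated "B".
Base64 encodes an input string or block of data into a string made up only of the 64 characters {'A'-'Z', 'a'-'z', '0'-'9', '+', '/'}, with '=' used for padding. The encoder takes the input stream 6 bits at a time, uses the 6-bit value (0-63) as an index into a table, and outputs the corresponding character. Every 3 bytes are thus encoded as 4 characters (3×8 → 4×6), and a group shorter than 4 characters is padded with '='. In practice the encoder places the bytes in order into a 24-bit buffer, fills missing bytes with zeros, cuts the buffer into four 6-bit groups, high bits first, and represents each group with one of the 64 characters. If only one or two input bytes remain, the output is padded with "=". This keeps appended data from corrupting the encoding.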
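The 24-bit buffer procedure can be written out directly. The sketch below is for illustration only (the name b64_encode is an assumption); in practice Python's base64 module does the same job.

```python
import base64

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def b64_encode(data: bytes) -> str:
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        # Load up to three bytes into a 24-bit buffer, zero-filling the rest.
        buf = int.from_bytes(chunk + b"\x00" * (3 - len(chunk)), "big")
        # Cut the buffer into four 6-bit table indexes, high bits first.
        chars = [ALPHABET[(buf >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # 1 input byte keeps 2 characters, 2 bytes keep 3, 3 bytes keep 4;
        # the remainder of the group is "=" padding.
        keep = len(chunk) + 1
        out.append("".join(chars[:keep]) + "=" * (4 - keep))
    return "".join(out)

encoded = b64_encode(b"abcdedf")
same_as_stdlib = encoded == base64.b64encode(b"abcdedf").decode("ascii")
```

Seven input bytes make two full groups plus one leftover byte, so the result ends in two "=" characters.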
3.2.2. QP
The other method is QP (Quoted-Printable), usually abbreviated "Q". It represents an 8-bit character as two hexadecimal digits preceded by "=". Text encoded with QP therefore typically looks like this: =B3=C2=BF=A1=C7=E5=A3=AC=C4=FA=BA=C3=A3=A1.
QP requires that no encoded line be longer than 76 characters. When a line would exceed this limit, a soft line break is used: an "=" marks the end of the encoded line and is followed by CRLF. (The 76-character limit includes the "=".)
The equal sign "=" itself is encoded as "=3D".
A tab or space at the end of a line must be encoded as "=09" (tab) or "=20" (space).
Any 8-bit byte value may be encoded with 3 characters, an "=" followed by two hexadecimal digits (0–9 or A–F) representing the byte's numeric value. For example, a US-ASCII form feed character (decimal value 12) can be represented by "=0C", and a US-ASCII equal sign (decimal value 61) is represented by "=3D". All characters except printable ASCII characters or end of line characters must be encoded in this fashion.
All printable ASCII characters (decimal values between 33 and 126) may be represented by themselves, except "=" (decimal 61).
ASCII tab and space characters, decimal values 9 and 32, may be represented by themselves, except if these characters appear at the end of a line. If one of these characters appears at the end of a line it must be encoded as "=09" (tab) or "=20" (space).
If the data being encoded contains meaningful line breaks, they must be encoded as an ASCII CR LF sequence, not as their original byte values. Conversely if byte values 10 and 13 have meanings other than end of line then they must be encoded as =0A and =0D.
Lines of quoted-printable encoded data must not be longer than 76 characters. To satisfy this requirement without altering the encoded text, soft line breaks may be added as desired. A soft line break consists of an "=" at the end of an encoded line, and does not cause a line break in the decoded text.
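The per-byte rules quoted above can be sketched for a single line of data. The function name qp_encode_line is an assumption, and soft line breaks for the 76-character limit are deliberately left out; Python's quopri module is used only to check that the result decodes back.

```python
import quopri

def qp_encode_line(line: bytes) -> str:
    """Quoted-printable encode one line that contains no line breaks."""
    out = []
    last = len(line) - 1
    for i, byte in enumerate(line):
        if byte in (9, 32):
            # Tab and space stay literal unless they end the line.
            out.append("=%02X" % byte if i == last else chr(byte))
        elif 33 <= byte <= 126 and byte != 61:
            # Printable ASCII other than "=" represents itself.
            out.append(chr(byte))
        else:
            # Everything else becomes "=" plus two uppercase hex digits.
            out.append("=%02X" % byte)
    return "".join(out)

data = b"x = \xd6\xd0\t"
encoded = qp_encode_line(data)
decoded = quopri.decodestring(encoded.encode("ascii"))
```

For mostly-ASCII text this keeps the output readable, which is the usual reason to prefer QP over Base64 in such cases.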
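Encoded-word format: encoded-word = "=?" charset "?" encoding "?" encoded-text "?="
The encoded information is enclosed between "=?" and "?=". The character set name follows "=?", the encoding follows the next "?", and the encoded string follows the "?" after that. Neither the character set nor the encoding is case-sensitive.
The character set can be any character set the system supports (iso-8859-1, utf-8, gb2312, gbk, gb18030, ...).
There are two encodings: "B" or "b" stands for Base64, and "Q" or "q" stands for QP.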
Generally, an "encoded-word" is a sequence of printable ASCII characters that begins with "=?", ends with "?=", and has two "?"s in between. It specifies a character set and an encoding method, and also includes the original text encoded as graphic ASCII characters, according to the rules for that encoding method.
Here is an example:
Subject: =?gb2312?B?xOO6w6Oh?=
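In a subject like this one, the value need not be encoded in full; only the part enclosed by the =? and ?= markers is encoded. The text after =? says that the character set is GB2312, and the B after the next ? says that Base64 encoding is used.
Another example: =?iso-8859-1?q?this=20is=20some=20text?=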
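Encoded-words like these can be decoded with decode_header from Python's standard email.header module. A minimal sketch; the wrapper name decode_mime_header is an assumption.

```python
from email.header import decode_header

def decode_mime_header(value: str) -> str:
    """Decode every encoded-word in a header value into one text string."""
    parts = []
    for chunk, charset in decode_header(value):
        if isinstance(chunk, bytes):
            # Encoded-words come back as bytes plus their character set;
            # unencoded bytes carry no charset and are treated as ASCII.
            parts.append(chunk.decode(charset or "ascii"))
        else:
            # A value without any encoded-word is returned as text.
            parts.append(chunk)
    return "".join(parts)

subject_b = decode_mime_header("=?GB2312?B?1tDOxA==?=")
subject_q = decode_mime_header("=?iso-8859-1?q?this=20is=20some=20text?=")
```

The same function handles both the "B" and the "Q" form, because the encoded-word itself states which encoding and character set apply.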
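4. The relationship between SMTP and MIME
As the figure above shows, the sender and recipient addresses each appear twice: once in the SMTP commands (the SMTP email address) and once in the message itself (the MIME email address). Two points to note:
1. The message itself may include display names for the sender and recipients; the SMTP commands may not.
2. The Bcc addresses do not necessarily appear in the message itself. Clients differ in how they implement this.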
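The split between the two sets of addresses can be sketched with Python's email package: bare addresses are collected for the RCPT TO commands, and the Bcc field is then dropped from the message itself (the first of the Bcc handling options). All names and addresses below are placeholders, and envelope_recipients is a hypothetical helper.

```python
from email.message import EmailMessage
from email.utils import getaddresses

msg = EmailMessage()
msg["From"] = "Sender Name <sender@example.com>"
msg["To"] = "Alice <alice@example.com>"
msg["Cc"] = "Bob <bob@example.com>"
msg["Bcc"] = "Carol <carol@example.com>"
msg["Subject"] = "attach"
msg.set_content("hello")

def envelope_recipients(message):
    """Bare addresses for RCPT TO, gathered from To, Cc and Bcc."""
    values = []
    for field in ("To", "Cc", "Bcc"):
        values.extend(str(v) for v in message.get_all(field, []))
    # getaddresses returns (display name, address) pairs; keep the address.
    return [addr for _name, addr in getaddresses(values)]

rcpt_to = envelope_recipients(msg)   # what the SMTP commands carry
del msg["Bcc"]                       # what the message itself no longer shows
```

After this, the Bcc recipient is still in rcpt_to and will receive the mail, but nothing in the transmitted message reveals that address.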
5. Some test data
1. UTF-8
1. Subject: smtp_utf8测试 (the Chinese suffix means "test")
From - Tue Jun 30 18:13:22 2009
X-Mozilla-Status: 0001
X-Mozilla-Status2: 00800000
X-Mozilla-Keys:
Message-ID: <4A49E538.7090508@163.com>
Date: Tue, 30 Jun 2009 18:13:12 +0800
From: =?UTF-8?B?6YOc5bCP5Lqu?= <gxl0620@163.com>
User-Agent: Thunderbird 2.0.0.22 (Windows/20090605)
MIME-Version: 1.0
To: gaoxl@legendsec.com
Subject: =?UTF-8?B?c210cF91dGY45rWL6K+V?=
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
2. Subject: smtp_utf8
From - Tue Jun 30 18:13:22 2009
X-Mozilla-Status: 0001
X-Mozilla-Status2: 00800000
X-Mozilla-Keys:
Message-ID: <4A49E538.7090508@163.com>
Date: Tue, 30 Jun 2009 18:13:12 +0800
From: =?UTF-8?B?6YOc5bCP5Lqu?= <gxl0620@163.com>
User-Agent: Thunderbird 2.0.0.22 (Windows/20090605)
MIME-Version: 1.0
To: gaoxl@legendsec.com
Subject: =?UTF-8?B?c210cF91dGY45rWL6K+V?=
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
3. Subject: smtp (no encoding needed; sent as 7bit)
From - Tue Jun 30 18:19:25 2009
X-Mozilla-Status: 0001
X-Mozilla-Status2: 00800000
X-Mozilla-Keys:
Message-ID: <4A49E6AD.80205@163.com>
Date: Tue, 30 Jun 2009 18:19:25 +0800
From: =?UTF-8?B?6YOc5bCP5Lqu?= <gxl0620@163.com>
User-Agent: Thunderbird 2.0.0.22 (Windows/20090605)
MIME-Version: 1.0
To: gaoxl@legendsec.com
Subject: smtp
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
2. GB2312
Subject: 中文 ("Chinese")
From - Tue Jun 30 18:32:03 2009
X-Mozilla-Status: 0001
X-Mozilla-Status2: 00800000
X-Mozilla-Keys:
Message-ID: <4A49E9A2.10500@163.com>
Date: Tue, 30 Jun 2009 18:32:02 +0800
From: =?GB2312?B?26zQocHB?= <gxl0620@163.com>
User-Agent: Thunderbird 2.0.0.22 (Windows/20090605)
MIME-Version: 1.0
To: gaoxl@legendsec.com
Subject: =?GB2312?B?1tDOxA==?=
Content-Type: text/plain; charset=GB2312
Content-Transfer-Encoding: 7bit
3. GB18030
Subject: 中文 ("Chinese")
From - Tue Jun 30 18:33:47 2009
X-Mozilla-Status: 0001
X-Mozilla-Status2: 00800000
X-Mozilla-Keys:
Message-ID: <4A49EA0B.80900@163.com>
Date: Tue, 30 Jun 2009 18:33:47 +0800
From: =?gb18030?Q?=DB=AC=D0=A1=C1=C1?= <gxl0620@163.com>
User-Agent: Thunderbird 2.0.0.22 (Windows/20090605)
MIME-Version: 1.0
To: gaoxl@legendsec.com
Subject: =?gb18030?Q?=D6=D0=CE=C4?=
Content-Type: text/plain; charset=GB18030; format=flowed
Content-Transfer-Encoding: 7bit
6. The complete MIME content of one message
Date: Mon, 29 Jun 2009 18:39:03 +0800
From: "=?gb2312?B?26zQocHB?=" <gaoxl@legendsec.com>
To: "moreorless" <moreorless@live.cn>
Cc: "gxl0620" <gxl0620@163.com>
BCC: "=?gb2312?B?26zQocHB?=" <venus.oso@gmail.com>
Subject: attach
Message-ID: <200906291839032504254@legendsec.com>
X-mailer: Foxmail 6, 15, 201, 21 [cn]
Mime-Version: 1.0
Content-Type: multipart/mixed;
boundary="=====001_Dragon777814155473_====="
This is a multi-part message in MIME format.
--=====001_Dragon777814155473_=====
Content-Type: multipart/alternative;
boundary="=====003_Dragon777814155473_====="
--=====003_Dragon777814155473_=====
Content-Type: text/plain;
charset="gb2312"
Content-Transfer-Encoding: base64
DQoNCjIwMDktMDYtMjkgDQoNCg0KDQrbrNChwcEgDQo=
--=====003_Dragon777814155473_=====
Content-Type: text/html;
charset="gb2312"
Content-Transfer-Encoding: base64
PCFET0NUWVBFIEhUTUwgUFVCTElDICItLy9XM0MvL0RURCBIVE1MIDQuMCBUcmFuc2l0aW9uYWwv
L0VOIj4NCjxIVE1MPjxIRUFEPg0KPE1FVEEgY29udGVudD0idGV4dC9odG1sOyBjaGFyc2V0PWdi
MjMxMiIgaHR0cC1lcXVpdj1Db250ZW50LVR5cGU+DQo8TUVUQSBuYW1lPUdFTkVSQVRPUiBjb250
ZW50PSJNU0hUTUwgOC4wMC42MDAxLjE4NzAyIj48TElOSyByZWw9c3R5bGVzaGVldCANCmhyZWY9
IkJMT0NLUVVPVEV7bWFyZ2luLVRvcDogMHB4OyBtYXJnaW4tQm90dG9tOiAwcHg7IG1hcmdpbi1M
ZWZ0OiAyZW19Ij48L0hFQUQ+DQo8Qk9EWSBzdHlsZT0iTUFSR0lOOiAxMHB4OyBGT05ULUZBTUlM
WTogdmVyZGFuYTsgRk9OVC1TSVpFOiAxMHB0Ij4NCjxESVY+PEZPTlQgc2l6ZT0yIGZhY2U9VmVy
ZGFuYT48L0ZPTlQ+Jm5ic3A7PC9ESVY+DQo8RElWPjxGT05UIHNpemU9MiBmYWNlPVZlcmRhbmE+
PC9GT05UPiZuYnNwOzwvRElWPg0KPERJViBhbGlnbj1sZWZ0PjxGT05UIGNvbG9yPSNjMGMwYzAg
c2l6ZT0yIGZhY2U9VmVyZGFuYT4yMDA5LTA2LTI5IA0KPC9GT05UPjwvRElWPjxGT05UIHNpemU9
MiBmYWNlPVZlcmRhbmE+DQo8SFIgc3R5bGU9IldJRFRIOiAxMjJweDsgSEVJR0hUOiAycHgiIGFs
aWduPWxlZnQgU0laRT0yPg0KDQo8RElWPjxGT05UIGNvbG9yPSNjMGMwYzAgc2l6ZT0yIGZhY2U9
VmVyZGFuYT48U1BBTj7brNChwcE8L1NQQU4+IA0KPC9GT05UPjwvRElWPjwvRk9OVD48L0JPRFk+
PC9IVE1MPg0K
--=====003_Dragon777814155473_=====--
--=====001_Dragon777814155473_=====
Content-Type: application/octet-stream;
name="readme.txt"
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
filename="readme.txt"
YWJjZGVkZg==
--=====001_Dragon777814155473_=====--