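Search Baidu for a regular expression that matches email addresses and you will find all sorts of wildly different patterns. If you pick one without checking it carefully, it may well fail to recognize some unusual but valid address formats. This post therefore analyzes the relevant mail standard, RFC 5322, Internet Message Format (see [2-RFC5322]). Before that, however, we need to learn the core ABNF rules defined in [1-RFC5234].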
[1-RFC5234], Appendix B. Core ABNF of ABNF
B.1. Core Rules
ALPHA = %x41-5A / %x61-7A ; A-Z / a-z
BIT = "0" / "1"
CHAR = %x01-7F ; any 7-bit US-ASCII character, excluding NUL
CR = %x0D ; carriage return
CRLF = CR LF ; Internet standard newline
CTL = %x00-1F / %x7F ; controls
DIGIT = %x30-39 ; 0-9
DQUOTE = %x22 ; " (Double Quote)
HEXDIG = DIGIT / "A" / "B" / "C" / "D" / "E" / "F"
HTAB = %x09 ; horizontal tab
LF = %x0A ; linefeed
LWSP = *(WSP / CRLF WSP)
; Use of this linear-white-space rule
; permits lines containing only white
; space that are no longer legal in
; mail headers and have caused
; interoperability problems in other
; contexts.
; Do not use when defining mail
; headers and use with caution in
; other contexts.
OCTET = %x00-FF ; 8 bits of data
SP = %x20 ; space
VCHAR = %x21-7E ; visible (printing) characters
WSP = SP / HTAB ; white space
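Since the goal is to build an email regex later, it may help to see how these core rules translate into regular-expression fragments. The following is only a minimal sketch: the `CORE` dictionary and the `matches` helper are hypothetical names introduced here for illustration and appear in neither RFC. Note that ABNF quoted strings are case-insensitive, so HEXDIG is assumed to accept lowercase a-f as well.

```python
import re

# RFC 5234 Appendix B.1 core rules as regex fragments.
# CORE and matches() are illustrative names, not defined by any RFC.
CORE = {
    "ALPHA": r"[A-Za-z]",
    "BIT": r"[01]",
    "CHAR": r"[\x01-\x7F]",          # any 7-bit US-ASCII character except NUL
    "CR": r"\x0D",
    "LF": r"\x0A",
    "CRLF": r"\x0D\x0A",
    "CTL": r"[\x00-\x1F\x7F]",
    "DIGIT": r"[0-9]",
    "DQUOTE": r"\x22",
    "HEXDIG": r"[0-9A-Fa-f]",        # ABNF literals "A".."F" are case-insensitive
    "HTAB": r"\x09",
    "SP": r"\x20",
    "VCHAR": r"[\x21-\x7E]",         # visible (printing) characters
    "WSP": r"[\x20\x09]",
    "LWSP": r"(?:[\x20\x09]|\x0D\x0A[\x20\x09])*",  # *(WSP / CRLF WSP)
    "OCTET": r"[\x00-\xFF]",
}


def matches(rule: str, text: str) -> bool:
    """Return True if the whole of `text` matches the named core rule."""
    return re.fullmatch(CORE[rule], text) is not None


if __name__ == "__main__":
    print(matches("ALPHA", "a"))      # a letter
    print(matches("VCHAR", " "))      # space is not a visible character
    print(matches("LWSP", " \r\n\t")) # WSP followed by CRLF WSP
```

Using `re.fullmatch` rather than `re.match` keeps a fragment from accepting a string that merely starts with a valid character, which is the kind of mistake that loosely written email patterns tend to make.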
The following material comes from [2-RFC5322]; section numbers are kept the same as in the original document.