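Network Working Group                                           E. Nebel
Request For Comments: 1867                                   L. Masinter
Category: Experimental                                 Xerox Corporation
                                                           November 1995

                     Form-based File Upload in HTML
                                (RFC 1867)

Status of this Memo

This memo describes an experimental protocol for the Internet community. It does not specify an Internet standard of any kind; discussion and suggestions for improvement are requested. Distribution of this memo is unlimited.

Table of Contents

1. Abstract
2. HTML forms with file submission
3. Suggested implementation
3.1 Display of FILE widget
3.2 Action on submit
3.3 Use of multipart/form-data
3.4 Interpretation of other attributes
4. Backward compatibility issues
5. Other considerations
5.1 Compression, encryption
5.2 Deferred file transmission
5.3 Other choices for return transmission of binary data
5.4 Not overloading <INPUT>
5.5 Default content-type of field data
5.6 Allow form ACTION to be "mailto:"
5.7 Remote files with third-party transfer
5.8 File transfer with ENCTYPE=x-www-form-urlencoded
5.9 CRLF used as line separator
5.10 Relationship to multipart/related
5.11 Non-ASCII field names
6. Examples
7. Registration of multipart/form-data
8. Security Considerations
9. Conclusion
Authors' Addresses
A. Media type registration for multipart/form-data
References

1. Abstract

At present, HTML forms let the author of a form obtain information from the user viewing the form. Forms have proven very useful in many applications that need input from the user. This capability is limited, however, because HTML forms provide no way for the user to upload files or data. Service providers who need files from their users have therefore had to build their own applications. (Examples of such custom browsers can be found on the www-talk mailing list.) Since file upload is a feature that would benefit many applications, this memo proposes an extension to HTML that lets information providers express file upload requests uniformly, together with a MIME-compatible representation for file upload responses. It also describes a backward compatibility strategy that allows new servers to interact with existing HTML clients.

This proposal is independent of the existing versions of HTML.

2. HTML forms with file submission

The current HTML specification defines eight possible values for the TYPE attribute of the INPUT element: CHECKBOX, HIDDEN, IMAGE, PASSWORD, RADIO, RESET, SUBMIT, TEXT. In addition, when a form uses the POST method, its ENCTYPE attribute defaults to "application/x-www-form-urlencoded".

This proposal makes two changes to HTML:

1) Add a FILE option for the TYPE attribute of the INPUT element.

2) Allow the INPUT tag to carry an ACCEPT attribute, which lists the media types or type patterns that may be uploaded.

In addition, this proposal defines a new MIME type, multipart/form-data, and specifies how a form with ENCTYPE="multipart/form-data" and/or <INPUT type="file"> tags should be handled.

These changes can be considered independently, but all of them are necessary for reasonable file upload.

For example, an HTML form author who wants the user to upload one or more files might write:

    <FORM ENCTYPE="multipart/form-data" ACTION="_URL_" METHOD=POST>
    File to process: <INPUT NAME="userfile1" TYPE="file">
    <INPUT TYPE="submit" VALUE="Send File">
    </FORM>

The change required in the HTML DTD is to add one option to the InputType entity. In addition, it is proposed that the INPUT tag have an ACCEPT attribute holding a comma-separated list of media types.

    ... (other elements) ...

    <!ENTITY % InputType "(TEXT | PASSWORD | CHECKBOX |
                           RADIO | SUBMIT | RESET |
                           IMAGE | HIDDEN | FILE )">
    <!ELEMENT INPUT - O EMPTY>
    <!ATTLIST INPUT
            TYPE %InputType TEXT
            NAME CDATA #IMPLIED  -- required for all but submit and reset --
            VALUE CDATA #IMPLIED
            SRC %URI #IMPLIED  -- for image inputs --
            CHECKED (CHECKED) #IMPLIED
            SIZE CDATA #IMPLIED  -- like NUMBERS, but delimited with comma, not space --
            MAXLENGTH NUMBER #IMPLIED
            ALIGN (top|middle|bottom) #IMPLIED
            ACCEPT CDATA #IMPLIED  -- list of content types --
            >

    ... (other elements) ...

3. Suggested implementation

User agents have many ways to choose the most appropriate mechanism for interpreting HTML. This section suggests how one class of user agent, WWW browsers, might implement file upload.

3.1 Display of FILE widget

When the browser encounters an INPUT tag of type FILE, it might display a file name (or the previously selected file names) and a "Browse" button or a similar selection method. Selecting the "Browse" button would cause the browser to enter the file selection mode appropriate for its platform. A Windows-based browser, for example, might pop up a file selection window. In that dialog the user could replace the current selection, add a new file to the selection, and so on. Browser implementors might choose whether the list of selected file names can be edited by hand.

If the tag has an ACCEPT attribute, the browser might also restrict the selection to the file types that correspond to it on the platform.

3.2 Action on submit

When the user has completed the form and selects the SUBMIT element, the browser should send the form data and the content of the selected files. The application/x-www-form-urlencoded encoding is inefficient for sending large quantities of binary data or text containing non-ASCII characters. A new media type, multipart/form-data, is therefore proposed as an efficient way of sending the contents of a filled-out form from client to server.

3.3 Use of multipart/form-data

Section 7 gives the definition of multipart/form-data. A boundary is chosen that does not occur in any of the data. (This choice is sometimes made probabilistically.) Each field of the form is sent, in the order in which it occurs in the form, as one part of the multipart stream. Each part is identified by the name of its INPUT tag in the HTML form. If the media type of a part is known (for example, from the file extension or from operating system type information), the part is labelled with that type; otherwise it is labelled application/octet-stream.

If multiple files are selected for upload, they should be transferred together using the multipart/mixed format.

Although the HTTP protocol can transport arbitrary binary data, the default for mail transport (for example, when the form's ACTION is a "mailto:" URL) is the 7BIT encoding. If the content of a part does not conform to the default encoding, it needs to be encoded and a "content-transfer-encoding" header added. (See section 5 of RFC 1521 for details.)

The original name of the uploaded file may be sent as well, either as a 'filename' parameter of the 'content-disposition: form-data' header or, when multiple files are sent, in a 'content-disposition: file' header of the subpart. The client application should make a best effort to supply the file name. If the file name on the client's operating system contains non-US-ASCII characters, it might be approximated or encoded using the method described in RFC 1522. The file name is a convenience in cases where the uploaded files refer to each other, for example a TeX file and its .sty auxiliary style description.

On the server side, the ACTION might point to an HTTP URL that implements the form handling via CGI. In that case the CGI program would note that the content type is multipart/form-data and process the various fields (checking validity, writing the file data to local files for subsequent processing, and so on).

3.4 Interpretation of other attributes

An <INPUT TYPE=file> tag might carry a VALUE attribute that specifies a default file name. This use is probably platform dependent, but it can be useful. In a sequence of several transactions, for example, it avoids prompting the user for the same file name over and over again.

The SIZE attribute might be specified as SIZE=width,height, where width is a default for the file name width and height is the size of the area that displays the list of selected files. This is useful, for example, for form designers who expect several files and want the browser to show a multiline file input field (with a "Browse" button beside it). When no height is given, a one-line input field would be shown (when the form designer expects only one file); when the height is greater than 1, a multiline field with scrollbars would be shown (when the form designer expects several files).

4. Backward compatibility issues

Although it is not necessary for the successful adoption of an enhancement to the current WWW form mechanism, it is useful to plan a migration strategy: users with older browsers can still take part in file upload by using a helper application. Most current browsers, when they encounter <INPUT TYPE=FILE>, treat it as <INPUT TYPE=TEXT> and give the user a text box into which a file name can be typed. In addition, current browsers seem to ignore the ENCTYPE parameter of the form element and always transmit the form data as application/x-www-form-urlencoded.

Thus, when the CGI on the server processes the returned form data and finds that its content type is application/x-www-form-urlencoded rather than multipart/form-data, it knows that the user's browser does not implement file upload.

In this case, rather than replying with a "text/html" response, the server CGI could send back a data stream for a helper application to process. This stream might be labelled "application/x-please-send-files" and contain:

* The (fully qualified) URL to which the form data should actually be posted (terminated with CRLF)

* The list of field names that were supposed to hold file contents (space separated, terminated with CRLF)

* The application/x-www-form-urlencoded form data as sent from client to server

The browser then needs to be configured to launch a helper application to process application/x-please-send-files.

The helper would read the form data and note which fields contain 'local file names' that need to be replaced with the actual file contents. It might prompt the user to change or add to the list of files, then repackage the data and file contents as multipart/form-data and send them back to the server.

The helper would produce the data that a new browser would have sent in the first place and post it to the URL given by the original ACTION. The advantage is that the server can use the *same* CGI for both old and new browsers.

The helper need not display the form data, but it *should* make sure that the user is asked whether sending the requested files is appropriate. (This avoids a security problem with malicious servers that ask for files the user never intended to send.) It would be useful if the status of the file transfer could be displayed.

5. Other considerations

5.1 Compression, encryption

This scheme does not address possible compression of files. After some consideration, the optimization issues of file compression seemed too complex to have browsers decide automatically which files should be compressed. Many link-layer transport mechanisms (e.g., high-speed modems) compress data over the link, and optimizing for compression at this layer might not be appropriate. If desired, browsers could optionally produce a content-transfer-encoding of x-compress for file data, and servers could decompress the data before processing it; this was left out of the proposal, however.

Similarly, the proposal contains no mechanism for encrypting the data. That should be handled by other mechanisms for secure data transmission, whether secure HTTP or mail.

5.2 Deferred file transmission

In some situations it is advisable for the server to validate certain elements of the form data (user name, account, etc.) before actually preparing to receive the data. After some consideration, however, it seemed best that servers wishing to do this use a series of forms, sending previously validated data elements back to the client as 'hidden' fields, or arrange the form so that the elements that need validation come first. In this way servers that build complex applications maintain the transaction state themselves, while simple applications can be implemented simply.

The HTTP protocol may require a content-length for the overall transmission. Even where it is not required, HTTP clients are encouraged to supply the content-length for all uploaded files, so that a busy server can detect that the file data is too large to be processed reasonably, return an error code, and close the connection without waiting to receive all of the data. Some current CGI implementations require a content-length in all POST transactions.

If the INPUT tag includes a MAXLENGTH attribute, the client should take its value as the maximum number of bytes the server will accept for transferred files. In this way the server can hint to the client, before the upload starts, how much space is available for the upload. It is important to note, however, that this is only a hint; the actual requirements of the server may change between form creation and file submission.

In any case, an HTTP server may abort a file upload in the middle of the transfer if the file being received is too large.

5.3 Other choices for return transmission of binary data

Various people have suggested using a new MIME top-level type "aggregate", e.g., aggregate/mixed, or a content-transfer-encoding of "packet", to express binary data of indeterminate length rather than relying on multipart-style boundaries. While we are not opposed to doing so, it would require additional design and standardization work to gain acceptance of "aggregate". The 'multipart' mechanisms, on the other hand, are well established, simple to implement on both the sending client and the receiving server, and as efficient as other methods of handling combinations of binary data.

5.4 Not overloading <INPUT>

Various people have asked why INPUT should be overloaded for file upload instead of providing a different kind of form element. Among other considerations, the migration strategy made possible by using <INPUT> is important. In addition, the <INPUT> tag *is* already overloaded to hold most kinds of data input; rather than creating several kinds of <INPUT> tags, it seems most reasonable to enhance <INPUT>. The 'type' of INPUT is not the content type of what is returned but rather the 'widget type'; that is, it identifies the style of interaction with the user. The description here is carefully written so that <INPUT TYPE=FILE> can work in text browsers as well as in audio markup.

5.5 Default content-type of field data

Many input fields in HTML are typed in by the user. There has been some ambiguity about how such form data should be transmitted back to servers. Making the content type of <INPUT> fields text/plain removes this ambiguity: the client should properly encode the data, with CRLF line separators, before sending it back to the server.

5.6 Allow form ACTION to be "mailto:"

Independent of this proposal, it would be very useful to allow the ACTION of a form to be a "mailto:" URL. This is a good idea with or without this proposal. Similarly, the ACTION of a form received via mail should probably default to the "reply-to:" of the message. These two proposals would allow HTML forms to be served by HTTP servers but sent back via mail, or alternatively allow HTML forms to be sent by mail, filled out by HTML-aware mail recipients, and the results mailed back.

5.7 Remote files with third-party transfer

In some scenarios the user operating the client software might want to specify a URL for remote data rather than a local file. In this case, can the browser send the server a pointer to the remote data rather than the entire contents? This could be implemented, for example, by having the client send the server data of type "message/external-body" with "access-type" set to, say, "uri", and the URL of the remote data in the body of the message.

5.8 File transfer with ENCTYPE=x-www-form-urlencoded

If a form contains <INPUT TYPE=file> elements but the enclosing form has no ENCTYPE attribute, the behavior is not specified. It is probably inappropriate to URL-encode large quantities of data and send them to servers that do not expect it.

5.9 CRLF used as line separator

As with all MIME transmissions, CRLF is used as the line separator when form data is sent by POST.

5.10 Relationship to multipart/related

The MIMESGML group is proposing a new type called multipart/related. It has features similar to multipart/form-data, but the use and application of form-data are different enough that form-data is described separately.

It might be possible at some point to encode the result of HTML forms (including files) as multipart/related; this is not incompatible with this proposal.

5.11 Non-ASCII field names

Note that MIME headers are generally required to consist of 7-bit data in the US-ASCII character set. Field names that contain characters outside that set should therefore be encoded as described in RFC 1522. In HTML 2.0 the default character set is ISO-8859-1, but non-ASCII characters in field names should be encoded.

6. Examples

Suppose the server supplies the following HTML:

    <FORM ACTION="http://server.dom/cgi/handle"
          ENCTYPE="multipart/form-data"
          METHOD=POST>
    What is your name? <INPUT TYPE=TEXT NAME=submitter>
    What files are you sending? <INPUT TYPE=FILE NAME=pics>
    </FORM>

The user types "Joe Blow" in the name field and selects a text file "file1.txt" in answer to 'What files are you sending?'

The client might send back the following data:

    Content-type: multipart/form-data; boundary=AaB03x

    --AaB03x
    content-disposition: form-data; name="submitter"

    Joe Blow
    --AaB03x
    content-disposition: form-data; name="pics"; filename="file1.txt"
    Content-Type: text/plain

    ... contents of file1.txt ...
    --AaB03x--

If the user also selected an image file "file2.gif", the client might send back the following data:

    Content-type: multipart/form-data; boundary=AaB03x

    --AaB03x
    content-disposition: form-data; name="submitter"

    Joe Blow
    --AaB03x
    content-disposition: form-data; name="pics"
    Content-type: multipart/mixed; boundary=BbC04y

    --BbC04y
    Content-disposition: attachment; filename="file1.txt"
    Content-Type: text/plain

    ... contents of file1.txt ...
    --BbC04y
    Content-disposition: attachment; filename="file2.gif"
    Content-type: image/gif
    Content-Transfer-Encoding: binary

    ... contents of file2.gif ...
    --BbC04y--
    --AaB03x--

7. Registration of multipart/form-data

The media type multipart/form-data follows the rules for multipart data streams laid down in RFC 1521. It is intended for returning the data that results from filling out a form. In a form (in HTML, although other applications may also use forms) there is a series of fields for the user to fill out, and each field has a name. Within a given form, the names are unique.

multipart/form-data consists of a series of parts. Each part has a content-disposition header whose value is "form-data" and whose name attribute gives the field name within the form, e.g., 'content-disposition: form-data; name="xxxxx"', where xxxxx is the field name of the corresponding field. Field names that contain non-ASCII characters may be encoded using the method described in RFC 1522.

As with all multipart MIME types, each part has an optional Content-Type that defaults to text/plain. If the contents of a file are returned by filling out a form, the file input is identified as application/octet-stream or, if the type is known, as the appropriate media type. If a single form entry returns multiple files, they can be returned as multipart/mixed embedded within the multipart/form-data.

If the content of a part does not conform to the default encoding, the part may be encoded and a "content-transfer-encoding" header added.

File inputs may also identify the file name, using the 'filename' parameter of the "content-disposition" header. This is not required, but it is strongly recommended whenever the original file name is known. It is useful or necessary in many applications.

8. Security Considerations

It is important that a user agent not send any file that the user has not explicitly asked to be sent. HTML interpreting agents are therefore expected to have the user confirm any default file name suggested with <INPUT TYPE=file VALUE="yyyy">. Hidden fields must never be able to specify a file.

This proposal contains no mechanism for encrypting the data; that should be handled by other mechanisms for secure data transmission, whether secure HTTP or the security provided by MOSS (described in RFC 1848).

Once a file has been uploaded, it is up to the receiver to process and store it appropriately.

9. Conclusion

The suggested implementation gives the client a great deal of flexibility in the number and types of files it sends to the server, gives the server control over the decision to accept the files, and gives servers a chance to interact with browsers that do not support INPUT TYPE "file".

The change to the HTML DTD is very simple but very powerful. It enables a much greater variety of services to be implemented on the World-Wide Web than is currently possible without a file submission facility. This would be an extremely valuable addition to the capabilities of the World-Wide Web.

Authors' Addresses

Larry Masinter
Xerox Palo Alto Research Center
3333 Coyote Hill Road
Palo Alto, CA 94304
Phone: (415) 812-4365
Fax: (415) 812-4333
EMail: masinter@parc.xerox.com

Ernesto Nebel
XSoft, Xerox Corporation
10875 Rancho Bernardo Road, Suite 200
San Diego, CA 92127-2116
Phone: (619) 676-7817
Fax: (619) 676-7865
EMail: nebel@xsoft.sd.xerox.com

A. Media type registration for multipart/form-data

Media Type name: multipart
Media subtype name: form-data
Required parameters: none
Optional parameters: none
Encoding considerations: No additional considerations other than as for other multipart types.
Published specification: RFC 1867

Security Considerations

The multipart/form-data type introduces no new security considerations beyond those that might arise from any of the enclosed parts.

References

[RFC 1521] MIME (Multipurpose Internet Mail Extensions) Part One: Mechanisms for Specifying and Describing the Format of Internet Message Bodies. N. Borenstein & N. Freed. September 1993.

[RFC 1522] MIME (Multipurpose Internet Mail Extensions) Part Two: Message Header Extensions for Non-ASCII Text. K. Moore. September 1993.

[RFC 1806] Communicating Presentation Information in Internet Messages: The Content-Disposition Header. R. Troost & S. Dorner. June 1995.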
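RFC 1867 Form-based File Upload in HTML

RFC Documents Chinese Translation Project
Organizer: China-Pub (中国互动出版网, http://www.china-pub.com/)
RFC Documents Chinese Translation Project (http://www.china-pub.com/compters/emook/aboutemook.htm)
E-mail: ouyang@china-pub.com
Translator: Huang Jun (黄俊; hujiao hj_chinese@yahoo.com)
Translation published: 2001-4-26
Copyright: The copyright of this Chinese translation belongs to China-Pub. It may be freely reproduced for non-commercial purposes, provided that the translation and copyright information of this document is retained.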
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Network Working Group                                           E. Nebel
Request For Comments: 1867                                   L. Masinter
Category: Experimental                                 Xerox Corporation
                                                           November 1995

                     Form-based File Upload in HTML

Status of this Memo

This memo defines an Experimental Protocol for the Internet community. This memo does not specify an Internet standard of any kind. Discussion and suggestions for improvement are requested. Distribution of this memo is unlimited.

1. Abstract

Currently, HTML forms allow the producer of the form to request information from the user reading the form. These forms have proven useful in a wide variety of applications in which input from the user is necessary. However, this capability is limited because HTML forms don't provide a way to ask the user to submit files of data. Service providers who need to get files from the user have had to implement custom user applications. (Examples of these custom browsers have appeared on the www-talk mailing list.) Since file-upload is a feature that will benefit many applications, this memo proposes an extension to HTML to allow information providers to express file upload requests uniformly, and a MIME compatible representation for file upload responses. This also includes a description of a backward compatibility strategy that allows new servers to interact with the current HTML user agents.

The proposal is independent of which version of HTML it becomes a part of.

2. HTML forms with file submission

The current HTML specification defines eight possible values for the attribute TYPE of an INPUT element: CHECKBOX, HIDDEN, IMAGE, PASSWORD, RADIO, RESET, SUBMIT, TEXT.

In addition, it defines the ENCTYPE attribute of a FORM element using the POST METHOD to have the default value "application/x-www-form-urlencoded".

Nebel & Masinter              Experimental                      [Page 1]

RFC 1867             Form-based File Upload in HTML        November 1995

This proposal makes two changes to HTML:

1) Add a FILE option for the TYPE attribute of INPUT.
2) Allow an ACCEPT attribute for the INPUT tag, which is a list of media types or type patterns allowed for the input.

In addition, it defines a new MIME media type, multipart/form-data, and specifies the behavior of HTML user agents when interpreting a form with ENCTYPE="multipart/form-data" and/or <INPUT type="file"> tags.

These changes might be considered independently, but are all necessary for reasonable file upload.

The author of an HTML form who wants to request one or more files from a user would write (for example):

    <FORM ENCTYPE="multipart/form-data" ACTION="_URL_" METHOD=POST>
    File to process: <INPUT NAME="userfile1" TYPE="file">
    <INPUT TYPE="submit" VALUE="Send File">
    </FORM>

The change to the HTML DTD is to add one item to the entity "InputType". In addition, it is proposed that the INPUT tag have an ACCEPT attribute, which is a list of comma-separated media types.

    ... (other elements) ...

    <!ENTITY % InputType "(TEXT | PASSWORD | CHECKBOX |
                           RADIO | SUBMIT | RESET |
                           IMAGE | HIDDEN | FILE )">
    <!ELEMENT INPUT - O EMPTY>
    <!ATTLIST INPUT
            TYPE %InputType TEXT
            NAME CDATA #IMPLIED  -- required for all but submit and reset --
            VALUE CDATA #IMPLIED
            SRC %URI #IMPLIED  -- for image inputs --
            CHECKED (CHECKED) #IMPLIED
            SIZE CDATA #IMPLIED  -- like NUMBERS, but delimited with comma, not space --
            MAXLENGTH NUMBER #IMPLIED
            ALIGN (top|middle|bottom) #IMPLIED
            ACCEPT CDATA #IMPLIED  -- list of content types --
            >

    ... (other elements) ...

3. Suggested implementation

While user agents that interpret HTML have wide leeway to choose the most appropriate mechanism for their context, this section suggests how one class of user agent, WWW browsers, might implement file upload.

3.1 Display of FILE widget

When an INPUT tag of type FILE is encountered, the browser might show a display of (previously selected) file names, and a "Browse" button or selection method.
Selecting the "Browse" button would cause the browser to enter into a file selection mode appropriate for the platform. Window-based browsers might pop up a file selection window, for example. In such a file selection dialog, the user would have the option of replacing a current selection, adding a new file selection, etc. Browser implementors might choose to let the list of file names be manually edited.

If an ACCEPT attribute is present, the browser might constrain the file patterns prompted for to match those with the corresponding appropriate file extensions for the platform.

3.2 Action on submit

When the user completes the form, and selects the SUBMIT element, the browser should send the form data and the content of the selected files. The encoding type application/x-www-form-urlencoded is inefficient for sending large quantities of binary data or text containing non-ASCII characters. Thus, a new media type, multipart/form-data, is proposed as a way of efficiently sending the values associated with a filled-out form from client to server.

3.3 Use of multipart/form-data

The definition of multipart/form-data is included in section 7. A boundary is selected that does not occur in any of the data. (This selection is sometimes done probabilistically.) Each field of the form is sent, in the order in which it occurs in the form, as a part of the multipart stream. Each part identifies the INPUT name within the original HTML form. Each part should be labelled with an appropriate content-type if the media type is known (e.g., inferred from the file extension or operating system typing information) or as application/octet-stream.

If multiple files are selected, they should be transferred together using the multipart/mixed format.

While the HTTP protocol can transport arbitrary BINARY data, the default for mail transport (e.g., if the ACTION is a "mailto:" URL) is the 7BIT encoding.
The value supplied for a part may need to be encoded and the "content-transfer-encoding" header supplied if the value does not conform to the default encoding. [See section 5 of RFC 1521 for more details.]

The original local file name may be supplied as well, either as a 'filename' parameter of the 'content-disposition: form-data' header or, in the case of multiple files, in a 'content-disposition: file' header of the subpart. The client application should make best effort to supply the file name; if the file name of the client's operating system is not in US-ASCII, the file name might be approximated or encoded using the method of RFC 1522. This is a convenience for those cases where, for example, the uploaded files might contain references to each other, e.g., a TeX file and its .sty auxiliary style description.

On the server end, the ACTION might point to an HTTP URL that implements the forms action via CGI. In such a case, the CGI program would note that the content-type is multipart/form-data and parse the various fields (checking for validity, writing the file data to local files for subsequent processing, etc.).

3.4 Interpretation of other attributes

The VALUE attribute might be used with <INPUT TYPE=file> tags for a default file name. This use is probably platform dependent. It might be useful, however, in sequences of more than one transaction, e.g., to avoid having the user prompted for the same file name over and over again.

The SIZE attribute might be specified using SIZE=width,height, where width is some default for file name width, while height is the expected size showing the list of selected files. For example, this would be useful for forms designers who expect to get several files and who would like to show a multiline file input field in the browser (with a "browse" button beside it, hopefully).
It would be useful to show a one line text field when no height is specified (when the forms designer expects one file, only) and to show a multiline text area with scrollbars when the height is greater than 1 (when the forms designer expects multiple files).

4. Backward compatibility issues

While not necessary for successful adoption of an enhancement to the current WWW form mechanism, it is useful to also plan for a migration strategy: users with older browsers can still participate in file upload dialogs, using a helper application. Most current web browsers, when given <INPUT TYPE=FILE>, will treat it as <INPUT TYPE=TEXT> and give the user a text box. The user can type a file name into this text box. In addition, current browsers seem to ignore the ENCTYPE parameter in the <FORM> element, and always transmit the data as application/x-www-form-urlencoded.

Thus, the server CGI might be written in a way that would note that the form data returned had content-type application/x-www-form-urlencoded instead of multipart/form-data, and know that the user was using a browser that didn't implement file upload.

In this case, rather than replying with a "text/html" response, the CGI on the server could instead send back a data stream that a helper application might process instead; this would be a data stream of type "application/x-please-send-files", which contains:

* The (fully qualified) URL to which the actual form data should be posted (terminated with CRLF)

* The list of field names that were supposed to be file contents (space separated, terminated with CRLF)

* The entire original application/x-www-form-urlencoded form data as originally sent from client to server.

In this case, the browser needs to be configured to launch a helper application to process application/x-please-send-files.
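
As a rough illustration of the three-component stream listed above, the following Python sketch builds an application/x-please-send-files body and splits it apart again, as a helper might. The function names and the sample values are assumptions made for this example only; this memo defines nothing beyond the three components of the stream.

```python
from urllib.parse import parse_qsl, urlencode

CRLF = "\r\n"


def build_please_send_files(action_url, file_fields, form_data):
    """Build the body of an application/x-please-send-files stream.

    Component 1: the fully qualified URL to post to, terminated with CRLF.
    Component 2: the space-separated file field names, terminated with CRLF.
    Component 3: the original application/x-www-form-urlencoded form data.
    (Hypothetical helper; the name is not taken from the memo.)
    """
    return action_url + CRLF + " ".join(file_fields) + CRLF + form_data


def parse_please_send_files(body):
    """Split a stream into (url, file field names, decoded form pairs)."""
    # Only the first two CRLFs are separators; the rest is form data.
    action_url, field_line, form_data = body.split(CRLF, 2)
    pairs = parse_qsl(form_data, keep_blank_values=True)
    return action_url, field_line.split(), pairs


# Sample values borrowed from the form shown in section 6.
original = urlencode([("submitter", "Joe Blow"), ("pics", "file1.txt")])
stream = build_please_send_files("http://server.dom/cgi/handle", ["pics"], original)
url, file_fields, pairs = parse_please_send_files(stream)
print(url)
print(file_fields)
print(pairs)
```

A helper built along these lines would then replace the value of each field named in the second component with the content of the corresponding local file, after confirming the selection with the user.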

The helper would read the form data, note which fields contained 'local file names' that needed to be replaced with their data content, might itself prompt the user for changing or adding to the list of files available, and then repackage the data and file contents in multipart/form-data for retransmission back to the server.

The helper would generate the kind of data that a 'new' browser should actually have sent in the first place, with the intention that the URL to which it is sent corresponds to the original ACTION URL. The point of this is that the server can use the *same* CGI to implement the mechanism for dealing with both old and new browsers.

The helper need not display the form data, but *should* ensure that the user is actually prompted about the suitability of sending the files requested (this is to avoid a security problem with malicious servers that ask for files that weren't actually promised by the user). It would be useful if the status of the transfer of the files involved could be displayed.

5. Other considerations

5.1 Compression, encryption

This scheme doesn't address the possible compression of files. After some consideration, it seemed that the optimization issues of file compression were too complex to try to automatically have browsers decide that files should be compressed. Many link-layer transport mechanisms (e.g., high-speed modems) perform data compression over the link, and optimizing for compression at this layer might not be appropriate. It might be possible for browsers to optionally produce a content-transfer-encoding of x-compress for file data, and for servers to decompress the data before processing, if desired; this was left out of the proposal, however.

Similarly, the proposal does not contain a mechanism for encryption of the data; this should be handled by whatever other mechanisms are in place for secure transmission of data, whether via secure HTTP or mail.

5.2 Deferred file transmission

In some situations, it might be advisable to have the server validate various elements of the form data (user name, account, etc.) before actually preparing to receive the data. However, after some consideration, it seemed best to require that servers that wish to do this should implement this as a series of forms, where some of the data elements that were previously validated might be sent back to the client as 'hidden' fields, or by arranging the form so that the elements that need validation occur first. This puts the onus of maintaining the state of a transaction only on those servers that wish to build a complex application, while allowing those cases that have simple input needs to be built simply.

The HTTP protocol may require a content-length for the overall transmission. Even if it were not to do so, HTTP clients are encouraged to supply content-length for overall file input so that a busy server could detect if the proposed file data is too large to be processed reasonably and just return an error code and close the connection without waiting to process all of the incoming data. Some current implementations of CGI require a content-length in all POST transactions.

If the INPUT tag includes the attribute MAXLENGTH, the user agent should consider its value to represent the maximum Content-Length (in bytes) which the server will accept for transferred files. In this way, servers can hint to the client how much space they have available for a file upload, before that upload takes place.
It is important to note, however, that this is only a hint, and the actual requirements of the server may change between form creation and file submission.

In any case, an HTTP server may abort a file upload in the middle of the transaction if the file being received is too large.

5.3 Other choices for return transmission of binary data

Various people have suggested using a new MIME top-level type "aggregate", e.g., aggregate/mixed, or a content-transfer-encoding of "packet" to express indeterminate-length binary data, rather than relying on the multipart-style boundaries. While we are not opposed to doing so, this would require additional design and standardization work to get acceptance of "aggregate". On the other hand, the 'multipart' mechanisms are well established, simple to implement on both the sending client and receiving server, and as efficient as other methods of dealing with multiple combinations of binary data.

5.4 Not overloading <INPUT>

Various people have wondered about the advisability of overloading 'INPUT' for this function, rather than merely providing a different type of FORM element. Among other considerations, the migration strategy which is allowed when using <INPUT> is important. In addition, the <INPUT> field *is* already overloaded to contain most kinds of data input; rather than creating multiple kinds of <INPUT> tags, it seems most reasonable to enhance <INPUT>. The 'type' of INPUT is not the content-type of what is returned, but rather the 'widget-type'; i.e., it identifies the interaction style with the user. The description here is carefully written to allow <INPUT TYPE=FILE> to work for text browsers or audio-markup.

5.5 Default content-type of field data

Many input fields in HTML are to be typed in. There has been some ambiguity as to how form data should be transmitted back to servers.
Making the content-type of <INPUT> fields be text/plain clearly disambiguates this: the client should properly encode the data, with CRLF line separators, before sending it back to the server.

5.6 Allow form ACTION to be "mailto:"

Independent of this proposal, it would be very useful for HTML interpreting user agents to allow an ACTION in a form to be a "mailto:" URL. This seems like a good idea, with or without this proposal. Similarly, the ACTION for an HTML form which is received via mail should probably default to the "reply-to:" of the message. These two proposals would allow HTML forms to be served via HTTP servers but sent back via mail, or, alternatively, allow HTML forms to be sent by mail, filled out by HTML-aware mail recipients, and the results mailed back.

5.7 Remote files with third-party transfer

In some scenarios, the user operating the client software might want to specify a URL for remote data rather than a local file. In this case, is there a way to allow the browser to send to the server a pointer to the external data rather than the entire contents? This capability could be implemented, for example, by having the client send to the server data of type "message/external-body" with "access-type" set to, say, "uri", and the URL of the remote data in the body of the message.

5.8 File transfer with ENCTYPE=x-www-form-urlencoded

If a form contains <INPUT TYPE=file> elements but does not contain an ENCTYPE in the enclosing <FORM>, the behavior is not specified. It is probably inappropriate to attempt to URL-encode large quantities of data to servers that don't expect it.

5.9 CRLF used as line separator

As with all MIME transmissions, CRLF is used as the separator for lines in a POST of the data in multipart/form-data.

5.10 Relationship to multipart/related

The MIMESGML group is proposing a new type called multipart/related.
While it contains similar features to multipart/form-data, the use and application of form-data is different enough that form-data is being described separately.

It might be possible at some point to encode the result of HTML forms (including files) in a multipart/related body part; this is not incompatible with this proposal.

5.11 Non-ASCII field names

Note that MIME headers are generally required to consist only of 7-bit data in the US-ASCII character set. Hence field names should be encoded according to the prescriptions of RFC 1522 if they contain characters outside of that set. In HTML 2.0, the default character set is ISO-8859-1, but non-ASCII characters in field names should be encoded.

6. Examples

Suppose the server supplies the following HTML:

    <FORM ACTION="http://server.dom/cgi/handle"
          ENCTYPE="multipart/form-data"
          METHOD=POST>
    What is your name? <INPUT TYPE=TEXT NAME=submitter>
    What files are you sending? <INPUT TYPE=FILE NAME=pics>
    </FORM>

and the user types "Joe Blow" in the name field, and selects a text file "file1.txt" for the answer to 'What files are you sending?'

The client might send back the following data:

    Content-type: multipart/form-data; boundary=AaB03x

    --AaB03x
    content-disposition: form-data; name="submitter"

    Joe Blow
    --AaB03x
    content-disposition: form-data; name="pics"; filename="file1.txt"
    Content-Type: text/plain

    ... contents of file1.txt ...
    --AaB03x--

If the user also indicated an image file "file2.gif" for the answer to 'What files are you sending?', the client might send back the following data:

    Content-type: multipart/form-data; boundary=AaB03x

    --AaB03x
    content-disposition: form-data; name="submitter"

    Joe Blow
    --AaB03x
    content-disposition: form-data; name="pics"
    Content-type: multipart/mixed; boundary=BbC04y

    --BbC04y
    Content-disposition: attachment; filename="file1.txt"
    Content-Type: text/plain

    ... contents of file1.txt ...
    --BbC04y
    Content-disposition: attachment; filename="file2.gif"
    Content-type: image/gif
    Content-Transfer-Encoding: binary

    ... contents of file2.gif ...
    --BbC04y--
    --AaB03x--

7. Registration of multipart/form-data

The media-type multipart/form-data follows the rules of all multipart MIME data streams as outlined in RFC 1521. It is intended for use in returning the data that comes about from filling out a form. In a form (in HTML, although other applications may also use forms), there is a series of fields to be supplied by the user who fills out the form. Each field has a name. Within a given form, the names are unique.

multipart/form-data contains a series of parts. Each part is expected to contain a content-disposition header where the value is "form-data" and a name attribute specifies the field name within the form, e.g., 'content-disposition: form-data; name="xxxxx"', where xxxxx is the field name corresponding to that field. Field names originally in non-ASCII character sets may be encoded using the method outlined in RFC 1522.

As with all multipart MIME types, each part has an optional Content-Type which defaults to text/plain. If the contents of a file are returned via filling out a form, then the file input is identified as application/octet-stream or the appropriate media type, if known.
If multiple files are to be returned as the result of a single form entry, they can be returned as multipart/mixed embedded within the multipart/form-data.

Each part may be encoded and the "content-transfer-encoding" header supplied if the value of that part does not conform to the default encoding.

File inputs may also identify the file name. The file name may be described using the 'filename' parameter of the "content-disposition" header. This is not required, but is strongly recommended in any case where the original filename is known. This is useful or necessary in many applications.

8. Security Considerations

It is important that a user agent not send any file that the user has not explicitly asked to be sent. Thus, HTML interpreting agents are expected to confirm any default file names that might be suggested with <INPUT TYPE=file VALUE="yyyy">. Never have any hidden fields be able to specify any file.

This proposal does not contain a mechanism for encryption of the data; this should be handled by whatever other mechanisms are in place for secure transmission of data, whether via secure HTTP, or by security provided by MOSS (described in RFC 1848).

Once the file is uploaded, it is up to the receiver to process and store the file appropriately.

9. Conclusion

The suggested implementation gives the client a lot of flexibility in the number and types of files it can send to the server, it gives the server control of the decision to accept the files, and it gives servers a chance to interact with browsers which do not support INPUT TYPE "file".

The change to the HTML DTD is very simple, but very powerful. It enables a much greater variety of services to be implemented via the World-Wide Web than is currently possible due to the lack of a file submission facility. This would be an extremely valuable addition to the capabilities of the World-Wide Web.
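
To make the part layout of sections 6 and 7 concrete, the following Python sketch assembles a multipart/form-data body for the single-file example and reads it back with the standard library email package. It is only an illustration under stated assumptions: the function name build_form_data and the shape of its arguments are invented for this example, only the single-file case is covered (no nested multipart/mixed), and the boundary parameter is written with a semicolon as MIME parameter syntax requires.

```python
from email import message_from_bytes

CRLF = b"\r\n"


def build_form_data(parts, boundary):
    """Assemble a multipart/form-data body (hypothetical helper).

    parts:    (name, data, filename, content_type) tuples in form order;
              data is bytes, filename and content_type may be None.
    boundary: bytes that must not occur in any of the data.
    """
    lines = []
    for name, data, filename, content_type in parts:
        if boundary in data:
            raise ValueError("boundary occurs in the data of field " + name)
        disposition = b'content-disposition: form-data; name="%s"' % name.encode("ascii")
        if filename is not None:
            disposition += b'; filename="%s"' % filename.encode("ascii")
        lines += [b"--" + boundary, disposition]
        if content_type is not None:
            lines.append(b"Content-Type: " + content_type.encode("ascii"))
        lines += [b"", data]  # the empty line separates headers from content
    lines += [b"--" + boundary + b"--", b""]  # closing delimiter and final CRLF
    return CRLF.join(lines)


body = build_form_data(
    [
        ("submitter", b"Joe Blow", None, None),
        ("pics", b"... contents of file1.txt ...", "file1.txt", "text/plain"),
    ],
    b"AaB03x",
)

# Read the body back the way a receiver might, using the email package.
message = message_from_bytes(
    b"Content-Type: multipart/form-data; boundary=AaB03x" + CRLF + CRLF + body
)
for part in message.get_payload():
    field = part.get_param("name", header="content-disposition")
    print(field, part.get_filename(), part.get_payload(decode=True))
```

Checking that the boundary does not occur anywhere in the data is a conservative version of the rule in section 3.3; a real client would more likely pick a fresh random boundary and retry on a collision.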

Authors' Addresses

Larry Masinter
Xerox Palo Alto Research Center
3333 Coyote Hill Road
Palo Alto, CA 94304
Phone: (415) 812-4365
Fax: (415) 812-4333
EMail: masinter@parc.xerox.com

Ernesto Nebel
XSoft, Xerox Corporation
10875 Rancho Bernardo Road, Suite 200
San Diego, CA 92127-2116
Phone: (619) 676-7817
Fax: (619) 676-7865
EMail: nebel@xsoft.sd.xerox.com

A. Media type registration for multipart/form-data

Media Type name: multipart
Media subtype name: form-data
Required parameters: none
Optional parameters: none
Encoding considerations: No additional considerations other than as for other multipart types.
Published specification: RFC 1867

Security Considerations

The multipart/form-data type introduces no new security considerations beyond what might occur with any of the enclosed parts.

References

[RFC 1521] MIME (Multipurpose Internet Mail Extensions) Part One: Mechanisms for Specifying and Describing the Format of Internet Message Bodies. N. Borenstein & N. Freed. September 1993.

[RFC 1522] MIME (Multipurpose Internet Mail Extensions) Part Two: Message Header Extensions for Non-ASCII Text. K. Moore. September 1993.

[RFC 1806] Communicating Presentation Information in Internet Messages: The Content-Disposition Header. R. Troost & S. Dorner. June 1995.