RFC1867

Network Working Group                                           E. Nebel
Request For Comments: 1867                                   L. Masinter
Category: Experimental                                 Xerox Corporation
                                                           November 1995

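                     Form-based File Upload in HTML
(RFC 1867)

Status of this Memo

This memo defines an Experimental Protocol for the Internet community.
It does not specify an Internet standard of any kind. Discussion and
suggestions for improvement are requested. Distribution of this memo is
unlimited.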

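Table of Contents
1. Abstract
2. HTML forms with file submission
3. Suggested implementation
3.1 Display of FILE widget
3.2 Action on submit
3.3 Use of multipart/form-data
3.4 Interpretation of other attributes
4. Backward compatibility issues
5. Other considerations
5.1 Compression, encryption
5.2 Deferred file transmission
5.3 Other choices for return transmission of binary data
5.4 Not overloading <INPUT>
5.5 Default content-type of field data
5.6 Allow form ACTION to be "mailto:"
5.7 Remote files with third-party transfer
5.8 File transfer with ENCTYPE=x-www-form-urlencoded
5.9 CRLF used as line separator
5.10 Relationship to multipart/related
5.11 Non-ASCII field names
6. Examples
7. Registration of multipart/form-data
8. Security Considerations
9. Conclusion
Authors' Addresses
A. Media type registration for multipart/form-data
References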

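1. Abstract

HTML forms currently let the author of a form request information from
the user reading it. Forms have proven useful in a wide variety of
applications that need user input. This capability is limited, however,
because HTML forms provide no way to ask the user to submit files of
data. Service providers who need files from users have had to implement
custom user applications. (Examples of such custom browsers have
appeared on the www-talk mailing list.) Since file upload would benefit
many applications, this memo proposes an extension to HTML that lets
information providers express file upload requests uniformly, together
with a MIME-compatible representation for file upload responses. It also
describes a backward compatibility strategy that lets new servers
interact with current HTML user agents.

The proposal is independent of the version of HTML it becomes part of.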

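2. HTML forms with file submission

The current HTML specification defines eight possible values for the
TYPE attribute of an INPUT element: CHECKBOX, HIDDEN, IMAGE, PASSWORD,
RADIO, RESET, SUBMIT, TEXT. In addition, for a FORM element that uses
the POST method, the ENCTYPE attribute defaults to
"application/x-www-form-urlencoded".

This proposal makes two changes to HTML:
1) Add a FILE option for the TYPE attribute of INPUT.
2) Allow an ACCEPT attribute on the INPUT tag, which lists the media
types or type patterns allowed for the input.

In addition, it defines a new MIME media type, multipart/form-data, and
specifies how HTML user agents should behave when interpreting a form
with ENCTYPE="multipart/form-data" and/or <INPUT type="file"> tags.

These changes might be considered independently, but all of them are
needed for reasonable file upload.

For example, the author of an HTML form who wants to request one or more
files from a user could write: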

    <FORM ENCTYPE="multipart/form-data" ACTION="_URL_" METHOD=POST>

    File to process: <INPUT NAME="userfile1" TYPE="file">

    <INPUT TYPE="submit" VALUE="Send File">

    </FORM>

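The change to the HTML DTD is to add one item to the entity "InputType".
In addition, it is proposed that the INPUT tag have an ACCEPT attribute,
which is a comma-separated list of media types.

  ... (other elements) ...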

  <!ENTITY % InputType "(TEXT | PASSWORD | CHECKBOX |
                         RADIO | SUBMIT | RESET |
                         IMAGE | HIDDEN | FILE )">
  <!ELEMENT INPUT - 0 EMPTY>
  <!ATTLIST INPUT
          TYPE %InputType TEXT
          NAME CDATA #IMPLIED  -- required for all but submit and reset
          VALUE CDATA #IMPLIED
          SRC %URI #IMPLIED  -- for image inputs --
          CHECKED (CHECKED) #IMPLIED
          SIZE CDATA #IMPLIED  --like NUMBERS,
                                  but delimited with comma, not space
          MAXLENGTH NUMBER #IMPLIED
          ALIGN (top|middle|bottom) #IMPLIED
          ACCEPT CDATA #IMPLIED --list of content types
          >

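  ... (other elements) ...

3. Suggested implementation

User agents that interpret HTML have wide leeway to choose the mechanism
most appropriate for their context. This section suggests how one class
of user agent, WWW browsers, might implement file upload.

3.1 Display of FILE widget

When the browser encounters an INPUT tag of type FILE, it might show the
(previously selected) file names together with a "Browse" button or a
similar selection method. Selecting the "Browse" button would put the
browser into a file selection mode appropriate for the platform; a
window-based browser might pop up a file selection window, for example.
In that dialog the user could replace the current selection, add a new
file to it, and so on. Browser implementors might choose to let the
user edit the list of file names by hand.

If an ACCEPT attribute is present, the browser might restrict the file
patterns it prompts for to the file extensions that correspond to those
media types on the platform.

3.2 Action on submit

When the user completes the form and selects the SUBMIT element, the
browser should send the form data and the content of the selected files.
The encoding type application/x-www-form-urlencoded is inefficient for
sending large quantities of binary data or text containing non-ASCII
characters. A new media type, multipart/form-data, is therefore proposed
as an efficient way to send the values of a filled-out form from client
to server.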

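3.3 Use of multipart/form-data

Section 7 defines multipart/form-data. A boundary is selected that does
not occur in any of the data. (This selection is sometimes done
probabilistically.) Each field of the form is sent, in the order in
which it occurs in the form, as one part of the multipart stream. Each
part identifies the name of its INPUT within the original HTML form.
Each part should be labelled with the appropriate content-type if the
media type is known (for example, inferred from the file extension or
from operating system typing information), and otherwise as
application/octet-stream.

If multiple files are selected, they should be transferred together
using the multipart/mixed format.

While HTTP can transport arbitrary binary data, the default for mail
transport (for example, when the ACTION is a "mailto:" URL) is the 7BIT
encoding. If the value of a part does not conform to the default
encoding, it may need to be encoded and a "content-transfer-encoding"
header supplied. (See section 5 of RFC 1521 for details.)

The original local file name may be supplied as well, as a 'filename'
parameter either of the 'content-disposition: form-data' header or, when
multiple files are sent, of the 'content-disposition: file' header of
the subpart. The client application should make a best effort to supply
the file name. If the file name on the client's operating system is not
in US-ASCII, it might be approximated or encoded using the method of
RFC 1522. The file name is a convenience for cases where the uploaded
files refer to each other, for example a TeX file and its .sty auxiliary
style description.

On the server side, the ACTION might point to an HTTP URL that
implements the form's action via CGI. In that case the CGI program would
note that the content-type is multipart/form-data and parse the various
fields (checking them for validity, writing the file data to local files
for subsequent processing, and so on).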

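3.4 Interpretation of other attributes

The VALUE attribute might be used with <INPUT TYPE=file> tags to give a
default file name. This use is probably platform dependent, but it could
be useful in sequences of more than one transaction, for example to
avoid prompting the user for the same file name over and over again.

The SIZE attribute might be specified as SIZE=width,height, where width
is a default width for the file name and height is the expected size of
the area that lists the selected files. This would be useful, for
example, to form designers who expect several files and would like the
browser to show a multiline file input field (with a "Browse" button
beside it, hopefully). It would be useful to show a one-line text field
when no height is specified (the form designer expects only one file)
and a multiline text area with scrollbars when the height is greater
than 1 (the form designer expects multiple files).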

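4. Backward compatibility issues

Although it is not necessary for the successful adoption of an
enhancement to the current WWW form mechanism, it is useful to plan a
migration strategy: users with older browsers can still take part in
file upload dialogs by using a helper application. Most current web
browsers, when given <INPUT TYPE=FILE>, treat it as <INPUT TYPE=TEXT>
and give the user a text box into which a file name can be typed. In
addition, current browsers seem to ignore the ENCTYPE parameter of the
<FORM> element and always transmit the data as
application/x-www-form-urlencoded.

The server CGI can therefore be written to notice that the form data it
received has content-type application/x-www-form-urlencoded rather than
multipart/form-data, and conclude that the user's browser does not
implement file upload.

In this case, rather than replying with a "text/html" response, the CGI
on the server could send back a data stream for a helper application to
process. This would be a data stream of type
"application/x-please-send-files", which contains:

* The (fully qualified) URL to which the actual form data should be
  posted (terminated with CRLF)
* The list of field names that were supposed to be file contents
  (space separated, terminated with CRLF)
* The entire original application/x-www-form-urlencoded form data as
  originally sent from client to server

The browser then needs to be configured to launch a helper application
for application/x-please-send-files.

The helper would read the form data, note which fields contain 'local
file names' that need to be replaced by the contents of those files,
possibly prompt the user to change or add to the list of files, and then
repackage the data and file contents as multipart/form-data for
retransmission to the server.

The helper would generate the kind of data that a 'new' browser would
have sent in the first place, and send it to the URL that corresponds to
the original ACTION URL. The point is that the server can use the *same*
CGI to handle both old and new browsers.

The helper need not display the form data, but it *should* ensure that
the user is actually prompted about the suitability of sending the
requested files. (This avoids a security problem with malicious servers
that ask for files the user never promised.) It would be useful if the
status of the file transfer could be displayed.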

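5. Other considerations

5.1 Compression, encryption

This scheme does not address the possible compression of files. After
some consideration, the optimization issues of file compression seemed
too complex to have browsers decide automatically which files should be
compressed. Many link-layer transport mechanisms (for example,
high-speed modems) compress data over the link, and optimizing for
compression at this layer might not be appropriate. Browsers could
optionally produce a content-transfer-encoding of x-compress for file
data, and servers could decompress the data before processing it, if
desired; this was left out of the proposal, however.

Similarly, the proposal contains no mechanism for encrypting the data.
Encryption should be handled by whatever other mechanisms are in place
for secure transmission of data, whether via secure HTTP or mail.

5.2 Deferred file transmission

In some situations it might be advisable to have the server validate
various elements of the form data (user name, account, etc.) before it
actually prepares to receive the data. After some consideration,
however, it seemed best to require servers that wish to do this to
implement it as a series of forms, in which previously validated data
elements are sent back to the client as 'hidden' fields, or to arrange
the form so that the elements that need validation occur first. This
puts the burden of maintaining transaction state only on servers that
wish to build a complex application, while cases with simple input needs
can still be built simply.

The HTTP protocol may require a content-length for the overall
transmission. Even where it does not, HTTP clients are encouraged to
supply a content-length for the overall file input, so that a busy
server can detect that the proposed file data is too large to process
reasonably, return an error code, and close the connection without
waiting for all of the incoming data. Some current CGI implementations
require a content-length in all POST transactions.

If the INPUT tag includes the MAXLENGTH attribute, the user agent should
take its value as the maximum Content-Length (in bytes) that the server
will accept for transferred files. In this way servers can hint to the
client, before the upload takes place, how much space they have
available for it. Note, however, that this is only a hint: the actual
requirements of the server may change between form creation and file
submission.

In any case, an HTTP server may abort a file upload in the middle of the
transaction if the file being received is too large.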

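5.3 Other choices for return transmission of binary data

Various people have suggested using a new MIME top-level type
"aggregate" (for example, aggregate/mixed) or a
content-transfer-encoding of "packet" to express indeterminate-length
binary data, rather than relying on multipart-style boundaries. We are
not opposed to this, but it would require additional design and
standardization work to gain acceptance of "aggregate". The 'multipart'
mechanisms, on the other hand, are well established, simple to implement
on both the sending client and the receiving server, and as efficient as
other methods of handling multiple combinations of binary data.

5.4 Not overloading <INPUT>

Various people have asked whether it is advisable to overload INPUT for
this function rather than provide a different type of FORM element.
Among other considerations, the migration strategy that <INPUT> allows
is important. In addition, the <INPUT> field *is* already overloaded to
hold most kinds of data input; rather than creating multiple kinds of
<INPUT> tags, it seems most reasonable to enhance <INPUT>. The 'type' of
an INPUT is not the content-type of what is returned but the
'widget-type', that is, it identifies the style of interaction with the
user. The description here is carefully written so that <INPUT
TYPE=FILE> can also work in text browsers or audio markup.

5.5 Default content-type of field data

Many input fields in HTML are meant to be typed in. There has been some
ambiguity about how form data should be transmitted back to servers.
Making the content-type of <INPUT> fields text/plain makes it clear
that the client should properly encode the data, with CRLF line breaks,
before sending it back to the server.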

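5.6 Allow form ACTION to be "mailto:"

Independent of this proposal, it would be very useful for user agents
that interpret HTML to allow the ACTION of a form to be a "mailto:" URL.
This seems like a good idea with or without this proposal. Similarly,
the ACTION of an HTML form received via mail should probably default to
the "reply-to:" of the message. Together these two proposals would allow
HTML forms to be served by HTTP servers but returned by mail or,
alternatively, allow HTML forms to be sent by mail, filled out by
HTML-aware mail recipients, and the results mailed back.

5.7 Remote files with third-party transfer

In some scenarios the user operating the client software might want to
specify a URL for remote data rather than a local file. Is there a way
for the browser to send the server a pointer to the external data rather
than its entire contents? This capability could be implemented, for
example, by having the client send the server data of type
"message/external-body" with "access-type" set to, say, "uri", and the
URL of the remote data in the body of the message.

5.8 File transfer with ENCTYPE=x-www-form-urlencoded

If a form contains <INPUT TYPE=file> elements but the enclosing <FORM>
has no ENCTYPE, the behavior is not specified. It is probably
inappropriate to URL-encode large quantities of data and send them to
servers that do not expect it.

5.9 CRLF used as line separator

As with all MIME transmissions, CRLF is used as the line separator when
the data is sent in a POST as multipart/form-data.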

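5.10 Relationship to multipart/related

The MIMESGML group is proposing a new type called multipart/related.
Although it has features similar to multipart/form-data, the use and
application of form-data are different enough that form-data is
described separately.

It might become possible to encode the result of HTML forms (including
files) in a multipart/related body part; this is not incompatible with
this proposal.

5.11 Non-ASCII field names

Note that MIME headers are generally required to consist only of 7-bit
data in the US-ASCII character set. Field names that contain characters
outside that set should therefore be encoded as prescribed by RFC 1522.
In HTML 2.0 the default character set is ISO-8859-1, but non-ASCII
characters in field names should still be encoded.

6. Examples

Suppose the server supplies the following HTML: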

     <FORM ACTION="http://server.dom/cgi/handle"
           ENCTYPE="multipart/form-data"
           METHOD=POST>
     What is your name? <INPUT TYPE=TEXT NAME=submitter>
     What files are you sending? <INPUT TYPE=FILE NAME=pics>
     </FORM>

The user types "Joe Blow" in the name field and selects a text file
"file1.txt" in answer to 'What files are you sending?'.

The client might send back the following data:

        Content-type: multipart/form-data; boundary=AaB03x

        --AaB03x
        content-disposition: form-data; name="submitter"

        Joe Blow
        --AaB03x
        content-disposition: form-data; name="pics"; filename="file1.txt"
        Content-Type: text/plain

         ... contents of file1.txt ...
        --AaB03x--

If the user also selected an image file "file2.gif", the client might
send back the following data:

        Content-type: multipart/form-data; boundary=AaB03x

        --AaB03x
        content-disposition: form-data; name="submitter"

        Joe Blow
        --AaB03x
        content-disposition: form-data; name="pics"
        Content-type: multipart/mixed; boundary=BbC04y

        --BbC04y
        Content-disposition: attachment; filename="file1.txt"
        Content-Type: text/plain

        ... contents of file1.txt ...
        --BbC04y
        Content-disposition: attachment; filename="file2.gif"
        Content-type: image/gif
        Content-Transfer-Encoding: binary

          ... contents of file2.gif ...
        --BbC04y--
        --AaB03x--

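7. Registration of multipart/form-data

The media type multipart/form-data follows the rules of all multipart
MIME data streams as outlined in RFC 1521. It is intended for returning
the data that results from filling out a form. In a form (typically, but
not necessarily, HTML) there is a series of fields for the user to fill
in, each with its own name. Within a given form the names are unique.

multipart/form-data consists of a series of parts. Each part has a
content-disposition header whose value is "form-data" and whose "name"
attribute gives the field name within the form, for example
'content-disposition: form-data; name="xxxxx"', where xxxxx is the name
of the corresponding field. Field names that contain non-ASCII
characters should be encoded by the method of RFC 1522.

As with all multipart MIME types, each part has an optional Content-Type
that defaults to text/plain. If the contents of a file are returned by
filling out a form, the file input is identified as
application/octet-stream or, if the type is known, as the appropriate
media type. If a form returns multiple files, they are returned as
multipart/mixed embedded within the multipart/form-data.

Each part may be encoded, with a "content-transfer-encoding" header
supplied, if its value does not conform to the default encoding.

A file name may also be given for an uploaded file, through the filename
parameter of the "content-disposition" header. It is not required, but
it is strongly encouraged whenever the original file name is known; in
many applications it is useful or necessary.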

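8. Security Considerations

It is important that a user agent not send any file that the user has
not explicitly asked to be sent. HTML interpreting agents are therefore
expected to confirm any default file names suggested by
<INPUT TYPE=file VALUE="yyyy">. Hidden fields should never be used to
specify a file.

This proposal does not contain a mechanism for encrypting the data; this
should be handled by whatever other mechanisms are in place for secure
transmission of data, whether via secure HTTP or by the security
protocols provided by MOSS (described in RFC 1848).

Once a file is uploaded, it is up to the receiver to process and store
it appropriately.

9. Conclusion

The suggested implementation gives the client a lot of flexibility in
the number and types of files it can send to the server, gives the
server control over whether to accept the files, and gives servers a
chance to interact with browsers that do not support INPUT TYPE "file".

The change to the HTML DTD is very simple but very powerful. It enables
a much greater variety of services to be implemented on the World-Wide
Web than is possible today, given the lack of a file submission
facility. This would be an extremely valuable addition to the
capabilities of the World-Wide Web.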

Authors' Addresses

   Larry Masinter
   Xerox Palo Alto Research Center
   3333 Coyote Hill Road
   Palo Alto, CA 94304

   Phone:  (415) 812-4365
   Fax:    (415) 812-4333
   EMail:   masinter@parc.xerox.com

   Ernesto Nebel
   XSoft, Xerox Corporation
   10875 Rancho Bernardo Road, Suite 200
   San Diego, CA 92127-2116

   Phone:  (619) 676-7817
   Fax:    (619) 676-7865
   EMail:   nebel@xsoft.sd.xerox.com

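A. Media type registration for multipart/form-data

Media Type name:
 multipart

Media subtype name:
 form-data

Required parameters:
 none

Optional parameters:
 none

Encoding considerations:
 No additional considerations other than as for other multipart types.

Published specification:
 RFC 1867

Security Considerations

The multipart/form-data type introduces no new security considerations
beyond those that may apply to the enclosed content.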

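References

[RFC 1521] MIME (Multipurpose Internet Mail Extensions) Part One:
           Mechanisms for Specifying and Describing the Format of
           Internet Message Bodies.
           N. Borenstein & N. Freed.
           September 1993.

[RFC 1522] MIME (Multipurpose Internet Mail Extensions) Part Two:
           Message Header Extensions for Non-ASCII Text.
           K. Moore.
           September 1993.

[RFC 1806] Communicating Presentation Information in Internet
           Messages: The Content-Disposition Header.
           R. Troost & S. Dorner.
           June 1995.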
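
About the Chinese translation

Organization: China-Pub (中国互动出版网, http://www.china-pub.com/),
RFC Document Chinese Translation Project
(http://www.china-pub.com/compters/emook/aboutemook.htm)
E-mail: ouyang@china-pub.com
Translator: 黄俊 (hujiao hj_chinese@yahoo.com)
Translation published: 2001-4-26
Copyright: the Chinese translation is copyrighted by China-Pub. It may
be freely reproduced for non-commercial use, provided that the
translation and copyright information of this document is retained.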

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Network Working Group                                           E. Nebel
Request For Comments: 1867                                   L. Masinter
Category: Experimental                                 Xerox Corporation
                                                           November 1995


                     Form-based File Upload in HTML

Status of this Memo

   This memo defines an Experimental Protocol for the Internet
   community.  This memo does not specify an Internet standard of any
   kind.  Discussion and suggestions for improvement are requested.
   Distribution of this memo is unlimited.

1. Abstract

   Currently, HTML forms allow the producer of the form to request
   information from the user reading the form.  These forms have proven
   useful in a wide variety of applications in which input from the user
   is necessary.  However, this capability is limited because HTML forms
   don't provide a way to ask the user to submit files of data.  Service
   providers who need to get files from the user have had to implement
   custom user applications.  (Examples of these custom browsers have
   appeared on the www-talk mailing list.)  Since file-upload is a
   feature that will benefit many applications, this proposes an
   extension to HTML to allow information providers to express file
   upload requests uniformly, and a MIME compatible representation for
   file upload responses.  This also includes a description of a
   backward compatibility strategy that allows new servers to interact
   with the current HTML user agents.

   The proposal is independent of which version of HTML it becomes a
   part of.

2.  HTML forms with file submission

   The current HTML specification defines eight possible values for the
   attribute TYPE of an INPUT element: CHECKBOX, HIDDEN, IMAGE,
   PASSWORD, RADIO, RESET, SUBMIT, TEXT.

   In addition, it defines the default ENCTYPE attribute of the FORM
   element using the POST METHOD to have the default value
   "application/x-www-form-urlencoded".







Nebel & Masinter              Experimental                      [Page 1]

RFC 1867             Form-based File Upload in HTML        November 1995


   This proposal makes two changes to HTML:

   1) Add a FILE option for the TYPE attribute of INPUT.
   2) Allow an ACCEPT attribute for INPUT tag, which is a list of
      media types or type patterns allowed for the input.

   In addition, it defines a new MIME media type, multipart/form-data,
   and specifies the behavior of HTML user agents when interpreting a
   form with ENCTYPE="multipart/form-data" and/or <INPUT type="file">
   tags.

   These changes might be considered independently, but are all
   necessary for reasonable file upload.

   The author of an HTML form who wants to request one or more files
   from a user would write (for example):

    <FORM ENCTYPE="multipart/form-data" ACTION="_URL_" METHOD=POST>

    File to process: <INPUT NAME="userfile1" TYPE="file">

    <INPUT TYPE="submit" VALUE="Send File">

    </FORM>

   The change to the HTML DTD is to add one item to the entity
   "InputType". In addition, it is proposed that the INPUT tag have an
   ACCEPT attribute, which is a list of comma-separated media types.

  ... (other elements) ...

  <!ENTITY % InputType "(TEXT | PASSWORD | CHECKBOX |
                         RADIO | SUBMIT | RESET |
                         IMAGE | HIDDEN | FILE )">
  <!ELEMENT INPUT - 0 EMPTY>
  <!ATTLIST INPUT
          TYPE %InputType TEXT
          NAME CDATA #IMPLIED  -- required for all but submit and reset
          VALUE CDATA #IMPLIED
          SRC %URI #IMPLIED  -- for image inputs --
          CHECKED (CHECKED) #IMPLIED
          SIZE CDATA #IMPLIED  --like NUMBERS,
                                  but delimited with comma, not space
          MAXLENGTH NUMBER #IMPLIED
          ALIGN (top|middle|bottom) #IMPLIED
          ACCEPT CDATA #IMPLIED --list of content types
          >






  ... (other elements) ...

3.  Suggested implementation

   While user agents that interpret HTML have wide leeway to choose the
   most appropriate mechanism for their context, this section suggests
   how one class of user agent, WWW browsers, might implement file
   upload.

3.1 Display of FILE widget

   When an INPUT tag of type FILE is encountered, the browser might show
   a display of (previously selected) file names, and a "Browse" button
   or selection method. Selecting the "Browse" button would cause the
   browser to enter into a file selection mode appropriate for the
   platform. Window-based browsers might pop up a file selection window,
   for example. In such a file selection dialog, the user would have the
   option of replacing a current selection, adding a new file selection,
   etc. Browser implementors might choose to let the list of file names
   be manually edited.

   If an ACCEPT attribute is present, the browser might constrain the
   file patterns prompted for to match those with the corresponding
   appropriate file extensions for the platform.

3.2 Action on submit

   When the user completes the form, and selects the SUBMIT element, the
   browser should send the form data and the content of the selected
   files.  The encoding type application/x-www-form-urlencoded is
   inefficient for sending large quantities of binary data or text
   containing non-ASCII characters.  Thus, a new media type,
   multipart/form-data, is proposed as a way of efficiently sending the
   values associated with a filled-out form from client to server.

3.3 use of multipart/form-data

   The definition of multipart/form-data is included in section 7.  A
   boundary is selected that does not occur in any of the data. (This
   selection is sometimes done probabilistically.) Each field of the form
   is sent, in the order in which it occurs in the form, as a part of
   the multipart stream.  Each part identifies the INPUT name within the
   original HTML form. Each part should be labelled with an appropriate
   content-type if the media type is known (e.g., inferred from the file
   extension or operating system typing information) or as
   application/octet-stream.
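
   As a rough illustration of the encoding steps just described, the
   following Python sketch picks a boundary at random, checks that it
   does not occur in any field value, and emits the parts in form order.
   The function names (choose_boundary, encode_form) and the tuple
   layout are assumptions made for this example and are not part of the
   proposal; field names and file names are assumed to be plain ASCII
   without quote characters.

```python
import secrets


def choose_boundary(payloads):
    """Pick a random boundary whose delimiter occurs in none of the payloads."""
    while True:
        boundary = secrets.token_hex(16)
        delimiter = ("--" + boundary).encode("ascii")
        if not any(delimiter in payload for payload in payloads):
            return boundary


def encode_form(fields):
    """Encode fields as multipart/form-data.

    fields is a list of (name, filename or None, content type or None,
    value as bytes) tuples, given in the order they occur in the form.
    Returns (Content-Type header value, body bytes).
    """
    boundary = choose_boundary([value for _, _, _, value in fields])
    delimiter = b"--" + boundary.encode("ascii")
    lines = []
    for name, filename, content_type, value in fields:
        disposition = 'form-data; name="%s"' % name
        if filename is not None:
            disposition += '; filename="%s"' % filename
        lines.append(delimiter)
        lines.append(("Content-Disposition: " + disposition).encode("ascii"))
        if filename is not None:
            # Unknown media types fall back to application/octet-stream.
            label = content_type or "application/octet-stream"
            lines.append(("Content-Type: " + label).encode("ascii"))
        lines.append(b"")  # blank line ends the part headers
        lines.append(value)
    lines.append(delimiter + b"--")  # closing delimiter
    lines.append(b"")
    return "multipart/form-data; boundary=" + boundary, b"\r\n".join(lines)
```

   Calling encode_form with a text field and one file returns a header
   value and a body that can be sent as the entity of a POST request.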







   If multiple files are selected, they should be transferred together
   using the multipart/mixed format.

   While the HTTP protocol can transport arbitrary BINARY data, the
   default for mail transport (e.g., if the ACTION is a "mailto:" URL)
   is the 7BIT encoding.  The value supplied for a part may need to be
   encoded and the "content-transfer-encoding" header supplied if the
   value does not conform to the default encoding.  [See section 5 of
   RFC 1521 for more details.]
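
   A minimal sketch of that decision, assuming a sender that falls back
   to base64 whenever a value is not safe for a 7BIT transport. The
   helper name encode_for_mail is hypothetical, and the 7BIT test is
   deliberately simplified (no NUL, no octets above 127, lines of at
   most 998 octets); it does not check for bare CR or LF.

```python
import base64


def encode_for_mail(value):
    """Return (content-transfer-encoding or None, bytes to transmit)."""
    seven_bit = all(0 < octet < 128 for octet in value)
    short_lines = all(len(line) <= 998 for line in value.split(b"\r\n"))
    if seven_bit and short_lines:
        # Already conforms to the default encoding: no header needed.
        return None, value
    # base64.encodebytes ends lines with LF; MIME bodies use CRLF.
    encoded = base64.encodebytes(value).replace(b"\n", b"\r\n")
    return "base64", encoded
```

   Over HTTP the value could be sent unencoded; the helper only matters
   for transports limited to 7BIT, such as a "mailto:" ACTION.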

   The original local file name may be supplied as well, as a 'filename'
   parameter either of the 'content-disposition: form-data' header or,
   in the case of multiple files, of the 'content-disposition: file'
   header of the subpart. The client application should make best
   effort to supply the file name; if the file name of the client's
   operating system is not in US-ASCII, the file name might be
   approximated or encoded using the method of RFC 1522.  This is a
   convenience for those cases where, for example, the uploaded files
   might contain references to each other, e.g., a TeX file and its .sty
   auxiliary style description.

   On the server end, the ACTION might point to a HTTP URL that
   implements the forms action via CGI. In such a case, the CGI program
   would note that the content-type is multipart/form-data, parse the
   various fields (checking for validity, writing the file data to local
   files for subsequent processing, etc.).
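
   One way a server-side program might do this parsing is sketched
   below in Python, using the standard library's email package as a
   generic MIME parser. This is an illustration under stated
   assumptions, not the CGI interface itself: parse_form is a
   hypothetical helper, and it expects the request's Content-Type
   header value and the complete body to be available in memory.

```python
from email import message_from_bytes


def parse_form(content_type, body):
    """Return a list of (field name, filename or None, media type, bytes)."""
    # Prepend the Content-Type header so the MIME parser sees the boundary.
    raw = b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n" + body
    message = message_from_bytes(raw)
    if message.get_content_type() != "multipart/form-data":
        raise ValueError("not multipart/form-data")
    if not message.is_multipart():
        raise ValueError("missing or unusable boundary")
    results = []
    for part in message.get_payload():
        name = part.get_param("name", header="content-disposition")
        # Several files for one field arrive wrapped in multipart/mixed.
        subparts = part.get_payload() if part.is_multipart() else [part]
        for sub in subparts:
            results.append((
                name,
                sub.get_filename(),
                sub.get_content_type(),
                sub.get_payload(decode=True),
            ))
    return results
```

   A real handler would go on to validate each field and write file
   data to local storage; those steps are omitted here.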

3.4 Interpretation of other attributes

   The VALUE attribute might be used with <INPUT TYPE=file> tags for a
   default file name. This use is probably platform dependent.  It might
   be useful, however, in sequences of more than one transaction, e.g.,
   to avoid having the user prompted for the same file name over and
   over again.

   The SIZE attribute might be specified using SIZE=width,height, where
   width is some default for file name width, while height is the
   expected size showing the list of selected files.  For example, this
   would be useful for forms designers who expect to get several files
   and who would like to show a multiline file input field in the
   browser (with a "browse" button beside it, hopefully).  It would be
   useful to show a one line text field when no height is specified
   (when the forms designer expects one file, only) and to show a
   multiline text area with scrollbars when the height is greater than 1
   (when the forms designer expects multiple files).








4.  Backward compatibility issues

   While not necessary for successful adoption of an enhancement to the
   current WWW form mechanism, it is useful to also plan for a migration
   strategy: users with older browsers can still participate in file
   upload dialogs, using a helper application. Most current web browsers,
   when given <INPUT TYPE=FILE>, will treat it as <INPUT TYPE=TEXT> and
   give the user a text box. The user can type in a file name into this
   text box. In addition, current browsers seem to ignore the ENCTYPE
   parameter in the <FORM> element, and always transmit the data as
   application/x-www-form-urlencoded.

   Thus, the server CGI might be written in a way that would note that
   the form data returned had content-type application/x-www-form-
   urlencoded instead of multipart/form-data, and know that the user was
   using a browser that didn't implement file upload.

   In this case, rather than replying with a "text/html" response, the
   CGI on the server could instead send back a data stream that a helper
   application might process instead; this would be a data stream of
   type "application/x-please-send-files", which contains:

   * The (fully qualified) URL to which the actual form data should
     be posted (terminated with CRLF)
   * The list of field names that were supposed to be file contents
     (space separated, terminated with CRLF)
   * The entire original application/x-www-form-urlencoded form data
     as originally sent from client to server.
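
   The three items listed above could be produced and consumed roughly
   as follows. This Python sketch only illustrates the layout; the
   function names are hypothetical, and it assumes that the URL and the
   field names are ASCII and contain no CRLF or spaces.

```python
from urllib.parse import parse_qsl


def build_please_send_files(action_url, file_fields, original_form_data):
    """Server side: build an application/x-please-send-files body."""
    text = (
        action_url + "\r\n"                # 1. URL to post to
        + " ".join(file_fields) + "\r\n"   # 2. fields meant to carry files
        + original_form_data               # 3. original urlencoded data
    )
    return text.encode("ascii")


def parse_please_send_files(payload):
    """Helper side: split the body back into its three items."""
    url, fields, form_data = payload.decode("ascii").split("\r\n", 2)
    pairs = parse_qsl(form_data, keep_blank_values=True)
    return url, fields.split(" "), pairs
```

   The helper would then replace the values of the listed fields with
   the contents of the named local files, after prompting the user, and
   post the result as multipart/form-data to the returned URL.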

   In this case, the browser needs to be configured to process
   application/x-please-send-files to launch a helper application.

   The helper would read the form data, note which fields contained
   'local file names' that needed to be replaced with their data
   content, might itself prompt the user for changing or adding to the
   list of files available, and then repackage the data & file contents
   in multipart/form-data for retransmission back to the server.

   The helper would generate the kind of data that a 'new' browser
   should actually have sent in the first place, with the intention that
   the URL to which it is sent corresponds to the original ACTION URL.
   The point of this is that the server can use the *same* CGI to
   implement the mechanism for dealing with both old and new browsers.

   The helper need not display the form data, but *should* ensure that
   the user actually be prompted about the suitability of sending the
   files requested (this is to avoid a security problem with malicious
   servers that ask for files that weren't actually promised by the



Nebel & Masinter              Experimental                      [Page 5]

RFC 1867             Form-based File Upload in HTML        November 1995


   user.) It would be useful if the status of the transfer of the files
   involved could be displayed.

5.  Other considerations

5.1 Compression, encryption

   This scheme doesn't address the possible compression of files.  After
   some consideration, it seemed that the optimization issues of file
   compression were too complex to try to automatically have browsers
   decide that files should be compressed.  Many link-layer transport
   mechanisms (e.g., high-speed modems) perform data compression over
   the link, and optimizing for compression at this layer might not be
   appropriate. It might be possible for browsers to optionally produce
   a content-transfer-encoding of x-compress for file data, and for
   servers to decompress the data before processing, if desired; this
   was left out of the proposal, however.

   Similarly, the proposal does not contain a mechanism for encryption
   of the data; this should be handled by whatever other mechanisms are
   in place for secure transmission of data, whether via secure HTTP or
   mail.

5.2 Deferred file transmission

   In some situations, it might be advisable to have the server validate
   various elements of the form data (user name, account, etc.)  before
   actually preparing to receive the data.  However, after some
   consideration, it seemed best to require that servers that wish to do
   this should implement this as a series of forms, where some of the
   data elements that were previously validated might be sent back to
   the client as 'hidden' fields, or by arranging the form so that the
   elements that need validation occur first.  This puts the onus of
   maintaining the state of a transaction only on those servers that
   wish to build a complex application, while allowing those cases that
   have simple input needs to be built simply.

   The HTTP protocol may require a content-length for the overall
   transmission. Even if it were not to do so, HTTP clients are
   encouraged to supply content-length for overall file input so that a
   busy server could detect if the proposed file data is too large to be
   processed reasonably and just return an error code and close the
   connection without waiting to process all of the incoming data.  Some
   current implementations of CGI require a content-length in all POST
   transactions.

   If the INPUT tag includes the attribute MAXLENGTH, the user agent
   should consider its value to represent the maximum Content-Length (in
   bytes) which the server will accept for transferred files.  In this
   way, servers can hint to the client how much space they have
   available for a file upload, before that upload takes place.  It is
   important to note, however, that this is only a hint, and the actual
   requirements of the server may change between form creation and file
   submission.

   In any case, a HTTP server may abort a file upload in the middle of
   the transaction if the file being received is too large.
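
   The early rejection described in this section might look like the
   following Python sketch. The helper name and the headers mapping are
   assumptions for the example, and the status codes 411 and 413 come
   from later HTTP/1.1 specifications rather than from this memo, which
   speaks only of "an error code".

```python
def check_upload_size(headers, max_bytes):
    """Return the HTTP status a server might choose before reading the body."""
    raw = headers.get("Content-Length")
    if raw is None:
        return 411  # Length Required: refuse bodies of unknown size
    try:
        length = int(raw)
    except ValueError:
        return 400  # malformed Content-Length
    if length < 0:
        return 400
    if length > max_bytes:
        return 413  # too large: reject without reading the data
    return 200
```

   A server that accepts the request may still abort later if more data
   arrives than was announced.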

5.3 Other choices for return transmission of binary data

   Various people have suggested using new mime top-level type
   "aggregate", e.g., aggregate/mixed or a content-transfer-encoding of
   "packet" to express indeterminate-length binary data, rather than
   relying on the multipart-style boundaries.  While we are not opposed
   to doing so, this would require additional design and standardization
   work to get acceptance of "aggregate".  On the other hand, the
   'multipart' mechanisms are well established, simple to implement on
   both the sending client and receiving server, and as efficient as
   other methods of dealing with multiple combinations of binary data.

5.4 Not overloading <INPUT>:

   Various people have wondered about the advisability of overloading
   'INPUT' for this function, rather than merely providing a different
   type of FORM element.  Among other considerations, the migration
   strategy which is allowed when using <INPUT> is important.  In
   addition, the <INPUT> field *is* already overloaded to contain most
   kinds of data input; rather than creating multiple kinds of <INPUT>
   tags, it seems most reasonable to enhance <INPUT>.  The 'type' of
   INPUT is not the content-type of what is returned, but rather the
   'widget-type'; i.e., it identifies the interaction style with the
   user.  The description here is carefully written to allow <INPUT
   TYPE=FILE> to work for text browsers or audio-markup.

5.5 Default content-type of field data

   Many input fields in HTML are to be typed in. There has been some
   ambiguity as to how form data should be transmitted back to servers.
   Making the content-type of <INPUT> fields be text/plain clearly
   disambiguates that the client should properly encode the data before
   sending it back to the server with CRLFs.

5.6 Allow form ACTION to be "mailto:"

   Independent of this proposal, it would be very useful for HTML
   interpreting user agents to allow a ACTION in a form to be a
   "mailto:" URL. This seems like a good idea, with or without this
   proposal. Similarly, the ACTION for a HTML form which is received via
   mail should probably default to the "reply-to:" of the message.
   These two proposals would allow HTML forms to be served via HTTP
   servers but sent back via mail, or, alternatively, allow HTML forms
   to be sent by mail, filled out by HTML-aware mail recipients, and the
   results mailed back.

5.7 Remote files with third-party transfer

   In some scenarios, the user operating the client software might want
   to specify a URL for remote data rather than a local file. In this
   case, is there a way to allow the browser to send to the server a
   pointer to the external data rather than the entire contents? This
   capability could be implemented, for example, by having the client
   send to the server data of type "message/external-body" with
   "access-type" set to, say, "uri", and the URL of the remote data in
   the body of the message.

5.8 File transfer with ENCTYPE=x-www-form-urlencoded

   If a form contains <INPUT TYPE=file> elements but does not contain an
   ENCTYPE in the enclosing <FORM>, the behavior is not specified.  It
   is probably inappropriate to attempt to URL-encode large quantities
   of data to servers that don't expect it.

5.9 CRLF used as line separator

   As with all MIME transmissions, CRLF is used as the separator for
   lines in a POST of the data in multipart/form-data.

5.10 Relationship to multipart/related

   The MIMESGML group is proposing a new type called multipart/related.
   While it contains similar features to multipart/form-data, the use
   and application of form-data is different enough that form-data is
   being described separately.

   It might be possible at some point to encode the result of HTML forms
   (including files) in a multipart/related body part; this is not
   incompatible with this proposal.

5.11 Non-ASCII field names

   Note that mime headers are generally required to consist only of 7-
   bit data in the US-ASCII character set. Hence field names should be
   encoded according to the prescriptions of RFC 1522 if they contain
   characters outside of that set. In HTML 2.0, the default character
   set is ISO-8859-1, but non-ASCII characters in field names should be
   encoded.
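
   As a sketch of what such encoding looks like, the following uses
   Python's standard email.header module, which produces encoded-words
   in the RFC 1522 style. The helper name encode_field_name and the
   sample field name are illustrative assumptions.

```python
from email.header import Header, decode_header


def encode_field_name(name: str, charset: str = "iso-8859-1") -> str:
    """Return the name unchanged if it is US-ASCII, else as an encoded-word."""
    try:
        name.encode("ascii")
        return name
    except UnicodeEncodeError:
        return Header(name, charset).encode()


encoded = encode_field_name("Dateigröße")

# Decoding recovers the original name and the charset it was sent in.
raw, charset = decode_header(encoded)[0]
decoded = raw.decode(charset)
```

   The encoded value contains only US-ASCII characters, so it can be
   placed in the name parameter of a content-disposition header.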

6. Examples

   Suppose the server supplies the following HTML:

     <FORM ACTION="http://server.dom/cgi/handle"
           ENCTYPE="multipart/form-data"
           METHOD=POST>
     What is your name? <INPUT TYPE=TEXT NAME=submitter>
     What files are you sending? <INPUT TYPE=FILE NAME=pics>
     </FORM>

   and the user types "Joe Blow" in the name field, and selects a text
   file "file1.txt" for the answer to 'What files are you sending?'

   The client might send back the following data:

        Content-type: multipart/form-data; boundary=AaB03x

        --AaB03x
        content-disposition: form-data; name="submitter"

        Joe Blow
        --AaB03x
        content-disposition: form-data; name="pics"; filename="file1.txt"
        Content-Type: text/plain

        ... contents of file1.txt ...
        --AaB03x--
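
   A client could assemble a body of this shape roughly as follows.
   This is a minimal sketch, not a complete client: the function name
   build_form_data is hypothetical, the boundary is fixed to the value
   used in this example (a real client must pick one that does not
   occur in the data), and field values are assumed to fit ISO-8859-1.

```python
def build_form_data(fields, files, boundary="AaB03x"):
    """Return (Content-Type value, body bytes) for a multipart/form-data POST.

    fields: dict mapping field name -> text value
    files:  dict mapping field name -> (filename, media type, bytes)
    """
    delimiter = b"--" + boundary.encode("ascii")
    lines = []
    for name, value in fields.items():
        lines.append(delimiter)
        lines.append(('Content-Disposition: form-data; name="%s"' % name).encode("ascii"))
        lines.append(b"")  # blank line separates part headers from content
        lines.append(value.encode("iso-8859-1"))
    for name, (filename, media_type, data) in files.items():
        lines.append(delimiter)
        lines.append(
            ('Content-Disposition: form-data; name="%s"; filename="%s"'
             % (name, filename)).encode("ascii")
        )
        lines.append(("Content-Type: %s" % media_type).encode("ascii"))
        lines.append(b"")
        lines.append(data)
    lines.append(delimiter + b"--")  # closing delimiter
    lines.append(b"")
    # CRLF is the line separator, as with all MIME transmissions.
    body = b"\r\n".join(lines)
    return "multipart/form-data; boundary=%s" % boundary, body


content_type, body = build_form_data(
    {"submitter": "Joe Blow"},
    {"pics": ("file1.txt", "text/plain", b"... contents of file1.txt ...")},
)
```

   The field names "submitter" and "pics" are taken from the form shown
   at the start of this section.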

   If the user also indicated an image file "file2.gif" for the answer
   to 'What files are you sending?', the client might send back the
   following data:

        Content-type: multipart/form-data; boundary=AaB03x

        --AaB03x
        content-disposition: form-data; name="submitter"

        Joe Blow
        --AaB03x
        content-disposition: form-data; name="pics"
        Content-type: multipart/mixed; boundary=BbC04y

        --BbC04y
        Content-disposition: attachment; filename="file1.txt"
        Content-Type: text/plain

        ... contents of file1.txt ...
        --BbC04y
        Content-disposition: attachment; filename="file2.gif"
        Content-type: image/gif
        Content-Transfer-Encoding: binary

        ... contents of file2.gif ...
        --BbC04y--
        --AaB03x--
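
   On the receiving side, a body with a nested multipart/mixed part can
   be unpacked with a general MIME parser. The sketch below uses
   Python's standard email package. The function name parse_form_data,
   the sample file contents and the one-level nesting limit are
   assumptions made for illustration.

```python
from email import message_from_bytes


def parse_form_data(content_type: str, body: bytes):
    """Return a list of (field name, filename or None, media type, content)."""
    # Prepend the Content-Type so the MIME parser knows the boundary.
    raw = b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n" + body
    message = message_from_bytes(raw)
    rows = []
    for part in message.get_payload():
        name = part.get_param("name", header="content-disposition")
        # Several files for one field arrive as a nested multipart/mixed.
        subparts = part.get_payload() if part.is_multipart() else [part]
        for sub in subparts:
            rows.append((name, sub.get_filename(), sub.get_content_type(),
                         sub.get_payload(decode=True)))
    return rows


sample = (
    b"--AaB03x\r\n"
    b'Content-Disposition: form-data; name="submitter"\r\n'
    b"\r\n"
    b"Joe Blow\r\n"
    b"--AaB03x\r\n"
    b'Content-Disposition: form-data; name="pics"\r\n'
    b"Content-Type: multipart/mixed; boundary=BbC04y\r\n"
    b"\r\n"
    b"--BbC04y\r\n"
    b'Content-Disposition: attachment; filename="file1.txt"\r\n'
    b"Content-Type: text/plain\r\n"
    b"\r\n"
    b"hello\r\n"
    b"--BbC04y\r\n"
    b'Content-Disposition: attachment; filename="file2.gif"\r\n'
    b"Content-Type: image/gif\r\n"
    b"Content-Transfer-Encoding: binary\r\n"
    b"\r\n"
    b"GIF89a\r\n"
    b"--BbC04y--\r\n"
    b"--AaB03x--\r\n"
)

rows = parse_form_data("multipart/form-data; boundary=AaB03x", sample)
```

   Each row pairs the form field name with one value, so both files
   appear under the field name "pics".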

7. Registration of multipart/form-data

   The media-type multipart/form-data follows the rules of all multipart
   MIME data streams as outlined in RFC 1521. It is intended for use in
   returning the data that comes about from filling out a form. In a
   form (in HTML, although other applications may also use forms), there
   are a series of fields to be supplied by the user who fills out the
   form. Each field has a name. Within a given form, the names are
   unique.

   multipart/form-data contains a series of parts. Each part is expected
   to contain a content-disposition header where the value is "form-
   data" and a name attribute specifies the field name within the form,
   e.g., 'content-disposition: form-data; name="xxxxx"', where xxxxx is
   the field name corresponding to that field. Field names originally in
   non-ASCII character sets may be encoded using the method outlined in
   RFC 1522.

   As with all multipart MIME types, each part has an optional Content-
   Type which defaults to text/plain.  If the contents of a file are
   returned via filling out a form, then the file input is identified as
   application/octet-stream or the appropriate media type, if known.  If
   multiple files are to be returned as the result of a single form
   entry, they can be returned as multipart/mixed embedded within the
   multipart/form-data.
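
   The choice of Content-Type for a file part can be sketched with
   Python's standard mimetypes module, which guesses a media type from
   the file name. The helper name part_content_type is hypothetical,
   and guessing from the extension is only one way a client might
   "know" the type.

```python
import mimetypes


def part_content_type(filename: str) -> str:
    """Pick the media type for a file part, falling back when unknown."""
    guessed, _encoding = mimetypes.guess_type(filename)
    # application/octet-stream is the fallback named in the text above.
    return guessed or "application/octet-stream"


text_type = part_content_type("file1.txt")
image_type = part_content_type("file2.gif")
unknown_type = part_content_type("upload.unknownext123")
```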

   Each part may be encoded and the "content-transfer-encoding" header
   supplied if the value of that part does not conform to the default
   encoding.

   File inputs may also identify the file name. The file name may be
   described using the 'filename' parameter of the "content-disposition"
   header. This is not required, but is strongly recommended in any case
   where the original filename is known. This is useful or necessary in
   many applications.




8. Security Considerations

   It is important that a user agent not send any file that the user has
   not explicitly asked to be sent. Thus, HTML interpreting agents are
   expected to confirm any default file names that might be suggested
   with <INPUT TYPE=file VALUE="yyyy">.  Hidden fields must never be
   able to specify a file.

   This proposal does not contain a mechanism for encryption of the
   data; this should be handled by whatever other mechanisms are in
   place for secure transmission of data, whether via secure HTTP, or by
   security provided by MOSS (described in RFC 1848).

   Once the file is uploaded, it is up to the receiver to process and
   store the file appropriately.
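
   As one example of careful handling by the receiver, a server might
   avoid using the client-supplied file name directly as a storage
   path. The sketch below is an assumption about what "appropriately"
   could mean in practice, not something this memo requires; the helper
   name safe_storage_name and the fallback name are hypothetical.

```python
import posixpath


def safe_storage_name(client_filename: str, fallback: str = "upload.bin") -> str:
    """Reduce a client-supplied file name to a bare name with no directories."""
    # Treat both slash styles as separators, then keep the last component.
    name = posixpath.basename(client_filename.replace("\\", "/"))
    # Drop leading dots so names such as ".." cannot survive.
    name = name.lstrip(".")
    return name or fallback


unix_name = safe_storage_name("../../etc/passwd")
windows_name = safe_storage_name("C:\\docs\\file1.txt")
empty_name = safe_storage_name("..")
```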

9.  Conclusion

   The suggested implementation gives the client a lot of flexibility in
   the number and types of files it can send to the server; it gives the
   server control of the decision to accept the files; and it gives
   servers a chance to interact with browsers which do not support INPUT
   TYPE "file".

   The change to the HTML DTD is very simple, but very powerful.  It
   enables a much greater variety of services to be implemented via the
   World-Wide Web than is currently possible due to the lack of a file
   submission facility.  This would be an extremely valuable addition to
   the capabilities of the World-Wide Web.

Authors' Addresses

   Larry Masinter
   Xerox Palo Alto Research Center
   3333 Coyote Hill Road
   Palo Alto, CA 94304

   Phone:  (415) 812-4365
   Fax:    (415) 812-4333
   EMail:   masinter@parc.xerox.com


   Ernesto Nebel
   XSoft, Xerox Corporation
   10875 Rancho Bernardo Road, Suite 200
   San Diego, CA 92127-2116

   Phone:  (619) 676-7817
   Fax:    (619) 676-7865
   EMail:   nebel@xsoft.sd.xerox.com

A. Media type registration for multipart/form-data

Media Type name:
 multipart

Media subtype name:
 form-data

Required parameters:
 none

Optional parameters:
 none

Encoding considerations:
 No additional considerations other than as for other multipart types.

Published specification:
 RFC 1867

Security considerations:
 The multipart/form-data type introduces no new security
 considerations beyond what might occur with any of the enclosed
 parts.

References

[RFC 1521] MIME (Multipurpose Internet Mail Extensions) Part One:
           Mechanisms for Specifying and Describing the Format of
           Internet Message Bodies.  N. Borenstein & N. Freed.
           September 1993.

[RFC 1522] MIME (Multipurpose Internet Mail Extensions) Part Two:
           Message Header Extensions for Non-ASCII Text. K. Moore.
           September 1993.

[RFC 1806] Communicating Presentation Information in Internet
           Messages: The Content-Disposition Header.  R. Troost & S.
           Dorner.  June 1995.



