# Reference articles:

http://www.cnblogs.com/kaixuan/archive/2008/01/31/1060284.html

# Uploading files over HTTP (RFC 1867 overview, JSP example, constructing the client request)

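1. Overview

RFC 1867 (http://www.ietf.org/rfc/rfc1867.txt) added file upload to the HTTP protocol. Client browsers such as Microsoft IE, Mozilla, and Opera follow this specification to send the files a user selects to the server. Server-side web programs written in PHP, ASP, JSP, and so on can follow the same specification to extract the files the user sent.
Microsoft IE, Mozilla, and Opera already support this protocol; a web page only needs a special form to send files.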

2. File upload example: implemented with a servlet (the HTTP server is Tomcat 4.1.24)
1. In an HTML page, write a form like the following:

<form method="post" enctype="multipart/form-data">
<input name="userfile1" type="file"><br>
<input name="userfile2" type="file"><br>
<input name="userfile3" type="file"><br>
<input name="userfile4" type="file"><br>
text field :<input type="text" name="text" value="text"><br>
<input type="submit" value=" Submit "><input type="reset">
</form>

2. Writing the server-side servlet

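// DiskFileUpload, FileItem and FileUploadException are assumed to come from
// org.apache.commons.fileupload (Apache Commons FileUpload).
public void doPost( HttpServletRequest request, HttpServletResponse response )
        throws ServletException, IOException {
    PrintWriter out = response.getWriter();
    DiskFileUpload diskFileUpload = new DiskFileUpload();
    // Maximum allowed file size (set with diskFileUpload.setSizeMax; value not given)
    // In-memory buffer size (set with diskFileUpload.setSizeThreshold; value not given)
    // Temporary directory (set with diskFileUpload.setRepositoryPath; path not given)
    try {
        List fileItems = diskFileUpload.parseRequest( request );
        Iterator iter = fileItems.iterator();
        while( iter.hasNext() ) {
            FileItem fileItem = (FileItem) iter.next();
            if( fileItem.isFormField() ) {
                // The current item is an ordinary form field
                out.println( "form field : " + fileItem.getFieldName() + ", " + fileItem.getString() );
            } else {
                // The current item is an uploaded file
                String fileName = fileItem.getName();
            }
        }
    } catch( FileUploadException e ) {
        // parseRequest reports malformed or oversized uploads this way
        throw new ServletException( e );
    }
}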

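3. Constructing the request sent by the client

The file to be sent, E:/s, has the following content:

a
bb
XXX
ccc

The client then sends the following headers and body: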

Accept: text/plain, */*
Accept-Language: zh-cn
Host: 192.168.29.65:80
Content-Type: multipart/form-data; boundary=---------------------------7d33a816d302b6
User-Agent: Mozilla/4.0 (compatible; OpenOffice.org)
Content-Length: 424
Connection: Keep-Alive

-----------------------------7d33a816d302b6
Content-Disposition: form-data; name="userfile1"; filename="E:/s"
Content-Type: application/octet-stream

a
bb
XXX
ccc
-----------------------------7d33a816d302b6
Content-Disposition: form-data; name="text1"

foo
-----------------------------7d33a816d302b6

bar
-----------------------------7d33a816d302b6--

Content-Type: multipart/form-data; boundary=---------------------------7d33a816d302b6

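---------------------------7d33a816d302b6 is the boundary that separates the individual files and form fields. The 33a816d302b6 part is a number generated on the fly to ensure that the boundary as a whole does not occur in the content of any file or form field. The leading ---------------------------7d is specific to IE; Mozilla uses ---------------------------71. In the body, each delimiter line is this boundary preceded by two extra hyphens, and the final delimiter also ends with two hyphens.

(Note the empty line above: an empty line separates each part's headers from its content.)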
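The layout described above can also be assembled in code. The following is a minimal Java sketch, not the code any browser actually uses: the class name MultipartBodySketch, its method names, and the use of a random UUID for the generated part of the boundary are all assumptions made for illustration. It writes every part as two hyphens plus the boundary, the part headers, an empty line, and the content, then closes the body with the boundary followed by two more hyphens.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.UUID;

// Illustrative sketch only: the class and method names are hypothetical.
class MultipartBodySketch {
    private static final String CRLF = "\r\n";
    private final String boundary;
    private final ByteArrayOutputStream body = new ByteArrayOutputStream();

    MultipartBodySketch(String boundary) {
        this.boundary = boundary;
    }

    // Assumption: a random UUID stands in for the browser-generated number.
    // It makes a clash with the payload unlikely but does not rule it out.
    static String randomBoundary() {
        return "---------------------------" + UUID.randomUUID().toString().replace("-", "");
    }

    // Value for the request's Content-Type header.
    String contentTypeHeader() {
        return "multipart/form-data; boundary=" + boundary;
    }

    // Appends an ordinary form field as one part.
    MultipartBodySketch addField(String name, String value) {
        writeText("--" + boundary + CRLF);
        writeText("Content-Disposition: form-data; name=\"" + name + "\"" + CRLF);
        writeText(CRLF); // empty line between part headers and content
        writeText(value);
        writeText(CRLF);
        return this;
    }

    // Appends a file as one part; the content is written as raw bytes.
    MultipartBodySketch addFile(String name, String filename, String contentType, byte[] content) {
        writeText("--" + boundary + CRLF);
        writeText("Content-Disposition: form-data; name=\"" + name
                + "\"; filename=\"" + filename + "\"" + CRLF);
        writeText("Content-Type: " + contentType + CRLF);
        writeText(CRLF);
        body.write(content, 0, content.length);
        writeText(CRLF);
        return this;
    }

    // Writes the closing delimiter and returns the complete body.
    // The length of the returned array is the Content-Length value.
    byte[] finish() {
        writeText("--" + boundary + "--" + CRLF);
        return body.toByteArray();
    }

    private void writeText(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        body.write(bytes, 0, bytes.length);
    }

    public static void main(String[] args) {
        MultipartBodySketch sketch = new MultipartBodySketch(randomBoundary());
        byte[] fileBytes = "a\r\nbb\r\nccc".getBytes(StandardCharsets.UTF_8);
        byte[] payload = sketch
                .addFile("userfile1", "E:/s", "application/octet-stream", fileBytes)
                .addField("text1", "foo")
                .finish();
        System.out.println("Content-Type: " + sketch.contentTypeHeader());
        System.out.println("Content-Length: " + payload.length);
    }
}
```

A more careful client would also scan the payload for the chosen boundary and pick another one on a clash; the sketch leaves that check out.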

http://www.vivtek.com/rfc1867.html

RFC1867 is the standard definition of that "Browse..." button that you use to upload files to a Web server. It introduced the INPUT field type="file", which is that button, and also specified a multipart form encoding which is capable of encapsulating files for upload along with all the other fields on an upload form.

It's not easy to find documentation on how to work with this stuff, though. Partly this is because if you're writing a Perl CGI it's really rather easy to work with, and partly it's due to the fact that Microsoft IIS ASP doesn't (exactly) support RFC1867 file upload. So on the one hand the Unixheads think it's too trivial to document, while the ASP script kiddies think that file upload is the exclusive preserve of genius and guru alike. I.e. Bill doesn't think you need to use it.

If that last sounds overly bitter, it's because I just finished up a really horrible job that involved uploading files to an IIS server. It would have been nice had somebody at Microsoft found file upload a sufficiently significant function to design competently. As it is, IIS 5.0 now provides a "Request.ReadBinary" method that gives you the whole request in plaintext, and graciously allows you to design your own object to read it. Note that VBS has no (easy) ability to read this binary data.

So let's assume for the time being that you're working with some reasonable non-IIS server. How do you really deal with file upload? It turns out to be easy. First, you design your form so that it will actually do an upload. In short, do this:

<form action="/mycode.cgi" method="post" enctype="multipart/form-data">
<input type="file">
</form>

In case you were wondering, the standard encoding type for a form is application/x-www-form-urlencoded, and if you leave the multipart enctype out of your form, then Netscape, for one, will not upload the file, it'll just include the filename. If that's what you actually want, this is pretty useful. (However, the RFC leaves behavior in this situation undefined, so you shouldn't rely on any particular behavior. I haven't looked to see what IE does in this situation. Undoubtedly something different.)
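To make the default encoding concrete, here is a small Java sketch of how application/x-www-form-urlencoded flattens a form into name=value pairs. The class name UrlEncodedFormSketch and the sample field values are made up for illustration, and treating the file field as a bare file name mirrors only the Netscape behavior described above, which the RFC does not guarantee.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only: the class name and sample values are hypothetical.
class UrlEncodedFormSketch {
    // Joins the fields as name=value pairs separated by '&', percent-encoding
    // each name and value. URLEncoder.encode(String, Charset) needs Java 10+.
    static String encode(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> entry : fields.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(URLEncoder.encode(entry.getKey(), StandardCharsets.UTF_8))
              .append('=')
              .append(URLEncoder.encode(entry.getValue(), StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        // Only a file name can travel in this encoding, never the file content.
        fields.put("userfile1", "report.txt");
        fields.put("text", "a b&c");
        System.out.println(encode(fields));
    }
}
```

There is no place in this format for a file's bytes or its content type, which is why the multipart enctype is needed for a real upload.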

So this much information I already knew going into my horrible project, or at least knew of it. That's why I assumed that the server end was just as simple. And as I mentioned, in Perl it isn't much more difficult than retrieving normal posted data is already. It's just that IIS doesn't support multipart/form-data posts, that's all. Oh, Microsoft has a solution of sorts, called the something-or-other manager, and IIS 5.0 is so powerful that this manager thingy is now included right in the service pack with, gee, at least a kilobyte of documentation.

Yeesh. I'm off-track again, aren't I?

OK, so when this post gets to the server, what does it look like? Well, first of all the Content-type header of the request is set to
multipart/form-data; boundary=[some stuff]
This is how you can ascertain that you're really dealing with a properly encoded upload post. The boundary value is probably of the form --------------------------------1878979834, where the digits are randomly generated. This boundary is a MIME boundary; it's guaranteed not to appear anywhere in the data except between the multiple parts of the data.
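As a rough illustration of that check, the Java sketch below pulls the boundary out of a Content-Type header value. The class name BoundaryParserSketch and the method boundaryOf are hypothetical, and the parsing is deliberately simple: it splits on semicolons and accepts an optionally quoted value, which is an assumption about what the client sends rather than a full header parser.

```java
import java.util.Locale;

// Illustrative sketch only: the class and method names are hypothetical.
class BoundaryParserSketch {
    // Returns the boundary parameter, or null when the header is missing,
    // is not multipart/form-data, or carries no boundary.
    static String boundaryOf(String contentType) {
        if (contentType == null) {
            return null;
        }
        String[] pieces = contentType.split(";");
        if (!pieces[0].trim().equalsIgnoreCase("multipart/form-data")) {
            return null;
        }
        for (int i = 1; i < pieces.length; i++) {
            String param = pieces[i].trim();
            if (param.toLowerCase(Locale.ROOT).startsWith("boundary=")) {
                String value = param.substring("boundary=".length()).trim();
                // Strip surrounding double quotes if the client added them.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? null : value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(boundaryOf("multipart/form-data; boundary=----1878979834"));
        System.out.println(boundaryOf("application/x-www-form-urlencoded"));
    }
}
```

A null result means the post is not a properly encoded upload and can be rejected or handled as an ordinary form.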

The data itself appears in blocks that are made up of lines separated by CR/LF pairs. It looks like this, more or less:

-------------------------------18788734234
Content-Disposition: form-data; name="nonfile_field"

value here
-------------------------------18788734234
Content-Type: image/gif

[ooh -- file contents!]
-------------------------------18788734234--

As you can see, this post isn't from the form I listed above, because I threw in a non-upload field just to show what it looks like. Anyway, you can see where everything is. Note that you get the originating local filename of the document for free in this format, meaning that you can use this to develop a document management system. Actual implementation is left as an exercise for the reader. I'll write more later on this topic, especially if you ask me any questions. Hint, hint.

So a Perl reader for this guy is simple: you iterate on the lines of the input and break on your boundary. Do things with the parts as you find them. I have an extensive example that you can read and use, which you can see here. It works (I'm using it daily) and it's well-documented.
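The same line-by-line idea can be sketched in Java. This is not the Perl example the paragraph refers to; the class MultipartReaderSketch, its Part type, and the parse method are hypothetical names. It assumes the whole body is already in memory as a string (decoding the bytes as ISO-8859-1 keeps every byte intact), that lines end in CR/LF, and that the boundary argument is the value taken from the Content-Type header.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Illustrative sketch only: the class and member names are hypothetical.
class MultipartReaderSketch {
    // One part of the body: its headers (lower-cased names) and its content.
    static final class Part {
        final Map<String, String> headers = new LinkedHashMap<>();
        final StringBuilder content = new StringBuilder();
    }

    static List<Part> parse(String body, String boundary) {
        String delimiter = "--" + boundary;   // starts every part
        String closing = delimiter + "--";    // ends the whole body
        List<Part> parts = new ArrayList<>();
        Part current = null;
        boolean inHeaders = false;
        boolean firstContentLine = true;
        for (String line : body.split("\r\n", -1)) {
            if (line.equals(closing)) {
                break;
            } else if (line.equals(delimiter)) {
                current = new Part();
                parts.add(current);
                inHeaders = true;
                firstContentLine = true;
            } else if (current == null) {
                // Text before the first delimiter is ignored.
            } else if (inHeaders) {
                if (line.isEmpty()) {
                    inHeaders = false; // the empty line ends the part headers
                } else {
                    int colon = line.indexOf(':');
                    if (colon > 0) {
                        current.headers.put(
                                line.substring(0, colon).trim().toLowerCase(Locale.ROOT),
                                line.substring(colon + 1).trim());
                    }
                }
            } else {
                // Restore the CR/LF that split() removed between content lines.
                if (!firstContentLine) {
                    current.content.append("\r\n");
                }
                current.content.append(line);
                firstContentLine = false;
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        String body = "--B\r\nContent-Disposition: form-data; name=\"nonfile_field\"\r\n\r\n"
                + "value here\r\n--B\r\nContent-Type: image/gif\r\n\r\nGIF89a\r\n--B--\r\n";
        for (Part part : parse(body, "B")) {
            System.out.println(part.headers + " -> " + part.content.length() + " chars");
        }
    }
}
```

Holding the full body in a string is only reasonable for small uploads; a production reader would stream the request and compare bytes instead.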

And thus concludes the lesson for today. Go forth and upload files.

http://tools.ietf.org/html/rfc1867

http://tools.ietf.org/html/rfc2854

http://tools.ietf.org/html/rfc2388
