How does HTTP file upload work?

Reposted from: http://stackoverflow.com/questions/8659808/how-does-http-file-upload-work



When I submit a simple form like this with a file attached:

<form enctype="multipart/form-data" action="http://localhost:3000/upload?upload_progress_id=12344" method="POST">
<input type="hidden" name="MAX_FILE_SIZE" value="100000" />
Choose a file to upload: <input name="uploadedfile" type="file" /><br />
<input type="submit" value="Upload File" />
</form>

How does it send the file internally? Is the file sent as part of the HTTP body as data? In the headers of this request, I don't see anything related to the name of the file. 

I would just like to know the internal workings of HTTP when sending a file.

I have not used a sniffer in a while, but if you want to see what is being sent in your request (it goes to the server, so it is a request), sniff it. This question is too broad; SO is more for specific programming questions. –  Blam  Dec 28 '11 at 18:39
 
...as sniffers go, Fiddler is my weapon of choice. You can even build up your own test requests to see how they post. –   Phil Cooper  Jan 31 '14 at 12:04

4 Answers

Accepted answer (75 votes)

Let's take a look at what happens when you select a file and submit your form (I've truncated the headers for brevity):

POST /upload?upload_progress_id=12344 HTTP/1.1
Host: localhost:3000
Content-Length: 1325
Origin: http://localhost:3000
Content-Type: multipart/form-data; boundary=----WebKitFormBoundaryePkpFF7tjBAqx29L
<other headers>

------WebKitFormBoundaryePkpFF7tjBAqx29L
Content-Disposition: form-data; name="MAX_FILE_SIZE"

100000
------WebKitFormBoundaryePkpFF7tjBAqx29L
Content-Disposition: form-data; name="uploadedfile"; filename="hello.o"
Content-Type: application/x-object

<file data>
------WebKitFormBoundaryePkpFF7tjBAqx29L--

Instead of URL-encoding the form parameters, the browser sends them (including the file data) as sections of a multipart document in the body of the request.
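For contrast, this is roughly what URL encoding (the default application/x-www-form-urlencoded enctype) produces. A minimal Python sketch using the standard urllib.parse.urlencode; the field values are illustrative assumptions, not taken from a real capture:

```python
from urllib.parse import urlencode

# Illustrative values mirroring the question's form (not a real capture).
fields = {"MAX_FILE_SIZE": "100000", "uploadedfile": "hello.o"}

# application/x-www-form-urlencoded: a single string of name=value pairs.
encoded = urlencode(fields)
print(encoded)  # MAX_FILE_SIZE=100000&uploadedfile=hello.o

# Raw bytes must be percent-escaped one by one (3 characters per byte here).
raw_bytes = bytes([0x00, 0x7F, 0xFF])
print(urlencode({"data": raw_bytes}))  # data=%00%7F%FF
```

Everything is squeezed into one string of name=value pairs, so there is nowhere to put a filename or a per-file Content-Type, and every byte of 0x80 or above grows to three characters. The multipart format avoids both problems.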

In the example above, you can see the input MAX_FILE_SIZE with the value set in the form, as well as a section containing the file data. The file name is part of the Content-Disposition header.

The full details are here.
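To make the structure concrete, here is a minimal Python sketch that assembles a body like the one above by hand. The names build_multipart and make_boundary, the boundary prefix, and the sample file bytes are all hypothetical; real code would normally let an HTTP library do this. It assumes field names and filenames contain no quotes or line breaks, since no escaping is done.

```python
import secrets

CRLF = b"\r\n"

def make_boundary(payloads):
    """Return a random boundary string that occurs in none of the payloads."""
    while True:
        boundary = "----HypotheticalFormBoundary" + secrets.token_hex(16)
        if not any(boundary.encode("ascii") in p for p in payloads):
            return boundary

def build_multipart(fields, files):
    """Encode text fields and files as a multipart/form-data body.

    fields: dict of field name -> str value
    files:  dict of field name -> (filename, content_type, data as bytes)
    Returns (Content-Type header value, body bytes).
    Assumes names and filenames contain no quotes or line breaks.
    """
    payloads = [value.encode("utf-8") for value in fields.values()]
    payloads += [data for (_, _, data) in files.values()]
    boundary = make_boundary(payloads)
    delimiter = b"--" + boundary.encode("ascii")  # body lines add two dashes

    parts = []
    for name, value in fields.items():
        parts.append(delimiter + CRLF
                     + f'Content-Disposition: form-data; name="{name}"'.encode() + CRLF
                     + CRLF                      # blank line ends the part headers
                     + value.encode("utf-8") + CRLF)
    for name, (filename, ctype, data) in files.items():
        parts.append(delimiter + CRLF
                     + f'Content-Disposition: form-data; name="{name}"; '
                       f'filename="{filename}"'.encode() + CRLF
                     + f"Content-Type: {ctype}".encode() + CRLF
                     + CRLF
                     + data + CRLF)
    # The closing delimiter carries two extra trailing dashes.
    body = b"".join(parts) + delimiter + b"--" + CRLF
    return f"multipart/form-data; boundary={boundary}", body

# Example mirroring the first answer's form; the file bytes are made up.
ctype, body = build_multipart(
    {"MAX_FILE_SIZE": "100000"},
    {"uploadedfile": ("hello.o", "application/x-object", b"\x7fELF\x00\x01")},
)
print("Content-Type:", ctype)
print("Content-Length:", len(body))
```

The caller would send body with the returned Content-Type value and Content-Length: len(body). Note that the boundary in the header omits the leading "--" that appears on each delimiter line in the body, exactly as in the captured request.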

Does this mean that port 80 (or whichever port serves HTTP requests) is unusable while the file is being transferred? For example, if a huge file (about a GB) is being uploaded, will the web server be unable to respond to any other requests during that time? –   source.rar  Apr 23 '14 at 16:39
@source.rar: No. Webservers are (almost?) always threaded so that they can handle concurrent connections. Essentially, the daemon process that's listening on port 80 immediately hands off the task of serving to another thread/process in order that it can return to listening for another connection; even if two incoming connections arrive at exactly the same moment, they'll just sit in the network buffer until the daemon is ready to read them. –   eggyal  Apr 30 '14 at 8:56
 
The threading explanation is a bit incorrect, since there are high-performance servers that are designed to be single-threaded and use a state machine to quickly take turns downloading packets of data from connections. Rather, in TCP/IP, port 80 is a listening port, not the port the data is transferred on. –  slebetman  Oct 13 '14 at 17:08
When an IP listening socket (port 80) receives a connection, another socket is created on another port, usually with a random number above 1000. This socket is then connected to the remote socket, leaving port 80 free to listen for new connections. –   slebetman  Oct 13 '14 at 17:10
@slebetman First of all, this is about HTTP; FTP active mode doesn't apply here. Second, the listening socket doesn't get blocked by each connection. You can have as many connections to one port as the other sides have ports to bind their own end to. –   Slotos  Nov 12 '14 at 20:58
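Slotos's point about one port carrying many connections can be checked with a short Python sketch, assuming a typical TCP stack; an OS-chosen port on 127.0.0.1 stands in for port 80, which usually needs privileges. It holds two connections open at once and prints each accepted socket's local and remote port:

```python
import socket

# An OS-chosen free port on 127.0.0.1 stands in for port 80.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen()
listen_port = listener.getsockname()[1]

# Two clients connect; both connections stay open at the same time.
clients = [socket.create_connection(("127.0.0.1", listen_port)) for _ in range(2)]
accepted = [listener.accept()[0] for _ in clients]

local_ports = [conn.getsockname()[1] for conn in accepted]
client_ports = [conn.getpeername()[1] for conn in accepted]
for local, remote in zip(local_ports, client_ports):
    # Same local port for every connection; only the client side differs.
    print(f"server side port {local}, client side port {remote}")
print("listening port:", listen_port)

for s in accepted + clients + [listener]:
    s.close()
```

On common operating systems both accepted sockets report the listening port as their local port: a TCP connection is identified by the full (client address, client port, server address, server port) tuple, so the server does not need a fresh port per connection.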

How does it send the file internally?

The format is called multipart/form-data, as asked at: What does enctype='multipart/form-data' mean?

Once you see some examples of it, it will be really easy to understand how it works.

You can produce examples using nc -l or an ECHO server and a user agent like a browser or cURL.
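If nc is not at hand, a minimal stand-in for the ECHO-server option can be written in Python (3.8+ for socket.create_server). This is a sketch under assumptions: capture_request and serve_forever are hypothetical names, and like nc it does not parse HTTP; it simply collects bytes until the client closes its side or goes quiet for a second, then sends an empty 204 reply so the browser stops waiting.

```python
import socket

def capture_request(listener, idle_timeout=1.0):
    """Accept one connection and return the raw bytes the client sent.

    Like `nc -l`, this does not parse HTTP: it reads until the client closes
    its side or stays silent for idle_timeout seconds, then sends an empty
    204 response so a browser does not keep waiting.
    """
    conn, _addr = listener.accept()
    with conn:
        conn.settimeout(idle_timeout)
        chunks = []
        while True:
            try:
                chunk = conn.recv(65536)
            except socket.timeout:
                break                      # client went quiet
            if not chunk:
                break                      # client closed its side
            chunks.append(chunk)
        try:
            conn.sendall(b"HTTP/1.1 204 No Content\r\nConnection: close\r\n\r\n")
        except OSError:
            pass                           # client already disconnected
    return b"".join(chunks)

def serve_forever(host="localhost", port=8000):
    """Print each request received, like the shell loop around `nc -l`."""
    with socket.create_server((host, port)) as listener:  # Python 3.8+
        while True:
            print(capture_request(listener).decode("utf-8", errors="replace"))
```

Calling serve_forever() plays the same role as the nc loop: it prints each raw request arriving on localhost:8000, so the form's action URL works unchanged.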

Save the form to an .html file:

<form action="http://localhost:8000" method="post" enctype="multipart/form-data">
  <p><input type="text" name="text" value="text default">
  <p><input type="file" name="file1">
  <p><input type="file" name="file2">
  <p><button type="submit">Submit</button>
</form>

Create files to upload:

echo 'Content of a.txt.' > a.txt
echo '<!DOCTYPE html><title>Content of a.html.</title>' > a.html

Run:

while true; do printf '' | nc -l localhost 8000; done

Open the HTML file in your browser, select the files, click Submit, and check the terminal.

nc prints the request received. Firefox sent:

POST / HTTP/1.1
Host: localhost:8000
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:29.0) Gecko/20100101 Firefox/29.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Cookie: __atuvc=34%7C7; permanent=0; _gitlab_session=226ad8a0be43681acf38c2fab9497240; __profilin=p%3Dt; request_method=GET
Connection: keep-alive
Content-Type: multipart/form-data; boundary=---------------------------9051914041544843365972754266
Content-Length: 554

-----------------------------9051914041544843365972754266
Content-Disposition: form-data; name="text"

text default
-----------------------------9051914041544843365972754266
Content-Disposition: form-data; name="file1"; filename="a.txt"
Content-Type: text/plain

Content of a.txt.

-----------------------------9051914041544843365972754266
Content-Disposition: form-data; name="file2"; filename="a.html"
Content-Type: text/html

<!DOCTYPE html><title>Content of a.html.</title>

-----------------------------9051914041544843365972754266--

Therefore it is clear that:

  • Content-Type: multipart/form-data; boundary=---------------------------9051914041544843365972754266 sets the content type to multipart/form-data and says that the fields are separated by the given boundary string.

  • every field gets some sub-headers before its data: Content-Disposition: form-data;, the field name, and, for file fields, the filename and a Content-Type, followed by the data.

    The server reads the data until the next boundary string. The browser must choose a boundary that does not appear in any of the fields' data, which is why the boundary varies between requests.
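The server side of that rule can be sketched in Python. parse_multipart is a hypothetical helper, and it skips things a real parser must handle (preambles, whitespace after the boundary, non-ASCII filenames, streaming large files); it only shows how splitting on the boundary recovers each field. The sample body is a shortened copy of the Firefox capture above (the text and file1 parts only):

```python
import re

def parse_multipart(body, boundary):
    """Split a multipart/form-data body into a list of part dicts.

    Every delimiter is CRLF + "--" + boundary; the CRLF in front of it belongs
    to the delimiter, not to the preceding part's data.
    """
    delimiter = b"\r\n--" + boundary.encode("ascii")
    # Prepend CRLF so the very first delimiter matches like the others.
    chunks = (b"\r\n" + body).split(delimiter)
    parts = []
    for chunk in chunks[1:]:              # chunks[0] is the (empty) preamble
        if chunk.startswith(b"--"):       # "--boundary--" closes the body
            break
        # Skip the CRLF ending the delimiter line; a blank line ends headers.
        head, _, data = chunk[2:].partition(b"\r\n\r\n")
        headers = {}
        for line in head.decode("utf-8").split("\r\n"):
            key, _, value = line.partition(":")
            headers[key.strip().lower()] = value.strip()
        params = dict(re.findall(r'(\w+)="([^"]*)"',
                                 headers.get("content-disposition", "")))
        parts.append({
            "name": params.get("name"),
            "filename": params.get("filename"),   # None for plain fields
            "content_type": headers.get("content-type"),
            "data": data,
        })
    return parts

# A shortened copy of the Firefox capture (the "text" and "file1" parts only).
boundary = "---------------------------9051914041544843365972754266"
body = "\r\n".join([
    "--" + boundary,
    'Content-Disposition: form-data; name="text"',
    "",
    "text default",
    "--" + boundary,
    'Content-Disposition: form-data; name="file1"; filename="a.txt"',
    "Content-Type: text/plain",
    "",
    "Content of a.txt.\n",
    "--" + boundary + "--",
    "",
]).encode("utf-8")

for part in parse_multipart(body, boundary):
    print(part["name"], part["filename"], part["content_type"], part["data"])
```

The file part comes back as b"Content of a.txt.\n": the trailing newline was written by echo, and the CRLF after it belongs to the next delimiter, which is why the capture shows a blank line before each boundary.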


An HTTP message may have a body of data sent after the header lines. In a response, this is where the requested resource is returned to the client (the most common use of the message body), or perhaps explanatory text if there's an error. In a request, this is where user-entered data or uploaded files are sent to the server.

http://www.tutorialspoint.com/http/http_messages.htm
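The header/body split described above is mechanical: the header section ends at the first empty line (CRLF CRLF), and Content-Length says how many body bytes follow. A minimal Python sketch; split_http_message is a hypothetical name, and the sketch ignores chunked Transfer-Encoding and assumes the whole message has already been read:

```python
def split_http_message(raw):
    """Split a raw HTTP/1.1 message into (start line, headers, body).

    The header section ends at the first empty line (CRLF CRLF); the next
    Content-Length bytes are the body. Sketch only: ignores chunked
    Transfer-Encoding and assumes `raw` holds the whole message.
    """
    head, sep, rest = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("header section is incomplete")
    start_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # names are case-insensitive
    length = int(headers.get("content-length", "0"))
    return start_line, headers, rest[:length]

# A made-up request whose body is plain text rather than multipart.
raw = (b"POST /upload HTTP/1.1\r\n"
       b"Host: localhost:3000\r\n"
       b"Content-Type: text/plain\r\n"
       b"Content-Length: 11\r\n"
       b"\r\n"
       b"hello world")
start_line, headers, body = split_http_message(raw)
print(start_line)  # POST /upload HTTP/1.1
print(body)        # b'hello world'
```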
