The notion of a "thin-client" is a myth today. Perhaps this will change with the proliferation of TV- or Palm-top browsers, but the vast majority of web clients today use a highly functional PC, with plenty of client-side storage and lots of interesting client-side content.
File upload is a useful feature for virtually any web application. Here is a sample of how some of our customers are integrating file upload with their web applications:
- Web-based e-mail applications use file upload to add attachments to messages.
- Extranet applications use file upload to send files among partners, such as certificates of conformance, software updates or documentation.
- Technical support sites use file upload to receive error logs and defective documents from users.
- Intranet document publishing sites use file upload to share files among users with a friendly web interface.
- Graphics libraries use file upload to control submissions and generate thumbnails.
- ISP-hosted storefronts use file upload to send product images.
Web-based file upload is a vastly superior alternative to other means of transferring files to a central server over the Internet. Let's examine why.
HTTP vs. FTP
FTP has been the standard mechanism for sending files to a server since the earliest days of TCP/IP. It is reliable, can take into account text vs. binary formats across platforms, and there are ubiquitous clients. However, compared to the flexibility of HTTP, it is deeply lacking. Let's compare:
- Authentication: With FTP uploads you must either manage many user accounts or allow anonymous access. With uploads via a web application, the application can determine who is allowed to upload, without a large administrative burden.
- Security: Uploads via HTTP can be sent over SSL so that the information is encrypted during transmission. There is no means for doing that using standard FTP.
- Ease of configuration: FTP uploads require the administrator to fine-tune NTFS permissions. With HTTP-based uploads, access is determined by your application as well as by the administrator, if desired.
- Flexibility: Want to save DOC files in one location and graphics in another? With FTP, your users have to know that. With a web application, you can enforce these policies in your application and change them without disrupting your users.
- Power: With a web application, you can limit the size of the uploaded file dynamically every time it is invoked. You could even change the size limit depending on information contained in the same form. Additionally, you can discard uploads that fail certain checks, such as having the wrong MIME type or unacceptable file contents.
- Simplicity and friendliness: A pleasing web page can offer instructions, advice and on-line help. This is not possible with batch-based FTP. More importantly, when errors occur, you can provide immediate feedback to the user and offer corrective action.
- Firewall support: Many organizations do not allow out-bound FTP for security and intellectual property reasons. While this is simply a configuration issue, most firewalls do allow HTTP uploads.
- Supplemental Information: An HTTP upload (using RFC1867) renders accessible additional information about the upload, such as the user's original filename. This can be very useful in intranet scenarios.
- Upload to a database: Server-side components, such as SA-FileUp, allow you to upload to an OLE DB database. Try that with FTP!
- Performance: Both FTP and HTTP ultimately use the TCP protocol, which is the primary determinant of transfer performance.
- Reliability and Restart: Both FTP and HTTP 1.1 allow for transfer restart. Unfortunately, many servers, including IIS, do not support restart of either protocol at this time. FTP restart is apparently coming in IIS5.
In short, as with the web itself, it is the programmability of the server that gives HTTP uploads their vast advantages over FTP.
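To make the "Flexibility" and "Power" points above concrete, here is a minimal sketch, in Python rather than ASP, of the kind of policy a server-side application can enforce on each upload. The policy table, the directory names, the size limits and the function name `route_upload` are all hypothetical and chosen only for illustration; a real application would plug in its own rules.

```python
import os

# Hypothetical policy table: extension -> (destination directory, max bytes).
# The paths and limits are illustrative assumptions, not taken from any product.
POLICY = {
    ".doc": ("/uploads/documents", 2_000_000),
    ".gif": ("/uploads/graphics", 500_000),
    ".jpg": ("/uploads/graphics", 500_000),
}


def route_upload(original_filename, size_in_bytes):
    """Return the destination path for an upload, or None to reject it."""
    # Some browsers send a full client-side path; keep only the file name.
    name = original_filename.replace("\\", "/").rsplit("/", 1)[-1]
    extension = os.path.splitext(name)[1].lower()
    rule = POLICY.get(extension)
    if rule is None:
        return None  # unknown file type: reject the upload
    directory, max_bytes = rule
    if size_in_bytes > max_bytes:
        return None  # too large for this file type
    return directory + "/" + name
```

Because the rules live in the application, they can be changed on the server at any time without the users having to learn a new directory layout.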
Forms of HTTP upload
There are three mechanisms of file upload via HTTP: RFC1867, PUT and WebDAV.
HTTP Upload Method 1: RFC1867
RFC1867 (http://info.internet.isi.edu/in-notes/rfc/files/rfc1867.txt) stayed as a proposed standard within the IETF for a while before it ultimately received the blessing of the W3C in HTML 3.2. It was first implemented by Netscape in Navigator 2.0, followed by Microsoft as an add-on to IE 3.02 (32-bit) and native in IE 3.03 (16-bit). It is a very simple yet powerful idea: define a new type of form field
<INPUT TYPE="FILE">
and add a different encoding scheme to the form itself. Rather than the typical:
<FORM ACTION="formproc.asp" METHOD="POST">
the form specifies the multipart encoding:
<FORM ACTION="formproc.asp" METHOD="POST" ENCTYPE="multipart/form-data">
This encoding scheme is much more efficient at transferring large amounts of data than the default "application/x-www-form-urlencoded" form encoding scheme. As you may be aware, URL encoding has a very limited character set. Anything outside of the character set must be replaced by '%nn', where nn is the two-digit hexadecimal value of the character. For example, even the common <space> character is replaced by '%20'. If the browser had to encode entire files using this inefficient scheme, the transmitted size of the uploaded file could be 2-3 times larger than the original file! Instead, RFC1867 uses multipart MIME encoding, as commonly found in e-mail messages, to transfer large amounts of data with no encoding and just a few simple but useful headers around the data.
The result looks like a regular HTML form post, but rather than being, say, 4 KB of form data, it can be megabytes long! RFC1867 also proposed a number of attributes of the TYPE="FILE" tag that have yet to be adopted by the browser vendors. These include:
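The size penalty of URL encoding is easy to measure. The following Python sketch is only an illustration, not part of any upload product: it percent-encodes a block of bytes with the standard library's `urllib.parse.quote` and reports the encoded length. The helper name `url_encoded_size` is made up for this example.

```python
from urllib.parse import quote


def url_encoded_size(data):
    """Return the length of `data` after percent-encoding every unsafe byte."""
    # safe="" means that even "/" is escaped; ASCII letters, digits and a
    # handful of punctuation characters pass through unchanged.
    return len(quote(data, safe=""))


def expansion_ratio(data):
    """Return encoded size divided by original size."""
    return url_encoded_size(data) / len(data)
```

Calling `expansion_ratio(bytes(range(256)))` on one byte of every possible value gives a ratio between 2 and 3, while plain English text barely grows at all. Binary files such as images sit near the expensive end of that range, which is why a scheme that sends the bytes untouched matters for uploads.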
- ACCEPT: to let the web site restrict the type of file to be uploaded before receiving the file
- SIZE: to set size of a single filename text box or to allow multiple files with a single <INPUT> tag
- MAXLENGTH: to potentially set on the client-side, the maximum size file to be transferred.
- Wildcards and directory uploads: neither IE nor Navigator supports wildcarded names or directories even though this is suggested in the RFC.
Fortunately, both browser vendors implemented the suggested "Browse..." button so the user can easily pick the file to be uploaded using the native "Open File..." dialog box.
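It helps to see what the browser actually transmits once the user has picked a file. The Python sketch below assembles a multipart/form-data body by hand, following the layout described in RFC1867: each part is introduced by a boundary line and a Content-Disposition header, and the file part also carries the original filename and a Content-Type. The function name, the boundary string and the sample field names are assumptions made for illustration; real browsers generate their own boundary.

```python
def build_multipart_body(boundary, fields, file_field, filename,
                         content_type, file_bytes):
    """Assemble a multipart/form-data request body as a bytes object."""
    delimiter = b"--" + boundary.encode("ascii")
    lines = []
    # Ordinary form fields: one part each, with no Content-Type header.
    for name, value in fields.items():
        lines.append(delimiter)
        lines.append(
            ('Content-Disposition: form-data; name="%s"' % name).encode("ascii")
        )
        lines.append(b"")
        lines.append(value.encode("utf-8"))
    # The file part: note the extra filename parameter and Content-Type.
    lines.append(delimiter)
    lines.append(
        ('Content-Disposition: form-data; name="%s"; filename="%s"'
         % (file_field, filename)).encode("ascii")
    )
    lines.append(b"Content-Type: " + content_type.encode("ascii"))
    lines.append(b"")
    lines.append(file_bytes)  # raw bytes, sent with no further encoding
    # The closing delimiter has two extra dashes appended.
    lines.append(delimiter + b"--")
    lines.append(b"")
    return b"\r\n".join(lines)
```

For example, `build_multipart_body("XyZ", {"descrip": "logo"}, "f1", "logo.gif", "image/gif", data)` yields a body in which the bytes of `data` appear verbatim between the headers and the final boundary.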
The use of the VALUE clause is interesting. Normally, it is intuitive to let the web site preset values of form fields for user convenience. However, in this case, it could allow a nefarious web site to preset the name of the file to be uploaded and, coupled with a client-side form submit, "steal" files off a user's PC without their consent. In the summer of 1997, the CERT, in conjunction with an employee at Bell Labs, issued a security warning about this, and both Netscape and Microsoft quickly issued patches that prevent presetting the file to be uploaded (see: http://www.microsoft.com/ie/security/bell.htm).
This is unfortunate, since the original RFC1867 clearly specified "it is important that a user agent not send any file that the user has not explicitly asked to be sent." So rather than disabling presetting the name entirely, the browser vendors could have simply issued an alert dialog box such as: "Are you sure you want to transmit files x, y, z to the server?". As a final twist to this, yet another security hole was found in IE 4.01 in mid-October that allows a web site to circumvent IE's current security mechanism (see http://www.microsoft.com/windows/ie/security/paste.htm).
HTTP Upload Method 2: HTTP PUT
HTTP 1.1 introduced a new HTTP verb: PUT. When a web server receives an HTTP PUT with an object name ("/myweb/image/x.gif"), it will authenticate the user, take the content of the HTTP stream and store it directly on the web server. Since this could wreak havoc on a web site, it is not used frequently. It also takes away HTTP's greatest advantage: programmability of the server. In the case of PUT, the web server handles the request itself: there is no room for a CGI or ASP application to step in. The only way for your application to capture a PUT is to operate at the low-level ISAPI filter level. Most web developers have no interest in this, with good reason.
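For comparison with a form post, here is roughly what a PUT looks like on the wire. This Python sketch only formats the request bytes and does not send them anywhere; the function name and the host "www.example.com" are placeholders chosen for illustration.

```python
def format_put_request(host, object_name, content_type, body):
    """Return the raw bytes of a minimal HTTP/1.1 PUT request."""
    head = (
        "PUT %s HTTP/1.1\r\n"
        "Host: %s\r\n"
        "Content-Type: %s\r\n"
        "Content-Length: %d\r\n"
        "\r\n" % (object_name, host, content_type, len(body))
    )
    # The body is the file itself: no form fields, no multipart wrapper.
    return head.encode("ascii") + body
```

Note that the destination ("/myweb/image/x.gif") is named in the request line and the body contains nothing but the file. There is no place to carry a description field or any other form data, which is one more reason the request is of little use to an application.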
HTTP Upload Method 3: WebDAV
WebDAV (http://www.ietf.org/html.charters/webdav-charter.html) allows Distributed Authoring and Versioning of web content. It introduces several new HTTP verbs that permit uploading, locking/unlocking, check-in/check-out of web content via HTTP. Think of it as a non-proprietary Configuration Management (e.g. SourceSafe) plus file transfer for the web. Microsoft has publicly announced that it will be supported in IIS5, Office 2000 and future versions of IE. ISPs will love it as a replacement for the low-level, often broken, mechanics of FrontPage server extensions. Note that it will not replace the FrontPage server extensions: it will simply offer low-level standard services to support the more sophisticated functions that the server extensions currently perform. It is via WebDAV that Office 2000 can do those nifty "Save to web" functions you may have seen at the October '98 PDC.
Sounds great, right? Well, if all you are interested in is uploading content, WebDAV is great. It solves many problems. However, if you need file uploading within your web application, WebDAV will do nothing for you. Like HTTP PUT, the WebDAV verbs are interpreted by the server, not your web application. You need to work at the ISAPI filter level to access the WebDAV verbs and interpret the content in your application.
HTTP Upload Mechanisms: Conclusion
RFC1867 remains the most flexible means of uploading files to your web application. PUT has very limited use. WebDAV is great for content authors, such as FrontPage users, but will be of little use to web developers who want to add file upload to their web application.
So we've concluded that RFC1867 is the best way to add file upload capabilities to your web application. How is it actually implemented? What tools does Microsoft supply? What other tools are available?
Microsoft's Posting Acceptor
ASP does not understand the "multipart/form-data" encoding scheme. Instead, Microsoft provides the Posting Acceptor for free (http://www.microsoft.com/iis/support/iishelp/iis/htm/core/pareadme.htm). The Posting Acceptor is an ISAPI application that accepts the upload itself and then reposts to an ASP page after the upload is complete. (See also Scott Stanfield's article in the July '98 issue of MIND.)
SA-FileUp from Software Artisans
SA-FileUp (http://www.softartisans.com/softartisans/saf.html) was one of the first commercial Active Server Components. Version 1 shipped in May '97 and is currently in use on thousands of sites world-wide including microsoft.com. Early betas used a combination of ISAPI filter and Active Server component for integration with ASP. Microsoft then delivered ASP 1.0b (ASP.DLL 184.108.40.206) that provided a new method: Request.BinaryRead. The BinaryRead method made available the raw, unprocessed data from the browser to an Active Server component. Once that was available, SA-FileUp dropped the need for the ISAPI filter and now exists purely as an ASP component.
Using Request.BinaryRead, as SA-FileUp does, is mutually exclusive with the Request.Form object. This makes sense: how could you read the raw stream of data from the browser and concurrently parse it as if it were form information? To make life easier for the ASP developer, SA-FileUp reimplements all of the Request.Form functionality in its own .Form collection. This makes using SA-FileUp familiar to ASP coders who are used to using Request.Form.
Comparison of Posting Acceptor and SA-FileUp
Here is as objective a comparison as possible between PA and SA-FileUp:
- ASP Integration: SA-FileUp is fully scriptable by Active Server Pages. Rather than existing as a separate ISAPI DLL, SA-FileUp integrates very smoothly with your ASP application.
- Standards support: PA Upload from IE browsers uses the proprietary WebPost API, rather than the standard RFC 1867, so by default you need different forms for Netscape and IE users.
- Anonymous Connections: Since PA uses an ISAPI DLL, it must provide additional security protection outside of your ASP application. For this reason, PA disallows all anonymous connections by default. PA 1.1 can allow anonymous uploads, but since there is no programmatic control of the upload, there is a considerable security risk here. Since SA-FileUp is integrated with ASP, your application can decide the appropriate level of security, including anonymous.
- Control of the Upload: PA does not allow any control of the upload as it is being sent. With SA-FileUp, you can limit the size of the upload, or decide at run-time to flush the upload. Best of all, you can change the location of the upload dynamically.
- Processing: PA has a two-step upload-and-repost process. With SA-FileUp, everything can be accomplished in a single step, such as writing to a database depending on the status of the upload.
- Uploading to a Database: PA can only upload to files. SA-FileUp can upload to files as well as databases.
- "Spaces in filenames": PA has a known issue when processing filenames that contain spaces. SA-FileUp has no such restriction.
- Price: PA is bundled with NT Option Pack and free for download from Microsoft. SA-FileUp is not free: it is a supported commercial component.
Scott Stanfield, President of Vertigo Software (http://www.vertigosoftware.com ), author of the Posting Acceptor article for the July '98 MIND Magazine, wrote to Software Artisans upon learning about SA-FileUp after the MIND article was published:
"We were very excited to learn about [SA-FileUp]. Fantastic and very valuable product"
Common Support Issues
By far, the most common support issues for file upload are security related. Typically, a site has locked down NTFS permissions so tightly that the anonymous user account cannot write to the destination file location. Also, security is often misunderstood by even advanced server administrators.
Remember that IIS/ASP executes each ASP page in a specific security context. If no authentication mechanism is in place (no Basic, no NT Challenge/Response), each page is executed as the anonymous user. The NT account that corresponds to the anonymous user can be set by the web admin.
For IIS3, the default anonymous user is IUSR_<computername>.
For IIS4, the default anonymous user is IUSR_<computername> for all in-process web applications ("Run in a separate memory space" is not checked). The default anonymous user is IWAM_<computername> for all out of process web applications ("Run in a separate memory space" is checked).
When using SA-FileUp, you must ensure that the appropriate user has Read, Write and Delete permissions on the destination directory.
If authentication is in force, then IIS/ASP will impersonate the authenticated user during the execution of the ASP page. So, the authenticated user's NT login account must have Read, Write and Delete permissions on the destination directory.
A complete discussion of IIS security is beyond the scope of this article. Please see the IIS 4 Resource Kit for a very good explanation.
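When an upload fails with a permissions error, a quick way to narrow the problem down is to check whether a given account can actually create, write and delete a file in the destination directory. The Python sketch below is one hypothetical way to do that; it tests the account the script itself runs under, so it is only meaningful if run as the same account that executes the ASP page. The function name is made up for this example.

```python
import os
import tempfile


def can_write_and_delete(directory):
    """Return True if the current account can create, write and delete a file."""
    try:
        # mkstemp creates a uniquely named file and returns an open handle.
        handle, path = tempfile.mkstemp(dir=directory, prefix="probe_")
    except OSError:
        return False  # the directory is missing or creation is not permitted
    try:
        with os.fdopen(handle, "wb") as probe:
            probe.write(b"x")
        os.remove(path)  # exercises the Delete permission
    except OSError:
        return False
    return True
```

A result of False for the upload directory points at the directory's permissions rather than at the upload component.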
Let's See Some Code
So enough theory, let's see what the ASP code looks like.
A Single File Upload
Here is a simple HTML form that will upload a single file:
<HTML>
<HEAD>
<TITLE>Please Upload Your File</TITLE>
</HEAD>
<BODY>
<form enctype="multipart/form-data" method="post" action="formresp.asp">
Enter filename to upload: <input type="file" name="f1"><br>
<input type="submit">
</form>
</BODY>
</HTML>
Here would be the file 'formresp.asp':
<%@ LANGUAGE="VBSCRIPT" %>
<HTML>
<HEAD>
<TITLE>Upload File Results</TITLE>
</HEAD>
<BODY>
Thank you for uploading your file.<br>
<% Set upl = Server.CreateObject("SoftArtisans.FileUp") %>
<% upl.SaveAs "C:/temp/upload.out" %><BR>
Total Bytes Written: <%=upl.TotalBytes%>
</BODY>
</HTML>
File Upload with Additional Form Elements
Adding additional form elements is easy. It behaves just like a usual HTML form, as long as the ENCTYPE is specified correctly:
<HTML>
<HEAD>
<TITLE>Please Upload Your File</TITLE>
</HEAD>
<BODY>
<form enctype="multipart/form-data" method="post" action="mformresp.asp">
Enter description: <input type="text" name="descrip"><br>
Enter filename to upload: <input type="file" name="f1"><br>
<input type="submit">
</form>
</BODY>
</HTML>
Here would be the file 'mformresp.asp':
<%@ LANGUAGE="VBSCRIPT" %>
<HTML>
<HEAD>
<TITLE>Upload File Results</TITLE>
</HEAD>
<BODY>
Thank you for uploading your file.<br>
<% Set upl = Server.CreateObject("SoftArtisans.FileUp") %>
<% upl.SaveAs "C:/temp/upload.out" %><BR>
Your description is: '<%=upl.Form("descrip")%>'<BR>
Total Bytes Written: <%=upl.TotalBytes%>
</BODY>
</HTML>
What About Multiple Files?
For multiple files, since the browsers do not support the SIZE= attribute, you must use an additional <INPUT> tag for every file:
Enter first filename: <input type="file" name="f1"><br>
Enter second filename: <input type="file" name="f2"><br>
The form processing is the same:
<%@ LANGUAGE="VBSCRIPT" %>
<HTML>
<HEAD>
<TITLE>Multiple File Upload Results</TITLE>
</HEAD>
<BODY>
Thank you for uploading your files.<br>
<% Set upl = Server.CreateObject("SoftArtisans.FileUp") %>
<% upl.Form("f1").SaveAs "C:/temp/upload1.out" %><BR>
Total Bytes Written for file 1: <%=upl.Form("f1").TotalBytes%>
<% upl.Form("f2").SaveAs "C:/temp/upload2.out" %><BR>
Total Bytes Written for file 2: <%=upl.Form("f2").TotalBytes%>
</BODY>
</HTML>
Limiting the Size of the Upload
To limit the size of the upload, simply set a property:
<%@ LANGUAGE="VBSCRIPT" %>
<HTML>
<HEAD>
<TITLE>Upload File Results</TITLE>
</HEAD>
<BODY>
Thank you for uploading your file.<br>
<% Set upl = Server.CreateObject("SoftArtisans.FileUp") %>
<% upl.MaxBytes = 1000 '--- limit the upload size to 1000 bytes %>
The maximum size that you are permitted to upload is <%=upl.MaxBytes%> bytes per file.<br>
<% upl.SaveAs "C:/temp/upload.out" %>
Total Bytes Written: <%=upl.TotalBytes%><br>
Server Filename: <%=upl.ServerName%><br>
Total Bytes Transmitted by you: <%=Request.TotalBytes%>
</BODY>
</HTML>
Any content after the 1000th byte will be discarded, so the web server's disks are not unnecessarily filled.
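The behavior described above, where everything past the limit is read but never written, can be sketched in a few lines. The Python function below is an illustration of the general technique only; it is not how SA-FileUp is implemented, and the function name and chunk size are assumptions made for this example.

```python
import io


def copy_with_limit(source, destination, max_bytes, chunk_size=4096):
    """Copy at most max_bytes from source to destination; return bytes written."""
    written = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break  # end of the incoming stream
        remaining = max_bytes - written
        if remaining > 0:
            kept = chunk[:remaining]
            destination.write(kept)
            written += len(kept)
        # Once the limit is reached, chunks are still read but simply dropped,
        # so the rest of the request is drained without touching the disk.
    return written
```

With `source = io.BytesIO(b"a" * 2500)` and a limit of 1000, the function returns 1000 and the destination receives exactly the first 1000 bytes, while the remaining 1500 are consumed and discarded.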
Uploading files to your web application is simple: it can be accomplished in as little as two lines of ASP code. HTTP/RFC1867 file upload is the preferred mechanism because of the rich programming environment offered by the server. SA-FileUp, as an Active Server component integrated with ASP, offers significant advantages over the free Posting Acceptor from Microsoft.
Posting Acceptor Newsgroup: news://msnews.microsoft.com/microsoft.public.site-server.postingacceptr
David Wihl is President of Software Artisans, Inc. (http://www.softartisans.com), in Brookline, MA, a rapidly growing provider of high performance Active Server Components. He was the original author of SA-FileUp and is still deeply involved with its exciting upcoming enhancements. He can be reached at firstname.lastname@example.org
SA-FileUp™ is a trademark of Software Artisans, Inc.
All other trademarks are the property of their respective owners.