A Call To Web Standards
Having read thousands of pages of articles on the web, magazines, MSDN documentation, online forums, etc., I still don't recall seeing a call for producing ASP.NET code compliant with web standards. Wait! I take it back. I saw one: a post in Scott Guthrie's blog (http://weblogs.asp.net/scottgu/archive/2003/11/25/39620.aspx). This one is a must-read! According to Scott, the upcoming version of Visual Studio .NET will feature server controls that produce web standard-compliant code, accessibility validation, etc. Now that's very good news!
As of this writing, ASP.NET does not produce code that passes validation in any of the Strict modes (see Eric Meyer's Picking a Rendering Mode and W3C's List of valid DTDs you can use in your document for more information on DOCTYPEs). Enforcing XHTML-compliant output takes some effort: you have to implement automatic code cleaning (all right, fudging).
The point of this article is two-fold: to reiterate the importance of web standards and to show how to implement response filters.
Anatomy of HTTP Response Filters
Instead of creating an abstract sample for this discussion, I'll refer to a real-world example of a filter application. This very site, www.AspNetResources.com, utilizes this filter to enforce XHTML 1.0 Strict compliancy.
The HttpResponse class has a very useful property:
public Stream Filter {get; set;}
MSDN provides a helpful description of this property: "Gets or sets a wrapping filter object used to modify the HTTP entity body before transmission." Confused? In other words, you can assign your own custom filter to each page response. HttpResponse will send all content through your filter. This filter will be invoked right before the response goes back to the user and you will have a chance to transform it if need be. This could be extremely helpful if you need to transform output from "legacy" code or substitute placeholders (header, footer, navigation, you name it) with proper code. Besides, at times it's simply impossible to ensure that every server control plays by the rules and produces what you expect it to. Enter response filters.
The Filter property is of type System.IO.Stream. To create your own filter you need to derive a class from System.IO.Stream (which is an abstract class) and add implementation to its numerous methods.
    using System;
    using System.Text;
    using System.Text.RegularExpressions;
    using System.IO;
    using System.Web;

    namespace AspNetResources.Web
    {
        /// <summary>
        /// PageFilter does all the dirty work of tinkering with the
        /// outgoing HTML stream. This is a good place to
        /// enforce some compliancy with web standards.
        /// </summary>
        public class PageFilter : Stream
        {
            Stream        responseStream;
            long          position;
            StringBuilder responseHtml;

            public PageFilter (Stream inputStream)
            {
                responseStream = inputStream;
                responseHtml = new StringBuilder ();
            }

            #region Filter overrides
            public override bool CanRead
            {
                get { return true; }
            }

            public override bool CanSeek
            {
                get { return true; }
            }

            public override bool CanWrite
            {
                get { return true; }
            }

            public override void Close()
            {
                responseStream.Close ();
            }

            public override void Flush()
            {
                responseStream.Flush ();
            }

            public override long Length
            {
                get { return 0; }
            }

            public override long Position
            {
                get { return position; }
                set { position = value; }
            }

            public override long Seek(long offset, SeekOrigin origin)
            {
                return responseStream.Seek (offset, origin);
            }

            public override void SetLength(long length)
            {
                responseStream.SetLength (length);
            }

            public override int Read(byte[] buffer, int offset, int count)
            {
                return responseStream.Read (buffer, offset, count);
            }
            #endregion

            #region Dirty work
            public override void Write(byte[] buffer, int offset, int count)
            {
                string strBuffer = System.Text.UTF8Encoding.UTF8.GetString (buffer, offset, count);

                // ---------------------------------
                // Wait for the closing </html> tag
                // ---------------------------------
                Regex eof = new Regex ("</html>", RegexOptions.IgnoreCase);

                if (!eof.IsMatch (strBuffer))
                {
                    // Not the last chunk yet: keep accumulating
                    responseHtml.Append (strBuffer);
                }
                else
                {
                    responseHtml.Append (strBuffer);
                    string finalHtml = responseHtml.ToString ();

                    // Transform the response and write it back out

                    byte[] data = System.Text.UTF8Encoding.UTF8.GetBytes (finalHtml);
                    responseStream.Write (data, 0, data.Length);
                }
            }
            #endregion
        }
    }
As you can see, most methods have more or less dummy code. The Write method does all the heavy lifting. ASP.NET may call Write several times, each time with only a chunk of the page, so before we transform the output we need to accumulate the chunks until the whole page has arrived. Therefore a Regex looks for the closing </html> tag.
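One thing to watch for when the page arrives in several chunks: a multi-byte UTF-8 character can be split across two Write calls, and decoding each chunk separately with Encoding.UTF8.GetString would then mangle it. Below is a minimal sketch of one way around this, assuming the response is UTF-8 encoded; the class and method names (ChunkDecodingSketch, DecodeChunks) are made up for illustration and are not part of the filter above.

```csharp
using System;
using System.Text;

public class ChunkDecodingSketch
{
    // Decodes a sequence of byte chunks with ONE stateful Decoder, so a
    // character whose bytes straddle two chunks is reassembled correctly.
    public static string DecodeChunks(byte[][] chunks)
    {
        Decoder decoder = Encoding.UTF8.GetDecoder();
        StringBuilder html = new StringBuilder();

        foreach (byte[] chunk in chunks)
        {
            // The decoder keeps the trailing bytes of an incomplete character
            // and combines them with the start of the next chunk.
            char[] chars = new char[decoder.GetCharCount(chunk, 0, chunk.Length)];
            int written = decoder.GetChars(chunk, 0, chunk.Length, chars, 0);
            html.Append(chars, 0, written);
        }

        return html.ToString();
    }

    public static void Main()
    {
        byte[] all = Encoding.UTF8.GetBytes("<p>caf\u00e9</p></html>");

        // Split in the middle of the two-byte sequence that encodes the
        // accented character (bytes 6 and 7 of the array).
        byte[] first = new byte[7];
        byte[] second = new byte[all.Length - 7];
        Array.Copy(all, 0, first, 0, first.Length);
        Array.Copy(all, 7, second, 0, second.Length);

        string html = DecodeChunks(new byte[][] { first, second });
        Console.WriteLine(html == "<p>caf\u00e9</p></html>");
    }
}
```

In a filter like PageFilter the Decoder would presumably live in a field next to responseHtml, so that its state survives from one Write call to the next.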
Now that we have the entire HTML response string we can transform it. I really liked Julian Roberts' approach as laid out in his Ensuring XHTML compliancy in ASP.NET article, although I chose to redo the regular expressions to my liking.
Forcing XHTML Compliancy
Basically, this particular filter simply tries to fix a few of the known inconsistencies:
- Place the __VIEWSTATE hidden input in a <div> to make the validator happy.
- Remove the name attribute from the main form. By default your server-side form gets a name and an id attribute. The validator is not happy about the name attribute so we need to get rid of it.
My first take is to wrap the __VIEWSTATE input in a <div>:
    // Wrap the __VIEWSTATE tag in a div to pass validation
    re = new Regex ("(<input.*?__VIEWSTATE.*?/>)", RegexOptions.IgnoreCase);
    finalHtml = re.Replace (finalHtml, new MatchEvaluator (ViewStateMatch));
The Regex class allows you to wire a match evaluator delegate which kicks in every time a match is found. The ViewStateMatch delegate is implemented as follows:
    private static string ViewStateMatch (Match m)
    {
        return string.Concat ("<div>", m.Groups[1].Value, "</div>");
    }
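To see the evaluator in action outside of a page, here is a small console sketch that runs the same pattern and delegate over a hand-written fragment. The markup, the view state value and the wrapper names (ViewStateWrapSketch, WrapViewState) are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public class ViewStateWrapSketch
{
    // Same evaluator as in the article: wrap the captured input in a div.
    private static string ViewStateMatch(Match m)
    {
        return string.Concat("<div>", m.Groups[1].Value, "</div>");
    }

    // Hypothetical helper that applies the article's pattern to a string.
    public static string WrapViewState(string html)
    {
        Regex re = new Regex("(<input.*?__VIEWSTATE.*?/>)", RegexOptions.IgnoreCase);
        return re.Replace(html, new MatchEvaluator(ViewStateMatch));
    }

    public static void Main()
    {
        // Made-up fragment resembling what a server-side form renders.
        string html =
            "<form id=\"mainForm\">\n" +
            "<input type=\"hidden\" name=\"__VIEWSTATE\" value=\"dDwtMTI=\" />\n" +
            "</form>";

        Console.WriteLine(WrapViewState(html));
    }
}
```

Because the dot does not match line breaks by default, the pattern only looks within a single line of markup for the start of the input tag.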
If you were to implement step 2 (removing the name attribute) and use this filter as-is right now you'd run into some issues with post-back processing. Why's that? View the page source. Your __doPostBack method will look something like this:
    function __doPostBack(eventTarget, eventArgument) {
        var theform;
        if (window.navigator.appName.toLowerCase().indexOf("netscape") > -1) {
            theform = document.forms["mainForm"];
        }
        else {
            theform = document.mainForm;
        }
        ...
    }
The gotcha here is that the form is referenced by its name, not id. If we get rid of the name attribute it can't handle postbacks. With the name attribute it's not valid XHTML code. Seems to be a catch-22 situation.
The following hack is of my own making. So far it has worked fine on this site and our www.custfeedback.com site, so I can't complain. However, keep in mind this is a hack so use it wisely and test your code well before going to production.
I decided to rewrite the __doPostBack method to use DOM as opposed to the "old ways". This is to say, "To hell with old and bad browsers". Browser usage stats show that the ones without DOM1 support are almost extinct. Therefore assess your audience and see if this is going to work for you.
    // If __doPostBack is registered, replace the whole function
    if (finalHtml.IndexOf ("__doPostBack") > -1)
    {
        try
        {
            int pos1 = finalHtml.IndexOf ("var theform;");
            int pos2 = finalHtml.IndexOf ("theform.__EVENTTARGET", pos1);
            string methodText = finalHtml.Substring (pos1, pos2-pos1);
            string formID = Regex.Match (methodText,
                "document.forms\\[\"(.*?)\"\\];",
                RegexOptions.IgnoreCase).Groups[1].Value.Replace (":", "_");

            finalHtml = finalHtml.Replace (methodText,
                @"var theform = document.getElementById ('" + formID + "');");
        }
        catch {} // script doesn't have the expected shape: leave the markup untouched
    }
The transformed __doPostBack should look similar to this:
    function __doPostBack(eventTarget, eventArgument) {
        var theform = document.getElementById ('mainForm');
        ...
    }
This one will keep the validator happy. And last, but not least, we're supposed to remove the name attribute from the main form.
    // Remove the "name" attribute from <form> tag(s)
    re = new Regex("<form\\s+(name=.*?\\s)", RegexOptions.IgnoreCase);
    finalHtml = re.Replace(finalHtml, new MatchEvaluator(FormNameMatch));
A corresponding match evaluator delegate is implemented like this:
    private static string FormNameMatch (Match m)
    {
        return m.ToString ().Replace (m.Groups[1].Value, string.Empty);
    }
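As a quick sanity check, the sketch below applies this pattern and evaluator to a single hand-written form tag. The tag and the wrapper names (FormNameSketch, StripFormName) are made up for illustration. Note that the pattern assumes name is the first attribute after <form and is followed by whitespace.

```csharp
using System;
using System.Text.RegularExpressions;

public class FormNameSketch
{
    // Same evaluator as in the article: cut the captured name="..." piece
    // out of the matched text.
    private static string FormNameMatch(Match m)
    {
        return m.ToString().Replace(m.Groups[1].Value, string.Empty);
    }

    // Hypothetical helper that applies the article's pattern to a string.
    public static string StripFormName(string html)
    {
        Regex re = new Regex("<form\\s+(name=.*?\\s)", RegexOptions.IgnoreCase);
        return re.Replace(html, new MatchEvaluator(FormNameMatch));
    }

    public static void Main()
    {
        // Made-up tag resembling what a server-side form renders.
        string tag = "<form name=\"mainForm\" method=\"post\" action=\"default.aspx\" id=\"mainForm\">";

        Console.WriteLine(StripFormName(tag));
    }
}
```

Only the name attribute and the whitespace after it are removed; the id attribute, which the rewritten __doPostBack relies on, stays in place.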
Installing the Response Filter
I prefer to wire a response filter in an HttpModule. The nuts and bolts of the HttpModule and HttpApplication classes are outside the scope of this article. You can find a brief overview in my other article, ASP.NET Custom Error Pages.
Below is bare-bones code of an HttpModule:
    // ---------------------------------------------
    public void Init (HttpApplication app)
    {
        app.ReleaseRequestState += new EventHandler(InstallResponseFilter);
    }

    // ---------------------------------------------
    private void InstallResponseFilter(object sender, EventArgs e)
    {
        HttpResponse response = HttpContext.Current.Response;

        if (response.ContentType == "text/html")
            response.Filter = new PageFilter (response.Filter);
    }
The app parameter passed to the Init method is of type System.Web.HttpApplication. You tap into the ASP.NET HTTP pipeline by wiring handlers of the various HttpApplication events. The diagram on the left illustrates the sequence of these events. See how late in the game your page filter is called? In the code sample above I install the response filter in the ReleaseRequestState event handler. To make sure the filter processes only pages I explicitly check for content type:
    if (response.ContentType == "text/html")
        response.Filter = new PageFilter (response.Filter);
The final step of plugging your HttpModule into the pipeline is listing it in web.config (also explained in my other article):
    <system.web>
        <httpModules>
            <add name="MyHttpModule"
                 type="MyAssembly.MyHttpModule, MyAssembly" />
        </httpModules>
    </system.web>
Remember to replace MyHttpModule and MyAssembly with appropriate module and assembly names from your project.
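For reference, here is a sketch of how the pieces of this section might fit into one complete module class. It assumes the PageFilter class shown earlier is compiled into the same assembly; the class name PageFilterModule is made up for illustration, and the code only runs inside the ASP.NET pipeline (it needs a reference to System.Web).

```csharp
using System;
using System.Web;

namespace AspNetResources.Web
{
    // Hypothetical module name; PageFilter is the filter class from earlier.
    public class PageFilterModule : IHttpModule
    {
        // Called once when the module is loaded: subscribe to the event
        // in which the filter gets installed.
        public void Init(HttpApplication app)
        {
            app.ReleaseRequestState += new EventHandler(InstallResponseFilter);
        }

        // IHttpModule requires Dispose; this sketch holds no resources.
        public void Dispose()
        {
        }

        private void InstallResponseFilter(object sender, EventArgs e)
        {
            HttpResponse response = HttpContext.Current.Response;

            // Only filter HTML pages; images, style sheets and the like pass through.
            if (response.ContentType == "text/html")
                response.Filter = new PageFilter(response.Filter);
        }
    }
}
```

With these names the type attribute in web.config would read "AspNetResources.Web.PageFilterModule, MyAssembly", where MyAssembly again stands for your own assembly name.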
Performance Considerations
Back when we were implementing the Write method I used a string variable, finalHtml. Keep in mind that in .NET strings are immutable, i.e. you cannot change a string's length or modify any of its characters. For example:
finalHtml = re.Replace(finalHtml, new MatchEvaluator (FormNameMatch));
The finalHtml variable holds the entire HTML response. When the line of code above runs, a whole new string will be allocated and assigned to finalHtml. If you manipulate large strings and do it again and again, it may negatively affect performance and breed garbage in memory.
When Filters Don't Work At All
One last issue before I wrap up this article. Your filter won't be called at all if HttpApplication.CompleteRequest() gets called one way or another. The pipeline will bypass your filter and send an unmodified response. The following methods do call HttpApplication.CompleteRequest():
- Server.Transfer()
- Response.End()
- Response.Redirect() (when endResponse is true)
The only one that doesn't call HttpApplication.CompleteRequest() is Server.Execute().
"You lie!!!" No, see for yourselves:
    // --- HttpServerUtility.Transfer ---
    public void Transfer(string path, bool preserveForm)
    {
        if (this._context == null)
            throw new HttpException(...);

        this.ExecuteInternal(path, null, preserveForm);
        this._context.Response.End();
    }

    // --- HttpServerUtility.Execute ---
    public void Execute(string path)
    {
        this.ExecuteInternal(path, null, 1);
    }

    // --- HttpResponse.End ---
    public void End()
    {
        ...
        this.Flush();
        this._ended = true;
        this._context.ApplicationInstance.CompleteRequest();
    }

    // --- HttpResponse.Redirect ---
    public void Redirect(string url, bool endResponse)
    {
        ...
        if (endResponse)
            this.End();
    }
If HttpApplication.CompleteRequest() is called during an event, the ASP.NET HTTP pipeline will interrupt request processing once the event handling completes. If it's any consolation, it will still fire the EndRequest event.
Conclusion
I hope this article was a wake-up call in terms of programming with web standards in mind. We looked at a real-world example of writing a response filter and enforcing XHTML 1.0 compliancy. This is a highly experimental article as you will most likely discover other gotchas when you validate your pages against the W3C MarkUp Validation Service.
As indicated at the beginning of the article, ASP.NET 2.0 is supposed to bring to the table a host of useful features. Why bother with response filters then? Why all this hacking? XHTML compliancy is promised anyway. Well, we can just sit around and drool over the features of Whidbey, Yukon and what have you. We have real jobs, real projects and a real paycheck. Besides, I illustrated only one practical application of a filter. There are many more.
There's a common misconception that ASP.NET is easy to master and that it just takes care of everything for you. Not so. ASP.NET is not easy. It's powerful. It puts you in the driver's seat. Therefore hacking doesn't go away any time soon.