HTTP Header

Unacceptable Browser HTTP Accept Headers (Yes, You Safari and Internet Explorer)

Update: WebKit team responds to this post. Admits error, downplays importance.

When a web browser makes a request, it sends headers that tell the server what it is looking for. One of these headers is the Accept header. The Accept header tells the server what file formats, or more correctly MIME-types, the browser is looking for. Let's take a look at Firefox's Accept header:

GET /page/routing-in-recess-screencast HTTP/1.1
Host: RecessFramework.org
Accept:text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8

Let's translate Firefox's request to English:

Dear RecessFramework.org,
I want the resource "/page/routing-in-recess-screencast" and I want it in an HTML or XHTML format. If you cannot serve me this way, I'll take "/page/routing-in-recess-screencast" in an XML instead. If you can't even give it to me in XML, well, I'll take anything you've got!
Love,
Firefox

The Accept header gives the browser a chance to tell the server which format it wants for a resource. By giving a list of options, this content negotiation happens in a single request. One of the key design goals of the HTTP spec is to minimize back-and-forth communication. The browser could ask for each of these formats one at a time, but it would be wasteful.

How does the browser specify the preference "give me HTML/XHTML before XML before */*"? Preference is indicated by the "relative quality parameter" (q) and its value (qvalue), seen in application/xml;q=0.9,*/*;q=0.8. Here's how the HTTP spec defines it:

Each media-range MAY be followed by one or more accept-params, beginning with the "q" parameter for indicating a relative quality factor. The first "q" parameter (if any) separates the media-range parameter(s) from the accept-params. Quality factors allow the user or user agent to indicate the relative degree of preference for that media-range, using the qvalue scale from 0 to 1 (section 3.9). The default value is q=1.

For as brilliant as the spec is, it is a terrible read. What's going on is simple:

  1. Every item's default preference value is 1.
    1: html, xhtml, xml, *
  2. If an item specifies q=X, its preference value is X.
    0.9: xml
    0.8: */*
    1: html, xhtml
  3. Order by preference value in descending order.
    1: html, xhtml
    0.9: xml
    0.8: *

The only other major detail is that when there are ambiguities, the more specific one wins. For example, if both application/xml and */* had a preference of 0.9, application/xml would still come first. Firefox chooses to make it explicit that */* is less preferred by giving it a preference of 0.8. Firefox's Accept header is sensible and well thought out. Opera's is too. Other browsers: not so much.
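To make the three steps concrete, here is a minimal Python sketch of that ordering. The names parse_accept and specificity are my own for illustration and are not taken from Recess or any library. It is a simplification under stated assumptions: it only reads the q parameter and ignores any other media-range parameters.

```python
def specificity(media_range):
    """Rank how specific a media range is: type/subtype > type/* > */*."""
    if media_range == "*/*":
        return 0
    if media_range.endswith("/*"):
        return 1
    return 2


def parse_accept(header):
    """Return (media_range, q) pairs ordered from most to least preferred."""
    entries = []
    for position, part in enumerate(header.split(",")):
        pieces = [p.strip() for p in part.split(";")]
        media_range = pieces[0].lower()
        if not media_range:
            continue
        q = 1.0  # Step 1: every item's default preference value is 1.
        for param in pieces[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)  # Step 2: q=X overrides the default.
                except ValueError:
                    pass  # Assumption: keep the default on a malformed q.
        entries.append((media_range, q, specificity(media_range), position))
    # Step 3: highest q first; ties go to the more specific range,
    # then to whichever came first in the header.
    entries.sort(key=lambda e: (-e[1], -e[2], e[3]))
    return [(media_range, q) for media_range, q, _, _ in entries]


firefox = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
for media_range, q in parse_accept(firefox):
    print(q, media_range)
```

For Firefox's header this keeps text/html and application/xhtml+xml on top, followed by application/xml and then */*.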

What in The Header Were You Thinking WebKit?

Don't relax yet IE, you're up next, and you're even more egregious. So, what's wrong with WebKit, the lauded engine behind Safari and Google's Chrome? Let's take a look:

GET /page/restful-php-framework HTTP/1.1
Host: RecessFramework.org
Accept: application/xml,application/xhtml+xml,text/html;q=0.9,
        text/plain;q=0.8,image/png,*/*;q=0.5

Note: Accept split to two lines for width. On quick glance it doesn't look too different from Firefox's. Let's try it again in English just to be sure.

Dear RecessFramework.org,
I want the resource "/page/restful-php-framework" and I want it in XML, XHTML, or PNG format. If you cannot serve me this way, I'll take "/page/restful-php-framework" in HTML or plain text instead. If you can't do that for me I'll take whatever!
Thanks,
WebKit

Really, WebKit? The browsing engine most responsible for killing XHTML prefers XHTML over HTML! It would also prefer PNG over HTML. That's a little embarrassing, but what is worse: Safari and Chrome prefer XML over HTML (and, ambiguously, over XHTML, too). WebKit's Accept header forces web developers to work against the HTTP spec.

Suppose you are Twitter and want to be a good RESTful internet citizen following the HTTP spec. You've got a resource called a tweet that can be represented as XML or JSON or HTML. You wouldn't want Safari users to get an XML copy of a tweet by browsing around, so you have to actively ignore WebKit's Accept header preferring XML above all else. (Aside: It turns out Twitter's REST API ignores many REST/HTTP best practices like the Accept header, anyway, but that's another story for another post.)
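One way a server might "actively ignore" the stated preference is a blunt heuristic: if the client lists text/html at all with a non-zero q, treat it as a browser and serve HTML, whatever order the header claims. The sketch below is my own illustration of that idea, and wants_html is a hypothetical name. It is not what Twitter, WebKit, or Recess actually does.

```python
def wants_html(accept_header):
    """Heuristic: a client that lists text/html with q > 0 is treated as a browser."""
    for part in accept_header.split(","):
        pieces = [p.strip() for p in part.split(";")]
        if pieces[0].lower() != "text/html":
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in pieces[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # Assumption: a malformed q counts as "not wanted".
        return q > 0
    return False  # text/html was never mentioned explicitly


webkit = ("application/xml,application/xhtml+xml,text/html;q=0.9,"
          "text/plain;q=0.8,image/png,*/*;q=0.5")
print(wants_html(webkit))              # the header lists text/html;q=0.9
print(wants_html("application/json"))  # an API client that never lists HTML
```

A client that sends only */* is not caught by this check, so the server still needs a default representation for that case.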

Update from Maciej Stachowiak of Apple's WebKit team:

Most WebKit-based browsers (and Safari in particular) would probably do a better job rendering HTML than XHTML or generic XML, if only because the code paths are much better tested. So the Accept header is somewhat in error. On the other hand, this isn't a hugely important bug, and we design our Accept header mainly to give the best compatibility on Web sites, since content negotiation is not really used much in the wild. Our current header was copied from an old version of Firefox.

Internet Explorer Accepts Polluting the Internet

We've covered the good and the bad. Now let's talk about Internet Explorer. The IE team has made great strides toward being a nicer player on the web. Unfortunately, its Accept header is downright ugly:

GET /book/html/index.html HTTP/1.1
Host: RecessFramework.org
Accept: image/jpeg, application/x-ms-application, image/gif,
        application/xaml+xml, image/pjpeg, application/x-ms-xbap,
        application/x-shockwave-flash, application/msword, */*

This is the Accept header for IE8 on a Windows 7 machine. One peculiarity is the "application/msword" MIME-type. Office isn't installed but the Word Document Viewer is. This made me wonder, what does IE's Accept header look like on a machine with Office installed? Brace yourselves:

GET /book/html/index.html HTTP/1.1
Host: RecessFramework.org
Accept: image/gif, image/jpeg, image/pjpeg, application/x-ms-application,
        application/vnd.ms-xpsdocument, application/xaml+xml,
        application/x-ms-xbap, application/x-shockwave-flash,
        application/x-silverlight-2-b2, application/x-silverlight,
        application/vnd.ms-excel, application/vnd.ms-powerpoint,
        application/msword, */*

Ok, now let's translate to English:

Dear RecessFramework.org,
I want the resource "/book/html/index.html". Now, bear with me, I'm Internet Explorer and Office is installed so I can accept this resource in a lot of formats, in this order of preference: GIF, JPG, Progressive JPG, Click Once App, Microsoft XPS Document, XAML, XAML Browser App, Flash, Silverlight 2, Silverlight 1, Excel Document, Powerpoint Document, or a Word Document. If you can't give me "/book/html/index.html" in any of those formats then give me anything you've got!
Thanks,
Internet Explorer

There are two things wrong with this picture. The lesser evil: IE has a hook for other applications to insert new MIME-types into its Accept header. This means if a resource could be represented on the server as a Word Document or as an HTML document, Word as an application can inject behavior into IE so that it always has higher precedence than HTML. All an application has to do is modify the registry (HKLM/Software/Microsoft/Windows/CurrentVersion/Internet Settings/Accepted Documents). (Hear that Cisco? You could increase internet consumption if you stuck a couple 255 character WebEx MIME-types in IE's Accept header.)

The greater evil is that IE sends this roughly 200-300 byte Accept header for every single browsing request. 250 bytes isn't much, but at internet scale, on every request from the most popular browser, it adds up. Internet Explorer's Accept header emissions pollute the information superhighway. Let's do some back-of-napkin calculation. Google now gets 294 million searches a day. If IE has roughly 55% market share, that's 162 million IE requests on Google a day, or about 38GB worth of garbage internet traffic at 250 bytes each. On Google searches alone, IE pollutes the internet with over a terabyte of traffic every month in its Accept header. Anyone want to estimate what this number looks like across the rest of the internet?
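For anyone who wants to check the napkin, here is the same arithmetic as a short Python sketch. The inputs are the post's own rough figures (294 million searches a day, 55% share, 250 bytes per header). The function name accept_header_traffic and the 30-day month are my assumptions, and it deliberately assumes the long header is sent on every request.

```python
def accept_header_traffic(searches_per_day, ie_share, header_bytes, days=30):
    """Estimate the traffic spent on IE's Accept header for one site."""
    ie_requests_per_day = searches_per_day * ie_share
    bytes_per_day = ie_requests_per_day * header_bytes
    gib_per_day = bytes_per_day / 2**30          # binary gigabytes
    tib_per_month = gib_per_day * days / 1024    # binary terabytes
    return ie_requests_per_day, gib_per_day, tib_per_month


requests, gib_per_day, tib_per_month = accept_header_traffic(
    searches_per_day=294_000_000, ie_share=0.55, header_bytes=250
)
print(f"{requests / 1e6:.0f} million IE requests a day")
print(f"{gib_per_day:.1f} GiB a day, {tib_per_month:.2f} TiB a month")
```

The 38GB figure lines up when a gigabyte is counted as 2^30 bytes. In decimal units the same traffic is a little over 40GB a day, and either way the monthly total clears a terabyte.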

Update 1: IE team Program Manager Eric Lawrence says: "I strongly recommend that developers not list MIME types here." Yet Silverlight and Office do. Whoops.

Update 2: IE doesn't send the extended header on *every* request, it sends */* for refreshes and some subsequent visits. [IEBlog]

It is not just wasted bandwidth that is the problem, it is wasted server processing, too. If a server or framework wants to follow through on the HTTP protocol the server must be sure it can't respond with any of the requested formats before it can respond with HTML. Bottom line: IE's Accept header is extremely ugly.
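To see the wasted work, consider a simplified negotiation loop that walks the client's media ranges in preference order and stops at the first one the server can satisfy. This is my own sketch, not the Recess implementation and not a full implementation of the spec's matching rules. The name negotiate and the examined counter are hypothetical and exist only to show how many ranges get checked.

```python
def _parse(accept_header):
    """Return (media_range, q) pairs, highest q first, header order on ties."""
    ranges = []
    for part in accept_header.split(","):
        pieces = [p.strip() for p in part.split(";")]
        if not pieces[0]:
            continue
        q = 1.0
        for param in pieces[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                q = float(value)
        ranges.append((pieces[0].lower(), q))
    ranges.sort(key=lambda r: -r[1])  # sort is stable, so ties keep header order
    return ranges


def _matches(media_range, media_type):
    """True if a concrete media type falls inside a media range."""
    if media_range == "*/*":
        return True
    if media_range.endswith("/*"):
        return media_type.startswith(media_range[:-1])
    return media_range == media_type


def negotiate(accept_header, available):
    """Return (chosen type or None, number of client ranges examined)."""
    examined = 0
    for media_range, q in _parse(accept_header):
        if q <= 0:
            continue  # q=0 means "not acceptable"
        examined += 1
        for media_type in available:
            if _matches(media_range, media_type):
                return media_type, examined
    return None, examined


ie8 = ("image/jpeg, application/x-ms-application, image/gif, "
       "application/xaml+xml, image/pjpeg, application/x-ms-xbap, "
       "application/x-shockwave-flash, application/msword, */*")
firefox = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
print(negotiate(ie8, ["text/html"]))
print(negotiate(firefox, ["text/html"]))
```

With a server that only has HTML to offer, Firefox's header is satisfied by its first range, while the IE8 header above has to be walked all the way to the trailing */* before HTML is chosen.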

If WebKit is Foolish and IE is Prodigal how valuable is the Accept header?

This was the question I asked myself about half-way through writing the Accept parsing and content-negotiation code going into the next release of Recess, a RESTful PHP framework.

Content-negotiation with the Accept header is an interesting idea in principle that is hard to use properly in practice because browsers misuse it. As stated, Twitter's REST API doesn't use the Accept header for content-negotiation, they use extensions on the URL '.json' and '.xml'. Rails disables the Accept header by default. Frameworks can enhance performance by ignoring the Accept header and relying on '.xml'-like extensions. As such the next release of the Recess Framework, too, will disable Accept header based content-negotiation by default.
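The extension approach is cheap to implement, which is part of its appeal. A rough Python sketch of the idea follows. The EXTENSION_TYPES table, the format_from_path name, and the example paths are all made up for illustration and do not reflect how Twitter, Rails, or Recess route requests.

```python
import posixpath

# Hypothetical mapping from URL extension to the representation to serve.
EXTENSION_TYPES = {
    ".html": "text/html",
    ".json": "application/json",
    ".xml": "application/xml",
}


def format_from_path(path, default="text/html"):
    """Split a URL path into (resource path, media type) using its extension."""
    base, ext = posixpath.splitext(path)
    media_type = EXTENSION_TYPES.get(ext.lower())
    if media_type is None:
        # Unknown or missing extension: leave the path alone, use the default.
        return path, default
    return base, media_type


print(format_from_path("/tweets/123.json"))
print(format_from_path("/page/routing-in-recess-screencast"))
```

There is no header to parse and no preference list to rank: one dictionary lookup decides the format, and a URL without a recognized extension falls back to HTML.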

So, when would you want to parse Accept headers for content negotiation? When your consumers are respectful of HTTP and REST (RESpecTful!). This could mean RIAs written in JavaScript, Flash, or Silverlight. It could also mean other servers consuming your RESTful API.

Bottom line: If you're building APIs for other developers to consume, consider using Accept-based content-negotiation. If you're building consumer facing web apps: ignore the Accept header until WebKit and IE get their acts together.
