The User-Agent String: Use and Abuse

When I first joined the IE team five years ago, I became responsible for the User-Agent string. While I’ve owned significantly more “important” features over the years, on a byte-for-byte basis, few have proved as complicated as the “simple” UA string.

I (and others) have written a lot about the UA string over the years. This post largely assumes that you’re familiar with what the user-agent string is and what it’s commonly (mis)used for. 

In this post, I’ll try to summarize why the UA string causes so many problems (beyond browser version sniffing), and expose the complex tradeoff between compatibility and extensibility.
 
Background
 
First things first: you can check the UA string currently sent by your browser using my User-Agent string test page.

Do you see anything in there that you weren’t expecting?
 
Changing the User-Agent String at Runtime
 
For IE8, we fixed significant bugs in the UrlMkSetSessionOption API, which allows setting of the User-Agent for the current process. Before IE8, calling this API inside IE would (depending on timing) set the User-Agent sent to the server by WinINET, or set the User-Agent property in the DOM, but never properly set both.

I developed a simple User-Agent Picker Add-on for IE8 that allows you to change your User-Agent string to whatever you like. You can then easily see how websites react to various UA strings. For instance, try sending the GoogleBot UA string to MSDN to see how that site is optimized for search.
 
Internally, the add-on simply exercises the URLMon API:


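// szNewUA is a null-terminated char string holding the new User-Agent value (declared in urlmon.h)
UrlMkSetSessionOption(URLMON_OPTION_USERAGENT, szNewUA, (DWORD)strlen(szNewUA), 0);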
 
Alternatively, Web Browser Control hosts can change the User-Agent string sent by hyperlink navigations by overriding the OnAmbientProperty method for DISPID_AMBIENT_USERAGENT. However, the overridden property is not used when programmatically calling the Navigate method, and it will not impact the userAgent property of the DOM's navigator or clientInformation objects.
 
Extending the User-Agent String in the Registry
 
It’s trivial to add tokens to the User-Agent string using simple registry modifications. Tokens added to the registry keys are sent with every request from Internet Explorer and from other hosts of the Web Browser control. These registry keys have been supported since IE5, meaning that all currently supported IE versions will send these tokens.
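
To make the effect concrete, here is a minimal sketch of how each added token lengthens the header. The base string, the token names, and the helper `append_tokens` are all assumptions made for illustration; this is not how IE itself assembles the value.

```python
def append_tokens(base_ua: str, tokens: list[str]) -> str:
    """Append tokens to the trailing parenthesized comment of an IE-style UA string."""
    if not base_ua.endswith(")"):
        raise ValueError("expected a UA string ending in a parenthesized comment")
    if not tokens:
        return base_ua
    # Drop the closing ")", add the tokens separated by "; ", then close again.
    return base_ua[:-1] + "; " + "; ".join(tokens) + ")"


# Hypothetical base string and tokens, for illustration only.
BASE_UA = "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0)"

if __name__ == "__main__":
    extended = append_tokens(BASE_UA, ["ExampleToolbar 1.0", "ExampleUpdater"])
    print(extended)
    print(len(BASE_UA), "->", len(extended), "characters")
```

Every product that registers a token pays nothing itself; the cost lands on each request the user makes.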
 
Other browsers (Firefox, Chrome, etc.) do not offer the same degree of ease in extending the UA string, so it’s uncommon for software to extend the UA string in non-IE browsers.
 

Update 3/23/2010: IEBlog announces that IE9 will no longer send registry tokens to the server.
 
The Fiasco
 
Unfortunately, the ease of extending IE’s UA string means that it’s a very common practice. That, in turn, leads to a number of major problems that impact normal folks who don’t even know what a UA string is. 

A few of the problems include:
1. Many websites will return only error pages upon receiving a UA header over a fixed length (often 256 characters).
2. In IE7 and below, if the UA string grows to over 260 characters, the navigator.userAgent property is incorrectly computed.
3. Poorly designed UA-sniffing code may be confused and misinterpret tokens in the UA.
4. Poorly designed browser add-ons are known to misinterpret how the registry keys are used, and shove an entire UA string into one of the tokens, resulting in a “nested” UA string.
5. Because UA strings are sent for every HTTP request, they entail a significant performance cost. In degenerate cases, sending the UA string might consume 50% of the overall request bandwidth.
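
To put a number on problem #5, the following sketch estimates what fraction of a request's header bytes is taken up by the User-Agent line. The helper name `ua_share` and the sample request are assumptions for illustration; real requests carry more headers (cookies, Accept, and so on), so treat the result as a rough estimate.

```python
def ua_share(request_line: str, headers: dict[str, str]) -> float:
    """Return the fraction of request-header bytes used by the User-Agent line."""

    def line_len(text: str) -> int:
        # Each header line is terminated by CRLF on the wire.
        return len(text.encode("utf-8")) + 2

    total = line_len(request_line)
    total += sum(line_len(f"{name}: {value}") for name, value in headers.items())
    total += 2  # the blank line that ends the header block
    ua_bytes = line_len(f"User-Agent: {headers['User-Agent']}")
    return ua_bytes / total


if __name__ == "__main__":
    # Hypothetical request with a token-laden UA string.
    sample = {
        "Host": "example.com",
        "User-Agent": "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; "
                      + "; ".join(f"ExampleToken{i} 1.0" for i in range(10)) + ")",
    }
    print(f"{ua_share('GET / HTTP/1.1', sample):.0%} of header bytes")
```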

Two real-world examples:
 
My bank has problem #1. They have security software on their firewall looking for “suspicious” requests, and the developers assumed that they’d never see a UA over 256 bytes.
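
A hard cutoff of this kind can be pictured with the sketch below. The function name `strict_filter`, the returned status codes, and the sample tokens are assumptions; I have no knowledge of what any real filter actually does internally.

```python
MAX_UA_BYTES = 256  # the fixed limit such filters often assume


def strict_filter(ua: str) -> int:
    """Mimic a filter that rejects any request whose UA exceeds the fixed limit."""
    return 400 if len(ua.encode("utf-8")) > MAX_UA_BYTES else 200


if __name__ == "__main__":
    # Hypothetical UA string that grows as more software adds tokens.
    ua = "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1"
    for i in range(12):
        ua_now = ua + ")"
        print(len(ua_now), strict_filter(ua_now))
        ua += f"; ExampleVendor{i} Toolbar 1.0"
```

The user has done nothing wrong here; the request fails only because enough installed software has appended tokens.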
 
Some major sites are using super-liberal UA parsing code (problem #3) to detect mobile browsers. Unfortunately, for instance, Creative Labs adds the token “Creative AutoUpdate” to the UA string. Naive server code sees the characters pda inside that token and decides that the user must be on a mobile browser. The server might then return WML content that the desktop browser will not even render, or provide an otherwise degraded experience. Worse still, some sites don’t send a Vary: User-Agent response header when returning the mobile content, meaning that network proxies will sometimes start sending everyone content designed for mobile devices.
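
The false positive is easy to reproduce. The sketch below contrasts a substring match with a whole-word match; the hint list, the function names, and the sample UA string are assumptions for illustration, not code taken from any real site.

```python
import re

MOBILE_HINTS = ("pda", "mobile")  # hypothetical hint list


def naive_is_mobile(ua: str) -> bool:
    """Overly liberal check: a hint matches anywhere in the string."""
    lowered = ua.lower()
    return any(hint in lowered for hint in MOBILE_HINTS)


def word_is_mobile(ua: str) -> bool:
    """Stricter check: a hint must appear as a whole word."""
    pattern = r"\b(" + "|".join(MOBILE_HINTS) + r")\b"
    return re.search(pattern, ua, re.IGNORECASE) is not None


if __name__ == "__main__":
    ua = "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Creative AutoUpdate)"
    print(naive_is_mobile(ua))  # "AutoUpdate" contains the letters p-d-a
    print(word_is_mobile(ua))
```

Whole-word matching only removes this particular false positive. It is still UA sniffing, and it remains exposed to whatever tokens other software decides to add.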
 
Ultimately, the problem is what economists call the Tragedy of the Commons, although personally I prefer the visual representation. You might remember that the extensibility of the Accept header leads to the same problem, although that header is sent so unreliably that no sane website would depend upon it.
 
Standards
 
It’s tempting to look to the standards for restrictions on the UA string. Unfortunately, the RFC for HTTP (RFC 2616) has little to say on the topic:
 

14.43 User-Agent

The User-Agent request-header field contains information about the user agent originating the request. This is for statistical purposes, the tracing of protocol violations, and automated recognition of user agents for the sake of tailoring responses to avoid particular user agent limitations. User agents SHOULD include this field with requests. The field can contain multiple product tokens (section 3.8) and comments identifying the agent and any subproducts which form a significant part of the user agent. By convention, the product tokens are listed in order of their significance for identifying the application.

User-Agent = "User-Agent" ":" 1*( product | comment )

Example:

User-Agent: CERN-LineMode/2.15 libwww/2.17b3
 
Notably, the RFC does not define a maximum length for the header value, and does not provide much guidance on what “subproducts which form a significant part of the user agent” means. It suggests a few broad server-side uses of the UA string, without discussing the problems such usage might introduce.
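
The grammar quoted above is loose enough that a small splitter covers it. The following is a simplified sketch, assuming whitespace-separated products and parenthesized comments that may nest; the function name `split_user_agent` is my own, and the code does not validate the characters allowed in a token.

```python
def split_user_agent(value: str) -> list[tuple[str, str]]:
    """Split a UA value into ("product", text) and ("comment", text) parts."""
    parts = []
    i, n = 0, len(value)
    while i < n:
        ch = value[i]
        if ch in " \t":
            i += 1  # skip whitespace between parts
        elif ch == "(":
            depth, j = 1, i + 1
            while j < n and depth > 0:
                if value[j] == "\\" and j + 1 < n:
                    j += 2  # quoted-pair: skip the escaped character
                    continue
                if value[j] == "(":
                    depth += 1  # comments may nest
                elif value[j] == ")":
                    depth -= 1
                j += 1
            if depth != 0:
                raise ValueError("unbalanced parentheses in comment")
            parts.append(("comment", value[i + 1 : j - 1]))
            i = j
        else:
            j = i
            while j < n and value[j] not in " \t(":
                j += 1
            parts.append(("product", value[i:j]))
            i = j
    return parts


if __name__ == "__main__":
    print(split_user_agent("CERN-LineMode/2.15 libwww/2.17b3"))
    print(split_user_agent("Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1)"))
```

Note that nothing in the splitter, or in the grammar, limits how many parts there are or how long the comment may be.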

Motivations for UA Modification
 
OEMs and ISVs have a number of motivations for adding to the UA string.
1. Metrics. Every server on the web can easily tell if your software is installed.
2. Client capability detection. JavaScript can easily detect if your (ActiveX control / Protocol Handler / Client application / etc) is available.
3. User Tracking. I don’t know of any current offenders, but at some point in the past some software would add a GUID token to the UA string. This token would effectively act as an invisible “super-cookie” that would be sent to every site the user ever visited.

Now, scenario #3 is clearly evil, and we have no desire to support it. Scenarios #1 and #2 aren’t inherently bad—but advertising to every site in the world that a given piece of software is available on the client is probably the wrong design.

 

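Note: in my own development work, I needed to read the IE User-Agent string from the user's machine and then append our own identifier to it. There are two ways to obtain the User-Agent: read it from the registry, or get it through a COM interface.

Our bug was that alert(navigator.userAgent); showed only IE's own string, without our identifier, so I ended up obtaining the User-Agent through the navigator interface of IHTMLWindow2. The User-Agent obtained this way has one catch: once you have set it with your identifier the first time, the value you read back on the next page visit is already the one you set. In other words, setting it once is enough.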
