DOM Based Cross Site Scripting or XSS of the Third Kind

Summary

We all know what Cross Site Scripting (XSS) is, right? It’s that vulnerability wherein one sends malicious data (typically HTML stuff with Javascript code in it) that is echoed back later by the application in an HTML context of some sort, and the Javascript code gets executed. Well, wrong. There’s a kind of XSS which does not match this description, at least not in some fundamental properties. The XSS attacks described above are either “non-persistent”/“reflected” (i.e. the malicious data is embedded in the page that is returned to the browser immediately following the request) or “persistent”/“stored” (in which case the malicious data is returned at some later time). But there’s also a third kind of XSS attack - one that does not rely on sending the malicious data to the server in the first place! While this seems almost contradictory to the definition or to common sense, there are, in fact, two well-described examples of such attacks. This technical note discusses the third kind of XSS, dubbed “DOM Based XSS”. No claim is made to novelty in the attacks themselves, of course; rather, the innovation in this write-up lies in noticing that these attacks belong to a different flavor, and that this flavor is interesting and important.

Application developers and owners need to understand DOM Based XSS, as it represents a threat to the web application, one which has different preconditions than standard XSS. As such, there are many web applications on the Internet that are vulnerable to DOM Based XSS, yet when tested for (standard) XSS, are demonstrated to be “not vulnerable”. Developers and site maintainers (and auditors) need to familiarize themselves with techniques to detect DOM Based XSS vulnerabilities, as well as with techniques to defend against them, both of which are different from the ones applicable to standard XSS.


Introduction

The reader is assumed to possess basic knowledge of XSS ([1], [2], [3], [4], [8]). XSS is typically categorized into “non-persistent” and “persistent” ([3]; “reflected” and “stored” respectively, as defined in [4]). “Non-persistent” means that the malicious (Javascript) payload is echoed by the server in an immediate response to an HTTP request from the victim. “Persistent” means that the payload is stored by the system, and may later be embedded by the vulnerable system in an HTML page provided to a victim. As mentioned in the summary, this categorization assumes that a fundamental property of XSS is having the malicious payload move from the browser to the server and back to the same (in non-persistent XSS) or any (in persistent XSS) browser. This paper points out that this is a misconception. While there are not many counterexamples in the wild, the mere existence of XSS attacks which do not rely on the payload being embedded by the server in some response page is of importance, as it has a significant impact on detection and protection methods. This impact is discussed later in the document.


Example and Discussion

Before describing the basic scenario, it is important to stress that the techniques outlined here were already demonstrated in public (e.g. [5], [6] and [7]). As such, it is not claimed that the below are new techniques (although perhaps some of the evasion techniques are).

The prerequisite is for the vulnerable site to have an HTML page that uses data from document.location, document.URL or document.referrer (or any of various other objects which the attacker can influence) in an insecure manner.

NOTE for readers unfamiliar with those Javascript objects: when Javascript is executed at the browser, the browser provides the Javascript code with several objects that represent the DOM (Document Object Model). The document object is chief among those objects, and it represents most of the page’s properties, as experienced by the browser. This document object contains many sub-objects, such as location, URL and referrer. These are populated by the browser according to the browser’s point of view (this is significant, as we’ll see later with the fragments). So, document.URL and document.location are populated with the URL of the page, as the browser understands it. Notice that these objects are not extracted from the HTML body - they do not appear in the page data. The document object does contain a body object that is a representation of the parsed HTML.

It is not uncommon to find an application HTML page containing Javascript code that parses the URL line (by accessing document.URL or document.location) and performs some client side logic according to it. Below is an example of such logic.

In analogy to the example in [2] (and as an essentially identical scenario to the one in [7]), consider, for example, the following HTML page (let’s say this is the content of http://www.vulnerable.site/welcome.html):

<HTML>
<TITLE>Welcome!</TITLE>
Hi
<SCRIPT>
var pos=document.URL.indexOf("name=")+5;
document.write(document.URL.substring(pos,document.URL.length));
</SCRIPT>
<BR>
Welcome to our system

</HTML>
 

Normally, this HTML page would be used for welcoming the user, e.g.:

  http://www.vulnerable.site/welcome.html?name=Joe

However, a request such as:

  http://www.vulnerable.site/welcome.html?name=
  <script>alert(document.cookie)</script> 

would result in an XSS condition. Let’s see why: the victim’s browser receives this link, sends an HTTP request to www.vulnerable.site, and receives the above (static!) HTML page. The victim’s browser then starts parsing this HTML into a DOM. The DOM contains an object called document, which contains a property called URL, and this property is populated with the URL of the current page, as part of DOM creation. When the parser arrives at the Javascript code, it executes it, and the code modifies the raw HTML of the page. In this case, the code references document.URL, and so a part of this string is embedded at parsing time in the HTML, which is then immediately parsed, and the Javascript code found (alert(…)) is executed in the context of the same page, hence the XSS condition.
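The client side logic of welcome.html can be mirrored in a small standalone function to see exactly which string ends up being written into the page. This is only a sketch: extractWrittenText is a hypothetical helper that takes the URL as a plain string instead of reading document.URL, so that it can run outside a browser.

```javascript
// Hypothetical helper: mirrors the substring logic of welcome.html,
// but takes the URL as an argument instead of reading document.URL.
function extractWrittenText(url) {
  const pos = url.indexOf("name=") + 5;
  return url.substring(pos, url.length);
}

const benign = "http://www.vulnerable.site/welcome.html?name=Joe";
const hostile =
  "http://www.vulnerable.site/welcome.html?name=<script>alert(document.cookie)</script>";

console.log(extractWrittenText(benign));  // Joe
console.log(extractWrittenText(hostile)); // <script>alert(document.cookie)</script>
```

Whatever this function returns is what document.write would hand to the HTML parser, which is why the second URL leads to script execution.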


Notes:
1. The malicious payload was not embedded in the raw HTML page at any time (unlike the other flavors of XSS).

2. This exploit only works if the browser does not modify the URL characters. Mozilla automatically encodes < and > (into %3C and %3E, respectively) in the document.URL when the URL is not directly typed at the address bar, and therefore it is not vulnerable to the attack as shown in the example. It is vulnerable to attacks if < and > are not needed (in raw form). Microsoft Internet Explorer 6.0 does not encode < and >, and is therefore vulnerable to the attack as-is.

Of course, embedding in the HTML directly is just one attack mount point; there are various scenarios that do not require < and >, and therefore Mozilla in general is not immune to this attack.
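The effect of the browser encoding < and > can be illustrated with a short sketch. It assumes the same extraction logic as the welcome.html script, and it uses the standard encodeURI function purely as a stand-in for a browser that percent-encodes these characters before exposing the URL to script; real browsers may encode a different set of characters.

```javascript
// Same extraction logic as the welcome.html script, with the URL passed in.
function extractWrittenText(url) {
  const pos = url.indexOf("name=") + 5;
  return url.substring(pos);
}

const raw =
  "http://www.vulnerable.site/welcome.html?name=<script>alert(document.cookie)</script>";

// Assumption: encodeURI only approximates what an encoding browser would do.
const encoded = encodeURI(raw);

console.log(extractWrittenText(raw));     // raw < and >, parsed as markup when written
console.log(extractWrittenText(encoded)); // %3Cscript%3Ealert(document.cookie)%3C/script%3E
```

In the encoded case the text written into the page contains no tag delimiters, so this particular payload is displayed rather than executed.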


Evading standard detection and prevention technologies

In the above example, it may be argued that the payload still did reach the server (in the query part of the HTTP request), and so it can be detected just like any other XSS attack. But even that can be taken care of. Consider the following attack:

  http://www.vulnerable.site/welcome.html#name=<script>alert(document.cookie)</script>

Notice the number sign (#) right after the file name. It tells the browser that everything beyond it is a fragment, i.e. not part of the query. Microsoft Internet Explorer (6.0) and Mozilla do not send the fragment to the server, and therefore the server would see the equivalent of http://www.vulnerable.site/welcome.html, so the payload would not even be seen by the server. We see, therefore, that this evasion technique causes the major browsers not to send the malicious payload to the server.
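The split between what is sent to the server and what stays in the browser can be sketched as follows. splitAtFragment is a hypothetical model of the browser behavior described above, operating on a plain string; it is not a browser API.

```javascript
// Hypothetical model: splits a URL into the part a browser would put in the
// HTTP request and the fragment it keeps to itself.
function splitAtFragment(url) {
  const hashIndex = url.indexOf("#");
  if (hashIndex === -1) {
    return { sentToServer: url, fragment: "" };
  }
  return {
    sentToServer: url.substring(0, hashIndex),
    fragment: url.substring(hashIndex + 1),
  };
}

const attackUrl =
  "http://www.vulnerable.site/welcome.html#name=<script>alert(document.cookie)</script>";

const { sentToServer, fragment } = splitAtFragment(attackUrl);
console.log(sentToServer); // http://www.vulnerable.site/welcome.html
console.log(fragment);     // name=<script>alert(document.cookie)</script>

// The page's script still sees the whole URL, fragment included.
const writtenText = attackUrl.substring(attackUrl.indexOf("name=") + 5);
console.log(writtenText);  // <script>alert(document.cookie)</script>
```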

Sometimes, it’s impossible to completely hide the payload: in [5] and [6], the malicious payload is part of the username, in a URL that looks like http://username@host/. The browser, in such a case, sends a request with an Authorization header containing the username (the malicious payload), and thus the payload does reach the server (Base64 encoded - so an IDS/IPS would need to decode this data first in order to observe the attack). Still, the server is not required to embed this payload in order for the XSS condition to occur.
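To see why a detection product has to decode the header first, here is a minimal sketch of how a username is carried in a Basic Authorization header. The helper names are hypothetical, the password is assumed to be empty, and Buffer is specific to Node.js (a browser would use btoa/atob instead).

```javascript
// Builds the value of an Authorization header for the Basic scheme.
// Assumption: empty password, as in a URL of the form http://username@host/.
function basicAuthHeader(username, password = "") {
  const credentials = Buffer.from(`${username}:${password}`, "utf8");
  return `Basic ${credentials.toString("base64")}`;
}

// What an inspecting device has to do before it can look at the username.
function decodeBasicAuth(headerValue) {
  const encoded = headerValue.replace(/^Basic\s+/, "");
  return Buffer.from(encoded, "base64").toString("utf8");
}

const payload = "<script>alert(document.cookie)</script>";
const header = basicAuthHeader(payload);

console.log(header.includes("<script>")); // false: the raw header hides the payload
console.log(decodeBasicAuth(header));     // the payload, followed by ":"
```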

Obviously, in situations where the payload can be completely hidden, online detection (IDS) and prevention (IPS, web application firewalls) products cannot fully defend against this attack, assuming the vulnerable script can indeed be invoked from a known location. Even if the payload has to be sent to the server, in many cases it can be crafted in such a way as to avoid being detected, e.g. if a specific parameter is protected (e.g. the name parameter in the above example), then a slight variation of the attack may succeed:

  http://www.vulnerable.site/welcome.html?notname=<script>alert(document.cookie)</script>

A more strict security policy would require that the name parameter be sent (to avoid the above tricks with names and number sign). We can therefore send this:

  http://www.vulnerable.site/welcome.html?notname=
  <script>alert(document.cookie)</script>&name=Joe

If the policy restricts the additional parameter name (e.g. to foobar), then the following variant would succeed:

  http://www.vulnerable.site/welcome.html?foobar=
  name=<script>alert(document.cookie)</script>&name=Joe

Note that the ignored parameter (foobar) must come first, and it contains the payload in its value.
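The mismatch these variants exploit can be sketched by comparing what a parameter-aware filter sees with what the page’s script writes. Both helpers are hypothetical: extractWrittenText mirrors the welcome.html logic, and inspectedNameValue stands in for a server side filter that inspects only the name parameter.

```javascript
// What the page's script writes (same logic as welcome.html).
function extractWrittenText(url) {
  const pos = url.indexOf("name=") + 5;
  return url.substring(pos);
}

// What a hypothetical server side filter sees when it inspects only "name".
function inspectedNameValue(url) {
  const query = url.substring(url.indexOf("?") + 1);
  return new URLSearchParams(query).get("name");
}

const base = "http://www.vulnerable.site/welcome.html";
const payload = "<script>alert(document.cookie)</script>";
const variants = [
  `${base}?notname=${payload}`,
  `${base}?notname=${payload}&name=Joe`,
  `${base}?foobar=name=${payload}&name=Joe`,
];

for (const url of variants) {
  // Left: value the filter checks. Right: text the script writes to the page.
  console.log(inspectedNameValue(url), "|", extractWrittenText(url));
}
```

In every variant the filter sees either no name parameter or the harmless value Joe, while the first occurrence of the string "name=" in the URL, which is what the script keys on, is immediately followed by the payload.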

The scenario in [7] is even better from the attacker’s perspective, since the full document.location is written to the HTML page (the Javascript code does not scan for a specific parameter name). Therefore, the attacker can completely hide the payload e.g. by sending:

  /attachment.cgi?id=&action=
  foobar#<script>alert(document.cookie)</script>


Even if the payload is inspected by the server, protection can be guaranteed only if the request is denied in its entirety, or if the response is replaced with some error text. Consider [5] and [6] again: if the Authorization header is simply removed by an intermediate protection system, this has no effect as long as the original page is returned. Likewise, any attempt to sanitize the data on the server, either by removing the offending characters or by encoding them, is ineffective against this attack.

In the case of document.referrer, the payload is sent to the server through the Referer header. However, if the user’s browser or an intermediate device eliminates this header, then there’s no trace of the attack - it may go completely unnoticed.

To generalize, the traditional methods of:
  1. HTML encoding output data at the server side
  2. Removing/encoding offending input data at the server side
do not work well against DOM Based XSS.

Automatic vulnerability assessment by way of fault injection (sometimes called fuzzing) won’t work, since products that use this technology typically evaluate the results according to whether the injected data is present in the response page or not (rather than executing the client side code in a browser context and observing the runtime effects). However, if a product is able to statically analyze the Javascript found in a page, then it may point out suspicious patterns (see below). And of course, if the product can execute the Javascript (and correctly populate the DOM objects), or simulate such execution, then it can detect this attack.
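The blind spot of reflection-based scanning can be shown with a small simulation. Everything here is hypothetical: serve stands in for a server that returns the same static page for every request, and reflectionScannerFlags stands in for a scanner that only looks for its injected marker in the response body.

```javascript
// The static page served for every request to welcome.html (abridged).
const STATIC_PAGE = [
  "<HTML>",
  "<TITLE>Welcome!</TITLE>",
  "Hi",
  "<SCRIPT>",
  'var pos=document.URL.indexOf("name=")+5;',
  "document.write(document.URL.substring(pos,document.URL.length));",
  "</SCRIPT>",
  "</HTML>",
].join("\n");

// Hypothetical stand-in for the server: the response ignores the request URL.
function serve(requestUrl) {
  return STATIC_PAGE;
}

// Reflection check: reports a finding only if the marker is echoed back.
function reflectionScannerFlags(requestUrl, marker) {
  return serve(requestUrl).includes(marker);
}

const marker = "<script>alert(31337)</script>";
const probe = "http://www.vulnerable.site/welcome.html?name=" + marker;
console.log(reflectionScannerFlags(probe, marker)); // false, although the page is exploitable
```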

Manual vulnerability assessment using a browser would work because the browser would execute the client side (Javascript) code. Of course, a vulnerability assessment product may adopt this kind of technology and execute client side code to inspect the runtime effects.

Effective defenses

1. Avoiding client side document rewriting, redirection, or other sensitive actions, using client side data. Most of these effects can be achieved by using dynamic pages (server side).

2. Analyzing and hardening the client side (Javascript) code. Reference to DOM objects that may be influenced by the user (attacker) should be inspected, including (but not limited to):

  • document.URL
  • document.URLUnencoded
  • document.location (and many of its properties)
  • document.referrer
  • window.location (and many of its properties)

Note that a document object property or a window object property may be referenced syntactically in many ways - explicitly (e.g. window.location), implicitly (e.g. location), or via obtaining a handle to a window and using it (e.g. handle_to_some_window.location).

Special attention should be given to scenarios wherein the DOM is modified, either explicitly or potentially, either via raw access to the HTML or via access to the DOM itself, e.g. (by no means an exhaustive list, there are probably various browser extensions):

  • Write raw HTML, e.g.:
    • document.write(…)
    • document.writeln(…)
    • document.body.innerHTML=…
  • Directly modifying the DOM (including DHTML events), e.g.:
    • document.forms[0].action=… (and various other collections)
    • document.attachEvent(…)
    • document.create…(…)
    • document.execCommand(…)
    • document.body. … (accessing the DOM through the body object)
    • window.attachEvent(…)
  • Replacing the document URL, e.g.:
    • document.location=… (and assigning to location’s href, host and hostname)
    • document.location.hostname=…
    • document.location.replace(…)
    • document.location.assign(…)
    • document.URL=…
    • window.navigate(…)
  • Opening/modifying a window, e.g.:
    • document.open(…)
    • window.open(…)
    • window.location.href=… (and assigning to location’s href, host and hostname)
  • Directly executing script, e.g.:
    • eval(…)
    • window.execScript(…)
    • window.setInterval(…)
    • window.setTimeout(…)
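A very rough static check for such patterns can be sketched with regular expressions. This is only an illustration under simplifying assumptions: it covers a handful of the sources and sinks listed above, works line by line, and will miss implicit references (such as a bare location) as well as anything obfuscated. The function and constant names are hypothetical.

```javascript
// A few attacker-influenced DOM sources (not exhaustive).
const SOURCE_PATTERN =
  /document\.(URL|URLUnencoded|location|referrer)\b|window\.location\b/;

// A few places where data is turned into markup or code (not exhaustive).
const SINK_PATTERN =
  /document\.write(ln)?\s*\(|\.innerHTML\s*=|\beval\s*\(|execScript\s*\(|set(Timeout|Interval)\s*\(/;

// Returns the lines of a script that mention a listed source or sink.
function flagSuspiciousLines(scriptText) {
  return scriptText
    .split("\n")
    .map((text, index) => ({ line: index + 1, text: text.trim() }))
    .filter(({ text }) => SOURCE_PATTERN.test(text) || SINK_PATTERN.test(text));
}

const welcomeScript = [
  'var pos=document.URL.indexOf("name=")+5;',
  "document.write(document.URL.substring(pos,document.URL.length));",
].join("\n");

for (const finding of flagSuspiciousLines(welcomeScript)) {
  console.log(`line ${finding.line}: ${finding.text}`);
}
```

A flagged line is only a candidate for manual review; whether the data actually flows from a source to a sink without validation still has to be checked by a person or a more capable analyzer.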

To continue the above example, an effective defense can be replacing the original script part with the following code, which verifies that the string written to the HTML page consists of alphanumeric characters only: 

  <SCRIPT>
  var pos=document.URL.indexOf("name=")+5;
  var name=document.URL.substring(pos,document.URL.length);
  if (name.match(/^[a-zA-Z0-9]+$/))
  {
       document.write(name);
  }
  else
  {
        window.alert("Security error");
  }
  </SCRIPT>

Such functionality can (and perhaps should) be provided through a generic library for sanitization of data (i.e. a set of Javascript functions that perform input validation and/or sanitization). The downside is that the security logic is exposed to the attackers - it is embedded in the HTML code. This makes it easier to understand and to attack. While in the above example the situation is very simple, in more complex scenarios wherein the security checks are less than perfect, this may come into play.
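One possible shape for such a library function is sketched below. The names isAlphanumeric and safeNameFromUrl are hypothetical. Note that the pattern needs a quantifier (+) so that names longer than a single character are accepted, and that, unlike the inline script, this sketch also rejects URLs with no "name=" at all.

```javascript
// Hypothetical validation helper: one or more alphanumeric characters.
function isAlphanumeric(value) {
  return /^[a-zA-Z0-9]+$/.test(value);
}

// Hypothetical wrapper: returns the name to display, or null if the URL
// carries no name or the name fails validation.
function safeNameFromUrl(url) {
  const pos = url.indexOf("name=");
  if (pos === -1) return null;
  const name = url.substring(pos + 5);
  return isAlphanumeric(name) ? name : null;
}

const base = "http://www.vulnerable.site/welcome.html";
console.log(safeNameFromUrl(base + "?name=Joe"));                      // Joe
console.log(safeNameFromUrl(base + "?name=<script>alert(1)</script>")); // null
console.log(safeNameFromUrl(base + "?name=Joe#<script>"));              // null
```

A page would call safeNameFromUrl(document.URL) and write the result only when it is not null.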

3. Employing a very strict IPS policy in which, for example, the page welcome.html is expected to receive exactly one parameter, named “name”, whose content is inspected. Any irregularity (including excessive parameters or no parameters) results in not serving the original page. Likewise, with any other violation (such as an Authorization header or Referer header containing problematic data), the original content must not be served. In some cases, even this cannot guarantee that an attack will be thwarted.
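The query part of such a policy might be sketched as follows. policyAllows is a hypothetical function, not the configuration syntax of any real IPS product, and it covers only the parameter check (not the Authorization or Referer headers).

```javascript
// Hypothetical policy check for welcome.html: exactly one parameter,
// called "name", with an alphanumeric value. Anything else is refused.
function policyAllows(queryString) {
  const params = Array.from(new URLSearchParams(queryString));
  if (params.length !== 1) return false;
  const [key, value] = params[0];
  return key === "name" && /^[a-zA-Z0-9]+$/.test(value);
}

const payload = "<script>alert(document.cookie)</script>";
console.log(policyAllows("name=Joe"));                           // true
console.log(policyAllows(""));                                   // false
console.log(policyAllows(`notname=${payload}&name=Joe`));        // false
console.log(policyAllows(`foobar=name=${payload}&name=Joe`));    // false
```

Since the fragment is not sent to the server, a check like this only ever sees the query string, which is one reason such a policy cannot be a complete guarantee.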


A note about redirection vulnerabilities

The above discussion is on XSS, yet in many cases, merely using a client side script to (insecurely) redirect the browser to another location is considered a vulnerability in itself. In such cases, the above techniques and observations still apply.
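As an illustration of applying client side validation to redirection, here is a minimal sketch for a hypothetical page that reads a redirect target from its URL and assigns it to location. The function name and the rule it enforces (same-site absolute paths only) are assumptions made for the example; a real application may need a different rule.

```javascript
// Hypothetical check: accept only same-site absolute paths such as "/home".
// Rejects absolute URLs, scheme-based targets such as "javascript:...",
// protocol-relative URLs ("//host"), and anything containing a backslash
// or a control character.
function isSafeRedirectTarget(target) {
  if (typeof target !== "string") return false;
  if (!target.startsWith("/") || target.startsWith("//")) return false;
  return !/[\\\u0000-\u001f]/.test(target);
}

console.log(isSafeRedirectTarget("/home"));                // true
console.log(isSafeRedirectTarget("//evil.example/"));      // false
console.log(isSafeRedirectTarget("http://evil.example/")); // false
console.log(isSafeRedirectTarget("javascript:alert(1)"));  // false
```

The page would assign to location only when the check returns true, and otherwise fall back to a fixed default page.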


Conclusion

While most XSS attacks described in public do indeed depend on the server physically embedding user data into the response HTML pages, there are XSS attacks that do not rely on server side embedding of the data. This has material significance when discussing ways to detect and prevent XSS. To date, almost all detection and prevention techniques discussed in public assume that XSS implies that the server receives malicious user input and embeds it in an HTML page. Since this assumption doesn’t hold (or only very partially holds) for the XSS attacks described in this paper, many of the techniques fail to detect and prevent this kind of attack.

The XSS attacks that rely on server side embedding of user data are categorized into “non-persistent” (or “reflected”) and “persistent” (or “stored”). It is thus suggested that the third kind of XSS, the one that does not rely on server side embedding, be named “DOM Based XSS”. 

Here is a comparison between standard XSS and DOM Based XSS:

 

Root cause
  • Standard XSS: Insecure embedding of client input in HTML outbound page
  • DOM Based XSS: Insecure reference and use (in client side code) of DOM objects that are not fully controlled by the server provided page

Owner
  • Standard XSS: Web developer (CGI)
  • DOM Based XSS: Web developer (HTML)

Page nature
  • Standard XSS: Dynamic only (CGI script)
  • DOM Based XSS: Typically static (HTML), but not necessarily

Vulnerability detection
  • Standard XSS:
    • Manual fault injection
    • Automatic fault injection
    • Code review (need access to the page source)
  • DOM Based XSS:
    • Manual fault injection
    • Code review (can be done remotely!)

Attack detection
  • Standard XSS:
    • Web server logs
    • Online attack detection tools (IDS, IPS, web application firewalls)
  • DOM Based XSS: If evasion techniques are applicable and used, no server side detection is possible

Effective defense
  • Standard XSS:
    • Data validation at the server side
    • Attack prevention utilities/tools (IPS, application firewalls)
  • DOM Based XSS:
    • Data validation at the client side (in Javascript)
    • Alternative server side logic

References

Note: the URLs below are up to date at the time of writing (July 4th, 2005). Some of these materials are live documents, and as such may be updated to reflect the insights of this paper.

[1] “CERT Advisory CA-2000-02 - Malicious HTML Tags Embedded in Client Web Requests”, CERT, February 2nd, 2000

http://www.cert.org/advisories/CA-2000-02.html

[2] “Cross Site Scripting Explained”, Amit Klein, June 2002

http://crypto.stanford.edu/cs155/CSS.pdf

[3] “Cross-Site Scripting”, Web Application Security Consortium, February 23rd, 2004

http://www.webappsec.org/projects/threat/classes/cross-site_scripting.shtml

[4] “Cross Site Scripting (XSS) Flaws”, The OWASP Foundation, updated 2004

http://www.owasp.org/documentation/topten/a4.html

[5] “Thor Larholm security advisory TL#001 (IIS allows universal CrossSiteScripting)”, Thor Larholm, April 10th, 2002

http://www.cgisecurity.com/archive/webservers/iis_xss_4_5_and_5.1.txt

(see also Microsoft Security Bulletin MS02-018 http://www.microsoft.com/technet/security/bulletin/MS02-018.mspx)

[6] “ISA Server Error Page Cross Site Scripting”, Brett Moore, July 16th, 2003

http://www.security-assessment.com/Advisories/ISA%20XSS%20Advisory.pdf

(see also Microsoft Security Bulletin MS03-028 http://www.microsoft.com/technet/security/bulletin/MS03-028.mspx and a more elaborate description in “Microsoft ISA Server HTTP error handler XSS”, Thor Larholm, July 16th, 2003 http://www.securityfocus.com/archive/1/329273)

[7] “Bugzilla Bug 272620 - XSS vulnerability in internal error messages”, reported by Michael Krax, December 23rd, 2004

https://bugzilla.mozilla.org/show_bug.cgi?id=272620

[8] “The Cross Site Scripting FAQ”, Robert Auger, May 2002 (revised August 2003)

http://www.cgisecurity.com/articles/xss-faq.shtml


About the author

Amit Klein is a renowned web application security researcher. Mr. Klein has written many research papers on various web application technologies--from HTTP to XML, SOAP and web services--and covered many topics--HTTP request smuggling, insecure indexing, blind XPath injection, HTTP response splitting, securing .NET web applications, cross site scripting, cookie poisoning and more. His works have been published in Dr. Dobb's Journal, SC Magazine, ISSA journal, and IT Audit journal; have been presented at SANS and CERT conferences; and are used and referenced in many academic syllabi. 

Mr. Klein is a WASC (Web Application Security Consortium) member. 

The current copy of this document can be found here:
http://www.webappsec.org/articles/

Information on the Web Application Security Consortium's Article Guidelines can be found here:

http://www.webappsec.org/projects/articles/guidelines.shtml


A copy of the license for this document can be found here:
http://www.webappsec.org/projects/articles/license.shtml
