转载于:https://blog.csdn.net/nciaebupt/article/details/7644757
首先导入模块,用help查看相关文档
>>> from urlparse import urljoin
>>> help(urljoin)
Help on function urljoin in module urlparse:
urljoin(base, url, allow_fragments=True)
Join a base URL and a possibly relative URL to form an absolute
interpretation of the latter.
1 |
|
接下来,看几个例子,从例子中发现规律。
>>> urljoin("http://www.google.com/1/aaa.html","bbbb.html")
'http://www.google.com/1/bbbb.html'
>>> urljoin("http://www.google.com/1/aaa.html","2/bbbb.html")
'http://www.google.com/1/2/bbbb.html'
>>> urljoin("http://www.google.com/1/aaa.html","/2/bbbb.html")
'http://www.google.com/2/bbbb.html'
>>> urljoin("http://www.google.com/1/aaa.html","http://www.google.com/3/ccc.html")
'http://www.google.com/3/ccc.html'
>>> urljoin("http://www.google.com/1/aaa.html","http://www.google.com/ccc.html")
'http://www.google.com/ccc.html'
>>> urljoin("http://www.google.com/1/aaa.html","javascript:void(0)")
'javascript:void(0)'
规律不难发现,但是并不是万事大吉了,还需要处理特殊情况,如链接是其本身,链接中包含无效字符等
1 |
|