Quoted directly from:
https://www.cnblogs.com/farewell-farewell/p/5902899.html
Regular expression for parsing a URL:
import re

# Named groups: scheme, username, password, domain, port, path, query, fragment
regexp = (r'^(?P<scheme>[a-z][\w\.\-\+]+)?:(//)?'
          r'(?:(?P<username>\w+):(?P<password>[\w\W]+)@|)'
          r'(?P<domain>[\w-]+(?:\.[\w-]+)*)(?::(?P<port>\d+))?/?'
          r'(?P<path>\/[\w\.\/-]+)?(?P<query>\?[\w\.*!=&@%;:/+-]+)?'
          r'(?P<fragment>#[\w-]+)?$')

# The URL must be defined before it is matched
url = 'https://blog.csdn.net/weixin_40907382/article/明细/79654372'
match = re.search(regexp, url.strip(), re.U)
if match is None:
    raise ValueError('Incorrect url: {0}'.format(url))
# Dictionary of every named group (None for groups that did not match)
url_parts = match.groupdict()
print(url_parts)

Output:
{'scheme': 'https', 'username': None, 'password': None, 'domain': 'blog.csdn.net', 'port': None, 'path': '/weixin_40907382/article/明细/79654372', 'query': None, 'fragment': None}
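As a minimal sketch (Python 3, standard library only), the same pattern can be wrapped in a helper and tried on a URL that exercises every group. The helper name `parse_url` and the sample URL are hypothetical illustrations, not part of the original post. For comparison, the block also runs `urllib.parse.urlsplit`, the standard-library URL splitter, which is a different technique from the regex above.

```python
import re
from urllib.parse import urlsplit

# Same pattern as above, compiled once.
# re.U is already the default for str patterns in Python 3; kept for parity.
URL_RE = re.compile(
    r'^(?P<scheme>[a-z][\w\.\-\+]+)?:(//)?'
    r'(?:(?P<username>\w+):(?P<password>[\w\W]+)@|)'
    r'(?P<domain>[\w-]+(?:\.[\w-]+)*)(?::(?P<port>\d+))?/?'
    r'(?P<path>\/[\w\.\/-]+)?(?P<query>\?[\w\.*!=&@%;:/+-]+)?'
    r'(?P<fragment>#[\w-]+)?$',
    re.U,
)


def parse_url(url):
    """Hypothetical helper: return the named groups, or raise ValueError."""
    match = URL_RE.search(url.strip())
    if match is None:
        raise ValueError('Incorrect url: {0}'.format(url))
    return match.groupdict()


# Hypothetical sample URL with credentials, port, query and fragment
sample = 'ftp://user:secret@example.com:2121/files/a.txt?x=1&y=2#top'
parts = parse_url(sample)
print(parts)

# Standard-library splitter on the same URL, for comparison
split = urlsplit(sample)
print(split.scheme, split.hostname, split.port, split.path,
      split.query, split.fragment)
```

With this pattern, `query` keeps its leading `?`, `fragment` keeps its leading `#`, and `port` is a string (`'2121'`); `urlsplit` strips those prefixes and returns the port as an integer. Where robustness matters, `urllib.parse` is probably the safer choice, since a hand-written pattern like this one will reject some valid URLs (for example, paths containing `~` or `%`).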