You can use urlparse from Python's built-in urllib.parse module.
Here is the help text for urlparse:
Help on function urlparse in module urllib.parse:
urlparse(url, scheme='', allow_fragments=True)
Parse a URL into 6 components:
<scheme>://<netloc>/<path>;<params>?<query>#<fragment>
Return a 6-tuple: (scheme, netloc, path, params, query, fragment).
Note that we don't break the components up in smaller bits
(e.g. netloc is a single string) and we don't expand % escapes.
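urlparse splits the URL into 6 parts.
To get www.baidu.com, just take the netloc part of <scheme>://<netloc>/<path>;<params>?<query>#<fragment>.
netloc is the string between :// and the first /, ? or # that follows it (or the end of the URL if none appears), so it also includes any user info and port.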
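To make the six-part split concrete, here is a minimal sketch. The URL is a hypothetical one, chosen only so that every component is non-empty; it is not taken from the question.

```python
from urllib.parse import urlparse

# Hypothetical URL that exercises all six components.
parts = urlparse('https://www.baidu.com/s;type=a?wd=python#top')

print(parts.scheme)    # https
print(parts.netloc)    # www.baidu.com
print(parts.path)      # /s
print(parts.params)    # type=a
print(parts.query)     # wd=python
print(parts.fragment)  # top

# netloc also stops at '?' when the URL has no path at all.
no_path = urlparse('https://www.baidu.com?wd=python')
print(no_path.netloc)  # www.baidu.com
```

The result is a named tuple, so the same values are also available by index (parts[1] is the netloc).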
Code demo:
import urllib.parse
url = 'https://www.baidu.com/?tn=98012088_5_dg&ch=12'
sp = urllib.parse.urlparse(url)
print(sp.netloc)
Output:
www.baidu.com
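Two cases can give a different result from the demo above, so here is a rough sketch of both. The URLs and the host_of helper are hypothetical, made up for illustration. If the URL carries credentials or a port, netloc contains them too, and the hostname attribute is usually what you want. If the URL has no scheme and no leading //, urlparse does not recognize any netloc and puts everything in path.

```python
from urllib.parse import urlparse

# Hypothetical URL with credentials and a port in the authority part.
full = urlparse('https://user:pw@www.baidu.com:8443/index.html')
print(full.netloc)    # user:pw@www.baidu.com:8443
print(full.hostname)  # www.baidu.com
print(full.port)      # 8443

# Without '//' nothing is treated as netloc; it all ends up in path.
bare = urlparse('www.baidu.com/index.html')
print(repr(bare.netloc))  # ''
print(bare.path)          # www.baidu.com/index.html


def host_of(url):
    """Hypothetical helper: return the host even if the scheme is missing.

    Assumption: input without '://' or a leading '//' is a bare
    'host/path' string, so '//' is prepended before parsing.
    """
    if '://' not in url and not url.startswith('//'):
        url = '//' + url
    return urlparse(url).hostname


print(host_of('www.baidu.com/index.html'))                        # www.baidu.com
print(host_of('https://www.baidu.com/?tn=98012088_5_dg&ch=12'))   # www.baidu.com
```

Note that hostname is returned in lowercase, while netloc keeps the original casing.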