Reading a web page's source code with Python
Step 1: Fetch the web page from a program (here, a search for 哔哩哔哩, i.e. Bilibili)
from urllib.request import urlopen
url = "https://www.so.com/s?ie=utf-8&src=hao_isearch2_cube&q=%E5%93%94%E5%93%A9%E5%93%94%E5%93%A9&eci=13201445&nlpv=lab606sc21g"
resp = urlopen(url)
print(resp.read())  # raw bytes; non-ASCII characters show up as \x.. escapes
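The output of this step is a bytes object, which is why Chinese characters show up as escape sequences rather than readable text. As a small offline sketch of the same idea, the snippet below takes the q= value from the URL above, turns it back into the search keyword, and shows the UTF-8 bytes behind it. The variable names are only illustrative.

```python
from urllib.parse import quote, unquote

# The percent-encoded q= parameter copied from the URL above.
query = "%E5%93%94%E5%93%A9%E5%93%94%E5%93%A9"

# unquote() decodes percent-escapes as UTF-8 by default.
keyword = unquote(query)
print(keyword)

# Encoding the keyword gives the kind of bytes that resp.read() returns.
raw = keyword.encode("utf-8")
print(type(raw))
print(raw)

# quote() goes the other way, from text back to a URL-safe string.
print(quote(keyword) == query)
```

Each Chinese character here takes three bytes in UTF-8, which matches the three %XX groups per character in the URL.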
Step 2: Decode the raw bytes into readable (Chinese) text
from urllib.request import urlopen
url = "https://www.so.com/s?ie=utf-8&src=hao_isearch2_cube&q=%E5%93%94%E5%93%A9%E5%93%94%E5%93%A9&eci=13201445&nlpv=lab606sc21g"
resp = urlopen(url)
print(resp.read().decode("utf-8"))
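Hard-coding "utf-8" works for this page, but not every site serves UTF-8. A minimal sketch, assuming the server declares its encoding in the Content-Type header: read the charset from the response headers and fall back to UTF-8 when it is missing. The helper names decode_body and fetch_text are hypothetical, not part of urllib.

```python
from typing import Optional
from urllib.request import urlopen


def decode_body(raw: bytes, charset: Optional[str], default: str = "utf-8") -> str:
    """Decode response bytes, using `default` when no charset was declared."""
    # errors="replace" swaps undecodable bytes for U+FFFD instead of raising.
    return raw.decode(charset or default, errors="replace")


def fetch_text(url: str) -> str:
    """Fetch `url` and decode the body with the charset from its headers."""
    with urlopen(url) as resp:
        # get_content_charset() returns None if the header names no charset.
        charset = resp.headers.get_content_charset()
        return decode_body(resp.read(), charset)
```

With this in place, print(fetch_text(url)) would stand in for the last line of the step above. Note that pages which declare their encoding only in a meta tag inside the HTML are not handled by this sketch.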
Step 3: Save the page source as an HTML file
from urllib.request import urlopen
url = "https://www.so.com/s?ie=utf-8&src=hao_isearch2_cube&q=%E5%93%94%E5%93%A9%E5%93%94%E5%93%A9&eci=13201445&nlpv=lab606sc21g"
resp = urlopen(url)
with open("哔哩哔哩.html", mode="w", encoding="utf-8") as f:
    f.write(resp.read().decode("utf-8"))  # decode the bytes, then write them out as UTF-8 text
print("over!")
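The three steps can also be folded into one reusable function. The version below is a sketch under a few assumptions rather than a definitive implementation: the name save_page, the 10-second timeout, and the "Mozilla/5.0" User-Agent value are all illustrative choices (some sites refuse requests that carry urllib's default User-Agent, so a browser-like one is often sent instead).

```python
from pathlib import Path
from urllib.request import Request, urlopen


def save_page(url: str, out_path: str, timeout: float = 10.0) -> int:
    """Download `url`, write its decoded source to `out_path`,
    and return the number of characters written."""
    # Illustrative header value (an assumption, not required by urllib).
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    # The with-block closes the connection even if reading fails.
    with urlopen(req, timeout=timeout) as resp:
        # Use the declared charset if there is one, otherwise assume UTF-8.
        charset = resp.headers.get_content_charset() or "utf-8"
        html = resp.read().decode(charset, errors="replace")
    # Always store the file as UTF-8, whatever the source encoding was.
    Path(out_path).write_text(html, encoding="utf-8")
    return len(html)
```

Calling save_page(url, "哔哩哔哩.html") would then produce the same file as the step above.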