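Putting the earlier material together, let's try fetching a page.
Step 1: build a Baidu search request for a term typed in by the user: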
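from urllib import request
from urllib import parse
from fake_useragent import UserAgent

url = 'http://www.baidu.com/s?wd={}'
# Send a random User-Agent so the request looks like it comes from a browser
headers = {'User-Agent': UserAgent().random}
# The term to search for
word = input('Enter a search term: ')
# Percent-encode the term so it is safe to place in the query string
params = parse.quote(word)
full_url = url.format(params)
req = request.Request(url=full_url, headers=headers)
response = request.urlopen(req)
# Read the response body and decode it as UTF-8
html = response.read().decode('utf-8')
print(html)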
Entering CSDN as the search term prints the HTML source of the results page.
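The same steps can also be written a little more defensively. The following is only a sketch under a few assumptions: `build_search_url` and `fetch_html` are hypothetical helper names, the fixed `Mozilla/5.0` User-Agent is a placeholder standing in for `UserAgent().random`, and the 10-second timeout is an arbitrary choice. It uses `parse.urlencode` to build the query string, decodes with the charset declared by the server when there is one, and catches `URLError` so that a failed request does not end the script with a traceback.

```python
from urllib import error, parse, request

BASE_URL = 'http://www.baidu.com/s'  # same endpoint as the script above


def build_search_url(word):
    """Return the search URL with the term percent-encoded as the wd parameter."""
    return BASE_URL + '?' + parse.urlencode({'wd': word})


def fetch_html(url, timeout=10):
    """Fetch url and return the decoded text, or None if the request fails."""
    # Placeholder User-Agent (an assumption); UserAgent().random would also work here.
    headers = {'User-Agent': 'Mozilla/5.0'}
    req = request.Request(url=url, headers=headers)
    try:
        with request.urlopen(req, timeout=timeout) as response:
            # Fall back to UTF-8 when the response does not declare a charset.
            charset = response.headers.get_content_charset() or 'utf-8'
            return response.read().decode(charset, errors='replace')
    except error.URLError as exc:
        # URLError covers connection failures; HTTPError (bad status) is a subclass.
        print('Request failed:', exc)
        return None
```

Calling `fetch_html(build_search_url('CSDN'))` would then return the page text, or `None` if the request could not be completed.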
Step 2: save the HTML page to a file:
# Name the file after the search term
filename = word + '.html'
with open(filename, 'w', encoding='utf-8') as file:
    file.write(html)
You can see that the page has been saved to a local file.
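One thing to watch for: the file name is taken directly from the search term, so a term containing characters such as `/` or `?` can make `open` fail or write somewhere unexpected. A minimal sketch of one way to guard against this is shown below; `safe_filename` is a hypothetical helper, and both the set of replaced characters and the `result` fallback name are assumptions rather than a complete rule for every file system.

```python
import re


def safe_filename(word, suffix='.html'):
    """Turn a search term into a file name that common file systems accept."""
    # Replace characters that Windows or POSIX file systems reject in file names.
    cleaned = re.sub(r'[\\/:*?"<>|]', '_', word).strip()
    # Fall back to a fixed name if nothing usable is left.
    return (cleaned or 'result') + suffix
```

With this helper, the save step could open `safe_filename(word)` instead of `word + '.html'`.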