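Search crawler
- Listing page
- Build the request arguments
  - Argument 1: url, the original address with its search parameters stripped off; the kw keyword is added back separately
  - Argument 2: headers, a User-Agent (UA) string that makes the request look like it comes from a browser
  - Argument 3: params, a dictionary of query parameters
- Save the returned data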
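
The URL split described in the list above can be sketched with the standard library alone. This is only an illustration under assumptions: the helper name `build_search_url` and the copied address are hypothetical, and `requests` does the equivalent encoding itself when it is given `params`.

```python
from urllib.parse import urlsplit, urlunsplit, urlencode


def build_search_url(full_url, kw):
    """Drop the existing query string from full_url and attach wd=kw."""
    parts = urlsplit(full_url)
    # urlencode percent-encodes the keyword (non-ASCII text as UTF-8 bytes)
    query = urlencode({"wd": kw})
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


# Hypothetical address copied from the browser after a search
copied = "https://www.baidu.com/s?ie=utf-8&wd=python&rsv_bp=1"
print(build_search_url(copied, "crawler"))
```

Only the scheme, host, and path of the copied address survive; everything after the `?` is rebuilt from the keyword.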
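# -*- coding: utf-8 -*-
import requests

# Base address with the search parameters removed; requests appends the query string
url = 'https://www.baidu.com/s?'
kw = input("Enter a word:")
# User-Agent header so the request looks like it comes from a browser
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.25 Safari/537.36 Core/1.70.3775.400 QQBrowser/10.6.4208.400'}
# Query parameters; 'wd' carries the search keyword
param = {
    'wd': kw
}
r = requests.get(url=url, params=param, headers=headers)
html = r.text
# Save the returned page as <keyword>.html
filename = "{}.html".format(kw)
with open(filename, 'w', encoding='utf8') as f:
    f.write(html)
print(filename, "ok")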
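
The script above uses the keyword directly as the file name, so a keyword containing characters such as `/` or `?` may make `open()` fail. One possible way to guard against that is sketched below; `safe_filename` and the fallback name `result` are hypothetical and not part of the original script.

```python
import re


def safe_filename(kw, ext=".html"):
    """Replace characters that common file systems reject with underscores."""
    # Backslash, slash, colon, asterisk, question mark, quote, angle brackets,
    # pipe, and whitespace are collapsed into a single underscore
    cleaned = re.sub(r'[\\/:*?"<>|\s]+', "_", kw).strip("_")
    # Fall back to a fixed name if nothing usable is left
    return (cleaned or "result") + ext


print(safe_filename("a/b?c"))
```

The result could be passed to `open()` in place of `"{}.html".format(kw)`.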
Keep it up, and let's study hard together!