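Setting the cookie: there are two ways to set a cookie value manually. One is to add the cookie to the request headers; the other is to set it through a cookiejar. This article takes the first approach and adds the cookie to the request headers.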
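For comparison, the cookiejar route could look roughly like the sketch below. It is only an illustration, not the method used in the rest of this article: `parse_cookie_header` is a hypothetical helper, the cookie string is a placeholder, and the demo under the `__main__` guard assumes `requests` is installed.

```python
def parse_cookie_header(raw: str) -> dict:
    """Split a raw 'name=value; name2=value2' Cookie string into a dict."""
    cookies = {}
    for part in raw.split(";"):
        # Split on the first '=' only, so values containing '=' stay intact.
        name, sep, value = part.strip().partition("=")
        if sep and name:
            cookies[name] = value
    return cookies


if __name__ == "__main__":
    import requests

    # Placeholder value; in practice this is the string copied from the browser.
    raw_cookie = "session_id=abc123; theme=dark"

    sess = requests.Session()
    # cookiejar_from_dict builds a cookie jar from plain name/value pairs.
    sess.cookies = requests.utils.cookiejar_from_dict(parse_cookie_header(raw_cookie))
    print(sess.cookies.get_dict())
```

With a jar, each cookie is stored as a separate entry on the session, whereas the header approach sends the copied string verbatim.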
1 - Build the request headers
# cookie: the raw Cookie string copied from a logged-in browser session
header = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36',
    'cookie': cookie,
}
2 - Create a session object and store the cookie in it, so the cookie does not have to be written again for later requests
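import requests

sess = requests.Session()
# update() merges our headers into the session defaults instead of replacing them
sess.headers.update(header)
url = 'https://www.zhihu.com/hot'
r = sess.get(url)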
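Before parsing, it can help to confirm that the cookie was actually accepted. The sketch below is a heuristic only: `looks_logged_out` is a hypothetical helper, and the `/signin` path is an assumption about where the site sends visitors whose cookie is missing or expired.

```python
from urllib.parse import urlparse


def looks_logged_out(status_code: int, final_url: str) -> bool:
    """Heuristic: a non-200 status or a sign-in path suggests the cookie was rejected."""
    if status_code != 200:
        return True
    # Assumed sign-in path; adjust if the site redirects elsewhere.
    return urlparse(final_url).path.startswith("/signin")
```

With the response from the session above, the check would be called as `looks_logged_out(r.status_code, r.url)`; `r.url` holds the final URL after any redirects.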
3 - Next, extract the fields we need from the Zhihu hot list, such as each entry's title, heat value, URL, and image
from lxml import etree

selector = etree.HTML(r.text)
# cssselect() needs the separate cssselect package to be installed
eles = selector.cssselect('div.HotList-list>section')
for index, ele in enumerate(eles):
    title = ele.xpath('./div[@class="HotItem-content"]/a/h2/text()')[0]
    url = ele.xpath('./div[@class="HotItem-content"]/a/@href')[0]
    hot = ele.xpath