Day 3 of learning Python web scraping
Carrying cookies with requests
For this you have to find the cookie manually on the site (e.g. in the browser's dev tools),
then copy it into the code. It's a very long cookie.
import requests
# Define the request URL
url = 'https://www.lmonkey.com/my/order'
# Define the request headers
headers = {
'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.106 Safari/537.36 Edg/83.0.478.54',
'cookie':'UM_distinctid=172d4bea13133d-00418c875ed70e-79657964-144000-172d4bea13236a; accessId=ad8e1ca0-2091-11ea-af9d-6523a0f144a7; Hm_lvt_676e52e2eddd764819cab505b21e9ee8=1592707162,1592709404,1592788421; CNZZDATA1277679765=1464177839-1592702676-https%253A%252F%252Fwww.baidu.com%252F%7C1592784330; qimo_seosource_ad8e1ca0-2091-11ea-af9d-6523a0f144a7=%E7%99%BE%E5%BA%A6%E6%90%9C%E7%B4%A2; qimo_seokeywords_ad8e1ca0-2091-11ea-af9d-6523a0f144a7=; href=https%3A%2F%2Fwww.lmonkey.com%2F; XSRF-TOKEN=eyJpdiI6IlJzaUNKdU5KdlROZnlFcjFRRzdxckE9PSIsInZhbHVlIjoiOEY0UW9ZVXRobXNWSFcrcXF6eWJCRStRRnNub05tZUsxbENSTWc0dnh6TFdYSjRJRm00OEhxanVGRUZZbmQxMyIsIm1hYyI6ImQ5NGZkYmJhN2Y1NmQ5OTdlOTdkZTNjMTlmNWJhNTE0YTk2N2VlODkxOGJiZjU4OWU2MGUxNjU5ZTFkYmFkMjUifQ%3D%3D; _session=eyJpdiI6IjRkcmxJSVhUR0VZTU9maWNBcDNPZUE9PSIsInZhbHVlIjoid1hGdW0xdzN2OWJ6eCtkRk1wU2pKem9cL0NvYjBlZ2U5TU1Ha1dWNDNTUHFcL2FRMjlYa0JZZVVzQXkzYkQ0cjBFIiwibWFjIjoiYWMxMGFhNzExYmY1N2ZhNjMyYjAxZTM2M2JkMDA3Mzk4YTJjYTNmM2U1M2QwY2UxODJlMTRkMGU0NDE0YmQwNSJ9; Hm_lpvt_676e52e2eddd764819cab505b21e9ee8=1592788545; pageViewNum=23'
}
# Send the GET request
res = requests.get(url=url, headers=headers)
# Get the response status code
code = res.status_code
print(code)
# On success, write the response body to a file
if code == 200:
    with open('./test.html', 'w', encoding='utf-8') as fp:
        fp.write(res.text)
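Instead of pasting the whole string into the headers dict, requests can also take cookies as a dict through its cookies= parameter. A small sketch (the helper name and the sample cookie values here are made up, not from the lesson) for splitting a copied cookie header into that dict:

```python
# Hypothetical helper: turn a copied "k1=v1; k2=v2" cookie string into a dict.
# The result can then be passed as requests.get(url, headers=headers, cookies=...).
def cookie_str_to_dict(cookie_str):
    cookies = {}
    for pair in cookie_str.split('; '):
        key, _, value = pair.partition('=')  # value may itself contain '='
        cookies[key] = value
    return cookies

print(cookie_str_to_dict('accessId=ad8e1ca0; pageViewNum=23'))
# {'accessId': 'ad8e1ca0', 'pageViewNum': '23'}
```

This keeps the User-Agent in headers and the cookie data separate, which is a bit easier to read than one giant 'cookie' header line.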
Using a session from requests
import requests
# The target page we actually want (requires login)
url = 'http://www.rrys2019.com/user/user'
# The login endpoint
loginurl = 'http://www.rrys2019.com/User/Login/ajaxLogin'
# Request headers
headers = {
'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.106 Safari/537.36 Edg/83.0.478.54'
}
# If you want the scraper to record cookies and send them back automatically,
# create a Session before making requests,
# and send every request through the object it returns
rep = requests.Session()
# Form data for the login request
data = {
'account': 'yichuan@itxdl.cn',
'password': 'pyTHON123',
'remember': '1',
'url_back': 'http://www.rrys2019.com/'
}
# Send the login request
res = rep.post(url=loginurl, headers=headers, data=data)
# Check the status
code = res.status_code
print('code:', code)
if code == 200:
    # Send a new request to fetch the target data; the session carries the login cookie
    res = rep.get(url=url, headers=headers)
    with open('rr.html', 'w', encoding='utf-8') as fp:
        fp.write(res.text)
So this one relies on the session object: after logging in, it stores the cookie itself and attaches it to later requests automatically.
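What the session does with cookies can be seen offline, with no network at all. In this sketch the cookie name and value are invented; after a real login it is the server's Set-Cookie headers that fill the jar:

```python
import requests

s = requests.Session()
# After a real login, Set-Cookie headers from the server land in s.cookies
# automatically; here we plant one by hand just to show it persists.
s.cookies.set('PHPSESSID', 'abc123', domain='www.rrys2019.com')

# Any later s.get()/s.post() to that domain would carry this cookie along.
print(s.cookies.get('PHPSESSID'))  # abc123
```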
small tips:
- User-Agent: you can just search Baidu for one; this page also lists a bunch (https://finthon.com/python-spider-headers/), though I haven't tried them all myself.
- with open('rr.html','w',encoding='utf-8') as fp: fp.write(res.text)
  In text mode, write() only accepts str, so open() should be given encoding='utf-8', which saves the text as UTF-8 instead of the platform default.
- The data payload for the request is just whatever the front end submits; you can read the field names off the login form in the browser's network tab.
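The encoding tip above can be checked without any network. This standalone snippet (the file name is arbitrary) shows that text-mode write() takes a str, and that encoding='utf-8' decides the bytes that end up on disk:

```python
import os
import tempfile

page_text = '<h1>标题</h1>'  # stand-in for res.text, which is a str

path = os.path.join(tempfile.mkdtemp(), 'demo.html')
with open(path, 'w', encoding='utf-8') as fp:
    fp.write(page_text)            # OK: str in text mode
with open(path, 'rb') as fp:
    raw = fp.read()                # the raw UTF-8 bytes that were written

print(raw.decode('utf-8') == page_text)  # True
```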
The last
Didn't feel like studying today, only watched about half an hour of video.
Tomorrow do it properly, okay?
Silly, you need to study.
You need to improve.
La la la la~