04 Handling cookies in a Python crawler

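I. Introduction

Unlike a browser, a crawler does not automatically store and resend cookies, so we have to handle them ourselves.

II. Ways to handle cookies

1. Use requests.utils.dict_from_cookiejar() to convert the returned cookies into a dictionary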

import requests
# Get the cookies
def login():
    login_url = 'http://www.xxx.com/login'
    headers = {
        "Accept": "application/json, text/javascript, */*; q=0.01"
    }
    body = {
        "usercode": "liuzz05@****.com",
        "password": "123456"
    }
    try:
        res = requests.post(url=login_url, headers=headers, data=body)
        cookies = res.cookies

        cookie = requests.utils.dict_from_cookiejar(cookies)

        return cookie
    except Exception as err:
        print('Failed to get cookies:\n{0}'.format(err))

# Use the cookies
# (get_data_url is not defined in this snippet; set it to the URL you want to fetch)
def get_data():
    cookie = login()
    res = requests.get(url=get_data_url, cookies=cookie)
    print(res.text)
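
To see what the conversion keeps and what it drops, here is a minimal offline sketch. The cookie names and values are made up for illustration, and the jar is filled by hand instead of coming from a real login response. It also uses requests.utils.cookiejar_from_dict(), the inverse helper, which the original example does not use.

```python
import requests
from requests.cookies import RequestsCookieJar

# Build a jar by hand so the example runs without any network access.
# The cookie names and values are made up for illustration.
jar = RequestsCookieJar()
jar.set("sessionid", "abc123", domain="www.xxx.com", path="/")
jar.set("lang", "en", domain="www.xxx.com", path="/")

# Jar -> plain dict: only the name/value pairs survive the conversion.
cookie_dict = requests.utils.dict_from_cookiejar(jar)
print(cookie_dict)

# Dict -> jar again, for example after loading the dict from a file.
restored = requests.utils.cookiejar_from_dict(cookie_dict)
print(restored.get("sessionid"))
```

Because the dictionary holds only names and values, the domain, path and expiry of each cookie are lost. That is usually fine for a crawler that talks to a single site.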

2. Iterate over the cookies' key-value pairs and join them into a Cookie header string

import requests

def login():
    login_url = 'http://www.xxx.com/login'
    headers = {
        "Accept": "application/json, text/javascript, */*; q=0.01"
    }
    body = {
        "usercode": "liuzz05@****.com",
        "password": "123456"
    }
    try:
        res = requests.post(url=login_url, headers=headers, data=body)
        cookies = res.cookies.items()

        cookie = ''
        for name, value in cookies:
            cookie += '{0}={1};'.format(name, value)

        return cookie
    except Exception as err:
        print('Failed to get cookies:\n{0}'.format(err))



# Use the cookies

def get_data():
    cookie = login()
    headers = {
        "cookie": cookie
    }
    res = requests.get(url=get_data_url, headers=headers)
    print(res.text)
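
The loop above leaves a trailing ';' at the end of the string. Servers generally tolerate this, but the joining step can also be written with str.join, which separates the pairs with '; ' and adds nothing at the end. The helper name build_cookie_header and the sample cookies below are hypothetical, chosen only for this sketch.

```python
def build_cookie_header(cookies):
    """Join name/value pairs into a single Cookie header value.

    `cookies` can be any mapping with .items(), such as a plain dict
    or the res.cookies object returned by requests.
    """
    return "; ".join("{0}={1}".format(name, value) for name, value in cookies.items())


# Made-up cookies standing in for the ones returned by a login request.
sample = {"sessionid": "abc123", "lang": "en"}
headers = {"cookie": build_cookie_header(sample)}
print(headers)
```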

3. Build the cookie string by hand. This approach is rather clumsy, and it only works if you already know the cookie names

import requests
# Get the cookies
def login():
    login_url = 'http://www.xxx.com/login'
    headers = {
        "Accept": "application/json, text/javascript, */*; q=0.01"
    }
    body = {
        "usercode": "liuzz05@****.com",
        "password": "123456"
    }
    try:
        res = requests.post(url=login_url, headers=headers, data=body)
        cookies = res.cookies

        phpsessid = cookies['phpsessid']
        env_orgcode = cookies['env_orgcode']
        acw_tc = cookies['acw_tc']
        aliyungf_tc = cookies['aliyungf_tc']
        last_env = cookies['last_env']

        cookie = 'phpsessid={0};env_orgcode={1};acw_tc={2};aliyungf_tc={3};last_env={4}'.format(
            phpsessid, env_orgcode, acw_tc, aliyungf_tc, last_env
        )

        return cookie
    except Exception as err:
        print('Failed to get cookies:\n{0}'.format(err))

# Use the cookies
def get_data():
    cookie = login()
    headers = {
        "cookie": cookie
    }
    res = requests.get(url=get_data_url, headers=headers)
    print(res.text)
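
With this approach, an expression such as cookies['phpsessid'] raises KeyError when the server did not set that cookie. A small helper can check every expected name first and report all the missing ones at once. The function name pick_cookies and the sample values are hypothetical; this is only a sketch of the idea, not part of the original code.

```python
def pick_cookies(cookies, wanted):
    """Build a Cookie header value from the cookie names listed in `wanted`.

    Raises KeyError naming every expected cookie that is absent,
    instead of failing on the first missing one.
    """
    missing = [name for name in wanted if name not in cookies]
    if missing:
        raise KeyError("missing cookies: {0}".format(", ".join(missing)))
    return "; ".join("{0}={1}".format(name, cookies[name]) for name in wanted)


# Made-up values standing in for the cookies returned by the login request.
sample = {"phpsessid": "p1", "acw_tc": "a1", "last_env": "prod"}
print(pick_cookies(sample, ["phpsessid", "last_env"]))
```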

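4. Use a session, the proper way (and a real lifesaver)

A session stores the cookies for you, so there is no more tedious manual cookie handling: just send your requests through the session.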

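import requests

# Initialize the session
session = requests.Session()
# From now on, send every request through the session.
# login_url, headers, body and get_data_url are the same values as in the examples above.
session.post(url=login_url, headers=headers, data=body)
session.get(url=get_data_url)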
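
The following offline sketch shows what the session does with its stored cookies. A cookie is placed in the session by hand (in real use the login response would set it), and the requests are only prepared, never sent, so nothing touches the network. The cookie name, its value and both URLs are made up for illustration.

```python
import requests

session = requests.Session()

# Normally the server sets this cookie in the login response;
# here it is added by hand so the example needs no network access.
session.cookies.set("sessionid", "abc123", domain="www.xxx.com", path="/")

# Prepare (but do not send) a request to the same site.
same_site = session.prepare_request(requests.Request("GET", "http://www.xxx.com/data"))
print(same_site.headers.get("Cookie"))

# A request to a different host does not receive the cookie.
other_site = session.prepare_request(requests.Request("GET", "http://other.example/"))
print(other_site.headers.get("Cookie"))
```

Unlike the dictionary and string approaches, the session keeps the domain and path of each cookie, so it only sends a cookie to the site that set it.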