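Create a new Scrapy project called douban.
After capturing the cookie with Charles,
I pasted it into PyCharm. Damn, the formatting was a total mess, so I had no choice but to add the quotes by hand and change every = into a :...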
cookies = {
"bid": "EP3q1inffgg",
"__yadk_uid" : "bIJElZgmoiojxX9iPANYuW69wOsqZtMC",
"ll" : "118282",
"_vwo_uuid_v2" : "FC33D3A160F773772BD0D3615F3DCAC3|744bce8c3d02e8ebcfe5c58fc91f033c",
"ps" : "y",
"push_noty_num" : "0",
"push_doumail_num" : "0",
"__utmv" : "30149280.18179",
"ap" : "1",
"_ga" : "GA1.2.980617011.1514707464",
"_gid" : "GA1.2.394721026.1532612698",
"ue" : "xxxxxx@qq.com",
"douban-profile-remind" : "1",
"__utma" : "30149280.980617011.1514707464.1532625684.1532662495.8",
"__utmc" : "30149280",
"__utmz" : "30149280.1532662495.8.6.utmcsr=baidu|utmccn=(organic)|utmcmd=organic",
"dbcl2" : "181794852:W8i5o4WaLXE",
"ck" : "Hm8k",
"_pk_ref.100001.8cb4" : "%5B%22%22%2C%22%22%2C1532694810%2C%22https%3A%2F%2Fwww.baidu.com%2Flink%3Furl%3DrYWBcWG4VS_zim1mZVy7wTOi2J8PrON8INtoQurnDx6Vo3yiM1o3P12FUeOjmP36%26wd%3D%26eqid%3D87bd504200001569000000035b5b1115%22%5D",
"_pk_id.100001.8cb4" : "bd09bb9234658e2e.1514707463.8.1532694810.1532665176.",
"_pk_ses.100001.8cb4" : "*",
"__ads_session" : "46a/xu60Igm+tEEqLgA="
}
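The manual quote-and-colon editing can be scripted instead. The following is a minimal sketch. It assumes that what you copy from Charles is the raw value of the Cookie request header, with pairs written as name=value and separated by "; ". The helper name cookie_header_to_dict is hypothetical and is not part of Scrapy. Stripping each piece also removes the stray spaces mentioned below.

```python
def cookie_header_to_dict(raw_header):
    """Turn a raw Cookie header value ("k1=v1; k2=v2") into a dict (hypothetical helper)."""
    cookies = {}
    for pair in raw_header.split(";"):
        pair = pair.strip()  # drop the space that follows each ";"
        if not pair:
            continue  # tolerate a trailing ";"
        # Split on the first "=" only, because values such as base64 strings may contain "=".
        name, _, value = pair.partition("=")
        cookies[name.strip()] = value.strip()
    return cookies


raw = "bid=EP3q1inffgg; ll=118282; ps=y; __ads_session=46a/xu60Igm+tEEqLgA="
print(cookie_header_to_dict(raw))
# {'bid': 'EP3q1inffgg', 'll': '118282', 'ps': 'y', '__ads_session': '46a/xu60Igm+tEEqLgA='}
```

The function returns a plain dict, so the result can be used directly as the spider's cookies attribute.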
Note: pasting the cookie introduces some extra spaces, and those spaces make the cookie invalid, so remove them.
Source code:
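A quick way to catch such spaces before running the spider is to check every value against the cookie-value grammar in RFC 6265 (section 4.1.1). That grammar allows printable US-ASCII except space, double quote, comma, semicolon, and backslash, with an optional pair of surrounding double quotes. This is only a rough sketch, because real browsers and servers are often more lenient. The name find_bad_cookie_values is hypothetical.

```python
import re

# One RFC 6265 cookie-octet: printable US-ASCII except DQUOTE, comma, semicolon and backslash.
_OCTET = r'[\x21\x23-\x2B\x2D-\x3A\x3C-\x5B\x5D-\x7E]'
# A cookie-value is zero or more octets, optionally wrapped in one pair of double quotes.
_COOKIE_VALUE = re.compile(rf'(?:{_OCTET}*|"{_OCTET}*")')


def find_bad_cookie_values(cookies):
    """Return the names whose values contain characters RFC 6265 does not allow (hypothetical helper)."""
    return [name for name, value in cookies.items()
            if not _COOKIE_VALUE.fullmatch(value)]


cookies = {
    "ps": "y",
    "_pk_ref.100001.8cb4": "%5B%22%22 % 22 % 5D",  # stray spaces from pasting
}
print(find_bad_cookie_values(cookies))  # ['_pk_ref.100001.8cb4']
```

The check uses fullmatch instead of ^...$ on purpose, because $ would also accept a value that ends in a newline.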
# -*- coding: utf-8 -*-
import scrapy


class DbCookieSpider(scrapy.Spider):
    """Log in to Douban with cookies copied from the browser."""
    name = 'db_cookie'
    allowed_domains = ['douban.com']
    start_urls = ['https://www.douban.com/']
    headers = {
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36"
    }
    # Cookies captured with Charles; values must not contain stray spaces.
    cookies = {
        "bid": "EP3q1inffgg",
        "__yadk_uid": "bIJElZgmoiojxX9iPANYuW69wOsqZtMC",
        "ll": "118282",
        "_vwo_uuid_v2": "FC33D3A160F773772BD0D3615F3DCAC3|744bce8c3d02e8ebcfe5c58fc91f033c",
        "ps": "y",
        "push_noty_num": "0",
        "push_doumail_num": "0",
        "__utmv": "30149280.18179",
        "ap": "1",
        "_ga": "GA1.2.980617011.1514707464",
        "_gid": "GA1.2.394721026.1532612698",
        "ue": "xxxxxxx@qq.com",
        "douban-profile-remind": "1",
        "__utma": "30149280.980617011.1514707464.1532625684.1532662495.8",
        "__utmc": "30149280",
        "__utmz": "30149280.1532662495.8.6.utmcsr=baidu|utmccn=(organic)|utmcmd=organic",
        "dbcl2": "181794852:W8i5o4WaLXE",
        "ck": "Hm8k",
        "_pk_ref.100001.8cb4": "%5B%22%22%2C%22%22%2C1532694810%2C%22https%3A%2F%2Fwww.baidu.com%2Flink%3Furl%3DrYWBcWG4VS_zim1mZVy7wTOi2J8PrON8INtoQurnDx6Vo3yiM1o3P12FUeOjmP36%26wd%3D%26eqid%3D87bd504200001569000000035b5b1115%22%5D",
        "_pk_id.100001.8cb4": "bd09bb9234658e2e.1514707463.8.1532694810.1532665176.",
        "_pk_ses.100001.8cb4": "*",
        "__ads_session": "46a/xu60Igm+tEEqLgA="
    }

    def start_requests(self):
        # A plain GET to the profile page; the cookies carry the logged-in session.
        # (FormRequest without formdata also sends a GET, but Request states the intent.)
        return [scrapy.Request(url="https://www.douban.com/people/xxxxx/",
                               headers=self.headers,
                               cookies=self.cookies,
                               callback=self.parse_page)]

    def parse_page(self, response):
        print(response.status)
        # Save the page; the file holds HTML despite its .json extension.
        with open("dbcookie.json", "w", encoding="utf-8") as f:
            f.write(response.text)
In the end, dbcookie.json contains the HTML of the Douban profile page. Success!
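The same cookie dict is pasted twice above, and it includes the dbcl2 login cookie. One option is to keep the dict in a separate JSON file and load it when the spider starts. This is a sketch only. The file name douban_cookies.json and the helper load_cookies are hypothetical and not part of the original project.

```python
import json
from pathlib import Path


def load_cookies(path):
    """Load a {name: value} cookie dict from a JSON file (hypothetical helper and file layout)."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    # Coerce names and values to str so numbers such as 118282 become "118282".
    return {str(name): str(value) for name, value in data.items()}
```

Inside DbCookieSpider, the hard-coded dict could then be replaced with cookies = load_cookies("douban_cookies.json"). Scrapy's Request accepts a plain dict for its cookies argument.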