Python simulated login hits a redirect: Scrapy login to Zhihu keeps redirecting and fails

Latest recommended article published on 2022-03-30 15:18:51

weixin_39597868


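I am using Scrapy to simulate a login to Zhihu and then scrape the questions and answers from the home page, but every run ends in redirects instead of a logged-in session.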

    from scrapy.contrib.spiders import CrawlSpider, Rule
    from scrapy.selector import Selector
    from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
    from scrapy.http import Request, FormRequest

    from zhihu.items import ZhihuItem


    class ZhihuSipder(CrawlSpider):
        name = "zhihu"
        allowed_domains = ["www.zhihu.com"]
        start_urls = [
            "http://www.zhihu.com"
        ]

        # Follow question pages and hand them to parse_page.
        rules = (
            Rule(SgmlLinkExtractor(allow=r'http://www\.zhihu\.com/question/\d+'), callback='parse_page'),
        )

        def start_requests(self):
            # Fetch the login page first so the _xsrf token can be read from it.
            return [Request("https://www.zhihu.com/login", callback=self.post_login)]

        # The FormRequest is where things go wrong.
        def post_login(self, response):
            print 'Preparing login'
            xsrf = Selector(response).xpath('//input[@name="_xsrf"]/@value').extract()[0]
            print xsrf
            return [FormRequest.from_response(response,  # "http://www.zhihu.com/login",
                                              formdata={
                                                  '_xsrf': xsrf,
                                                  'email': '@qq.com',
                                                  'password': '123456',
                                                  'rememberme': 'y',
                                              },
                                              callback=self.parse_page
                                              )]

        def parse_page(self, response):
            # Extract the title, description and answers from a question page.
            problem = Selector(response)
            item = ZhihuItem()
            item['url'] = response.url
            item['title'] = problem.xpath('//h2[@class="zm-item-title zm-editable-content"]/text()').extract()
            item['description'] = problem.xpath('//div[@class="zm-editable-content"]/text()').extract()
            item['answer'] = problem.xpath('//div[@class=" zm-editable-content clearfix"]/text()').extract()
            return item
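For context, `FormRequest.from_response` reads the form found in the response and pre-fills its fields, hidden inputs included, before applying the `formdata` overrides. The sketch below imitates that merge with the standard library only, to show why the hand-extracted `_xsrf` value is probably redundant. It is an illustration, not Scrapy's implementation; the HTML snippet, the `collect_form_fields` helper, and the credentials are all hypothetical.

```python
from html.parser import HTMLParser


class FormFieldCollector(HTMLParser):
    """Collect name/value pairs from <input> elements (simplified)."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        name = attr.get("name")
        if name:
            # Inputs without a value attribute are treated as empty strings.
            self.fields[name] = attr.get("value") or ""


def collect_form_fields(html, overrides):
    """Return the page's input fields with caller-supplied overrides applied."""
    parser = FormFieldCollector()
    parser.feed(html)
    merged = dict(parser.fields)
    merged.update(overrides)
    return merged


# Hypothetical login page; the real Zhihu markup is not shown in the question.
SAMPLE_HTML = """
<form action="/login" method="post">
  <input type="hidden" name="_xsrf" value="d117e46de0dcc5e8ee2f0c7031fcafe9">
  <input type="text" name="email">
  <input type="password" name="password">
</form>
"""

fields = collect_form_fields(
    SAMPLE_HTML,
    {"email": "user@example.com", "password": "secret", "rememberme": "y"},
)
print(fields["_xsrf"])
```

If that reading is right, the token printed in the log was already going to be submitted, so the missing `_xsrf` is unlikely to be what breaks the login.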

Running the spider with the command below prints the xsrf token correctly, but the login itself does not succeed.

$ scrapy crawl zhihu

The output is as follows:

    2014-12-18 14:45:11+0800 [zhihu] INFO: Spider opened
    2014-12-18 14:45:11+0800 [zhihu] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
    2014-12-18 14:45:11+0800 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
    2014-12-18 14:45:11+0800 [scrapy] DEBUG: Web service listening on 127.0.0.1:6080
    2014-12-18 14:45:11+0800 [zhihu] DEBUG: Redirecting (301) to from
    2014-12-18 14:45:11+0800 [zhihu] DEBUG: Redirecting (302) to from
    2014-12-18 14:45:12+0800 [zhihu] DEBUG: Crawled (200) (referer: None)
    Preparing login
    d117e46de0dcc5e8ee2f0c7031fcafe9
    2014-12-18 14:45:12+0800 [zhihu] DEBUG: Redirecting (302) to from
    2014-12-18 14:45:12+0800 [zhihu] DEBUG: Filtered duplicate request: - no more duplicates will be shown (see DUPEFILTER_DEBUG to show all duplicates)
    2014-12-18 14:45:12+0800 [zhihu] INFO: Closing spider (finished)
    2014-12-18 14:45:12+0800 [zhihu] INFO: Dumping Scrapy stats:
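One plausible reading of the log above: the login POST is answered with a 302 that points to a URL the spider has already requested, the duplicate filter drops that follow-up request, and the spider closes with nothing left to do. The toy model below sketches that mechanism in plain Python. It is not Scrapy's code; `SeenFilter` is a hypothetical stand-in, and the replayed URLs are guesses, because the log lost the real ones.

```python
class SeenFilter:
    """Toy stand-in for a scheduler-side duplicate filter.

    Scrapy fingerprints more than (method, URL); this model keeps only
    those two to make the behaviour easy to follow.
    """

    def __init__(self):
        self.seen = set()

    def should_schedule(self, method, url, dont_filter=False):
        # Requests flagged dont_filter skip the check entirely.
        if dont_filter:
            return True
        key = (method.upper(), url)
        if key in self.seen:
            return False
        self.seen.add(key)
        return True


# Hypothetical replay: these URLs are assumptions, not taken from the log.
LOGIN = "https://www.zhihu.com/login"
HOME = "http://www.zhihu.com/"

f = SeenFilter()
steps = [
    ("GET", LOGIN),   # request issued by start_requests
    ("GET", HOME),    # redirect target of that first request (assumed)
    ("POST", LOGIN),  # FormRequest built in post_login
    ("GET", HOME),    # 302 after the POST, back to a page already seen
]
results = [f.should_schedule(method, url) for method, url in steps]
print(results)  # [True, True, True, False]

# With the flag set, the same follow-up request would be scheduled.
print(f.should_schedule("GET", HOME, dont_filter=True))  # True
```

In Scrapy itself, `Request` and `FormRequest` accept `dont_filter=True`, and `FormRequest.from_response` forwards extra keyword arguments to the request it builds, so passing the flag there is one thing to try. The redirected request should carry the flag over, though that is worth confirming against the Scrapy version in use.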

Why does the login fail? I am quite puzzled and would appreciate any explanation. Thank you very much.
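One way to narrow this down is to make the run more verbose. The log itself points at `DUPEFILTER_DEBUG`, and `COOKIES_DEBUG` shows whether a session cookie comes back after the POST. The fragment below is a sketch of such settings; the file location `zhihu/settings.py` is an assumption inferred from the `zhihu.items` import.

```python
# zhihu/settings.py (assumed location, inferred from the `zhihu.items` import)

# Log every request dropped by the duplicate filter, not just the first one.
DUPEFILTER_DEBUG = True

# Log the Cookie and Set-Cookie headers exchanged with the site.
COOKIES_DEBUG = True

# Keep DEBUG-level messages such as the redirect lines.
LOG_LEVEL = "DEBUG"
```

The same values can be supplied for a single run with the `-s` option, for example `scrapy crawl zhihu -s DUPEFILTER_DEBUG=True -s COOKIES_DEBUG=True`.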
