[Python Web Page Analysis] Handling Redirects with the httplib Library

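1. Page flow

The figure below shows the packet capture taken while performing the operation in the browser; the other steps are omitted.

(1) Start from the selected POST /main.aspx.

(2) The server then replies with a 302 redirect to the /cd_chose.aspx page.

(3) The capture shows a GET for the redirect URL; the GETs for the CSS and JS files are not covered here.

(4) POST to /cd_chose.aspx.

[Figure: packet capture of the browser session]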
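The four steps above have to be reproduced by hand, because httplib (named http.client in Python 3) does not follow redirects on its own. The following is a minimal sketch in Python 3, not the author's code: `post_then_follow` is a hypothetical helper name, and it assumes the Location value is an absolute URL or absolute path on the same host.

```python
from urllib.parse import urlsplit


def post_then_follow(conn, path, body, headers):
    """POST a form and, on a 302 reply, GET the page named in Location.

    `conn` is expected to behave like http.client.HTTPConnection.
    Returns (final_status, final_path).
    """
    # Content-Type is added to a copy, so it only travels with the POST.
    post_headers = dict(headers)
    post_headers["Content-Type"] = "application/x-www-form-urlencoded"
    conn.request("POST", path, body=body, headers=post_headers)
    response = conn.getresponse()
    response.read()  # drain the body so the connection can be reused
    if response.status != 302:
        return response.status, path

    # Keep only path and query of the Location value
    # (assumption: the redirect stays on the same host).
    parts = urlsplit(response.getheader("Location", ""))
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query

    # The follow-up GET has no body, so it is sent without Content-Type.
    conn.request("GET", target, headers=dict(headers))
    response = conn.getresponse()
    response.read()
    return response.status, target
```

Against a real server this could be called as `post_then_follow(HTTPConnection(host, timeout=60), "/main.aspx", form_body, headers)`, where `host`, `form_body`, and `headers` are placeholders for the values taken from the capture.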

2. Simulating the flow in Python

2.1 Packet capture analysis: the follow-up GET request does not go through

[Figure: packet capture screenshot]

Next, look at the capture taken in IE:

[Figure: packet capture screenshot]

No GET method appears.

[Figure: packet capture screenshot]

I suspected that a direct POST was required, but trying that still failed. Looking closely at the POST content, there is a GET header inside the headers. Since I am not familiar with how IE displays headers, I did not dig any further.

[Figure: packet capture screenshot]

2.2 Checking the message format

The HTTP headers had already been defined before the GET for the redirect page was issued:

[Figure: header definition in the Python code]

Comparing these against the headers the browser sent when the operation succeeded, I found that my Python code defined one extra header, "Content-Type", which is there mainly because the earlier POST requests need it.

In the real flow, earlier requests need this header, but this request does not.

[Figure: headers sent by the browser]

After removing this header, the message flow from Python looks normal:

[Figure: packet capture of the Python session]

Because my HTTP fundamentals are shaky, I was not sure which direction to investigate and kept assuming that the redirect flow involved some other complicated handling. That cost a lot of time, and in the end the problem was just a single header.
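The header comparison described above can be automated instead of read off two screenshots. This is a small sketch, not part of the original code: `diff_headers` is a hypothetical helper, and both arguments are assumed to be plain dicts copied from the browser capture and from the script.

```python
def diff_headers(browser_headers, script_headers):
    """Compare header names case-insensitively.

    Returns (extra, missing): names only the script sends, and names
    only the browser sends, each as a sorted list of lowercase names.
    """
    browser = {name.lower() for name in browser_headers}
    script = {name.lower() for name in script_headers}
    return sorted(script - browser), sorted(browser - script)
```

Running it on the two header sets from this section would report `content-type` as an extra header, which is the one that had to be removed.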

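Finally, here are my wrapped http get and post methods. They call the httplib library, which is flexible and convenient: following the front-end JS code, you can generate the special fields yourself to authenticate with the server.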

import re
import time
from gzip import GzipFile
from httplib import HTTPConnection  # Python 2; named http.client in Python 3
from StringIO import StringIO

# These functions are methods of a client class that provides
# self.host (server host name) and self.headers (dict of request headers).

def http_get(self, connDefault=None, url='', bodyFlag=False, refererFresh=False, referer=''):
    status, infor = 1, ''
    # Reuse the caller's connection if given, otherwise open a temporary one.
    if connDefault is None:
        conn = HTTPConnection(self.host, timeout=60)
    else:
        conn = connDefault
    try:
        print('http_get -> enter to get %s' % url)
        start = time.time()
        conn.request('GET', url, headers=self.headers)
        print('http_get -> wait the response...')
        response = conn.getresponse()
        end = time.time()
        # Always read the body: httplib refuses to handle the next request on
        # the same connection until the previous response has been read.
        raw = response.read()
        print('http_get -> info: %s %s' % (end - start, response.status))
        print('http_get -> response headers %s' % response.getheaders())
        # Status code
        status = response.status
        if status != 200:
            print('http_get -> http status error %s' % status)
            infor = 'error'
        else:
            # Read the cookie; the format looks like
            # ASP.NET_SessionId=pzt0bs55tc2fjrbv0canht45; path=/; HttpOnly
            cookie = response.getheader('Set-Cookie', '')
            # Accumulate cookies
            if cookie != '':
                # Cookie keys come in two kinds
                print('http_get -> peer Set-Cookie %s' % cookie)
                # The dots are escaped and the trailing ';' is kept outside the group.
                pattern = re.compile(r'(key=[\w=+/]+|ASP\.NET_SessionId=[\w=+/]+);')
                match = pattern.search(cookie)
                if match is not None:
                    oCookie = self.headers.get('Cookie', '')
                    if oCookie == '':
                        self.headers['Cookie'] = match.group(1)
                    else:
                        self.headers['Cookie'] = oCookie + ';' + match.group(1)
                    print('http_get -> request Cookie %s' % self.headers['Cookie'])
            # Update the Referer
            if refererFresh:
                if referer != '':
                    self.headers['Referer'] = 'http://' + self.host + referer
                else:
                    self.headers['Referer'] = 'http://' + self.host + url
            # Content encoding; gzip is declared explicitly in the header
            content_encoding = response.getheader('Content-Encoding', '')
            if bodyFlag:
                # gzip decoding
                if content_encoding == 'gzip':
                    infor = GzipFile(fileobj=StringIO(raw)).read()
                else:
                    infor = raw
    except Exception as ex:
        print('http_get -> error: %s' % ex)
        status, infor = 1, ex
    finally:
        # Only close connections that were opened here.
        if connDefault is None:
            conn.close()
    return status, infor
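The cookie accumulation in http_get appends every new value to the existing Cookie string, so a refreshed session id would appear twice. One possible alternative, sketched here in Python 3 and not taken from the original code, keeps the cookies in a dict keyed by name. `merge_cookies` and `cookie_header` are hypothetical names, and the sketch assumes each Set-Cookie value is passed in separately (with http.client they can be obtained via `response.headers.get_all("Set-Cookie")`).

```python
def merge_cookies(jar, set_cookie_values):
    """Store the name=value pair of each Set-Cookie value in `jar`.

    Attributes such as path or HttpOnly follow the first ';' and are
    ignored. A cookie that is set again replaces the earlier value.
    """
    for raw in set_cookie_values:
        pair = raw.split(";", 1)[0].strip()
        name, sep, value = pair.partition("=")
        if sep and name:
            jar[name] = value
    return jar


def cookie_header(jar):
    """Render the jar as the value of a request Cookie header."""
    return "; ".join("%s=%s" % (name, value) for name, value in jar.items())
```

The result of `cookie_header(jar)` would then be assigned to the `Cookie` entry of the request headers before the next request.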

def http_post(self, connDefault=None, url='', PostStr=''):
    status, result = 1, ''
    # Reuse the caller's connection if given, otherwise open a temporary one.
    # This happens before the try block so that conn always exists in finally.
    if connDefault is None:
        conn = HTTPConnection(self.host, timeout=60)
    else:
        conn = connDefault
    try:
        # Work on a copy so that Content-Type does not leak into later GETs.
        headers = dict(self.headers)
        headers['Content-Type'] = 'application/x-www-form-urlencoded'
        headers['Content-Length'] = str(len(PostStr))
        start = time.time()
        conn.request('POST', url, PostStr, headers=headers)
        response = conn.getresponse()
        end = time.time()
        response.read()  # drain the body so the connection can be reused
        print('http_post info: %s %s' % (end - start, response.status))
        if response.status == 302:
            # Redirect: hand the Location back to the caller
            status, result = 302, response.getheader('Location', '')
        elif response.status == 200:
            # Normal submit
            status, result = 200, ''
        else:
            status, result = response.status, 'does not support'
    except Exception as ex:
        print('http_post -> error: %s' % ex)
        status, result = 1, ex
    finally:
        # Only close connections that were opened here.
        if connDefault is None:
            conn.close()
    return status, result
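http_post expects the form body as a ready-made string. As a hedged illustration of how such a string could be produced in Python 3, the sketch below uses `urllib.parse.urlencode`. `build_form_body` is a hypothetical helper, and the field names in the usage note are made up, since the real ones depend on the target page.

```python
from urllib.parse import urlencode


def build_form_body(fields):
    """Encode a dict of form fields for an x-www-form-urlencoded POST.

    Returns (body_bytes, headers). Content-Length is computed from the
    encoded bytes rather than from the unencoded field values.
    """
    body = urlencode(fields).encode("ascii")
    headers = {
        "Content-Type": "application/x-www-form-urlencoded",
        "Content-Length": str(len(body)),
    }
    return body, headers
```

For example, `build_form_body({"user": "a b", "token": "x+/="})` encodes the space as `+` and percent-escapes `+`, `/` and `=` in the value, which matters for base64-like fields generated from the front-end JS.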

Date: 06-16
