Crawler: getting the page URL after multiple redirects

Latest recommended article published on 2023-01-29 09:49:04

weixin_30776545


Tags: crawler, python, php

Original link: http://www.cnblogs.com/wangyuyu/p/4046425.html


Example: a page contains a link whose address, as shown by Inspect Element, is "http://iphone.myzaker.com/l.php?l=54472e161bc8e0fd4a8b4573". Clicking it automatically redirects the browser to another address, "http://mp.weixin.qq.com/s?__biz=MjM5NjExNjI4MA==&mid=202695292&idx=1&sn=8638f15ba27381236641077a77d43e03&scene=4#wechat_redirect".

Analyzing the address with wget

apples-air:mzread apple$ wget http://iphone.myzaker.com/l.php?l=54472e161bc8e0fd4a8b4573
--2014-10-23 17:27:17--  http://iphone.myzaker.com/l.php?l=54472e161bc8e0fd4a8b4573
Resolving iphone.myzaker.com... 106.186.30.108
Connecting to iphone.myzaker.com|106.186.30.108|:80... connected.
HTTP request sent, awaiting response... 302 Moved Temporarily
Location: http://mp.weixin.qq.com/s?__biz=MjM5NjExNjI4MA==&mid=202695292&idx=8&sn=f39c6c5dc2329e41eb58c71b53ba8a50&scene=4#wechat_redirect [following]
--2014-10-23 17:27:19--  http://mp.weixin.qq.com/s?__biz=MjM5NjExNjI4MA==&mid=202695292&idx=8&sn=f39c6c5dc2329e41eb58c71b53ba8a50&scene=4
Resolving mp.weixin.qq.com... 203.205.143.142
Connecting to mp.weixin.qq.com|203.205.143.142|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 42622 (42K) [text/html]

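As the output shows, requesting the original address returns a 302 redirect, and the target is given in the Location header.

So the question is: how do we get the address of the page after the redirect?

Solution: use the method Net::HTTP.get_response. It does not follow redirects on its own, so it returns the 302 response itself and the Location header can be read from it.

The code: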

 require 'net/http'

 # get_response does not follow redirects, so this is the 302 response itself
 res = Net::HTTP.get_response(URI('http://iphone.myzaker.com/l.php?l=54472e161bc8e0fd4a8b4573'))

 # the Location header holds the redirect target
 res['location']
=> "http://mp.weixin.qq.com/s?__biz=MjM5NjExNjI4MA==&mid=202695292&idx=1&sn=8638f15ba27381236641077a77d43e03&scene=4#wechat_redirect"
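Not every response is a redirect, and `res['location']` is simply nil when the header is absent. The snippet above could be wrapped in a small guard like the one below. This is only a sketch: the helper name `redirect_target` is hypothetical and not part of Net::HTTP.

```ruby
require 'net/http'

# Hypothetical helper: return the Location header of a 3xx response,
# or nil when the response is not a redirect.
def redirect_target(res)
  return nil unless res.is_a?(Net::HTTPRedirection)
  res['location']
end

# Usage (performs a real request, so it needs network access):
# res = Net::HTTP.get_response(URI('http://iphone.myzaker.com/l.php?l=54472e161bc8e0fd4a8b4573'))
# puts res.code              # status code as a string
# puts redirect_target(res)  # redirect target, or an empty line if there is none
```

Checking the response class rather than comparing the code with "302" also covers the other redirect statuses, such as 301 and 307.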

This gives the URL of the page after the redirect.
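The title mentions multiple redirects, while the code above reads only the first Location header. One way to handle a chain of redirects is to repeat the same request until the response is no longer a redirect. The following is a minimal sketch under a few assumptions: the function name `resolve_final_url`, the `limit` of 10 hops, and the `fetch` parameter (there only so the loop can be tried without network access) are all illustrative and do not come from the original article.

```ruby
require 'net/http'
require 'uri'

# Follow Location headers hop by hop and return the last URL reached.
# `fetch` is a hypothetical injection point; by default it issues a real GET.
def resolve_final_url(url, limit: 10, fetch: ->(uri) { Net::HTTP.get_response(uri) })
  uri = URI(url)
  limit.times do
    res = fetch.call(uri)
    # Stop as soon as the response is not a 3xx redirect carrying a Location header.
    return uri.to_s unless res.is_a?(Net::HTTPRedirection) && res['location']
    # Location may be relative, so resolve it against the current URL.
    uri = URI.join(uri, res['location'])
  end
  raise "more than #{limit} redirects starting from #{url}"
end

# Usage (performs real requests, so it needs network access):
# puts resolve_final_url('http://iphone.myzaker.com/l.php?l=54472e161bc8e0fd4a8b4573')
```

The hop limit guards against redirect loops. Note that this only follows HTTP-level redirects; a jump performed by JavaScript or a meta refresh tag in the page would not be detected.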

Reposted from: https://www.cnblogs.com/wangyuyu/p/4046425.html