Solving the Redirect Problem

Latest recommended article published 2023-08-07 13:49:27

weixin_30477797

Article tags: python javascript xhtml ViewUI

Original link: http://www.cnblogs.com/liusx0303/p/10132878.html

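Reference: https://blog.csdn.net/changjiale110/article/details/76145585

Stage 1: Use the get method of the requests library to obtain the URL after a redirect.

Attempt 1:

Simulate a browser visiting Wikipedia: send a request to the server and read back the URL once the request completes.

Steps:

(1) Inspect the page's request headers

F12 -> Network -> Doc -> view Headers

(2) Copy the header values into the code, send the request to the server with those headers to fetch the URL's content, then print the resulting status code and URL.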
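Typing the headers into a dict by hand is error-prone. Below is a minimal sketch of a helper that builds the dict from text pasted out of the Network panel. The function name `headers_from_devtools` and the sample text are made up for illustration, and it assumes the panel gives one `Name: value` pair per line.

```python
def headers_from_devtools(raw_text):
    """Turn 'Name: value' lines copied from the Network panel into a dict."""
    headers = {}
    for line in raw_text.splitlines():
        line = line.strip()
        # Skip blank lines and HTTP/2 pseudo-headers such as ':authority'
        if not line or line.startswith(":"):
            continue
        # Split on the first colon only, so values may contain colons
        name, sep, value = line.partition(":")
        if sep:  # ignore lines that contain no colon at all
            headers[name.strip()] = value.strip()
    return headers


# Hypothetical sample of pasted header text
raw = """
Accept-Encoding: gzip, deflate, br
Connection: keep-alive
Host: en.wikipedia.org
Upgrade-Insecure-Requests: 1
"""
header = headers_from_devtools(raw)
print(header["Host"])
```

The resulting dict can be passed as the headers argument of requests.get in the same way as a hand-written one.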

Code: allow_redirects=True lets requests follow redirects (setting it to False would disable redirect following).

import requests

# Request headers copied from the browser's Network panel
header = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
    'Accept-Encoding': 'gzip, deflate, br',
    'Connection': 'keep-alive',
    'Host': 'en.wikipedia.org',
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36',
}

url = 'https://en.wikipedia.org/wiki/Jamal_ad-Din_al-Afghani'
# allow_redirects=True makes requests follow any HTTP 3xx redirect
r = requests.get(url=url, headers=header, allow_redirects=True)
print(r.status_code)  # status code of the final response
print(r.history)      # one response per redirect hop that was followed
print(r.url)          # URL of the final response

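Result: the status code is 200, and the URL returned after the successful request is unchanged.

Attempt 2: In case there are several redirects in a row, extend Attempt 1 (which requests the server only once) to request the server 10 times, then print the URL obtained after the ten requests.

Code: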

import requests

header = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
    'Accept-Encoding': 'gzip, deflate, br',
    'Connection': 'keep-alive',
    'Host': 'en.wikipedia.org',
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36',
}


def get_real_url(url, try_count):
    """Request url try_count times, feeding each response's final URL into the next request."""
    status_code = None
    for _ in range(try_count):
        rs = requests.get(url, headers=header, timeout=10)
        url, status_code = rs.url, rs.status_code
    # Always return the same shape: (final URL, last status code)
    return url, status_code


url = 'https://en.wikipedia.org/wiki/Jamal_ad-Din_al-Afghani'
print(get_real_url(url, 10))

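Result: the URL is unchanged.

Conclusion from Attempts 1 and 2: after requesting the URL with the get method of requests, no redirect status code was received and the redirected URL was not obtained.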
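One possible explanation, which these notes do not verify, is that the server answers 200 with the target page directly and the address change seen in a browser is done by script, so there is no HTTP 3xx response for requests to follow. If that is the case, the target address may still be present in the HTML as a `<link rel="canonical">` element. Below is a minimal sketch that extracts it with the standard library. The names `CanonicalLinkParser` and `find_canonical_url`, and the sample HTML, are made up for illustration.

```python
from html.parser import HTMLParser


class CanonicalLinkParser(HTMLParser):
    """Remember the href of the first <link rel="canonical"> element."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr_map = dict(attrs)
        # rel may hold several space-separated tokens
        rel_values = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attr_map.get("href")


def find_canonical_url(html_text):
    """Return the canonical URL declared in html_text, or None if absent."""
    parser = CanonicalLinkParser()
    parser.feed(html_text)
    return parser.canonical


# Hypothetical page source, standing in for r.text
sample_html = """
<html><head>
<link rel="stylesheet" href="/style.css">
<link rel="canonical" href="https://example.org/wiki/Target_Page">
</head><body></body></html>
"""
print(find_canonical_url(sample_html))
```

With the code from Attempt 1, the call would be find_canonical_url(r.text). Whether the returned address matches the one the browser ends up showing would still need to be checked against the real page.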

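Stage 2: .history can be used to trace the redirect chain, but here it came back empty.

Code: allow_redirects=True lets requests follow redirects (setting it to False would disable redirect following).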

url = 'https://en.wikipedia.org/wiki/Jamal_ad-Din_al-Afghani'
r = requests.get(url=url,headers=header,allow_redirects=True)
print(r.status_code)
print(r.history)
print(r.url)

Result: the redirect chain recorded in .history is empty.

Conclusion: .history did not capture any redirect.
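To see why an empty history goes hand in hand with a plain 200 response, here is a toy model of how an HTTP client follows redirects. It is not how requests is implemented. The function `follow_redirects`, the `responses` mapping, and the paths in it are invented for illustration, and no network access is involved.

```python
REDIRECT_CODES = (301, 302, 303, 307, 308)


def follow_redirects(responses, url, max_hops=10):
    """Walk a simulated redirect chain.

    responses maps URL -> (status_code, location_or_None).
    Returns (final_url, final_status, history), where history lists
    the (status_code, url) of every redirect hop that was followed.
    """
    history = []
    for _ in range(max_hops + 1):
        status, location = responses[url]
        if status in REDIRECT_CODES and location:
            history.append((status, url))
            url = location
        else:
            return url, status, history
    raise RuntimeError("too many redirects")


# A server that issues a real HTTP redirect
redirecting = {"/old": (302, "/new"), "/new": (200, None)}
# A server that answers the first request with 200 directly
direct = {"/page": (200, None)}

print(follow_redirects(redirecting, "/old"))  # → ('/new', 200, [(302, '/old')])
print(follow_redirects(direct, "/page"))      # → ('/page', 200, [])
```

In the second case the first response is already final, so nothing is recorded. That matches the pattern seen above: status 200, unchanged URL, empty history.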

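Stage 3: Use selenium to obtain the URL after the redirect.

Reference: http://www.qingpingshan.com/jb/javascript/300439.html

1. Install the selenium library with pip

pip install selenium

2. Download PhantomJS, a browser with no user interface. It needs no installation; just put the path to phantomjs.exe in the code.

3. Code: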

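from selenium import webdriver
import time

# Note: webdriver.PhantomJS was dropped in Selenium 4 and find_element_by_id
# in later 4.x releases, so this script needs Selenium 3.x.
# Adjust executable_path to wherever phantomjs.exe was unpacked.
driver = webdriver.PhantomJS(executable_path=r'E:\Program Files (x86)\phantomjs-2.1.1-windows\phantomjs-2.1.1-windows\bin\phantomjs.exe')
try:
    driver.get('https://en.wikipedia.org/wiki/Jamal_ad-Din_al-Afghani')
    # Wait for loading to finish (uncomment if needed)
    # time.sleep(1)
    content = driver.find_element_by_id('content').text
    # current_url is the address the browser ended up on
    print(driver.current_url)
finally:
    # Always shut down the PhantomJS process, even if a step above fails
    driver.quit()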

4. Result: the run took 12 s and returned the URL of the page after the redirect.
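Rather than sleeping for a fixed time, one option is to poll until the address actually changes. Below is a minimal sketch. The helper `wait_for_url_change` and its default timeout and poll values are made up for illustration. It takes any zero-argument callable that returns the current URL, so it can be tried without a browser.

```python
import time


def wait_for_url_change(get_url, original_url, timeout=15.0, poll=0.5):
    """Poll get_url() until it differs from original_url or timeout expires.

    Returns the new URL, or original_url if nothing changed in time.
    """
    deadline = time.monotonic() + timeout
    while True:
        current = get_url()
        if current != original_url:
            return current
        if time.monotonic() >= deadline:
            return original_url
        time.sleep(poll)
```

With the driver above, the call would look like wait_for_url_change(lambda: driver.current_url, 'https://en.wikipedia.org/wiki/Jamal_ad-Din_al-Afghani'). Selenium also provides WebDriverWait together with expected_conditions.url_changes for the same purpose.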

Reposted from: https://www.cnblogs.com/liusx0303/p/10132878.html
