Python: Crawling the URLs and CSS/JS File Addresses a Single Web Page Loads

Latest recommended article published 2022-01-19 09:37:00

weixin_39954889




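While learning Python web scraping, I learned how to use regular expressions to match and extract specific content such as titles, images, and articles. I come at crawling from a testing angle instead. My goal is to collect every CSS, JS, and URL address the page loads, request each one, and use the response status codes to check that all of them are reachable.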
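Here is a minimal Python 3 sketch of the "request each address and check the status code" step. It uses `urllib.request`/`urllib.error` rather than the author's Python 2 `urllib2`. The names `fetch_status`, `check_status`, and the `fetch` parameter are hypothetical and do not come from the original script. As in the original, any status other than 200 counts as an error.

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def fetch_status(address, timeout=8):
    """Request one address and return its HTTP status code (redirects are followed)."""
    with urlopen(address, timeout=timeout) as response:
        return response.status


def check_status(addresses, fetch=fetch_status):
    """Request each address; return ({address: status code or error reason}, error_count)."""
    results = {}
    errors = 0
    for address in addresses:
        try:
            code = fetch(address)
        except HTTPError as he:  # the server answered with a 4xx/5xx status
            code = he.code
        except URLError as ue:  # no HTTP answer at all (DNS failure, refused connection, ...)
            results[address] = ue.reason
            errors += 1
            continue
        results[address] = code
        if code != 200:  # same rule as the original: anything but 200 is a failure
            errors += 1
    return results, errors
```

You can pass a stub as `fetch` to exercise the counting logic without network access.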

Code

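'''
Created on 2017-08-02

@author: Lebb
'''
# Python 2 (urllib2)
import sys
import urllib2
import urlparse
import re

# Python 2 only: make implicit str/unicode conversions use UTF-8
reload(sys)
sys.setdefaultencoding('utf-8')

url = "https://www.szrtc.cn/"
# Assumed header: the original passes Headers without defining it
Headers = {'User-Agent': 'Mozilla/5.0'}
request = urllib2.Request(url, headers=Headers)
errorcount = 0

def getResponse():
    """Fetch the start page and return its HTML ('' if the request fails)."""
    try:
        response = urllib2.urlopen(request)
    except urllib2.HTTPError, he:
        print he.code
    except urllib2.URLError, ue:
        print ue.reason
    else:
        return response.read().decode('utf-8')
    return ''

def getUrl():
    """Collect CSS, image, link, JS and onclick targets from the page."""
    html = getResponse()
    # The original patterns were lost in extraction; these are
    # reconstructions that match the tag each variable name implies.
    patterncss = '<link\s[^>]*?href="(.*?)"'
    patternjs = '<script\s[^>]*?src="(.*?)"'
    patternpage = '<a\s[^>]*?href="(.*?)"'
    patternimg = '<img\s[^>]*?src="(.*?)"'
    patternonclick = "openQuestion.*?'(.*?)'"
    href = re.compile(patterncss, re.S).findall(html)
    href += re.compile(patternimg, re.S).findall(html)
    href += re.compile(patternpage, re.S).findall(html)
    href += re.compile(patternjs, re.S).findall(html)
    href += re.compile(patternonclick, re.S).findall(html)
    return href

def reasonCode():
    """Request every collected address and count the non-200 responses."""
    global errorcount
    itemurl = getUrl()
    for item1 in itemurl:
        # urljoin leaves absolute URLs unchanged and resolves relative ones;
        # plain url + item1 breaks on "/path" and "//host/path" values
        sendurl = urlparse.urljoin(url, item1)
        try:
            print sendurl
            responseurl = urllib2.urlopen(sendurl, timeout=8)
        except urllib2.HTTPError, he:
            responsecode = he.code
            errorcount += 1
        except urllib2.URLError, ue:
            responsecode = ue.reason
            errorcount += 1
        else:
            responsecode = responseurl.getcode()
            if responsecode != 200:
                errorcount += 1
        print responsecode
        #return responsecode
    print errorcount

if __name__ == '__main__':
    reasonCode()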
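The regular expressions above depend on attribute order and quoting. A different technique, not the author's, is to walk the tags with the standard library's `html.parser`. Below is a minimal Python 3 sketch. The class name `ResourceCollector`, the `RESOURCE_ATTRS` table, and the choice to skip `#` anchors and `javascript:` links are assumptions made for illustration. It does not cover the site-specific `openQuestion` onclick pattern.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Which attribute holds the address for each tag (an assumed, minimal table)
RESOURCE_ATTRS = {"link": "href", "script": "src", "img": "src", "a": "href"}


class ResourceCollector(HTMLParser):
    """Collect the CSS/JS/image/link addresses referenced by a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        wanted = RESOURCE_ATTRS.get(tag)  # tag names arrive lowercased
        if wanted is None:
            return
        for name, value in attrs:
            if name != wanted or not value:
                continue
            # Skip in-page anchors and javascript: pseudo-links
            if value.startswith("#") or value.lower().startswith("javascript:"):
                continue
            # Resolve relative and protocol-relative addresses against the page URL
            self.urls.append(urljoin(self.base_url, value))


def collect_resources(html, base_url):
    parser = ResourceCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.urls
```

Self-closing tags such as `<img ... />` are also handled, because `HTMLParser`'s default `handle_startendtag` calls `handle_starttag`.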

Run result:

[Screenshot of the run output, not preserved]

Error screenshot:

[Screenshot of the error, not preserved]

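This request actually works when pasted into a browser, because the browser percent-encodes non-ASCII characters automatically. When Python's urllib2 sends it, however, the Chinese characters in the parameters are not encoded, and the server responds with a 400 error.

I tried adding utf-8 encoding in the code, but it had no effect and the error persisted.

I am noting this problem here for now and will look for another solution later.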
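One possible direction, sketched here as an untested assumption against the site above, is to percent-encode the path, query, and fragment before sending the request, the same way a browser does. The sketch below uses Python 3's `urllib.parse.quote`, and the helper name `encode_url` is hypothetical. In the author's Python 2 setup the corresponding function is `urllib.quote`, which should be given a UTF-8 encoded byte string rather than a unicode object. The `%` character is kept safe so that existing escapes are not encoded twice, and the host name is left alone.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def encode_url(raw_url):
    """Percent-encode non-ASCII characters in the path, query and fragment (as UTF-8).

    '%' is treated as safe so escapes that are already present are not double-encoded.
    The host is left untouched: internationalized domain names need IDNA, not %-encoding.
    """
    parts = urlsplit(raw_url)
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%/:+,")
    fragment = quote(parts.fragment, safe="%")
    return urlunsplit((parts.scheme, parts.netloc, path, query, fragment))
```

For example, `encode_url("https://www.szrtc.cn/search?q=中文")` yields `https://www.szrtc.cn/search?q=%E4%B8%AD%E6%96%87`, which is ASCII-only and can be passed to `urlopen`.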
