Universal URL source fetching in Python: 2.4.5. Getting the HTML (source) content returned by a URL: getUrlRespHtml

Latest recommended article published on 2022-09-24 11:04:33

weixin_39883433

2.4.5. Getting the HTML (source) content returned by a URL: getUrlRespHtml

#------------------------------------------------------------------------------
# Get the response HTML (i.e. the body) from a URL.
# getUrlResponse is called here but is not defined in this section.
import zlib

#def getUrlRespHtml(url, postDict={}, headerDict={}, timeout=0, useGzip=False):
def getUrlRespHtml(url, postDict={}, headerDict={}, timeout=0, useGzip=True):
    resp = getUrlResponse(url, postDict, headerDict, timeout, useGzip)
    respHtml = resp.read()
    if useGzip:
        #print "---before unzip, len(respHtml)=",len(respHtml);
        respInfo = resp.info()

        # Typical response headers for gzipped data:
        # Server: nginx/1.0.8
        # Date: Sun, 08 Apr 2012 12:30:35 GMT
        # Content-Type: text/html
        # Transfer-Encoding: chunked
        # Connection: close
        # Vary: Accept-Encoding
        # ...
        # Content-Encoding: gzip

        # Sometimes the request asks for gzip,deflate but the server actually
        # returns un-gzipped HTML, in which case the response info does not
        # include the "Content-Encoding: gzip" header shown above.
        # e.g. http://blog.sina.com.cn/s/comment_730793bf010144j7_3.html
        # So decompress only when the data really is gzipped.
        if ("Content-Encoding" in respInfo) and (respInfo['Content-Encoding'] == "gzip"):
            # 16 + zlib.MAX_WBITS makes zlib expect a gzip header and trailer
            respHtml = zlib.decompress(respHtml, 16 + zlib.MAX_WBITS)
            #print "+++ after unzip, len(respHtml)=",len(respHtml);
    return respHtml
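
The conditional gzip handling above can be tried without a network connection. What follows is a minimal Python 3 sketch, not the original Python 2 code: `decodeRespBody` and the plain dict used for the headers are hypothetical stand-ins for the response object, introduced only for illustration.

```python
import gzip
import zlib


def decodeRespBody(rawBody, respHeaders):
    """Return the body, decompressing it only if the headers say it is gzip.

    decodeRespBody and the dict-style respHeaders are assumptions made for
    this sketch; they are not part of the original library.
    """
    if respHeaders.get("Content-Encoding") == "gzip":
        # 16 + zlib.MAX_WBITS makes zlib expect a gzip header and trailer
        return zlib.decompress(rawBody, 16 + zlib.MAX_WBITS)
    # The server ignored the gzip request and sent plain data: return it as is
    return rawBody


html = b"<html><body>hello</body></html>"
gzippedBody = gzip.compress(html)

# Case 1: the server honoured the gzip request
fromGzip = decodeRespBody(gzippedBody, {"Content-Encoding": "gzip"})
# Case 2: the server returned un-gzipped HTML, so the header is absent
fromPlain = decodeRespBody(html, {})
```

Checking the header first matters because calling `zlib.decompress` on data that is not gzipped raises `zlib.error`.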

Example 2.24. Using getUrlRespHtml: without extra parameters

respHtml = getUrlRespHtml(url)

Example 2.25. Using getUrlRespHtml: with extra parameters

# Excerpt from inside a function that returns (modifyOk, errInfo)
modifyUrl = gVal['blogEntryUrl'] + "/blog/submit/modifyblog"
#logging.debug("Modify Url is %s", modifyUrl)

# e.g. http://hi.baidu.com/wwwhaseecom/blog/item/79188d1b4fa36f068718bf79.html
foundSpBlogID = re.search(r"blog/item/(?P<spBlogID>\w+?)\.html", url)
if foundSpBlogID:
    spBlogID = foundSpBlogID.group("spBlogID")
    logging.debug("Extracted spBlogID=%s", spBlogID)
else:
    modifyOk = False
    errInfo = "Can't extract post spBlogID !"
    return (modifyOk, errInfo)

newPostContentGb18030 = newPostContentUni.encode("GB18030")
categoryGb18030 = infoDict['category'].encode("GB18030")
titleGb18030 = infoDict['title'].encode("GB18030")

postDict = {
    "bdstoken"        : gVal['spToken'],
    "ct"              : "1",
    "mms_flag"        : "0",
    "cm"              : "2",
    "spBlogID"        : spBlogID,
    "spBlogCatName_o" : categoryGb18030, # old category
    "edithid"         : "",
    "previewImg"      : "",
    "spBlogTitle"     : titleGb18030,
    "spBlogText"      : newPostContentGb18030,
    "spBlogCatName"   : categoryGb18030, # new category
    "spBlogPower"     : "0",
    "spIsCmtAllow"    : "1",
    "spShareNotAllow" : "0",
    "spVcode"         : "",
    "spVerifyKey"     : "",
}

headerDict = {
    # Without the Referer header, the returned HTML contains the error
    # "数据添加的一般错误" (general error while adding data)
    "Referer" : gVal['blogEntryUrl'] + "/blog/modify/" + spBlogID,
}

respHtml = getUrlRespHtml(modifyUrl, postDict, headerDict)
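
The two ideas in this example, extracting the post ID with a named regex group and sending GB18030-encoded form fields together with a Referer header, can be checked offline. The following is a hedged Python 3 sketch that only builds the request and never sends it. `extractSpBlogID`, `buildPostRequest`, and the `blogEntryUrl` value are hypothetical names and values chosen for illustration; how the original `getUrlResponse` builds its request is not shown in this section.

```python
import re
import urllib.parse
import urllib.request


def extractSpBlogID(url):
    """Return the ID between 'blog/item/' and '.html', or None if absent."""
    found = re.search(r"blog/item/(?P<spBlogID>\w+?)\.html", url)
    return found.group("spBlogID") if found else None


def buildPostRequest(url, postDict, headerDict):
    """Build (but do not send) a POST request carrying form-encoded data."""
    # urlencode percent-escapes bytes values byte by byte, so fields that
    # were already encoded to GB18030 keep that encoding on the wire
    postData = urllib.parse.urlencode(postDict).encode("ascii")
    return urllib.request.Request(url, data=postData, headers=headerDict)


itemUrl = "http://hi.baidu.com/wwwhaseecom/blog/item/79188d1b4fa36f068718bf79.html"
spBlogID = extractSpBlogID(itemUrl)

# blogEntryUrl is a placeholder value assumed for this sketch
blogEntryUrl = "http://hi.baidu.com/wwwhaseecom"
req = buildPostRequest(
    blogEntryUrl + "/blog/submit/modifyblog",
    {"spBlogID": spBlogID, "spBlogTitle": "中文".encode("GB18030")},
    {"Referer": blogEntryUrl + "/blog/modify/" + spBlogID},
)
```

Supplying `data` is what turns the request into a POST, which `req.get_method()` confirms. The encoded body is available as `req.data` for inspection.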
