Downloading a DOC document from an HTTP link with httplib

Latest recommended article published on 2024-08-08 07:09:07

漩涡无度

Category: python

Article link: https://blog.csdn.net/m_wuhua/article/details/50393288

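Sometimes, when downloading a DOC document from an HTTP link with the urllib library, the content returned is not what was expected. This may be because the request did not carry headers that make it look like a browser. Sending those headers with the httplib library (Python 2) avoids the problem: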

import httplib


def sendhttp2():
    # httplib.HTTP is the older, backward-compatible interface in Python 2.
    h = httplib.HTTP("www.nbjdfy.gov.cn")
    # The path is already percent-encoded, so it is passed through unchanged.
    h.putrequest('GET', "/File/cpws_import09/%E6%89%A7%E8%A1%8C/%EF%BC%882012%EF%BC%89%E7%94%AC"
                 "%E4%B8%9C%E6%89%A7%E6%B0%91%E5%AD%97%E7%AC%AC1206%E5%8F%B7.doc")
    # Browser-like headers. Content-Type is omitted because a GET has no body,
    # and Accept-Encoding is omitted so the file arrives uncompressed.
    h.putheader('User-Agent', 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 '
                '(KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36')
    h.putheader('Accept', 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8')
    h.putheader('Accept-Language', 'zh-CN,zh;q=0.8,en-US;q=0.5,en;q=0.3')
    h.putheader('Connection', 'keep-alive')
    h.endheaders()
    errcode, errmsg, headers = h.getreply()
    if errcode != 200:
        raise IOError("download failed: %s %s" % (errcode, errmsg))
    content_file = h.getfile()
    # Raw string so the backslash in the Windows path is taken literally.
    with open(r"E:\http4.doc", 'wb') as fp:
        fp.write(content_file.read())
    h.close()


if __name__ == '__main__':
    sendhttp2()
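
In Python 3, httplib was renamed to http.client. One thing to watch for: if a request advertises `Accept-Encoding: gzip, deflate`, the server may compress the body, and writing those bytes straight to disk would then give an unreadable .doc file. Below is a minimal Python 3 sketch of the same download that decompresses the body when needed. The names `BROWSER_HEADERS`, `decode_body`, and `download` are hypothetical and chosen only for illustration; they are not part of the original script or of any library.

```python
import gzip
import http.client
import zlib

# Browser-like headers, modelled on the ones used in the script above.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "zh-CN,zh;q=0.8,en-US;q=0.5,en;q=0.3",
    "Accept-Encoding": "gzip, deflate",
}


def decode_body(raw, content_encoding):
    """Hypothetical helper: undo gzip/deflate compression of a response body."""
    encoding = (content_encoding or "").strip().lower()
    if encoding == "gzip":
        return gzip.decompress(raw)
    if encoding == "deflate":
        try:
            # Most servers send zlib-wrapped deflate data.
            return zlib.decompress(raw)
        except zlib.error:
            # Some send a raw deflate stream without the zlib header.
            return zlib.decompress(raw, -zlib.MAX_WBITS)
    # No (or unknown) Content-Encoding: return the bytes unchanged.
    return raw


def download(host, path, dest):
    """Hypothetical helper: GET http://host/path and save the body to dest."""
    conn = http.client.HTTPConnection(host, timeout=30)
    try:
        conn.request("GET", path, headers=BROWSER_HEADERS)
        resp = conn.getresponse()
        if resp.status != 200:
            raise IOError("unexpected status %d %s" % (resp.status, resp.reason))
        body = decode_body(resp.read(), resp.getheader("Content-Encoding"))
    finally:
        conn.close()
    with open(dest, "wb") as fp:
        fp.write(body)
```

It would be called with the same host and percent-encoded path as in the script above, for example `download("www.nbjdfy.gov.cn", path, r"E:\http4.doc")`, where `path` holds the encoded `/File/...` string.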