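Sometimes, when you download a DOC file from an HTTP link with the urllib library, the content you get back is not what you expected. One likely cause is that the request was sent without the headers a browser would send. The httplib library lets you set those headers explicitly, which avoids the problem: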
import httplib


def sendhttp2():
    # Python 2 only: httplib.HTTP is the legacy HTTP/1.0-style interface.
    h = httplib.HTTP("www.nbjdfy.gov.cn")
    h.putrequest('GET', "/File/cpws_import09/%E6%89%A7%E8%A1%8C/%EF%BC%882012%EF%BC%89%E7%94%AC"
                        "%E4%B8%9C%E6%89%A7%E6%B0%91%E5%AD%97%E7%AC%AC1206%E5%8F%B7.doc")
    # Headers that make the request look like it comes from a browser.
    h.putheader('Content-Type', 'application/x-www-form-urlencoded')
    h.putheader('User-Agent', 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 '
                              '(KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36')
    h.putheader('Accept', 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8')
    h.putheader('Accept-Language', 'zh-CN,zh;q=0.8,en-US;q=0.5,en;q=0.3')
    # Note: if the server honours this header and compresses the body,
    # the saved file has to be decompressed before it can be opened.
    h.putheader('Accept-Encoding', 'gzip, deflate, sdch')
    h.putheader('Connection', 'keep-alive')
    h.endheaders()
    # getreply() returns the status code, the reason phrase and the response headers.
    errcode, errmsg, headers = h.getreply()
    content_file = h.getfile()
    # Raw string so the backslash in the Windows path is taken literally.
    with open(r"E:\http4.doc", 'wb') as fp:
        fp.write(content_file.read())
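The snippet above is Python 2 code; in Python 3 the httplib module was renamed to http.client. For anyone on Python 3, the same idea might look roughly like the sketch below. The helper name `download`, the `BROWSER_HEADERS` constant and the `timeout` default are made up for illustration and are not part of the original code. This version deliberately leaves out `Accept-Encoding`, on the assumption that you want the raw document bytes rather than a compressed body.

```python
import http.client
import shutil

# Browser-like headers, with values copied from the Python 2 snippet.
# Accept-Encoding is omitted on purpose (assumption: we want an
# uncompressed body that can be written straight to disk).
BROWSER_HEADERS = {
    "User-Agent": ("Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 "
                   "(KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36"),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "zh-CN,zh;q=0.8,en-US;q=0.5,en;q=0.3",
}


def download(host, path, dest, timeout=30):
    """Hypothetical helper: GET http://<host><path> and save the body to dest."""
    conn = http.client.HTTPConnection(host, timeout=timeout)
    try:
        conn.request("GET", path, headers=BROWSER_HEADERS)
        resp = conn.getresponse()
        if resp.status != 200:
            raise IOError("unexpected status %d %s" % (resp.status, resp.reason))
        # Stream the response to disk instead of reading it all into memory.
        with open(dest, "wb") as fp:
            shutil.copyfileobj(resp, fp)
    finally:
        conn.close()
    return dest
```

To fetch the same file, you would call `download("www.nbjdfy.gov.cn", path, r"E:\http4.doc")`, where `path` is the percent-encoded path string from the snippet above. Checking the status code first means an error page is not silently saved as a .doc file.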