Downloading movies and Baidu documents with a Python crawler: downloading a document with a Python crawler

Latest recommended article published on 2024-06-18 16:33:43

weixin_39692830

Article tags: Python crawler, downloading movies, Baidu documents

If you would rather skip my reasoning and the debugging errors, jump straight to the last sentence.

————————————————————————————

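As the title says, the document I want to download is the kind that starts downloading automatically as soon as you enter its URL, for example: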

http://app.sipo-reexam.gov.cn/books/2003/FS3641/DOC/FS3641.doc

I want to download this document with Python's urllib.request.urlretrieve function, but it raises an error.

My code and the error message are shown below.

Here is the code:

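import os
import urllib.request, urllib.error, urllib.parse

# Progress callback passed to urlretrieve as its reporthook.
# The post does not show the original definition, so this minimal version is a stand-in.
def Schedule(block_num, block_size, total_size):
    print('downloaded %d of %d bytes' % (block_num * block_size, total_size))

file_name = 'test.doc'
file_path = 'doc'
# Create the target directory if it does not exist yet
if not os.path.exists(file_path):
    os.makedirs(file_path)
local = os.path.join(file_path, file_name)
url = 'http://app.sipo-reexam.gov.cn/books/2003/FS3641/DOC/FS3641.doc'
urllib.request.urlretrieve(url, local, Schedule)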

Here is the error message:

Traceback (most recent call last):
  File "C:\Users\zhushihao\Desktop\doc.py", line 25, in <module>
    urllib.request.urlretrieve(url,local,Schedule)
  File "C:\Python34\lib\urllib\request.py", line 178, in urlretrieve
    with contextlib.closing(urlopen(url, data)) as fp:
  File "C:\Python34\lib\urllib\request.py", line 153, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Python34\lib\urllib\request.py", line 455, in open
    response = self._open(req, data)
  File "C:\Python34\lib\urllib\request.py", line 473, in _open
    '_open', req)
  File "C:\Python34\lib\urllib\request.py", line 433, in _call_chain
    result = func(*args)
  File "C:\Python34\lib\urllib\request.py", line 1202, in http_open
    return self.do_open(http.client.HTTPConnection, req)
  File "C:\Python34\lib\urllib\request.py", line 1177, in do_open
    r = h.getresponse()
  File "C:\Python34\lib\http\client.py", line 1172, in getresponse
    response.begin()
  File "C:\Python34\lib\http\client.py", line 351, in begin
    version, status, reason = self._read_status()
  File "C:\Python34\lib\http\client.py", line 321, in _read_status
    raise BadStatusLine(line)
http.client.BadStatusLine: ''
[Finished in 2.3s with exit code 1]

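I also suspected that the site might be blocking non-browser requests, so I tried wrapping the URL in a Request object with custom headers. That failed as well: urlretrieve reported that its first argument must be a string, not a Request object.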
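One way around the string-only limitation of urlretrieve is to pass the Request object to urllib.request.urlopen, which does accept it, and write the response body to disk yourself. The sketch below assumes the server is rejecting requests that lack a browser-like User-Agent; that is a guess, not something confirmed for this site. The names build_request, download and BROWSER_HEADERS, and the User-Agent value, are hypothetical and chosen only for illustration.

```python
import os
import shutil
import urllib.request

# Assumption: an illustrative browser-like User-Agent; any common value could be used.
BROWSER_HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}


def build_request(url, headers=None):
    """Wrap the URL in a Request object that carries the given headers."""
    return urllib.request.Request(url, headers=headers or BROWSER_HEADERS)


def download(url, local_path, timeout=30):
    """Open the URL via urlopen (which accepts a Request) and stream it to local_path."""
    folder = os.path.dirname(local_path)
    if folder:
        # Create the target directory if it is missing
        os.makedirs(folder, exist_ok=True)
    request = build_request(url)
    with urllib.request.urlopen(request, timeout=timeout) as response, \
            open(local_path, "wb") as out_file:
        # Copy the body in chunks instead of loading it all into memory
        shutil.copyfileobj(response, out_file)
    return local_path

# Example call (performs a real network request):
# download('http://app.sipo-reexam.gov.cn/books/2003/FS3641/DOC/FS3641.doc',
#          os.path.join('doc', 'test.doc'))
```

This gives up the reporthook progress callback of urlretrieve in exchange for full control over the request headers.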

————————————————————— This is the last sentence ——————————————————————

How can I download a document like http://app.sipo-reexam.gov.cn/books/2003/FS3641/DOC/FS3641.doc with Python?
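As one possible approach to this question, urlretrieve can be kept (with its plain string URL and progress callback) if a global opener with custom headers is installed first, because urlretrieve goes through urlopen internally. The sketch below also retries when the server closes the connection without a status line, which is what BadStatusLine: '' indicates. Whether either measure actually fixes the download from this particular site is not verified here. The names install_browser_opener and retrieve_with_retry, the User-Agent value, and the retry defaults are assumptions made for illustration.

```python
import http.client
import time
import urllib.error
import urllib.request


def install_browser_opener(user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64)"):
    """Install a global opener so later urlopen/urlretrieve calls send this User-Agent."""
    opener = urllib.request.build_opener()
    opener.addheaders = [("User-Agent", user_agent)]
    urllib.request.install_opener(opener)
    return opener


def retrieve_with_retry(url, local_path, attempts=3, delay=1.0, reporthook=None):
    """Call urlretrieve, retrying on BadStatusLine or URLError up to `attempts` times."""
    last_error = None
    for attempt in range(attempts):
        try:
            urllib.request.urlretrieve(url, local_path, reporthook)
            return local_path
        except (http.client.BadStatusLine, urllib.error.URLError) as exc:
            last_error = exc
            if attempt < attempts - 1:
                # Wait briefly before the next attempt
                time.sleep(delay)
    raise last_error

# Example call (performs a real network request):
# install_browser_opener()
# retrieve_with_retry('http://app.sipo-reexam.gov.cn/books/2003/FS3641/DOC/FS3641.doc',
#                     'doc/test.doc')
```

Note that install_opener changes process-wide state, so every later urlopen call in the same script will send the same header.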
