I need a way to download all the PDF files from a given URL, and I found a script that might do the job (I haven't tested it yet):

import urllib2
import urlparse
import os

from bs4 import BeautifulSoup

url = "https://...."
download_path = "."  # directory to save the PDFs into
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:38.0) Gecko/20100101 Firefox/38.0"}
i = 0

request = urllib2.Request(url, None, headers)
html = urllib2.urlopen(request)
soup = BeautifulSoup(html.read())

for tag in soup.findAll("a", href=True):
    tag["href"] = urlparse.urljoin(url, tag["href"])
    if os.path.splitext(os.path.basename(tag["href"]))[1] == ".pdf":
        current = urllib2.urlopen(tag["href"])
        print("\n[*] Downloading: %s" % os.path.basename(tag["href"]))
        f = open(download_path + "\\" + os.path.basename(tag["href"]), "wb")
        f.write(current.read())
        f.close()
        i += 1

print("\n[*] Downloaded %d files" % i)
raw_input("[+] Press any key to exit ... ")
The problem is that I have Python 3.3 installed, and this script does not run under it; e.g. urllib2 is not available in Python 3.3.

Could you tell me how to modify this script so that it is compatible with Python 3.3?

I would greatly appreciate your help.
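From what I've read, urllib2 was split into urllib.request and urllib.error in Python 3, and urlparse became urllib.parse, so I suspect the port would look roughly like the sketch below. I've restructured it into functions and haven't tested it against a real page; the default download directory, the "html.parser" choice, and the function names are my own assumptions, not part of the original script.

```python
import os
import urllib.parse
import urllib.request

from bs4 import BeautifulSoup


def pdf_links(page_url, html_text):
    """Return the absolute URLs of all links in html_text that end in .pdf."""
    soup = BeautifulSoup(html_text, "html.parser")
    links = []
    for tag in soup.find_all("a", href=True):
        # Resolve relative links against the page URL, as the original did.
        href = urllib.parse.urljoin(page_url, tag["href"])
        if os.path.splitext(os.path.basename(href))[1] == ".pdf":
            links.append(href)
    return links


def download_pdfs(page_url, download_path="."):
    """Fetch page_url and save every linked PDF into download_path."""
    headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:38.0) "
                             "Gecko/20100101 Firefox/38.0"}
    req = urllib.request.Request(page_url, headers=headers)
    with urllib.request.urlopen(req) as resp:
        html_text = resp.read()

    count = 0
    for href in pdf_links(page_url, html_text):
        print("\n[*] Downloading: %s" % os.path.basename(href))
        target = os.path.join(download_path, os.path.basename(href))
        with urllib.request.urlopen(href) as current, open(target, "wb") as f:
            f.write(current.read())
        count += 1
    print("\n[*] Downloaded %d files" % count)


if __name__ == "__main__":
    download_pdfs("https://....")  # placeholder URL from the original script
    input("[+] Press any key to exit ... ")
```

Does that look right, or am I missing other Python 2/3 differences (e.g. raw_input becoming input)?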