I need a way to download all the PDF files from a given URL, and I found a script that might do the job (I haven't tested it yet):

import urllib2
import urlparse
import os

from bs4 import BeautifulSoup

url = "https://...."
download_path = "."  # directory to save the PDFs into
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:38.0) Gecko/20100101 Firefox/38.0"}
i = 0

request = urllib2.Request(url, None, headers)
html = urllib2.urlopen(request)
soup = BeautifulSoup(html.read())

for tag in soup.findAll("a", href=True):
    tag["href"] = urlparse.urljoin(url, tag["href"])
    if os.path.splitext(os.path.basename(tag["href"]))[1] == ".pdf":
        current = urllib2.urlopen(tag["href"])
        print("\n[*] Downloading: %s" % os.path.basename(tag["href"]))
        f = open(download_path + "\\" + os.path.basename(tag["href"]), "wb")
        f.write(current.read())
        f.close()
        i += 1

print("\n[*] Downloaded %d files" % i)
raw_input("[+] Press any key to exit ... ")
The problem is that I have Python 3.3 installed, and this script does not run under it; e.g. urllib2 is not available in Python 3.3.

Could you tell me how to modify this script so that it is compatible with Python 3.3?

I would greatly appreciate your help.
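From what I've read, urllib2 was split into urllib.request and urllib.error in Python 3, and urlparse became urllib.parse, so I suspect the port would look roughly like the sketch below. I've restructured it into functions and haven't tested it against a real page; the default download directory, the "html.parser" choice, and the function names are my own assumptions, not part of the original script.

```python
import os
import urllib.parse
import urllib.request

from bs4 import BeautifulSoup


def pdf_links(page_url, html_text):
    """Return the absolute URLs of all links in html_text that end in .pdf."""
    soup = BeautifulSoup(html_text, "html.parser")
    links = []
    for tag in soup.find_all("a", href=True):
        # Resolve relative links against the page URL, as the original did.
        href = urllib.parse.urljoin(page_url, tag["href"])
        if os.path.splitext(os.path.basename(href))[1] == ".pdf":
            links.append(href)
    return links


def download_pdfs(page_url, download_path="."):
    """Fetch page_url and save every linked PDF into download_path."""
    headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:38.0) "
                             "Gecko/20100101 Firefox/38.0"}
    req = urllib.request.Request(page_url, headers=headers)
    with urllib.request.urlopen(req) as resp:
        html_text = resp.read()

    count = 0
    for href in pdf_links(page_url, html_text):
        print("\n[*] Downloading: %s" % os.path.basename(href))
        target = os.path.join(download_path, os.path.basename(href))
        with urllib.request.urlopen(href) as current, open(target, "wb") as f:
            f.write(current.read())
        count += 1
    print("\n[*] Downloaded %d files" % count)


if __name__ == "__main__":
    download_pdfs("https://....")  # placeholder URL from the original script
    input("[+] Press any key to exit ... ")
```

Does that look right, or am I missing other Python 2/3 differences (e.g. raw_input becoming input)?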