How to save an image locally using Python whose URL address I already know?

I know the URL of an image on the Internet.

How can I download this image using Python, without actually opening the URL in a browser and saving the file manually?

10 Answers:

===============>>#1 Votes: 268 (Accepted)

Python 2

If you just want to save it to a file, here is a more straightforward way:

import urllib
urllib.urlretrieve("http://www.digimouth.com/news/media/2011/09/google-logo.jpg", "local-filename.jpg")

The second argument is the local path where the file should be saved.

Python 3

As SergO suggested, the code below should work with Python 3:

import urllib.request
urllib.request.urlretrieve("http://www.digimouth.com/news/media/2011/09/google-logo.jpg", "local-filename.jpg")
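A minimal sketch, assuming a writable pictures/ directory already exists (the directory name is only an illustration): the second argument can also be a full path rather than a bare filename.

import os
import urllib.request

url = "http://www.digimouth.com/news/media/2011/09/google-logo.jpg"
# build the destination path inside an existing directory
target = os.path.join("pictures", "google-logo.jpg")
urllib.request.urlretrieve(url, target)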

===============>>#2 Votes: 26

import urllib

resource = urllib.urlopen("http://www.digimouth.com/news/media/2011/09/google-logo.jpg")
output = open("file01.jpg", "wb")
output.write(resource.read())
output.close()

file01.jpg will contain your image.
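A minor variation on the same idea, not from the original answer: using a with block closes the file automatically, even if the write fails partway through.

import urllib

resource = urllib.urlopen("http://www.digimouth.com/news/media/2011/09/google-logo.jpg")
with open("file01.jpg", "wb") as output:
    output.write(resource.read())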

===============>>#3 Votes: 16

I wrote a script that does exactly this, and it is available on my GitHub for your use.

I use BeautifulSoup so I can parse any website for its images. If you will be doing a lot of web scraping (or intend to use my tool), I suggest you run sudo pip install BeautifulSoup. Information on BeautifulSoup is available here.

For convenience, here is my code:

from bs4 import BeautifulSoup
from urllib2 import urlopen
import urllib

# use this image scraper from the location that
# you want to save scraped images to

def make_soup(url):
    html = urlopen(url).read()
    return BeautifulSoup(html)

def get_images(url):
    soup = make_soup(url)
    # this makes a list of bs4 element tags
    images = [img for img in soup.findAll('img')]
    print (str(len(images)) + " images found.")
    print 'Downloading images to current working directory.'
    # compile our unicode list of image links
    image_links = [each.get('src') for each in images]
    for each in image_links:
        filename = each.split('/')[-1]
        urllib.urlretrieve(each, filename)
    return image_links

# a standard call looks like this
# get_images('http://www.wookmark.com')

===============>>#4 Votes: 6

A solution that works with both Python 2 and Python 3:

try:
    from urllib.request import urlretrieve  # Python 3
except ImportError:
    from urllib import urlretrieve  # Python 2

url = "http://www.digimouth.com/news/media/2011/09/google-logo.jpg"
urlretrieve(url, "local-filename.jpg")

Or, if the additional dependency on requests is acceptable and it is an http(s) URL:

def load_requests(source_url, sink_path):
    """
    Load a file from an URL (e.g. http).

    Parameters
    ----------
    source_url : str
        Where to load the file from.
    sink_path : str
        Where the loaded file is stored.
    """
    import requests
    r = requests.get(source_url, stream=True)
    if r.status_code == 200:
        with open(sink_path, 'wb') as f:
            for chunk in r:
                f.write(chunk)
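A minimal usage sketch for load_requests (the URL and destination filename below are just examples, not from the original answer):

load_requests("https://apod.nasa.gov/apod/image/1701/potw1636aN159_HST_2048.jpg",
              "potw1636a.jpg")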

===============>>#5 Votes: 5

I made a script expanding on Yup.'s script. I fixed a few things. It will now bypass 403: Forbidden problems. It doesn't crash when an image fails to be retrieved. It tries to avoid corrupted previews. It gets the correct absolute URLs. It prints more information. It can be run with an argument from the command line.

# getem.py
# python2 script to download all images in a given url
# use: python getem.py http://url.where.images.are

from bs4 import BeautifulSoup
import urllib2
import shutil
import requests
from urlparse import urljoin
import sys
import time

def make_soup(url):
    req = urllib2.Request(url, headers={'User-Agent' : "Magic Browser"})
    html = urllib2.urlopen(req)
    return BeautifulSoup(html, 'html.parser')

def get_images(url):
    soup = make_soup(url)
    images = [img for img in soup.findAll('img')]
    print (str(len(images)) + " images found.")
    print 'Downloading images to current working directory.'
    image_links = [each.get('src') for each in images]
    for each in image_links:
        try:
            filename = each.strip().split('/')[-1].strip()
            src = urljoin(url, each)
            print 'Getting: ' + filename
            response = requests.get(src, stream=True)
            # delay to avoid corrupted previews
            time.sleep(1)
            with open(filename, 'wb') as out_file:
                shutil.copyfileobj(response.raw, out_file)
        except:
            print '  An error occurred. Continuing.'
    print 'Done.'

if __name__ == '__main__':
    url = sys.argv[1]
    get_images(url)

===============>>#6 Votes: 5

Python 3

from urllib.error import HTTPError
from urllib.request import urlretrieve

try:
    urlretrieve(image_url, image_local_path)
except FileNotFoundError as err:
    print(err)   # something wrong with local path
except HTTPError as err:
    print(err)   # something wrong with url
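The names image_url and image_local_path are not defined in the answer itself; plausible values might look like this (examples only, not part of the original answer):

image_url = "http://www.digimouth.com/news/media/2011/09/google-logo.jpg"  # example source URL
image_local_path = "local-filename.jpg"                                    # example destination path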

===============>>#7 Votes: 5

This can be done with requests. Load the page and dump the binary content to a file.

import os
import requests

url = 'https://apod.nasa.gov/apod/image/1701/potw1636aN159_HST_2048.jpg'
page = requests.get(url)

f_ext = os.path.splitext(url)[-1]
f_name = 'img{}'.format(f_ext)
with open(f_name, 'wb') as f:
    f.write(page.content)
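A small variation, not part of the original answer, that raises an exception on HTTP errors instead of silently saving an error page to disk:

import os
import requests

url = 'https://apod.nasa.gov/apod/image/1701/potw1636aN159_HST_2048.jpg'
page = requests.get(url)
page.raise_for_status()  # raises requests.HTTPError for 4xx/5xx responses

f_name = 'img{}'.format(os.path.splitext(url)[-1])
with open(f_name, 'wb') as f:
    f.write(page.content)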

===============>>#8 Votes: 2

Version for Python 3

I adapted @madprops' code for Python 3:

# getem.py
# python3 script to download all images in a given url
# use: python getem.py http://url.where.images.are

from bs4 import BeautifulSoup
import urllib.request
import shutil
import requests
from urllib.parse import urljoin
import sys
import time

def make_soup(url):
    req = urllib.request.Request(url, headers={'User-Agent' : "Magic Browser"})
    html = urllib.request.urlopen(req)
    return BeautifulSoup(html, 'html.parser')

def get_images(url):
    soup = make_soup(url)
    images = [img for img in soup.findAll('img')]
    print(str(len(images)) + " images found.")
    print('Downloading images to current working directory.')
    image_links = [each.get('src') for each in images]
    for each in image_links:
        try:
            filename = each.strip().split('/')[-1].strip()
            src = urljoin(url, each)
            print('Getting: ' + filename)
            response = requests.get(src, stream=True)
            # delay to avoid corrupted previews
            time.sleep(1)
            with open(filename, 'wb') as out_file:
                shutil.copyfileobj(response.raw, out_file)
        except:
            print('  An error occurred. Continuing.')
    print('Done.')

if __name__ == '__main__':
    get_images('http://www.wookmark.com')
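The Python 3 port hard-codes the example URL even though it still imports sys; a minimal sketch (an assumption, not part of the original answer) of how the command-line behaviour of the Python 2 version could be restored:

if __name__ == '__main__':
    # expects: python getem.py http://url.where.images.are
    get_images(sys.argv[1])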

===============>>#9 Votes: 1

This is a very short answer:

import urllib
urllib.urlretrieve("http://photogallery.sandesh.com/Picture.aspx?AlubumId=422040", "Abc.jpg")

===============>>#10 Votes: -1

import requests

img_data = requests.get('https://apod.nasa.gov/apod/image/1701/potw1636aN159_HST_2048.jpg').content
with open('file_name.jpg', 'wb') as handler:
    handler.write(img_data)
