Python Web Scraping: Media Files

WolfgangBai

Published 2016-09-29 23:23:51


Original post: https://blog.csdn.net/qq_28692987/article/details/52706043


Note: this post is a set of study notes. The content comes from Web Scraping with Python (Chinese edition: 《python 网络数据采集》).

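There are two main ways to store media files: keep only the URL that points to the file, or download the file itself. If you will use a file more than once, it is better to download it.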
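A minimal sketch of the first approach (keeping only the links), assuming the page HTML is already available as a string. The function name `collect_image_urls` and the sample HTML are hypothetical. Each `<img>` `src` is resolved against the page URL so that the stored links still work when read on their own.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def collect_image_urls(html, page_url):
    """Hypothetical helper: return absolute URLs of all <img> tags that have a src."""
    soup = BeautifulSoup(html, "html.parser")
    # src=True matches only <img> tags that actually carry a src attribute
    return [urljoin(page_url, img["src"]) for img in soup.find_all("img", src=True)]


sample_html = (
    '<a id="logo"><img src="/img/logo.jpg"></a>'
    '<img src="https://cdn.example.com/a.png">'
    '<img alt="no src">'
)
urls = collect_image_urls(sample_html, "http://www.example.com/page.html")
print(urls)  # ['http://www.example.com/img/logo.jpg', 'https://cdn.example.com/a.png']
```

Storing the list of URLs is cheap, but the files can disappear or change on the remote server. That is why downloading is preferable for files you will reuse.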

Take downloading an image as an example. In Python 3.x, urllib.request.urlretrieve downloads a file given its URL:
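The sketch below shows what `urlretrieve(url, filename)` returns without touching the network. It points the function at a `file://` URL for a temporary local file. This substitution is only there so the example runs offline; with an `http://` URL the call is identical. The function returns a `(filename, headers)` tuple, where `filename` is the local path the data was written to.

```python
import tempfile
from pathlib import Path
from urllib.request import urlretrieve

with tempfile.TemporaryDirectory() as tmp:
    # A small local file stands in for the remote image (offline demo only)
    source = Path(tmp) / "source.jpg"
    source.write_bytes(b"fake image bytes")

    # urlretrieve(url, filename) copies the resource at url into filename
    target = Path(tmp) / "logo.jpg"
    filename, headers = urlretrieve(source.as_uri(), str(target))

    same_path = filename == str(target)                      # the local path we asked for
    same_bytes = target.read_bytes() == source.read_bytes()  # content copied intact

print(same_path, same_bytes)  # True True
```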

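from urllib.parse import urljoin
from urllib.request import urlopen, urlretrieve

from bs4 import BeautifulSoup

baseUrl = "http://www.pythonscraping.com"
html = urlopen(baseUrl)
# Name the parser explicitly; otherwise BeautifulSoup warns that none was specified
bsObj = BeautifulSoup(html, "html.parser")
# The site logo is the <img> inside the <a id="logo"> tag
imageLocation = bsObj.find("a", {"id": "logo"}).find("img")["src"]
# src may be a relative path, so resolve it against the page URL before downloading
urlretrieve(urljoin(baseUrl, imageLocation), "logo.jpg")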
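The Python documentation lists `urlretrieve` under the "legacy interface" of urllib.request and notes that it might be deprecated in the future. Below is a minimal alternative sketch that is a swapped-in technique rather than the book's code. It uses `urlopen` together with `shutil.copyfileobj` to stream the response to disk. The helper name `download_file` and the rule of naming the local file after the last path segment of the URL are assumptions for illustration. The demo again uses a `file://` URL so it runs offline.

```python
import os
import shutil
import tempfile
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlopen


def download_file(url, dest_dir):
    """Hypothetical helper: save url into dest_dir, named after the URL's last path segment."""
    name = os.path.basename(urlparse(url).path) or "index.html"
    os.makedirs(dest_dir, exist_ok=True)
    local_path = os.path.join(dest_dir, name)
    # Copy the response body to disk in chunks instead of reading it all into memory
    with urlopen(url) as response, open(local_path, "wb") as out:
        shutil.copyfileobj(response, out)
    return local_path


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "logo.jpg"
    source.write_bytes(b"fake image bytes")
    saved = download_file(source.as_uri(), os.path.join(tmp, "downloaded"))
    saved_name = os.path.basename(saved)
    saved_ok = Path(saved).read_bytes() == b"fake image bytes"

print(saved_name, saved_ok)  # logo.jpg True
```

Deriving the name from the URL keeps distinct images from overwriting each other. The hard-coded "logo.jpg" in the example above does not, because every download goes to the same file.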