Website Scraping Using Python

Website scraping refers to reading a website's structure to extract the information you need through an automated system, usually a script. There is a thin line between legal and illegal website scraping. If content is available without logging in or performing any identity verification, and the content provider has not explicitly prohibited it, scraping that website generally flies under the radar.

Today, we are going to extract information from one of the best information providers: Wikipedia. We are going to make use of its random article feature to extract information from a random article. I am going to use the following Python tools:

Python 2
urllib2 module
BeautifulSoup module

To install BeautifulSoup, you can use pip install beautifulsoup4 (that package provides the bs4 module we import below); urllib2 is a Python built-in module, so you need not install it explicitly.

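A side note not in the original post: this tutorial targets Python 2, where urllib2 is built in. If you are on Python 3, the same functionality lives in urllib.request, so the equivalent import would be:

```python
# Python 3 equivalent of Python 2's `import urllib2` for the call used
# in this tutorial: urllib2.urlopen(url) becomes urllib.request.urlopen(url).
from urllib.request import urlopen
```
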
Okay, so the first step is to import the required modules:

from bs4 import BeautifulSoup
import urllib2

Now we are going to fetch the content of a random Wikipedia page using urllib2 and create a parseable object using BeautifulSoup.

random_page = "https://en.wikipedia.org/wiki/Special:Random"
random_page_content = urllib2.urlopen(random_page)
parsed_page = BeautifulSoup(random_page_content, "html.parser")
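Once you have the parsed page, you can pull data out of it with BeautifulSoup's find methods. Here is a minimal sketch of that step; it uses a small inline HTML snippet standing in for a fetched Wikipedia page so it runs without network access, and it assumes the English Wikipedia's convention of marking the article title with an h1 element whose id is firstHeading (the live page layout may change):

```python
from bs4 import BeautifulSoup

# A small inline HTML snippet standing in for a fetched Wikipedia page,
# so this sketch runs offline. The firstHeading id mirrors how the
# English Wikipedia marks the article title (an assumption about the
# live page layout).
sample_html = """
<html>
  <head><title>Sample Article - Wikipedia</title></head>
  <body>
    <h1 id="firstHeading">Sample Article</h1>
    <p>This is the first paragraph of the article.</p>
  </body>
</html>
"""

parsed_page = BeautifulSoup(sample_html, "html.parser")

# Extract the article title and the first paragraph of body text.
title = parsed_page.find("h1", id="firstHeading").get_text()
first_paragraph = parsed_page.find("p").get_text()

print(title)            # Sample Article
print(first_paragraph)  # This is the first paragraph of the article.
```

Against the real parsed_page from the previous snippet, the same find calls would apply; just be prepared for find to return None when an element is missing.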

Translated from: https://www.pybloggers.com/2016/04/website-scraping-using-python/
