Website Scraping Using Python

Website scraping refers to reading a website's structure to extract the information you need through an automated system, usually a script. There is a thin line between legal and illegal website scraping. If content is available without logging in or performing any identity verification, and the content provider has not explicitly prohibited it, scraping that website generally flies under the radar.

Today, we are going to extract information from one of the best information providers: Wikipedia. We are going to make use of its random article feature to extract information from a random article. I am going to use the following Python tools:

Python 2
urllib2 module
BeautifulSoup module

To install BeautifulSoup, you can use pip install beautifulsoup4 (that package provides the bs4 module we import below); urllib2 is a Python built-in module, so you need not install it explicitly.

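A side note not in the original post: this tutorial targets Python 2, where urllib2 is built in. If you are on Python 3, the same functionality lives in urllib.request, so the equivalent import would be:

```python
# Python 3 equivalent of Python 2's `import urllib2` for the call used
# in this tutorial: urllib2.urlopen(url) becomes urllib.request.urlopen(url).
from urllib.request import urlopen
```
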
Okay, so the first step is to import the required modules:

from bs4 import BeautifulSoup
import urllib2

Now we are going to fetch the content of a random Wikipedia page using urllib2 and create a parseable object using BeautifulSoup.

random_page = "https://en.wikipedia.org/wiki/Special:Random"
random_page_content = urllib2.urlopen(random_page)
parsed_page = BeautifulSoup(random_page_content, "html.parser")
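Once you have the parsed page, you can pull data out of it with BeautifulSoup's find methods. Here is a minimal sketch of that step; it uses a small inline HTML snippet standing in for a fetched Wikipedia page so it runs without network access, and it assumes the English Wikipedia's convention of marking the article title with an h1 element whose id is firstHeading (the live page layout may change):

```python
from bs4 import BeautifulSoup

# A small inline HTML snippet standing in for a fetched Wikipedia page,
# so this sketch runs offline. The firstHeading id mirrors how the
# English Wikipedia marks the article title (an assumption about the
# live page layout).
sample_html = """
<html>
  <head><title>Sample Article - Wikipedia</title></head>
  <body>
    <h1 id="firstHeading">Sample Article</h1>
    <p>This is the first paragraph of the article.</p>
  </body>
</html>
"""

parsed_page = BeautifulSoup(sample_html, "html.parser")

# Extract the article title and the first paragraph of body text.
title = parsed_page.find("h1", id="firstHeading").get_text()
first_paragraph = parsed_page.find("p").get_text()

print(title)            # Sample Article
print(first_paragraph)  # This is the first paragraph of the article.
```

Against the real parsed_page from the previous snippet, the same find calls would apply; just be prepared for find to return None when an element is missing.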

Translated from: https://www.pybloggers.com/2016/04/website-scraping-using-python/
