[Crawler] Web Image Scraping Tools: Downloading Images from Google and Bing

Latest recommended article published on 2024-05-21 15:01:19

weixin_30632883

Original link: http://www.cnblogs.com/wlhr62/p/10607165.html

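I recently needed to scrape a batch of images from Google and Bing. Each site has its own rules, though, and as a scraping beginner I found it hard to understand and adapt the URLs and regular expressions involved.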

I later found some very handy tools on GitHub that are simple and quick to use, so I am sharing them here.

1. Scraping images from Google: google-images-download

https://github.com/hardikvasa/google-images-download

[Figure: logic flow of the image download algorithm]

Installation is simple. Use any one of the following methods:

Install with pip:
```
pip install google_images_download
```

Install from source via the command line:

git clone https://github.com/hardikvasa/google-images-download.git
cd google-images-download && sudo python setup.py install

Download and install manually:

Go to the repo on GitHub =>
Click "Clone or Download" =>
Click "Download ZIP" and save it to your local disk

Once the tool is installed or downloaded, you can start scraping images:

If you installed with pip or from the command line, run:
```
googleimagesdownload [Arguments...]
```
If you downloaded the ZIP manually, first extract it, change into the 'google_images_download' directory, and run:
```
python3 google_images_download.py [Arguments...]
```
or
```
python google_images_download.py [Arguments...]
```

Common arguments and commands are shown below:

To pass arguments from a configuration file, give the name of the JSON file with -cf:
```
googleimagesdownload -cf example.json
```
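
A configuration file like example.json can be generated from a script rather than written by hand. The sketch below is only an illustration: it assumes the tool reads a top-level "Records" list whose entries use the long argument names as keys (check the project documentation for your version), and `write_config` is a hypothetical helper name, not part of the tool.

```python
import json
from pathlib import Path


def write_config(path, records):
    """Write a list of argument dictionaries as a JSON config file for -cf."""
    # Assumption: the tool expects a top-level "Records" list, one entry per job.
    config = {"Records": list(records)}
    text = json.dumps(config, indent=4, ensure_ascii=False)
    Path(path).write_text(text, encoding="utf-8")
    return config


if __name__ == "__main__":
    # Each dictionary mirrors one command line; keys are assumed to match
    # the long argument names (keywords, limit, format).
    jobs = [
        {"keywords": "Polar bears", "limit": 20},
        {"keywords": "logo", "limit": 10, "format": "svg"},
    ]
    write_config("example.json", jobs)
    print(Path("example.json").read_text(encoding="utf-8"))
```

The resulting file can then be passed to the tool with `googleimagesdownload -cf example.json`.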

A simple example using only the keywords and limit arguments:

googleimagesdownload --keywords "Polar bears, balloons, Beaches" --limit 20
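
The same search can also be driven from Python instead of the shell. The sketch below assumes the installed package exposes a `googleimagesdownload` class with a `download(arguments)` method, as the project README describes. `build_arguments` and `download_images` are hypothetical helper names introduced here for illustration, and the shape of the return value may differ between versions.

```python
def build_arguments(keywords, limit):
    """Build the argument dictionary equivalent to --keywords and --limit."""
    return {"keywords": ",".join(keywords), "limit": limit}


def download_images(arguments):
    """Run one download job through the library's Python interface."""
    # Imported lazily so this module still loads when the package is missing.
    # Assumption: the package layout matches the project README.
    from google_images_download import google_images_download

    downloader = google_images_download.googleimagesdownload()
    return downloader.download(arguments)


if __name__ == "__main__":
    arguments = build_arguments(["Polar bears", "balloons", "Beaches"], 20)
    # Only print the arguments here; call download_images(arguments)
    # to actually start the download (requires network access).
    print(arguments)
```

Keeping the argument building separate from the download call makes it easy to inspect or log a job before any network traffic happens.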

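Suffix keywords append words after the main keyword. For example, with keyword = car and suffix keywords = red, blue, the tool searches for car red first and then for car blue:
```
googleimagesdownload -k "car" -sk 'red,blue,white' -l 10
```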
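
To make the suffix behaviour concrete, the sketch below expands a keyword string and a suffix string into the search queries they imply. It illustrates the idea only and is not the tool's own code; `expand_queries` is a hypothetical name, and the ordering used when several main keywords are given is an assumption.

```python
def expand_queries(keywords, suffix_keywords):
    """Combine each main keyword with each suffix keyword."""
    main_terms = [k.strip() for k in keywords.split(",") if k.strip()]
    suffixes = [s.strip() for s in suffix_keywords.split(",") if s.strip()]
    if not suffixes:
        # Without suffixes, each main keyword is searched on its own.
        return main_terms
    # Assumption: queries are grouped by main keyword, then by suffix.
    return [f"{term} {suffix}" for term in main_terms for suffix in suffixes]


if __name__ == "__main__":
    for query in expand_queries("car", "red,blue,white"):
        print(query)
```

With the arguments from the command above, this yields the three queries car red, car blue, and car white, each of which is subject to the same limit.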

Using shorthand arguments:

googleimagesdownload -k "Polar bears, balloons, Beaches" -l 20

Download images with a specific file extension or format:
```
googleimagesdownload --keywords "logo" --format svg
```

Apply a color filter to the images:

googleimagesdownload -k "playground" -l 20 -co red

Search for images with non-English keywords:
```
googleimagesdownload -k "北极熊" -l 5
```

Download images from a Google Images page URL:

googleimagesdownload -k "sample" -u <google images page URL>

Save images to a specific main directory (instead of the default "downloads"):
```
googleimagesdownload -k "boat" -o "boat_new"
```

Download a single image from its URL:

googleimagesdownload --keywords "balloons" --single_image <URL of the image>

Download images with size and type constraints:

googleimagesdownload --keywords "balloons" --size medium --type animated

Download images with specific usage rights:

googleimagesdownload --keywords "universe" --usage_rights labeled-for-reuse

Download images of a specific color type:

googleimagesdownload --keywords "flowers" --color_type black-and-white

Download images with a specific aspect ratio:

googleimagesdownload --keywords "universe" --aspect_ratio panoramic

Download images similar to the image at a URL you provide:
```
googleimagesdownload -si <image url> -l 10
```

Download images for the given keywords from a specific website or domain:

googleimagesdownload --keywords "universe" --specific_site example.com
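
When several of these commands need to run in sequence, a small wrapper script can assemble them. The sketch below is one possible approach and uses only flags that appear in the examples above (--keywords, --limit, --format, -o); `build_command` and `run_job` are hypothetical helper names. By default it only prints the commands rather than running them.

```python
import shlex
import subprocess


def build_command(keywords, limit=None, image_format=None, output_dir=None):
    """Assemble a googleimagesdownload command line as an argument list."""
    cmd = ["googleimagesdownload", "--keywords", keywords]
    if limit is not None:
        cmd += ["--limit", str(limit)]
    if image_format is not None:
        cmd += ["--format", image_format]
    if output_dir is not None:
        cmd += ["-o", output_dir]
    return cmd


def run_job(cmd):
    """Run one command; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    jobs = [
        {"keywords": "Polar bears", "limit": 20},
        {"keywords": "logo", "image_format": "svg"},
        {"keywords": "boat", "output_dir": "boat_new"},
    ]
    for job in jobs:
        cmd = build_command(**job)
        # Dry run: print the shell-quoted command. Replace this line with
        # run_job(cmd) to execute it (requires the tool to be installed).
        print(shlex.join(cmd))
```

Passing the command as a list avoids shell quoting problems with keywords that contain spaces or non-ASCII characters.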

2. Scraping images from Bing: Bulk-Bing-Image-downloader

https://github.com/ostrolucky/Bulk-Bing-Image-downloader

Usage is very simple:

Either git clone the repository or download it to your local disk.

Change into the project directory and run:

bbid.py [-h] [-s SEARCH_STRING] [-f SEARCH_FILE] [-o OUTPUT]
               [--adult-filter-on] [--adult-filter-off] [--filters FILTERS]
               [--limit LIMIT]

For example:
```
./bbid.py -s "hello world"
```
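
To run bbid.py for several search strings, the command can be assembled in a loop. The sketch below uses only options listed in the usage text above (-s, -o, --limit); `build_bbid_command` is a hypothetical helper name, and the script path is assumed to be bbid.py in the current directory. It prints the commands instead of executing them.

```python
import shlex
import sys


def build_bbid_command(search_string, output_dir=None, limit=None, script="bbid.py"):
    """Assemble a bbid.py command line as an argument list."""
    # Use the current Python interpreter so the script's executable bit
    # and shebang line do not matter.
    cmd = [sys.executable, script, "-s", search_string]
    if output_dir is not None:
        cmd += ["-o", output_dir]
    if limit is not None:
        cmd += ["--limit", str(limit)]
    return cmd


if __name__ == "__main__":
    for term in ["hello world", "polar bear"]:
        # One output directory per search term (a choice made for this sketch).
        out_dir = term.replace(" ", "_")
        cmd = build_bbid_command(term, output_dir=out_dir, limit=10)
        # Dry run: print the command. To execute it, pass cmd to
        # subprocess.run(cmd, check=True) from the project directory.
        print(shlex.join(cmd))
```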

----------------------- To be continued -------------------------
