17.6.5 How to scrape keyword-based search results from Baidu Images with a Python crawler

MQTXWD

Category: python    Tags: python, image crawler

Article link: https://blog.csdn.net/u011732139/article/details/72867225

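I haven't posted on this blog for a long time; lately I just haven't felt much like studying. Not good, not good!!
For the past little while, my main project has been putting together a caricature dataset. There are not many existing databases for face recognition on caricatures: only a few are public, some of those are only partially released, and the released ones are all fairly small. So, to prepare for the next stage of my studies, I compiled a small caricature dataset of my own. Saving images by hand one at a time was no longer enough, so I turned to a crawler. In a caricature dataset, every person has both real photos and caricature portraits. I collected the caricature portraits mainly from Google Images and from a foreign site called Pinterest (both need a VPN to reach from mainland China). With the caricatures done, the next main task was the real photos, which meant scraping images from Baidu Images.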
The code is as follows (Python 3, using the requests library):

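import os
import re
from urllib.parse import quote

import requests


def downloadPic(html, keyword, name):
    # Baidu's flip-mode result page embeds each original image URL as "objURL":"..."
    pic_url = re.findall('"objURL":"(.*?)",', html, re.S)
    i = 0
    print('Find the key words: ' + keyword)
    print('Now downloading...')
    for each in pic_url:
        print(str(i) + ' of all ' + keyword + ', URL: ' + str(each))
        try:
            pic = requests.get(each, timeout=30)
        except requests.exceptions.RequestException:
            # covers connection errors and timeouts alike
            print('Error: can not download the image')
            continue

        # Python 3 accepts Unicode (e.g. Chinese) file names directly,
        # so the path does not need to be re-encoded to cp936
        path = os.path.join('pictures', name, str(i) + '.jpg')
        with open(path, 'wb') as fp:  # the file is closed after writing
            fp.write(pic.content)
        i += 1
        if i == 10:  # stop after 10 images per keyword
            break


if __name__ == '__main__':
    # placeholder: path to the keyword list file, one keyword per line
    with open('KEYWORD_LIST_PATH', encoding='utf-8') as namelist:
        for pername in namelist:
            pername = pername.strip()  # drop the trailing newline
            if not pername:
                continue
            # create pictures/<keyword>/ (and pictures/ itself) if missing
            os.makedirs(os.path.join('pictures', pername), exist_ok=True)
            # percent-encode the keyword so Chinese characters, spaces or '&' stay intact
            url = ('http://image.baidu.com/search/flip?tn=baiduimage&ie=utf-8&word='
                   + quote(pername) + '&ct=201326592&v=flip')
            result = requests.get(url, timeout=30)
            downloadPic(result.text, pername, pername)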
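The crawler hinges on two small steps: building the search URL for a keyword and pulling the "objURL" fields out of the returned page. Both can be tried offline, without hitting Baidu. The following is a minimal sketch, assuming the flip page still embeds image URLs as "objURL":"..." fields followed by a comma; the helper names extract_obj_urls and build_search_url and the sample string are hypothetical illustrations, not part of the original script.

```python
import re
from urllib.parse import quote

# Same pattern as in downloadPic(): lazily capture everything between
# "objURL":" and the next ", so each match stops at the end of one URL.
OBJURL_PATTERN = re.compile(r'"objURL":"(.*?)",', re.S)


def extract_obj_urls(html):
    """Hypothetical helper: return the original-image URLs found in a result page."""
    return OBJURL_PATTERN.findall(html)


def build_search_url(keyword):
    """Hypothetical helper: build the flip-mode search URL with a percent-encoded keyword."""
    return ('http://image.baidu.com/search/flip?tn=baiduimage&ie=utf-8&word='
            + quote(keyword) + '&ct=201326592&v=flip')


if __name__ == '__main__':
    # Hypothetical fragment shaped like the data the regex targets (not real Baidu output)
    sample = ('{"objURL":"http://example.com/a.jpg","fromURL":"x"},'
              '{"objURL":"http://example.com/b.png","fromURL":"y"}')
    for url in extract_obj_urls(sample):
        print(url)
    print(build_search_url('漫画'))  # hypothetical keyword ("caricature/comics")
```

The non-greedy (.*?) matters here: a greedy (.*) would run on to the last ", in the page and swallow several records into one match. Percent-encoding with quote() keeps keywords that contain spaces, '&' or Chinese characters from breaking the query string.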