How to download images from Baidu?


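I recently needed to download images from Baidu for testing and did not want to do it by hand, so I looked into a script that automates the job.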


Here is the result (a Python 2 script):


# -*- coding: utf-8 -*-



import os
import urllib2
import json



tags = ['运动服']  # search keywords; '运动服' means "sportswear"

urls = []  # every image URL collected so far

savePath = './'

for tag2 in tags:
    print 'Start collecting image URLs for keyword:', tag2
    
    startNum = 0     # index of the first result to request
    resultNum = 60   # number of results per JSON request; 60 is the upper bound

    endnum = 3000    # cap on the result index to request

    totalNum = -1  # total number of results for this keyword; -1 until the first response arrives
    
    path = unicode(savePath + '/' + tag2 + '/' , 'utf8')
    if not os.path.exists(path):
        os.makedirs(path)
    
    
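    # Keep paging until every result has been requested or the endnum cap is reached.
    while (totalNum == -1 or startNum < totalNum) and startNum < endnum:

        oneRequeseNum = 0  # number of results handled in this request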
        
        try:
            
            url = 'http://image.baidu.com/i?tn=baiduimagejson&width=&height=&ie=utf8&oe=utf-8&word=' + tag2 + '&pn=' + str(startNum) + '&rn=' + str(resultNum)
            
            user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
            headers = {"User-Agent" : user_agent}
            req = urllib2.Request(url , headers=headers)
            html = urllib2.urlopen(req , timeout=100)
            
            jsonData = json.loads(html.read())
            
            # print jsonData
            if totalNum == -1:
                totalNum = jsonData['displayNum']
                print 'total number:', totalNum
            
            data = jsonData['data']

            for item in data:

                oneRequeseNum += 1

                # Entries without an objURL field carry no image link.
                if "objURL" in item:
                    urls.append(item['objURL'])
        
        except Exception as e:
            print "Exception : " , str(e)
            print url
            # Skip ahead by 100 results so the loop moves past a failing request.
            oneRequeseNum = oneRequeseNum + 100
        
        finally:
            startNum = startNum + oneRequeseNum
            print 'Keyword:', tag2
            print 'Results processed so far:', startNum
        
        # urls.txt is rewritten after every request, so it always holds the URLs collected so far.
        with open('urls.txt', 'w') as ff:
            for imageUrl in urls:
                ff.write('%s\n' % imageUrl)
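
The script above only writes the image URLs to urls.txt; it does not fetch the images themselves. Below is a minimal sketch of that second step. It is not part of the original code: the function names (read_urls, target_name, download_all, fetch_bytes), the User-Agent string, the 30-second timeout and the fallback to a .jpg extension are all assumptions made for illustration. The import block is written so the sketch should run under either Python 2 or Python 3.

```python
import os

try:  # Python 3
    from urllib.request import Request, urlopen
except ImportError:  # Python 2, as used by the script above
    from urllib2 import Request, urlopen


def read_urls(list_path):
    """Return the non-empty lines of the URL list, in order, without duplicates."""
    seen = set()
    result = []
    with open(list_path) as handle:
        for line in handle:
            url = line.strip()
            if url and url not in seen:
                seen.add(url)
                result.append(url)
    return result


def target_name(index, url):
    """Build a local file name: a zero-padded counter plus the URL's extension."""
    ext = os.path.splitext(url.split('?')[0])[1].lower()
    if ext not in ('.jpg', '.jpeg', '.png', '.gif', '.bmp'):
        ext = '.jpg'  # assumption: fall back to .jpg when the extension is unclear
    return '%05d%s' % (index, ext)


def fetch_bytes(url, timeout=30):
    """Fetch one URL and return the response body as bytes."""
    req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    return urlopen(req, timeout=timeout).read()


def download_all(list_path, out_dir, fetch=None):
    """Save every URL listed in list_path into out_dir; return the number saved."""
    if fetch is None:
        fetch = fetch_bytes
    if not os.path.exists(out_dir):
        os.makedirs(out_dir)
    saved = 0
    for index, url in enumerate(read_urls(list_path)):
        try:
            data = fetch(url)
        except Exception as e:
            # A dead link should not abort the whole run.
            print('skip %s: %s' % (url, e))
            continue
        with open(os.path.join(out_dir, target_name(index, url)), 'wb') as out:
            out.write(data)
        saved += 1
    return saved
```

A call such as download_all('urls.txt', './images') would then try each URL once and skip the ones that fail. The fetch parameter exists only so the function can be tried out without network access.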


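One thing to watch out for: keywords such as utf8 in the URL need to be placed before the str(...) parts. When I placed them after, my program raised an error.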
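
One way to make the URL assembly less sensitive to ordering and encoding is to let the standard library build the query string. The sketch below is an assumption, not what the original script does: build_query_url is a hypothetical helper, and the endpoint and parameter names are simply copied from the script above. Under Python 2 the keyword should be a UTF-8 byte string (as tag2 is in the script), since urlencode there does not handle non-ASCII unicode objects well.

```python
try:  # Python 3
    from urllib.parse import urlencode
except ImportError:  # Python 2
    from urllib import urlencode


def build_query_url(keyword, start, count):
    """Assemble the JSON query URL with every parameter percent-encoded."""
    params = [
        ('tn', 'baiduimagejson'),
        ('width', ''),
        ('height', ''),
        ('ie', 'utf8'),
        ('oe', 'utf-8'),
        ('word', keyword),   # the search keyword, e.g. tag2
        ('pn', start),       # index of the first result, e.g. startNum
        ('rn', count),       # results per request, e.g. resultNum
    ]
    # A list of pairs keeps the parameters in a fixed order.
    return 'http://image.baidu.com/i?' + urlencode(params)
```

With this helper the keyword is percent-encoded as UTF-8 bytes instead of being concatenated into the URL raw.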


References:


http://blog.csdn.net/yuanwofei/article/details/16343743

http://www.devba.com/index.php/archives/3321.html

http://blog.csdn.net/viomag/article/details/38340993


The original code is at https://github.com/busz/BaiduImageDownloader
