Downloading Images from Baidu

阿祺的阿铖呀

Published 2021-01-10 23:51:48

Category: Python爬虫 (Python crawlers). Tags: python

Article link: https://blog.csdn.net/m0_48176011/article/details/112453077


import json
import os
from urllib.request import Request, urlopen, urlretrieve

def json_all(pn):
    """Collect image links from result pages 0..pn (30 results per page)."""
    links = []
    # Browser-like headers; they never change, so build them once outside the loop
    header = {
        'Referer': 'https://image.baidu.com/search/index?tn=baiduimage',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36'}
    for page in range(0, pn + 1):
        # The pn query parameter is the result offset, so page N starts at N * 30
        url = 'https://image.baidu.com/search/acjson?tn=resultjson_com&logid=6116183662344635930&ipn=rj&ct=201326592&is=&fp=result&queryWord=%E5%88%97%E7%BB%B4%E5%9D%A6&cl=2&lm=-1&ie=utf-8&oe=utf-8&adpicid=&st=&z=&ic=&hd=&latest=&copyright=&word=%E5%88%97%E7%BB%B4%E5%9D%A6&s=&se=&tab=&width=&height=&face=&istype=&qc=&nc=1&fr=&expermode=&force=&pn={}&rn=30&gsm=3c&1608184113114='.format(page * 30)
        info_all = Request(url=url, headers=header)
        f = urlopen(info_all).read().decode('utf-8')
        info = json.loads(f)
        for item in info['data']:
            # Skip entries that have no 'hoverURL' or only an empty one
            if item.get('hoverURL'):
                links.append(item['hoverURL'])
    return links

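if __name__ == '__main__':
    links = json_all(5)
    # Target folder; change this to a directory on your own machine
    save_dir = os.path.abspath('C:\\Users\\user\\PycharmProjects\\pythonProject\\2\\')
    os.makedirs(save_dir, exist_ok=True)
    # Save the images as 1.jpg, 2.jpg, ...
    for index, link in enumerate(links, start=1):
        work_path = os.path.join(save_dir, str(index) + '.jpg')
        urlretrieve(link, work_path)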
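
The request URL in json_all hard-codes the search keyword in percent-encoded form in the queryWord and word parameters. The sketch below decodes that value and shows how another keyword could be encoded the same way with urllib.parse. The helper name encode_keyword is made up for illustration, and whether replacing only these two parameters is enough for Baidu's endpoint is an assumption that has not been verified here.

```python
from urllib.parse import quote, unquote

# Value copied from the queryWord / word parameters of the request URL
ENCODED_WORD = '%E5%88%97%E7%BB%B4%E5%9D%A6'

def encode_keyword(keyword):
    """Percent-encode a keyword as UTF-8 (hypothetical helper, not part of the original script)."""
    # safe='' makes quote() escape every reserved character, including '/'
    return quote(keyword, safe='')

# Decode the hard-coded value to see which keyword the script searches for
keyword = unquote(ENCODED_WORD)
print(keyword)

# Encoding the decoded keyword again reproduces the original string
print(encode_keyword(keyword) == ENCODED_WORD)
```

To search for something else, the result of encode_keyword could be substituted into the URL template with str.format, in the same way the script already fills in pn.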

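In Python 3, the urllib.request module provides the urlretrieve() function, which downloads remote data straight to a local file.
urlretrieve(url, filename=None, reporthook=None, data=None)
url: the address to download from.
filename: the local path to save to (if omitted, urllib generates a temporary file and saves the data there).
reporthook: a callback that is invoked once when the connection to the server is established and once more after each block of data has been transferred; it can be used to display download progress.
data: the data to POST to the server.
The function returns a two-element tuple (filename, headers), where filename is the local path the data was saved to and headers holds the server's response headers.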
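
As a rough illustration of the reporthook parameter, here is a minimal sketch of a progress callback. urlretrieve calls the hook with three arguments: the number of blocks transferred so far, the block size in bytes, and the total size of the file (which is -1 when the server sends no Content-Length). The names progress_percent, report_progress and download_with_progress are invented for this example.

```python
import sys
from urllib.request import urlretrieve

def progress_percent(block_num, block_size, total_size):
    """Return the download progress as a percentage, or None if the total size is unknown."""
    if total_size <= 0:
        return None
    # The last block is usually partial, so cap the byte count at the total size
    downloaded = min(block_num * block_size, total_size)
    return downloaded * 100.0 / total_size

def report_progress(block_num, block_size, total_size):
    """Callback with the three-argument signature that urlretrieve expects."""
    percent = progress_percent(block_num, block_size, total_size)
    if percent is None:
        sys.stdout.write('\rdownloaded about %d bytes' % (block_num * block_size))
    else:
        sys.stdout.write('\r%5.1f%%' % percent)
    sys.stdout.flush()

def download_with_progress(url, work_path):
    """Hypothetical wrapper: download url to work_path while printing progress."""
    return urlretrieve(url, work_path, reporthook=report_progress)
```

Calling download_with_progress(url, work_path) with a real URL and a local path would then print a percentage that updates in place as the blocks arrive.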

import os
from urllib.request import urlretrieve

# Save the Baidu home page as baidu.html in the current directory
url = 'http://www.baidu.com'
save_dir = os.path.abspath('.')
work_path = os.path.join(save_dir, 'baidu.html')
urlretrieve(url, work_path)

import os
from urllib.request import urlretrieve

# Download the Python 2.7.5 source archive into the current directory
url = 'http://www.python.org/ftp/python/2.7.5/Python-2.7.5.tar.bz2'
save_dir = os.path.abspath('.')
work_path = os.path.join(save_dir, 'Python-2.7.5.tar.bz2')
urlretrieve(url, work_path)
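
The (filename, headers) return value of urlretrieve can also be inspected without any network access, because the function accepts file:// URLs. The sketch below writes a small temporary file, retrieves it through such a URL, and returns what urlretrieve reported. The function name copy_via_urlretrieve and the file names are made up for this example; with an http URL the headers would come from the server instead.

```python
import os
import tempfile
from pathlib import Path
from urllib.request import urlretrieve

def copy_via_urlretrieve(payload):
    """Fetch a temporary local file through a file:// URL and report the result."""
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / 'source.bin'
        source.write_bytes(payload)
        target = os.path.join(tmp, 'copy.bin')
        # urlretrieve returns the path it saved to plus a headers object
        filename, headers = urlretrieve(source.as_uri(), target)
        with open(filename, 'rb') as fh:
            saved = fh.read()
        # Header lookup is case-insensitive; the value is a string
        return os.path.basename(filename), headers['Content-Length'], saved

name, length, saved = copy_via_urlretrieve(b'hello world')
print(name, length, saved)
```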