A Huaban (花瓣网) Image Crawler in Python

Latest recommended article published on 2021-01-29 20:10:43

Laughing不够好

Article link: https://blog.csdn.net/u013388049/article/details/90922402

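The Huaban home page

The "cute avatars" category

Press F12 to inspect the page source: the script blocks contain many links.

Sure enough, those links point to the baby pictures.

Match the links with a regular expression, send a GET request to each address, and the job is done.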
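The matching step can be tried offline on a small sample before touching the live site. The sketch below is only an illustration: `sample_html` and its `example.com` URLs are made up, and `extract_image_urls` is a hypothetical helper name. It assumes, as the pattern in this post does, that image links appear as double-quoted strings starting with `https://img2.`.

```python
import re

# Same idea as the pattern used in this post: a double-quoted string that
# starts with https://img2. (the dot is escaped so it matches literally).
url_re = re.compile(r'"(https://img2\..+?)"')

# Hypothetical page fragment, invented for illustration only.
sample_html = '''
<script>
app.page = {"pins": [
  {"src": "https://img2.example.com/a.jpg"},
  {"src": "https://img2.example.com/b.jpg"},
  {"src": "https://img2.example.com/a.jpg"},
  {"src": "https://static.example.com/logo.png"}
]};
</script>
'''

def extract_image_urls(html):
    """Return the unique matches, keeping first-seen order."""
    # dict.fromkeys drops duplicates without shuffling the order.
    return list(dict.fromkeys(url_re.findall(html)))

urls = extract_image_urls(sample_html)
print(urls)
```

The duplicate `a.jpg` entry is collapsed and the `static.example.com` link is ignored because it does not start with `https://img2.`.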

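import os
import re
import urllib.request

# Directory where the downloaded images are saved (adjust to your machine).
save_dir = "E:/space/python/Request/huaban/pic/"
# Image URLs appear as double-quoted strings inside the page's script blocks.
url_re = re.compile(r'"(https://img2\..+?)"')
url = 'https://huaban.com/explore/ertongtouxiang/'

def url_open(url):
    """Fetch a URL and return the raw response bytes."""
    with urllib.request.urlopen(url) as response:
        return response.read()

def get_img_adds(html):
    """Return the unique image URLs found in the decoded page source."""
    # dict.fromkeys removes duplicates while keeping the original order.
    return list(dict.fromkeys(url_re.findall(html)))

def save_img(save_dir, img_addrs, filename=0):
    """Download each image and save it as 0.jpg, 1.jpg, ... in save_dir."""
    for each in img_addrs:
        print("Downloading " + str(filename) + ".jpg")
        # Fetch first so a failed request does not leave an empty file behind.
        img = url_open(each)
        with open(os.path.join(save_dir, str(filename) + '.jpg'), 'wb') as f:
            f.write(img)
        filename += 1

def download_huaban_img():
    # Create the target directory if it does not exist yet.
    # (The original passed the boolean from os.path.exists() to os.mkdir().)
    os.makedirs(save_dir, exist_ok=True)
    html = url_open(url)
    img_addrs = get_img_adds(html.decode('utf-8'))
    save_img(save_dir, img_addrs)

if __name__ == '__main__':
    download_huaban_img()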
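
The script calls `urlopen` with a bare URL, no timeout, and no error handling, so one slow or rejected request stops the whole run. A possible hardening is sketched below. It is an assumption that this is needed for Huaban: the post does not say whether the site checks request headers. `USER_AGENT`, `build_request`, and `fetch` are hypothetical names introduced only for this example.

```python
import urllib.request

# Hypothetical header value; any browser-like string could be used here.
USER_AGENT = "Mozilla/5.0 (compatible; huaban-demo/0.1)"

def build_request(url):
    """Wrap a URL in a Request object that carries a User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})

def fetch(url, timeout=10):
    """Return the response bytes, or None if the request fails."""
    try:
        with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
            return resp.read()
    except OSError as exc:
        # URLError, HTTPError and socket timeouts all derive from OSError.
        print("Skipping " + url + ": " + str(exc))
        return None
```

If `fetch` were used in place of `url_open`, the download loop would also need to skip any URL for which it returns `None` instead of writing a file.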