Scraping photos from the Umei gallery (优美图库, umei.cc) and saving them to a folder

灯繁

Article tags: python

Article link: https://blog.csdn.net/weixin_52300580/article/details/110680149

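Scraping images

Preface

This post walks through a worked example of scraping images. It is fairly simple; if anything is wrong, corrections from more experienced readers are welcome.

Note: the body of the article follows; the example below is provided for reference.

I. The code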

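# Listing pages to crawl, from
# https://www.umei.cc/bizhitupian/huyanbizhi/2.htm
# to
# https://www.umei.cc/bizhitupian/huyanbizhi/13.htm

import os
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.67 Safari/537.36 Edg/87.0.664.47'
}
# range(2, 14) covers pages 2 through 13 inclusive
urls = ['https://www.umei.cc/bizhitupian/huyanbizhi/{}.htm'.format(i) for i in range(2, 14)]
num = 0  # number of images saved so far, shared across all listing pages


def get_main(url):
    """Download every image linked from one listing page into the folder `file`."""
    global num
    response = requests.get(url, headers=headers, timeout=7)
    response.encoding = 'utf-8'
    main_page = BeautifulSoup(response.text, "html.parser")
    type_list = main_page.find("div", attrs={"class": "TypeList"})
    if type_list is None:
        print('No image list found on ' + url)
        return
    alst = type_list.find_all("a", attrs={"class": "TypeBigPics"})

    for a in alst:
        # href may be relative, so resolve it against the listing page URL
        href = urljoin(url, a.get("href"))
        print('Downloading image ' + str(num + 1) + ' (listing page ' + url + ')')
        try:
            response1 = requests.get(href, headers=headers, timeout=7)
            response1.encoding = 'utf-8'
            child_page = BeautifulSoup(response1.text, "html.parser")
            # Raises AttributeError if the detail page has no ImageBody div or no img tag
            src = child_page.find("div", attrs={"class": "ImageBody"}).find("img").get("src")
            if src is None:
                continue
            pic = requests.get(urljoin(href, src), headers=headers, timeout=7)
        except (requests.RequestException, AttributeError):
            print('Error')
            continue
        # os.path.join picks the right separator; note the dot before the extension
        path = os.path.join(file, str(num + 1) + '.jpg')
        with open(path, 'wb') as fp:
            fp.write(pic.content)
        num += 1


if __name__ == '__main__':
    file = input('Enter the name of the folder to create: ')
    # Keep asking until the name does not clash with an existing path
    while os.path.exists(file):
        print('That folder already exists, please enter a new name')
        file = input('New folder name: ')
    os.mkdir(file)
    for url in urls:
        get_main(url)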
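
The file-writing step at the end of the loop can be looked at in isolation. The following is a minimal sketch, not part of the original script: `save_image` is a hypothetical helper name, and the placeholder byte string stands in for `pic.content`. It builds the path with `os.path.join`, so the path separator and the `.jpg` extension come out right on any operating system.

```python
import os
import tempfile


def save_image(folder, index, data):
    """Write raw image bytes to <folder>/<index>.jpg and return the path.

    save_image is a hypothetical helper used for illustration only.
    """
    path = os.path.join(folder, str(index) + '.jpg')
    # The with-block closes the file even if write() raises
    with open(path, 'wb') as fp:
        fp.write(data)
    return path


# Demo with placeholder bytes in a temporary folder (no network access needed)
demo_folder = tempfile.mkdtemp()
saved_path = save_image(demo_folder, 1, b'fake image bytes')
print(os.path.basename(saved_path))
```

Using a `with` block instead of separate `open` and `close` calls means the file handle is released even when the write fails partway.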

II. Usage steps

1. Import the libraries

The code is as follows (example):

import requests
from bs4 import BeautifulSoup
import os

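2. Read in the data

The overall approach is similar to the previous post. I prefer parsing pages with Beautiful Soup, mainly because I am not very good with regular expressions, and this approach is quite convenient.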
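
To make the Beautiful Soup approach concrete without touching the network, here is a small sketch that runs the same `find` and `find_all` calls on an inline HTML string. The HTML and its `/example/...` links are made up for illustration; only the class names `TypeList` and `TypeBigPics` come from the script above.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML that imitates the structure the script looks for
SAMPLE_HTML = """
<div class="TypeList">
  <a class="TypeBigPics" href="/example/1.htm">one</a>
  <a class="TypeBigPics" href="/example/2.htm">two</a>
  <a class="Other" href="/example/ignored.htm">ignored</a>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
# find returns the first matching tag, find_all returns every match inside it
type_list = soup.find("div", attrs={"class": "TypeList"})
links = [a.get("href") for a in type_list.find_all("a", attrs={"class": "TypeBigPics"})]
print(links)
```

Selecting by tag name and class this way avoids writing a regular expression for the page markup, which is the convenience mentioned above.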

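The notable addition is creating a folder, which the previous post did not really cover. The folder is created as shown here; the folder name could also be defined in advance instead of being asked for.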

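file = input('Enter the name of the folder to create: ')
# Keep asking until the name does not clash with an existing path
while os.path.exists(file):
    print('That folder already exists, please enter a new name')
    file = input('New folder name: ')
os.mkdir(file)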
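
As a sketch of the "define the name in advance" alternative mentioned above: with a fixed name, `os.makedirs(..., exist_ok=True)` creates the folder if it is missing and does nothing if it is already there, so no prompt is needed. The folder name `umei_images` is a hypothetical choice, and the temporary base directory is only there to keep the demo from cluttering the working directory.

```python
import os
import tempfile

# Hypothetical fixed folder name inside a temporary base directory
base_dir = tempfile.mkdtemp()
folder = os.path.join(base_dir, 'umei_images')

# exist_ok=True makes the call safe to repeat: no error if the folder already exists
os.makedirs(folder, exist_ok=True)
os.makedirs(folder, exist_ok=True)
print(os.path.isdir(folder))
```

The trade-off is that rerunning the scraper writes into the same folder, so files with the same numbered names would be overwritten.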

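Another point is that this post wraps the scraping logic in a function, get_main, and calls it under the main guard, so it only runs when the file is executed as a script:

if __name__ == '__main__':
    get_main(url)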
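
A minimal, network-free sketch of how the guard fits together with the list of listing pages. `build_urls` and `main` are hypothetical names introduced for illustration; the URL pattern and the page range 2 to 13 are taken from the comments at the top of the script, and the loop body only prints each URL where the real script would download from it.

```python
def build_urls(first_page, last_page):
    """Return the listing-page URLs from first_page to last_page inclusive."""
    template = 'https://www.umei.cc/bizhitupian/huyanbizhi/{}.htm'
    # range() excludes its upper bound, hence the + 1
    return [template.format(i) for i in range(first_page, last_page + 1)]


def main():
    # Placeholder for the real work: only print the pages that would be visited
    for url in build_urls(2, 13):
        print(url)


# Runs only when the file is executed directly, not when it is imported
if __name__ == '__main__':
    main()
```

Keeping the loop inside `main` means that importing the module elsewhere does not start any downloads.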

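Summary

That is all for today. This post only briefly introduced how to scrape images with requests and Beautiful Soup and save them to a folder. It should be fairly clear and simple; if anything is wrong, corrections are welcome.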
