Python Web Scraping in Practice (Part 2): Umei Gallery (优美图库) with bs4 - CSDN Blog

Article link: https://blog.csdn.net/skyllerone/article/details/122451856

Contents

1. Introduction
2. Code
3. Notes
4. Conclusion

1. Introduction

This is perfectly respectable code for a perfectly respectable website!!!
This time the image scraper is built with BeautifulSoup4 (bs4).
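The scraper relies on just three bs4 calls: `find()`, `find_all()` and `Tag.get()`. Here is a minimal sketch of how they behave, run against a small hand-written HTML snippet. The markup is hypothetical, modeled only on the `TypeList` class name the script looks for, and the live page may be structured differently.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modeled on the class name the scraper targets;
# the real site's HTML may differ.
html = """
<div class="TypeList">
  <a href="/p/gaoqing/1.htm">first</a>
  <a href="/p/gaoqing/2.htm">second</a>
</div>
"""

# Naming the parser explicitly avoids bs4's "no parser was explicitly specified" warning
soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching Tag, or None; class_ filters on the CSS class
type_list = soup.find("div", class_="TypeList")

# find_all() returns every matching descendant as a list-like ResultSet
links = type_list.find_all("a")

# Tag.get() reads an attribute value and returns None if the attribute is absent
hrefs = [a.get("href") for a in links]
print(hrefs)                     # ['/p/gaoqing/1.htm', '/p/gaoqing/2.htm']
print(links[0].get("title"))     # None
```

Note that the hrefs come back exactly as written in the HTML, so root-relative links still need the site's base URL in front of them before they can be requested.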

2. Code

# 1. Fetch the main page source and extract the sub-page links (href)
# 2. Follow each href to its sub-page and find the image download URL (img -> src)
# 3. Download the image

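import os
import time
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

url = "https://www.umei.cc/p/gaoqing/"
res = requests.get(url, timeout=10)  # a timeout keeps the script from hanging forever
res.raise_for_status()  # stop early on 4xx/5xx instead of parsing an error page
res.encoding = 'utf-8'  # set the encoding explicitly to avoid garbled text
# print(res)
# print(res.text)
# Hand the page source to BeautifulSoup; naming the parser avoids a bs4 warning
main_page = BeautifulSoup(res.text, "html.parser")
alist = main_page.find("div", class_="TypeList").find_all("a", href=True)  # only <a> tags that have an href
# print(alist)

os.makedirs("img", exist_ok=True)  # open() below fails if the img/ folder is missing

for a in alist:
    # get() reads an attribute value; urljoin() handles both relative and absolute links
    href = urljoin(url, a.get("href"))

    # Fetch the sub-page source
    child_page_res = requests.get(href, timeout=10)
    child_page_res.raise_for_status()
    child_page_res.encoding = 'utf-8'
    child_page_text = child_page_res.text

    # Find the image download URL on the sub-page
    child_page = BeautifulSoup(child_page_text, "html.parser")
    p = child_page.find("div", class_="ImageBody")
    img = p.find("img") if p is not None else None
    if img is None or not img.get("src"):
        continue  # skip sub-pages whose layout doesn't match
    src = urljoin(href, img.get("src"))

    # Download the image
    img_res = requests.get(src, timeout=10)
    img_res.raise_for_status()
    # img_res.content holds the raw bytes of the image
    img_name = src.split("/")[-1]  # the part of the URL after the last "/"
    with open(os.path.join("img", img_name), "wb") as f:  # 'wb' = write binary, needed for image data
        f.write(img_res.content)  # write the image bytes to the file

    print("over!!", img_name)
    time.sleep(1)  # wait a second between pages so the server doesn't block you
print("all_over!!")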
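One way to make steps 1 and 2 easier to check is to pull the parsing out into plain functions that take HTML text and return URLs, leaving the network requests in the loop. The sketch below assumes the same `TypeList` / `ImageBody` layout the script relies on; the function names and the sample HTML (including the `img.example.com` host) are hypothetical. It also names files from the URL path only, since `src.split("/")[-1]` would keep a `?query` suffix, which is not a valid filename character on Windows.

```python
import posixpath
from urllib.parse import urljoin, urlsplit

from bs4 import BeautifulSoup


def extract_links(html, base_url):
    """Step 1 (hypothetical helper): absolute URLs of <a href> tags inside div.TypeList."""
    soup = BeautifulSoup(html, "html.parser")
    box = soup.find("div", class_="TypeList")
    if box is None:  # layout changed, or the server returned an error page
        return []
    return [urljoin(base_url, a["href"]) for a in box.find_all("a", href=True)]


def extract_image_url(html, page_url):
    """Step 2 (hypothetical helper): absolute src of the first <img> in div.ImageBody, or None."""
    soup = BeautifulSoup(html, "html.parser")
    body = soup.find("div", class_="ImageBody")
    img = body.find("img") if body is not None else None
    if img is None or not img.get("src"):
        return None
    return urljoin(page_url, img["src"])


def filename_from_url(url):
    """Last path segment of a URL, ignoring any ?query or #fragment."""
    return posixpath.basename(urlsplit(url).path)


# Offline check against hypothetical markup (not the live site's HTML)
list_html = '<div class="TypeList"><a href="/p/gaoqing/1.htm">x</a><a>no href</a></div>'
page_html = '<div class="ImageBody"><img src="//img.example.com/a/b/pic.jpg?w=800"></div>'

links = extract_links(list_html, "https://www.umei.cc/p/gaoqing/")
print(links)  # ['https://www.umei.cc/p/gaoqing/1.htm']

src = extract_image_url(page_html, links[0])
print(src)  # https://img.example.com/a/b/pic.jpg?w=800
print(filename_from_url(src))  # pic.jpg
```

With the parsing isolated like this, a change in the site's layout shows up as an empty list or `None` that the loop can skip, rather than an `AttributeError` halfway through a crawl.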