Fetching Images with a Python Crawler: If You Can Read, You Can Learn It (Most Detailed Guide)

水水不是睡睡

Last modified 2022-11-30 13:23:31


First published 2022-11-30 13:20:25

Article link: https://blog.csdn.net/a2668070244/article/details/128113268

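The first step in scraping images is to open the website and view its page source.

Note: do not use scraped content for profit, and do not maliciously redistribute it online.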

Step 1: open your Python IDE.

Click Python Packages at the bottom of the IDE window to install the modules.

First module: requests

Second module: beautifulsoup4

Third module: selenium
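
Before moving on, it can help to confirm that the three packages are importable from the interpreter your project uses. The sketch below is one way to check, using only the standard library. The helper name `check_modules` is made up for this illustration. Note that beautifulsoup4 is imported under the name `bs4`.

```python
import importlib.util


def check_modules(names):
    """Return {module_name: True/False} depending on whether each can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# beautifulsoup4 installs a package that is imported as "bs4"
for name, found in check_modules(["requests", "bs4", "selenium"]).items():
    print(name, "OK" if found else "missing, install it first")
```

If any line reports a missing module, install it again and make sure the IDE is pointing at the same interpreter.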

Create a Python file

Import the modules

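import os  # needed by os.path.basename in parse_and_download
import requests
from bs4 import BeautifulSoup
import selenium  # installed above, but not used in this script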

Define a function that fetches the content of the website

def craw_html(url):
    resp = requests.get(url)
    resp.encoding = 'gbk'    # if the text comes back garbled, change the encoding
    print(resp.status_code)  # 200 means the request succeeded and the page was served without being blocked
    html = resp.text
    return html
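
The `resp.encoding = 'gbk'` line matters because the same bytes decode to different text under different encodings. The following is a small offline illustration of that effect. The sample string is made up for the demo and is not fetched from the site.

```python
# A made-up sample string standing in for page text (an assumption for this demo)
sample = "高清动漫壁纸"

raw = sample.encode("gbk")  # the bytes a GBK-encoded page would send

as_utf8 = raw.decode("utf-8", errors="replace")  # wrong guess: garbled text
as_gbk = raw.decode("gbk")                       # right guess: original text

print(as_utf8 == sample)  # False
print(as_gbk == sample)   # True
```

If you are unsure which encoding a page uses, requests also exposes `resp.apparent_encoding`, a guess based on the response body, which can be a reasonable starting point.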

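def parse_and_download(html):
    # Parse the image addresses out of the page
    soup = BeautifulSoup(html, "html.parser")
    imgs = soup.find_all("img")
    os.makedirs("美女图片", exist_ok=True)  # make sure the target folder exists
    for img in imgs:  # loop over every <img> tag
        src = img.get("src", "")  # an <img> tag may have no src attribute
        if "/uploads/" not in src:
            continue  # skip images whose path is not under /uploads/
        src = f"https://pic.netbian.com{src}"
        print(src)
        # Derive the local file name from the image URL
        filename = os.path.basename(src)
        # Download first, then write, so a failed request leaves no empty file
        resp_img = requests.get(src)
        with open(f"美女图片/{filename}", "wb") as f:
            f.write(resp_img.content)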
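
The URL handling inside the loop can also be tried on its own, without downloading anything. This sketch uses `urljoin` and `urlparse` from the standard library instead of the f-string above, which is a substitution rather than the post's method. The helper name `image_url_and_filename` and the sample `src` value are hypothetical.

```python
import os
from urllib.parse import urljoin, urlparse

BASE_URL = "https://pic.netbian.com"


def image_url_and_filename(src):
    """Turn a site-relative src into (absolute URL, local file name)."""
    full_url = urljoin(BASE_URL, src)
    # Use only the path part so a query string does not end up in the file name
    filename = os.path.basename(urlparse(full_url).path)
    return full_url, filename


# Hypothetical src value, shaped like the "/uploads/" paths the loop keeps
print(image_url_and_filename("/uploads/allimg/sample.jpg"))
```

Taking the file name from the parsed path keeps any query string out of it, which the plain `os.path.basename(src)` call would not do.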

urls = ["https://pic.netbian.com/4kdongman/"] + [  # page 1, then pages 2 through 123
    f"https://pic.netbian.com/4kdongman/index_{i}.html"
    for i in range(2, 124)  # the end of range() is exclusive
]
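
Since `range` excludes its end value, the last page number is easy to get wrong by one. A small helper makes the intended last page explicit. `page_urls` and its `last_page` parameter are names invented for this sketch, and 123 is taken from the comment in the post, not checked against the site.

```python
BASE = "https://pic.netbian.com/4kdongman/"


def page_urls(last_page):
    """Build the listing URLs for pages 1 through last_page (inclusive)."""
    # Page 1 is the bare directory URL; later pages are index_2.html, index_3.html, ...
    return [BASE] + [f"{BASE}index_{i}.html" for i in range(2, last_page + 1)]


pages = page_urls(123)
print(len(pages))  # 123
print(pages[-1])   # https://pic.netbian.com/4kdongman/index_123.html
```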

for url in urls:
    print("Crawling", url)
    html = craw_html(url)
    parse_and_download(html)