Scraping Web Images with Python

Latest recommended article published 2023-10-06 22:07:56

超哥--

Tags: python

Article link: https://blog.csdn.net/weixin_50835854/article/details/113845651

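This post records how a beginner used a Python crawler to scrape pictures from the 4K美女 (4K beauties) category of 彼岸图网 (pic.netbian.com). It fetches the page at a given URL, analyzes the HTML, follows each href to reach the high-resolution image link, and adds pagination. It ends with the complete code, which shows how the images are saved. Getting started with scraping is simple, and practice is all it takes to get comfortable with it.

Summary generated by CSDN using AI technology.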

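Preface

I recently finished learning about web crawlers, and as a regular guy (lsp), scraping pictures of pretty women from the web felt like an obligatory first project. This is the first crawler I have written on my own. It took fifteen hours, and it just about gets the job done.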

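I. Getting the URL

I don't recommend scraping Baidu directly: it has a huge number of images from many different sources, so analyzing its HTML takes more skill. A small or mid-sized image site is an easier target. I scraped 彼岸图网: url=http://pic.netbian.com/ The crawler uses the requests library, so I went one level deeper into the site and started from a specific category page, 4K美女 (4K beauties):

url=http://pic.netbian.com/4kmeinv/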
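A minimal sketch of fetching that category page with requests might look like the following. The helper name `fetch_html` is hypothetical (it is not part of the code in this post), and the 30-second timeout simply mirrors the value used in the full script.

```python
import requests

BASE_URL = "http://pic.netbian.com/"
CATEGORY_URL = BASE_URL + "4kmeinv/"


def fetch_html(url, timeout=30):
    """Download one page and return its text (hypothetical helper)."""
    response = requests.get(url, timeout=timeout)
    # Raise requests.HTTPError on 4xx/5xx instead of parsing an error page.
    response.raise_for_status()
    # Let requests guess the encoding from the body, so that Chinese text
    # is decoded correctly even if the HTTP headers do not declare a charset.
    response.encoding = response.apparent_encoding
    return response.text
```

Calling `fetch_html(CATEGORY_URL)` returns the HTML of the first listing page as a string, which is what the parsing step works on.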

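II. Analyzing the HTML

1. Scraping the href

I first tried downloading the jpg files referenced directly on the listing page, but they are low-resolution thumbnails: set as a desktop wallpaper, they turn into a blur of pixels. So instead I collect the hyperlink (href) that each thumbnail points to, visit that detail page, and download the image from there.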
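The two-step idea (listing page to detail page, detail page to image) can be sketched as below. The sample HTML and its paths are made up for illustration and only assume that the thumbnails sit inside a `<ul class="clearfix">`, which is what the full script looks for. The function names `detail_page_urls` and `first_image_url` are hypothetical.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "http://pic.netbian.com/"

# Made-up markup standing in for the thumbnail list on a listing page.
SAMPLE_LIST_HTML = """
<ul class="clearfix">
  <li><a href="/tupian/1001.html"><img src="/uploads/small/1001.jpg"></a></li>
  <li><a href="/tupian/1002.html"><img src="/uploads/small/1002.jpg"></a></li>
</ul>
"""


def detail_page_urls(list_html, base_url=BASE_URL):
    """Return the absolute URL of every detail page linked from the list."""
    soup = BeautifulSoup(list_html, "html.parser")
    ul = soup.find("ul", class_="clearfix")
    if ul is None:
        return []
    # href=True keeps only <a> tags that actually carry an href attribute.
    return [urljoin(base_url, a["href"]) for a in ul.find_all("a", href=True)]


def first_image_url(detail_html, base_url=BASE_URL):
    """Return the absolute URL of the first <img> on a detail page, or None."""
    soup = BeautifulSoup(detail_html, "html.parser")
    img = soup.find("img", src=True)
    return urljoin(base_url, img["src"]) if img else None
```

Using `urljoin` rather than string concatenation means it does not matter whether an href starts with a slash or not.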

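2. Pagination

Looking at the page URLs shows the pattern: the first page is index.html, and later pages are index_2.html, index_3.html, and so on. A loop over the page number is all it takes.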
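The URL pattern can be captured in a small helper, sketched here. The names `list_page_url` and `list_page_urls` are hypothetical, and the last page number has to be supplied by the caller because it changes as the site adds pictures.

```python
from urllib.parse import urljoin

CATEGORY_URL = "http://pic.netbian.com/4kmeinv/"


def list_page_url(page, category_url=CATEGORY_URL):
    """Return the listing-page URL for a 1-based page number."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    # Page 1 is index.html; every later page is index_<n>.html.
    name = "index.html" if page == 1 else f"index_{page}.html"
    return urljoin(category_url, name)


def list_page_urls(last_page, category_url=CATEGORY_URL):
    """Return the URLs of pages 1 through last_page, in order."""
    return [list_page_url(p, category_url) for p in range(1, last_page + 1)]
```

For example, `list_page_urls(3)` yields the URLs ending in index.html, index_2.html, and index_3.html.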

III. Writing the Code

Without further ado, here is the code.

import os
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

BASE_URL = "http://pic.netbian.com/"

def GetPicture(src):
    # Download one image, given the src attribute of its <img> tag.
    # urljoin prevents a doubled slash when src already starts with "/".
    url = urljoin(BASE_URL, src)
    root = "./"
    path = os.path.join(root, url.split("/")[-1])
    try:
        os.makedirs(root, exist_ok=True)
        if os.path.exists(path):
            print("File already exists")
            return
        r = requests.get(url, timeout=30)
        r.raise_for_status()
        # The with block closes the file, so no explicit close() is needed.
        with open(path, "wb") as f:
            f.write(r.content)
        print("File saved")
    except (requests.RequestException, OSError):
        print("Download failed")

def Gethtml(html):
    # Return the href of every link inside the thumbnail list.
    ls = []
    soup = BeautifulSoup(html, "html.parser")
    ul = soup.find("ul", class_="clearfix")
    if ul is None:
        return ls
    # find_all yields only Tag objects, which skips the NavigableString
    # children of the <ul>.
    for a in ul.find_all("a"):
        href = a.get("href")
        if href:
            ls.append(href)
    return ls

def ScrapeListPage(list_url):
    # Visit every detail page linked from one listing page and save its image.
    r = requests.get(list_url, timeout=30)
    r.encoding = r.apparent_encoding
    for href in Gethtml(r.text):
        r = requests.get(urljoin(BASE_URL, href), timeout=30)
        r.encoding = r.apparent_encoding
        soup = BeautifulSoup(r.text, "html.parser")
        # Only the first <img> on the detail page is downloaded.
        link = soup.find("img")
        if link is not None and link.get("src"):
            print(link.get("src"))
            GetPicture(link.get("src"))

def FirstPage():
    ScrapeListPage("http://pic.netbian.com/4kmeinv/index.html")

def untilLast():
    # Pages 2 through 173; the upper bound is hard-coded.
    for a in range(2, 174):
        ScrapeListPage("http://pic.netbian.com/4kmeinv/index_" + str(a) + ".html")

def main():
    FirstPage()
    untilLast()

if __name__ == "__main__":
    main()

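If the site's layout has not changed, you can copy the code above and run it as-is to download the pictures.

Summary

Basic web scraping is fairly simple. It just takes plenty of practice.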
