Scraping Pictures of Girls from an Image Site with Kali

编程瞬息全宇宙

Published 2024-06-05 16:48:34

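Web scraping is one of the things Python does well. This article shares a simple scraper that downloads pictures of girls from an image gallery site.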

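Environment

Python 3.8
BeautifulSoup (imported as bs4)

On Kali both are installed by default, so there is nothing to set up. On any other system, install whatever is missing with pip, for example: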

pip3 install beautifulsoup4
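Before running the scraper on a system other than Kali, it can help to confirm that the dependency is actually importable. The following is a minimal sketch and not part of the original script; the helper name `missing_modules` is hypothetical.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # "bs4" is the import name provided by the beautifulsoup4 package.
    missing = missing_modules(["bs4"])
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All dependencies are available.")
```

If the check reports bs4 as missing, install it with pip and run the check again.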

Usage

Save the code below as a *.py file and run it directly on Kali.

import os
import re
import time
from urllib import request

from bs4 import BeautifulSoup


def get_last_page(text):
    # Take the number after the last "/" in the heading text as the page count.
    return int(re.findall(r'[^/$]\d*', re.split('/', text)[-1])[0])


def html_parse(url, headers):
    # Wait 3 seconds before each request to go easy on the site.
    time.sleep(3)
    req = request.Request(url=url, headers=headers)
    res = request.urlopen(req)
    html = res.read().decode("utf-8")
    return BeautifulSoup(html, "html.parser")


headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36 Edg/91.0.864.59'
}

url = "https://www.2meinv.com/"
for p in range(1, 5 + 1):  # only the first 5 index pages are scraped here
    next_url = url + "index-" + str(p) + ".html"
    soup = html_parse(next_url, headers)
    link_node = soup.find_all('div', attrs={"class": "dl-name"})
    for a in link_node:
        link = a.find('a', attrs={'target': '_blank'})
        href = link.get('href')
        title = link.text
        # Article number, read from a link of the form ".../article-<no>.html".
        no = re.search(r'article-(\d+)', href).group(1)
        # Directory where this gallery's images are stored.
        path = "/root/image/2meinv/" + title + "/"
        soup = html_parse(href, headers)
        count = soup.find('div', attrs={'class': 'des'}).find('h1').text
        last_page = get_last_page(count)
        os.makedirs(path, exist_ok=True)
        for i in range(1, last_page + 1):
            # url already ends with "/", so no extra slash is added here.
            next_url = url + "article-" + no + "-" + str(i) + ".html"
            soup = html_parse(next_url, headers)
            image_url = soup.find('img')['src']
            image_name = image_url.split("/")[-1]
            fileName = path + image_name
            if os.path.exists(fileName):
                continue  # skip images that were already downloaded
            request.urlretrieve(image_url, filename=fileName)
            request.urlcleanup()
        print(title, "download complete")
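The two regular-expression steps in the script, pulling the article number out of a link and the page count out of the heading, are the easiest parts to get wrong, so it is worth checking them offline before sending any requests. The sketch below is not part of the original script: the helper names `article_number`, `page_count`, and `page_urls` are hypothetical, and the sample link and heading text are assumptions about what the site returns, inferred from how the script builds its URLs.

```python
import re

BASE_URL = "https://www.2meinv.com/"


def article_number(href):
    """Return the digits after 'article-' in a link, or None if absent."""
    match = re.search(r"article-(\d+)", href)
    return match.group(1) if match else None


def page_count(heading):
    """Return the first run of digits after the last '/' in the heading text."""
    match = re.search(r"\d+", heading.split("/")[-1])
    if match is None:
        raise ValueError("no page count found in: " + repr(heading))
    return int(match.group())


def page_urls(no, last_page):
    """Build the per-image page URLs the script visits for one gallery."""
    return [f"{BASE_URL}article-{no}-{i}.html" for i in range(1, last_page + 1)]


if __name__ == "__main__":
    # Sample inputs are made up for illustration, not taken from the site.
    sample_href = BASE_URL + "article-3456.html"
    sample_heading = "Sample title (1/3)"
    no = article_number(sample_href)
    for page_url in page_urls(no, page_count(sample_heading)):
        print(page_url)
```

Returning None or raising an error when the pattern is absent makes a layout change on the site show up as a clear failure instead of a silently wrong URL.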