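For how to build a cat-face detector with OpenCV in Python, see my earlier article, which covers it in some detail:
Detecting cat faces with OpenCV in Python and drawing a box around each one (小琼带你轻松学编程's blog, CSDN)
For the web-scraping code, see my two earlier articles, which both walk through it:
Scraping Lianjia data with CSS selectors in Python and writing it to a CSV file (小琼带你轻松学编程's blog, CSDN)
Scraping web page data with XPath in Python (小琼带你轻松学编程's blog, CSDN)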
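First import the required libraries. If you do not have fake_useragent, comment out the line that builds headers and drop the headers argument from the requests.get() calls further down.
If cv2, requests or lxml is missing, install them with:
pip install opencv-python
pip install requests
pip install lxml
The imports are as follows: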
import cv2
import fake_useragent
import requests
from lxml import etree
import os
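Instead of commenting code out when fake_useragent is missing, one option is to fall back to a fixed User-Agent string. This is only a sketch and not part of the original script: the helper name build_headers and the fallback string are assumptions made for illustration.

```python
def build_headers():
    """Return request headers, using fake_useragent when it is available."""
    # Hypothetical fallback value; any realistic browser User-Agent string would do.
    fallback = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
    try:
        import fake_useragent  # optional dependency
        user_agent = fake_useragent.FakeUserAgent().random
    except Exception:
        # ImportError when the package is missing, or a library error
        # when it cannot produce a User-Agent string.
        user_agent = fallback
    return {"User-Agent": user_agent}


headers = build_headers()
print(headers)
```

With this in place, the rest of the script can keep passing headers=headers whether or not fake_useragent is installed.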
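Next, set the request headers and load the cat-face detector. My earlier articles explain both, so I won't repeat the details here. The code is:
# Request headers
headers = {"User-Agent": fake_useragent.FakeUserAgent().random}  # global variable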
classifier = cv2.CascadeClassifier("./haarcascade_frontalcatface.xml")
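cv2.CascadeClassifier typically does not raise an error when the XML file is missing; the problem tends to show up only later, when detectMultiScale is called. A minimal sketch of an early check follows. The function name load_cascade is hypothetical, and the code assumes the cascade file sits at whatever path you pass in.

```python
import os


def load_cascade(xml_path):
    """Load a Haar cascade, failing loudly if the file is missing or invalid."""
    if not os.path.isfile(xml_path):
        raise FileNotFoundError(f"cascade file not found: {xml_path}")
    import cv2  # imported here so the path check works even without OpenCV
    classifier = cv2.CascadeClassifier(xml_path)
    if classifier.empty():  # True when no cascade was loaded
        raise ValueError(f"could not load a cascade from: {xml_path}")
    return classifier
```

A call such as load_cascade("./haarcascade_frontalcatface.xml") could then replace the direct constructor call above.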
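Set the folder where images are saved and read back, along with the URL prefix used by the scraper.
path = r"../cat2/"
# URL scheme prepended to each scraped image address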
uurl='https:'
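The script writes into the folder named by path, so open() fails if that folder does not exist yet. A small sketch of creating it up front, assuming you are happy for the script to create the folder; the helper name ensure_output_dir is made up for illustration.

```python
import os


def ensure_output_dir(folder):
    """Create the output folder (and any missing parents) if needed."""
    os.makedirs(folder, exist_ok=True)  # no error if it already exists
    return folder
```

For example, path = ensure_output_dir(r"../cat2/") would keep the rest of the script unchanged.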
Scrape the image addresses from the page and save the images locally. My earlier articles explain what each line does. To save time, this run scrapes only one page; to scrape more pages, raise the upper bound of range() in the for loop (it is exclusive, so use the last page number plus one).
for i in range(1, 2):
    url = f"https://www.com/creative-image/mao/?page={i}"  # replace with the page you want to scrape cat pictures from
    response = requests.get(url, headers=headers)
    html = etree.HTML(response.text)  # parse with lxml
    Area = html.xpath('//*[@id="imageContent"]/section/div/figure/a/img/@data-src')
    for Area1 in Area:
        print(Area1)
        filename = Area1.split('/')[-1]
        print(path + filename)
        response1 = requests.get(uurl + Area1, headers=headers)
        with open(path + filename, 'wb') as f:
            f.write(response1.content)
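The download loop prepends uurl to every data-src value, which appears to assume the addresses are protocol-relative (starting with //). A hedged alternative is to let the standard library resolve them, which also copes with absolute addresses and query strings. The function names and the example src value below are assumptions, not taken from the site.

```python
import os
from urllib.parse import urljoin, urlparse


def absolute_image_url(page_url, src):
    """Resolve an image src (e.g. //host/a.jpg) against the page URL."""
    return urljoin(page_url, src)


def filename_from_url(url):
    """Take the last path segment, ignoring any query string."""
    return os.path.basename(urlparse(url).path)


page_url = "https://www.com/creative-image/mao/?page=1"
src = "//img.example.com/photos/cat_001.jpg?size=large"  # made-up example value
full_url = absolute_image_url(page_url, src)
print(full_url)
print(filename_from_url(full_url))
```

Using the URL path for the filename avoids saving files whose names contain a trailing query string.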
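Read each saved image back and run the detector on it. If a cat face is found, keep the file and print It is cat!; otherwise delete it and print It is not cat!.
        img = cv2.imread(path + filename)  # read the image
        if img is None:  # the download could not be decoded as an image
            os.remove(path + filename)
            continue
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        faceRects = classifier.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=3, minSize=(5, 5))
        if len(faceRects):  # more than 0 means a cat face was detected
            print(path + filename, 'It is cat!')
        else:
            os.remove(path + filename)
            print(path + filename, 'It is not cat!')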
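The keep-or-delete step above can also be pulled out of the download loop and run over a folder of already saved files. The sketch below is one way to do that, not the original script's structure: has_cat_face and keep_only_matching are hypothetical names, and the detector parameters are simply copied from the code above.

```python
import os


def has_cat_face(image_path, classifier):
    """Return True if the cascade finds at least one cat face in the image."""
    import cv2  # imported lazily so keep_only_matching works without OpenCV
    img = cv2.imread(image_path)
    if img is None:  # unreadable file or not an image
        return False
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = classifier.detectMultiScale(
        gray, scaleFactor=1.1, minNeighbors=3, minSize=(5, 5)
    )
    return len(faces) > 0


def keep_only_matching(image_paths, predicate):
    """Delete every file for which predicate(path) is false; return kept paths."""
    kept = []
    for image_path in image_paths:
        if predicate(image_path):
            kept.append(image_path)
        else:
            os.remove(image_path)
    return kept
```

A call might look like keep_only_matching(paths, lambda p: has_cat_face(p, classifier)), where paths is a list of saved image files and classifier is the loaded cascade. Passing the check in as a function makes it easy to try the deletion logic on test files first.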
The output looks like this:
The complete code is as follows:
import cv2
import fake_useragent
import requests
from lxml import etree
import os
# Request headers
headers = {"User-Agent": fake_useragent.FakeUserAgent().random}  # global variable
classifier = cv2.CascadeClassifier("./haarcascade_frontalcatface.xml")
# Folder where the images are saved
path = r"../cat2/"
# URL scheme prepended to each scraped image address
uurl = 'https:'
for i in range(1, 2):
    url = f"https://www.com/creative-image/mao/?page={i}"  # replace with the page you want to scrape cat pictures from
    response = requests.get(url, headers=headers)
    html = etree.HTML(response.text)  # parse with lxml
    Area = html.xpath('//*[@id="imageContent"]/section/div/figure/a/img/@data-src')
    for Area1 in Area:
        print(Area1)
        filename = Area1.split('/')[-1]
        print(path + filename)
        response1 = requests.get(uurl + Area1, headers=headers)
        with open(path + filename, 'wb') as f:
            f.write(response1.content)
        img = cv2.imread(path + filename)  # read the image
        if img is None:  # the download could not be decoded as an image
            os.remove(path + filename)
            continue
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        faceRects = classifier.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=3, minSize=(5, 5))
        if len(faceRects):  # more than 0 means a cat face was detected
            print(path + filename, 'It is cat!')
        else:
            os.remove(path + filename)
            print(path + filename, 'It is not cat!')