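The Pexels website offers a large collection of free photos. I searched it for pictures of beautiful women and wrote a crawler to download the results. Besides women, the downloaded images also contain men, landscapes, still lifes, and animals, so I call Baidu's face detection module on each image and save the women whose beauty score is above 60 to a separate folder. The crawler fetched 1,251 images in total, and 287 of them passed the filter. Here is the code: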
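Before the crawler itself, the filtering step described above can be sketched roughly as follows. This is only a minimal sketch under stated assumptions, not the code used for this post: `filter_by_score` and its `score_fn` parameter are hypothetical names, and `score_fn` stands in for whatever wraps the Baidu face detection call. It is assumed to return a beauty score for an image, or `None` when no female face is found.

```python
import shutil
from pathlib import Path
from typing import Callable, List, Optional


def filter_by_score(
    src_dir: Path,
    dst_dir: Path,
    score_fn: Callable[[Path], Optional[float]],
    threshold: float = 60.0,
) -> List[str]:
    """Copy every image in src_dir whose score is above threshold into dst_dir.

    score_fn is a hypothetical stand-in for the face detection call: it takes
    an image path and returns a beauty score, or None if the image should be
    skipped (no face, or not a woman).
    """
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    kept = []
    for image in sorted(src_dir.iterdir()):
        if not image.is_file():
            continue
        score = score_fn(image)
        # Keep only images scored strictly above the threshold.
        if score is not None and score > threshold:
            shutil.copy2(image, dst_dir / image.name)
            kept.append(image.name)
    return kept
```

Passing the scorer in as a function keeps the folder handling separate from the remote API call, so the copy logic can be tried out with a fake scorer before spending any API quota.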
Crawler program:
from bs4 import BeautifulSoup
import requests
import os
import time

save_path = 'F://photos/'
url_path = 'https://www.pexels.com/search/'
headers = {
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'user-agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.112 Safari/537.36'
}
searchWord = 'beauty'
urls = [url_path + searchWord + '/?page={}'.format(str(i)) for i in range(1, 100)]
if