A shut-in otaku here. I crawled 5,000 anime-girl images and stitched them into the header image.
Here is how it works:
Crawl 5,000 anime images with the Scrapy framework
Batch-format the images with OpenCV
Sort the images by the root mean square of their RGB values to produce the final effect
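The sorting idea in step 3 — ranking images by the root mean square of their pixel values, i.e. from dark to bright — can be sketched in pure Python. This is only an illustration of the metric; `rms_brightness` and the toy data are names I am introducing, not part of the project:

```python
import math

def rms_brightness(pixels):
    """RMS over all RGB channel values of one image (pixels: list of (r, g, b))."""
    flat = [c for px in pixels for c in px]
    return math.sqrt(sum(c * c for c in flat) / len(flat))

# Sort a toy "image list" from dark to bright.
images = {
    "dark":   [(10, 10, 10)],
    "mid":    [(120, 130, 125)],
    "bright": [(250, 240, 245)],
}
order = sorted(images, key=lambda k: rms_brightness(images[k]))
print(order)  # ['dark', 'mid', 'bright']
```

For real images the same function would run over every pixel of each 512x512 crop.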
I. Environment setup
1. Install the Scrapy crawler framework
pip install Scrapy
For installation on Windows, click here
2. OpenCV is best installed from a wheel; click here
3. Install the NumPy scientific computing library
pip install numpy
4. Initialize a Scrapy project named acg
scrapy startproject acg
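Before moving on, a quick way to confirm that all three dependencies are importable (just a sanity-check snippet, not part of the project):

```python
# Check that each dependency installed above can actually be imported.
status = {}
for name in ("scrapy", "cv2", "numpy"):
    try:
        __import__(name)
        status[name] = "ok"
    except ImportError:
        status[name] = "missing"
print(status)
```

If any entry shows "missing", revisit the corresponding install step above.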
II. Crawling the images
The code below mainly implements these operations:
center cropping
resizing to a uniform size
downloading images
crawling successive pages
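The center-crop geometry used by the spider can be sketched without OpenCV: scale the short side of the image up to the target size, then take a square from the top (portrait) or the horizontal center (landscape). `crop_box` is a helper name I am introducing for illustration; the spider itself inlines this logic:

```python
def crop_box(width, height, maxsize=512):
    """Mirror the spider's crop logic: scale the short side to maxsize,
    then return (scale, (x0, y0, x1, y1)) for a maxsize x maxsize crop."""
    if height > width:
        # Portrait: width becomes maxsize, keep the top square.
        scale = maxsize / width
        box = (0, 0, maxsize, maxsize)
    else:
        # Landscape or square: height becomes maxsize, center horizontally.
        scale = maxsize / height
        cx = round(width * scale * 0.5)
        box = (int(cx - maxsize / 2), 0, int(cx + maxsize / 2), maxsize)
    return scale, box

print(crop_box(1024, 512))  # landscape -> (1.0, (256, 0, 768, 512))
print(crop_box(512, 1024))  # portrait  -> (1.0, (0, 0, 512, 512))
```

Either way the result is a 512x512 tile, which is what makes the final mosaic possible.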
/image.py
import scrapy
import urllib.request, urllib.parse
import numpy as np
import cv2


class acgimages(scrapy.Spider):
    """Spider that crawls anime images and saves 512x512 center crops."""
    name = 'images'
    start_urls = [
        "http://m.52dmtp.com/tupiandaquan/index_2.html"
    ]
    count = 1
    page = 2

    def parse(self, response):
        def imageSave(item, path):
            try:
                maxsize = 512
                # Download the raw bytes and decode them into an OpenCV image.
                res = urllib.request.urlopen(item).read()
                image = np.asarray(bytearray(res), dtype="uint8")
                image = cv2.imdecode(image, cv2.IMREAD_COLOR)
                height, width = image.shape[:2]
                if height > width:
                    # Portrait: scale so the width becomes maxsize, keep the top square.
                    scalefactor = (maxsize * 1.0) / width
                    res = cv2.resize(image, (int(width * scalefactor), int(height * scalefactor)), interpolation=cv2.INTER_CUBIC)
                    cutImage = res[0:maxsize, 0:maxsize]
                else:
                    # Landscape or square: scale so the height becomes maxsize,
                    # then crop a horizontally centered square.
                    scalefactor = (maxsize * 1.0) / height
                    res = cv2.resize(image, (int(width * scalefactor), int(height * scalefactor)), interpolation=cv2.INTER_CUBIC)
                    center_x = int(round(width * scalefactor * 0.5))
                    cutImage = res[0:maxsize, int(center_x - maxsize / 2):int(center_x + maxsize / 2)]
                cv2.imwrite(path, cutImage)
                print('image is saved in ' + path)
            except Exception as e:
                print('image save error:', e)

        image_url = response.xpath("//div[@class='grid-wrap']//img/@src").extract()
        for item in image_url:
            item = item.split('?')[0]                      # drop the query string
            item = urllib.parse.quote(item, safe='/:?=.')  # percent-encode non-ASCII characters
            if 'jpg' in item:
                # Save under an incrementing file name (the path here is illustrative).
                imageSave(item, 'images/%d.jpg' % self.count)
                self.count += 1
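The URL clean-up inside the loop — dropping the query string, then percent-encoding any non-ASCII characters so urllib can open the address — works like this on its own (the URL below is made up for demonstration):

```python
from urllib.parse import quote

url = "http://m.52dmtp.com/tupiandaquan/某图.jpg?v=1"
url = url.split('?')[0]         # drop the query string
url = quote(url, safe='/:?=.')  # percent-encode the non-ASCII path segment
print(url)  # http://m.52dmtp.com/tupiandaquan/%E6%9F%90%E5%9B%BE.jpg
```

Without the `quote` step, `urllib.request.urlopen` would raise an error on image URLs containing Chinese characters.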