I've been busy with image processing lately, and I constantly need operations like batch renaming, batch resizing, splitting images into tiles, augmenting data (rotation, stretching, noise, etc.), shuffling an image dataset, and splitting it into training and validation sets. Below is a summary of these operations:
(1)rename

import os

path = "I:\\Sample"
count = 1
# glob also works here:
# all_img = glob.glob(path + "\\*.jpg")
for item in os.listdir(path):
    src = os.path.join(path, item)
    dst = os.path.join(path, 'gan_' + str(count) + '.jpg')
    try:
        os.rename(src, dst)
        print("convert %s to %s" % (src, dst))
        count = count + 1
    except OSError:
        continue

Just change path to your own image directory. One more note: sometimes you need to create a folder first, which a single line handles:
os.makedirs(path, exist_ok=True)
(2)resize

import os
import cv2

def resize_pic():
    src_path = 'path to the original images'
    dst_path = 'path for the resized images'
    for item in os.listdir(src_path):
        if item.endswith('.jpg'):
            pic_path = os.path.join(src_path, item)
            img = cv2.imread(pic_path)
            dst = cv2.resize(img, (256, 256))  # adjust the target size as needed
            save_path = os.path.join(dst_path, item)
            cv2.imwrite(save_path, dst)

Just set the source and destination paths.
(3)crop

from PIL import Image

im = Image.open("path to the image to split")
# width and height of the image
img_size = im.size
print("image width and height: {}".format(img_size))
xx = 5  # adjust xx and yy as needed
yy = 5
x = img_size[0] // xx
y = img_size[1] // yy
for j in range(yy):
    for i in range(xx):
        left = i * x
        up = y * j
        right = left + x
        low = up + y
        region = im.crop((left, up, right, low))
        print((left, up, right, low))
        temp = str(i) + str(j)
        region.save("I:\\Sample\\crop\\" + temp + ".jpg")

This splits the image into an xx * yy grid of tiles.
(4)data augmentation

from keras.preprocessing.image import ImageDataGenerator, array_to_img, img_to_array, load_img

datagen = ImageDataGenerator(
    rotation_range=10,
    width_shift_range=0.1,
    height_shift_range=0.1,
    shear_range=0.1,
    zoom_range=0.1,
    horizontal_flip=True,
    fill_mode='nearest')

img = load_img('10.jpg')  # this is a PIL image
x = img_to_array(img)  # convert the PIL image to a numpy array of shape (height, width, 3)
x = x.reshape((1,) + x.shape)  # now a numpy array of shape (1, height, width, 3)

# the code below generates the augmented images
# all generated images are saved under the save_to_dir directory
i = 0
for batch in datagen.flow(x, batch_size=1,
                          save_to_dir='data_augmentation', save_prefix='datagen', save_format='jpg'):
    i += 1
    if i > 20:
        break  # otherwise the generator loops forever

This uses Keras rotations, shifts, and so on to augment the image data and reduce overfitting.
(5)shuffle

from sklearn.utils import shuffle

b_train, b_label = shuffle(b_train, b_label)  # b_train: training images, b_label: the corresponding labels

# to shuffle three (or more) lists together without breaking their correspondence,
# reset the same random seed before each shuffle
import random

x = ["a", "b", "c", "d"]
y = [0, 1, 2, 3]
z = ["m", "n", "o", "p"]
random.seed(2019)
random.shuffle(x)
random.seed(2019)
random.shuffle(y)
random.seed(2019)
random.shuffle(z)
x, y, z  # (['c', 'd', 'a', 'b'], [2, 3, 0, 1], ['o', 'p', 'm', 'n'])
(6)dataset split

from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=0)  # 30% test set, 70% training set
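As a quick sanity check on the split sizes, here is a runnable sketch with toy data invented for the demo:

```python
from sklearn.model_selection import train_test_split
import numpy as np

# hypothetical toy data: 10 samples of 2 features, with 10 labels
x = np.arange(20).reshape(10, 2)
y = np.arange(10)

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=0)
# with 10 samples and test_size=0.3, sklearn rounds the test set up to 3 samples
print(len(x_train), len(x_test))  # 7 3
```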
(7)one-hot encoding

import numpy as np
from keras.utils import to_categorical

data = [1, 3, 2, 0, 3, 2, 2, 1, 0, 1]  # demo; replace with your own label data
data = np.array(data)
encoded = to_categorical(data)  # one-hot encode
print(encoded)
inverted = np.argmax(encoded[0])  # decode
print(inverted)
(8)batching the data for training (with queues)

tf.train.slice_input_producer defines how samples are placed into the filename queue (number of epochs, whether to shuffle, etc.). To actually fill the queue you must also call tf.train.start_queue_runners to start the threads that populate it; only then can the compute ops read data out. Otherwise the filename queue stays empty, the compute ops wait forever, and the program hangs.

import tensorflow as tf

images = ['img1', 'img2', 'img3', 'img4', 'img5']
labels = [1, 2, 3, 4, 5]
epoch_num = 8
f = tf.train.slice_input_producer([images, labels], num_epochs=None, shuffle=False)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    for i in range(epoch_num):
        k = sess.run(f)
        print('************************')
        print(i, k)
    coord.request_stop()
    coord.join(threads)
tf.train.batch is a tensor queue generator: it pushes batch_size tensors, in the given order, into a queue to serve as one training batch, waiting for them to be dequeued for computation. Putting the two together:

import tensorflow as tf
import numpy as np

# number of samples
sample_num = 5
# number of epochs
epoch_num = 2
# number of samples per batch
batch_size = 3
# number of batches per epoch
batch_total = int(sample_num / batch_size) + 1

# generate sample_num images and labels
def generate_data(sample_num=sample_num):
    labels = np.asarray(range(0, sample_num))
    images = np.random.random([sample_num, 224, 224, 3])
    print('image size {}, label size: {}'.format(images.shape, labels.shape))
    return images, labels

def get_batch_data(batch_size=batch_size):
    images, label = generate_data()
    # cast to tf.float32 / tf.int32
    images = tf.cast(images, tf.float32)
    label = tf.cast(label, tf.int32)
    # take one tensor at a time (in order or shuffled) from the tensor list
    input_queue = tf.train.slice_input_producer([images, label], shuffle=False)
    image_batch, label_batch = tf.train.batch(input_queue, batch_size=batch_size, num_threads=1, capacity=64)
    return image_batch, label_batch

image_batch, label_batch = get_batch_data(batch_size=batch_size)
with tf.Session() as sess:
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess, coord)
    try:
        for i in range(epoch_num):  # each epoch
            print('************')
            for j in range(batch_total):  # each batch
                print('--------')
                # fetch batch_size samples and labels for this batch
                image_batch_v, label_batch_v = sess.run([image_batch, label_batch])
                print(image_batch_v.shape, label_batch_v)
    except tf.errors.OutOfRangeError:
        print("done")
    finally:
        coord.request_stop()
        coord.join(threads)
(9)adding a dimension

A common case: an RGB image is 3-dimensional, but inference expects a 4-dimensional input (with a batch dimension), so the image data needs its dimensionality raised.
The usual call is np.expand_dims(input, axis=0), which lifts the array into the [batch, height, width, channels] form.
With TensorFlow you can use tf.expand_dims(input, dim, name=None) directly; in PyTorch, torch.unsqueeze(input, dim=0) does the same.
Conversely, a dimension can be removed with np.squeeze(arr, 0), torch.squeeze(arr, dim=0), and so on.
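A minimal numpy sketch of the round trip (the shapes are invented for illustration):

```python
import numpy as np

img = np.zeros((224, 224, 3))        # a single RGB image, shape (H, W, C)
batch = np.expand_dims(img, axis=0)  # -> (1, 224, 224, 3): a batch of one image
print(batch.shape)
restored = np.squeeze(batch, 0)      # drop the batch dimension again
print(restored.shape)
```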
(10)several ways to read images

》Using Image from PIL. This does not read into array form, so you need np.asarray(im) or np.array(); the difference is that np.array() always copies the data, while np.asarray() avoids a copy when it can.

from PIL import Image
import numpy as np

I = Image.open('./cc_1.png')
I.show()
I.save('./save.png')
I_array = np.array(I)
print(I_array.shape)
image = Image.fromarray(I_array)
image.show()
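To see the copy / no-copy difference between np.array() and np.asarray(), here is a small sketch on a plain ndarray (for a PIL image both calls produce a fresh array, but between arrays the distinction matters):

```python
import numpy as np

src = np.zeros((2, 2), dtype=np.uint8)
a = np.array(src)    # always copies the data
b = np.asarray(src)  # no copy when the dtype already matches: b shares src's memory
src[0, 0] = 255
print(a[0, 0])  # 0   -> the copy is unaffected
print(b[0, 0])  # 255 -> the view sees the change
```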
》Using matplotlib.pyplot (plt) to display images and matplotlib.image (mpimg) to read them
# mpimg reads directly into array form

import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import numpy as np

I = mpimg.imread('./cc_1.png')
print(I.shape)
plt.imshow(I)
plt.show()

Note:
color images: plt.imshow(image_show)
grayscale images: plt.imshow(image_show, cmap='gray')
》Using the opencv-python interface
# cv2.imread() also returns an array, but with channels in BGR order; by default, even a single-channel image is read as three channels

import cv2

I = cv2.imread('./cc_1.png')
print(I.shape)
》For image I/O I generally like scipy's helpers, which read into a matrix stored in (H, W, C) order. (Note: scipy.misc.imread/imsave were removed in SciPy 1.2; imageio.imread / imageio.imwrite are the usual replacements.)

import matplotlib.pyplot as plt
from scipy import misc
import scipy

I = misc.imread('./cc_1.png')
scipy.misc.imsave('./save1.png', I)
plt.imshow(I)
plt.show()
》Using the skimage library

from skimage import io, data
img = data.astronaut()  # data.lena() was removed from recent scikit-image releases
io.imshow(img)
io.show()
(11)sorting images into per-class folders

# place images into folders by class
import os
import shutil
import pandas as pd

# root directory for the per-class folders
train_classify_dir = r"save_path"
# directory containing all the images
train_dir = r"img_data_path"
res_data = pd.read_csv("xxx.csv")
# res_data.groupby('id') groups the images by their id (class) column
for i, (label, group) in enumerate(res_data.groupby('id')):
    each_ID_dir = os.path.join(train_classify_dir, str(label))
    if not os.path.exists(each_ID_dir):
        os.mkdir(each_ID_dir)
    print(each_ID_dir)
    for img_path in group["image"]:
        shutil.copy(os.path.join(train_dir, img_path), each_ID_dir)
As a side note, copying / moving / deleting files in Python:

import shutil, os

shutil.copy(source_path, aim_path)  # copy a file
shutil.copytree(source_path, aim_path)  # copy a folder
shutil.move('folder1', './')  # similar to the mv command
os.remove(aim_path)  # delete a file
shutil.rmtree(aim_path)  # delete a folder
shutil.make_archive(base_name, format, ...)  # create an archive (e.g. zip, tar) and return its path
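A self-contained sketch of these calls, run inside a temporary directory so it is safe to execute:

```python
import os
import shutil
import tempfile

root = tempfile.mkdtemp()
src = os.path.join(root, "a.txt")
with open(src, "w") as f:
    f.write("hello")

dst_dir = os.path.join(root, "backup")
os.makedirs(dst_dir, exist_ok=True)
copied = shutil.copy(src, dst_dir)  # copy a file into a folder; returns the destination path
print(os.path.exists(copied))       # True
archive = shutil.make_archive(os.path.join(root, "backup_zip"), "zip", dst_dir)
print(archive.endswith(".zip"))     # True
shutil.rmtree(root)                 # delete the whole tree
print(os.path.exists(root))         # False
```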
(12)running commands from strings (handy when embedding on a server)

import os

print("Generating .rec files...")
os.system("python ./im2rec.py xx(rec name) yy(image path) --list --recursive --train-ratio 0.8")
os.system('python ./im2rec.py --num-thread=4 xx_train.lst yy')
os.system('python ./im2rec.py --num-thread=4 xx_val.lst yy')
print("Generating over!")
Reference: "python读取图像的几种方法" (several ways to read images in Python), hjxu2016's CSDN blog.
(13)kmeans clustering

The snippet below (a common anchor-box clustering routine) assumes an iou(box, clusters) helper defined elsewhere.

import numpy as np

def kmeans(boxes, k, dist=np.median):
    """
    Calculates k-means clustering with the Intersection over Union (IoU) metric.
    param:
        boxes: numpy array of shape (r, 2), where r is the number of rows
        k: number of clusters
        dist: distance function
    return:
        numpy array of shape (k, 2)
    """
    rows = boxes.shape[0]
    distances = np.empty((rows, k))
    last_clusters = np.zeros((rows,))
    np.random.seed()
    # the Forgy method will fail if the whole array contains the same rows
    clusters = boxes[np.random.choice(rows, k, replace=False)]
    while True:
        for row in range(rows):
            distances[row] = 1 - iou(boxes[row], clusters)
        nearest_clusters = np.argmin(distances, axis=1)
        if (last_clusters == nearest_clusters).all():
            break
        for cluster in range(k):
            clusters[cluster] = dist(boxes[nearest_clusters == cluster], axis=0)
        last_clusters = nearest_clusters
    return clusters
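The function above relies on an iou helper that the snippet does not define; here is a minimal sketch of the version standard in YOLO anchor clustering, where each box is a (width, height) pair and all boxes are treated as sharing a corner:

```python
import numpy as np

def iou(box, clusters):
    """IoU between one (w, h) box and k (w, h) clusters,
    treating all boxes as if they share the same top-left corner."""
    x = np.minimum(clusters[:, 0], box[0])
    y = np.minimum(clusters[:, 1], box[1])
    intersection = x * y
    box_area = box[0] * box[1]
    cluster_area = clusters[:, 0] * clusters[:, 1]
    return intersection / (box_area + cluster_area - intersection)

# IoU of a 2x2 box: 1.0 against itself, 0.25 against a 4x4 cluster
print(iou(np.array([2, 2]), np.array([[2, 2], [4, 4]])))
```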
(14)processing a list with multiple processes

import math
import multiprocessing

def mul_process():
    file_list = get_list()  # get the list to process
    m = 8
    n = int(math.ceil(len(file_list) / float(m)))  # round up
    result = []
    pool = multiprocessing.Pool(processes=m)  # m processes
    for i in range(0, len(file_list), n):
        result.append(pool.apply_async(process_list, (file_list[i: i + n],)))  # process_list handles one chunk of the list
    pool.close()
    pool.join()
(15)selecting a target file through a dialog window

import tkinter as tk
from tkinter import filedialog

def get_path():
    tt = tk.Tk()
    tt.withdraw()
    path1 = filedialog.askopenfilename()
    path2 = filedialog.askdirectory()
    print("selected file path:", path1)
    print("selected folder path:", path2)

get_path()
(16)OpenCV odds and ends

》When you read an image with cv2.imread() and then display it with matplotlib (rather than saving it back with cv2.imwrite()), the image may come out with a Smurf-blue tint. That is because OpenCV reads channels in BGR order while plt expects RGB, so you need to swap the channels first.

import cv2
import matplotlib.pyplot as plt

# note: img[:, :, 0] is the blue channel. Just as s[::-1] reverses a string,
# img[:, :, ::-1] reverses the channel axis, turning BGR into RGB
img = cv2.imread('lena.jpg')
img2 = img[:, :, ::-1]
# or equivalently
# img2 = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
# show the incorrect image
plt.subplot(121), plt.imshow(img)
# show the correct image
plt.subplot(122)
plt.xticks([]), plt.yticks([])  # hide the x and y axes
plt.imshow(img2)
plt.show()
》When saving processed video with cv2.VideoWriter, the output file can end up 0 bytes, with no error raised. After searching around, one workaround is to use skvideo instead:

!pip install sk-video

import cv2
from skvideo.io import FFmpegWriter

cap = cv2.VideoCapture('Input.mp4')
fps = cap.get(cv2.CAP_PROP_FPS)
W = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
H = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
out = FFmpegWriter('Output.avi',
                   inputdict={'-r': str(fps), '-s': '{}x{}'.format(W, H)},
                   outputdict={'-r': str(fps), '-c:v': 'libx264', '-preset': 'ultrafast', '-pix_fmt': 'yuv444p'})
while True:
    success, frame = cap.read()
    if success:
        out.writeFrame(frame)
    else:
        break
cap.release()
out.close()
case 2:

import cv2
from datetime import datetime
import os, sys

# About cv2.VideoWriter_fourcc (from Learning OpenCV 3 Computer Vision with Python):
# fourcc means Four-Character Code: each codec is identified by four characters.
# Some common VideoWriter_fourcc arguments (note: the character order matters):
# cv2.VideoWriter_fourcc('I', '4', '2', '0'): YUV encoding, .avi extension
# cv2.VideoWriter_fourcc('P', 'I', 'M', '1'): MPEG-1 encoding, .avi extension
# cv2.VideoWriter_fourcc('X', 'V', 'I', 'D'): MPEG-4 encoding, .avi extension
# cv2.VideoWriter_fourcc('T', 'H', 'E', 'O'): Ogg Vorbis, .ogv extension

# capture frames from the camera
save_file_path = str(datetime.now().date())
if not os.path.exists(save_file_path):
    os.makedirs(save_file_path)
try:
    cameraCapture = cv2.VideoCapture(0)
except:
    cameraCapture = cv2.VideoCapture(1)
size = (int(cameraCapture.get(cv2.CAP_PROP_FRAME_WIDTH)), int(cameraCapture.get(cv2.CAP_PROP_FRAME_HEIGHT)))
videoWriter = cv2.VideoWriter(os.path.join(save_file_path, datetime.now().time().strftime("%H_%M_%S") + ".avi"),
                              cv2.VideoWriter_fourcc('X', 'V', 'I', 'D'), 30, size)
try:
    success, frame = cameraCapture.read()
    cv2.namedWindow('cap', cv2.WINDOW_NORMAL)
    while success:
        if cv2.getWindowProperty('cap', 1) < 0:
            break
        cv2.putText(frame, str(datetime.now()), (10, 30), cv2.FONT_HERSHEY_PLAIN, 1, (0, 255, 0), 1)
        cv2.imshow("cap", frame)
        videoWriter.write(frame)
        success, frame = cameraCapture.read()
        if cv2.waitKey(10) & 0xFF == ord('q'):
            cameraCapture.release()
            videoWriter.release()
            cv2.destroyAllWindows()
            sys.exit()
except Exception as e:
    print(e)
    print("cannot find camera")
(17)data cleaning

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# author: albert  time: 2020/7/16
import cv2
import os, glob
import xml.etree.ElementTree as ET

# check whether any image files are corrupted
def select_error_img(img_path):
    img_paths = glob.glob(img_path + "/*.jpg")
    for i in img_paths:
        img = cv2.imread(i)
        if img is None:
            with open("error_img.txt", "a+") as f:
                f.write(i + "\n")

# check whether any annotation files are broken
def select_error_lab(lab_path):
    lab_paths = glob.glob(lab_path + '/*.xml')
    for label in lab_paths:
        tree = ET.parse(label)
        image_size = tree.find('size')
        image_width = int(float(image_size.find('width').text))
        image_height = int(float(image_size.find('height').text))
        if image_height == 0 or image_width == 0:
            with open("error_label", "a+") as f:
                f.write(label + "\n")

# check that images and annotations match up, using the VOC layout as an example
def match_img_lab(img_path, lab_path):
    img_paths = glob.glob(img_path + "/*.jpg")
    lab_paths = glob.glob(lab_path + "/*.xml")
    assert len(img_paths) == len(lab_paths), "image count does not match annotation count"
    for i in img_paths:
        lab = i.replace("JPEGImages", "Annotations").replace("jpg", "xml")
        if lab not in lab_paths:
            with open("error_img_label", "a+") as f:
                f.write(i + "\n")

if __name__ == "__main__":
    img_path = "./data/my_data/source_data/JPEGImages"
    lab_path = "./data/my_data/source_data/Annotations"
    # first check sample quality
    select_error_img(img_path)
    # then check annotation quality
    select_error_lab(lab_path)
    # finally check sample/annotation alignment
    match_img_lab(img_path, lab_path)
(18)path problems

Sometimes after cloning a project from GitHub, you find the author has organized files into subdirectories by purpose, so some scripts fail at runtime with "module not found" errors. Two common fixes:
- Copy the packages the script needs into the script's own directory, or edit the script's configured paths directly.
- Change the script's working directory or module search path at runtime:

import os
# os.chdir("E:\\nanodet\\nanodet-main")
# os.getcwd()
# import sys
# sys.path.append("E:\\nanodet\\nanodet-main")  # exact absolute path
# sys.path.append(os.path.dirname(os.path.abspath(__file__)))
(19)visualizing training results

Visualize training results with TensorBoard; for the TensorFlow side, see the usage of tf.summary.scalar (covered in z2539329562's CSDN post on tf.summary.scalar). With PyTorch:

from torch.utils.tensorboard import SummaryWriter

# default 'log_dir' is 'runs'
writer = SummaryWriter("./logs")
# write an image (grid) to tensorboard
writer.add_image("image", img_grid)
# x is input data
writer.add_graph(model, x)
# add the loss
for epoch in range(epochs):
    for i, data in enumerate(db):
        ...
        if i % 1000 == 999:
            # ... log the running loss
            writer.add_scalar(
                "Training loss",
                running_loss / 1000,
                epoch * len(db) + i
            )

To visualize the saved results, run:
tensorboard --logdir="./logs"

Another common approach is recording training results with the logging module:

import logging
# see https://blog.csdn.net/qq_38765642/article/details/109716675
logging.basicConfig(level=logging.INFO, filename='./test.log', filemode='w')
logging.info('aaaa')  # 'aaaa' is the content to write
(20)showing live progress

When processing a large dataset, the job can take a while and you want to know how far along it is. The most common tool is tqdm:

# -*- coding: utf-8 -*-
from tqdm import tqdm
from collections import OrderedDict

total = 10000  # total number of iterations
loss = total
with tqdm(total=total, desc="progress") as pbar:
    for i in range(total):
        loss -= 1
        # pbar.set_postfix(OrderedDict(loss='{0:1.5f}'.format(loss)))
        pbar.set_postfix({'loss': '{0:1.5f}'.format(loss)})  # pass a dict to display metrics
        pbar.update(1)

Alternatively, a progress bar can be rolled by hand:

import sys, time

for i in range(100):  # replace 100 with dataset.size()
    k = i + 1
    bar = '>' * (i // 2) + ' ' * ((100 - k) // 2)
    sys.stdout.write('\r' + bar + '[%s%%]' % (i + 1))
    sys.stdout.flush()
    time.sleep(0.1)
(21) base64 <-> cv2

import base64
import numpy as np
import cv2
from PIL import Image
from io import BytesIO
import time

# cv2 to base64
def cv2_to_base64(img):
    img = cv2.imencode('.jpg', img)[1]
    image_code = base64.b64encode(img).decode('utf-8')
    return image_code

# base64 to cv2
def base64_to_cv2(base64_code):
    img_data = base64.b64decode(base64_code)
    img_array = np.frombuffer(img_data, np.uint8)
    img = cv2.imdecode(img_array, cv2.IMREAD_COLOR)
    return img

# base64 to PIL
def base64_to_pil(base64_str):
    image = base64.b64decode(base64_str)
    image = BytesIO(image)
    image = Image.open(image)
    return image

# PIL to base64
def pil_to_base64(image):
    img = cv2.cvtColor(np.asarray(image), cv2.COLOR_RGB2BGR)
    base64_str = cv2_to_base64(img)
    return base64_str

# PIL to cv2
def pil_to_cv2(image):
    img = cv2.cvtColor(np.asarray(image), cv2.COLOR_RGB2BGR)
    return img

# cv2 to PIL
def cv2_to_pil(img):
    image = Image.fromarray(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
    return image

if __name__ == '__main__':
    cv2_img = cv2.imread('./test.jpg')
    start_time = time.time()
    code = cv2_to_base64(cv2_img)
    print(code)
    img = base64_to_cv2(code)
    cv2.imwrite('test.jpg', img)
To be continued!