I've been busy with image processing lately, and I constantly need operations like batch renaming, batch resizing, splitting images into tiles, augmenting data (rotation, stretching, noise, etc.), shuffling an image dataset, and splitting it into training and validation sets. Below is a summary of these operations:
(1)rename

import os

path = "I:\\Sample"
count = 1
# glob also works here:
# all_img = glob.glob(path + "\\*.jpg")
for item in os.listdir(path):
    src = os.path.join(path, item)
    dst = os.path.join(path, 'gan_' + str(count) + '.jpg')
    try:
        os.rename(src, dst)
        print("convert %s to %s" % (src, dst))
        count = count + 1
    except OSError:
        continue

Just change path to your own image directory. One more note: sometimes you need to create a folder first, which a single line handles:
os.makedirs(path, exist_ok=True)
(2)resize

import os
import cv2

def resize_pic():
    src_path = 'path to the original images'
    dst_path = 'path for the resized images'
    for item in os.listdir(src_path):
        if item.endswith('.jpg'):
            pic_path = os.path.join(src_path, item)
            img = cv2.imread(pic_path)
            dst = cv2.resize(img, (256, 256))  # adjust the target size as needed
            save_path = os.path.join(dst_path, item)
            cv2.imwrite(save_path, dst)

Just set the source and destination paths.
(3)crop

from PIL import Image

im = Image.open("path to the image to split")
# width and height of the image
img_size = im.size
print("image width and height: {}".format(img_size))
xx = 5  # adjust xx and yy as needed
yy = 5
x = img_size[0] // xx
y = img_size[1] // yy
for j in range(yy):
    for i in range(xx):
        left = i * x
        up = y * j
        right = left + x
        low = up + y
        region = im.crop((left, up, right, low))
        print((left, up, right, low))
        temp = str(i) + str(j)
        region.save("I:\\Sample\\crop\\" + temp + ".jpg")

This splits the image into an xx * yy grid of tiles.
(4)data augmentation

from keras.preprocessing.image import ImageDataGenerator, array_to_img, img_to_array, load_img

datagen = ImageDataGenerator(
    rotation_range=10,
    width_shift_range=0.1,
    height_shift_range=0.1,
    shear_range=0.1,
    zoom_range=0.1,
    horizontal_flip=True,
    fill_mode='nearest')

img = load_img('10.jpg')  # this is a PIL image
x = img_to_array(img)  # convert the PIL image to a numpy array of shape (height, width, 3)
x = x.reshape((1,) + x.shape)  # now a numpy array of shape (1, height, width, 3)

# the code below generates the augmented images
# all generated images are saved under the save_to_dir directory
i = 0
for batch in datagen.flow(x, batch_size=1,
                          save_to_dir='data_augmentation', save_prefix='datagen', save_format='jpg'):
    i += 1
    if i > 20:
        break  # otherwise the generator loops forever

This uses Keras rotations, shifts, and so on to augment the image data and reduce overfitting.
(5)shuffle

from sklearn.utils import shuffle

b_train, b_label = shuffle(b_train, b_label)  # b_train: training images, b_label: the corresponding labels

# to shuffle three (or more) lists together without breaking their correspondence,
# reset the same random seed before each shuffle
import random

x = ["a", "b", "c", "d"]
y = [0, 1, 2, 3]
z = ["m", "n", "o", "p"]
random.seed(2019)
random.shuffle(x)
random.seed(2019)
random.shuffle(y)
random.seed(2019)
random.shuffle(z)
x, y, z  # (['c', 'd', 'a', 'b'], [2, 3, 0, 1], ['o', 'p', 'm', 'n'])
(6)dataset split

from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=0)  # 30% test set, 70% training set
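As a quick sanity check on the split sizes, here is a runnable sketch with toy data invented for the demo:

```python
from sklearn.model_selection import train_test_split
import numpy as np

# hypothetical toy data: 10 samples of 2 features, with 10 labels
x = np.arange(20).reshape(10, 2)
y = np.arange(10)

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=0)
# with 10 samples and test_size=0.3, sklearn rounds the test set up to 3 samples
print(len(x_train), len(x_test))  # 7 3
```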
(7)one-hot encoding

import numpy as np
from keras.utils import to_categorical

data = [1, 3, 2, 0, 3, 2, 2, 1, 0, 1]  # demo; replace with your own label data
data = np.array(data)
encoded = to_categorical(data)  # one-hot encode
print(encoded)
inverted = np.argmax(encoded[0])  # decode
print(inverted)
(8)batching the data for training (with queues)

tf.train.slice_input_producer defines how samples are placed into the filename queue (number of epochs, whether to shuffle, etc.). To actually fill the queue you must also call tf.train.start_queue_runners to start the threads that populate it; only then can the compute ops read data out. Otherwise the filename queue stays empty, the compute ops wait forever, and the program hangs.

import tensorflow as tf

images = ['img1', 'img2', 'img3', 'img4', 'img5']
labels = [1, 2, 3, 4, 5]
epoch_num = 8
f = tf.train.slice_input_producer([images, labels], num_epochs=None, shuffle=False)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    for i in range(epoch_num):
        k = sess.run(f)
        print('************************')
        print(i, k)
    coord.request_stop()
    coord.join(threads)
tf.train.batch is a tensor queue generator: it pushes batch_size tensors, in the given order, into a queue to serve as one training batch, waiting for them to be dequeued for computation. Putting the two together:

import tensorflow as tf
import numpy as np

# number of samples
sample_num = 5
# number of epochs
epoch_num = 2
# number of samples per batch
batch_size = 3
# number of batches per epoch
batch_total = int(sample_num / batch_size) + 1

# generate sample_num images and labels
def generate_data(sample_num=sample_num):
    labels = np.asarray(range(0, sample_num))
    images = np.random.random([sample_num, 224, 224, 3])
    print('image size {}, label size: {}'.format(images.shape, labels.shape))
    return images, labels

def get_batch_data(batch_size=batch_size):
    images, label = generate_data()
    # cast to tf.float32 / tf.int32
    images = tf.cast(images, tf.float32)
    label = tf.cast(label, tf.int32)
    # take one tensor at a time (in order or shuffled) from the tensor list
    input_queue = tf.train.slice_input_producer([images, label], shuffle=False)
    image_batch, label_batch = tf.train.batch(input_queue, batch_size=batch_size, num_threads=1, capacity=64)
    return image_batch, label_batch

image_batch, label_batch = get_batch_data(batch_size=batch_size)
with tf.Session() as sess:
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess, coord)
    try:
        for i in range(epoch_num):  # each epoch
            print('************')
            for j in range(batch_total):  # each batch
                print('--------')
                # fetch batch_size samples and labels for this batch
                image_batch_v, label_batch_v = sess.run([image_batch, label_batch])
                print(image_batch_v.shape, label_batch_v)
    except tf.errors.OutOfRangeError:
        print("done")
    finally:
        coord.request_stop()
        coord.join(threads)
(9)adding a dimension

A common case: an RGB image is 3-dimensional, but inference expects a 4-dimensional input (with a batch dimension), so the image data needs its dimensionality raised.
The usual call is np.expand_dims(input, axis=0), which lifts the array into the [batch, height, width, channels] form.
With TensorFlow you can use tf.expand_dims(input, dim, name=None) directly; in PyTorch, torch.unsqueeze(input, dim=0) does the same.
Conversely, a dimension can be removed with np.squeeze(arr, 0), torch.squeeze(arr, dim=0), and so on.
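A minimal numpy sketch of the round trip (the shapes are invented for illustration):

```python
import numpy as np

img = np.zeros((224, 224, 3))        # a single RGB image, shape (H, W, C)
batch = np.expand_dims(img, axis=0)  # -> (1, 224, 224, 3): a batch of one image
print(batch.shape)
restored = np.squeeze(batch, 0)      # drop the batch dimension again
print(restored.shape)
```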
(10)several ways to read images

》Using Image from PIL. This does not read into array form, so you need np.asarray(im) or np.array(); the difference is that np.array() always copies the data, while np.asarray() avoids a copy when it can.

from PIL import Image
import numpy as np

I = Image.open('./cc_1.png')
I.show()
I.save('./save.png')
I_array = np.array(I)
print(I_array.shape)
image = Image.fromarray(I_array)
image.show()
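To see the copy / no-copy difference between np.array() and np.asarray(), here is a small sketch on a plain ndarray (for a PIL image both calls produce a fresh array, but between arrays the distinction matters):

```python
import numpy as np

src = np.zeros((2, 2), dtype=np.uint8)
a = np.array(src)    # always copies the data
b = np.asarray(src)  # no copy when the dtype already matches: b shares src's memory
src[0, 0] = 255
print(a[0, 0])  # 0   -> the copy is unaffected
print(b[0, 0])  # 255 -> the view sees the change
```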
》Using matplotlib.pyplot (plt) to display images and matplotlib.image (mpimg) to read them
# mpimg reads directly into array form

import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import numpy as np

I = mpimg.imread('./cc_1.png')
print(I.shape)
plt.imshow(I)
plt.show()

Note:
color images: plt.imshow(image_show)
grayscale images: plt.imshow(image_show, cmap='gray')
》Using the opencv-python interface
# cv2.imread() also returns an array, but with channels in BGR order; by default, even a single-channel image is read as three channels

import cv2

I = cv2.imread('./cc_1.png')
print(I.shape)
》For image I/O I generally like scipy's helpers, which read into a matrix stored in (H, W, C) order. (Note: scipy.misc.imread/imsave were removed in SciPy 1.2; imageio.imread / imageio.imwrite are the usual replacements.)

import matplotlib.pyplot as plt
from scipy import misc
import scipy

I = misc.imread('./cc_1.png')
scipy.misc.imsave('./save1.png', I)
plt.imshow(I)
plt.show()
》Using the skimage library

from skimage import io, data
img = data.astronaut()  # data.lena() was removed from recent scikit-image releases
io.imshow(img)
io.show()
(11)sorting images into per-class folders

# place images into folders by class
import os
import shutil
import pandas as pd

# root directory for the per-class folders
train_classify_dir = r"save_path"
# directory containing all the images
train_dir = r"img_data_path"
res_data = pd.read_csv("xxx.csv")
# res_data.groupby('id') groups the images by their id (class) column
for i, (label, group) in enumerate(res_data.groupby('id')):
    each_ID_dir = os.path.join(train_classify_dir, str(label))
    if not os.path.exists(each_ID_dir):
        os.mkdir(each_ID_dir)
    print(each_ID_dir)
    for img_path in group["image"]:
        shutil.copy(os.path.join(train_dir, img_path), each_ID_dir)
As a side note, copying / moving / deleting files in Python:

import shutil, os

shutil.copy(source_path, aim_path)  # copy a file
shutil.copytree(source_path, aim_path)  # copy a folder
shutil.move('folder1', './')  # similar to the mv command
os.remove(aim_path)  # delete a file
shutil.rmtree(aim_path)  # delete a folder
shutil.make_archive(base_name, format, ...)  # create an archive (e.g. zip, tar) and return its path
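A self-contained sketch of these calls, run inside a temporary directory so it is safe to execute:

```python
import os
import shutil
import tempfile

root = tempfile.mkdtemp()
src = os.path.join(root, "a.txt")
with open(src, "w") as f:
    f.write("hello")

dst_dir = os.path.join(root, "backup")
os.makedirs(dst_dir, exist_ok=True)
copied = shutil.copy(src, dst_dir)  # copy a file into a folder; returns the destination path
print(os.path.exists(copied))       # True
archive = shutil.make_archive(os.path.join(root, "backup_zip"), "zip", dst_dir)
print(archive.endswith(".zip"))     # True
shutil.rmtree(root)                 # delete the whole tree
print(os.path.exists(root))         # False
```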
(12)running commands from strings (handy when embedding on a server)

import os

print("Generating .rec files...")
os.system("python ./im2rec.py xx(rec name) yy(image path) --list --recursive --train-ratio 0.8")
os.system('python ./im2rec.py --num-thread=4 xx_train.lst yy')
os.system('python ./im2rec.py --num-thread=4 xx_val.lst yy')
print("Generating over!")
Reference: "python读取图像的几种方法" (several ways to read images in Python), hjxu2016's CSDN blog.
(13)kmeans clustering

The snippet below (a common anchor-box clustering routine) assumes an iou(box, clusters) helper defined elsewhere.

import numpy as np

def kmeans(boxes, k, dist=np.median):
    """
    Calculates k-means clustering with the Intersection over Union (IoU) metric.
    param:
        boxes: numpy array of shape (r, 2), where r is the number of rows
        k: number of clusters
        dist: distance function
    return:
        numpy array of shape (k, 2)
    """
    rows = boxes.shape[0]
    distances = np.empty((rows, k))
    last_clusters = np.zeros((rows,))
    np.random.seed()
    # the Forgy method will fail if the whole array contains the same rows
    clusters = boxes[np.random.choice(rows, k, replace=False)]
    while True:
        for row in range(rows):
            distances[row] = 1 - iou(boxes[row], clusters)
        nearest_clusters = np.argmin(distances, axis=1)
        if (last_clusters == nearest_clusters).all():
            break
        for cluster in range(k):
            clusters[cluster] = dist(boxes[nearest_clusters == cluster], axis=0)
        last_clusters = nearest_clusters
    return clusters
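The function above relies on an iou helper that the snippet does not define; here is a minimal sketch of the version standard in YOLO anchor clustering, where each box is a (width, height) pair and all boxes are treated as sharing a corner:

```python
import numpy as np

def iou(box, clusters):
    """IoU between one (w, h) box and k (w, h) clusters,
    treating all boxes as if they share the same top-left corner."""
    x = np.minimum(clusters[:, 0], box[0])
    y = np.minimum(clusters[:, 1], box[1])
    intersection = x * y
    box_area = box[0] * box[1]
    cluster_area = clusters[:, 0] * clusters[:, 1]
    return intersection / (box_area + cluster_area - intersection)

# IoU of a 2x2 box: 1.0 against itself, 0.25 against a 4x4 cluster
print(iou(np.array([2, 2]), np.array([[2, 2], [4, 4]])))
```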
(14)processing a list with multiple processes

import math
import multiprocessing

def mul_process():
    file_list = get_list()  # get the list to process
    m = 8
    n = int(math.ceil(len(file_list) / float(m)))  # round up
    result = []
    pool = multiprocessing.Pool(processes=m)  # m processes
    for i in range(0, len(file_list), n):
        result.append(pool.apply_async(process_list, (file_list[i: i + n],)))  # process_list handles one chunk of the list
    pool.close()
    pool.join()
(15)selecting a target file through a dialog window

import tkinter as tk
from tkinter import filedialog

def get_path():
    tt = tk.Tk()
    tt.withdraw()
    path1 = filedialog.askopenfilename()
    path2 = filedialog.askdirectory()
    print("selected file path:", path1)
    print("selected folder path:", path2)

get_path()
(16)OpenCV odds and ends

》When you read an image with cv2.imread() and then display it with matplotlib (rather than saving it back with cv2.imwrite()), the image may come out with a Smurf-blue tint. That is because OpenCV reads channels in BGR order while plt expects RGB, so you need to swap the channels first.

import cv2
import matplotlib.pyplot as plt

# note: img[:, :, 0] is the blue channel. Just as s[::-1] reverses a string,
# img[:, :, ::-1] reverses the channel axis, turning BGR into RGB
img = cv2.imread('lena.jpg')
img2 = img[:, :, ::-1]
# or equivalently
# img2 = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
# show the incorrect image
plt.subplot(121), plt.imshow(img)
# show the correct image
plt.subplot(122)
plt.xticks([]), plt.yticks([])  # hide the x and y axes
plt.imshow(img2)
plt.show()
》When saving processed video with cv2.VideoWriter, the output file can end up 0 bytes, with no error raised. After searching around, one workaround is to use skvideo instead:

!pip install sk-video

import cv2
from skvideo.io import FFmpegWriter

cap = cv2.VideoCapture('Input.mp4')
fps = cap.get(cv2.CAP_PROP_FPS)
W = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
H = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
out = FFmpegWriter('Output.avi',
                   inputdict={'-r': str(fps), '-s': '{}x{}'.format(W, H)},
                   outputdict={'-r': str(fps), '-c:v': 'libx264', '-preset': 'ultrafast', '-pix_fmt': 'yuv444p'})
while True:
    success, frame = cap.read()
    if success:
        out.writeFrame(frame)
    else:
        break
cap.release()
out.close()
case 2:

import cv2
from datetime import datetime
import os, sys

# About cv2.VideoWriter_fourcc (from Learning OpenCV 3 Computer Vision with Python):
# fourcc means Four-Character Code: each codec is identified by four characters.
# Some common VideoWriter_fourcc arguments (note: the character order matters):
# cv2.VideoWriter_fourcc('I', '4', '2', '0'): YUV encoding, .avi extension
# cv2.VideoWriter_fourcc('P', 'I', 'M', '1'): MPEG-1 encoding, .avi extension
# cv2.VideoWriter_fourcc('X', 'V', 'I', 'D'): MPEG-4 encoding, .avi extension
# cv2.VideoWriter_fourcc('T', 'H', 'E', 'O'): Ogg Vorbis, .ogv extension

# capture frames from the camera
save_file_path = str(datetime.now().date())
if not os.path.exists(save_file_path):
    os.makedirs(save_file_path)
try:
    cameraCapture = cv2.VideoCapture(0)
except:
    cameraCapture = cv2.VideoCapture(1)
size = (int(cameraCapture.get(cv2.CAP_PROP_FRAME_WIDTH)), int(cameraCapture.get(cv2.CAP_PROP_FRAME_HEIGHT)))
videoWriter = cv2.VideoWriter(os.path.join(save_file_path, datetime.now().time().strftime("%H_%M_%S") + ".avi"),
                              cv2.VideoWriter_fourcc('X', 'V', 'I', 'D'), 30, size)
try:
    success, frame = cameraCapture.read()
    cv2.namedWindow('cap', cv2.WINDOW_NORMAL)
    while success:
        if cv2.getWindowProperty('cap', 1) < 0:
            break
        cv2.putText(frame, str(datetime.now()), (10, 30), cv2.FONT_HERSHEY_PLAIN, 1, (0, 255, 0), 1)
        cv2.imshow("cap", frame)
        videoWriter.write(frame)
        success, frame = cameraCapture.read()
        if cv2.waitKey(10) & 0xFF == ord('q'):
            cameraCapture.release()
            videoWriter.release()
            cv2.destroyAllWindows()
            sys.exit()
except Exception as e:
    print(e)
    print("cannot find camera")
(17)data cleaning

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# author: albert  time: 2020/7/16
import cv2
import os, glob
import xml.etree.ElementTree as ET

# check whether any image files are corrupted
def select_error_img(img_path):
    img_paths = glob.glob(img_path + "/*.jpg")
    for i in img_paths:
        img = cv2.imread(i)
        if img is None:
            with open("error_img.txt", "a+") as f:
                f.write(i + "\n")

# check whether any annotation files are broken
def select_error_lab(lab_path):
    lab_paths = glob.glob(lab_path + '/*.xml')
    for label in lab_paths:
        tree = ET.parse(label)
        image_size = tree.find('size')
        image_width = int(float(image_size.find('width').text))
        image_height = int(float(image_size.find('height').text))
        if image_height == 0 or image_width == 0:
            with open("error_label", "a+") as f:
                f.write(label + "\n")

# check that images and annotations match up, using the VOC layout as an example
def match_img_lab(img_path, lab_path):
    img_paths = glob.glob(img_path + "/*.jpg")
    lab_paths = glob.glob(lab_path + "/*.xml")
    assert len(img_paths) == len(lab_paths), "image count does not match annotation count"
    for i in img_paths:
        lab = i.replace("JPEGImages", "Annotations").replace("jpg", "xml")
        if lab not in lab_paths:
            with open("error_img_label", "a+") as f:
                f.write(i + "\n")

if __name__ == "__main__":
    img_path = "./data/my_data/source_data/JPEGImages"
    lab_path = "./data/my_data/source_data/Annotations"
    # first check sample quality
    select_error_img(img_path)
    # then check annotation quality
    select_error_lab(lab_path)
    # finally check sample/annotation alignment
    match_img_lab(img_path, lab_path)
(18)path problems

Sometimes after cloning a project from GitHub, you find the author has organized files into subdirectories by purpose, so some scripts fail at runtime with "module not found" errors. Two common fixes:
- Copy the packages the script needs into the script's own directory, or edit the script's configured paths directly.
- Change the script's working directory or module search path at runtime:

import os
# os.chdir("E:\\nanodet\\nanodet-main")
# os.getcwd()
# import sys
# sys.path.append("E:\\nanodet\\nanodet-main")  # exact absolute path
# sys.path.append(os.path.dirname(os.path.abspath(__file__)))
(19)visualizing training results

Visualize training results with TensorBoard; for the TensorFlow side, see the usage of tf.summary.scalar (covered in z2539329562's CSDN post on tf.summary.scalar). With PyTorch:

from torch.utils.tensorboard import SummaryWriter

# default 'log_dir' is 'runs'
writer = SummaryWriter("./logs")
# write an image (grid) to tensorboard
writer.add_image("image", img_grid)
# x is input data
writer.add_graph(model, x)
# add the loss
for epoch in range(epochs):
    for i, data in enumerate(db):
        ...
        if i % 1000 == 999:
            # ... log the running loss
            writer.add_scalar(
                "Training loss",
                running_loss / 1000,
                epoch * len(db) + i
            )

To visualize the saved results, run:
tensorboard --logdir="./logs"

Another common approach is recording training results with the logging module:

import logging
# see https://blog.csdn.net/qq_38765642/article/details/109716675
logging.basicConfig(level=logging.INFO, filename='./test.log', filemode='w')
logging.info('aaaa')  # 'aaaa' is the content to write
(20)showing live progress

When processing a large dataset, the job can take a while and you want to know how far along it is. The most common tool is tqdm:

# -*- coding: utf-8 -*-
from tqdm import tqdm
from collections import OrderedDict

total = 10000  # total number of iterations
loss = total
with tqdm(total=total, desc="progress") as pbar:
    for i in range(total):
        loss -= 1
        # pbar.set_postfix(OrderedDict(loss='{0:1.5f}'.format(loss)))
        pbar.set_postfix({'loss': '{0:1.5f}'.format(loss)})  # pass a dict to display metrics
        pbar.update(1)

Alternatively, a progress bar can be rolled by hand:

import sys, time

for i in range(100):  # replace 100 with dataset.size()
    k = i + 1
    bar = '>' * (i // 2) + ' ' * ((100 - k) // 2)
    sys.stdout.write('\r' + bar + '[%s%%]' % (i + 1))
    sys.stdout.flush()
    time.sleep(0.1)
(21) base64 <-> cv2

import base64
import numpy as np
import cv2
from PIL import Image
from io import BytesIO
import time

# cv2 to base64
def cv2_to_base64(img):
    img = cv2.imencode('.jpg', img)[1]
    image_code = base64.b64encode(img).decode('utf-8')
    return image_code

# base64 to cv2
def base64_to_cv2(base64_code):
    img_data = base64.b64decode(base64_code)
    img_array = np.frombuffer(img_data, np.uint8)
    img = cv2.imdecode(img_array, cv2.IMREAD_COLOR)
    return img

# base64 to PIL
def base64_to_pil(base64_str):
    image = base64.b64decode(base64_str)
    image = BytesIO(image)
    image = Image.open(image)
    return image

# PIL to base64
def pil_to_base64(image):
    img = cv2.cvtColor(np.asarray(image), cv2.COLOR_RGB2BGR)
    base64_str = cv2_to_base64(img)
    return base64_str

# PIL to cv2
def pil_to_cv2(image):
    img = cv2.cvtColor(np.asarray(image), cv2.COLOR_RGB2BGR)
    return img

# cv2 to PIL
def cv2_to_pil(img):
    image = Image.fromarray(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
    return image

if __name__ == '__main__':
    cv2_img = cv2.imread('./test.jpg')
    start_time = time.time()
    code = cv2_to_base64(cv2_img)
    print(code)
    img = base64_to_cv2(code)
    cv2.imwrite('test.jpg', img)
To be continued!