[Python Computer Vision] 1. Common Libraries and Basic Operations

孤单中颤抖

Article link: https://blog.csdn.net/weixin_42780429/article/details/119183130

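I. Common Libraries in Python Computer Vision

A computer vision task usually calls on several libraries at once, and beginners reproducing the code from a paper are often at a loss as to what each library is for. Here I briefly describe how I relate these libraries to the vision task at hand in my own work; corrections are welcome.
The libraries a computer vision task needs fall roughly into three categories: image processing (PIL, OpenCV, Matplotlib, etc.), numerical computing (NumPy), and neural networks (TensorFlow, PyTorch, etc.). First, the image-processing libraries handle the basic operations: reading images, converting between color spaces, and applying common image-processing algorithms. Next, to implement an image-processing algorithm of your own, you convert the image into an array and do the arithmetic with the numerical library. Finally, to feed a neural network, you convert the array into a tensor.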
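The three stages described above can be sketched in a few lines. This is a minimal illustration rather than a fixed recipe: it assumes Pillow, NumPy, and PyTorch are installed, and it builds a synthetic image (the variable names are illustrative) so that no file on disk is needed.

```python
import numpy as np
import torch
from PIL import Image

# Synthetic 6x4 RGB image so the sketch runs without any file on disk.
pil_img = Image.new("RGB", (6, 4), color=(255, 128, 0))  # PIL size is (width, height)

# Stage 1 -> 2: PIL.Image to NumPy array, shape (H, W, C), dtype uint8.
arr = np.array(pil_img)

# Stage 2 -> 3: NumPy array to float tensor, rearranged to (C, H, W) and scaled to [0, 1].
tensor = torch.from_numpy(arr).float().permute(2, 0, 1) / 255.0

print(type(pil_img).__name__, type(arr).__name__, type(tensor).__name__)
print(arr.shape, tuple(tensor.shape))
```

Each stage keeps the same pixel data; only the container type and the axis order change.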

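(1) PIL (Python Imaging Library), an image-processing library

(2) Matplotlib

(3) NumPy

(4) PyTorch

(5) torchvision

(6) scikit-image (skimage)

(7) OpenCV

scikit-image (skimage; a SciKit, that is, a toolkit for SciPy) and OpenCV are likewise excellent computer vision libraries.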

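II. Basic Operations

(1) Reading image data with PIL

from PIL import Image

imgPath = "F:/path/test.png"  # path to the image
img = Image.open(imgPath)     # open the file as a PIL.Image object; the mode (RGB, RGBA, L, ...) comes from the file, so call img.convert('RGB') when RGB is required
# Example repr, here for a JPEG file: <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=1728x2304 at 0x241C742AA90>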
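Because the mode of an opened image depends on the file, it can help to check it and convert explicitly. The sketch below assumes only Pillow; it writes a small RGBA PNG into an in-memory buffer as a stand-in for a file on disk, so the buffer and variable names are illustrative.

```python
import io
from PIL import Image

# Build a small RGBA PNG in memory to stand in for a file on disk.
buffer = io.BytesIO()
Image.new("RGBA", (8, 5), color=(10, 20, 30, 255)).save(buffer, format="PNG")
buffer.seek(0)

img = Image.open(buffer)      # Image.open also accepts file-like objects
original_mode = img.mode      # taken from the file itself
rgb = img.convert("RGB")      # force a known 3-channel layout

print(original_mode, rgb.mode, rgb.size)
```

Converting right after opening means later code can rely on a fixed channel count.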

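(2) Displaying an image with Matplotlib

import matplotlib.pyplot as plt
plt.imshow(img)  # draw the image onto the current axes; in a script nothing appears on screen until plt.show() is called
plt.axis('off')  # hide the axes
plt.show()       # open the window and display the image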
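One detail worth knowing: a 2-D (single-channel) array passed to imshow is drawn with Matplotlib's default colormap unless a colormap is given. The sketch below assumes Matplotlib and NumPy, uses a synthetic gradient, and selects the headless "Agg" backend so it renders into a buffer instead of opening a window.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend: render to a buffer instead of a window
import matplotlib.pyplot as plt
import numpy as np

# Synthetic 16x16 grayscale gradient.
gray = np.linspace(0, 255, 16 * 16).reshape(16, 16).astype("uint8")

fig, ax = plt.subplots()
im = ax.imshow(gray, cmap="gray", vmin=0, vmax=255)  # cmap="gray" gives a true grayscale rendering
ax.axis("off")

png_bytes = io.BytesIO()
fig.savefig(png_bytes, format="png")  # with an interactive backend, plt.show() would open a window instead
plt.close(fig)
```

Fixing vmin and vmax keeps the brightness scale identical across images instead of stretching each one to its own range.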

(3) Converting between PIL and NumPy types

import numpy as np
img = np.array(img)                         # PIL.Image -> NumPy array, shape (H, W, C)
img = Image.fromarray(img.astype('uint8'))  # NumPy array -> PIL.Image; the array must be uint8
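Two points are easy to trip over in this conversion: PIL reports size as (width, height) while NumPy reports shape as (height, width, channels), and arithmetic on the array usually produces floats that must be clipped and cast back to uint8. A minimal sketch assuming Pillow and NumPy, with a made-up brightness adjustment as the "algorithm":

```python
import numpy as np
from PIL import Image

pil_img = Image.new("RGB", (6, 4))  # black image; PIL size is (width, height)
arr = np.array(pil_img)             # NumPy shape is (height, width, channels)

# After floating-point arithmetic, clip to [0, 255] and cast back to uint8
# before rebuilding the image.
brighter = arr.astype(np.float64) * 1.5 + 40.0
restored = Image.fromarray(np.clip(brighter, 0, 255).astype("uint8"))

print(pil_img.size, arr.shape, arr.dtype, restored.size, restored.mode)
```

Clipping first matters because casting an out-of-range float to uint8 does not saturate.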

(4) Converting between NumPy and torch types

import torch
img = torch.from_numpy(img).float()  # NumPy array -> float32 tensor (img must be a NumPy array here)
img = img.numpy()                    # tensor -> NumPy array
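As a small illustration of what these two calls do with memory and axis order, the sketch below assumes NumPy and PyTorch and uses a zero-filled array in place of real image data. torch.from_numpy shares memory with the source array, while .float() on a uint8 tensor allocates a new float32 tensor.

```python
import numpy as np
import torch

arr = np.zeros((4, 6, 3), dtype=np.uint8)  # (H, W, C), like np.array(PIL image)

shared = torch.from_numpy(arr)             # uint8 tensor sharing memory with arr
copied = torch.from_numpy(arr).float()     # .float() allocates a new float32 tensor

arr[0, 0, 0] = 200                         # modify the NumPy array in place
print(shared[0, 0, 0].item(), copied[0, 0, 0].item())

chw = copied.permute(2, 0, 1)              # (H, W, C) -> (C, H, W), the layout PyTorch layers expect
back = chw.permute(1, 2, 0).numpy()        # back to (H, W, C) as a float32 ndarray
print(tuple(chw.shape), back.shape, back.dtype)
```

The shared tensor sees the in-place change and the float copy does not, which is worth remembering when an array is reused after conversion.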

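(5) Saving a tensor as an image

import torchvision

imgPath = "F:/path/test.png"
# save_image expects a float tensor shaped (C, H, W) or (B, C, H, W) with values in [0, 1],
# not a NumPy array or an (H, W, C) tensor in [0, 255]
torchvision.utils.save_image(img, imgPath)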

III. Commonly Used Functions

(1) Opening an image and returning a tensor

from PIL import Image
import numpy as np
import torch

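# Take an image path and return a 4-D tensor shaped (1, C, H, W) with values in [0, 1]
def openImage(path, w=-1, h=-1, mode='RGB'):
    img = Image.open(path)                   # open the file as a PIL.Image object
    if w == -1:
        w, _ = img.size                      # keep the original width
    if h == -1:
        _, h = img.size                      # keep the original height
    img = img.resize((w, h), Image.LANCZOS)  # resize; Image.ANTIALIAS was removed in Pillow 10, LANCZOS is the same filter
    img = img.convert(mode)                  # convert the color mode
    img = np.array(img)                      # PIL.Image -> NumPy array
    img = img / 255.0                        # normalize to [0, 1]
    img = torch.from_numpy(img).float()      # NumPy array -> float32 tensor
    d = img.dim()
    if d == 2:                               # single channel: (H, W) -> (1, 1, H, W)
        img = img.unsqueeze(0).unsqueeze(0)
    elif d == 3:
        img = img.permute(2, 0, 1)           # (H, W, C) -> (C, H, W): PIL keeps channels last, tensors keep them first
        img = img.unsqueeze(0)               # add the batch dimension
    return img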

(2) Displaying an image stored as a tensor

from PIL import Image
import numpy as np
import torch
import matplotlib.pyplot as plt

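# Take a 4-D tensor (1, C, H, W) with values in [0, 1], convert it to PIL, and display it
def showImage(img, mode='RGB'):
    _, c, _, _ = img.size()                      # read the channel count
    if c == 1:                                   # single channel: take the (H, W) plane
        img = img[0][0]
    else:
        img = img[0]
        img = img.permute(1, 2, 0)               # (C, H, W) -> (H, W, C)
    img = img.detach().cpu().numpy()             # tensor -> NumPy array; detach/cpu also cover autograd and GPU tensors
    img = np.clip(np.rint(img * 255.0), 0, 255)  # back to [0, 255]; round and clip so the uint8 cast neither truncates nor wraps
    img = Image.fromarray(img.astype('uint8')).convert(mode)  # NumPy array -> PIL.Image
    plt.imshow(img)
    plt.axis('off')
    plt.show()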

(3) Saving a tensor as an image

from PIL import Image
import numpy as np
import torch
import matplotlib.pyplot as plt

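# Take a 4-D tensor (1, C, H, W) with values in [0, 1], convert it to PIL, and save it
def saveImage(img, path, mode='RGB'):
    _, c, _, _ = img.size()                      # read the channel count
    if c == 1:                                   # single channel: take the (H, W) plane
        img = img[0][0]
    elif c == 3:
        img = img[0]
        img = img.permute(1, 2, 0)               # (C, H, W) -> (H, W, C)
    else:
        raise ValueError("expected 1 or 3 channels, got %d" % c)
    img = img.detach().cpu().numpy()             # tensor -> NumPy array; detach/cpu also cover autograd and GPU tensors
    img = np.clip(np.rint(img * 255.0), 0, 255)  # back to [0, 255]; round and clip so the uint8 cast neither truncates nor wraps
    img = Image.fromarray(img.astype('uint8')).convert(mode)  # NumPy array -> PIL.Image
    img.save(path)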
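To check that the image-to-tensor and tensor-to-image conversions are inverses, a round trip on random pixels is a quick test. The sketch below assumes Pillow, NumPy, and PyTorch; to_tensor and to_pil are hypothetical compact stand-ins written for this example, not the functions above, and they round before casting to uint8 because plain truncation can turn a value such as 253.99998 into 253.

```python
import numpy as np
import torch
from PIL import Image

def to_tensor(pil_img):
    """PIL image -> float tensor shaped (1, 3, H, W) with values in [0, 1]."""
    arr = np.array(pil_img.convert("RGB")) / 255.0
    return torch.from_numpy(arr).float().permute(2, 0, 1).unsqueeze(0)

def to_pil(tensor):
    """(1, 3, H, W) float tensor with values in [0, 1] -> PIL RGB image."""
    arr = tensor[0].permute(1, 2, 0).detach().cpu().numpy()
    arr = np.clip(np.rint(arr * 255.0), 0, 255).astype("uint8")  # round, then clip, then cast
    return Image.fromarray(arr)

# Random 4x6 RGB test image with a fixed seed.
rng = np.random.default_rng(0)
pixels = rng.integers(0, 256, size=(4, 6, 3), dtype=np.uint8)
original = Image.fromarray(pixels)

batch = to_tensor(original)
restored = to_pil(batch)

print(tuple(batch.shape), np.array_equal(np.array(restored), pixels))
```

If the comparison fails after a change to either direction, the usual suspects are the axis order in permute and the scaling by 255.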