Hands-On AI Development: Computer Vision Processing

Latest recommended article published on 2024-10-07 06:31:57

天涯幺妹

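Categories: Big Data and Artificial Intelligence; Big Data Mining and Analysis; Deep Learning and Algorithms

Tags: artificial intelligence, computer vision, machine learning, python, kmeans, deep learning, neural networks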

Article link: https://blog.csdn.net/sinat_30844883/article/details/141750759

Contents

Introduction to computer vision development
Handwritten digit recognition
Face recognition

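I. Introduction to Computer Vision Development

Computer vision is an interdisciplinary field; the figure shows some of the disciplines involved:

Computer vision was the first field in which deep learning achieved a breakthrough. In 2012, AlexNet, a model based on convolutional neural networks, won the image classification task of that year's ILSVRC competition.

The error rates of the ILSVRC winning models over the years are shown in the figure: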

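1. Image classification

In an image classification problem, each image contains a single category. After training on a large labeled dataset, the model can predict the category of new, unseen single-category images, much like teaching a child to name objects from pictures. This data-driven approach is the most common method for image classification.

2. Object detection

An image used for object detection does not necessarily contain objects of only one category. For this kind of problem, each object in the training data must be annotated with a bounding box and a label; after training, the model can make predictions on new images. The figure shows an example in which a box marks the position of a cat.

3. Semantic segmentation

Unlike object detection, semantic segmentation requires a semantic understanding of every pixel. Because each pixel must be classified according to the part of the image it belongs to, every pixel carries its own label, as shown in the figure.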
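To make "every pixel carries its own label" concrete, here is a minimal NumPy sketch. The score values and the three class names are made up for illustration; a real segmentation network would produce the scores.

```python
import numpy as np

# Hypothetical class scores for a 2x3 image and 3 classes
# (0 = background, 1 = cat, 2 = grass); shape is (height, width, classes).
scores = np.array([
    [[0.8, 0.1, 0.1], [0.2, 0.7, 0.1], [0.1, 0.8, 0.1]],
    [[0.1, 0.2, 0.7], [0.3, 0.3, 0.4], [0.6, 0.3, 0.1]],
])


def label_map(class_scores):
    """Return one integer label per pixel: the class with the highest score."""
    return np.argmax(class_scores, axis=-1)


labels = label_map(scores)
print(labels)
```

The result has the same height and width as the image, with one class index per pixel.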

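4. Prominent application areas of computer vision

Medical image analysis: extracting information from image data to diagnose a patient's condition.

Industry: in this field computer vision is sometimes called machine vision, for example in product quality control. Machine vision is also widely used in agriculture, for removing unhealthy seedlings or for pest control.

Security and entertainment: traditional machine learning methods do not meet the accuracy requirements of face recognition well, because the features of the same person differ under different lighting and poses. Applying deep learning to computer vision improves recognition accuracy.

Optical character recognition: converting text in image form, which a computer cannot interpret directly, into a text format that it can process.

Autonomous driving: enabling vehicles to drive on the road without a driver and to perform operations such as automatic parking.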

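II. Handwritten Digit Recognition

2.1 Project overview

This project uses a convolutional neural network. To keep the project complete, training must not only display the loss or accuracy but also save the resulting model afterwards. The camera is then used to predict new images in real time; these can come from the dataset or be digits you write by hand.

Implementing the whole process is a way to learn OpenCV, neural networks, and TensorFlow together. The project flow is shown in the figure.

Training again uses this convolutional network and produces a model at the end. The model file stores the parameters as variables, and these variables must be initialized in the code.

During training, the updated parameters are stored in the variables. A tf.train.Saver() object adds save and restore operations for all variables to the Graph.

The call that saves the model is:

save_path = saver.save(sess, model_path)

To save the model every fixed number of iterations, pass the iteration step as an argument:

save_path = saver.save(sess, model_path, global_step=step, write_meta_graph=False)

After the model is saved, it can be loaded to classify new data. Saving the model generates four files; the TensorFlow model files are shown in the figure.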

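2.2 Image acquisition and preprocessing

1. Reading and processing an image file

2. Capturing images from the camera

In the mnist_predict directory, create a file named camera.py. It captures images from the camera, converts them to binary images, and displays them. Write the following code in PyCharm.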

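import cv2


def start():
    """Capture camera frames and display them as inverted binary images."""
    cap = cv2.VideoCapture(0)  # open the default camera
    while True:
        ret, frame = cap.read()  # read one frame
        if not ret:  # stop if no frame could be read
            break
        img_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)  # convert to grayscale
        # Invert while thresholding so dark strokes become white on black
        ret, img_threshold = cv2.threshold(img_gray, 127, 255, cv2.THRESH_BINARY_INV)
        cv2.imshow('img_threshold', img_threshold)
        key = cv2.waitKey(30) & 0xff
        if key == 27:  # Esc exits the loop
            break
    cap.release()  # release the camera
    cv2.destroyAllWindows()


if __name__ == '__main__':
    start()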
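The effect of cv2.THRESH_BINARY_INV with a threshold of 127 can be reproduced on a tiny array. The following is a NumPy sketch of the same rule, not OpenCV's implementation; the helper name binary_inv is made up for this illustration.

```python
import numpy as np


def binary_inv(gray, thresh=127, maxval=255):
    """Pixels above thresh become 0; all other pixels become maxval."""
    return np.where(gray > thresh, 0, maxval).astype(np.uint8)


gray = np.array([[0, 127, 128, 255]], dtype=np.uint8)
print(binary_inv(gray))
```

Dark pixels (pen strokes on white paper) end up white and the bright background ends up black, which matches the white-on-black look of the MNIST digits.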

2.3 Image recognition

1. Reading and recognizing an image file

In the mnist_predict directory, create a file named predict_pic.py to recognize an image.

import os
import cv2
import numpy as np
import tensorflow as tf
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'


# Convert a color input image to an inverted binary image
def color_input(endimg):
    img_gray = cv2.cvtColor(endimg, cv2.COLOR_BGR2GRAY)  # convert to grayscale
    ret, img_threshold = cv2.threshold(img_gray, 127, 255, cv2.THRESH_BINARY_INV)
    return img_threshold


# Read an image, then display it before and after binarization
def read_pic(path):
    img = cv2.imread(path, cv2.IMREAD_COLOR)
    cv2.imshow('img', img)
    cv2.waitKey(0)
    img_threshold = color_input(img)
    cv2.imshow('img_threshold', img_threshold)
    cv2.waitKey(0)
    return img_threshold


if __name__ == '__main__':
    with tf.Session() as sess:
        saver = tf.train.import_meta_graph('model_data/model.meta')
        saver.restore(sess, 'model_data/model')  # restore the model
        # Look up the input and output tensors by name
        input_x = sess.graph.get_tensor_by_name("Mul:0")
        y_conv2 = sess.graph.get_tensor_by_name("final_result:0")
        # Read the image
        img_threshold = read_pic("nine.png")
        # Resize the image to 28x28
        im = cv2.resize(img_threshold, (28, 28), interpolation=cv2.INTER_CUBIC)
        x_img = np.reshape(im, [-1, 784])
        # Run recognition
        output = sess.run(y_conv2, feed_dict={input_x: x_img})
        result = np.argmax(output)
        print("Recognition result: {}".format(result))
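The two NumPy steps around sess.run() can be tried without a trained model. This sketch uses a blank stand-in image and made-up output scores; the helper names are hypothetical and only the reshape and argmax calls come from the script above.

```python
import numpy as np


def to_model_input(img28):
    """Flatten a 28x28 image into one row of 784 values."""
    return np.reshape(img28, [-1, 784])


def predicted_digit(output):
    """Return the index of the largest score, i.e. the predicted digit."""
    return int(np.argmax(output))


img28 = np.zeros((28, 28), dtype=np.uint8)  # stand-in for the resized binary image
x_img = to_model_input(img28)
print(x_img.shape)

# Hypothetical output scores for the digits 0..9; the largest is at index 9.
output = np.array([[0.01, 0.0, 0.02, 0.0, 0.05, 0.0, 0.0, 0.02, 0.1, 0.8]])
print(predicted_digit(output))
```

The leading -1 in the reshape lets NumPy infer the batch size, so a single image becomes a batch of one.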

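2. Real-time recognition from the camera

In the same directory, create a file named predict_camera.py to perform recognition.

First import the required modules, including OpenCV, NumPy, and TensorFlow.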

import os
import cv2
import sys
import time
import numpy as np
import tensorflow as tf
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

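Next, wrap the conversion in a function that turns the input color image (in OpenCV's BGR channel order) into a binary image and returns it.

# Convert a color input image to an inverted binary image
def color_input(endimg):
    img_gray = cv2.cvtColor(endimg, cv2.COLOR_BGR2GRAY)
    ret, img_threshold = cv2.threshold(img_gray, 127, 255,
                                       cv2.THRESH_BINARY_INV)
    return img_threshold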

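Then restore the model. Import the saved graph to obtain a saver, restore the model with saver.restore(), and return the session together with the input and output tensors.

# Restore the model and return the session and tensors
def restore_model():
    sess = tf.Session()
    saver = tf.train.import_meta_graph('model_data/model.meta')
    # Restore the variables with saver.restore()
    saver.restore(sess, 'model_data/model')
    # Look up the input and output tensors by name
    input_x = sess.graph.get_tensor_by_name("Mul:0")
    y_conv2 = sess.graph.get_tensor_by_name("final_result:0")
    return sess, input_x, y_conv2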

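Next, build the prediction function. It takes the session, the tensors, and the binary image, resizes the image, calls sess.run() to make the prediction, and returns the result.

# Image prediction
def mnist_predict(sess, input_x, y_conv2, img_thre):
    # Resize the image to 28x28
    im = cv2.resize(img_thre, (28, 28), interpolation=cv2.INTER_CUBIC)
    x_img = np.reshape(im, [-1, 784])
    # Run recognition
    output = sess.run(y_conv2, feed_dict={input_x: x_img})
    result = np.argmax(output)
    return result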

Finally, the main block opens the camera, calls the model restore function, and uses cv2.putText() to draw the recognition result, the frame rate, and other prompts on the displayed window.

if __name__ == '__main__':
    # Use the default font
    font = cv2.FONT_HERSHEY_SIMPLEX
    # Open the camera
    cap = cv2.VideoCapture(0)
    # Initialize the variables used to compute the FPS
    fps = "FPS: ??"
    start_time = time.time()
    counter = 0
    # Call the model restore function
    sess, input_x, y_conv2 = restore_model()
    # Loop, displaying each frame with its recognition result
    while True:
        # Read one frame
        ret, frame = cap.read()
        cv2.rectangle(frame, (180, 100), (460, 380), (0, 255, 0), 2)
        frame = cv2.putText(frame, 'Please', (0, 40), font, 1.2, (0, 255, 255), 2)
        frame = cv2.putText(frame, 'Put the number in the box:', (0, 80), font, 1.2, (0, 255, 255), 2)
        # Crop the region inside the box and recognize it
        endimg = frame[100: 380, 180: 460]
        endimg_threshold = color_input(endimg)
        result = mnist_predict(sess, input_x, y_conv2, endimg_threshold)
        # Update the FPS text about once per second
        counter += 1
        if (time.time() - start_time) > 1:
            fps = "FPS: %.1f" % (counter / (time.time() - start_time))
            print(fps)
            counter = 0
            start_time = time.time()
        cv2.putText(frame, "%d" % result, (460, 380), font, 3, (0, 0, 255), 2)
        cv2.putText(frame, fps, (50, 120), font, 0.8, (0, 0, 255), 2)
        cv2.imshow('Number Recognition', frame)
        key = cv2.waitKey(30) & 0xff
        if key == 27:  # Esc exits the loop
            break
    sess.close()
    # Release the camera
    cap.release()
    cv2.destroyAllWindows()

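2.4 Displaying the results

Drawing a square box in code makes the result more intuitive. Point the camera at a handwritten digit, place the digit inside the box, and call the recognition function. The frame rate (FPS) is displayed at the upper left of the box and the result at its lower right. The recognition result is shown in the figure.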
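The frame-rate bookkeeping can be isolated in a small helper, which makes it easy to check without a camera. This is a sketch: the class name FpsCounter is hypothetical, and timestamps are passed in explicitly (in the real loop they would come from time.time()).

```python
class FpsCounter:
    """Count frames and report frames per second once per interval."""

    def __init__(self, now, interval=1.0):
        self.start = now
        self.interval = interval
        self.count = 0
        self.text = "FPS: ??"  # placeholder until the first interval ends

    def tick(self, now):
        """Register one frame at time `now` (seconds) and return the label."""
        self.count += 1
        elapsed = now - self.start
        if elapsed > self.interval:
            self.text = "FPS: %.1f" % (self.count / elapsed)
            self.count = 0
            self.start = now
        return self.text


counter = FpsCounter(now=0.0)
labels = [counter.tick(t) for t in (0.5, 1.0, 1.5)]
print(labels)
```

The label returned by tick() is the string that would be drawn with cv2.putText() on each frame; it only changes when a full interval has elapsed.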

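III. Face Recognition

3.1 Project overview

This project reads images from files and from the camera to perform face detection, facial landmark detection, face comparison, face search, and face recognition.

The project is built on the face_recognition project, whose face recognition is implemented with the deep learning model in dlib, an open-source C++ library. Tested on the LFW (Labeled Faces in the Wild) face dataset, it reaches an accuracy of 99.38%.

Before starting, install the third-party face_recognition library; it requires Python 3.3 or later.

On Mac, Linux, or Windows, first install dlib, then run "pip3 install face_recognition" on the command line to install the library, or download the project source code directly from GitHub.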

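3.2 The face dataset

There are many face datasets; this section introduces LFW (Labeled Faces in the Wild).

It was compiled by the Computer Vision Laboratory at the University of Massachusetts Amherst and is mainly used to study face recognition under unconstrained conditions.

In the LFW dataset, photos of even the same person differ greatly because of pose, lighting, expression, age, occlusion, and other factors.

The dataset contains 13,233 photos of 5,749 people, of whom 1,680 have more than one photo; each photo is 250 × 250 pixels.

Download the LFW dataset; the directory structure after extraction is shown in the figure:

The LFW folders are named after people, and each person's photos are stored in the folder with that person's name. Photos are named "name_xxxx.jpg", for example Aaron_Eckhart_0001.jpg.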
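Given this naming scheme, the person's name and photo index can be recovered from a file path. A minimal sketch follows; the helper name and the "lfw/" directory prefix are assumptions for illustration.

```python
import os


def parse_lfw_filename(path):
    """Split an LFW-style file name into (person name, photo index)."""
    stem = os.path.splitext(os.path.basename(path))[0]
    name, index = stem.rsplit("_", 1)  # the index follows the last underscore
    return name, int(index)


print(parse_lfw_filename("lfw/Aaron_Eckhart/Aaron_Eckhart_0001.jpg"))
```

Splitting on the last underscore matters because the names themselves contain underscores.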

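3.3 Face recognition workflow

1. Face image acquisition and detection

Face recognition starts with acquiring face images, either by reading image files or by capturing directly from a camera.

2. Face image preprocessing

Preprocessing reduces the effect of environmental constraints and random interference as far as possible and improves the accuracy of feature extraction.

3. Face image feature extraction

Face image feature extraction, also called face representation, is the process of modeling the features of a face. It turns image information into numbers: feature data useful for classifying faces is obtained from descriptions of the shapes of the facial organs and the distances between them.

4. Matching and recognition

Matching searches the feature templates stored in the database for those that match the feature data extracted from the face image.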

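3.4 Face recognition approach

1. How to find a face

The first step of face recognition is finding the face, that is, face detection: for a photo or a video frame, we first need to know whether a face is present and where it is.

To find faces, features are extracted from the whole image. Common feature extraction methods include the histogram of oriented gradients (HOG), local binary patterns (LBP), and Haar-like features.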
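Of the three feature types, LBP is the simplest to show in a few lines. The sketch below computes the basic LBP code for the center of a single 3x3 patch; the neighbor ordering and the "greater than or equal" comparison are conventions chosen here, and a full descriptor would build histograms of these codes over image regions.

```python
import numpy as np


def lbp_code(patch):
    """Compute the basic LBP code of the center pixel of a 3x3 patch.

    Each of the 8 neighbors contributes one bit: 1 if it is greater than
    or equal to the center, else 0. Neighbors are read clockwise starting
    at the top-left corner.
    """
    center = patch[1, 1]
    order = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    bits = [1 if patch[r, c] >= center else 0 for r, c in order]
    # The first neighbor is the most significant bit.
    return sum(bit << (7 - i) for i, bit in enumerate(bits))


patch = np.array([[6, 5, 2],
                  [7, 6, 1],
                  [9, 8, 7]])
print(lbp_code(patch))
```

Because the code depends only on comparisons with the center pixel, it does not change when the whole patch becomes uniformly brighter or darker.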

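2. Simple face recognition classification

The previous step separates the face from the image. If two photos are compared directly, differences in the angle and position of the faces lower the accuracy of the network or algorithm that performs the classification, so the face images usually need to be preprocessed first.

The preprocessing method is the facial landmark estimation proposed by Vahid Kazemi and Josephine Sullivan. Its main idea is to locate 68 landmarks that exist on every face, including the chin, the outer contour of each eye, and the inner contour of each eyebrow, and then to apply operations such as affine transformations to the image based on the positions of these landmarks so that the face is centered as well as possible.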
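As an illustration of landmark-based alignment, the sketch below derives a 2x3 affine matrix from just two points, the eye centers. This is a simplification of the idea, not the method used inside dlib; the function names and all coordinates are made up.

```python
import numpy as np


def eye_alignment_matrix(left_eye, right_eye, target_left, target_right):
    """Return the 2x3 matrix of the rotation, uniform scaling, and translation
    that maps the two detected eye centers onto the two target positions."""
    p1, p2 = complex(*left_eye), complex(*right_eye)
    q1, q2 = complex(*target_left), complex(*target_right)
    a = (q2 - q1) / (p2 - p1)  # rotation and scale as one complex factor
    b = q1 - a * p1            # translation
    return np.array([[a.real, -a.imag, b.real],
                     [a.imag, a.real, b.imag]])


def apply_affine(matrix, point):
    """Apply a 2x3 affine matrix to an (x, y) point."""
    x, y = point
    return matrix @ np.array([x, y, 1.0])


# Hypothetical eye centers of a tilted face, and the positions we want them at.
M = eye_alignment_matrix((30, 40), (70, 60), (35, 50), (65, 50))
print(apply_affine(M, (30, 40)))
print(apply_affine(M, (70, 60)))
```

A 2x3 matrix of this form is the kind of transform that cv2.warpAffine() accepts, so the same matrix could be used to warp the whole face image.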

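Once the face is centered, it can be recognized. The simplest approach is to compare the face to be recognized with the labeled faces in the database and check whether they are similar.

The last step is the recognition itself, which is straightforward after the preparation above. Once the face to be recognized has been obtained and encoded, a classification algorithm such as KNN completes the recognition. Note that KNN here does not compare the pixel distance between two photos; it compares the distance between the 128 values of their encodings.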
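Comparing encodings rather than pixels can be sketched with plain NumPy. The encodings below are random stand-ins, not real face data, and the function name is hypothetical; the 0.6 threshold is the default value used later in this article.

```python
import numpy as np


def nearest_known_face(known_encodings, known_names, encoding, threshold=0.6):
    """Return (name, distance) for the closest known encoding, or
    ("unknown", distance) when even the closest one is beyond the threshold."""
    distances = np.linalg.norm(np.asarray(known_encodings) - encoding, axis=1)
    best = int(np.argmin(distances))
    name = known_names[best] if distances[best] <= threshold else "unknown"
    return name, float(distances[best])


# Synthetic stand-ins for 128-value face encodings (not real face data).
rng = np.random.default_rng(0)
alice = rng.normal(size=128)
bob = rng.normal(size=128)
known = [alice, bob]
names = ["alice", "bob"]

close_result = nearest_known_face(known, names, alice + 0.01)  # close to alice
far_result = nearest_known_face(known, names, alice + 1.0)     # far from everyone
print(close_result[0], far_result[0])
```

The distance is the Euclidean distance between the two 128-value vectors, and a face whose nearest neighbor is still too far away is reported as unknown.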

3.5 Face recognition applications

1. Face detection

In face_predict, create a file named face-find.py that reads a photo and marks the detected faces.

import cv2
import face_recognition
# Load the image to examine
frame = face_recognition.load_image_file("Face_database/hyz/hyz.png")
# Get the list of face bounding boxes on the CPU
face_locations = face_recognition.face_locations(frame)
# Get the list of face bounding boxes with the CNN model, accelerated by GPU/CUDA
# (comparatively more accurate)
# face_locations = face_recognition.face_locations(frame, number_of_times_to_upsample=0, model="cnn")
print("This image contains {} face(s).".format(len(face_locations)))
# Draw the face bounding boxes
for (top, right, bottom, left) in face_locations:
    cv2.rectangle(frame, (left, top), (right, bottom), (0, 255, 0), 2)
# Convert RGB to BGR for OpenCV, then display the annotated image
frame = cv2.cvtColor(frame, cv2.COLOR_RGB2BGR)
cv2.imshow("image", frame)
cv2.waitKey(0)

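2. Facial landmark detection

In face_predict, create a file named face-feature.py that reads a photo and locates its facial landmarks.

import face_recognition
# Load the image to examine
frame = face_recognition.load_image_file("Face_database/hyz/hyz.png")
# Find all facial features in the image
face_landmarks_list = face_recognition.face_landmarks(frame, face_locations=None, model='large')
# Find only the nose, left eye, and right eye features in the image
# face_landmarks_list = face_recognition.face_landmarks(frame, face_locations=None, model='small')
print("This image contains {} face(s).".format(len(face_landmarks_list)))
for face_landmarks in face_landmarks_list:
    # Names of the facial features returned by the 'large' model
    facial_features = [
        'chin', 'left_eyebrow', 'right_eyebrow', 'nose_bridge',
        'nose_tip',
        'left_eye',
        'right_eye',
        'top_lip',
        'bottom_lip'
    ]
    # Print the location of each facial feature in this image
    for facial_feature in facial_features:
        print("{}: {}".format(facial_feature, face_landmarks[facial_feature]))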

3. Face comparison

In face_predict, create face-compare.py to compare faces.

# Face comparison: compare the faces in two images
# and print the distance between them
# The threshold is 0.6; the smaller the threshold, the stricter the match
import sys
import cv2
import face_recognition
# Load the reference image
source_image = face_recognition.load_image_file("Face_database/hyz/hyz.png")
# Load the test image
compare_image = face_recognition.load_image_file("Face_database/hyz/hyz_near.png")
# Locate the faces; each image must contain exactly one face
source_locations = face_recognition.face_locations(source_image)
if len(source_locations) != 1:
    print("Note: image one must contain exactly one face!")
    sys.exit(0)
compare_locations = face_recognition.face_locations(compare_image)
if len(compare_locations) != 1:
    print("Note: image two must contain exactly one face!")
    sys.exit(0)
# Compute the face encodings before drawing on the images
source_face_encoding = face_recognition.face_encodings(source_image, known_face_locations=source_locations)[0]
source_encodings = [source_face_encoding]
compare_face_encoding = face_recognition.face_encodings(compare_image, known_face_locations=compare_locations)[0]
# Draw the face in image one
for (top, right, bottom, left) in source_locations:
    print(top, right, bottom, left)
    cv2.rectangle(source_image, (left, top), (right, bottom), (0, 255, 0), 2)
# Draw the face in image two
for (top, right, bottom, left) in compare_locations:
    print(top, right, bottom, left)
    cv2.rectangle(compare_image, (left, top), (right, bottom), (0, 255, 0), 2)
# Display the two annotated images (converted from RGB to BGR for OpenCV)
cv2.imshow("image", cv2.cvtColor(source_image, cv2.COLOR_RGB2BGR))
cv2.waitKey(0)
cv2.imshow("image", cv2.cvtColor(compare_image, cv2.COLOR_RGB2BGR))
cv2.waitKey(0)
# Compare face one with face two; compare_faces returns one boolean per known face
face_distances = face_recognition.face_distance(source_encodings, compare_face_encoding)
matches = face_recognition.compare_faces(source_encodings, compare_face_encoding, 0.6)
# Print the results
print("Distance between the two faces: {:.3f}".format(face_distances[0]))
print("With the default threshold of 0.6, the test image {} the known image.".format("matches" if matches[0] else "does not match"))

4. Face search

In face_predict, create face-seek.py to search for a face.

# Face search: compare an unknown face with every face in the face library
import os
import face_recognition
file_name = []
known_faces = []
# Load the face library images from the directory
image_dir = "Face_database/hyz/"
for parent, dirnames, filenames in os.walk(image_dir):
    for filename in filenames:
        image_path = os.path.join(parent, filename)
        # Load the image
        frame = face_recognition.load_image_file(image_path)
        face_bounding_boxes = face_recognition.face_locations(frame)
        if len(face_bounding_boxes) != 1:
            # Skip images with no face (or with more than one face)
            print("{} cannot be used: {}.".format(image_path, "no face was found in it" if len(face_bounding_boxes) < 1 else "it contains more than one face"))
        else:
            # Encode the face
            frame_face_encoding = face_recognition.face_encodings(frame)[0]
            # Add it to the lists
            known_faces.append(frame_face_encoding)
            file_name.append(filename)
# Load the unknown image
frame = face_recognition.load_image_file("unknown/unknown1.png")
# Encode the face (assumes the image contains at least one face)
frame_face_encoding = face_recognition.face_encodings(frame)[0]
# Compare and get one boolean result per library image
results = face_recognition.compare_faces(known_faces, frame_face_encoding)
print(list(zip(file_name, results)))

5. Face recognition

In face_predict, create face-knn-train.py, which uses KNN to train on the face library.

# Train a k-nearest-neighbors classifier
import math
from sklearn import neighbors
import os
import os.path
import pickle
import face_recognition
from face_recognition.face_recognition_cli import image_files_in_folder


def train(train_dir, model_save_path=None, n_neighbors=None, knn_algo='ball_tree', verbose=False):
    """
    Train a k-nearest-neighbors classifier for face recognition.
    param train_dir: directory containing one subdirectory per known person, named after that person
    param model_save_path: (optional) path at which to save the model on disk
    param n_neighbors: (optional) number of neighbors to weigh in classification; chosen automatically if not specified
    param knn_algo: (optional) underlying data structure used by KNN; the default is ball_tree
    param verbose: whether to print progress messages during training
    return: the KNN classifier trained on the given data
    """
    X = []
    y = []
    # Loop over each person in the training set
    for class_dir in os.listdir(train_dir):
        # Skip train_dir/class_dir if it is not a directory
        if not os.path.isdir(os.path.join(train_dir, class_dir)):
            continue
        # Loop over each training image of the current person
        for img_path in image_files_in_folder(os.path.join(train_dir, class_dir)):
            image = face_recognition.load_image_file(img_path)
            face_bounding_boxes = face_recognition.face_locations(image)
            if len(face_bounding_boxes) != 1:
                # Skip training images with no face (or with more than one face)
                if verbose:
                    print("{} is not suitable for training: {}.".format(img_path, "no face was found in it" if len(face_bounding_boxes) < 1 else "it contains more than one face"))
            else:
                # Add the face encoding of the current image to the training set
                X.append(face_recognition.face_encodings(image,
                         known_face_locations=face_bounding_boxes)[0])
                y.append(class_dir)
    # Determine how many neighbors to use for weighting in the KNN classifier
    if n_neighbors is None:
        # Square root of the number of training samples, rounded to an integer
        n_neighbors = int(round(math.sqrt(len(X))))
        if verbose:
            print("Chose n_neighbors automatically:", n_neighbors)
    # Create and train the KNN classifier
    knn_clf = neighbors.KNeighborsClassifier(n_neighbors=n_neighbors,
                                             algorithm=knn_algo, weights='distance')
    knn_clf.fit(X, y)
    # Save the trained KNN classifier
    if model_save_path is not None:
        with open(model_save_path, 'wb') as f:
            pickle.dump(knn_clf, f)
    return knn_clf


if __name__ == "__main__":
    # Train the KNN classifier and save it to disk
    print("Training KNN classifier...")
    classifier = train("Face_database", model_save_path="trained_knn_model.clf", n_neighbors=1)
    print("Training complete!")
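Two details of the training script are worth seeing in isolation: the automatic choice of n_neighbors and the effect of weights='distance'. The following is a simplified sketch of inverse-distance voting, not scikit-learn's implementation; both helper names and the sample distances are made up.

```python
import math
from collections import defaultdict


def auto_n_neighbors(num_samples):
    """Square root of the training-set size, rounded to an integer."""
    return int(round(math.sqrt(num_samples)))


def weighted_vote(neighbors):
    """Pick a label from (label, distance) pairs, weighting each vote by 1/distance."""
    totals = defaultdict(float)
    for label, distance in neighbors:
        totals[label] += 1.0 / distance  # assumes distance > 0
    return max(totals, key=totals.get)


print(auto_n_neighbors(10))
# One very close "alice" neighbor outweighs two more distant "bob" neighbors.
print(weighted_vote([("alice", 0.2), ("bob", 0.5), ("bob", 0.55)]))
```

With uniform weights the two "bob" neighbors would win this vote by simple majority; weighting by inverse distance lets the closest encoding dominate.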

In face_predict, create a file named face-knn-predict.py to perform face recognition.

# Test the k-nearest-neighbors classifier with the camera
import cv2
import pickle
import face_recognition


def predict(X_img, knn_clf=None, model_path=None, distance_threshold=0.6):
    """
    Recognize the faces in the given image with a trained KNN classifier.
    param X_img: the image to recognize (RGB)
    param knn_clf: (optional) a KNN classifier object; if not specified, model_path must be specified
    param model_path: (optional) path to a pickled KNN classifier; if not specified, knn_clf must be specified
    param distance_threshold: (optional) distance threshold for face classification; the larger it is, the more likely an unknown person is misclassified as a known one
    return: a list of names and face locations for the recognized faces: [(name, bounding box), ...]. Faces of unrecognized people get the name "unknown"
    """
    if knn_clf is None and model_path is None:
        raise Exception("Either the KNN classifier knn_clf or model_path must be provided")
    # Load the trained KNN model (if a path was passed in)
    if knn_clf is None:
        with open(model_path, 'rb') as f:
            knn_clf = pickle.load(f)
    # Find the face locations in the image
    X_face_locations = face_recognition.face_locations(X_img)
    # X_face_locations = face_recognition.face_locations(X_img, number_of_times_to_upsample=0, model="cnn")
    # Return an empty result if no face is found in the image
    if len(X_face_locations) == 0:
        print("No face detected!")
        return []
    # Compute the encodings of the faces in the test image
    faces_encodings = face_recognition.face_encodings(X_img, known_face_locations=X_face_locations)
    # Use the KNN model to find the best match for each test face:
    # kneighbors returns the distances and indices of each point's neighbors
    closest_distances = knn_clf.kneighbors(faces_encodings, n_neighbors=1)
    are_matches = [closest_distances[0][i][0] <= distance_threshold for i in range(len(X_face_locations))]
    # Predict the classes and discard classifications outside the threshold
    # predict: returns the class labels
    return [(pred, loc) if rec else ("unknown", loc) for pred, loc, rec in zip(knn_clf.predict(faces_encodings), X_face_locations, are_matches)]


if __name__ == "__main__":
    # Load the trained classifier once instead of on every frame
    with open("trained_knn_model.clf", 'rb') as f:
        knn_clf = pickle.load(f)
    video_capture = cv2.VideoCapture(0)
    while True:
        ret, frame = video_capture.read()
        # Shrink the frame to a quarter of its size to speed up recognition
        small_frame = cv2.resize(frame, (0, 0), fx=0.25, fy=0.25)
        # Convert OpenCV's BGR order to the RGB order that face_recognition expects
        rgb_small_frame = cv2.cvtColor(small_frame, cv2.COLOR_BGR2RGB)
        predictions = predict(rgb_small_frame, knn_clf=knn_clf)
        for name, (top, right, bottom, left) in predictions:
            # Scale the locations back up to the full-size frame
            top *= 4
            right *= 4
            bottom *= 4
            left *= 4
            cv2.rectangle(frame, (left, top), (right, bottom), (0, 255, 0), 2)
            cv2.putText(frame, name, (left + 6, bottom - 6), cv2.FONT_HERSHEY_DUPLEX, 1.0, (255, 255, 255), 1)
        cv2.imshow('Video', frame)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
    video_capture.release()
    cv2.destroyAllWindows()
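Because detection runs on a frame shrunk to a quarter of its size, every box coordinate has to be multiplied by 4 before drawing on the original frame. A small sketch of that step, with a hypothetical helper name and made-up coordinates:

```python
def scale_box(box, factor):
    """Scale a (top, right, bottom, left) box by an integer factor."""
    top, right, bottom, left = box
    return (top * factor, right * factor, bottom * factor, left * factor)


# A box found on a frame shrunk to 1/4 size maps back with factor 4.
full_size_box = scale_box((10, 50, 40, 20), 4)
print(full_size_box)
```

The factor must be the inverse of the fx and fy values passed to cv2.resize(); if the resize ratio changes, the factor has to change with it.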
