Speeding up OpenCV Python video processing code with Numba: a 6.5x performance gain

__弯弓__

Last modified 2023-03-20 10:28:07


First published 2023-02-09 11:58:06

Article link: https://blog.csdn.net/captain5339/article/details/128950515



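1. The problem

Video processing in OpenCV Python is resource-intensive, which can make the picture stutter, while skipping frames risks losing key data. Accelerating the OpenCV code with Numba is a good way to improve this.
Numba is a just-in-time compiler for Python. It mainly accelerates functions that work on array types (array, NumPy arrays, bytes and so on) and numeric types, supports GPU computing, and can bypass the GIL restriction. Using it only requires an import and a function decorator, which is very convenient.

How well does it work in practice? This article uses sample code that displays and processes a video stream with OpenCV, and compares its speed before and after optimization with Numba.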

2. Installing and using Numba

2.1 Installing the Numba library

pip install numba

2.2 Using Numba

2.2.1 Basic usage

To use Numba, add a decorator in front of the function, as follows:

# Import the required packages
from numba import jit
import numpy as np
 
@jit    # jit decorator
def sum(a, b):
    return a + b

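2.2.2 No-GIL mode

If the function does a lot of computation and the caller wants it finished as quickly as possible, it can be run on multiple CPU cores. Add the parameter nogil=True to the decorator. The compiled function then releases the GIL while it runs, so several threads can execute it at the same time (this only takes effect when the function is compiled in nopython mode).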

@jit(nogil=True)
def sum(a, b):
    return a + b

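2.2.3 No-Python mode

A function decorated with jit in this mode runs without the interpreter being involved: it executes directly as machine code, essentially the way C code runs, and this is the fastest mode. Pass nopython=True to the decorator. Example: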

@jit(nopython=True)
def sum(a, b):
    return a + b

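3. Project requirements

Video frames in OpenCV are images represented as NumPy arrays. In this example we capture a video stream from a webcam and perform calculations and modifications on it in real time, which places tight limits on the processing time for each frame.
To keep the video smooth, each frame must be displayed within 1/25 of a second. That leaves at most 0.04 seconds per frame for capturing, processing and updating the window with the video stream.
Capturing and updating the window also take time, so it is uncertain exactly how fast the frame processing (calculation and modification) has to be, but the upper bound is 0.04 seconds per frame.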
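A quick way to check a processing step against this budget without a webcam is to time it on a synthetic frame. The following is a rough sketch only: the helper time_per_frame and the stand-in step invert are hypothetical names, and a black 640 x 480 frame is assumed in place of real camera input.

```python
import time

import numpy as np

FPS_TARGET = 25
FRAME_BUDGET = 1.0 / FPS_TARGET   # 0.04 seconds per frame


def time_per_frame(func, frame, repeats=20):
    # Average wall-clock time of func over several runs on copies of frame
    start = time.perf_counter()
    for _ in range(repeats):
        func(frame.copy())
    return (time.perf_counter() - start) / repeats


def invert(frame):
    # Hypothetical stand-in for a real processing step
    return 255 - frame


frame = np.zeros((480, 640, 3), dtype=np.uint8)
elapsed = time_per_frame(invert, frame)

print(f"budget:  {FRAME_BUDGET:.3f} s per frame")
print(f"elapsed: {elapsed:.5f} s per frame")
print("within budget:", elapsed < FRAME_BUDGET)
```

The measured time covers only the processing step, so the real headroom is smaller once capture and display are added.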

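4. Calculating on and modifying each frame

For the test, we add an image-processing function that does the following.

Calculation. We divide each frame into small regions of 6×16 pixels and compute the average color of each region. To get the average color, we compute the mean of each channel (BGR).
Modification. For each region, we change its color by filling it entirely with the average color.
This can be done by adding the following function to process each frame.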

def process(frame, box_height=6, box_width=16):
    height, width, _ = frame.shape
    for i in range(0, height, box_height):
        for j in range(0, width, box_width):
            roi = frame[i:i + box_height, j:j + box_width]
            b_mean = np.mean(roi[:, :, 0])
            g_mean = np.mean(roi[:, :, 1])
            r_mean = np.mean(roi[:, :, 2])
            roi[:, :, 0] = b_mean
            roi[:, :, 1] = g_mean
            roi[:, :, 2] = r_mean
    return frame

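The frame is divided into rectangular regions (box_height x box_width). For each box (roi: region of interest), the function computes the mean of each of the 3 color channels (b_mean, g_mean, r_mean) and overwrites the region with those means.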
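To see the effect of the function without a webcam, it can be applied to a synthetic frame. This is a small sketch that repeats the process function from above and assumes a random 640 x 480 BGR image as input; the random seed and the block being inspected are arbitrary choices.

```python
import numpy as np


def process(frame, box_height=6, box_width=16):
    # Same function as in the article: fill each box with its mean color
    height, width, _ = frame.shape
    for i in range(0, height, box_height):
        for j in range(0, width, box_width):
            roi = frame[i:i + box_height, j:j + box_width]
            b_mean = np.mean(roi[:, :, 0])
            g_mean = np.mean(roi[:, :, 1])
            r_mean = np.mean(roi[:, :, 2])
            roi[:, :, 0] = b_mean
            roi[:, :, 1] = g_mean
            roi[:, :, 2] = r_mean
    return frame


# Random test image standing in for a webcam frame (an assumption)
rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(480, 640, 3), dtype=np.uint8)

# process modifies its input in place, so work on a copy
out = process(frame.copy())

# After processing, every pixel in a box has the same value per channel
block = out[0:6, 0:16]
for channel, name in enumerate("BGR"):
    values = block[:, :, channel]
    print(name, int(values.min()), int(values.max()))
```

Because roi is a view into frame, the assignments write straight into the original array, and the float means are truncated when stored in the uint8 image.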

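5. Measuring the performance of the processing function

To estimate the time spent in the process function, we use the cProfile library. It reports how much time each function call takes.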

import cv2
import numpy as np
import cProfile

def process(frame, box_height=6, box_width=16):
    height, width, _ = frame.shape
    for i in range(0, height, box_height):
        for j in range(0, width, box_width):
            roi = frame[i:i + box_height, j:j + box_width]
            b_mean = np.mean(roi[:, :, 0])
            g_mean = np.mean(roi[:, :, 1])
            r_mean = np.mean(roi[:, :, 2])
            roi[:, :, 0] = b_mean
            roi[:, :, 1] = g_mean
            roi[:, :, 2] = r_mean
    return frame

def main(iterations=300):
    # Get the webcam (default webcam is 0)
    cap = cv2.VideoCapture(0)
    # If your webcam does not support 640 x 480, this will find another resolution
    cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
    cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)
    for _ in range(iterations):
        # Read a frame from the webcam; stop if no frame could be read
        ok, frame = cap.read()
        if not ok:
            break
        # Flip the frame
        frame = cv2.flip(frame, 1)
        frame = cv2.resize(frame, (640, 480))
        frame = process(frame)
        # Show the frame in a window
        cv2.imshow('WebCam', frame)
        # Check if q has been pressed to quit
        if cv2.waitKey(1) == ord('q'):
            break
    # When everything done, release the capture
    cap.release()
    cv2.destroyAllWindows()
cProfile.run("main()")

Output

 ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      300    7.716    0.026   50.184    0.167 test_numba.py:8(process)

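The output shows that each call to the process function uses 0.026 seconds (the tottime percall column). The other functions called in the main loop are profiled as follows.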

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      300    5.132    0.017    5.132    0.017 {method 'read' of 'cv2.VideoCapture' objects}
      300    0.073    0.000    0.073    0.000 {resize}
      300    2.848    0.009    2.848    0.009 {waitKey}
      300    0.120    0.000    0.120    0.000 {flip}
      300    0.724    0.002    0.724    0.002 {imshow}

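In addition, each iteration incurs about 0.028 seconds (0.017 + 0.009 + 0.002) of overhead from the read, resize, flip, imshow and waitKey calls.
Added together, the time per frame is 0.054 seconds (0.026 + 0.028), which gives a frame rate of only 18.5 frames per second (FPS). That is too slow for smooth playback at 24 frames per second.

Of course, cProfile adds some overhead of its own when measuring time; we ignore it for now.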
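The full cProfile report can be long, so it helps to print only the rows of interest. The sketch below shows one way to do that with the standard pstats module; the functions busy and work are hypothetical stand-ins for the real frame loop, and the filter pattern is an assumption for this example.

```python
import cProfile
import io
import pstats


def busy(n):
    # Hypothetical stand-in for a per-frame processing function
    return sum(i * i for i in range(n))


def work():
    # Hypothetical stand-in for the main loop
    return [busy(10_000) for _ in range(50)]


profiler = cProfile.Profile()
profiler.enable()
work()
profiler.disable()

# Collect the report in a string, sorted by cumulative time,
# and keep only rows whose function description matches the pattern
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats("busy|work")
report = buffer.getvalue()
print(report)
```

In the report, tottime excludes time spent in functions called from the row's function, while cumtime includes it.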

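6. Introducing Numba to optimize performance

Numba's strength is compiling code so that loops over NumPy arrays run faster. Images in opencv-python are NumPy arrays and are processed with NumPy operations, so they are well suited to acceleration with Numba. Below is the code with the Numba statements added.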

import cv2
import numpy as np
from numba import jit
import cProfile

@jit(nopython=True)
def process(frame, box_height=6, box_width=16):
    height, width, _ = frame.shape
    for i in range(0, height, box_height):
        for j in range(0, width, box_width):
            roi = frame[i:i + box_height, j:j + box_width]
            b_mean = np.mean(roi[:, :, 0])
            g_mean = np.mean(roi[:, :, 1])
            r_mean = np.mean(roi[:, :, 2])
            roi[:, :, 0] = b_mean
            roi[:, :, 1] = g_mean
            roi[:, :, 2] = r_mean
    return frame

def main(iterations=300):
    # Get the webcam (default webcam is 0)
    cap = cv2.VideoCapture(0)
    # If your webcam does not support 640 x 480, this will find another resolution
    cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
    cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)
    for _ in range(iterations):
        # Read a frame from the webcam; stop if no frame could be read
        ok, frame = cap.read()
        if not ok:
            break
        # Flip the frame
        frame = cv2.flip(frame, 1)
        frame = cv2.resize(frame, (640, 480))
        frame = process(frame)
        # Show the frame in a window
        cv2.imshow('WebCam', frame)
        # Check if q has been pressed to quit
        if cv2.waitKey(1) == ord('q'):
            break
    # When everything done, release the capture
    cap.release()
    cv2.destroyAllWindows()
# Run one iteration first so that Numba compiles process() before profiling starts
main(iterations=1)
cProfile.run("main(iterations=300)")

Output


   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      300    1.187    0.004    1.187    0.004 test_numba.py:7(pixels)

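Each call takes 0.004 seconds. This brings the total time per iteration to 0.032 seconds (0.028 + 0.004), which is enough to sustain more than 24 frames per second (FPS).

In addition, this improves the performance of the processing function by a factor of 6.5 (7.716 / 1.187).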
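The arithmetic behind these figures can be reproduced directly from the profiler numbers quoted in this article. The snippet below is only a worked calculation; the variable names are made up for the example.

```python
# Per-call times (seconds) taken from the profiler output in this article
overhead = 0.017 + 0.009 + 0.002   # read + waitKey + imshow
process_before = 0.026             # process without Numba
process_after = 0.004              # process with Numba

frame_before = overhead + process_before
frame_after = overhead + process_after

fps_before = 1.0 / frame_before
fps_after = 1.0 / frame_after

# Speedup of the processing function, from the total times over 300 calls
speedup = 7.716 / 1.187

print(f"per frame before: {frame_before:.3f} s, {fps_before:.2f} FPS")
print(f"per frame after:  {frame_after:.3f} s, {fps_after:.2f} FPS")
print(f"speedup of process: {speedup:.2f}x")
```

The speedup applies to the processing function alone; the frame rate improves by less because the capture and display overhead is unchanged.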

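7. Conclusion

When capturing a live stream from a webcam, processing it and displaying it, accelerating the processing function with Numba makes it about 6.5 times faster.
The tests above were run on a computer with integrated graphics. On a machine with a CUDA-capable graphics card the speedup may be larger, although @jit on its own compiles for the CPU, so the function would first have to be ported to Numba's CUDA target; please test this yourself.

A follow-up article will test the performance gain from optimizing opencv-python code with Cython, so stay tuned.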
