A Simple Hands-On Test of numba + CUDA

风海流

Licensed under CC 4.0 BY-SA

Article link: https://blog.csdn.net/huyaoyu/article/details/89742577

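Summary: This article documents an attempt to use numba and CUDA to speed up per-pixel processing of a 4K image in Python. The author first found that the pure-Python version was slow, taking about 520 s. The author then turned to numba's CUDA support, only to find that CUDA support for Python is limited: inside a kernel function, most NumPy functions are unavailable. After some experimentation the test code ran successfully and the processing time dropped to about 0.4 s, a substantial speed-up.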

Motivation

I did this on a whim. I suppose I had too much free time.

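I recently needed to process a 4K image pixel by pixel. Python had made me a bit lazy, so I simply looped over every pixel in Python. The work per pixel is trivial: check whether the pixel meets some criterion and, if it does, add it to a mask. I dashed off a script, ran it, and waited a long time without getting a result. At first I suspected a bug, perhaps an infinite loop, but it turned out that the execution really was that slow. To demonstrate, I wrote an example that performs a simple classification on a 3008x4112-pixel image. (Both classification functions in the example could be implemented directly with cv2; they are used here only for testing.)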
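The per-pixel loop described above can be sketched as follows. This is a minimal illustration, not the author's script: the function name `ring_mask_loop`, the 9x9 image size, and the parameter values are assumptions chosen for this sketch. The ring criterion is the same one that the script's `RadiusValidator` applies.

```python
import math
import numpy as np

def ring_mask_loop(h, w, cx, cy, R, width):
    """Visit every pixel and mark those whose distance to (cx, cy)
    lies within [R - width, R + width]. Hypothetical helper."""
    mask = np.zeros((h, w), dtype=np.uint8)
    for y in range(h):          # rows
        for x in range(w):      # columns
            r = math.sqrt((x - cx) ** 2 + (y - cy) ** 2)
            if R - width <= r <= R + width:
                mask[y, x] = 255
    return mask

# A tiny 9x9 image keeps the run instant; the article's image is 3008x4112.
mask = ring_mask_loop(9, 9, cx=4, cy=4, R=3, width=0.5)
```

At 3008x4112 the same double loop visits roughly 12.4 million pixels, and every iteration is executed by the Python interpreter, which is where the long wait comes from.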

from __future__ import print_function

import cv2
import math
import numpy as np
from numpy.linalg import norm
import time

H = 3008
W = 4112

class Validator(object):
    def __init__(self):
        pass

    def is_valid(self, x, y):
        return False

class RadiusValidator(Validator):
    def __init__(self, center, R, width):
        super(RadiusValidator, self).__init__()

        self.R = R
        self.center = center # A two element NumPy array. [x, y].
        self.width = width

        if ( self.width <= 0 ):
            raise Exception("self.width wrong. self.width = {}".format(self.width))
    
    # Override the parent's method.
    def is_valid(self, x, y):
        x = x - self.center[0]
        y = y - self.center[1]

        r = math.sqrt( x * x + y * y )

        if ( r >= self.R - self.width and r <= self.R + self.width ):
            return True
        else:
            return False

class PolarLineSegmentValidator(Validator):
    def __init__(self, center, theta, length, width):
        super(PolarLineSegmentValidator, self).__init__()

        self.center = center # A two element NumPy array. [x, y].
        self.theta  = theta
        self.length = length
        self.width  = width

        self.endP    = np.zeros((2,), dtype=np.float32)
        self.endP[0] = self.length * math.cos( self.theta )
        self.endP[1] = self.length * math.sin( self.theta