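Goal
Move from CPU-only Cython code to CPU+GPU code using Cython + PyCuda.
Motivation
The original program uses Python libraries such as PyFITS and Astropy, so I took the blunt approach: convert the CPU-parallel parts into GPU-parallel code by adding PyCuda. (This probably makes the code more complicated; if you know a simpler approach, please leave a comment.)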
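Preparation
1. Skimmed the first 10 chapters of M. L. Hetland's Beginning Python (Chinese edition: 《Python基础教程》). (Took one day.)
2. Set up the environment (there are plenty of tutorials online):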
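- Note:
  - When installing the NVIDIA CUDA Toolkit, check your GPU model: at the time of writing, the NVIDIA GeForce GTX 300 series and earlier were not supported.
  - On Windows, also check the Visual Studio version: at the time of writing, the NVIDIA CUDA Toolkit supported VS15 and earlier. (Linux is recommended.)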
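Once the toolkit and PyCuda are installed, a quick check can confirm that PyCuda actually sees a GPU before any Cython is involved. This is only a sketch: `list_cuda_devices` is a hypothetical helper name, it returns an empty list when PyCuda or a usable CUDA driver is missing, and it reports each device's compute capability so you can compare it with what your CUDA Toolkit version supports.

```python
def list_cuda_devices():
    """Return [(name, (major, minor)), ...] for every visible CUDA device.

    Hypothetical helper: returns [] if PyCuda is not installed or the
    CUDA driver cannot be initialised (no GPU, driver mismatch, ...).
    """
    try:
        import pycuda.driver as cuda
        cuda.init()  # must be called before any other driver call
    except Exception:
        return []

    devices = []
    for i in range(cuda.Device.count()):
        dev = cuda.Device(i)
        devices.append((dev.name(), dev.compute_capability()))
    return devices


if __name__ == "__main__":
    found = list_cuda_devices()
    if not found:
        print("No usable CUDA device found")
    for name, (major, minor) in found:
        print("%s: compute capability %d.%d" % (name, major, minor))
```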
Testing Cython + PyCuda
Cython
1. Initial directory layout
(I wrote the initial files myself; the files added by the build are listed later.)
```
testCuda/
|-- setup.py
`-- test/
    |-- constants.h
    |-- constants.pxd
    |-- test.pyx
    `-- __init__.py
```
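Note: a .pyx file holds Cython source code (the implementation that gets compiled); a .pxd file holds Cython declarations that other modules can `cimport`, here the C constants from constants.h.
2. File contents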
- constants.h
```c
//constants.h
#ifndef CONSTANTS_H
#define CONSTANTS_H

#define PI (3.1415926535897932384626433832)
#define TWOPI (PI * 2.0)

#endif
```
- constants.pxd
```cython
cdef extern from "./constants.h":
    long double PI
    long double TWOPI
```
- test.pyx
```cython
# import python3 compat modules
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals

# import std lib
import sys
import traceback

# import cython specifics
cimport cython
from cython.parallel import prange
from cython.operator cimport dereference as deref, preincrement as inc
from cpython cimport bool as python_bool
cimport openmp

# import C/C++ modules
from libc.math cimport exp, cos, sin, sqrt, asin, acos, atan2, fabs, fmod
from libcpp.vector cimport vector
from libcpp.pair cimport pair
from libcpp.set cimport set as cpp_set
from libcpp cimport bool
from libcpp.unordered_map cimport unordered_map

# import numpy/data types
import numpy as np
from numpy cimport (
    int8_t, int16_t, int32_t, int64_t,
    uint8_t, uint16_t, uint32_t, uint64_t,
    float32_t, float64_t
)
cimport numpy as np

from .constants cimport PI, TWOPI

print ("test: Cython")
print (TWOPI)
```
- __init__.py
from .test import *
- setup.py
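```python
#!/usr/bin/env python
from setuptools import setup
from setuptools.extension import Extension
from Cython.Distutils import build_ext
import numpy
import platform
import os

# extra compiler flags (empty here)
EX_COMP_ARGS = []

# Copied as-is; what does each line mean?
TEST_EXT = Extension(
    'test.test',                   # dotted module name: module "test" inside package "test"
    ['test/test.pyx'],             # Cython source file(s) to compile
    extra_compile_args=['-fopenmp', '-O3', '-std=c++11'] + EX_COMP_ARGS,  # enable OpenMP, -O3, C++11
    extra_link_args=['-fopenmp'],  # link against the OpenMP runtime
    language='c++',                # generate test.cpp and compile it as C++ (needed for the libcpp cimports)
    include_dirs=[
        numpy.get_include(),       # NumPy C headers, needed by "cimport numpy"
    ]
)

setup(
    name='test_Cython',
    packages=['test'],                  # Python package(s) to include
    cmdclass={'build_ext': build_ext},  # Cython's build_ext translates .pyx to C++ before compiling
    ext_modules=[
        TEST_EXT,
    ]
)
```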
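`platform` and `os` are imported but unused, and `EX_COMP_ARGS` is empty. One way to use them, relevant to the Windows note above, is to pick compiler flags per platform, since `-fopenmp`, `-O3` and `-std=c++11` are GCC-style flags rather than MSVC options. The sketch below is hypothetical (`openmp_build_args` is not part of any library) and only covers GCC-compatible compilers and MSVC; macOS/clang setups typically need a separately installed OpenMP runtime and are not handled.

```python
import platform


def openmp_build_args(system=None):
    """Return (extra_compile_args, extra_link_args) for an OpenMP C++ build.

    Hypothetical helper. Assumes MSVC on Windows and a GCC-compatible
    compiler everywhere else.
    """
    system = system or platform.system()
    if system == "Windows":
        # MSVC spellings of "enable OpenMP" and "optimise"; no -std=c++11
        # equivalent is passed (assumes the compiler default is new enough)
        return ["/openmp", "/O2"], []
    # GCC-style flags, identical to the ones used in setup.py
    return ["-fopenmp", "-O3", "-std=c++11"], ["-fopenmp"]


if __name__ == "__main__":
    compile_args, link_args = openmp_build_args()
    print(platform.system(), compile_args, link_args)
```

In setup.py, the two returned lists would take the place of the literal `extra_compile_args` and `extra_link_args` values.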
3. Build
$ python setup.py build_ext --inplace
4. Run
Start Python:
$ python
Then import the package:
```
>>> import test
test: Cython
6.28318530718
>>>
```
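Why `6.28318530718` and not more digits: `TWOPI` is declared `long double` in constants.pxd, but it reaches Python as an ordinary double-precision float, and this build targets Python 2.7 (see the `temp.linux-x86_64-2.7` directory in step 5), whose `str()` of a float keeps 12 significant digits. The check below reproduces that formatting with `'%.12g'` and shows what Python 3 would print instead. It assumes the converted value equals `2 * math.pi`, which holds because the macro is an unsuffixed (double) literal and doubling a double is exact.

```python
import math

two_pi = 2 * math.pi  # same double that the C macro TWOPI evaluates to

# Python 2.7's str(float) used 12 significant digits, i.e. '%.12g'
py2_style = "%.12g" % two_pi
# Python 3 prints the shortest repr that round-trips to the same float
py3_style = repr(two_pi)

print(py2_style)  # 6.28318530718
print(py3_style)  # 6.283185307179586
```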
5. Directory layout after the build
```
testCuda/
|-- setup.py
|-- test/
|   |-- constants.h
|   |-- constants.pxd
|   |-- test.pyx
|   |-- test.cpp
|   |-- test.so
|   |-- __init__.py
|   `-- __init__.pyc
`-- build/
    `-- temp.linux-x86_64-2.7
```
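`test.so` is the compiled extension module that `import test` actually loads. Its file name depends on the interpreter: Python 2 uses a plain `.so`, while Python 3 usually adds an ABI tag (for example `test.cpython-<version>-<platform>.so` on Linux) and Windows uses `.pyd`. A small sketch, assuming Python 3 (`importlib.machinery.EXTENSION_SUFFIXES` does not exist in Python 2) and using a hypothetical helper name, lists the compiled modules in a package directory:

```python
import os
from importlib.machinery import EXTENSION_SUFFIXES


def built_extensions(pkg_dir):
    """Return the sorted file names in pkg_dir that look like compiled
    extension modules for the running interpreter (hypothetical helper)."""
    names = []
    for name in os.listdir(pkg_dir):
        if any(name.endswith(suffix) for suffix in EXTENSION_SUFFIXES):
            names.append(name)
    return sorted(names)


if __name__ == "__main__":
    print(EXTENSION_SUFFIXES)
    if os.path.isdir("test"):
        print(built_extensions("test"))
```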
PyCuda
Add the PyCuda example to test.pyx:
```cython
# import python3 compat modules
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals

# import std lib
import sys
import traceback

# import cython specifics
cimport cython
from cython.parallel import prange
from cython.operator cimport dereference as deref, preincrement as inc
from cpython cimport bool as python_bool
cimport openmp

# import C/C++ modules
from libc.math cimport exp, cos, sin, sqrt, asin, acos, atan2, fabs, fmod
from libcpp.vector cimport vector
from libcpp.pair cimport pair
from libcpp.set cimport set as cpp_set
from libcpp cimport bool
from libcpp.unordered_map cimport unordered_map

# import numpy/data types
import numpy as np
from numpy cimport (
    int8_t, int16_t, int32_t, int64_t,
    uint8_t, uint16_t, uint32_t, uint64_t,
    float32_t, float64_t
)
cimport numpy as np

from .constants cimport PI, TWOPI

print ("test: Cython")
print (TWOPI)

# PyCuda example
import pycuda.driver as cuda
import pycuda.autoinit          # initialises CUDA and creates a context on the default device
from pycuda.compiler import SourceModule

# 4x4 random matrix, converted to float32 to match the kernel's float*
a = np.random.randn(4,4)
a = a.astype(np.float32)        # use np: the bare name `numpy` is not imported in this module

# allocate device memory and copy the host array to the GPU
a_gpu = cuda.mem_alloc(a.size * a.dtype.itemsize)
cuda.memcpy_htod(a_gpu, a)

# one thread per element: thread (x, y) of the 4x4 block handles a[y][x]
mod = SourceModule("""
__global__ void doublify(float *a)
{
    int idx = threadIdx.x + threadIdx.y*4;
    a[idx] *= 2;
}
""")

# str() turns the unicode literal (from unicode_literals) into a native str under Python 2
func = mod.get_function(str("doublify"))
func(a_gpu, block=(4,4,1))

# copy the result back to the host
a_doubled = np.empty_like(a)
cuda.memcpy_dtoh(a_doubled, a_gpu)
print ("original array:")
print (a)
print ("doubled with kernel:")
print (a_doubled)
```
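To see why `threadIdx.x + threadIdx.y*4` touches every element exactly once, the kernel's indexing can be emulated on the CPU with NumPy: one loop iteration per thread of the `(4,4,1)` block, writing into a flattened, row-major copy of the array. This is an illustrative sketch only (`emulate_doublify` is a hypothetical name); it mirrors the index arithmetic, not the parallel execution.

```python
import numpy as np

BLOCK = (4, 4, 1)  # same block shape as func(a_gpu, block=(4,4,1))


def emulate_doublify(a):
    """CPU emulation of the doublify kernel for a 4x4 float32 array.

    Returns the doubled array and the list of indices the "threads" wrote.
    """
    flat = np.array(a, dtype=np.float32).ravel()  # C-order copy, like the device buffer
    visited = []
    for ty in range(BLOCK[1]):          # threadIdx.y selects the row
        for tx in range(BLOCK[0]):      # threadIdx.x selects the column
            idx = tx + ty * 4           # same index formula as the kernel
            flat[idx] *= 2
            visited.append(idx)
    return flat.reshape(np.shape(a)), visited


if __name__ == "__main__":
    a = np.random.randn(4, 4).astype(np.float32)
    doubled, visited = emulate_doublify(a)
    print(sorted(visited) == list(range(16)))
    print(np.array_equal(doubled, a * 2))
```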
Rebuild and run; the output is:
```
>>> import test
test: Cython
6.28318530718
original array:
[... omitted ...]
doubled with kernel:
[... omitted ...]
>>>
```
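On the question of simpler approaches: for simple element-wise work like this, PyCuda's `pycuda.gpuarray` module can replace the manual `mem_alloc` / `memcpy_htod` / `memcpy_dtoh` steps. The sketch below is one possible variant, not what the program above does; `doublify_with_gpuarray` is a hypothetical name, and it returns `None` when PyCuda or a CUDA device is unavailable so it can be run anywhere.

```python
import numpy as np


def doublify_with_gpuarray(a):
    """Double a float32 array on the GPU via pycuda.gpuarray.

    Hypothetical helper; returns None if PyCuda or a CUDA device is unavailable.
    """
    try:
        import pycuda.autoinit  # noqa: F401  (initialises CUDA, creates a context)
        import pycuda.gpuarray as gpuarray
    except Exception:
        return None

    a_gpu = gpuarray.to_gpu(np.ascontiguousarray(a, dtype=np.float32))  # host -> device
    return (a_gpu * 2).get()  # element-wise multiply on the GPU, then device -> host


if __name__ == "__main__":
    a = np.random.randn(4, 4).astype(np.float32)
    result = doublify_with_gpuarray(a)
    if result is None:
        print("PyCuda/GPU not available")
    else:
        print(np.allclose(result, a * 2))
```

Custom `SourceModule` kernels are still needed for anything `gpuarray` does not express directly, so this mainly trims boilerplate in the simple cases.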
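Summary:
PyCuda can indeed be combined with Cython! I'd welcome pointers from experienced readers to concise write-ups that would help me understand this more deeply.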