python numba cuda: CUDA function won't execute a for loop in Python with Numba

I'm trying to run a simple update loop of a simulation on the GPU. There are a bunch of "creatures", represented by circles. In each update step they move, and then the code checks whether any of them intersect.

import numpy as np
import math
from numba import cuda

@cuda.jit('void(float32[:], float32[:], float32[:], uint8[:], float32[:], float32[:], float32, uint32, uint32)')
def update(p_x, p_y, radii, types, velocities, max_velocities, acceleration, num_creatures, cycles):
    for c in range(cycles):
        for i in range(num_creatures):
            velocities[i] = velocities[i] + acceleration
            if velocities[i] > max_velocities[i]:
                velocities[i] = max_velocities[i]
            p_x[i] = p_x[i] + (math.cos(1.0) * velocities[i])
            p_y[i] = p_y[i] + (math.sin(1.0) * velocities[i])
        for i in range(num_creatures):
            for j in range(i, num_creatures):
                delta_x = p_x[j] - p_x[i]
                delta_y = p_y[j] - p_y[i]
                distance_squared = (delta_x * delta_x) + (delta_y * delta_y)
                sum_of_radii = radii[types[i]] + radii[types[i]]
                if distance_squared < sum_of_radii * sum_of_radii:
                    pass

acceleration = .1
creature_radius = 10
spacing = 20
food_radius = 3

max_num_creatures = 1500
num_creatures = 0
max_num_food = 500
num_food = 0
max_num_entities = max_num_creatures + max_num_food
num_entities = 0
cycles = 1

p_x = np.empty((max_num_entities, 1), dtype=np.float32)
p_y = np.empty((max_num_entities, 1), dtype=np.float32)
radii = np.array([creature_radius, creature_radius, food_radius], dtype=np.float32)
types = np.empty((max_num_entities, 1), dtype=np.uint8)

velocities = np.empty((max_num_creatures, 1), dtype=np.float32)
max_velocities = np.empty((max_num_creatures, 1), dtype=np.float32)

# types:
# male - 0
# female - 1
# food - 2

for x in range(1, 800 // spacing):
    for y in range(1, 600 // spacing):
        if num_creatures % 2 == 0:
            types[num_creatures] = 0
        else:
            types[num_creatures] = 1
        p_x[num_creatures] = x * spacing
        p_y[num_creatures] = y * spacing
        max_velocities[num_creatures] = 5
        num_creatures += 1

device_p_x = cuda.to_device(p_x)
device_p_y = cuda.to_device(p_y)
device_radii = cuda.to_device(radii)
device_types = cuda.to_device(types)
device_velocities = cuda.to_device(velocities)
device_max_velocities = cuda.to_device(max_velocities)

update(device_p_x, device_p_y, device_radii, device_types, device_velocities, device_max_velocities,
       acceleration, num_creatures, cycles)

print(device_p_x.copy_to_host()[0])

The 1.0 in math.cos and math.sin is just a placeholder for the directions of the individual creatures.

There is a surrounding loop that runs cycles times. If I remove it and leave only the block of code that moves the creatures, none of p_x, p_y, or velocities change, even if I just add a constant to them. Why not?

Solution

There are at least two problems:

First, you aren't initializing velocities:

velocities = np.empty((max_num_creatures, 1), dtype=np.float32)

We can fix that for a trivial test with:

velocities = np.ones((max_num_creatures, 1), dtype=np.float32)

Second, this isn't the correct array shape:

p_x = np.empty((max_num_entities, 1), dtype=np.float32)
               ^^^^^^^^^^^^^^^^^^^^^

to match your kernel signature, which declares one-dimensional arrays:

@cuda.jit('void(float32[:], float32[:], float32[:], uint8[:], float32[:], float32[:], float32, uint32, uint32)')
                ^^^^^^^^^^

We can fix that with:

p_x = np.empty(max_num_entities, dtype=np.float32)

and likewise for p_y, types, velocities, and max_velocities. (I imagine some change may also be in order for radii, but it's not entirely clear what you intend with that, since it appears you want a multi-dimensional array but are accessing it in-kernel as a single-dimensional array, as far as I can tell. Furthermore, that section of your kernel code does nothing, so it is more or less irrelevant to the problem at hand.)
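The shape mismatch can be checked on the host with NumPy alone, before anything is sent to the GPU. The sketch below is only an illustration: the names column, flat, and flattened are made up for this example and are not part of the original code. It shows that an (N, 1) array is two-dimensional, while float32[:] in the kernel signature describes a one-dimensional array, and it uses np.zeros so that the contents are defined.

```python
import numpy as np

max_num_entities = 2000  # 1500 creatures + 500 food, as in the question

# Shape used in the question: a 2-D column, which does not match float32[:]
column = np.empty((max_num_entities, 1), dtype=np.float32)

# 1-D array with defined contents (np.empty would leave them arbitrary)
flat = np.zeros(max_num_entities, dtype=np.float32)

print(column.ndim, column.shape)
print(flat.ndim, flat.shape)

# An existing (N, 1) array can also be flattened instead of re-created
flattened = column.reshape(-1)
print(flattened.ndim, flattened.shape)
```

Checking ndim like this for each array before calling cuda.to_device is a cheap way to catch this kind of mismatch early.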

When I make those changes, I get what appears to be rational output:

$ cat t9.py
import numpy as np
import math
from numba import cuda

@cuda.jit('void(float32[:], float32[:], float32[:], uint8[:], float32[:], float32[:], float32, uint32, uint32)')
def update(p_x, p_y, radii, types, velocities, max_velocities, acceleration, num_creatures, cycles):
    for c in range(cycles):
        for i in range(num_creatures):
            velocities[i] = velocities[i] + acceleration
            if velocities[i] > max_velocities[i]:
                velocities[i] = max_velocities[i]
            p_x[i] = p_x[i] + (math.cos(1.0) * velocities[i])
            p_y[i] = p_y[i] + (math.sin(1.0) * velocities[i])
        for i in range(num_creatures):
            for j in range(i, num_creatures):
                delta_x = p_x[j] - p_x[i]
                delta_y = p_y[j] - p_y[i]
                distance_squared = (delta_x * delta_x) + (delta_y * delta_y)
                sum_of_radii = radii[types[i]] + radii[types[i]]
                if distance_squared < sum_of_radii * sum_of_radii:
                    pass

acceleration = .1
creature_radius = 10
spacing = 20
food_radius = 3

max_num_creatures = 1500
num_creatures = 0
max_num_food = 500
num_food = 0
max_num_entities = max_num_creatures + max_num_food
num_entities = 0
cycles = 1

p_x = np.empty(max_num_entities, dtype=np.float32)
p_y = np.empty(max_num_entities, dtype=np.float32)
radii = np.array([creature_radius, creature_radius, food_radius], dtype=np.float32)
types = np.empty(max_num_entities, dtype=np.uint8)

velocities = np.ones(max_num_creatures, dtype=np.float32)
max_velocities = np.empty(max_num_creatures, dtype=np.float32)

# types:
# male - 0
# female - 1
# food - 2

for x in range(1, 800 // spacing):
    for y in range(1, 600 // spacing):
        if num_creatures % 2 == 0:
            types[num_creatures] = 0
        else:
            types[num_creatures] = 1
        p_x[num_creatures] = x * spacing
        p_y[num_creatures] = y * spacing
        max_velocities[num_creatures] = 5
        num_creatures += 1

device_p_x = cuda.to_device(p_x)
device_p_y = cuda.to_device(p_y)
device_radii = cuda.to_device(radii)
device_types = cuda.to_device(types)
device_velocities = cuda.to_device(velocities)
device_max_velocities = cuda.to_device(max_velocities)

update(device_p_x, device_p_y, device_radii, device_types, device_velocities, device_max_velocities,
       acceleration, num_creatures, cycles)

print(device_p_x.copy_to_host())

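$ python t9.py
[ 2.05943317e+01 2.05943317e+01 2.05943317e+01 ..., 3.64769361e-11
  1.52645868e-19 1.80563260e+28]
$

The leading values are what we expect: 20 + cos(1.0) * 1.1 is approximately 20.594. The trailing values are uninitialized memory, because p_x is created with np.empty and only the first num_creatures entries are ever assigned.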

Also note that you are currently launching only one block of one thread, but I assume that is not pertinent to your question.
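To use more than one thread, the movement step could be written with one thread per creature and launched with an explicit configuration. What follows is a minimal sketch under stated assumptions, not a drop-in replacement: move, blocks_needed, and THREADS_PER_BLOCK are hypothetical names, the block size of 128 is an arbitrary choice, and only the movement part of the kernel is covered. The GPU portion runs only if Numba is installed and a CUDA device is available.

```python
import math

import numpy as np

THREADS_PER_BLOCK = 128  # assumed block size; tune for your GPU


def blocks_needed(n, threads_per_block=THREADS_PER_BLOCK):
    """Ceiling division: enough blocks that every element gets a thread."""
    return (n + threads_per_block - 1) // threads_per_block


try:
    from numba import cuda
except ImportError:  # Numba not installed: skip the GPU part
    cuda = None

if cuda is not None and cuda.is_available():

    @cuda.jit
    def move(p_x, p_y, velocities, max_velocities, acceleration, n):
        i = cuda.grid(1)  # global thread index, one creature per thread
        if i < n:         # guard: the last block may have spare threads
            v = velocities[i] + acceleration
            if v > max_velocities[i]:
                v = max_velocities[i]
            velocities[i] = v
            # 1.0 is the same direction placeholder used in the question
            p_x[i] += math.cos(1.0) * v
            p_y[i] += math.sin(1.0) * v

    n = 1131  # creatures placed by the question's 39 x 29 grid
    p_x = cuda.to_device(np.full(n, 20.0, dtype=np.float32))
    p_y = cuda.to_device(np.full(n, 20.0, dtype=np.float32))
    velocities = cuda.to_device(np.ones(n, dtype=np.float32))
    max_velocities = cuda.to_device(np.full(n, 5.0, dtype=np.float32))

    # Explicit launch configuration: [blocks per grid, threads per block]
    move[blocks_needed(n), THREADS_PER_BLOCK](
        p_x, p_y, velocities, max_velocities, 0.1, n)
    print(p_x.copy_to_host()[:3])
```

The pairwise intersection check reads positions written by other threads, so in a layout like this it would probably belong in a separate kernel launched after move has finished.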
