python中vstack_python – 在numba中使用numpy.vstack

所以我一直在尝试优化一些从一些数组数据计算统计误差度量的代码.该指标称为连续排名概率分数(CRPS).

我一直在使用Numba来尝试加速此计算中所需的双循环,但我遇到了numpy.vstack函数的问题.根据我从文档here的理解,应该支持vstack()函数,但是当我运行以下代码时,我得到一个错误.

def crps_hersbach_numba(obs, fcst_ens, remove_neg=False, remove_zero=False):

"""Calculate the the continuous ranked probability score (CRPS) as per equation 25-27 in

Hersbach et al. (2000)

Parameters

----------

obs: 1D ndarry

Array of observations for each start date

fcst_ens: 2D ndarray

Array of ensemble forecast of dimension n x M, where n = number of start dates and

M = number of ensemble members.

remove_neg: bool

If True, when a negative value is found at the i-th position in the observed OR ensemble

array, the i-th value of the observed AND ensemble array are removed before the

computation.

remove_zero: bool

If true, when a zero value is found at the i-th position in the observed OR ensemble

array, the i-th value of the observed AND ensemble array are removed before the

computation.

Returns

-------

dict

Dictionary contains a number of *experimental* outputs including:

- ["crps"] 1D ndarray of crps values per n start dates.

- ["crpsMean1"] arithmetic mean of crps values.

- ["crpsMean2"] mean crps using eqn. 28 in Hersbach (2000).

Notes

-----

**NaN and inf treatment:** If any value in obs or fcst_ens is NaN or inf, then the

corresponding row in both fcst_ens (for all ensemble members) and in obs will be deleted.

References

----------

- Hersbach, H. (2000) Decomposition of the Continuous Ranked Porbability Score

for Ensemble Prediction Systems, Weather and Forecasting, 15, 559-570.

"""

# Treating the Data

obs, fcst_ens = treat_data(obs, fcst_ens, remove_neg=remove_neg, remove_zero=remove_zero)

# Set parameters

n = fcst_ens.shape[0] # number of forecast start dates

m = fcst_ens.shape[1] # number of ensemble members

# Create vector of pi's

p = np.linspace(0, m, m + 1)

pi = p / m

crps_numba = np.zeros(n)

@njit

def calculate_crps():

# Loop fcst start times

for i in prange(n):

# Initialise vectors for storing output

a = np.zeros(m - 1)

b = np.zeros(m - 1)

# Verifying analysis (or obs)

xa = obs[i]

# Ensemble fcst CDF

x = np.sort(fcst_ens[i, :])

# Deal with 0 < i < m [So, will loop 50 times for m = 51]

for j in prange(m - 1):

# Rule 1

if xa > x[j + 1]:

a[j] = x[j + 1] - x[j]

b[j] = 0

# Rule 2

if x[j] < xa < x[j + 1]:

a[j] = xa - x[j]

b[j] = x[j + 1] - xa

# Rule 3

if xa < x[j]:

a[j] = 0

b[j] = x[j + 1] - x[j]

# Deal with outliers for i = 0, and i = m,

# else a & b are 0 for non-outliers

if xa < x[0]:

a1 = 0

b1 = x[0] - xa

else:

a1 = 0

b1 = 0

# Upper outlier (rem m-1 is for last member m, but python is 0-based indexing)

if xa > x[m - 1]:

am = xa - x[m - 1]

bm = 0

else:

am = 0

bm = 0

# Combine full a & b vectors including outlier

a = np.concatenate((np.array([0]), a, np.array([am])))

# a = np.insert(a, 0, a1)

# a = np.append(a, am)

a = np.concatenate((np.array([0]), a, np.array([bm])))

# b = np.insert(b, 0, b1)

# b = np.append(b, bm)

# Populate a_mat and b_mat

if i == 0:

a_mat = a

b_mat = b

else:

a_mat = np.vstack((a_mat, a))

b_mat = np.vstack((b_mat, b))

# Calc crps for individual start times

crps_numba[i] = ((a * pi ** 2) + (b * (1 - pi) ** 2)).sum()

return crps_numba, a_mat, b_mat

crps, a_mat, b_mat = calculate_crps()

print(crps)

# Calc mean crps as simple mean across crps[i]

crps_mean_method1 = np.mean(crps)

# Calc mean crps across all start times from eqn. 28 in Hersbach (2000)

abar = np.mean(a_mat, 0)

bbar = np.mean(b_mat, 0)

crps_mean_method2 = ((abar * pi ** 2) + (bbar * (1 - pi) ** 2)).sum()

# Output array as a dictionary

output = {'crps': crps, 'crpsMean1': crps_mean_method1,

'crpsMean2': crps_mean_method2}

return output

我得到的错误是这样的:

Cannot unify array(float64, 1d, C) and array(float64, 2d, C) for 'a_mat', defined at *path

File "test.py", line 86:

def calculate_crps():

if i == 0:

a_mat = a

^

[1] During: typing of assignment at *path

File "test.py", line 89:

def calculate_crps():

else:

a_mat = np.vstack((a_mat, a))

^

This is not usually a problem with Numba itself but instead often caused by

the use of unsupported features or an issue in resolving types.

我只是想知道我哪里出错了.似乎vstack功能应该可以工作,但也许我错过了一些东西.

解决方法:

I just wanted to know where I am going wrong. It seems as though the vstack function should work but maybe I am missing something.

TL; DR:这不是vstack问题.问题是您有代码路径尝试将不同类型的数组分配给同一个变量(这会抛出该统一异常).

问题在于:

# Populate a_mat and b_mat

if i == 0:

a_mat = a

b_mat = b

else:

a_mat = np.vstack((a_mat, a))

b_mat = np.vstack((b_mat, b))

在第一个代码路径中,您将1d c-contigous float64数组分配给a_mat和b_mat,而在else中,它是一个2d c-contiguous float64数组.这些类型不兼容,因此numba会抛出错误.

numba代码不像Python代码那样工作有时很棘手,因为在为变量赋值时,你所拥有的类型并不重要.但是在最近的版本中,numba异常消息得到了更好的解决,因此如果您知道异常提示,通常可以快速找出问题所在.

更长的解释

问题是numba隐式地推断出变量的类型.例如:

from numba import njit

@njit

def func(arr):

a = arr

return a

这里我没有输入函数所以我需要运行一次:

>>> import numpy as np

>>> func(np.zeros(5))

array([0., 0., 0., 0., 0.])

然后你可以检查类型:

>>> func.inspect_types()

func (array(float64, 1d, C),)

--------------------------------------------------------------------------------

# File:

# --- LINE 3 ---

# label 0

@njit

# --- LINE 4 ---

def func(arr):

# --- LINE 5 ---

# arr = arg(0, name=arr) :: array(float64, 1d, C)

# a = arr :: array(float64, 1d, C)

# del arr

a = arr

# --- LINE 6 ---

# $0.3 = cast(value=a) :: array(float64, 1d, C)

# del a

# return $0.3

return a

正如您所看到的,变量a的类型为array(float64,1d,C)的输入类型为array(float64,1d,C).

现在,让我们使用np.vstack代替:

from numba import njit

import numpy as np

@njit

def func(arr):

a = np.vstack((arr, arr))

return a

并强制第一次编译它:

>>> func(np.zeros(5))

array([[0., 0., 0., 0., 0.],

[0., 0., 0., 0., 0.]])

然后再次检查类型:

func (array(float64, 1d, C),)

--------------------------------------------------------------------------------

# File:

# --- LINE 4 ---

# label 0

@njit

# --- LINE 5 ---

def func(arr):

# --- LINE 6 ---

# arr = arg(0, name=arr) :: array(float64, 1d, C)

# $0.1 = global(np: ) :: Module()

# $0.2 = getattr(value=$0.1, attr=vstack) :: Function()

# del $0.1

# $0.5 = build_tuple(items=[Var(arr, (6)), Var(arr, (6))]) :: tuple(array(float64, 1d, C) x 2)

# del arr

# $0.6 = call $0.2($0.5, func=$0.2, args=[Var($0.5, (6))], kws=(), vararg=None) :: (tuple(array(float64, 1d, C) x 2),) -> array(float64, 2d, C)

# del $0.5

# del $0.2

# a = $0.6 :: array(float64, 2d, C)

# del $0.6

a = np.vstack((arr, arr))

# --- LINE 7 ---

# $0.8 = cast(value=a) :: array(float64, 2d, C)

# del a

# return $0.8

return a

此时a为数组输入(float64,1d,C)输入数组(float64,2d,C).

你可能问过自己为什么要谈论这个问题.让我们看看如果您尝试有条件地分配给:

from numba import njit

import numpy as np

@njit

def func(arr, condition):

if condition:

a = np.vstack((arr, arr))

else:

a = arr

return a

如果您现在运行代码:

>>> func(np.zeros(5), True)

TypingError: Failed at nopython (nopython frontend)

Cannot unify array(float64, 2d, C) and array(float64, 1d, C) for 'a', defined at (7)

File "", line 7:

def func(arr, condition):

if condition:

a = np.vstack((arr, arr))

^

[1] During: typing of assignment at (9)

File "", line 9:

def func(arr, condition):

else:

a = arr

^

这正是你所遇到的问题,因为变量需要在numba中只有一种类型,用于一组固定的输入类型.并且因为dtype,rank(维度数)和连续属性都是类型的一部分,所以不能将具有不同维度的数组分配给同一个变量.

请注意,您可以扩展尺寸以使其工作并从结果中再次挤出不必要的尺寸(可能不是很好但是它应该用最少量的“更改”来解决问题:

from numba import njit

import numpy as np

@njit

def func(arr, condition):

if condition:

a = np.vstack((arr, arr))

else:

a = np.expand_dims(arr, 0)

return a

>>> func(np.zeros(5), False)

array([[0., 0., 0., 0., 0.]]) #

>>> func(np.zeros(5), True)

array([[0., 0., 0., 0., 0.],

[0., 0., 0., 0., 0.]])

标签:python,numpy,python-3-x,optimization,numba

  • 0
    点赞
  • 1
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值