Updated 08/06/2019
Pure NumPy, fast, vectorized solution for large inputs
out parameter for in-place computation,
dtype parameter,
index order parameter
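This function is equivalent to pandas' ewm(adjust=False).mean(), but much faster. ewm(adjust=True).mean() (the pandas default) can produce different values at the start of the result. I am working on adding adjust support to this solution.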
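For reference, the recurrence behind ewm(adjust=False).mean() can be written as a plain loop. This is only a slow, minimal sketch for checking results, not the fast function from this answer; the name ewma_reference is made up for illustration.

```python
import numpy as np

def ewma_reference(data, alpha):
    """Slow reference loop (hypothetical helper):
    y[0] = x[0], y[t] = alpha * x[t] + (1 - alpha) * y[t-1]."""
    x = np.asarray(data, dtype=np.float64).ravel()
    out = np.empty_like(x)
    if x.size == 0:
        return out
    out[0] = x[0]  # the first output is the first input
    for t in range(1, x.size):
        out[t] = alpha * x[t] + (1.0 - alpha) * out[t - 1]
    return out

print(ewma_reference([1.0, 2.0, 3.0], alpha=0.5).tolist())  # [1.0, 1.5, 2.25]
```

With adjust=True, pandas instead normalizes by the sum of the weights, so the second value for the same input would be (2 + 0.5 * 1) / 1.5 ≈ 1.667 rather than 1.5. The two variants converge as more samples are seen.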
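@Divakar's answer runs into floating-point precision problems when the input is too large. This is because (1-alpha)**(n+1) -> 0 as n -> inf and alpha -> 1, which makes divide-by-zero and NaN values pop up in the calculation.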
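The underflow is easy to reproduce in isolation. The snippet below is a small illustration, with alpha and n picked arbitrarily so that the power falls below the smallest float64 value. It is not taken from either answer.

```python
import numpy as np

alpha = 0.5
n = 2000  # hypothetical input length, chosen to force underflow

# (1 - alpha)**(n + 1) is 2**-2001, far below the smallest float64 subnormal,
# so it is rounded to exactly 0.0
decay = np.float64(1.0 - alpha) ** (n + 1)

with np.errstate(divide="ignore", invalid="ignore"):
    scale = np.float64(1.0) / decay   # division by zero -> inf
    product = decay * scale           # 0 * inf -> nan

print(decay, scale, product)  # 0.0 inf nan
```

Once a scaling factor like this hits zero, every later element that depends on it is lost, which is why the length over which powers of (1-alpha) are taken has to be bounded.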
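Here is my fastest solution with no precision problems, almost fully vectorized. It has gotten a little complicated, but it performs well, especially on very large inputs. Without in-place computation (which is available through the out parameter and saves memory allocation time), it takes 3.62 s for a 100M-element input vector, 3.2 ms for a 100K-element input vector, and 293 µs for a 5000-element input vector on a fairly old PC (results will vary with different alpha / row_size values).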
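The idea of bounding the row size can be sketched in a simplified form. The version below is an assumption-laden illustration, not the function from this answer: it loops over chunks in Python instead of reshaping, the name ewma_chunked and the default row_size=64 are invented, and it expects row_size to be small enough that (1-alpha)**(row_size-1) stays well above zero.

```python
import numpy as np

def ewma_chunked(data, alpha, row_size=64):
    """Chunked EWMA sketch (hypothetical helper, adjust=False convention)."""
    x = np.asarray(data, dtype=np.float64).ravel()
    out = np.empty_like(x)
    if x.size == 0:
        return out
    # (1 - alpha)**k for k = 0 .. row_size-1; bounded, so it cannot underflow
    # as long as row_size is chosen sensibly for the given alpha
    decay = (1.0 - alpha) ** np.arange(row_size)
    carry = x[0]  # seeding with x[0] makes out[0] == x[0]
    for start in range(0, x.size, row_size):
        chunk = x[start:start + row_size]
        d = decay[:chunk.size]
        # closed form inside the chunk: sum_j alpha * x[j] * (1-alpha)**(k-j)
        partial = d * np.cumsum(alpha * chunk / d)
        # add the decayed contribution of everything before this chunk
        out[start:start + chunk.size] = partial + carry * d * (1.0 - alpha)
        carry = out[start + chunk.size - 1]
    return out

print(ewma_chunked([1.0, 2.0, 3.0], alpha=0.5, row_size=2).tolist())  # [1.0, 1.5, 2.25]
```

Each chunk only ever sees powers of (1-alpha) up to row_size-1, so the scaling factors stay finite no matter how long the input is. The only state passed between chunks is the last output value.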
# tested with python3 & numpy 1.15.2
import numpy as np

def ewma_vectorized_safe(data, alpha, row_size=None, dtype=None, order='C', out=None):
    """
    Reshapes data before calculating EWMA, then iterates once over the rows
    to calculate the offset without precision issues
    :param data: Input data, will be flattened.
    :param alpha: scalar float in range (0,1)
        The alpha parameter for the moving average.
    :param row_size: int, optional
        The row size to use in the computation. High row sizes need higher precision,
        low values will impact performance. The optimal value depends on the
        platform and the alpha being used. Higher alpha values require lower
        row size. Default depends on dtype.
    :param dtype: optional
        Data type used for calculations. Defaults to float64 unless
        data.dtype is float32, then it will use float32.
    :param order: {'C', 'F', 'A'}, optional
        Order to use when flattening the data. Defaults to 'C'.
    :param out: ndarray, or None, optional
        A location into which the result is stored. If provided, it must have
        the same shape as the desired output. If not provided or `None`,
        a freshly-allocated array is returned.
    :return: The flattened result.
    """
    # np.asarray avoids a copy when possible and also works on NumPy 2.x,
    # where np.array(data, copy=False) raises if a copy is required
    data = np.asarray(data)

    if dtype is None:
        if data.dtype == np.float32:
            dtype = np.float32
        else:
            # the np.float alias was removed in newer NumPy releases
            dtype = np.float64
    else:
        dtype = np.dtype(dtype)

    row_size = (int(row_size) if row_size is not None
                else get_max_row_size(alpha, dtype))

    if data.size <= row_size:
        # The normal function can handle this input, use that
        return ewma_vectorized(data, alpha, dtype=dtype, order=order, out=out)

    if data.ndim > 1:
        # flatten input
        data = np.reshape(data, -1, order=order)

    if out is None:
        out = np.empty_like(data, dtype=dtype)
    else:
        assert out.shape == d