I need to calculate the column means of an array with more than 1000 rows.
np.mean(some_array) gives me
inf as output
but I am pretty sure the values are OK. I am loading a CSV from here into my Data variable, and the column 'Cement' looks "healthy" from my point of view.
In[254]:np.mean(Data[:230]['Cement'])
Out[254]:275.75
but if I increase the number of rows, the problem starts:
In [259]:np.mean(Data[:237]['Cement'])
Out[259]:inf
But when I look at the data:
In [261]:Data[230:237]['Cement']
Out[261]:
array([[ 425. ],
[ 333. ],
[ 250.25],
[ 491. ],
[ 160. ],
[ 229.75],
[ 338. ]], dtype=float16)
I cannot find a reason for this behaviour.
P.S. This happens in Python 3.x using Wakari (cloud-based IPython), NumPy version 1.8.1.
I am loading the Data with:
No_Col = 9
conv = lambda valstr: float(valstr.replace(',', '.'))
c = {}
for i in range(No_Col):
    c[i] = conv
Data = np.genfromtxt(get_data, dtype=np.float16, delimiter='\t',
                     skip_header=0, names=True, converters=c)
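For reference, here is a minimal self-contained version of this loading step. The inline sample and its column names are hypothetical stand-ins for the real file (the original reads from a get_data handle that is not shown); the encoding keyword requires a newer NumPy than 1.8.1:

```python
import io
import numpy as np

# Hypothetical two-column, tab-separated sample with comma decimal
# separators, standing in for the real CSV.
sample = "Cement\tWater\n425,0\t160,5\n333,0\t150,0\n"

conv = lambda valstr: float(valstr.replace(',', '.'))
c = {i: conv for i in range(2)}

# encoding='utf-8' makes genfromtxt pass str (not bytes) to the converters
# on recent NumPy versions; dtype=np.float64 keeps full precision.
Data = np.genfromtxt(io.StringIO(sample), dtype=np.float64, delimiter='\t',
                     names=True, converters=c, encoding='utf-8')

print(Data['Cement'])  # [425. 333.]
```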
Solution
I will guess that the problem is precision (as others have commented). Quoting directly from the documentation for mean():
Notes
The arithmetic mean is the sum of the elements along the axis divided
by the number of elements.
Note that for floating-point input, the mean is computed using the
same precision the input has. Depending on the input data, this can
cause the results to be inaccurate, especially for float32 (see
example below). Specifying a higher-precision accumulator using the
dtype keyword can alleviate this issue.
Since your array is of type float16, you have very limited precision: the largest value float16 can represent is 65504, so the running sum overflows to inf once it exceeds that. Using dtype=np.float64 will probably avoid the overflow. Also see the examples in the mean() documentation.