I am trying to calculate a quantile for a column's values by hand, but the value I get from the formula does not match the result from Pandas.
I looked around for different solutions, but did not find the right answer.
In [54]: df
Out[54]:
data1 data2 key1 key2
0 -0.204708 1.393406 a one
1 0.478943 0.092908 a two
2 1.965781 1.246435 a one
In [55]: grouped = df.groupby('key1')
In [56]: grouped['data1'].quantile(0.9)
Out[56]:
key1
a 1.668413
Using the formula to find it manually, with n = 3 because there are 3 values in the data1 column:
position = q(n+1)
Applying it to the data1 column:
= 0.9(n+1)
= 0.9(4)
= 3.6
So the value at position 3.6 is 1.965781. Why does pandas give 1.668413?
Solution
By default, the quantile function interpolates linearly: it sorts your values and spreads them evenly between the 0th and the 100th percentile.
In your case:
-0.204708 would be considered the 0th percentile,
0.478943 would be considered the 50th percentile and
1.965781 would be considered the 100th percentile.
So you could calculate the 90th percentile the following way (using linear interpolation between the 50th and 100th percentile):
>>> import numpy as np
>>> x = np.array([-0.204708, 1.965781, 0.478943])
>>> # x[2] is the 50th percentile and x[1] is the 100th percentile
>>> ninetieth_percentile = (x[1] - x[2]) / 0.5 * 0.4 + x[2]
>>> ninetieth_percentile
1.6684133999999999
Note that 0.5 is the share of the percentile range spanned by the two points (from the 50th to the 100th percentile), and 0.4 is how far above the 50th percentile the target lies (0.5 + 0.4 = 0.9). Hope this makes sense.
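The same idea can be written for any number of values and any quantile. Here is a minimal sketch in plain Python, assuming the default linear interpolation, where the zero-based position in the sorted data is q(n-1) rather than q(n+1). The function name linear_quantile is made up for this illustration and is not part of pandas or NumPy.

```python
import math


def linear_quantile(values, q):
    """Quantile by linear interpolation between the two nearest sorted values."""
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    data = sorted(values)
    if not data:
        raise ValueError("values must not be empty")
    # Zero-based fractional position: the minimum sits at 0, the maximum at n - 1.
    pos = q * (len(data) - 1)
    lower = math.floor(pos)
    upper = min(lower + 1, len(data) - 1)
    # How far the position lies between the two neighbouring values.
    frac = pos - lower
    return data[lower] + (data[upper] - data[lower]) * frac


data1 = [-0.204708, 0.478943, 1.965781]
print(round(linear_quantile(data1, 0.9), 6))  # 1.668413
```

With n = 3 and q = 0.9 the position is 0.9 * 2 = 1.8, so the result lies 80% of the way from 0.478943 to 1.965781. That is the same number as the 0.4 / 0.5 calculation above.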