Python quantile calculation code: Python Pandas, manual quantile calculation

I am trying to calculate a quantile for a column's values manually, but the value I get from the formula does not match the result from Pandas.

I looked around for different solutions, but did not find the right answer.

In [54]: df

Out[54]:

data1 data2 key1 key2

0 -0.204708 1.393406 a one

1 0.478943 0.092908 a two

2 1.965781 1.246435 a one

In [55]: grouped = df.groupby('key1')

In [56]: grouped['data1'].quantile(0.9)

Out[56]:

key1

a 1.668413

Using the formula to find it manually: n is 3, as there are 3 values in the data1 column.

position = quantile * (n + 1)

Applying it to the values of the data1 column:

=0.9(n+1)

=0.9(4)

= 3.6

So the 3.6th position is 1.965781. How does pandas give 1.668413?

Solution

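By default, the quantile function interpolates linearly: it spreads your sorted values evenly across the 0th to 100th percentiles, based on their positions, and interpolates between neighbouring values.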

In your case:

-0.204708 would be considered the 0th percentile,

0.478943 would be considered the 50th percentile and

1.965781 would be considered the 100th percentile.
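The percentile assigned to each value can be sketched as follows. This assumes the default linear interpolation, under which the k-th of n sorted values (counting from 0) sits at k / (n - 1); the variable names are only illustrative.

```python
import numpy as np

# Values of the data1 column, sorted in ascending order.
values = np.sort(np.array([-0.204708, 0.478943, 1.965781]))
n = len(values)

# Assumption: default linear interpolation, so the k-th sorted value
# (0-based) is placed at k / (n - 1) of the way from 0% to 100%.
percentiles = np.arange(n) / (n - 1) * 100

for p, v in zip(percentiles, values):
    print(f"{p:5.1f}th percentile -> {v}")
```

With three values the steps are 0, 50 and 100, which matches the list above.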

So you could calculate the 90th percentile the following way (using linear interpolation between the 50th and 100th percentiles):

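>>> import numpy as np

>>> x = np.array([-0.204708, 1.965781, 0.478943])

>>> # x[2] is the 50th percentile and x[1] the 100th; move 0.4 of the way along the 0.5 gap

>>> ninetieth_percentile = (x[1] - x[2]) / 0.5 * 0.4 + x[2]

>>> ninetieth_percentile

1.6684133999999999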

Note that the values 0.5 and 0.4 come from the fact that two adjacent points of your data span 50% of the percentile range, and 0.4 is the amount above the 50th percentile that you still need to cover (0.5 + 0.4 = 0.9). Hope this makes sense.
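The same idea can be written for any quantile and any number of values. The following is a minimal sketch, assuming the default linear interpolation; `manual_quantile` is a hypothetical helper written for illustration, not a pandas function.

```python
import math
import pandas as pd

def manual_quantile(values, q):
    """Quantile by linear interpolation (hypothetical helper, not a pandas API)."""
    s = sorted(values)
    pos = q * (len(s) - 1)          # fractional 0-based position in the sorted data
    lo = math.floor(pos)            # index of the value just below that position
    hi = min(lo + 1, len(s) - 1)    # index of the value just above (clamped at the end)
    frac = pos - lo                 # how far to move from s[lo] towards s[hi]
    return s[lo] + frac * (s[hi] - s[lo])

data1 = [-0.204708, 0.478943, 1.965781]
manual = manual_quantile(data1, 0.9)
from_pandas = pd.Series(data1).quantile(0.9)

print(manual)
print(from_pandas)
```

Here the position is 0.9 * (3 - 1) = 1.8, so the result lies 0.8 of the way from 0.478943 to 1.965781, about 1.668413. The 0.9(n+1) rule in the question follows a different quantile convention, which is why its position of 3.6 does not reproduce the pandas result.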
