Python mean and standard deviation: how to get the mean and standard deviation from a frequency distribution table in Python

I have a list of tuples [(val1, freq1), (val2, freq2), ..., (valn, freqn)]. I need to get measures of central tendency (mean, median) and measures of dispersion (variance, standard deviation) for this data. I would also like to draw a boxplot of the values.

I see that numpy has direct functions for getting the mean, median, and standard deviation (or variance) from a list of values.

Does numpy (or any other well-known library) have a direct way to operate on such a frequency distribution table?

Also, what is the best way to programmatically expand the above list of tuples into one flat list? (For example, if the frequency distribution is [(1, 3), (50, 2)], what is the best way to get the list [1, 1, 1, 50, 50] so that I can call np.mean([1, 1, 1, 50, 50])?)

I see a custom function here, but I would like to use a standard implementation if possible.

Solution

First, I'd change that messy list into two numpy arrays like @user8153 did:

import numpy as np

val, freq = np.array(list_tuples).T

Then you can reconstruct the full array of observations (using np.repeat to avoid looping):

data = np.repeat(val, freq)
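As a quick illustration of the expansion step, here is a minimal sketch using the [(1, 3), (50, 2)] example from the question. It assumes the values and frequencies are all integers; if the values were floats, np.array would also turn the frequencies into floats, and they would need to be cast back to int before calling np.repeat.

```python
import numpy as np

# Frequency table from the question: (value, frequency) pairs.
list_tuples = [(1, 3), (50, 2)]

# Split into a values array and a frequencies array.
val, freq = np.array(list_tuples).T

# Repeat each value according to its frequency.
data = np.repeat(val, freq)
print(data.tolist())        # [1, 1, 1, 50, 50]

# The ordinary numpy reductions now work on the expanded data.
print(np.mean(data))        # 20.6
print(np.median(data))      # 1.0
print(np.std(data, ddof=1)) # sample standard deviation
```

The expanded array can also be passed to a plotting library for the boxplot, at the cost of holding sum(freq) elements in memory.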

If that causes memory errors (or you just want to squeeze out as much performance as possible), you can also use some purpose-built functions:

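def mean_(val, freq):
    # Weighted mean: each value counts freq times.
    return np.average(val, weights=freq)

def median_(val, freq):
    # Sort by value, then pick the first value whose cumulative count
    # reaches the middle position. For an even total count this returns
    # the lower of the two middle values rather than their average.
    order = np.argsort(val)
    cdf = np.cumsum(freq[order])
    return val[order][np.searchsorted(cdf, (cdf[-1] + 1) // 2)]

def mode_(val, freq):  # in the strictest sense, assuming a unique mode
    return val[np.argmax(freq)]

def var_(val, freq):
    # Sample variance (divides by n - 1, like np.var(data, ddof=1)).
    avg = mean_(val, freq)
    dev = freq * (val - avg) ** 2
    return dev.sum() / (freq.sum() - 1)

def std_(val, freq):
    return np.sqrt(var_(val, freq))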
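One way to gain confidence in weighted formulas like these is to compare them against the plain numpy reductions on the expanded data. The sketch below does that for a small made-up table (the numbers are illustrative only); the median lookup uses (cdf[-1] + 1) // 2 as the target rank so that it lands on the middle element when the total count is odd.

```python
import numpy as np

# Hypothetical frequency table, deliberately unsorted by value.
val = np.array([1, 50, 7])
freq = np.array([3, 2, 4])

# Reference: expand to individual observations.
data = np.repeat(val, freq)

# Weighted mean and sample variance computed without expanding.
mean_w = np.average(val, weights=freq)
var_w = (freq * (val - mean_w) ** 2).sum() / (freq.sum() - 1)

# Weighted median via the cumulative frequency of the sorted values.
order = np.argsort(val)
cdf = np.cumsum(freq[order])
median_w = val[order][np.searchsorted(cdf, (cdf[-1] + 1) // 2)]

print(np.isclose(mean_w, data.mean()))      # True
print(np.isclose(var_w, data.var(ddof=1)))  # True
print(median_w == np.median(data))          # True
```

Note that np.var and np.std default to ddof=0 (population variance), so ddof=1 is needed for the comparison with the n - 1 formula.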
