When it comes to computing a median, Python offers a ready-made function for it (median, via NumPy), so the code is very simple.
Computing the median in Python
import numpy as np
nums = [1.1, 2.2, 3.3, 4.4, 5.5, 6.6]
# mean
np.mean(nums)
# median
np.median(nums)
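Hive does not provide a median function directly, but it does ship two official UDAFs that cover the same need: percentile and percentile_approx.
Here is what the official documentation says about them: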
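The reason a percentile function is enough is that the median is, by definition, the 50th percentile (p = 0.5). As a small sanity check, here is a sketch in NumPy (not Hive) showing that the three calls agree. The sample list is made up for illustration and uses integer values; note that NumPy's percentile takes p on a 0 to 100 scale, while quantile takes it on a 0 to 1 scale.

```python
import numpy as np

# Hypothetical sample data (integers), chosen only for illustration.
nums = [1, 2, 3, 4, 5, 6]

# Direct median.
median = np.median(nums)

# Same value expressed as a percentile: p is given on a 0-100 scale.
p50 = np.percentile(nums, 50)

# Same value expressed as a quantile: q is given on a 0-1 scale.
q50 = np.quantile(nums, 0.5)

print(median, p50, q50)  # 3.5 3.5 3.5
```

With an even number of elements, NumPy interpolates between the two middle values (3 and 4 here), which is why the result is 3.5 rather than one of the inputs.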
DOUBLE | percentile(BIGINT col, p) | Returns the exact p-th percentile of a column in the group (does not work with floating point types). p must be between 0 and 1. NOTE: A true percentile can only be computed for integer values. Use PERCENTILE_APPROX if your input is non-integral.
array<double> | percentile(BIGINT col, array(p1 [, p2]...)) | Returns the exact percentiles p1, p2, ... of a column in the group (does not work with floating point types). p_i must be between 0 and 1. NOTE: A true percentile can only be computed for integer values. Use PERCENTILE_APPROX if your input is non-integral.
DOUBLEpercentil