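Mode
Hive has no built-in function for the mode, but it can be computed exactly by grouping on the column, counting the rows in each group, and keeping the most frequent value. Here your_table is your table and col is the column whose mode you want. If several values tie for the highest count, LIMIT 1 returns only one of them.
-- Count how often each non-NULL value occurs and keep the most frequent one
SELECT col, COUNT(*) AS cnt
FROM your_table
WHERE col IS NOT NULL
GROUP BY col
ORDER BY cnt DESC
LIMIT 1;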
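As a cross-check outside Hive, the group, count, and sort logic can be sketched in Python on a small in-memory sample. The function name `modes` and the sample values are hypothetical, and the sketch deliberately skips `None` (standing in for SQL NULL). Unlike `LIMIT 1`, it returns every value tied for the highest count.

```python
from collections import Counter

def modes(values):
    """Return all values tied for the highest frequency, skipping None (SQL NULL)."""
    # Equivalent of GROUP BY col with COUNT(*) per group
    counts = Counter(v for v in values if v is not None)
    if not counts:
        return []
    top = max(counts.values())  # highest frequency seen
    return sorted(v for v, c in counts.items() if c == top)

sample = [3, 1, 3, 2, 1, 3, None]  # hypothetical column values
print(modes(sample))               # [3]
print(modes([1, 1, 2, 2, 3]))      # [1, 2], two values tie
```

Returning a list makes ties visible, which a single-row result cannot do.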
Mean
Use the built-in AVG function, where column_name is the column to average. AVG ignores NULL values.
SELECT AVG(column_name) AS average_value
FROM your_table;
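Variance
Use the built-in VAR_POP or VAR_SAMP function to compute the population variance or the sample variance, where column_name is the column whose variance you want. VAR_POP divides the sum of squared deviations by n; VAR_SAMP divides it by n - 1.
Population variance (VAR_POP):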
SELECT VAR_POP(column_name) AS population_variance
FROM your_table;
Sample variance (VAR_SAMP):
SELECT VAR_SAMP(column_name) AS sample_variance
FROM your_table;
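The difference between the two variances is only the divisor, which a small worked example makes concrete. The following is a minimal Python sketch, not Hive code: the function names `var_pop` and `var_samp` merely mirror the Hive functions, the sample data is hypothetical, and returning `None` for too few values is a choice made for this sketch.

```python
def _squared_deviations(values):
    """Return (sum of squared deviations from the mean, n), skipping None."""
    xs = [x for x in values if x is not None]
    if not xs:
        return 0.0, 0
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs), len(xs)

def var_pop(values):
    """Population variance: divide by n."""
    ss, n = _squared_deviations(values)
    return ss / n if n > 0 else None

def var_samp(values):
    """Sample variance: divide by n - 1."""
    ss, n = _squared_deviations(values)
    return ss / (n - 1) if n > 1 else None

sample = [2, 4, 4, 4, 5, 5, 7, 9]  # mean 5, squared deviations sum to 32
print(var_pop(sample))             # 32 / 8
print(var_samp(sample))            # 32 / 7
```

For this sample the population variance is 32 / 8 = 4 and the sample variance is 32 / 7, so the sample variance is always the larger of the two when n > 1.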
Quartiles (1/4, 1/2, 3/4)
PERCENTILE_APPROX function (approximate result; accepts floating-point columns)
SELECT
PERCENTILE_APPROX(column_name, 0.25) AS first_quartile,
PERCENTILE_APPROX(column_name, 0.5) AS median,
PERCENTILE_APPROX(column_name, 0.75) AS third_quartile
FROM
your_table;
PERCENTILE function (exact result; integer columns only)
SELECT
PERCENTILE(column_name, 0.25) AS first_quartile,
PERCENTILE(column_name, 0.5) AS median,
PERCENTILE(column_name, 0.75) AS third_quartile
FROM
your_table;
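To see what an exact quartile means numerically, here is a minimal Python sketch that interpolates linearly between the two nearest ranks at position p * (n - 1). That interpolation rule is an assumption about how the exact PERCENTILE behaves, not something stated above; the function name `percentile` and the sample data are hypothetical.

```python
import math

def percentile(values, p):
    """Exact percentile by linear interpolation at position p * (n - 1).

    Assumption: this mirrors the exact PERCENTILE rule; None is skipped.
    """
    xs = sorted(v for v in values if v is not None)
    if not xs:
        return None
    pos = p * (len(xs) - 1)          # fractional index into the sorted values
    lo, hi = math.floor(pos), math.ceil(pos)
    if lo == hi:                     # position falls exactly on one element
        return float(xs[lo])
    # Weight the two neighbours by their distance to the position
    return (hi - pos) * xs[lo] + (pos - lo) * xs[hi]

sample = list(range(1, 11))          # 1..10
print(percentile(sample, 0.25), percentile(sample, 0.5), percentile(sample, 0.75))
# 3.25 5.5 7.75
```

An approximate method may return slightly different values on large inputs, since it trades exactness for bounded memory.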
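Saturation
Saturation is computed here as the number of distinct values divided by the total number of rows, expressed as a percentage, for each column_name. The query assumes your_table has the columns column_name and column_value. COUNT(DISTINCT ...) skips NULLs, while COUNT(*) counts every row.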
SELECT
column_name,
COUNT(DISTINCT column_value) AS distinct_values,
COUNT(*) AS total_values,
(COUNT(DISTINCT column_value) / COUNT(*)) * 100 AS saturation_percentage
FROM
your_table
GROUP BY
column_name;
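The same calculation can be checked by hand on a few rows. This is a minimal Python sketch under the assumption that each row is a (column_name, column_value) pair; the function name `saturation` and the sample rows are hypothetical.

```python
from collections import defaultdict

def saturation(rows):
    """Map each column name to (distinct_values, total_values, percentage)."""
    distinct = defaultdict(set)
    total = defaultdict(int)
    for name, value in rows:
        total[name] += 1                 # like COUNT(*): NULL rows are counted
        if value is not None:
            distinct[name].add(value)    # like COUNT(DISTINCT): NULLs are skipped
    return {
        name: (len(distinct[name]), total[name], len(distinct[name]) / total[name] * 100)
        for name in total
    }

rows = [  # hypothetical (column_name, column_value) pairs
    ("city", "A"), ("city", "A"), ("city", "B"), ("city", None),
    ("age", 30), ("age", 31),
]
print(saturation(rows))
# {'city': (2, 4, 50.0), 'age': (2, 2, 100.0)}
```

A value of 100 means every row holds a different non-NULL value; repeated values and NULLs both pull the percentage down.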