Postgres2015全国用户大会将于11月20至21日在北京丽亭华苑酒店召开。本次大会嘉宾阵容强大,国内顶级PostgreSQL数据库专家将悉数到场,并特邀欧洲、俄罗斯、日本、美国等国家和地区的数据库方面专家助阵:
- Postgres-XC项目的发起人铃木市一(SUZUKI Koichi)
- Postgres-XL的项目发起人Mason Sharp
- pgpool的作者石井达夫(Tatsuo Ishii)
- PG-Strom的作者海外浩平(Kaigai Kohei)
- Greenplum研发总监姚延栋
- 周正中(德哥), PostgreSQL中国用户会创始人之一
- 汪洋,平安科技数据库技术部经理
- ……
|
|
前面一篇对PostgreSQL支持的一般聚合函数做了一些实例讲解,
本文将讲解一下统计类的聚合函数.
在此前我分别以相关性函数corr和计算总体/样本 方差, 标准方差的函数variance, var_pop, var_samp, stddev, stddev_pop, stddev_samp做过比较详细的讲解.
例如corr在数据库的物理存储顺序以及逻辑顺序的统计信息中就有应用. 请参考 :
PostgreSQL 9.4为例, 目前支持的统计学聚合函数表如下 :
这些函数的源码全部都在 src/backend/utils/adt/float.c .
这些统计学相关的术语, 算法可参考维基百科, 或百度百科.
Table 9-50. Aggregate Functions for Statistics
Function | Argument Type | Return Type | Description |
---|---|---|---|
corr(Y, X) | double precision | double precision | correlation coefficient |
covar_pop(Y, X) | double precision | double precision | population covariance |
covar_samp(Y, X) | double precision | double precision | sample covariance |
regr_avgx(Y, X) | double precision | double precision | average of the independent variable (sum(X)/N) |
regr_avgy(Y, X) | double precision | double precision | average of the dependent variable (sum(Y)/N) |
regr_count(Y, X) | double precision | bigint | number of input rows in which both expressions are nonnull |
regr_intercept(Y, X) | double precision | double precision | y-intercept of the least-squares-fit linear equation determined by the (X, Y) pairs |
regr_r2(Y, X) | double precision | double precision | square of the correlation coefficient |
regr_slope(Y, X) | double precision | double precision | slope of the least-squares-fit linear equation determined by the (X, Y) pairs |
regr_sxx(Y, X) | double precision | double precision | sum(X^2) - sum(X)^2/N ("sum of squares" of the independent variable) |
regr_sxy(Y, X) | double precision | double precision | sum(X*Y) - sum(X) * sum(Y)/N ("sum of products" of independent times dependent variable) |
regr_syy(Y, X) | double precision | double precision | sum(Y^2) - sum(Y)^2/N ("sum of squares" of the dependent variable) |
stddev(expression) | smallint, int, bigint, real, double precision, or numeric | double precision for floating-point arguments, otherwise numeric | historical alias for stddev_samp |
stddev_pop(expression) | smallint, int, bigint, real, double precision, or numeric | double precision for floating-point arguments, otherwise numeric | population standard deviation of the input values |
stddev_samp(expression) | smallint, int, bigint, real, double precision, or numeric | double precision for floating-point arguments, otherwise numeric | sample standard deviation of the input values |
variance (expression) | smallint, int, bigint, real, double precision, or numeric | double precision for floating-point arguments, otherwise numeric | historical alias for var_samp |
var_pop (expression) | smallint, int, bigint, real, double precision, or numeric | double precision for floating-point arguments, otherwise numeric | population variance of the input values (square of the population standard deviation) |
var_samp (expression) | smallint, int, bigint, real, double precision, or numeric | double precision for floating-point arguments, otherwise numeric | sample variance of the input values (square of the sample standard deviation) |
Table 9-51 shows some aggregate functions that use the ordered-set aggregate syntax. These functions are sometimes referred to as "inverse distribution" functions.
如下 :
相关性统计 :
corr, regr_r2
总体|样本 方差, 标准方差 :
variance, var_pop, var_samp
stddev, stddev_pop, stddev_samp
总体协方差,
样本协方差
:
covar_pop, covar_samp
线性回归 :
regr_avgx,
regr_avgy,
regr_count,
regr_intercept(截距),
regr_r2(相关度corr的平方),
regr_slope(斜率), regr_sxx, regr_sxy, regr_syy.
[参考]
2.
http://blog.163.com/digoal@126/blog/static/163877040201512835135264/
3. http://blog.163.com/digoal@126/blog/static/163877040201512810112541/
3. http://blog.163.com/digoal@126/blog/static/163877040201512810112541/
5. src/backend/utils/adt/float.c