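// Reference: https://www.cnblogs.com/fujian-code/p/8798409.html
Computing quantiles of age with the percentile_approx function
// describe() reports only count, mean, standard deviation, min, and max; Q1, the median, and Q3 have to be computed separately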
/*
scala> df.select("age").describe().show
+-------+------------------+
|summary|               age|
+-------+------------------+
|  count|              5811|
|   mean|29.463087248322147|
| stddev| 4.775418126339402|
|    min|                18|
|    max|               118|
+-------+------------------+
*/
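// Side note, as a minimal sketch: on Spark 2.3 or later, Dataset.summary() is a possible alternative to describe().
// By default it also reports the 25%, 50%, and 75% percentiles (approximate values).
// This assumes the same spark-shell session and the same `df` as above; no output is shown because it was not run on this data.

```scala
// Assumes Spark 2.3+ and the `df` from this post.
// The default summary() prints count, mean, stddev, min, 25%, 50%, 75%, max.
df.select("age").summary().show()

// Only selected statistics can be requested by name.
df.select("age").summary("min", "25%", "50%", "75%", "max").show()
```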
// Create a temp view, then compute the quantiles with spark.sql
df.createOrReplaceTempView("trainFeatures")
scala> spark.sql("SELECT min(age) AS Min_age, percentile_approx(age, 0.25) AS Q1_age, percentile_approx(age, 0.5) AS Median_age," +
"percentile_approx(age, 0.75) AS Q3_age, max(age) AS Max_age FROM trainFeatures").show
+-------+------+----------+------+-------+
|Min_age|Q1_age|Median_age|Q3_age|Max_age|
+-------+------+----------+------+-------+
|     18|    26|        29|    32|    118|
+-------+------+----------+------+-------+
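// The same quartiles can probably also be obtained without a temp view, through the DataFrame API.
// A minimal sketch using df.stat.approxQuantile, assuming the same `df`, a numeric `age` column, and a spark-shell session.
// The relative error of 0.01 is an illustrative choice, not a value from the original post.

```scala
// approxQuantile(col, probabilities, relativeError) returns one Double per requested probability.
// A relativeError of 0.0 asks for exact quantiles, which can be expensive on large data.
val Array(q1, median, q3) =
  df.stat.approxQuantile("age", Array(0.25, 0.5, 0.75), 0.01)

println(s"Q1_age=$q1, Median_age=$median, Q3_age=$q3")
```

// Because both approaches are approximate, their results may differ slightly from each other on large datasets.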
Reposted from: https://my.oschina.net/kyo4321/blog/3050522