在平时的工作中,经常有按照不同维度筛选和统计数据的需求。拿视频会员订单数据来说吧,运营人员要查看深圳市的成功下单数或则深圳市某一种产品的成功下单数或者某一种产品的所有成功下单数时,每天的订单数又很大,现查的话按照不同的维度去查询又很慢。此时本篇文章或许会帮助到你。
group by:主要用来对查询的结果进行分组,相同组合的分组条件在结果集中只显示一行记录。可以添加聚合函数。
grouping sets:对分组集中指定的组表达式的每个子集执行group by,group by A,B grouping sets(A,B)就等价于 group by A union group by B,其中A和B也可以是一个集合,比如group by A,B,C grouping sets((A,B),(A,C))。
rollup:在指定表达式的每个层次级别创建分组集。group by A,B,C with rollup首先会对(A、B、C)进行group by,然后对(A、B)进行group by,然后是(A)进行group by,最后对全表进行group by操作。
cube:为指定表达式集的每个可能组合创建分组集。首先会对(A、B、C)进行group by,然后依次是(A、B),(A、C),(A),(B、C),(B),( C),最后对全表进行group by操作。
数据库中会员订单的数据映射的对象如下:
<span style="color:#000000"><code class="language-java"><span style="color:#c678dd">case</span> <span style="color:#c678dd">class</span> MemberOrderInfo<span style="color:#999999">(</span>area<span style="color:#669900">:</span>String<span style="color:#999999">,</span>memberType<span style="color:#669900">:</span>String<span style="color:#999999">,</span>product<span style="color:#669900">:</span>String<span style="color:#999999">,</span>price<span style="color:#669900">:</span>Int<span style="color:#999999">)</span>
</code></span>
- 1
会员订单表中的数据如下:
<span style="color:#000000"><code class="language-java"> <span style="color:#c678dd">import</span> sqlContext<span style="color:#999999">.</span>implicits<span style="color:#999999">.</span>_
val orders<span style="color:#669900">=</span><span style="color:#61aeee">Seq</span><span style="color:#999999">(</span>
<span style="color:#61aeee">MemberOrderInfo</span><span style="color:#999999">(</span><span style="color:#669900">"深圳"</span><span style="color:#999999">,</span><span style="color:#669900">"钻石会员"</span><span style="color:#999999">,</span><span style="color:#669900">"钻石会员1个月"</span><span style="color:#999999">,</span><span style="color:#98c379">25</span><span style="color:#999999">)</span><span style="color:#999999">,</span>
<span style="color:#61aeee">MemberOrderInfo</span><span style="color:#999999">(</span><span style="color:#669900">"深圳"</span><span style="color:#999999">,</span><span style="color:#669900">"钻石会员"</span><span style="color:#999999">,</span><span style="color:#669900">"钻石会员1个月"</span><span style="color:#999999">,</span><span style="color:#98c379">25</span><span style="color:#999999">)</span><span style="color:#999999">,</span>
<span style="color:#61aeee">MemberOrderInfo</span><span style="color:#999999">(</span><span style="color:#669900">"深圳"</span><span style="color:#999999">,</span><span style="color:#669900">"钻石会员"</span><span style="color:#999999">,</span><span style="color:#669900">"钻石会员3个月"</span><span style="color:#999999">,</span><span style="color:#98c379">70</span><span style="color:#999999">)</span><span style="color:#999999">,</span>
<span style="color:#61aeee">MemberOrderInfo</span><span style="color:#999999">(</span><span style="color:#669900">"深圳"</span><span style="color:#999999">,</span><span style="color:#669900">"钻石会员"</span><span style="color:#999999">,</span><span style="color:#669900">"钻石会员12个月"</span><span style="color:#999999">,</span><span style="color:#98c379">300</span><span style="color:#999999">)</span><span style="color:#999999">,</span>
<span style="color:#61aeee">MemberOrderInfo</span><span style="color:#999999">(</span><span style="color:#669900">"深圳"</span><span style="color:#999999">,</span><span style="color:#669900">"铂金会员"</span><span style="color:#999999">,</span><span style="color:#669900">"铂金会员3个月"</span><span style="color:#999999">,</span><span style="color:#98c379">60</span><span style="color:#999999">)</span><span style="color:#999999">,</span>
<span style="color:#61aeee">MemberOrderInfo</span><span style="color:#999999">(</span><span style="color:#669900">"深圳"</span><span style="color:#999999">,</span><span style="color:#669900">"铂金会员"</span><span style="color:#999999">,</span><span style="color:#669900">"铂金会员3个月"</span><span style="color:#999999">,</span><span style="color:#98c379">60</span><span style="color:#999999">)</span><span style="color:#999999">,</span>
<span style="color:#61aeee">MemberOrderInfo</span><span style="color:#999999">(</span><span style="color:#669900">"深圳"</span><span style="color:#999999">,</span><span style="color:#669900">"铂金会员"</span><span style="color:#999999">,</span><span style="color:#669900">"铂金会员6个月"</span><span style="color:#999999">,</span><span style="color:#98c379">120</span><span style="color:#999999">)</span><span style="color:#999999">,</span>
<span style="color:#61aeee">MemberOrderInfo</span><span style="color:#999999">(</span><span style="color:#669900">"深圳"</span><span style="color:#999999">,</span><span style="color:#669900">"黄金会员"</span><span style="color:#999999">,</span><span style="color:#669900">"黄金会员1个月"</span><span style="color:#999999">,</span><span style="color:#98c379">15</span><span style="color:#999999">)</span><span style="color:#999999">,</span>
<span style="color:#61aeee">MemberOrderInfo</span><span style="color:#999999">(</span><span style="color:#669900">"深圳"</span><span style="color:#999999">,</span><span style="color:#669900">"黄金会员"</span><span style="color:#999999">,</span><span style="color:#669900">"黄金会员1个月"</span><span style="color:#999999">,</span><span style="color:#98c379">15</span><span style="color:#999999">)</span><span style="color:#999999">,</span>
<span style="color:#61aeee">MemberOrderInfo</span><span style="color:#999999">(</span><span style="color:#669900">"深圳"</span><span style="color:#999999">,</span><span style="color:#669900">"黄金会员"</span><span style="color:#999999">,</span><span style="color:#669900">"黄金会员3个月"</span><span style="color:#999999">,</span><span style="color:#98c379">45</span><span style="color:#999999">)</span><span style="color:#999999">,</span>
<span style="color:#61aeee">MemberOrderInfo</span><span style="color:#999999">(</span><span style="color:#669900">"深圳"</span><span style="color:#999999">,</span><span style="color:#669900">"黄金会员"</span><span style="color:#999999">,</span><span style="color:#669900">"黄金会员12个月"</span><span style="color:#999999">,</span><span style="color:#98c379">180</span><span style="color:#999999">)</span><span style="color:#999999">,</span>
<span style="color:#61aeee">MemberOrderInfo</span><span style="color:#999999">(</span><span style="color:#669900">"北京"</span><span style="color:#999999">,</span><span style="color:#669900">"钻石会员"</span><span style="color:#999999">,</span><span style="color:#669900">"钻石会员1个月"</span><span style="color:#999999">,</span><span style="color:#98c379">25</span><span style="color:#999999">)</span><span style="color:#999999">,</span>
<span style="color:#61aeee">MemberOrderInfo</span><span style="color:#999999">(</span><span style="color:#669900">"北京"</span><span style="color:#999999">,</span><span style="color:#669900">"钻石会员"</span><span style="color:#999999">,</span><span style="color:#669900">"钻石会员1个月"</span><span style="color:#999999">,</span><span style="color:#98c379">25</span><span style="color:#999999">)</span><span style="color:#999999">,</span>
<span style="color:#61aeee">MemberOrderInfo</span><span style="color:#999999">(</span><span style="color:#669900">"北京"</span><span style="color:#999999">,</span><span style="color:#669900">"铂金会员"</span><span style="color:#999999">,</span><span style="color:#669900">"铂金会员3个月"</span><span style="color:#999999">,</span><span style="color:#98c379">60</span><span style="color:#999999">)</span><span style="color:#999999">,</span>
<span style="color:#61aeee">MemberOrderInfo</span><span style="color:#999999">(</span><span style="color:#669900">"北京"</span><span style="color:#999999">,</span><span style="color:#669900">"黄金会员"</span><span style="color:#999999">,</span><span style="color:#669900">"黄金会员3个月"</span><span style="color:#999999">,</span><span style="color:#98c379">45</span><span style="color:#999999">)</span><span style="color:#999999">,</span>
<span style="color:#61aeee">MemberOrderInfo</span><span style="color:#999999">(</span><span style="color:#669900">"上海"</span><span style="color:#999999">,</span><span style="color:#669900">"钻石会员"</span><span style="color:#999999">,</span><span style="color:#669900">"钻石会员1个月"</span><span style="color:#999999">,</span><span style="color:#98c379">25</span><span style="color:#999999">)</span><span style="color:#999999">,</span>
<span style="color:#61aeee">MemberOrderInfo</span><span style="color:#999999">(</span><span style="color:#669900">"上海"</span><span style="color:#999999">,</span><span style="color:#669900">"钻石会员"</span><span style="color:#999999">,</span><span style="color:#669900">"钻石会员1个月"</span><span style="color:#999999">,</span><span style="color:#98c379">25</span><span style="color:#999999">)</span><span style="color:#999999">,</span>
<span style="color:#61aeee">MemberOrderInfo</span><span style="color:#999999">(</span><span style="color:#669900">"上海"</span><span style="color:#999999">,</span><span style="color:#669900">"铂金会员"</span><span style="color:#999999">,</span><span style="color:#669900">"铂金会员3个月"</span><span style="color:#999999">,</span><span style="color:#98c379">60</span><span style="color:#999999">)</span><span style="color:#999999">,</span>
<span style="color:#61aeee">MemberOrderInfo</span><span style="color:#999999">(</span><span style="color:#669900">"上海"</span><span style="color:#999999">,</span><span style="color:#669900">"黄金会员"</span><span style="color:#999999">,</span><span style="color:#669900">"黄金会员3个月"</span><span style="color:#999999">,</span><span style="color:#98c379">45</span><span style="color:#999999">)</span>
<span style="color:#999999">)</span>
<span style="color:#5c6370">//把seq转换成DataFrame</span>
val memberDF<span style="color:#669900">:</span>DataFrame <span style="color:#669900">=</span>orders<span style="color:#999999">.</span><span style="color:#61aeee">toDF</span><span style="color:#999999">(</span><span style="color:#999999">)</span>
<span style="color:#5c6370">//把DataFrame注册成临时表</span>
memberDF<span style="color:#999999">.</span><span style="color:#61aeee">registerTempTable</span><span style="color:#999999">(</span><span style="color:#669900">"orderTempTable"</span><span style="color:#999999">)</span>
</code></span>
接下来我们通过操作 orderTempTable 来看一下grouping sets、group by、rollup和cube具体如何使用。
1.group by
group by是SELECT语句的从句,用来指定查询分组条件,主要用来对查询的结果进行分组,相同组合的分组条件在结果集中只显示一行记录。使用group by从句时候,通过添加聚合函数(主要有COUNT()、SUM、MAX()、MIN()等)可以使数据聚合。
<span style="color:#000000"><code class="language-java"> sqlContext<span style="color:#999999">.</span><span style="color:#61aeee">sql</span><span style="color:#999999">(</span><span style="color:#669900">"select area,memberType,product,sum(price) as total from orderTempTable group by area,memberType,product"</span><span style="color:#999999">)</span>
</code></span>
- 1
执行结果如下,可以看到结果中按照area,memberType,product来做聚合sum操作。
<span style="color:#000000"><code>+----+----------+--------+---+
|area|memberType|product |total|
+----+----------+--------+---+
|深圳 |铂金会员 |铂金会员6个月 |120|
|深圳 |黄金会员 |黄金会员12个月|180|
|深圳 |钻石会员 |钻石会员3个月 |70 |
|深圳 |黄金会员 |黄金会员3个月 |45 |
|深圳 |钻石会员 |钻石会员12个月|300|
|北京 |黄金会员 |黄金会员3个月 |45 |
|深圳 |钻石会员 |钻石会员1个月 |50 |
|深圳 |黄金会员 |黄金会员1个月 |30 |
|深圳 |铂金会员 |铂金会员3个月 |120|
|北京 |钻石会员 |钻石会员1个月 |50 |
|北京 |铂金会员 |铂金会员3个月 |60 |
|上海 |黄金会员 |黄金会员3个月 |45 |
|上海 |钻石会员 |钻石会员1个月 |50 |
|上海 |铂金会员 |铂金会员3个月 |60 |
+----+----------+--------+---+
</code></span>
2.grouping sets
a.grouping sets是group by子句更进一步的扩展, 它让你能够定义多个数据分组。这样做使聚合更容易, 并且因此使得多维数据分析更容易。
b.够用grouping sets在同一查询中定义多个分组
<span style="color:#000000"><code class="language-java"> sqlContext<span style="color:#999999">.</span><span style="color:#61aeee">sql</span><span style="color:#999999">(</span><span style="color:#669900">"select area,memberType,product,sum(price) as total from orderTempTable group by area,memberType,product grouping sets(area,memberType,product)"</span><span style="color:#999999">)</span>
</code></span>
- 1
- 2
上面的语句输出结果如下,可以看到使用grouping sets(area,memberType,product)会分别对这3个维度进行group by,也可以grouping sets ((area,memberType),(area,product)))此时相当于group by (area,memberType) union group by (area,product),也就是说grouping sets 后面可以指定你想要的各种维度组合。
<span style="color:#000000"><code>+----+----------+--------+-----+
|area|memberType|product |total|
+----+----------+--------+-----+
|null|null |铂金会员3个月 |240 |
|null|铂金会员 |null |360 |
|上海 |null |null |155 |
|null|钻石会员 |null |520 |
|null|null |钻石会员12个月|300 |
|null|null |黄金会员12个月|180 |
|null|null |钻石会员3个月 |70 |
|null|null |黄金会员3个月 |135 |
|深圳 |null |null |915 |
|null|null |钻石会员1个月 |150 |
|null|null |黄金会员1个月 |30 |
|null|黄金会员 |null |345 |
|北京 |null |null |155 |
|null|null |铂金会员6个月 |120 |
+----+----------+--------+-----+
</code></span>
3.rollup
rollup 是根据维度在数据结果集中进行的聚合操作。
group by A,B,C with rollup首先会对(A、B、C)进行group by,然后对(A、B)进行group by,然后是(A)进行group by,最后对全表进行group by操作。
<span style="color:#000000"><code class="language-java"> sqlContext<span style="color:#999999">.</span><span style="color:#61aeee">sql</span><span style="color:#999999">(</span><span style="color:#669900">"select area,memberType,product,sum(price) as total from orderTempTable group by area,memberType,product with rollup"</span><span style="color:#999999">)</span>
</code></span>
输出结果中,可以group by A,B,C with rollup,的确是上述几种group by的并集。
<span style="color:#000000"><code>+----+----------+--------+-----+
|area|memberType|product |total|
+----+----------+--------+-----+
|上海 |null |null |155 |
|北京 |铂金会员 |null |60 |
|北京 |钻石会员 |null |50 |
|上海 |钻石会员 |钻石会员1个月 |50 |
|深圳 |黄金会员 |黄金会员1个月 |30 |
|深圳 |钻石会员 |钻石会员12个月|300 |
|北京 |黄金会员 |黄金会员3个月 |45 |
|深圳 |钻石会员 |钻石会员3个月 |70 |
|北京 |铂金会员 |铂金会员3个月 |60 |
|上海 |铂金会员 |null |60 |
|上海 |钻石会员 |null |50 |
|深圳 |黄金会员 |null |255 |
|深圳 |null |null |915 |
|上海 |黄金会员 |黄金会员3个月 |45 |
|深圳 |铂金会员 |铂金会员3个月 |120 |
|深圳 |钻石会员 |钻石会员1个月 |50 |
|上海 |铂金会员 |铂金会员3个月 |60 |
|北京 |黄金会员 |null |45 |
|深圳 |铂金会员 |null |240 |
|null|null |null |1225 |
|深圳 |钻石会员 |null |420 |
|北京 |null |null |155 |
|北京 |钻石会员 |钻石会员1个月 |50 |
|深圳 |黄金会员 |黄金会员12个月|180 |
|深圳 |铂金会员 |铂金会员6个月 |120 |
|深圳 |黄金会员 |黄金会员3个月 |45 |
|上海 |黄金会员 |null |45 |
+----+----------+--------+-----+
</code></span>
4.cube
group by A,B,C with cube,则首先会对(A、B、C)进行group by,然后依次是(A、B),(A、C),(A),(B、C),(B),( C),最后对全表进行group by操作。
<span style="color:#000000"><code class="language-java"> sqlContext<span style="color:#999999">.</span><span style="color:#61aeee">sql</span><span style="color:#999999">(</span><span style="color:#669900">"select area,memberType,product,sum(price) as total from orderTempTable group by area,memberType,product with cube"</span><span style="color:#999999">)</span>
</code></span>
- 1
- 2
<span style="color:#000000"><code>+----+----------+--------+-----+
|area|memberType|product |total|
+----+----------+--------+-----+
|深圳 |null |黄金会员12个月|180 |
|深圳 |null |钻石会员3个月 |70 |
|深圳 |null |黄金会员3个月 |45 |
|null|null |铂金会员3个月 |240 |
|北京 |null |铂金会员3个月 |60 |
|null|铂金会员 |null |360 |
|上海 |null |null |155 |
|北京 |铂金会员 |null |60 |
|null|钻石会员 |null |520 |
|北京 |钻石会员 |null |50 |
|上海 |钻石会员 |钻石会员1个月 |50 |
|深圳 |黄金会员 |黄金会员1个月 |30 |
|null|null |钻石会员12个月|300 |
|深圳 |钻石会员 |钻石会员12个月|300 |
|null|黄金会员 |黄金会员12个月|180 |
|null|铂金会员 |铂金会员6个月 |120 |
|null|黄金会员 |黄金会员3个月 |135 |
|深圳 |null |钻石会员1个月 |50 |
|深圳 |null |黄金会员1个月 |30 |
|北京 |黄金会员 |黄金会员3个月 |45 |
|null|null |黄金会员12个月|180 |
|上海 |null |铂金会员3个月 |60 |
|null|null |钻石会员3个月 |70 |
|深圳 |钻石会员 |钻石会员3个月 |70 |
|null|null |黄金会员3个月 |135 |
|北京 |铂金会员 |铂金会员3个月 |60 |
|北京 |null |黄金会员3个月 |45 |
|上海 |铂金会员 |null |60 |
|上海 |钻石会员 |null |50 |
|深圳 |黄金会员 |null |255 |
|null|黄金会员 |黄金会员1个月 |30 |
|深圳 |null |null |915 |
|null|钻石会员 |钻石会员12个月|300 |
|上海 |黄金会员 |黄金会员3个月 |45 |
|深圳 |铂金会员 |铂金会员3个月 |120 |
|null|null |钻石会员1个月 |150 |
|深圳 |钻石会员 |钻石会员1个月 |50 |
|null|null |黄金会员1个月 |30 |
|北京 |null |钻石会员1个月 |50 |
|上海 |铂金会员 |铂金会员3个月 |60 |
|上海 |null |黄金会员3个月 |45 |
|null|钻石会员 |钻石会员3个月 |70 |
|深圳 |null |铂金会员6个月 |120 |
|null|黄金会员 |null |345 |
|北京 |黄金会员 |null |45 |
|深圳 |null |铂金会员3个月 |120 |
|深圳 |铂金会员 |null |240 |
|null|null |null |1225 |
|深圳 |钻石会员 |null |420 |
|北京 |null |null |155 |
|null|铂金会员 |铂金会员3个月 |240 |
|上海 |null |钻石会员1个月 |50 |
|null|钻石会员 |钻石会员1个月 |150 |
|深圳 |null |钻石会员12个月|300 |
|北京 |钻石会员 |钻石会员1个月 |50 |
|深圳 |黄金会员 |黄金会员12个月|180 |
|深圳 |铂金会员 |铂金会员6个月 |120 |
|深圳 |黄金会员 |黄金会员3个月 |45 |
|null|null |铂金会员6个月 |120 |
|上海 |黄金会员 |null |45 |
+----+----------+--------+-----+
</code></span>
5.应用
经过group by、grouping sets、rollup和cube处理过后的数据可以直接存储到MySQL等数据库中。文章开头中提到的需求,可以根据运营提出的筛选维度进行cube分析,就可以得到各种维度组合下的订单统计结果。比如要获取总的定单数。可以如下处理:
<span style="color:#000000"><code class="language-java"> val cubeDF <span style="color:#669900">=</span> sqlContext<span style="color:#999999">.</span><span style="color:#61aeee">sql</span><span style="color:#999999">(</span><span style="color:#669900">"select area,memberType,product,sum(price) as total from orderTempTable group by area,memberType,product with cube"</span><span style="color:#999999">)</span>
cubeDF <span style="color:#999999">.</span><span style="color:#61aeee">registerTempTable</span><span style="color:#999999">(</span><span style="color:#669900">"resultTempTable"</span><span style="color:#999999">)</span>
<span style="color:#5c6370">//下面SQL执行的结果就是所有订单的总数</span>
sqlContext<span style="color:#999999">.</span><span style="color:#61aeee">sql</span><span style="color:#999999">(</span><span style="color:#669900">"select * from resultTempTable where area=null and memberType=null and product =null"</span><span style="color:#999999">)</span></code></span>