Listagg
Given the following data (call it table t, with columns a, b, and c):
a abc 6
a acd 2
a af 3
b bb 4
b bd 5
b be 1
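The goal is to reproduce the following SQL with Spark:
select a, listagg(b,'|') within group (order by c) aa from t group by a
The main idea is to sort the rows, then group them by key and concatenate the values.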
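The sort, group, and concatenate idea can be tried out on plain Scala collections before bringing Spark into it. This is only an illustrative sketch: the Row case class, the ListaggSketch object, and the in-memory rows sample are names assumed for this example and are not part of the original job.

```scala
// Minimal sketch of LISTAGG semantics on an in-memory collection (no Spark).
// Row, ListaggSketch and the sample data are assumed names for illustration.
final case class Row(a: String, b: String, c: Int)

object ListaggSketch {
  // Mirrors: select a, listagg(b,'|') within group (order by c) aa from t group by a
  def listagg(rows: Seq[Row], sep: String = "|"): Map[String, String] =
    rows
      .groupBy(_.a) // group by column a
      .map { case (key, group) =>
        // order each group by c, then join the b values with the separator
        key -> group.sortBy(_.c).map(_.b).mkString(sep)
      }

  // The sample table t from above
  val rows: Seq[Row] = Seq(
    Row("a", "abc", 6), Row("a", "acd", 2), Row("a", "af", 3),
    Row("b", "bb", 4), Row("b", "bd", 5), Row("b", "be", 1)
  )

  def main(args: Array[String]): Unit =
    // prints (a,acd|af|abc) then (b,be|bb|bd)
    listagg(rows).toSeq.sortBy(_._1).foreach(println)
}
```

Sorting inside each group, rather than sorting the whole data set up front, keeps the ordering independent of how the rows happen to be distributed.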
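import org.apache.spark.SparkContext

def listagg(sc: SparkContext): Unit = {
  // Read the tab-separated input file with at least 2 partitions
  val rd1 = sc.textFile("D:\\work\\data\\t2.txt", 2)
  // Key each row by column a; keep (b, c) as the value
  val k2 = rd1.map { x =>
    val c = x.split("\t")
    (c(0), (c(1), c(2).toInt))
  }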
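  // Sort by column a, then by column c, ascending
  val k3 = k2.sortBy(x => (x._1, x._2._2), true)
  val zeroValue = "" // initial (empty) accumulator
  // Append b to the accumulator, adding "|" only between values (no leading "|")
  val seqOp = (u: String, v: (String, Int)) => if (u.isEmpty) v._1 else u + "|" + v._1
  // Merge partial strings from different partitions, again separated by "|"
  val compOp = (u: String, v: String) => if (u.isEmpty) v else if (v.isEmpty) u else u + "|" + v
  // Caveat: aggregateByKey shuffles by key, so the order is only certain within one partition
  val vdd3 = k3.aggregateByKey(zeroValue)(seqOp, compOp)
  vdd3.collect().foreach(println(_))
}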
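One detail worth checking in this kind of aggregation is the separator: appending "|" unconditionally in seqOp leaves a leading "|" in every result. The Spark-free sketch below mimics how aggregateByKey folds each partition with seqOp and then merges the partial strings with compOp, using guards for the empty accumulator. The AggregateSketch object, the aggregate helper, and the hand-made partitions are assumptions for illustration; real Spark chooses the partitioning and the merge order itself, so the order across partitions is not guaranteed by this pattern.

```scala
// Spark-free imitation of the seqOp/compOp folding done by aggregateByKey.
// AggregateSketch and aggregate are hypothetical names for this example.
object AggregateSketch {
  val zeroValue = ""

  // Add "|" only when the accumulator already holds a value
  val seqOp: (String, (String, Int)) => String =
    (u, v) => if (u.isEmpty) v._1 else u + "|" + v._1

  // Join two partial strings, skipping empty ones
  val compOp: (String, String) => String =
    (u, v) => if (u.isEmpty) v else if (v.isEmpty) u else u + "|" + v

  // Fold each partition with seqOp, then merge the partial results with compOp
  def aggregate(partitions: Seq[Seq[(String, Int)]]): String =
    partitions.map(_.foldLeft(zeroValue)(seqOp)).foldLeft(zeroValue)(compOp)

  def main(args: Array[String]): Unit = {
    // Values of key "a", already sorted by c, split over two assumed partitions
    val parts = Seq(Seq(("acd", 2)), Seq(("af", 3), ("abc", 6)))
    println(aggregate(parts)) // prints acd|af|abc
  }
}
```

If the two partitions were merged in the opposite order, the result would be af|abc|acd, which is why sorting inside each group is the safer choice when strict ordering matters.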
From the ITPUB blog, link: http://blog.itpub.net/134308/viewspace-2090244/. If you repost this article, please credit the source; otherwise legal liability will be pursued.