sparksql将多列组装为JSON数组

spark sql组装JSON数组

原始数据

活动行业个数占比
活动A行业A6030%
活动A行业B2010%
活动A行业C12060%
活动B行业A630%
活动B行业B210%
活动B行业C1260%

结果数据

活动结果
活动A[{“行业”:“行业A”,“个数”:60,“占比”:0.3},{“行业”:“行业B”,“个数”:20,“占比”:0.1},{“行业”:“行业C”,“个数”:120,“占比”:0.6}]
活动B[{“行业”:“行业A”,“个数”:6,“占比”:0.3},{“行业”:“行业B”,“个数”:2,“占比”:0.1},{“行业”:“行业C”,“个数”:12,“占比”:0.6}]

知识积累

  • to_json: 将一个结构体,数组,Map类型的Column转为json
/**
   * Converts a column containing a `StructType`, `ArrayType` or
   * a `MapType` into a JSON string with the specified schema.
   * Throws an exception, in the case of an unsupported type.
   *
   * @param e a column containing a struct, an array or a map.
   *
   * @group collection_funcs
   * @since 2.1.0
   */
  def to_json(e: Column): Column =
    to_json(e, Map.empty[String, String])
  • struct: 将一列或多列转为结构体
/**
   * Creates a new struct column.
   * If the input column is a column in a `DataFrame`, or a derived column expression
   * that is named (i.e. aliased), its name would be retained as the StructField's name,
   * otherwise, the newly generated StructField's name would be auto generated as
   * `col` with a suffix `index + 1`, i.e. col1, col2, col3, ...
   *
   * @group normal_funcs
   * @since 1.4.0
   */
  @scala.annotation.varargs
  def struct(cols: Column*): Column = withExpr { CreateStruct(cols.map(_.expr)) }
  • collect_set:将多行转为一行,返回去重后的数组
/**
   * Aggregate function: returns a set of objects with duplicate elements eliminated.
   *
   * @note The function is non-deterministic because the order of collected results depends
   * on order of rows which may be non-deterministic after a shuffle.
   *
   * @group agg_funcs
   * @since 1.6.0
   */
  def collect_set(e: Column): Column = withAggregateFunction { CollectSet(e.expr) }

实现

  • 原始数据封装一个DataFrame,命名为sourceDF
sourceDF
      .withColumn("json", struct(col("行业"), col("个数"), col("占比")))
      .groupBy("活动")
      .agg(to_json(collect_set(col("json"))).as("res_json"))
      .show()
  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值