Writing Spark results into a Hive partitioned table with insertInto

To store Spark processing results in a Hive partitioned table, the partition values just need to be present as columns of the result set: they can be selected directly in the SQL, or appended afterwards with the `withColumn` operator, and the DataFrame is then written with `insertInto`.

    ss.sql("SELECT merchant_id,platform," +
      "case when trim(first_channel_id) = '' or first_channel_id is null then '-1' else first_channel_id end as channel_id," +
      "is_new," +
      "0 as language_id," +
      "'all' as country_code," +
      "count(1) as pv," +
      "sum(case when page_type = 'Productpage' then 1 else 0 end) as ppv," +
      "count(distinct cookie) as uv," +
      "count(distinct case when is_bounce = 1 then cookie end) as bounce_uv," +
      "count(distinct case when page_type = 'Shoppingcart' then cookie end) as shoppingcart_uv," +
      "null as application_type_id," +
      s"count(case when hour = '$per_hour' then 1 end) as inc_pv," +
      s"sum(case when hour = '$per_hour' and page_type = 'Productpage' then 1 else 0 end) as inc_ppv," +
      s"count(distinct case when hour = '$per_hour' then cookie end) as inc_uv," +
      s"count(distinct case when hour = '$per_hour' and is_bounce = 1 then cookie end) as inc_bounce_uv," +
      "count(distinct case when page_type = 'Productpage' or page_type = 'Categorypage' then cookie end) as product_category_uv " +
      "FROM tmp_traff " +
      "WHERE first_channel_id rlike '^\\\\d+$' " +
      "GROUP BY merchant_id,platform," +
      "case when trim(first_channel_id) = '' or first_channel_id is null then '-1' else first_channel_id end," +
      "is_new")
      // dt, hour and merchant are the partition columns of the target table.
      // insertInto matches columns by position, not by name, so they must be
      // appended last, in the same order as the table's PARTITIONED BY clause.
      .withColumn("dt", lit(s"$dt")).withColumn("hour", lit(s"$per_hour")).withColumn("merchant", lit(s"$merchant"))
      .repartition(1)
      // SaveMode.Overwrite plus insertInto is enough; the write goes into the
      // existing table definition, so no format() needs to be specified.
      .write.mode(SaveMode.Overwrite).insertInto("table_name")
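
Two things are easy to trip over here. First, `insertInto` resolves columns by position rather than by name, so the DataFrame's column order must match the target table exactly, with the partition columns last. Second, because `insertInto` takes no static partition spec, the write is a dynamic-partition insert and Hive dynamic partitioning must be enabled on the session. A minimal self-contained sketch of the pattern, using a hypothetical table `demo_stats` and hard-coded partition values (`tmp_traff` is the view registered earlier in the job):

    import org.apache.spark.sql.{SaveMode, SparkSession}
    import org.apache.spark.sql.functions.lit

    val ss = SparkSession.builder()
      .appName("insertInto-demo")
      .enableHiveSupport()
      .getOrCreate()

    // Dynamic partitioning must be switched on before a dynamic-partition insert.
    ss.sql("set hive.exec.dynamic.partition=true")
    ss.sql("set hive.exec.dynamic.partition.mode=nonstrict")

    // Hypothetical target table, partitioned the same way as above.
    ss.sql(
      """CREATE TABLE IF NOT EXISTS demo_stats (pv BIGINT, uv BIGINT)
        |PARTITIONED BY (dt STRING, hour STRING, merchant STRING)
        |STORED AS PARQUET""".stripMargin)

    ss.sql("SELECT count(1) as pv, count(distinct cookie) as uv FROM tmp_traff")
      // Partition columns go last, in PARTITIONED BY order,
      // because insertInto matches purely by position.
      .withColumn("dt", lit("2024-01-01"))
      .withColumn("hour", lit("10"))
      .withColumn("merchant", lit("m001"))
      .write.mode(SaveMode.Overwrite).insertInto("demo_stats")

With `SaveMode.Overwrite` the statement runs as an `INSERT OVERWRITE`, so for a Hive table written this way only the partitions that actually receive data are rewritten; the table's other partitions are left untouched.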

 
