spark Json日志分析

1593136280858|{“cm”:{“ln”:"-55.0",“sv”:“V2.9.6”,“os”:“8.0.4”,“g”:“C6816QZ0@gmail.com”,“mid”:“489”,“nw”:“3G”,“l”:“es”,“vc”:“4”,“hw”:“640*960”,“ar”:“MX”,“uid”:“489”,“t”:“1593123253541”,“la”:“5.2”,“md”:“sumsung-18”,“vn”:“1.3.4”,“ba”:“Sumsung”,“sr”:“I”},“ap”:“app”,“et”:[{“ett”:“1593050051366”,“en”:“loading”,“kv”:{“extend2”:"",“loading_time”:“14”,“action”:“3”,“extend1”:"",“type”:“2”,“type1”:“201”,“loading_way”:“1”}},{“ett”:“1593108791764”,“en”:“ad”,“kv”:{“activityId”:“1”,“displayMills”:“78522”,“entry”:“1”,“action”:“1”,“contentType”:“0”}},{“ett”:“1593111271266”,“en”:“notification”,“kv”:{“ap_time”:“1593097087883”,“action”:“1”,“type”:“1”,“content”:""}},{“ett”:“1593066033562”,“en”:“active_background”,“kv”:{“active_source”:“3”}},{“ett”:“1593135644347”,“en”:“comment”,“kv”:{“p_comment_id”:1,“addtime”:“1593097573725”,“praise_count”:973,“other_id”:5,“comment_id”:9,“reply_count”:40,“userid”:7,“content”:“辑赤蹲慰鸽抿肘捎”}}]}

1.从hdfs上读取数据

val fileRdd=sc.textFile("hdfs://192.168.136.10:9000/spark/Log/op.log")

2.数据分割,转换成dataFrame。Id字段加入到最后

val jsonStrRdd=fileRdd.map(x=>x.split('|')).map(x=>(x(0),x(1)))
val rdd=jsonStrRdd.map(x=>{varjsonStr=x._2;jsonStr=jsonStr.substring(0,jsonStr.length-1);jsonStr+",\"id\":\""+x._1+"\"}"})
val jsonDF=rdd.toDF

3.导入包

import spark.implicits._
import org.apache.spark.sql._
import org.apache.spark.sql.types._
import org.apache.spark.sql.functions._

4.对value值,按照cm、ap、et、id分成不同的列

val jsonDF2=jsonDF.select(get_json_object($"value","$.cm").as("cm"),
get_json_object($"value","$.ap").as("ap"),
get_json_object($"value","$.et").as("et"),
get_json_object($"value","$.id").as("id"))

5.对cm按照不同的键名进一步拆分

val jsonDF3=jsonDF2.select(
$"id",$"ap",get_json_object($"cm","$.ln").alias("ln"),
get_json_object($"cm","$.sv").alias("sv"),
get_json_object($"cm","$.os").alias("os"),
get_json_object($"cm","$.g").alias("g"),
get_json_object($"cm","$.mid").alias("mid"),
get_json_object($"cm","$.nw").alias("nw"),
get_json_object($"cm","$.l").alias("l"),
get_json_object($"cm","$.vc").alias("vc"),
get_json_object($"cm","$.hw").alias("hw"),
get_json_object($"cm","$.ar").alias("ar"),
get_json_object($"cm","$.uid").alias("uid"),
get_json_object($"cm","$.t").alias("t"),
get_json_object($"cm","$.la").alias("la"),
get_json_object($"cm","$.md").alias("md"),
get_json_object($"cm","$.vn").alias("vn"),
get_json_object($"cm","$.ba").alias("ba"),
get_json_object($"cm","$.sr").alias("sr")
,$"et")

在这里插入图片描述

6.对et列中的数据做进一步拆分
把数组每个元素提取出来;

val jsonDF4=jsonDF3.select($"id", $"ap",$"sv",$"os",$"g",$"mid",
$"nw",$"l",$"vc",$"hw",$"ar",$"uid",$"t",$"la",$"md",$"vn",$"ba",$"sr",
from_json($"et",ArrayType(StructType(
StructField("ett", StringType)::
StructField("en", StringType)::
StructField("kv", StringType)::Nil
))).as("event"))val jsonDF5=jsonDF4.select($"id", $"ap",$"sv",$"os",
$"g",$"mid",$"nw",$"l",$"vc",$"hw",$"ar",$"uid",$"t",$"la",
$"md",$"vn",$"ba",$"sr", explode($"event").alias("event"))

在这里插入图片描述
未explode之前:
在这里插入图片描述

7.对event进行拆分成不同的列

val jsonDF6=jsonDF5.select($"id", $"ap",$"sv",$"os",$"g",$"mid"
,$"nw",$"l",$"vc",$"hw",$"ar",$"uid",$"t",$"la",$"md",$"vn",$"ba",$"sr",$"event.ett",$"event.en",$"event.kv")

8.按照en的类型,拆分kv,并分别存到不同的表中

val jsonDF7=jsonDF6.filter($"en"==="loading")
val jsonDF8=jsonDF6.filter($"en"==="ad")
val jsonDF9=jsonDF6.filter($"en"==="notification")
val jsonDF10=jsonDF6.filter($"en"==="active_background")
val jsonDF11=jsonDF6.filter($"en"==="comment")

val LodingDF=jsonDF7.select(
$"id", $"ap",$"sv",$"os",$"g",$"mid",$"nw",$"l",$"vc",$"hw",$"ar",$"uid",$"t",$"la",$"md",$"vn",$"ba",$"sr",
get_json_object($"kv","$.extend2").as("extend2"),
get_json_object($"kv","$.loading_time").as("loading_time"),
get_json_object($"kv","$.action").as("action"),
get_json_object($"kv","$.extend1").as("extend1"),
get_json_object($"kv","$.type").as("type"),
get_json_object($"kv","$.type1").as("type1"),
get_json_object($"kv","$.loading_way").as("loading_way")
)


val adDF=jsonDF8.select(
$"id", $"ap",$"sv",$"os",$"g",$"mid",$"nw",$"l",$"vc",$"hw",$"ar",$"uid",$"t",$"la",$"md",$"vn",$"ba",$"sr",
get_json_object($"kv","$.activityId").as("activityId"),
get_json_object($"kv","$.displayMills").as("displayMills"),
get_json_object($"kv","$.entry").as("entry"),
get_json_object($"kv","$.action").as("action"),
get_json_object($"kv","$.contentType").as("contentType")
)

val notificationDF=jsonDF9.select(
$"id", $"ap",$"sv",$"os",$"g",$"mid",$"nw",$"l",$"vc",$"hw",$"ar",$"uid",$"t",$"la",$"md",$"vn",$"ba",$"sr",
get_json_object($"kv","$.ap_time").as("ap_time"),
get_json_object($"kv","$.action").as("action"),
get_json_object($"kv","$.type").as("type"),
get_json_object($"kv","$.content").as("content")
)

val activeDF=jsonDF10.select(
$"id", $"ap",$"sv",$"os",$"g",$"mid",$"nw",$"l",$"vc",$"hw",$"ar",$"uid",$"t",$"la",$"md",$"vn",$"ba",$"sr",
get_json_object($"kv","$.active_source").as("active_source")
)
val commentDF=jsonDF11.select(
$"id", $"ap",$"sv",$"os",$"g",$"mid",$"nw",$"l",$"vc",$"hw",$"ar",$"uid",$"t",$"la",$"md",$"vn",$"ba",$"sr",
get_json_object($"kv","$.p_comment_id").as("p_comment_id"),
get_json_object($"kv","$.addtime").as("addtime"),
get_json_object($"kv","$.praise_count").as("praise_count"),
get_json_object($"kv","$.other_id").as("other_id"),
get_json_object($"kv","$.comment_id").as("comment_id"),
get_json_object($"kv","$.reply_count").as("reply_count"),
get_json_object($"kv","$.userid").as("userid"),
get_json_object($"kv","$.content").as("content")
)

9.上传到Hive

//导入包:
import org.apache.spark.sql.hive.HiveContext
//给每个dataFrame创建相应的视图:
adDF.createTempView("ad")
notificationDF.createTempView("notification")
activeDF.createTempView("active")
commentDF.createTempView("comment")
//加载到Hive:(默认库)
new HiveContext(sc).sql("create table Loding as select * from Loding")
new HiveContext(sc).sql("create table ad as select * from ad")
new HiveContext(sc).sql("create table notification as select * from notification")
new HiveContext(sc).sql("create table active as select * from active")
new HiveContext(sc).sql("create table comment as select * from comment")
//另一种存储方法,更为简洁
adDF.write.mode(SaveMode.Overwrite).saveAsTable("load")
  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值