/tmp/dj/20170622.1498060818603 contains JSON data.
Compress the data and store it as Parquet:
val logs = spark.read.json("/tmp/dj/20170622.1498060818603")
// alternative: write compressed JSON instead of Parquet
//logs.coalesce(2).write.option("compression","gzip").json("/tmp/dj/json2")
logs.coalesce(2).write.parquet("/tmp/dj/parquet2")
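Spark writes Parquet with Snappy compression by default. To match the gzip option shown in the commented-out JSON line, the codec can be set explicitly. A minimal sketch, assuming the same `spark` session; the output path `/tmp/dj/parquet2_gzip` is illustrative:

```scala
// Hypothetical variant: write Parquet with gzip instead of the default snappy.
logs.coalesce(2)
  .write
  .option("compression", "gzip")   // other valid values include "snappy" and "none"
  .parquet("/tmp/dj/parquet2_gzip")
```

Gzip gives smaller files at higher CPU cost; Snappy is usually the better default for data that is read frequently.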
Read the Parquet files back:
val logs1 = spark.read.parquet("/tmp/dj/parquet2/*")
// logs1 is a DataFrame whose schema was inferred from the original JSON fields
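To confirm which fields survived the JSON-to-Parquet round trip, the inferred schema can be printed and individual columns previewed. A sketch against the `logs1` DataFrame above; the column name `timestamp` is hypothetical and should be replaced with a real field from the JSON:

```scala
// Show the schema Spark inferred from the original JSON
logs1.printSchema()

// Preview a column (hypothetical name; substitute an actual field)
logs1.select("timestamp").show(5)
```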