Spark-Avro学习3之使用AvroCompression存储AVRO文件

更多Spark学习examples代码请见:https://github.com/xubo245/SparkLearning


1.说明:

使用不同的压缩level对avro数据进行压缩


2.代码:

/**
 * @author xubo
 * @time 20160502
 * ref https://github.com/databricks/spark-avro
 */
package org.apache.spark.avro.learning

import org.apache.spark.sql.SQLContext
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import java.text.SimpleDateFormat
import java.util.Date
/**
 * Avro Compression different level 
 */
object AvroCompression {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("AvroCompression").setMaster("local")
    val sc = new SparkContext(conf)
    // import needed for the .avro method to be added
    import com.databricks.spark.avro._

    val sqlContext = new SQLContext(sc)
    sqlContext.setConf("spark.sql.avro.compression.codec", "deflate")
    sqlContext.setConf("spark.sql.avro.deflate.level", "5")
    
    // The Avro records get converted to Spark types, filtered, and
    // then written back out as Avro records
    val df = sqlContext.read
      .format("com.databricks.spark.avro")
      .load("file/data/avro/input/episodes.avro")
    df.show

    val iString = new SimpleDateFormat("yyyyMMddHHmmssSSS").format(new Date())

    df.filter("doctor > 5").write
      .format("com.databricks.spark.avro")
      .save("file/data/avro/output/episodes/AvroCompression" + iString)
    df.filter("doctor > 5").show

  }
}

3.结果:

+--------------------+----------------+------+
|               title|        air_date|doctor|
+--------------------+----------------+------+
|   The Eleventh Hour|    3 April 2010|    11|
|   The Doctor's Wife|     14 May 2011|    11|
| Horror of Fang Rock|3 September 1977|     4|
|  An Unearthly Child|23 November 1963|     1|
|The Mysterious Pl...|6 September 1986|     6|
|                Rose|   26 March 2005|     9|
|The Power of the ...| 5 November 1966|     2|
|          Castrolava|  4 January 1982|     5|
+--------------------+----------------+------+

+--------------------+----------------+------+
|               title|        air_date|doctor|
+--------------------+----------------+------+
|   The Eleventh Hour|    3 April 2010|    11|
|   The Doctor's Wife|     14 May 2011|    11|
|The Mysterious Pl...|6 September 1986|     6|
|                Rose|   26 March 2005|     9|
+--------------------+----------------+------+

4.存储文件:

Objavro.codecdeflateavro.schema�{"type":"record","name":"topLevelRecord","fields":[{"name":"title","type":["string","null"]},{"name":"air_date","type":["string","null"]},{"name":"doctor","type":["int","null"]}]}

379bytes

没修改默认压缩level的为391bytes

  • 0
    点赞
  • 2
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值