Spark SQL output formats and their compression options

Author: 机智的大脚猴

Article link: https://blog.csdn.net/lfish001/article/details/102505710

1. json
Default: no compression
Available codecs: none, bzip2, gzip, lz4, snappy, deflate
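
As a minimal sketch of how a codec is typically requested for JSON output, the example below passes the `compression` option to `DataFrameWriter`. The object name, sample rows, and temporary output directory are assumptions made for illustration, and a local Spark session is assumed to be available on the classpath.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

object JsonCompressionExample {
  /** Writes two sample rows as gzip-compressed JSON and returns the row count read back. */
  def run(): Long = {
    // Local session for illustration only.
    val spark = SparkSession.builder()
      .appName("json-compression-example")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data and a throwaway output directory.
    val df = Seq((1, "a"), (2, "b")).toDF("id", "value")
    val out = Files.createTempDirectory("spark-json-").resolve("out").toString

    // JSON output is uncompressed unless a codec is requested for the write.
    df.write
      .mode("overwrite")
      .option("compression", "gzip")
      .json(out)

    // On read, the codec is inferred from the part-file extension.
    val rows = spark.read.json(out).count()
    spark.stop()
    rows
  }
}
```

The part files written this way should carry a `.json.gz` suffix, which is how the reader recognizes the codec.
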
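2. parquet
Default codec: snappy
Available codecs: none, uncompressed, snappy, gzip, lzo, brotli, lz4, zstd (the set accepted by the configuration definition quoted below)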

  // Quoted from Spark's SQLConf; the accepted values vary between Spark versions.
  val PARQUET_COMPRESSION = buildConf("spark.sql.parquet.compression.codec")
    .doc("Sets the compression codec used when writing Parquet files. If either `compression` or " +
      "`parquet.compression` is specified in the table-specific options/properties, the " +
      "precedence would be `compression`, `parquet.compression`, " +
      "`spark.sql.parquet.compression.codec`. Acceptable values include: none, uncompressed, " +
      "snappy, gzip, lzo, brotli, lz4, zstd.")
    .stringConf
    .transform(_.toLowerCase(Locale.ROOT))
    .checkValues(Set("none", "uncompressed", "snappy", "gzip", "lzo", "lz4", "brotli", "zstd"))
    .createWithDefault("snappy")
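
The doc string above describes a precedence order: `compression`, then `parquet.compression`, then `spark.sql.parquet.compression.codec`. The following is a simplified model of that lookup in plain Scala, not Spark's own implementation; the object `ParquetCodecPrecedence` and its `resolve` helper are hypothetical names introduced only to illustrate the order.

```scala
import java.util.Locale

object ParquetCodecPrecedence {
  // Values accepted by the checkValues set quoted above.
  val accepted: Set[String] =
    Set("none", "uncompressed", "snappy", "gzip", "lzo", "lz4", "brotli", "zstd")

  /** Hypothetical helper: picks the codec in the documented order
    * `compression` > `parquet.compression` > session-level configuration.
    */
  def resolve(options: Map[String, String], sessionCodec: String = "snappy"): String = {
    val chosen = options.get("compression")
      .orElse(options.get("parquet.compression"))
      .getOrElse(sessionCodec)
      .toLowerCase(Locale.ROOT) // mirrors the transform step in the definition

    require(accepted.contains(chosen), s"Unsupported Parquet codec: $chosen")
    chosen
  }
}
```

In practice this means a per-write `.option("compression", ...)` should override both the table property and the session setting, while the session setting only applies when neither of the other two is present.
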

3. orc
Default codec: snappy
Available codecs: none, uncompressed, snappy, zlib, lzo

  // Quoted from Spark's SQLConf; the accepted values vary between Spark versions.
  val ORC_COMPRESSION = buildConf("spark.sql.orc.compression.codec")
    .doc("Sets the compression codec used when writing ORC files. If either `compression` or " +
      "`orc.compress` is specified in the table-specific options/properties, the precedence " +
      "would be `compression`, `orc.compress`, `spark.sql.orc.compression.codec`." +
      "Acceptable values include: none, uncompressed, snappy, zlib, lzo.")
    .stringConf
    .transform(_.toLowerCase(Locale.ROOT))
    .checkValues(Set("none", "uncompressed", "snappy", "zlib", "lzo"))
    .createWithDefault("snappy")
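
A short sketch of the two usual ways to choose the ORC codec: a session-wide setting through `spark.sql.orc.compression.codec`, and a per-write `compression` option that takes precedence over it. The object name, sample rows, and temporary directories are assumptions for illustration; a Spark version with the built-in ORC data source (2.4 or later is assumed here) is needed.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

object OrcCompressionExample {
  /** Writes the same rows with two codec settings and returns the total rows read back. */
  def run(): Long = {
    // Local session for illustration only.
    val spark = SparkSession.builder()
      .appName("orc-compression-example")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data and throwaway output directories.
    val df = Seq((1, "a"), (2, "b")).toDF("id", "value")
    val base = Files.createTempDirectory("spark-orc-")
    val zlibOut = base.resolve("zlib").toString
    val snappyOut = base.resolve("snappy").toString

    // Session-wide setting: applies to ORC writes that do not set their own codec.
    spark.conf.set("spark.sql.orc.compression.codec", "zlib")
    df.write.mode("overwrite").orc(zlibOut)

    // Per-write option: takes precedence over the session setting.
    df.write.mode("overwrite").option("compression", "snappy").orc(snappyOut)

    val rows = spark.read.orc(zlibOut).count() + spark.read.orc(snappyOut).count()
    spark.stop()
    rows
  }
}
```
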

4. text
Default: no compression
Available codecs: none, bzip2, gzip, lz4, snappy, deflate
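
For the text format, one detail worth keeping in mind is that the writer expects a DataFrame with a single string column. The sketch below writes such a DataFrame with bzip2 compression; the object name, sample lines, and temporary directory are assumptions for illustration.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

object TextCompressionExample {
  /** Writes two lines as bzip2-compressed text and returns the line count read back. */
  def run(): Long = {
    // Local session for illustration only.
    val spark = SparkSession.builder()
      .appName("text-compression-example")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // The text writer expects exactly one string column.
    val lines = Seq("first line", "second line").toDF("value")
    val out = Files.createTempDirectory("spark-text-").resolve("out").toString

    // Text output is uncompressed unless a codec is requested for the write.
    lines.write
      .mode("overwrite")
      .option("compression", "bzip2")
      .text(out)

    val rows = spark.read.text(out).count()
    spark.stop()
    rows
  }
}
```

A DataFrame with several columns would first need to be reduced to one string column, for example by concatenating the fields, before it can be written this way.
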
