Iceberg Series (1): Storage in Detail, a First Look (Part 1)

九剑问天

Last modified 2022-02-17 10:14:31

Categories: Data Lake, Big Data. Tags: hive spark iceberg

First published 2022-02-17 10:11:25

Article link: https://blog.csdn.net/liliwei0213/article/details/122977379

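Iceberg is one of the most popular data lake components, and this series of articles takes a closer look at it.
We start with Iceberg's underlying storage layout.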

1. Start a local Spark session

./bin/spark-sql \
  --packages org.apache.iceberg:iceberg-spark3-runtime:0.12.1 \
  --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
  --conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog \
  --conf spark.sql.catalog.spark_catalog.type=hive \
  --conf spark.sql.catalog.local=org.apache.iceberg.spark.SparkCatalog \
  --conf spark.sql.catalog.local.type=hadoop \
  --conf spark.sql.catalog.local.warehouse=$PWD/warehouse

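Create one table in each of the two formats, v1 and v2.
Create table local.db.table with format-version 1 (no format-version property is set, so the default applies):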

CREATE TABLE local.db.table (id bigint, data string) USING iceberg;

The table directory now has the following structure:

(base) ➜ table ll -R
total 0
drwxr-xr-x  6 liliwei  staff   192B Jan  2 21:22 metadata

./metadata:
total 16
-rw-r--r--@ 1 liliwei  staff   1.2K Jan  2 21:22 v1.metadata.json
-rw-r--r--@ 1 liliwei  staff     1B Jan  2 21:22 version-hint.text
(base) ➜ table

The contents of v1.metadata.json are as follows:

{
  "format-version" : 1,
  "table-uuid" : "0dc08d49-ed4d-49bb-8ddf-006e37c65372",
  "location" : "/Users/liliwei/plat/spark-3.1.2-bin-hadoop3.2/warehouse/db/table",
  "last-updated-ms" : 1641129739691,
  "last-column-id" : 2,
  "schema" : {
    "type" : "struct",
    "schema-id" : 0,
    "fields" : [ {
      "id" : 1,
      "name" : "id",
      "required" : false,
      "type" : "long"
    }, {
      "id" : 2,
      "name" : "data",
      "required" : false,
      "type" : "string"
    } ]
  },
  "current-schema-id" : 0,
  "schemas" : [ {
    "type" : "struct",
    "schema-id" : 0,
    "fields" : [ {
      "id" : 1,
      "name" : "id",
      "required" : false,
      "type" : "long"
    }, {
      "id" : 2,
      "name" : "data",
      "required" : false,
      "type" : "string"
    } ]
  } ],
  "partition-spec" : [ ],
  "default-spec-id" : 0,
  "partition-specs" : [ {
    "spec-id" : 0,
    "fields" : [ ]
  } ],
  "last-partition-id" : 999,
  "default-sort-order-id" : 0,
  "sort-orders" : [ {
    "order-id" : 0,
    "fields" : [ ]
  } ],
  "properties" : {
    "owner" : "liliwei"
  },
  "current-snapshot-id" : -1,
  "snapshots" : [ ],
  "snapshot-log" : [ ],
  "metadata-log" : [ ]
}
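To make the listing easier to read, here is a minimal Python sketch that pulls out the fields describing the table's current state. It works on a trimmed copy of the JSON above; the helper name summarize and the keys of its result are hypothetical, chosen only for illustration. It treats a current-snapshot-id of -1 as "no snapshot yet", which matches a freshly created table with an empty snapshots list.

```python
import json

# Trimmed copy of the v1.metadata.json shown above (only the fields used below).
METADATA_JSON = """
{
  "format-version": 1,
  "location": "/Users/liliwei/plat/spark-3.1.2-bin-hadoop3.2/warehouse/db/table",
  "last-column-id": 2,
  "current-schema-id": 0,
  "schemas": [{
    "type": "struct",
    "schema-id": 0,
    "fields": [
      {"id": 1, "name": "id", "required": false, "type": "long"},
      {"id": 2, "name": "data", "required": false, "type": "string"}
    ]
  }],
  "current-snapshot-id": -1,
  "snapshots": []
}
"""


def summarize(metadata: dict) -> dict:
    """Pick out the fields that describe the table's current state."""
    current_id = metadata["current-schema-id"]
    # The current schema is the entry in "schemas" whose id matches.
    schema = next(s for s in metadata["schemas"] if s["schema-id"] == current_id)
    return {
        "format_version": metadata["format-version"],
        "columns": [(f["id"], f["name"], f["type"]) for f in schema["fields"]],
        "has_snapshot": metadata["current-snapshot-id"] != -1,
        "snapshot_count": len(metadata["snapshots"]),
    }


summary = summarize(json.loads(METADATA_JSON))
print(summary)
```

The table has a schema but no data yet: two columns, no current snapshot, and an empty snapshot list.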

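Next, look at version-hint.text. The file is a single byte and holds the current metadata version number, here 1, which points readers to v1.metadata.json.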
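A reader of a Hadoop-catalog table can use this hint to locate the current metadata file. The sketch below is a minimal illustration, not Iceberg's actual implementation: the helper name current_metadata_file is hypothetical, and it skips the fallbacks a real reader needs (a missing or stale hint, compressed metadata files). It recreates the directory layout from the listing in a temporary directory so it can run anywhere.

```python
import tempfile
from pathlib import Path


def current_metadata_file(table_dir: Path) -> Path:
    """Return the metadata file that version-hint.text points to."""
    metadata_dir = table_dir / "metadata"
    # The hint file contains only the version number as text.
    version = int((metadata_dir / "version-hint.text").read_text().strip())
    return metadata_dir / f"v{version}.metadata.json"


# Recreate the layout from the listing above in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    table_dir = Path(tmp) / "db" / "table"
    (table_dir / "metadata").mkdir(parents=True)
    (table_dir / "metadata" / "version-hint.text").write_text("1")
    (table_dir / "metadata" / "v1.metadata.json").write_text("{}")

    resolved = current_metadata_file(table_dir)
    resolved_name = resolved.name
    resolved_exists = resolved.exists()

print(resolved_name, resolved_exists)
```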

Create table tableV2 with format-version 2:

CREATE TABLE local.db.tableV2 (id bigint, data string) 
USING iceberg
TBLPROPERTIES ('format-version'='2');

The directory structure of tableV2 is as follows:

(base) ➜ tableV2 cd metadata
(base) ➜ metadata ll
total 16
-rw-r--r--  1 liliwei  staff   936B Jan  2 21:38 v1.metadata.json
-rw-r--r--  1 liliwei  staff     1B Jan  2 21:38 version-hint.text
(base) ➜ metadata

The contents of its v1.metadata.json are as follows:

{
  "format-version" : 2,
  "table-uuid" : "67b54789-070c-4600-b2ff-3b9a0a774e4a",
  "location" : "/Users/liliwei/plat/spark-3.1.2-bin-hadoop3.2/warehouse/db/tableV2",
  "last-sequence-number" : 0,
  "last-updated-ms" : 1641130714999,
  "last-column-id" : 2,
  "current-schema-id" : 0,
  "schemas" : [ {
    "type" : "struct",
    "schema-id" : 0,
    "fields" : [ {
      "id" : 1,
      "name" : "id",
      "required" : false,
      "type" : "long"
    }, {
      "id" : 2,
      "name" : "data",
      "required" : false,
      "type" : "string"
    } ]
  } ],
  "default-spec-id" : 0,
  "partition-specs" : [ {
    "spec-id" : 0,
    "fields" : [ ]
  } ],
  "last-partition-id" : 999,
  "default-sort-order-id" : 0,
  "sort-orders" : [ {
    "order-id" : 0,
    "fields" : [ ]
  } ],
  "properties" : {
    "owner" : "liliwei"
  },
  "current-snapshot-id" : -1,
  "snapshots" : [ ],
  "snapshot-log" : [ ],
  "metadata-log" : [ ]
}
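The two metadata files differ in only a few top-level fields, which is easy to miss when reading the JSON side by side. The following Python sketch compares the top-level keys copied from the two listings above; the variable names are illustrative only.

```python
# Top-level keys of the format-version 1 metadata file shown above.
V1_KEYS = [
    "format-version", "table-uuid", "location", "last-updated-ms",
    "last-column-id", "schema", "current-schema-id", "schemas",
    "partition-spec", "default-spec-id", "partition-specs",
    "last-partition-id", "default-sort-order-id", "sort-orders",
    "properties", "current-snapshot-id", "snapshots", "snapshot-log",
    "metadata-log",
]

# Top-level keys of the format-version 2 metadata file shown above.
V2_KEYS = [
    "format-version", "table-uuid", "location", "last-sequence-number",
    "last-updated-ms", "last-column-id", "current-schema-id", "schemas",
    "default-spec-id", "partition-specs", "last-partition-id",
    "default-sort-order-id", "sort-orders", "properties",
    "current-snapshot-id", "snapshots", "snapshot-log", "metadata-log",
]

only_in_v1 = sorted(set(V1_KEYS) - set(V2_KEYS))
only_in_v2 = sorted(set(V2_KEYS) - set(V1_KEYS))

print("only in v1:", only_in_v1)
print("only in v2:", only_in_v2)
```

The v1 file still carries the singular schema and partition-spec fields alongside the schemas and partition-specs lists, while the v2 file drops them and adds last-sequence-number. The duplicated schema accounts for most of the size difference between the two files (1.2K versus 936B).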