macOS - Viewing Parquet files (parquet-cli)


Use parquet-cli on macOS to inspect Parquet files.

On Windows, ParquetViewer can be used instead.


Installation

brew install parquet-cli 

Homebrew - parquet-cli
https://formulae.brew.sh/formula/parquet-cli


Usage

View metadata

parquet meta path.parquet

$ parquet meta  /Users/.../3.parquet

File path:  /Users/.../3.parquet
Created by: parquet-cpp-arrow version 16.1.0
Properties:
        pandas: {"index_columns": [{"kind": "range", "name": null, "start": 0, "stop": 1, "step": 1}], "column_indexes": [{"name": null, "field_name": null, "pandas_type": "unicode", "numpy_type": "object", "metadata": {"encoding": "UTF-8"}}], "columns": [{"name": "Alice", "field_name": "Alice", "pandas_type": "unicode", "numpy_type": "object", "metadata": null}, {"name": "Beth", "field_name": "Beth", "pandas_type": "unicode", "numpy_type": "object", "metadata": null}, {"name": "Cecil", "field_name": "Cecil", "pandas_type": "unicode", "numpy_type": "object", "metadata": null}], "creator": {"library": "pyarrow", "version": "16.1.0"}, "pandas_version": "2.2.3"}
  ARROW:schema: /4gDAAAQAAAAAAAKAA4ABgAFAAgACgAAAAABBAAQAAAAAAAKAAwAAAAEAAgACgAAAMACAAAEAAAAAQAAAAwAAAAIAAwABAAIAAgAAACYAgAABAAAAIsCAAB7ImluZGV4X2NvbHVtbnMiOiBbeyJraW5kIjogInJhbmdlIiwgIm5hbWUiOiBudWxsLCAic3RhcnQiOiAwLCAic3RvcCI6IDEsICJzdGVwIjogMX1dLCAiY29sdW1uX2luZGV4ZXMiOiBbeyJuYW1lIjogbnVsbCwgImZpZWxkX25hbWUiOiBudWxsLCAicGFuZGFzX3R5cGUiOiAidW5pY29kZSIsICJudW1weV90eXBlIjogIm9iamVjdCIsICJtZXRhZGF0YSI6IHsiZW5jb2RpbmciOiAiVVRGLTgifX1dLCAiY29sdW1ucyI6IFt7Im5hbWUiOiAiQWxpY2UiLCAiZmllbGRfbmFtZSI6ICJBbGljZSIsICJwYW5kYXNfdHlwZSI6ICJ1bmljb2RlIiwgIm51bXB5X3R5cGUiOiAib2JqZWN0IiwgIm1ldGFkYXRhIjogbnVsbH0sIHsibmFtZSI6ICJCZXRoIiwgImZpZWxkX25hbWUiOiAiQmV0aCIsICJwYW5kYXNfdHlwZSI6ICJ1bmljb2RlIiwgIm51bXB5X3R5cGUiOiAib2JqZWN0IiwgIm1ldGFkYXRhIjogbnVsbH0sIHsibmFtZSI6ICJDZWNpbCIsICJmaWVsZF9uYW1lIjogIkNlY2lsIiwgInBhbmRhc190eXBlIjogInVuaWNvZGUiLCAibnVtcHlfdHlwZSI6ICJvYmplY3QiLCAibWV0YWRhdGEiOiBudWxsfV0sICJjcmVhdG9yIjogeyJsaWJyYXJ5IjogInB5YXJyb3ciLCAidmVyc2lvbiI6ICIxNi4xLjAifSwgInBhbmRhc192ZXJzaW9uIjogIjIuMi4zIn0ABgAAAHBhbmRhcwAAAwAAAGwAAAAwAAAABAAAALD///8AAAEFEAAAABgAAAAEAAAAAAAAAAUAAABDZWNpbAAAAKDYAAABBRAAAAAYAAAABAAAAAAAAAAEAAAAQmV0aAAAAADIEAAUAAgABgAHAAwAAAAQABAAAAAAAAEFEAAAABwAAAAEAAAAAAAAAAUAAABBbGljZQAAAAQABAAEAAAA
Schema:
message schema {
  optional binary Alice (STRING);
  optional binary Beth (STRING);
  optional binary Cecil (STRING);
}


Row group 0:  count: 1  204.00 B records  start: 4  total(compressed): 204 B total(uncompressed):192 B 
--------------------------------------------------------------------------------
       type      encodings count     avg size   nulls   min / max
Alice  BINARY    S _ R     1         68.00 B    0       "2341" / "2341"
Beth   BINARY    S _ R     1         68.00 B    0       "9102" / "9102"
Cecil  BINARY    S _ R     1         68.00 B    0       "3258" / "3258"
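
The `pandas` property in the output above is plain JSON, so it can be read without any Parquet library. The following is a minimal sketch using only the Python standard library; the string is copied from the output above with the `column_indexes` entry left out for brevity, and `summarize_pandas_metadata` is a hypothetical helper name, not part of parquet-cli.

```python
import json

# "pandas" property copied from the `parquet meta` output above
# (abridged: the "column_indexes" entry is omitted).
PANDAS_PROPERTY = (
    '{"index_columns": [{"kind": "range", "name": null, "start": 0, "stop": 1, "step": 1}], '
    '"columns": ['
    '{"name": "Alice", "field_name": "Alice", "pandas_type": "unicode", "numpy_type": "object", "metadata": null}, '
    '{"name": "Beth", "field_name": "Beth", "pandas_type": "unicode", "numpy_type": "object", "metadata": null}, '
    '{"name": "Cecil", "field_name": "Cecil", "pandas_type": "unicode", "numpy_type": "object", "metadata": null}], '
    '"creator": {"library": "pyarrow", "version": "16.1.0"}, '
    '"pandas_version": "2.2.3"}'
)


def summarize_pandas_metadata(raw):
    """Hypothetical helper: pull the most useful fields out of the JSON."""
    meta = json.loads(raw)
    return {
        # (column name, pandas type) pairs in file order
        "columns": [(c["name"], c["pandas_type"]) for c in meta["columns"]],
        # library and version that wrote the file
        "creator": "{library} {version}".format(**meta["creator"]),
        "pandas_version": meta["pandas_version"],
    }


summary = summarize_pandas_metadata(PANDAS_PROPERTY)
for name, pandas_type in summary["columns"]:
    print(name, pandas_type)
print("written by", summary["creator"], "with pandas", summary["pandas_version"])
```

The creator reported here agrees with the `Created by: parquet-cpp-arrow version 16.1.0` line of the same output.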

head: view the first records

parquet head path.parquet

$ parquet head  /Users/.../3.parquet
{"Alice": "2341", "Beth": "9102", "Cecil": "3258"}
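
In the sample above, `parquet head` prints each record as a JSON object on its own line. Assuming that layout holds for the file at hand (it is an assumption based on this one sample, not a documented guarantee), the output can be loaded into Python dictionaries as sketched below; `parse_head_output` is a hypothetical helper name.

```python
import json
from typing import Any, Dict, List

# Output of `parquet head`, copied from the example above.
HEAD_OUTPUT = """{"Alice": "2341", "Beth": "9102", "Cecil": "3258"}
"""


def parse_head_output(text: str) -> List[Dict[str, Any]]:
    """Hypothetical helper: parse one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


records = parse_head_output(HEAD_OUTPUT)
print(len(records))         # number of records parsed
print(records[0]["Alice"])  # value of the "Alice" column in the first record
```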

View the schema

parquet schema path.parquet

$ parquet schema /Users/.../3.parquet
{
  "type" : "record",
  "name" : "schema",
  "fields" : [ {
    "name" : "Alice",
    "type" : [ "null", "string" ],
    "default" : null
  }, {
    "name" : "Beth",
    "type" : [ "null", "string" ],
    "default" : null
  }, {
    "name" : "Cecil",
    "type" : [ "null", "string" ],
    "default" : null
  } ]
}
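
Comparing this with the `parquet meta` output, each `optional binary ... (STRING)` column shows up here as an Avro union of `"null"` and `"string"` with a `null` default. The sketch below, using only the Python standard library, lists each field with its non-null types and whether it is nullable; the schema text is copied from the output above, and `describe_fields` is a hypothetical helper name.

```python
import json

# Avro schema copied from the `parquet schema` output above.
AVRO_SCHEMA = """
{
  "type" : "record",
  "name" : "schema",
  "fields" : [ {
    "name" : "Alice",
    "type" : [ "null", "string" ],
    "default" : null
  }, {
    "name" : "Beth",
    "type" : [ "null", "string" ],
    "default" : null
  }, {
    "name" : "Cecil",
    "type" : [ "null", "string" ],
    "default" : null
  } ]
}
"""


def describe_fields(schema_json):
    """Hypothetical helper: return (name, non-null types, nullable) per field."""
    schema = json.loads(schema_json)
    described = []
    for field in schema["fields"]:
        field_type = field["type"]
        if isinstance(field_type, list):
            # A union: the field is nullable if "null" is one of the branches.
            nullable = "null" in field_type
            branches = [t for t in field_type if t != "null"]
        else:
            nullable = False
            branches = [field_type]
        described.append((field["name"], branches, nullable))
    return described


fields = describe_fields(AVRO_SCHEMA)
for name, branches, nullable in fields:
    print(name, branches, "nullable" if nullable else "required")
```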

parquet --help

% parquet -h

Usage: parquet [options] [command] [command options]

  Options:

    -v, --verbose, --debug
	Print extra debugging information

  Commands:

    help
	Retrieves details on the functions of other commands
    meta
	Print a Parquet file's metadata
    pages
	Print page summaries for a Parquet file
    dictionary
	Print dictionaries for a Parquet column
    check-stats
	Check Parquet files for corrupt page and column stats (PARQUET-251)
    schema
	Print the Avro schema for a file
    csv-schema
	Build a schema from a CSV data sample
    convert-csv
	Create a file from CSV data
    convert
	Create a Parquet file from a data file
    to-avro
	Create an Avro file from a data file
    cat
	Print the first N records from a file
    head
	Print the first N records from a file
    column-index
	Prints the column and offset indexes of a Parquet file
    column-size
	Print the column sizes of a parquet file
    prune
	(Deprecated: will be removed in 2.0.0, use rewrite command instead) Prune column(s) in a Parquet file and save it to a new file. The columns left are not changed.
    trans-compression
	(Deprecated: will be removed in 2.0.0, use rewrite command instead) Translate the compression from one to another (It doesn't support bloom filter feature yet).
    masking
	(Deprecated: will be removed in 2.0.0, use rewrite command instead) Replace columns with masked values and write to a new Parquet file
    footer
	Print the Parquet file footer in json format
    bloom-filter
	Check bloom filters for a Parquet column
    scan
	Scan all records from a file
    rewrite
	Rewrite one or more Parquet files to a new Parquet file

  Examples:

    # print information for meta
    parquet help meta

  See 'parquet help <command>' for more information on a specific command.
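
To call these commands from a script, one option is a thin wrapper around the `parquet` executable. This is only a sketch: it assumes `parquet` is on the PATH after the Homebrew install, and `build_command` and `run_parquet` are hypothetical helper names. The demo at the end runs `parquet help meta`, the example shown in the help text above.

```python
import shutil
import subprocess


def build_command(*args):
    """Hypothetical helper: argument list for one parquet-cli invocation."""
    return ["parquet", *args]


def run_parquet(*args):
    """Hypothetical helper: run parquet-cli and return its standard output."""
    if shutil.which("parquet") is None:
        raise FileNotFoundError("parquet-cli not found on PATH")
    completed = subprocess.run(
        build_command(*args),
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode bytes to str
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return completed.stdout


# Only invoke the tool when it is actually installed.
if shutil.which("parquet") is not None:
    print(run_parquet("help", "meta"))
else:
    print("parquet-cli not found on PATH; skipping demo")
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with paths that contain spaces.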

Related resources:
Apache Parquet : https://parquet.apache.org/
fastparquet : https://github.com/dask/fastparquet


2025-02-14 (Fri)
