Using parquet-cli on macOS
View Parquet files
On Windows, ParquetViewer can be used instead.
Installation
brew install parquet-cli
Homebrew - parquet-cli
https://formulae.brew.sh/formula/parquet-cli
Usage
View metadata
parquet meta path.parquet
$ parquet meta /Users/.../3.parquet
File path: /Users/.../3.parquet
Created by: parquet-cpp-arrow version 16.1.0
Properties:
pandas: {"index_columns": [{"kind": "range", "name": null, "start": 0, "stop": 1, "step": 1}], "column_indexes": [{"name": null, "field_name": null, "pandas_type": "unicode", "numpy_type": "object", "metadata": {"encoding": "UTF-8"}}], "columns": [{"name": "Alice", "field_name": "Alice", "pandas_type": "unicode", "numpy_type": "object", "metadata": null}, {"name": "Beth", "field_name": "Beth", "pandas_type": "unicode", "numpy_type": "object", "metadata": null}, {"name": "Cecil", "field_name": "Cecil", "pandas_type": "unicode", "numpy_type": "object", "metadata": null}], "creator": {"library": "pyarrow", "version": "16.1.0"}, "pandas_version": "2.2.3"}
ARROW:schema: /4gDAAAQAAAAAAAKAA4ABgAFAAgACgAAAAABBAAQAAAAAAAKAAwAAAAEAAgACgAAAMACAAAEAAAAAQAAAAwAAAAIAAwABAAIAAgAAACYAgAABAAAAIsCAAB7ImluZGV4X2NvbHVtbnMiOiBbeyJraW5kIjogInJhbmdlIiwgIm5hbWUiOiBudWxsLCAic3RhcnQiOiAwLCAic3RvcCI6IDEsICJzdGVwIjogMX1dLCAiY29sdW1uX2luZGV4ZXMiOiBbeyJuYW1lIjogbnVsbCwgImZpZWxkX25hbWUiOiBudWxsLCAicGFuZGFzX3R5cGUiOiAidW5pY29kZSIsICJudW1weV90eXBlIjogIm9iamVjdCIsICJtZXRhZGF0YSI6IHsiZW5jb2RpbmciOiAiVVRGLTgifX1dLCAiY29sdW1ucyI6IFt7Im5hbWUiOiAiQWxpY2UiLCAiZmllbGRfbmFtZSI6ICJBbGljZSIsICJwYW5kYXNfdHlwZSI6ICJ1bmljb2RlIiwgIm51bXB5X3R5cGUiOiAib2JqZWN0IiwgIm1ldGFkYXRhIjogbnVsbH0sIHsibmFtZSI6ICJCZXRoIiwgImZpZWxkX25hbWUiOiAiQmV0aCIsICJwYW5kYXNfdHlwZSI6ICJ1bmljb2RlIiwgIm51bXB5X3R5cGUiOiAib2JqZWN0IiwgIm1ldGFkYXRhIjogbnVsbH0sIHsibmFtZSI6ICJDZWNpbCIsICJmaWVsZF9uYW1lIjogIkNlY2lsIiwgInBhbmRhc190eXBlIjogInVuaWNvZGUiLCAibnVtcHlfdHlwZSI6ICJvYmplY3QiLCAibWV0YWRhdGEiOiBudWxsfV0sICJjcmVhdG9yIjogeyJsaWJyYXJ5IjogInB5YXJyb3ciLCAidmVyc2lvbiI6ICIxNi4xLjAifSwgInBhbmRhc192ZXJzaW9uIjogIjIuMi4zIn0ABgAAAHBhbmRhcwAAAwAAAGwAAAAwAAAABAAAALD///8AAAEFEAAAABgAAAAEAAAAAAAAAAUAAABDZWNpbAAAAKDYAAABBRAAAAAYAAAABAAAAAAAAAAEAAAAQmV0aAAAAADIEAAUAAgABgAHAAwAAAAQABAAAAAAAAEFEAAAABwAAAAEAAAAAAAAAAUAAABBbGljZQAAAAQABAAEAAAA
Schema:
message schema {
optional binary Alice (STRING);
optional binary Beth (STRING);
optional binary Cecil (STRING);
}
Row group 0: count: 1 204.00 B records start: 4 total(compressed): 204 B total(uncompressed):192 B
--------------------------------------------------------------------------------
       type    encodings  count  avg size  nulls  min / max
Alice  BINARY  S _ R      1      68.00 B   0      "2341" / "2341"
Beth   BINARY  S _ R      1      68.00 B   0      "9102" / "9102"
Cecil  BINARY  S _ R      1      68.00 B   0      "3258" / "3258"
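The `pandas` property printed by `parquet meta` is plain JSON, so it can be inspected with a script rather than read by eye. The following is a minimal Python sketch, assuming the value has been copied out of the output by hand (shortened here to the `columns`, `creator`, and `pandas_version` keys). The helper name `summarize_pandas_metadata` is hypothetical and not part of parquet-cli.

```python
import json

# Shortened copy of the `pandas` property from the `parquet meta` output above.
# Assumption: the value was pasted here manually; nothing below calls parquet-cli.
PANDAS_PROPERTY = (
    '{"columns": ['
    '{"name": "Alice", "field_name": "Alice", "pandas_type": "unicode", '
    '"numpy_type": "object", "metadata": null}, '
    '{"name": "Beth", "field_name": "Beth", "pandas_type": "unicode", '
    '"numpy_type": "object", "metadata": null}, '
    '{"name": "Cecil", "field_name": "Cecil", "pandas_type": "unicode", '
    '"numpy_type": "object", "metadata": null}], '
    '"creator": {"library": "pyarrow", "version": "16.1.0"}, '
    '"pandas_version": "2.2.3"}'
)


def summarize_pandas_metadata(raw: str) -> dict:
    """Hypothetical helper: pull column names/types and writer info out of the JSON."""
    meta = json.loads(raw)
    return {
        "columns": [(c["name"], c["pandas_type"]) for c in meta.get("columns", [])],
        "creator": meta.get("creator", {}),
        "pandas_version": meta.get("pandas_version"),
    }


summary = summarize_pandas_metadata(PANDAS_PROPERTY)
print(summary["columns"])
print(summary["creator"], summary["pandas_version"])
```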
head: view the first records
parquet head path.parquet
$ parquet head /Users/.../3.parquet
{"Alice": "2341", "Beth": "9102", "Cecil": "3258"}
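In the sample above, `parquet head` prints one JSON object per record, which makes the output easy to post-process. The following is a minimal Python sketch, assuming every record is printed on its own line as valid JSON, as in this example; files with other column types may print differently. The function name `parse_records` is hypothetical.

```python
import json


def parse_records(output: str) -> list:
    """Hypothetical helper: parse one JSON object per non-empty line."""
    return [json.loads(line) for line in output.splitlines() if line.strip()]


# Sample text copied from the `parquet head` output above.
sample_output = '{"Alice": "2341", "Beth": "9102", "Cecil": "3258"}\n'

records = parse_records(sample_output)
for record in records:
    print(record["Alice"], record["Beth"], record["Cecil"])
```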
View the schema
parquet schema path.parquet
$ parquet schema /Users/.../3.parquet
{
"type" : "record",
"name" : "schema",
"fields" : [ {
"name" : "Alice",
"type" : [ "null", "string" ],
"default" : null
}, {
"name" : "Beth",
"type" : [ "null", "string" ],
"default" : null
}, {
"name" : "Cecil",
"type" : [ "null", "string" ],
"default" : null
} ]
}
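The schema printed above is an Avro record schema in JSON, where a type such as `[ "null", "string" ]` is a union that marks the field as nullable. The following is a minimal Python sketch that lists each field with its non-null types and nullability, assuming the schema text was copied from the output above. The function name `list_fields` is hypothetical.

```python
import json

# Schema text copied from the `parquet schema` output above.
AVRO_SCHEMA = """
{
  "type" : "record",
  "name" : "schema",
  "fields" : [ {
    "name" : "Alice",
    "type" : [ "null", "string" ],
    "default" : null
  }, {
    "name" : "Beth",
    "type" : [ "null", "string" ],
    "default" : null
  }, {
    "name" : "Cecil",
    "type" : [ "null", "string" ],
    "default" : null
  } ]
}
"""


def list_fields(schema_json: str) -> list:
    """Hypothetical helper: return (name, non_null_types, nullable) per field."""
    schema = json.loads(schema_json)
    fields = []
    for field in schema["fields"]:
        field_type = field["type"]
        if isinstance(field_type, list):
            # A union type: the field is nullable when "null" is one of the branches.
            nullable = "null" in field_type
            non_null = [t for t in field_type if t != "null"]
        else:
            nullable = False
            non_null = [field_type]
        fields.append((field["name"], non_null, nullable))
    return fields


fields = list_fields(AVRO_SCHEMA)
for name, types, nullable in fields:
    print(name, types, "nullable" if nullable else "required")
```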
parquet --help
% parquet -h
Usage: parquet [options] [command] [command options]
Options:
-v, --verbose, --debug
Print extra debugging information
Commands:
help
Retrieves details on the functions of other commands
meta
Print a Parquet file's metadata
pages
Print page summaries for a Parquet file
dictionary
Print dictionaries for a Parquet column
check-stats
Check Parquet files for corrupt page and column stats (PARQUET-251)
schema
Print the Avro schema for a file
csv-schema
Build a schema from a CSV data sample
convert-csv
Create a file from CSV data
convert
Create a Parquet file from a data file
to-avro
Create an Avro file from a data file
cat
Print the first N records from a file
head
Print the first N records from a file
column-index
Prints the column and offset indexes of a Parquet file
column-size
Print the column sizes of a parquet file
prune
(Deprecated: will be removed in 2.0.0, use rewrite command instead) Prune column(s) in a Parquet file and save it to a new file. The columns left are not changed.
trans-compression
(Deprecated: will be removed in 2.0.0, use rewrite command instead) Translate the compression from one to another (It doesn't support bloom filter feature yet).
masking
(Deprecated: will be removed in 2.0.0, use rewrite command instead) Replace columns with masked values and write to a new Parquet file
footer
Print the Parquet file footer in json format
bloom-filter
Check bloom filters for a Parquet column
scan
Scan all records from a file
rewrite
Rewrite one or more Parquet files to a new Parquet file
Examples:
# print information for meta
parquet help meta
See 'parquet help <command>' for more information on a specific command.
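The subcommands listed above can also be driven from a script. The following is a minimal Python sketch, assuming the `parquet` executable is on `PATH` and using only the `parquet <command> <path>` form shown on this page. The helper names `build_command` and `run_parquet` are hypothetical, and no extra command options are assumed.

```python
import shutil
import subprocess


def build_command(command: str, path: str) -> list:
    """Hypothetical helper: build the argument list for `parquet <command> <path>`."""
    return ["parquet", command, path]


def run_parquet(command: str, path: str) -> str:
    """Hypothetical helper: run parquet-cli and return its standard output."""
    if shutil.which("parquet") is None:
        raise FileNotFoundError(
            "parquet executable not found on PATH; install it with `brew install parquet-cli`"
        )
    result = subprocess.run(
        build_command(command, path),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if parquet-cli exits with an error
    )
    return result.stdout


# Only builds the argument list; call run_parquet("meta", "path.parquet") to execute it.
print(build_command("meta", "path.parquet"))
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.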
Related resources:
Apache Parquet : https://parquet.apache.org/
fastparquet : https://github.com/dask/fastparquet
2025-02-14 (Fri)