Trafodion provides a parquet_tools executable for checking whether a Parquet file is intact.
parquet_tools is stored in the $TRAF_HOME/sql/scripts directory:
cd $TRAF_HOME/sql/scripts/
ll parquet_tools
The parquet_tools executable depends on parquet-tools-${PARQUET_VERSION}.jar, as you can see from the contents of the parquet_tools script:
${lv_cmd} jar ${TRAF_HOME}/export/lib/parquet-tools-${PARQUET_VERSION}.jar $*
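Because the wrapper only works when that jar is present, it can help to confirm the path before running it. The following is a minimal sketch, not part of Trafodion: the function name parquet_tools_jar is hypothetical, and it assumes that TRAF_HOME and PARQUET_VERSION are set in your shell the same way they are inside the wrapper.

```shell
# Hypothetical helper (not shipped with Trafodion): print the jar path that
# the parquet_tools wrapper refers to, or fail if the jar is not there.
# Assumption: TRAF_HOME and PARQUET_VERSION are set in the current shell.
parquet_tools_jar() {
    if [ -z "${TRAF_HOME:-}" ] || [ -z "${PARQUET_VERSION:-}" ]; then
        echo "TRAF_HOME and PARQUET_VERSION must be set" >&2
        return 2
    fi
    jar="$TRAF_HOME/export/lib/parquet-tools-$PARQUET_VERSION.jar"
    if [ -f "$jar" ]; then
        echo "$jar"
    else
        echo "missing: $jar" >&2
        return 1
    fi
}
```

Calling parquet_tools_jar prints the jar path on success; a non-zero return status means the jar was not found or a variable was unset.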
Usage of parquet_tools:
parquet_tools -h
usage: parquet-tools cat [option...] <input>
where option is one of:
--debug Enable debug output
-h,--help Show this help string
-j,--json Show records in JSON format.
--no-color Disable color output even if supported
where <input> is the parquet file to print to stdout
usage: parquet-tools head [option...] <input>
where option is one of:
--debug Enable debug output
-h,--help Show this help string
-n,--records <arg> The number of records to show (default: 5)
--no-color Disable color output even if supported
where <input> is the parquet file to print to stdout
usage: parquet-tools schema [option...] <input>
where option is one of:
-d,--detailed Show detailed information about the schema.
--debug Enable debug output
-h,--help Show this help string
--no-color Disable color output even if supported
where <input> is the parquet file containing the schema to show
usage: parquet-tools meta [option...] <input>
where option is one of:
--debug Enable debug output
-h,--help Show this help string
--no-color Disable color output even if supported
where <input> is the parquet file to print to stdout
usage: parquet-tools dump [option...] <input>
where option is one of:
-c,--column <arg> Dump only the given column, can be specified more than
once
-d,--disable-data Do not dump column data
--debug Enable debug output
-h,--help Show this help string
-m,--disable-meta Do not dump row group and page metadata
-n,--disable-crop Do not crop the output based on console width
--no-color Disable color output even if supported
where <input> is the parquet file to print to stdout
usage: parquet-tools merge [option...] <input> [<input> ...] <output>
where option is one of:
--debug Enable debug output
-h,--help Show this help string
--no-color Disable color output even if supported
where <input> is the source parquet files/directory to be merged
<output> is the destination parquet file
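The help text above notes that -c can be given more than once for dump. As a rough sketch of how that might be wrapped, the hypothetical function dump_columns below turns a list of column names into repeated -c options. The PARQUET_TOOLS variable is also an assumption of this sketch; it only lets you point at a different executable and defaults to parquet_tools.

```shell
# Hypothetical helper: dump metadata (no column data, -d) for several columns.
# Usage: dump_columns <file> [column...]
dump_columns() {
    file=$1
    shift
    # Rewrite the positional parameters from "A B" into "-c A -c B":
    # each pass appends "-c <col>" and drops the oldest entry.
    for col in "$@"; do
        set -- "$@" -c "$col"
        shift
    done
    "${PARQUET_TOOLS:-parquet_tools}" dump -d "$@" "$file"
}
```

With no column names the function simply runs dump -d on the whole file.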
Here are some basic examples:
# Dump the row group and page metadata of the DEVICE_NUMBER column (-d suppresses the column data)
parquet_tools dump -c DEVICE_NUMBER -d /opt/trafodion/bss_userinfo_20180812_0
# Dump the row group and page metadata of the whole file (-d suppresses the column data)
parquet_tools dump -d /opt/trafodion/bss_userinfo_20180812_0
# Show the first 10 records of the Parquet file
parquet_tools head -n 10 /opt/trafodion/bss_userinfo_20180812_0
# Show the meta information of the Parquet file
parquet_tools meta /opt/trafodion/bss_userinfo_20180812_0
# Show the schema of the Parquet file
parquet_tools schema /opt/trafodion/bss_userinfo_20180812_0
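To check several files at once, the meta subcommand can be run in a loop. The sketch below is only an illustration under one assumption that should be verified on your installation: that parquet_tools returns a non-zero exit status when it cannot read a file. The function name check_parquet_files and the PARQUET_TOOLS override variable are hypothetical.

```shell
# Hypothetical helper: try "parquet_tools meta" on every file given and
# report which ones could be read.
# Assumption: the tool exits non-zero when the file cannot be parsed.
check_parquet_files() {
    failed=0
    for f in "$@"; do
        if "${PARQUET_TOOLS:-parquet_tools}" meta "$f" >/dev/null 2>&1; then
            echo "OK $f"
        else
            echo "FAIL $f"
            failed=1
        fi
    done
    # Return 1 if at least one file failed, 0 otherwise.
    return "$failed"
}
```

For example, check_parquet_files /opt/trafodion/bss_userinfo_20180812_0 prints one OK or FAIL line per file and returns non-zero if any file failed.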