2019-01-07, Monday, overcast
Today I set out to read a Parquet file containing records like this one:
id: 9
name: 9@asdf
age: 9
group1
  test1: test1
  test2: test2
and turn it into the following flat table:
id  name    age  group1.test1  group1.test2
9   9@asdf  9    test1         test2
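The target table is simply the record flattened: every leaf field becomes one column, and a field nested inside a group gets the group's name as a dot-separated prefix (`group1.test1`). Below is a minimal, stdlib-only sketch of that recursion. It assumes nested `LinkedHashMap`s as a stand-in for Parquet groups; the class and method names are hypothetical, and the value types (string `id`, integer `age`) are guesses. With parquet-mr's example API, the same walk would presumably loop over `group.getType().getFieldCount()`, check `group.getType().getType(i).isPrimitive()`, and either recurse into `group.getGroup(i, 0)` or read `group.getValueToString(i, 0)`; that mapping is untested here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Stdlib-only model of flattening one nested record into a flat row.
 * Nested maps stand in for Parquet groups (assumption: no parquet-mr used).
 */
public class FlattenNestedRecord {

    /** Builds the sample record from this post; the value types are assumed. */
    static Map<String, Object> sampleRecord() {
        Map<String, Object> group1 = new LinkedHashMap<>();
        group1.put("test1", "test1");
        group1.put("test2", "test2");

        Map<String, Object> record = new LinkedHashMap<>();
        record.put("id", "9");
        record.put("name", "9@asdf");
        record.put("age", 9);
        record.put("group1", group1);
        return record;
    }

    /** Flattens a nested record; nested field names are joined with '.'. */
    static Map<String, String> flatten(Map<String, Object> record) {
        Map<String, String> row = new LinkedHashMap<>(); // keeps field order
        flattenInto("", record, row);
        return row;
    }

    @SuppressWarnings("unchecked")
    private static void flattenInto(String prefix, Map<String, Object> group,
                                    Map<String, String> row) {
        for (Map.Entry<String, Object> field : group.entrySet()) {
            String column = prefix.isEmpty()
                    ? field.getKey()
                    : prefix + "." + field.getKey();
            Object value = field.getValue();
            if (value instanceof Map) {
                // A nested group: recurse, extending the column-name prefix.
                flattenInto(column, (Map<String, Object>) value, row);
            } else {
                // A primitive leaf: emit it under its dotted column name.
                row.put(column, String.valueOf(value));
            }
        }
    }

    public static void main(String[] args) {
        Map<String, String> row = flatten(sampleRecord());
        System.out.println(String.join(" ", row.keySet()));
        System.out.println(String.join(" ", row.values()));
    }
}
```

Running `main` prints `id name age group1.test1 group1.test2` followed by `9 9@asdf 9 test1 test2`, i.e. the header and data row of the table above.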
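So far I have only managed to read the root-level fields. I haven't yet figured out how to get at the fields inside the group; every attempt to read them throws an error.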
Here is my code so far:
package cn.edu.nju.zyf.parquetDemo01;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.log4j.Logger;
import org.apache.parquet.column.ColumnDescriptor;
import org.apache.parquet.example.data.Group;
import org.apache.parquet.example.data.simple.SimpleGroupFactory;
import org.apache.parquet.hadoop.ParquetFileReader;
import org.apache.parquet.hadoop.ParquetFileWriter;
import org.apache.parquet.hadoop.ParquetReader;
import org.apache.parquet.hadoop.ParquetWriter;
import org.apache.parquet.hadoop.example.GroupReadSupport;
import org.apache.parquet.hadoop.example.GroupWriteSupport;
import org.apache.parquet.hadoop.metadata.CompressionCodecName;
import org.apache.parquet.hadoop.metadata.ParquetMetadata;
import org.apache.parquet.schema.*;
import java.io.IOException;
/**
 * @author zhuyuanfu
 * @version 2018-01-03
 * @description just a simple demo for writing and reading parquet files.
 */
public class ParquetWriteAndPrettyPrintingReadDemo {
    private static MessageType getMessageTypeFromCode() {
        MessageType messageType = Types.buildMessage()
                .required(PrimitiveType.PrimitiveTypeName.BINARY).as(OriginalType.UTF8).named("id")
                .required(PrimitiveType.PrimitiveTypeName.