The constructors of ParquetWriter are deprecated (as of 1.8.1), but ParquetWriter itself is not; you can still create a ParquetWriter by extending its abstract Builder subclass.
Here is an example from the Parquet creators themselves, ExampleParquetWriter:
public static class Builder extends ParquetWriter.Builder<Group, Builder> {
    private MessageType type = null;
    private Map<String, String> extraMetaData = new HashMap<String, String>();

    private Builder(Path file) {
        super(file);
    }

    public Builder withType(MessageType type) {
        this.type = type;
        return this;
    }

    public Builder withExtraMetaData(Map<String, String> extraMetaData) {
        this.extraMetaData = extraMetaData;
        return this;
    }

    @Override
    protected Builder self() {
        return this;
    }

    @Override
    protected WriteSupport<Group> getWriteSupport(Configuration conf) {
        return new GroupWriteSupport(type, extraMetaData);
    }
}
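To show the builder in action, here is a minimal sketch of writing Group records with a writer built this way. It assumes the static factory ExampleParquetWriter.builder(Path) exposed by parquet-hadoop's example package; the schema string, output path, and field names are placeholders for illustration:

```java
import org.apache.hadoop.fs.Path;
import org.apache.parquet.example.data.Group;
import org.apache.parquet.example.data.simple.SimpleGroupFactory;
import org.apache.parquet.hadoop.ParquetWriter;
import org.apache.parquet.hadoop.example.ExampleParquetWriter;
import org.apache.parquet.schema.MessageType;
import org.apache.parquet.schema.MessageTypeParser;

public class GroupWriteDemo {
    public static void main(String[] args) throws Exception {
        // Placeholder schema: one int32 column and one UTF-8 string column
        MessageType schema = MessageTypeParser.parseMessageType(
            "message example { required int32 id; required binary name (UTF8); }");

        // Build the writer via the Builder shown above, then write one row
        try (ParquetWriter<Group> writer = ExampleParquetWriter
                .builder(new Path("/tmp/example.parquet")) // placeholder path
                .withType(schema)
                .build()) {
            SimpleGroupFactory factory = new SimpleGroupFactory(schema);
            Group row = factory.newGroup()
                .append("id", 1)
                .append("name", "alice");
            writer.write(row);
        }
    }
}
```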
If you don't want to use Group and GroupWriteSupport (bundled in Parquet, but only as an example of a data-model implementation), you can use the Avro, Protocol Buffers, or Thrift in-memory data models. Here is an example of writing Parquet using Avro:
try (ParquetWriter<GenericData.Record> writer = AvroParquetWriter
        .<GenericData.Record>builder(fileToWrite)
        .withSchema(schema)
        .withConf(new Configuration())
        .withCompressionCodec(CompressionCodecName.SNAPPY)
        .build()) {
    for (GenericData.Record record : recordsToWrite) {
        writer.write(record);
    }
}
You will need these dependencies:
<dependency>
    <groupId>org.apache.parquet</groupId>
    <artifactId>parquet-avro</artifactId>
    <version>1.8.1</version>
</dependency>
<dependency>
    <groupId>org.apache.parquet</groupId>
    <artifactId>parquet-hadoop</artifactId>
    <version>1.8.1</version>
</dependency>
Full example here.