java - Why is Apache Orc RecordReader.searchArgument() not filtering correctly?

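Here is a simple program that:

> writes records to an Orc file

> then tries to read the file back using predicate pushdown (searchArgument)

Questions:

> Is this the right way to use predicate pushdown in Orc?

> The read(..) method seems to return all the records, completely ignoring the searchArgument. Why is that?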

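Notes:

I have not been able to find any useful unit test that demonstrates how predicate pushdown works in Orc (Orc on GitHub), nor any clear documentation of this feature. I also looked through the Spark and Presto code, but could not find anything helpful.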

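import java.io.File;
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.exec.vector.LongColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
import org.apache.hadoop.hive.ql.io.sarg.PredicateLeaf.Type;
import org.apache.hadoop.hive.ql.io.sarg.SearchArgumentFactory;
import org.apache.orc.OrcFile;
import org.apache.orc.Reader;
import org.apache.orc.Reader.Options;
import org.apache.orc.RecordReader;
import org.apache.orc.TypeDescription;
import org.apache.orc.Writer;

public class TestRoundTrip {

    public static void main(String[] args) throws IOException {
        final String file = "tmp/test-round-trip.orc";
        // Remove any file left over from a previous run.
        new File(file).delete();

        final long highestX = 10000L;
        final Configuration conf = new Configuration();

        write(file, highestX, conf);
        read(file, highestX, conf);
    }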

    private static void read(String file, long highestX, Configuration conf) throws IOException {
        Reader reader = OrcFile.createReader(
            new Path(file),
            OrcFile.readerOptions(conf)
        );

        // Ask only for the row where x == highestX - 1000,
        // so the expectation is that exactly 1 row is returned.
        Options readerOptions = new Options(conf)
            .searchArgument(
                SearchArgumentFactory
                    .newBuilder()
                    .equals("x", Type.LONG, highestX - 1000)
                    .build(),
                new String[]{"x"}
            );

        RecordReader rows = reader.rows(readerOptions);
        VectorizedRowBatch batch = reader.getSchema().createRowBatch();

        // Print every row the reader hands back.
        while (rows.nextBatch(batch)) {
            LongColumnVector x = (LongColumnVector) batch.cols[0];
            LongColumnVector y = (LongColumnVector) batch.cols[1];

            for (int r = 0; r < batch.size; r++) {
                long xValue = x.vector[r];
                long yValue = y.vector[r];
                System.out.println(xValue + ", " + yValue);
            }
        }

        rows.close();
    }

    private static void write(String file, long highestX, Configuration conf) throws IOException {
        // Two integer columns, x and y.
        TypeDescription schema = TypeDescription.fromString("struct<x:int,y:int>");

        Writer writer = OrcFile.createWriter(
            new Path(file),
            OrcFile.writerOptions(conf).setSchema(schema)
        );

        VectorizedRowBatch batch = schema.createRowBatch();
        LongColumnVector x = (LongColumnVector) batch.cols[0];
        LongColumnVector y = (LongColumnVector) batch.cols[1];

        // Write rows (0, 0), (1, 3), ..., (highestX - 1, 3 * (highestX - 1)).
        for (int r = 0; r < highestX; ++r) {
            int row = batch.size++;
            x.vector[row] = r;
            y.vector[row] = r * 3;

            // If the batch is full, write it out and start over.
            if (batch.size == batch.getMaxSize()) {
                writer.addRowBatch(batch);
                batch.reset();
            }
        }

        // Flush the final, partially filled batch.
        if (batch.size != 0) {
            writer.addRowBatch(batch);
            batch.reset();
        }

        writer.close();
    }

}
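
One possible explanation, as far as I understand the Orc reader: a search argument is only used to skip whole stripes and row groups whose min/max statistics rule the predicate out. It is not applied to each row. With the default row index stride of 10,000 rows, the 10,000 rows written above would land in a single row group whose x range includes highestX - 1000, so nothing can be skipped. The sketch below imitates that behaviour in plain Java without using the Orc library at all. RowGroupPruningSketch, RowGroup, split, pushdownOnly and pushdownThenFilter are hypothetical names made up for this illustration, not Orc APIs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, library-free imitation of statistics-based pruning.
// None of these names exist in Orc; they only illustrate the idea.
final class RowGroupPruningSketch {

    /** A run of consecutive rows plus the min/max of its values. */
    static final class RowGroup {
        final long[] values;
        final long min;
        final long max;

        RowGroup(long[] values) {
            this.values = values;
            long lo = Long.MAX_VALUE;
            long hi = Long.MIN_VALUE;
            for (long v : values) {
                lo = Math.min(lo, v);
                hi = Math.max(hi, v);
            }
            this.min = lo;
            this.max = hi;
        }

        /** True if the statistics cannot rule out value == target. */
        boolean mightContain(long target) {
            return min <= target && target <= max;
        }
    }

    /** Splits the values 0..rowCount-1 into groups of at most stride rows. */
    static List<RowGroup> split(long rowCount, int stride) {
        List<RowGroup> groups = new ArrayList<>();
        for (long start = 0; start < rowCount; start += stride) {
            int n = (int) Math.min(stride, rowCount - start);
            long[] values = new long[n];
            for (int i = 0; i < n; i++) {
                values[i] = start + i;
            }
            groups.add(new RowGroup(values));
        }
        return groups;
    }

    /** Coarse pruning only: returns every row of every group that might match. */
    static List<Long> pushdownOnly(List<RowGroup> groups, long target) {
        List<Long> out = new ArrayList<>();
        for (RowGroup g : groups) {
            if (!g.mightContain(target)) {
                continue; // the whole group is skipped without reading it
            }
            for (long v : g.values) {
                out.add(v);
            }
        }
        return out;
    }

    /** Coarse pruning followed by an explicit per-row check. */
    static List<Long> pushdownThenFilter(List<RowGroup> groups, long target) {
        List<Long> out = new ArrayList<>();
        for (long v : pushdownOnly(groups, target)) {
            if (v == target) {
                out.add(v);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        long target = 10_000L - 1_000L;
        // One group of 10,000 rows: nothing can be skipped.
        System.out.println(pushdownOnly(split(10_000, 10_000), target).size());
        // Ten groups of 1,000 rows: nine groups are skipped.
        System.out.println(pushdownOnly(split(10_000, 1_000), target).size());
        // Per-row check on top of pruning: only the matching row remains.
        System.out.println(pushdownThenFilter(split(10_000, 1_000), target));
    }
}
```

If this is indeed what happens in the program above, two things would be worth trying: writing with a smaller row index stride (or far more rows) so that some row groups can actually be skipped, and comparing xValue against the target inside the read loop, since the rows that come back are only candidates.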
