Java RCFile.Reader Method Code Examples

This article collects typical usage examples of the Java method org.apache.hadoop.hive.ql.io.RCFile.Reader. If you have been wondering what RCFile.Reader is for, how to use it, or where to find examples of it, the curated snippets below may help. You can also explore further usage examples of the enclosing class, org.apache.hadoop.hive.ql.io.RCFile.

The following shows 5 code examples of the RCFile.Reader method, sorted by popularity by default. You can upvote the examples you like or find useful; your feedback helps the system recommend better Java code examples.
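Before the individual examples, here is a minimal, self-contained sketch of the basic row-reading loop that all five examples build on. The command-line path argument and the tab-separated printing are illustrative assumptions, not taken from any example below:

import java.io.IOException;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.io.RCFile;
import org.apache.hadoop.hive.serde2.columnar.BytesRefArrayWritable;
import org.apache.hadoop.hive.serde2.columnar.BytesRefWritable;
import org.apache.hadoop.io.LongWritable;

public class RCFileReaderSketch {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path path = new Path(args[0]); // path to an existing RCFile
        RCFile.Reader reader = new RCFile.Reader(fs, path, conf);
        try {
            LongWritable rowID = new LongWritable();
            BytesRefArrayWritable row = new BytesRefArrayWritable();
            while (reader.next(rowID)) {   // advance to the next row
                reader.getCurrentRow(row); // fill the column buffers for this row
                StringBuilder sb = new StringBuilder();
                for (int i = 0; i < row.size(); i++) {
                    BytesRefWritable col = row.get(i);
                    if (i > 0) sb.append('\t');
                    // copy this cell's bytes out of the shared column buffer
                    sb.append(new String(col.getData(), col.getStart(), col.getLength(),
                            StandardCharsets.UTF_8));
                }
                System.out.println(sb);
            }
        } finally {
            reader.close();
        }
    }
}

The loop advances with reader.next(rowID) and then fills the column buffers via getCurrentRow; each cell must be copied out using its start offset and length, because the BytesRefWritable cells reference shared underlying buffers.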

Example 1: getSampleData

Upvotes: 3

import org.apache.hadoop.hive.ql.io.RCFile; // import the package/class this method depends on

@Override
public SampleDataRecord getSampleData(Path path) throws IOException {
    SampleDataRecord sampleDataRecord = null;
    List sampleData = null;
    if (!fs.exists(path)) {
        LOG.error("File path: " + path.toUri().getPath() + " does not exist in HDFS");
    } else {
        RCFile.Reader reader = null;
        try {
            reader = new RCFile.Reader(fs, path, fs.getConf());
            sampleData = getSampleData(reader);
            sampleDataRecord = new SampleDataRecord(path.toUri().getPath(), sampleData);
        } catch (Exception e) {
            LOG.error("path: {} content is not in RCFile format", path.toUri().getPath(), e);
        } finally {
            if (reader != null) {
                reader.close();
            }
        }
    }
    return sampleDataRecord;
}

Developer ID: thomas-young-2013, Project: wherehowsX, Lines of code: 19
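The overloaded getSampleData(RCFile.Reader) helper that this example delegates to is not shown in the snippet. A plausible sketch, assuming it gathers the first few rows as tab-joined strings; the row limit, UTF-8 decoding, and formatting are all assumptions, and it additionally needs java.util.ArrayList/List, java.nio.charset.StandardCharsets, org.apache.hadoop.io.LongWritable, and the org.apache.hadoop.hive.serde2.columnar classes:

// Hypothetical helper, not from the original project: collects up to
// `limit` rows, each rendered as one tab-joined string.
private List<Object> getSampleData(RCFile.Reader reader) throws IOException {
    List<Object> rows = new ArrayList<Object>();
    int limit = 10; // assumed sample size
    LongWritable rowID = new LongWritable();
    BytesRefArrayWritable row = new BytesRefArrayWritable();
    while (rows.size() < limit && reader.next(rowID)) {
        reader.getCurrentRow(row);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < row.size(); i++) {
            BytesRefWritable col = row.get(i);
            if (i > 0) sb.append('\t');
            sb.append(new String(col.getData(), col.getStart(), col.getLength(),
                    StandardCharsets.UTF_8));
        }
        rows.add(sb.toString());
    }
    return rows;
}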

Example 2: getSchema

Upvotes: 2

import org.apache.hadoop.hive.ql.io.RCFile; // import the package/class this method depends on

@Override
public DatasetJsonRecord getSchema(Path path) throws IOException {
    DatasetJsonRecord record = null;
    if (!fs.exists(path)) {
        LOG.error("file path: {} does not exist in HDFS", path);
    } else {
        RCFile.Reader reader = null;
        try {
            reader = new RCFile.Reader(fs, path, fs.getConf());
            Map<Text, Text> meta = reader.getMetadata().getMetadata();
            // RCFile column number, stored in the file's metadata
            int columnNumber = Integer.parseInt(meta.get(new Text(COLUMN_NUMBER_KEY)).toString());
            FileStatus status = fs.getFileStatus(path);
            String schemaString = getRCFileSchema(columnNumber);
            String storage = STORAGE_TYPE;
            String abstractPath = path.toUri().getPath();
            String codec = "rc.codec";
            record = new DatasetJsonRecord(schemaString, abstractPath, status.getModificationTime(),
                    status.getOwner(), status.getGroup(), status.getPermission().toString(),
                    codec, storage, "");
            LOG.info("rc file: {} schema is {}", path.toUri().getPath(), schemaString);
        } catch (Exception e) {
            LOG.error("path: {} content is not in RCFile format", path.toUri().getPath(), e);
        } finally {
            if (reader != null) {
                reader.close();
            }
        }
    }
    return record;
}

Developer ID: thomas-young-2013, Project: wherehowsX, Lines of code: 28
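For context: the column count read from the file metadata here is the value RCFile's writer records under the metadata key "hive.io.rcfile.column.number", exposed as the constant RCFile.COLUMN_NUMBER_METADATA_STR, so COLUMN_NUMBER_KEY in the snippet presumably resolves to that string. A minimal probe, reusing fs and path as in the example above:

RCFile.Reader reader = new RCFile.Reader(fs, path, fs.getConf());
try {
    // SequenceFile.Metadata also offers a direct get(Text) lookup
    Text columnCount = reader.getMetadata().get(new Text(RCFile.COLUMN_NUMBER_METADATA_STR));
    System.out.println("columns: " + columnCount);
} finally {
    reader.close();
}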

Example 3: doProcess

Upvotes: 2

import org.apache.hadoop.hive.ql.io.RCFile; // import the package/class this method depends on

@Override
protected boolean doProcess(Record record, InputStream in) throws IOException {
    Path attachmentPath = getAttachmentPath(record);
    SingleStreamFileSystem fs = new SingleStreamFileSystem(in, attachmentPath);
    RCFile.Reader reader = null;
    try {
        reader = new RCFile.Reader(fs, attachmentPath, conf);
        Record template = record.copy();
        removeAttachments(template);
        template.put(Fields.ATTACHMENT_MIME_TYPE, OUTPUT_MEDIA_TYPE);
        if (includeMetaData) {
            SequenceFile.Metadata metadata = reader.getMetadata();
            if (metadata != null) {
                template.put(RC_FILE_META_DATA, metadata);
            }
        }
        switch (readMode) {
        case row:
            return readRowWise(reader, template);
        case column:
            return readColumnWise(reader, template);
        default:
            throw new IllegalStateException();
        }
    } catch (IOException e) {
        throw new MorphlineRuntimeException("IOException while processing attachment "
            + attachmentPath.getName(), e);
    } finally {
        if (reader != null) {
            reader.close();
        }
    }
}

Developer ID: cloudera, Project: cdk, Lines of code: 35
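To exercise a reader command like this locally, you first need an RCFile to feed it. A minimal writer sketch, assuming Hive's RCFileOutputFormat is available on the classpath; the output path, row count, and cell contents are illustrative:

import java.io.IOException;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.io.RCFile;
import org.apache.hadoop.hive.ql.io.RCFileOutputFormat;
import org.apache.hadoop.hive.serde2.columnar.BytesRefArrayWritable;
import org.apache.hadoop.hive.serde2.columnar.BytesRefWritable;

public class RCFileWriterSketch {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path path = new Path("/tmp/demo.rc"); // illustrative output path
        int numColumns = 3;                   // illustrative column count

        // The writer picks the column count up from the Configuration.
        RCFileOutputFormat.setColumnNumber(conf, numColumns);

        RCFile.Writer writer = new RCFile.Writer(fs, conf, path);
        try {
            BytesRefArrayWritable row = new BytesRefArrayWritable(numColumns);
            for (int r = 0; r < 100; r++) {
                for (int c = 0; c < numColumns; c++) {
                    byte[] cell = ("r" + r + "c" + c).getBytes(StandardCharsets.UTF_8);
                    row.set(c, new BytesRefWritable(cell, 0, cell.length));
                }
                writer.append(row);
            }
        } finally {
            writer.close();
        }
    }
}

RCFileOutputFormat.setColumnNumber stores the column count in the Configuration; the writer records it in the file metadata, which is what example 2 reads back.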

Example 4: readRowWise

Upvotes: 2

import org.apache.hadoop.hive.ql.io.RCFile; // import the package/class this method depends on

private boolean readRowWise(final RCFile.Reader reader, final Record record)
        throws IOException {
    LongWritable rowID = new LongWritable();
    while (true) {
        boolean next;
        try {
            next = reader.next(rowID);
        } catch (EOFException ex) {
            // We have hit EOF of the stream
            break;
        }
        if (!next) {
            break;
        }
        incrementNumRecords();
        Record outputRecord = record.copy();
        BytesRefArrayWritable rowBatchBytes = new BytesRefArrayWritable();
        rowBatchBytes.resetValid(columns.size());
        reader.getCurrentRow(rowBatchBytes);

        // Read all the configured columns and set them in the output record
        for (RCFileColumn rcColumn : columns) {
            BytesRefWritable columnBytes = rowBatchBytes.get(rcColumn.getInputField());
            outputRecord.put(rcColumn.getOutputField(), updateColumnValue(rcColumn, columnBytes));
        }

        // pass record to next command in chain:
        if (!getChild().process(outputRecord)) {
            return false;
        }
    }
    return true;
}

Developer ID: cloudera, Project: cdk, Lines of code: 38

Example 5: readColumnWise

Upvotes: 2

import org.apache.hadoop.hive.ql.io.RCFile; // import the package/class this method depends on

private boolean readColumnWise(RCFile.Reader reader, Record record) throws IOException {
    for (RCFileColumn rcColumn : columns) {
        // rewind to the start of the file for each column's pass
        reader.sync(0);
        reader.resetBuffer();
        while (true) {
            boolean next;
            try {
                next = reader.nextBlock();
            } catch (EOFException ex) {
                // We have hit EOF of the stream
                break;
            }
            if (!next) {
                break;
            }
            BytesRefArrayWritable rowBatchBytes = reader.getColumn(rcColumn.getInputField(), null);
            for (int rowIndex = 0; rowIndex < rowBatchBytes.size(); rowIndex++) {
                incrementNumRecords();
                Record outputRecord = record.copy();
                BytesRefWritable rowBytes = rowBatchBytes.get(rowIndex);
                outputRecord.put(rcColumn.getOutputField(), updateColumnValue(rcColumn, rowBytes));

                // pass record to next command in chain:
                if (!getChild().process(outputRecord)) {
                    return false;
                }
            }
        }
    }
    return true;
}

Developer ID: cloudera, Project: cdk, Lines of code: 35
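Taken together, examples 4 and 5 show the two access patterns RCFile supports. readRowWise materializes one complete row per next()/getCurrentRow() pair, which fits record-at-a-time pipelines; readColumnWise makes one pass over the file per configured column (hence the reader.sync(0) rewind at the top of the outer loop) and pulls a whole row group of values per nextBlock() call, so columns that are never configured are never deserialized.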

Note: the org.apache.hadoop.hive.ql.io.RCFile.Reader examples in this article were collected from GitHub, MSDocs, and other source-code and documentation platforms. The snippets were selected from open-source projects contributed by various developers; copyright in the source code remains with the original authors. Please consult the corresponding project's license before distributing or using the code, and do not reproduce it without permission.
