Java object to file: how to convert a Hadoop Path object into a Java File object

Is there a way to turn a valid, existing Hadoop Path object into a useful Java File object? Is there a nice way of doing this, or do I need to bludgeon the code into submission? The more obvious approaches don't work, and it seems like this would be a common bit of code.

```java
void func(Path p) {
    if (p.isAbsolute()) {
        File f = new File(p.toUri());
    }
}
```

This doesn't work because Path#toUri() returns a URI with the "hdfs" scheme, and Java's File(URI uri) constructor only accepts URIs with the "file" scheme.
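For what it's worth, the only thing I've found that compiles and behaves sanely is to check the URI scheme before constructing the File. A minimal sketch, assuming the Path may in fact refer to the local filesystem (toLocalFile is my own helper name):

```java
import java.io.File;
import java.net.URI;
import org.apache.hadoop.fs.Path;

// Only convert when the URI actually points at the local filesystem;
// File(URI) rejects anything that isn't the "file" scheme.
static File toLocalFile(Path p) {
    URI uri = p.toUri();
    if (uri.getScheme() == null || "file".equals(uri.getScheme())) {
        return new File(uri.getPath());
    }
    throw new IllegalArgumentException("Not a local path: " + uri);
}
```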

Is there a way to get Path and File to work together?

**

Ok, how about a specific limited example.

```java
Path[] paths = DistributedCache.getLocalCacheFiles(job);
```

DistributedCache is supposed to provide a localized copy of a file, but it returns a Path. I assume DistributedCache makes a local copy of the file, so the Path and the file are on the same disk. Given this limited example, where HDFS is hopefully not in the equation, is there a way to reliably convert the Path into a File?
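If the localized copy really is on the local disk, which is how the cache behaves inside a running task, the conversion I'd try is a one-liner per path. A minimal sketch, reusing the paths array from the snippet above:

```java
import java.io.File;
import org.apache.hadoop.fs.Path;

// Localized cache paths carry no scheme (or the "file" scheme),
// so the path component maps directly onto the local filesystem.
for (Path p : paths) {
    File f = new File(p.toUri().getPath());
    // ... read f with ordinary java.io APIs ...
}
```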

**

Solution

Not that I'm aware of.

To my understanding, a Path in Hadoop represents an identifier for a node in their distributed filesystem. This is a different abstraction from a java.io.File, which represents a node on the local filesystem. It's unlikely that a Path could even have a File representation that would behave equivalently, because the underlying models are fundamentally different.

Hence the lack of translation. I presume by your assertion that File objects are "[more] useful", you want an object of this class in order to use existing library methods? For the reasons above, this isn't going to work very well. If it's your own library, you could rewrite it to work cleanly with Hadoop Paths and then convert any Files into Path objects (this direction works as Paths are a strict superset of Files). If it's a third party library then you're out of luck; the authors of that method didn't take into account the effects of a distributed filesystem and only wrote that method to work on plain old local files.
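For the reverse direction mentioned above, a minimal sketch (the file name here is just a placeholder):

```java
import java.io.File;
import org.apache.hadoop.fs.Path;

File f = new File("/tmp/data.txt");   // hypothetical local file
Path p = new Path(f.toURI());         // the URI carries the "file" scheme,
                                      // which Hadoop Paths understand
```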
