1. wesleypeck编写的开源parquet-tools
parquet-tools出现org/apache/hadoop/conf/Configuration问题的解决
该版本由于原作者不在进行更新,目前网上能够找到的版本大部分无法使用,原因在于源码中pom.xml并没有引入对应hadoop-core的依赖,导致jar包在执行对应命令时会报错:
NoClassDefFoundError: org/apache/hadoop/conf/Configuration
2.自己下载源码包编译
在https://mvnrepository.com官网下载源码包,下载Apache Parquet Tools即可
3.开始编译
打开源码包在org.apache.parquet.tools目录下有入口类Main.class,这个类就是工具的入口类;把源码和pom.xml放入自己的工程内,编译是选择带依赖编译。我在编译是程序多个报错,可能是我的环境问题,我有多加了几个依赖项。编译是指定包内的入口类为打包的mainclass。
我下载的是parquet-tools-1.11.2-sources.jar这个版本,我已经编译成功,下面是完整的依赖环境:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"> <modelVersion>4.0.0</modelVersion> <groupId>parquet-tools</groupId> <artifactId>parquet-tools</artifactId> <version>0.0.1-SNAPSHOT</version> <dependencies> <dependency> <groupId>org.apache.hadoop</groupId> <artifactId>hadoop-client</artifactId> <version>2.7.3</version> <scope>provided</scope> <exclusions> <exclusion> <artifactId>hadoop-common</artifactId> <groupId>org.apache.hadoop</groupId> </exclusion> <exclusion> <artifactId>hadoop-hdfs</artifactId> <groupId>org.apache.hadoop</groupId> </exclusion> <exclusion> <artifactId>hadoop-mapreduce-client-app</artifactId> <groupId>org.apache.hadoop</groupId> </exclusion> <exclusion> <artifactId>hadoop-yarn-api</artifactId> <groupId>org.apache.hadoop</groupId> </exclusion> <exclusion> <artifactId>hadoop-mapreduce-client-core</artifactId> <groupId>org.apache.hadoop</groupId> </exclusion> <exclusion> <artifactId>hadoop-mapreduce-client-jobclient</artifactId> <groupId>org.apache.hadoop</groupId> </exclusion> <exclusion> <artifactId>hadoop-annotations</artifactId> <groupId>org.apache.hadoop</groupId> </exclusion> </exclusions> </dependency> <dependency> <groupId>junit</groupId> <artifactId>junit</artifactId> <version>4.12</version> <scope>test</scope> <exclusions> <exclusion> <artifactId>hamcrest-core</artifactId> <groupId>org.hamcrest</groupId> </exclusion> </exclusions> </dependency> <dependency> <groupId>org.easymock</groupId> <artifactId>easymock</artifactId> <version>3.4</version> <scope>test</scope> <exclusions> <exclusion> <artifactId>objenesis</artifactId> <groupId>org.objenesis</groupId> </exclusion> </exclusions> </dependency> <dependency> <groupId>commons-httpclient</groupId> <artifactId>commons-httpclient</artifactId> <version>3.1</version> <scope>test</scope> <exclusions> <exclusion> <artifactId>commons-logging</artifactId> <groupId>commons-logging</groupId> </exclusion> <exclusion> <artifactId>commons-codec</artifactId> <groupId>commons-codec</groupId> </exclusion> </exclusions> </dependency> <!-- <dependency> <groupId>org.apache.hadoop</groupId> <artifactId>hadoop-core</artifactId> <version>1.2.1</version> </dependency>--> <dependency> <groupId>org.apache.parquet</groupId> <artifactId>parquet-column</artifactId> <version>1.12.3</version> </dependency> <dependency> <groupId>org.apache.hadoop</groupId> <artifactId>hadoop-client-runtime</artifactId> <version>3.3.4</version> </dependency> <dependency> <groupId>com.fasterxml.jackson.core</groupId> <artifactId>jackson-databind</artifactId> <version>2.14.2</version> </dependency> <dependency> <groupId>org.apache.parquet</groupId> <artifactId>parquet-hadoop</artifactId> <version>1.12.0</version> </dependency> <dependency> <groupId>org.apache.curator</groupId> <artifactId>curator-client</artifactId> <version>4.2.0</version> </dependency> <dependency> <groupId>commons-cli</groupId> <artifactId>commons-cli</artifactId> <version>1.5.0</version> <!-- 根据实际情况选择合适的版本 --> </dependency> </dependencies> <build> <!-- <sourceDirectory>src/main/java</sourceDirectory> --> <plugins> <plugin> <artifactId>maven-compiler-plugin</artifactId> <version>3.3</version> <configuration> <source>1.8</source> <target>1.8</target> </configuration> </plugin> <plugin> <groupId>org.apache.maven.plugins</groupId> <artifactId>maven-surefire-plugin</artifactId> <version>2.12.4</version> <configuration> <skipTests>true</skipTests> </configuration> </plugin> <!--将数据资源--> <plugin> <groupId>org.apache.maven.plugins</groupId> <artifactId>maven-resources-plugin</artifactId> <version>3.0.1</version> <configuration> <encoding>UTF-8</encoding> </configuration> </plugin> <!--该插件用于将scala代码编译成class文件--> <plugin> <groupId>net.alchim31.maven</groupId> <artifactId>scala-maven-plugin</artifactId> <version>3.2.2</version> <executions> <!--绑定到maven的编译阶段--> <execution> <goals> <goal>compile</goal> <goal>testCompile</goal> </goals> <configuration> <args> <arg>-dependencyfile</arg> <arg>${project.build.directory}/.scala_dependencies</arg> </args> </configuration> </execution> </executions> </plugin> <!--指定源代码目录--> <plugin> <groupId>org.codehaus.mojo</groupId> <artifactId>build-helper-maven-plugin</artifactId> <version>3.2.0</version> <executions> <execution> <id>add-source</id> <phase>generate-sources</phase> <goals> <goal>add-source</goal> </goals> <configuration> <sources> <source>src/main/java</source> <source>src/main/scala</source> </sources> </configuration> </execution> </executions> </plugin> <plugin> <groupId>org.apache.maven.plugins</groupId> <artifactId>maven-assembly-plugin</artifactId> <executions> <execution> <phase>package</phase> <goals> <goal>single</goal> </goals> <configuration> <archive> <manifest> <mainClass> org.apache.parquet.tools.Main </mainClass> </manifest> </archive> <descriptorRefs> <descriptorRef>jar-with-dependencies</descriptorRef> </descriptorRefs> </configuration> </execution> </executions> </plugin> </plugins> </build> </project>