Notes on Compiling Hudi

Contents

Modify the component versions

Patch the source for Hadoop 3 compatibility

Manually install the Kafka dependencies

Resolve the Spark module dependency conflicts

1) Edit the hudi-spark-bundle pom file to exclude the lower-version Jetty and add the Jetty version specified by Hudi

2) Edit the hudi-utilities-bundle pom file to exclude the lower-version Jetty and add the Jetty version specified by Hudi

Errors during the build and how they were handled


For the official Hudi documentation, see: Overview | Apache Hudi

 

Modify the component versions

Clone the Hudi repository, check out the 0.12.0 branch, and edit the top-level pom file to set the component versions as follows:

  • Hadoop: 3.0.0-cdh6.3.1
  • Hive: 2.1.1-cdh6.3.1
  • Spark: 2.4.0-cdh6.3.1
  • Flink: 1.14.3
  • Hudi: 0.12.0

 

Patch the source for Hadoop 3 compatibility

At line 110 of org.apache.hudi.common.table.log.block.HoodieParquetDataBlock, add null as a second argument.
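A minimal sketch of the change, assuming (as the Hadoop 3 API suggests) that line 110 wraps a ByteArrayOutputStream in an FSDataOutputStream: Hadoop 3 dropped the single-argument FSDataOutputStream constructor, so the two-argument form is needed, and its second parameter (a FileSystem.Statistics) may be null.

import java.io.ByteArrayOutputStream;
import org.apache.hadoop.fs.FSDataOutputStream;

public class Hadoop3StreamFix {
    // Hadoop 2 still had FSDataOutputStream(OutputStream); Hadoop 3 keeps
    // only the constructors that take a (nullable) Statistics argument.
    static FSDataOutputStream wrap(ByteArrayOutputStream baos) {
        // Hadoop 2 style -- no longer compiles against Hadoop 3:
        // return new FSDataOutputStream(baos);
        return new FSDataOutputStream(baos, null);
    }
}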

 

Manually install the Kafka dependencies

Recompiling Hudi requires the following Confluent Kafka dependencies:

  • common-config-5.3.4.jar
  • common-utils-5.3.4.jar
  • kafka-avro-serializer-5.3.4.jar
  • kafka-schema-registry-client-5.3.4.jar

Download link: http://packages.confluent.io/archive/5.3/confluent-5.3.4-2.12.zip

Install them into the local Maven repository:

mvn install:install-file -DgroupId=io.confluent -DartifactId=common-config -Dversion=5.3.4 -Dpackaging=jar -Dfile=./common-config-5.3.4.jar

mvn install:install-file -DgroupId=io.confluent -DartifactId=common-utils -Dversion=5.3.4 -Dpackaging=jar -Dfile=./common-utils-5.3.4.jar

mvn install:install-file -DgroupId=io.confluent -DartifactId=kafka-avro-serializer -Dversion=5.3.4 -Dpackaging=jar -Dfile=./kafka-avro-serializer-5.3.4.jar

mvn install:install-file -DgroupId=io.confluent -DartifactId=kafka-schema-registry-client -Dversion=5.3.4 -Dpackaging=jar -Dfile=./kafka-schema-registry-client-5.3.4.jar

Resolve the Spark module dependency conflicts

1) Edit the hudi-spark-bundle pom file to exclude the lower-version Jetty and add the Jetty version specified by Hudi:

In hudi/packaging/hudi-spark-bundle/pom.xml, at around line 382, make the following changes:

<!-- Hive -->
    <dependency>
      <groupId>${hive.groupid}</groupId>
      <artifactId>hive-service</artifactId>
      <version>${hive.version}</version>
      <scope>${spark.bundle.hive.scope}</scope>
      <exclusions>
        <exclusion>
          <artifactId>guava</artifactId>
          <groupId>com.google.guava</groupId>
        </exclusion>
        <exclusion>
          <groupId>org.eclipse.jetty</groupId>
          <artifactId>*</artifactId>
        </exclusion>
        <exclusion>
          <groupId>org.pentaho</groupId>
          <artifactId>*</artifactId>
        </exclusion>
      </exclusions>
    </dependency>

    <dependency>
      <groupId>${hive.groupid}</groupId>
      <artifactId>hive-service-rpc</artifactId>
      <version>${hive.version}</version>
      <scope>${spark.bundle.hive.scope}</scope>
    </dependency>

    <dependency>
      <groupId>${hive.groupid}</groupId>
      <artifactId>hive-jdbc</artifactId>
      <version>${hive.version}</version>
      <scope>${spark.bundle.hive.scope}</scope>
      <exclusions>
        <exclusion>
          <groupId>javax.servlet</groupId>
          <artifactId>*</artifactId>
        </exclusion>
        <exclusion>
          <groupId>javax.servlet.jsp</groupId>
          <artifactId>*</artifactId>
        </exclusion>
        <exclusion>
          <groupId>org.eclipse.jetty</groupId>
          <artifactId>*</artifactId>
        </exclusion>
      </exclusions>
    </dependency>

    <dependency>
      <groupId>${hive.groupid}</groupId>
      <artifactId>hive-metastore</artifactId>
      <version>${hive.version}</version>
      <scope>${spark.bundle.hive.scope}</scope>
      <exclusions>
        <exclusion>
          <groupId>javax.servlet</groupId>
          <artifactId>*</artifactId>
        </exclusion>
        <exclusion>
          <groupId>org.datanucleus</groupId>
          <artifactId>datanucleus-core</artifactId>
        </exclusion>
        <exclusion>
          <groupId>javax.servlet.jsp</groupId>
          <artifactId>*</artifactId>
        </exclusion>
        <exclusion>
          <artifactId>guava</artifactId>
          <groupId>com.google.guava</groupId>
        </exclusion>
      </exclusions>
    </dependency>

    <dependency>
      <groupId>${hive.groupid}</groupId>
      <artifactId>hive-common</artifactId>
      <version>${hive.version}</version>
      <scope>${spark.bundle.hive.scope}</scope>
      <exclusions>
        <exclusion>
          <groupId>org.eclipse.jetty.orbit</groupId>
          <artifactId>javax.servlet</artifactId>
        </exclusion>
        <exclusion>
          <groupId>org.eclipse.jetty</groupId>
          <artifactId>*</artifactId>
        </exclusion>
      </exclusions>
    </dependency>

    <!-- Add the Jetty version specified by Hudi -->
    <dependency>
      <groupId>org.eclipse.jetty</groupId>
      <artifactId>jetty-server</artifactId>
      <version>${jetty.version}</version>
    </dependency>
    <dependency>
      <groupId>org.eclipse.jetty</groupId>
      <artifactId>jetty-util</artifactId>
      <version>${jetty.version}</version>
    </dependency>
    <dependency>
      <groupId>org.eclipse.jetty</groupId>
      <artifactId>jetty-webapp</artifactId>
      <version>${jetty.version}</version>
    </dependency>
    <dependency>
      <groupId>org.eclipse.jetty</groupId>
      <artifactId>jetty-http</artifactId>
      <version>${jetty.version}</version>
    </dependency>

2) Edit the hudi-utilities-bundle pom file to exclude the lower-version Jetty and add the Jetty version specified by Hudi:

In hudi/packaging/hudi-utilities-bundle/pom.xml, at around line 345, make the following changes:

<!-- Hoodie -->
    <dependency>
      <groupId>org.apache.hudi</groupId>
      <artifactId>hudi-common</artifactId>
      <version>${project.version}</version>
      <exclusions>
        <exclusion>
          <groupId>org.eclipse.jetty</groupId>
          <artifactId>*</artifactId>
        </exclusion>
      </exclusions>
    </dependency>
    <dependency>
      <groupId>org.apache.hudi</groupId>
      <artifactId>hudi-client-common</artifactId>
      <version>${project.version}</version>
      <exclusions>
        <exclusion>
          <groupId>org.eclipse.jetty</groupId>
          <artifactId>*</artifactId>
        </exclusion>
      </exclusions>
    </dependency>


<!-- Hive -->
    <dependency>
      <groupId>${hive.groupid}</groupId>
      <artifactId>hive-service</artifactId>
      <version>${hive.version}</version>
      <scope>${utilities.bundle.hive.scope}</scope>
      <exclusions>
        <exclusion>
          <artifactId>servlet-api</artifactId>
          <groupId>javax.servlet</groupId>
        </exclusion>
        <exclusion>
          <artifactId>guava</artifactId>
          <groupId>com.google.guava</groupId>
        </exclusion>
        <exclusion>
          <groupId>org.eclipse.jetty</groupId>
          <artifactId>*</artifactId>
        </exclusion>
        <exclusion>
          <groupId>org.pentaho</groupId>
          <artifactId>*</artifactId>
        </exclusion>
      </exclusions>
    </dependency>

    <dependency>
      <groupId>${hive.groupid}</groupId>
      <artifactId>hive-service-rpc</artifactId>
      <version>${hive.version}</version>
      <scope>${utilities.bundle.hive.scope}</scope>
    </dependency>

    <dependency>
      <groupId>${hive.groupid}</groupId>
      <artifactId>hive-jdbc</artifactId>
      <version>${hive.version}</version>
      <scope>${utilities.bundle.hive.scope}</scope>
      <exclusions>
        <exclusion>
          <groupId>javax.servlet</groupId>
          <artifactId>*</artifactId>
        </exclusion>
        <exclusion>
          <groupId>javax.servlet.jsp</groupId>
          <artifactId>*</artifactId>
        </exclusion>
        <exclusion>
          <groupId>org.eclipse.jetty</groupId>
          <artifactId>*</artifactId>
        </exclusion>
      </exclusions>
    </dependency>

    <dependency>
      <groupId>${hive.groupid}</groupId>
      <artifactId>hive-metastore</artifactId>
      <version>${hive.version}</version>
      <scope>${utilities.bundle.hive.scope}</scope>
      <exclusions>
        <exclusion>
          <groupId>javax.servlet</groupId>
          <artifactId>*</artifactId>
        </exclusion>
        <exclusion>
          <groupId>org.datanucleus</groupId>
          <artifactId>datanucleus-core</artifactId>
        </exclusion>
        <exclusion>
          <groupId>javax.servlet.jsp</groupId>
          <artifactId>*</artifactId>
        </exclusion>
        <exclusion>
          <artifactId>guava</artifactId>
          <groupId>com.google.guava</groupId>
        </exclusion>
      </exclusions>
    </dependency>

    <dependency>
      <groupId>${hive.groupid}</groupId>
      <artifactId>hive-common</artifactId>
      <version>${hive.version}</version>
      <scope>${utilities.bundle.hive.scope}</scope>
      <exclusions>
        <exclusion>
          <groupId>org.eclipse.jetty.orbit</groupId>
          <artifactId>javax.servlet</artifactId>
        </exclusion>
        <exclusion>
          <groupId>org.eclipse.jetty</groupId>
          <artifactId>*</artifactId>
        </exclusion>
      </exclusions>
    </dependency>

    <!-- Add the Jetty version specified by Hudi -->
    <dependency>
      <groupId>org.eclipse.jetty</groupId>
      <artifactId>jetty-server</artifactId>
      <version>${jetty.version}</version>
    </dependency>
    <dependency>
      <groupId>org.eclipse.jetty</groupId>
      <artifactId>jetty-util</artifactId>
      <version>${jetty.version}</version>
    </dependency>
    <dependency>
      <groupId>org.eclipse.jetty</groupId>
      <artifactId>jetty-webapp</artifactId>
      <version>${jetty.version}</version>
    </dependency>
    <dependency>
      <groupId>org.eclipse.jetty</groupId>
      <artifactId>jetty-http</artifactId>
      <version>${jetty.version}</version>
    </dependency>

Errors during the build and how they were handled

Run the build, specifying the component versions:

mvn clean package -DskipTests -Dspark2.4.0 -Dflink1.14 -Dscala-2.12 -Dhadoop.version=3.0.0 -Pflink-bundle-shade-hive2

The build fails with:

method shutdown in class org.apache.zookeeper.server.ZooKeeperServer cannot be applied to given types

Pin the Hadoop and Spark versions to their CDH builds and recompile:

mvn clean package -DskipTests -Dspark2.4.0-cdh6.3.1 -Dflink1.14 -Dscala-2.12 -Dhadoop.version=3.0.0-cdh6.3.1 -Pflink-bundle-shade-hive2

The build fails again:

The CDH artifacts cannot be found: there are no Scala 2.12 builds of the CDH packages.

Switch Scala to 2.11 and recompile:

mvn clean package -DskipTests -Dspark2.4.0-cdh6.3.1 -Dflink1.14 -Dscala-2.11 -Dhadoop.version=3.0.0-cdh6.3.1 -Pflink-bundle-shade-hive2

The build fails with:

[ERROR] Failed to execute goal net.alchim31.maven:scala-maven-plugin:3.3.1:compile (scala-compile-first) on project hudi-spark-common_2.11: wrap: org.apache.commons.exec.ExecuteException: Process exited with an error: 1 (Exit value: 1) -> [Help 1]

Scrolling up through the log reveals the underlying error messages.

Comment out the offending code in org.apache.hudi.DataSourceReadOptions.

Comment out the offending code in org.apache.hudi.HoodieBaseRelation.

Recompile:

mvn clean package -DskipTests -Dspark2.4.0-cdh6.3.1 -Dflink1.14 -Dscala-2.11 -Dhadoop.version=3.0.0-cdh6.3.1 -Pflink-bundle-shade-hive2

The build fails with:

error: polymorphic expression cannot be instantiated to expected type
...
required: Seq[org.apache.spark.sql.execution.datasources.PartitionedFile]

Modify the relevant code in org.apache.spark.sql.adapter.Spark2Adapter.

 

Recompile:

mvn clean package -DskipTests -Dspark2.4.0-cdh6.3.1 -Dflink1.14 -Dscala-2.11 -Dhadoop.version=3.0.0-cdh6.3.1 -Pflink-bundle-shade-hive2

 

The build fails with:

an unhandled JSONException

Modify org.apache.hudi.utilities.sources.helpers.S3EventsMetaSelector to wrap the offending call in a try/catch, as sketched below.
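A minimal sketch of that pattern, assuming the failing call parses a JSON message body; the names below are illustrative, not the exact Hudi source. The compiler complains presumably because the org.json variant on the CDH classpath declares JSONException as a checked exception.

import org.json.JSONException;
import org.json.JSONObject;

public class JsonExceptionFix {
    // Illustrative sketch: catch the (checked) JSONException at the parse
    // site and rethrow it as an unchecked exception.
    static JSONObject parseMessage(String rawBody) {
        try {
            return new JSONObject(rawBody);
        } catch (JSONException e) {
            throw new RuntimeException("Failed to parse S3 event message", e);
        }
    }
}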

 

Recompile:

mvn clean package -DskipTests -Dspark2.4.0-cdh6.3.1 -Dflink1.14 -Dscala-2.11 -Dhadoop.version=3.0.0-cdh6.3.1 -Pflink-bundle-shade-hive2

 

The build fails with:

the same unhandled JSONException, at another spot in the source

Apply the same fix there: wrap the call in a try/catch.

Recompile:

mvn clean package -DskipTests -Dspark2.4.0-cdh6.3.1 -Dflink1.14 -Dscala-2.11 -Dhadoop.version=3.0.0-cdh6.3.1 -Pflink-bundle-shade-hive2

The build succeeds.

 

