Pitfalls encountered when compiling Hudi 0.13.1

Official documentation

Official site: https://hudi.apache.org/docs/quick-start-guide

Download

# Download
[hadoop@node01 opt]$ wget https://dist.apache.org/repos/dist/release/hudi/0.13.1/hudi-0.13.1.src.tgz
# Extract
[hadoop@node01 opt]$ tar -zxvf hudi-0.13.1.src.tgz -C /home/hadoop/opt/
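Before unpacking, it can be worth checking the tarball against its published SHA-512 digest. The following is a minimal sketch, assuming `sha512sum` is installed. `verify_sha512` is a hypothetical helper name, and the expected digest has to be copied by hand from the checksum file that Apache publishes alongside the release (assumed here to be a `.sha512` file).

```shell
#!/bin/sh
# Sketch: succeed only if FILE's SHA-512 digest equals EXPECTED (hex, any case).
# verify_sha512 is a hypothetical helper, not part of Hudi or Maven.
verify_sha512() (
  file="$1"
  # Normalise the expected digest to lower case, which is what sha512sum prints.
  expected="$(printf '%s' "$2" | tr 'A-F' 'a-f')"
  # sha512sum prints "<digest>  <file>"; keep only the digest column.
  actual="$(sha512sum "$file" | awk '{print $1}')"
  [ "$actual" = "$expected" ]
)

# Example (paste the real digest in place of the placeholder):
# verify_sha512 hudi-0.13.1.src.tgz "<digest from the .sha512 file>" && echo "checksum OK"
```

The function only compares digests; it does not download anything, so it can be run again after a failed or interrupted download.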

Environment setup

[hadoop@node01 opt]$ wget https://dist.apache.org/repos/dist/release/maven/maven-3/3.9.4/binaries/apache-maven-3.9.4-bin.tar.gz
[hadoop@node01 opt]$ tar -zxvf apache-maven-3.9.4-bin.tar.gz
[hadoop@node01 opt]$ mv apache-maven-3.9.4 maven-3.9.4

Set the Maven environment variables

# Open the file
sudo vim /etc/profile
# Add the following
#MAVEN_HOME
export MAVEN_HOME=/home/hadoop/opt/maven-3.9.4
export PATH=$PATH:$MAVEN_HOME/bin
# Reload the profile so the changes take effect
source /etc/profile
# Check that it works
mvn -v
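
If `mvn -v` reports that the command is not found, the usual causes are a wrong MAVEN_HOME or a PATH that does not include its bin directory. A small check, as a sketch only; `maven_env_ok` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: check that a Maven home contains an executable bin/mvn
# and that its bin directory appears in a PATH-style string.
# maven_env_ok is a hypothetical helper, not a Maven command.
maven_env_ok() {
  # $1 = value of MAVEN_HOME, $2 = value of PATH
  [ -x "$1/bin/mvn" ] || return 1
  # Wrap PATH in colons so the first and last entries match like any other.
  case ":$2:" in
    *":$1/bin:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example:
# maven_env_ok "$MAVEN_HOME" "$PATH" && echo "Maven environment looks fine"
```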

Edit the Maven configuration

vim /home/hadoop/opt/maven-3.9.4/conf/settings.xml 

<!-- Add the Aliyun mirror (inside the <mirrors> element) -->
<mirror>
        <id>nexus-aliyun</id>
        <mirrorOf>central</mirrorOf>
        <name>Nexus aliyun</name>
        <url>https://maven.aliyun.com/nexus/content/groups/public</url>
</mirror>

File modifications

1. Edit Hudi's pom.xml

Change the Hadoop and Hive versions that Hudi depends on to match the versions you actually run.

<!--<hadoop.version>2.10.1</hadoop.version>-->
<hadoop.version>3.3.5</hadoop.version>
<!--<hive.version>2.3.1</hive.version>-->
<hive.version>3.1.3</hive.version>
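
Because the old values stay in the file as comments, it is easy to misread which version is actually active. A rough sketch for checking this follows. `pom_property` is a hypothetical helper; it assumes each property sits on its own line and that commented-out lines start with `<!--`, as in the snippet above.

```shell
#!/bin/sh
# Sketch: print the first uncommented value of a property in a pom.xml.
# pom_property is a hypothetical helper; this is plain text matching, not XML parsing.
pom_property() {
  # $1 = path to pom.xml, $2 = property name such as hadoop.version
  # Drop lines that begin with an XML comment, then extract the element text.
  grep -v '^[[:space:]]*<!--' "$1" \
    | sed -n "s|.*<$2>\(.*\)</$2>.*|\1|p" \
    | head -n 1
}

# Example:
# pom_property hudi-0.13.1/pom.xml hadoop.version
# pom_property hudi-0.13.1/pom.xml hive.version
```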


Add the Aliyun repository address (https is used because Maven 3.8.1 and later block plain-http repositories by default)

<repository>
  <id>nexus-aliyun</id>
  <name>nexus-aliyun</name>
  <url>https://maven.aliyun.com/nexus/content/groups/public/</url>
  <releases>
    <enabled>true</enabled>
  </releases>
  <snapshots>
    <enabled>false</enabled>
  </snapshots>
</repository>


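2. Source code changes

In Hive 3 the input parameter of the Date type serializer changed, while Hudi is written against an older Hive version, so a few changes are needed.

Path: hudi-0.13.1/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/HiveAvroSerializer.java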

case DATE:
        // Original line, kept for reference:
        //return DateWritable.dateToDays(((DateObjectInspector)fieldOI).getPrimitiveJavaObject(structFieldData));
        // Replacement: read the day count from the writable (assumes structFieldData is a DateWritable).
        return new DateWritable((DateWritable) structFieldData).getDays();
case TIMESTAMP:
        // Original lines, kept for reference:
        /*Timestamp timestamp =
            ((TimestampObjectInspector) fieldOI).getPrimitiveJavaObject(structFieldData);
        return timestamp.getTime();*/
        // Replacement: read the time from the writable (assumes structFieldData is a TimestampWritable).
        return new TimestampWritable((TimestampWritable) structFieldData).getTimestamp().getTime();
case INT:
        if (schema.getLogicalType() != null && schema.getLogicalType().getName().equals("date")) {
          // Commented-out alternatives, kept for reference:
          //return DateWritable.dateToDays(new WritableDateObjectInspector().getPrimitiveJavaObject(structFieldData));
          //return new DateWritable((DateWritable)structFieldData).getDays();
          return new WritableDateObjectInspector().getPrimitiveWritableObject(structFieldData).getDays();
        }
        return fieldOI.getPrimitiveJavaObject(structFieldData);

After changing the code, add the -Dcheckstyle.skip option to the mvn command when compiling.

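3. Resolve the dependency conflict in the Spark module

With the Hive version changed to 3.1.3, Hive brings in Jetty 9.3, while Hudi itself uses Jetty 9.4, so the two conflict.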
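
To see which Jetty versions actually end up in the dependency graph, the output of Maven's dependency:tree goal can be filtered. This is a sketch only: `jetty_versions` is a hypothetical helper, and it assumes the usual `groupId:artifactId:jar:version:scope` line format that dependency:tree prints.

```shell
#!/bin/sh
# Sketch: read `mvn dependency:tree` output on stdin and print each
# distinct "artifactId version" pair for the org.eclipse.jetty group.
# jetty_versions is a hypothetical helper, not a Maven or Hudi command.
jetty_versions() {
  # Match only the exact group org.eclipse.jetty (not org.eclipse.jetty.orbit etc.).
  grep -o 'org\.eclipse\.jetty:[^:]*:jar:[^:]*' \
    | awk -F: '{print $2, $4}' \
    | sort -u
}

# Example, run from the Hudi source root (the modules may need to be built first):
# mvn dependency:tree -Dincludes=org.eclipse.jetty | jetty_versions
```

If the same artifact shows up with more than one version, that module still pulls Jetty in from two places and needs an exclusion.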

(1). Edit the pom.xml of hudi-spark-bundle: exclude the older Jetty and add the Jetty version that Hudi specifies:

# Location of the file to modify
hudi-0.13.1/packaging/hudi-spark-bundle/pom.xml

Changes:

<!-- Hive -->
<dependency>
  <groupId>${hive.groupid}</groupId>
  <artifactId>hive-service</artifactId>
  <version>${hive.version}</version>
  <scope>${spark.bundle.hive.scope}</scope>
  <exclusions>
    <exclusion>
      <artifactId>servlet-api</artifactId>
      <groupId>javax.servlet</groupId>
    </exclusion>
    <exclusion>
      <artifactId>guava</artifactId>
      <groupId>com.google.guava</groupId>
    </exclusion>
    <exclusion>
      <groupId>org.eclipse.jetty</groupId>
      <artifactId>*</artifactId>
    </exclusion>
    <exclusion>
      <groupId>org.pentaho</groupId>
      <artifactId>*</artifactId>
    </exclusion>
  </exclusions>
</dependency>

<dependency>
  <groupId>${hive.groupid}</groupId>
  <artifactId>hive-service-rpc</artifactId>
  <version>${hive.version}</version>
  <scope>${spark.bundle.hive.scope}</scope>
  <exclusions>
    <exclusion>
      <groupId>javax.servlet</groupId>
      <artifactId>*</artifactId>
    </exclusion>
    <exclusion>
      <groupId>javax.servlet.jsp</groupId>
      <artifactId>*</artifactId>
    </exclusion>
    <exclusion>
      <groupId>org.eclipse.jetty</groupId>
      <artifactId>*</artifactId>
    </exclusion>
  </exclusions>
</dependency>

<dependency>
  <groupId>${hive.groupid}</groupId>
  <artifactId>hive-jdbc</artifactId>
  <version>${hive.version}</version>
  <scope>${spark.bundle.hive.scope}</scope>
  <exclusions>
    <exclusion>
      <groupId>javax.servlet</groupId>
      <artifactId>*</artifactId>
    </exclusion>
    <exclusion>
      <groupId>javax.servlet.jsp</groupId>
      <artifactId>*</artifactId>
    </exclusion>
    <exclusion>
      <groupId>org.eclipse.jetty</groupId>
      <artifactId>*</artifactId>
    </exclusion>
  </exclusions>
</dependency>

<dependency>
  <groupId>${hive.groupid}</groupId>
  <artifactId>hive-metastore</artifactId>
  <version>${hive.version}</version>
  <scope>${spark.bundle.hive.scope}</scope>
  <exclusions>
    <exclusion>
      <groupId>org.apache.hbase</groupId>
      <artifactId>*</artifactId>
    </exclusion>
    <exclusion>
      <groupId>javax.servlet</groupId>
      <artifactId>*</artifactId>
    </exclusion>
    <exclusion>
      <groupId>org.datanucleus</groupId>
      <artifactId>datanucleus-core</artifactId>
    </exclusion>
    <exclusion>
      <groupId>javax.servlet.jsp</groupId>
      <artifactId>*</artifactId>
    </exclusion>
    <exclusion>
      <artifactId>guava</artifactId>
      <groupId>com.google.guava</groupId>
    </exclusion>
  </exclusions>
</dependency>

<dependency>
  <groupId>${hive.groupid}</groupId>
  <artifactId>hive-common</artifactId>
  <version>${hive.version}</version>
  <scope>${spark.bundle.hive.scope}</scope>
  <exclusions>
    <exclusion>
      <groupId>org.eclipse.jetty.orbit</groupId>
      <artifactId>javax.servlet</artifactId>
    </exclusion>
    <exclusion>
      <groupId>org.eclipse.jetty</groupId>
      <artifactId>*</artifactId>
    </exclusion>
  </exclusions>
</dependency>
<!-- Add the Jetty version configured by Hudi -->
<dependency>
  <groupId>org.eclipse.jetty</groupId>
  <artifactId>jetty-server</artifactId>
  <version>${jetty.version}</version>
</dependency>
<dependency>
  <groupId>org.eclipse.jetty</groupId>
  <artifactId>jetty-util</artifactId>
  <version>${jetty.version}</version>
</dependency>
<dependency>
  <groupId>org.eclipse.jetty</groupId>
  <artifactId>jetty-webapp</artifactId>
  <version>${jetty.version}</version>
</dependency>
<dependency>
  <groupId>org.eclipse.jetty</groupId>
  <artifactId>jetty-http</artifactId>
  <version>${jetty.version}</version>
</dependency>

<!-- zookeeper -->

(2). Edit the pom.xml of hudi-utilities-bundle: exclude the older Jetty and add the Jetty version that Hudi specifies:

# Location of the file to modify
hudi-0.13.1/packaging/hudi-utilities-bundle/pom.xml

Changes:

<!-- Hive -->
<dependency>
  <groupId>${hive.groupid}</groupId>
  <artifactId>hive-service</artifactId>
  <version>${hive.version}</version>
  <scope>${utilities.bundle.hive.scope}</scope>
  <exclusions>
    <exclusion>
      <groupId>org.apache.hbase</groupId>
      <artifactId>*</artifactId>
    </exclusion>
    <exclusion>
      <artifactId>servlet-api</artifactId>
      <groupId>javax.servlet</groupId>
    </exclusion>
    <exclusion>
      <artifactId>guava</artifactId>
      <groupId>com.google.guava</groupId>
    </exclusion>
    <exclusion>
      <groupId>org.eclipse.jetty</groupId>
      <artifactId>*</artifactId>
    </exclusion>
    <exclusion>
      <groupId>org.pentaho</groupId>
      <artifactId>*</artifactId>
    </exclusion>
  </exclusions>
</dependency>

<dependency>
  <groupId>${hive.groupid}</groupId>
  <artifactId>hive-service-rpc</artifactId>
  <version>${hive.version}</version>
  <scope>${utilities.bundle.hive.scope}</scope>
  <exclusions>
    <exclusion>
      <groupId>javax.servlet</groupId>
      <artifactId>*</artifactId>
    </exclusion>
    <exclusion>
      <groupId>javax.servlet.jsp</groupId>
      <artifactId>*</artifactId>
    </exclusion>
    <exclusion>
      <groupId>org.eclipse.jetty</groupId>
      <artifactId>*</artifactId>
    </exclusion>
  </exclusions>
</dependency>

<dependency>
  <groupId>${hive.groupid}</groupId>
  <artifactId>hive-jdbc</artifactId>
  <version>${hive.version}</version>
  <scope>${utilities.bundle.hive.scope}</scope>
  <exclusions>
    <exclusion>
      <groupId>javax.servlet</groupId>
      <artifactId>*</artifactId>
    </exclusion>
    <exclusion>
      <groupId>javax.servlet.jsp</groupId>
      <artifactId>*</artifactId>
    </exclusion>
    <exclusion>
      <groupId>org.eclipse.jetty</groupId>
      <artifactId>*</artifactId>
    </exclusion>
  </exclusions>
</dependency>

<dependency>
  <groupId>${hive.groupid}</groupId>
  <artifactId>hive-metastore</artifactId>
  <version>${hive.version}</version>
  <scope>${utilities.bundle.hive.scope}</scope>
  <exclusions>
    <exclusion>
      <groupId>org.apache.hbase</groupId>
      <artifactId>*</artifactId>
    </exclusion>
    <exclusion>
      <groupId>javax.servlet</groupId>
      <artifactId>*</artifactId>
    </exclusion>
    <exclusion>
      <groupId>org.datanucleus</groupId>
      <artifactId>datanucleus-core</artifactId>
    </exclusion>
    <exclusion>
      <groupId>javax.servlet.jsp</groupId>
      <artifactId>*</artifactId>
    </exclusion>
    <exclusion>
      <artifactId>guava</artifactId>
      <groupId>com.google.guava</groupId>
    </exclusion>
  </exclusions>
</dependency>

<dependency>
  <groupId>${hive.groupid}</groupId>
  <artifactId>hive-common</artifactId>
  <version>${hive.version}</version>
  <scope>${utilities.bundle.hive.scope}</scope>
  <exclusions>
    <exclusion>
      <groupId>org.eclipse.jetty.orbit</groupId>
      <artifactId>javax.servlet</artifactId>
    </exclusion>
    <exclusion>
      <groupId>org.eclipse.jetty</groupId>
      <artifactId>*</artifactId>
    </exclusion>
  </exclusions>
</dependency>

<dependency>
  <groupId>org.apache.htrace</groupId>
  <artifactId>htrace-core</artifactId>
  <version>${htrace.version}</version>
  <scope>compile</scope>
</dependency>
<!-- Add the Jetty version configured by Hudi -->
<dependency>
  <groupId>org.eclipse.jetty</groupId>
  <artifactId>jetty-server</artifactId>
  <version>${jetty.version}</version>
</dependency>
<dependency>
  <groupId>org.eclipse.jetty</groupId>
  <artifactId>jetty-util</artifactId>
  <version>${jetty.version}</version>
</dependency>
<dependency>
  <groupId>org.eclipse.jetty</groupId>
  <artifactId>jetty-webapp</artifactId>
  <version>${jetty.version}</version>
</dependency>
<dependency>
  <groupId>org.eclipse.jetty</groupId>
  <artifactId>jetty-http</artifactId>
  <version>${jetty.version}</version>
</dependency>
<!-- Add the hudi-common and hudi-client-common dependencies -->
<dependency>
  <groupId>org.apache.hudi</groupId>
  <artifactId>hudi-common</artifactId>
  <version>${project.version}</version>
  <exclusions>
    <exclusion>
      <groupId>org.eclipse.jetty</groupId>
      <artifactId>*</artifactId>
    </exclusion>
  </exclusions>
</dependency>
<dependency>
  <groupId>org.apache.hudi</groupId>
  <artifactId>hudi-client-common</artifactId>
  <version>${project.version}</version>
  <exclusions>
    <exclusion>
      <groupId>org.eclipse.jetty</groupId>
      <artifactId>*</artifactId>
    </exclusion>
  </exclusions>
</dependency>

4. Edit packaging/hudi-flink-bundle/pom.xml and add the following inside the relocations tag:

<relocation>
  <pattern>org.apache.parquet</pattern>
  <shadedPattern>${flink.bundle.shade.prefix}org.apache.parquet</shadedPattern>
</relocation>

Compile

mvn clean package -DskipTests -Dcheckstyle.skip -Dspark3.4 -Dflink1.16 -Dscala-2.12 -Dhadoop.version=3.3.5 -Pflink-bundle-shade-hive3

Run hudi-cli/hudi-cli.sh. If the CLI starts up normally, the build succeeded.

Once the build finishes, the Spark bundle is under: hudi-0.13.1/packaging/hudi-spark-bundle/target/
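
The other bundles are written to their own target directories under packaging/. A quick way to list everything that was built, as a sketch: `list_bundle_jars` is a hypothetical helper, and the `hudi-*bundle*.jar` name pattern is an assumption based on the bundle module names.

```shell
#!/bin/sh
# Sketch: list the bundle jars produced under packaging/*/target.
# list_bundle_jars is a hypothetical helper; the file name pattern is assumed.
list_bundle_jars() {
  # $1 = Hudi source root, e.g. hudi-0.13.1
  # Names must start with "hudi-", so "original-*.jar" files are not listed;
  # source jars are filtered out explicitly.
  find "$1/packaging" -path '*/target/*' -name 'hudi-*bundle*.jar' \
    ! -name '*-sources.jar' | sort
}

# Example:
# list_bundle_jars hudi-0.13.1
```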
