- Install thrift
wget https://mirrors.cnnic.cn/apache/thrift/0.12.0/thrift-0.12.0.tar.gz
tar -zxvf thrift-0.12.0.tar.gz
cd thrift-0.12.0
yum install libtool flex bison pkgconfig boost-devel libevent-devel zlib-devel python-devel ruby-devel openssl-devel ant
./bootstrap.sh
./configure
make && make install
- Add Alibaba Cloud OSS support
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-aliyun</artifactId>
  <version>3.2.1</version>
  <scope>provided</scope>
</dependency>
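Beyond the dependency itself, the OSS filesystem needs an endpoint and credentials before oss:// paths are usable. Below is a minimal sketch using the hadoop-aliyun configuration keys; the bucket, endpoint, and credential values are placeholders, not values from this setup:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class OssCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Configuration keys provided by the hadoop-aliyun module.
        conf.set("fs.oss.impl", "org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystem");
        conf.set("fs.oss.endpoint", "oss-cn-hangzhou.aliyuncs.com"); // placeholder endpoint
        conf.set("fs.oss.accessKeyId", "<access-key-id>");           // placeholder credential
        conf.set("fs.oss.accessKeySecret", "<access-key-secret>");   // placeholder credential

        // Open the filesystem and probe a path to confirm OSS is reachable.
        try (FileSystem fs = FileSystem.get(new java.net.URI("oss://my-bucket/"), conf)) {
            System.out.println(fs.exists(new Path("oss://my-bucket/hudi/")));
        }
    }
}
```

The same keys can also be set in core-site.xml on the cluster instead of in code.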
- Set the Hive version (the Hive version on CDH is 2.1.1)
vim packaging/hudi-flink-bundle/pom.xml
hive.version=2.1.1
- Build the Hudi package (the first command builds without bundling Hive; the second bundles Hive 2 via the flink-bundle-shade-hive2 profile)
mvn clean package -DskipTests -Drat.skip=true -Dscala-2.12 -T24C
mvn clean install -Drat.skip=true -Pflink-bundle-shade-hive2 -Pinclude-flink-sql-connector-hive -DskipTests -Dscala-2.12 -T24C
Problem summary:
Problem 1: thrift library conflict
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.thrift.protocol.TProtocol.getScheme()Ljava/lang/Class;
The cause was that the shaded flink-parquet jar I had built bundled org.apache.thrift; excluding it and rebuilding resolved the conflict.
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>3.2.4</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <shadedArtifactAttached>true</shadedArtifactAttached>
        <!-- Classifier suffix for the generated shaded jar; referencing the artifact
             without this classifier avoids accidentally pulling in the shaded jar. -->
        <shadedClassifierName>shade</shadedClassifierName>
        <!-- Do not build a fat jar: a fat jar causes a libthrift conflict when
             integrating with Flink 1.12.1 -->
        <artifactSet>
          <includes>
            <include>org.apache.flink:flink-formats</include>
          </includes>
        </artifactSet>
        <relocations>
          <relocation>
            <!-- original package -->
            <pattern>org.apache.flink.formats.parquet</pattern>
            <!-- relocated package -->
            <shadedPattern>shaded.org.apache.flink.formats.parquet</shadedPattern>
          </relocation>
          <relocation>
            <!-- original package -->
            <pattern>org.apache.parquet</pattern>
            <!-- relocated package -->
            <shadedPattern>shaded.org.apache.parquet</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
The key change is the following (only the specified module is packaged):
<artifactSet>
  <includes>
    <include>org.apache.flink:flink-formats</include>
  </includes>
</artifactSet>
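To confirm that the rebuilt shaded jar no longer bundles thrift, one option is to scan its entries programmatically; `jar tf` works just as well. A small sketch, with the jar path passed as the first argument:

```java
import java.util.jar.JarFile;

public class ShadeCheck {
    public static void main(String[] args) throws Exception {
        // args[0]: path to the jar built with the "shade" classifier
        try (JarFile jar = new JarFile(args[0])) {
            boolean leaked = jar.stream()
                    .anyMatch(e -> e.getName().startsWith("org/apache/thrift/"));
            System.out.println(leaked
                    ? "libthrift classes leaked into the jar"
                    : "no org.apache.thrift classes bundled");
        }
    }
}
```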
Problem 2: When Flink wrote data to a Hudi table, the metadata was not synced to Hive, and the job reported that it could not connect to Hive, which looked like a version mismatch. The fix was to build Hudi without bundling the Hive dependency, using:
mvn clean package -DskipTests -Drat.skip=true -Dscala-2.12 -T24C
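For context on the sync itself: with the Hudi 0.9 Flink connector, Hive metadata is pushed through the hive_sync.* table options, so the metastore URI must be reachable from the job. A sketch of the DDL via the Flink SQL API; the table path, database, and metastore URI below are placeholders:

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;

public class HudiHiveSyncExample {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);

        // hive_sync.* options tell the Hudi writer to register the table in Hive.
        tEnv.executeSql(
                "CREATE TABLE hudi_demo (" +
                "  id STRING," +
                "  ts TIMESTAMP(3)," +
                "  PRIMARY KEY (id) NOT ENFORCED" +
                ") WITH (" +
                "  'connector' = 'hudi'," +
                "  'path' = 'oss://my-bucket/hudi/hudi_demo'," +             // placeholder path
                "  'table.type' = 'MERGE_ON_READ'," +
                "  'hive_sync.enable' = 'true'," +
                "  'hive_sync.mode' = 'hms'," +
                "  'hive_sync.metastore.uris' = 'thrift://cdh-host:9083'," + // placeholder URI
                "  'hive_sync.db' = 'default'," +
                "  'hive_sync.table' = 'hudi_demo'" +
                ")");
    }
}
```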
Problem 3: Compiling Hudi 0.9 with scala.version=2.12 failed at first for reasons that were never identified; after a successful build with scala.version=2.11, a subsequent scala.version=2.12 build went through.