Building Spark 3.0.1
Build references
Official Spark build guide
http://spark.apache.org/docs/latest/building-spark.html
Prerequisites
- Maven 3.6.3
- Java 8
- Maven memory settings: export MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=1g"
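The prerequisites above can be checked before starting a long build. The following is a minimal sketch, not part of the Spark tooling: the helper name version_ge is made up for this example, it assumes GNU `sort -V` is available, and it assumes the first line of `mvn -version` has the form "Apache Maven 3.6.3 (...)".

```shell
#!/bin/sh
# Hypothetical helper: succeeds when version $1 is at least version $2.
# Assumes GNU sort with -V (version sort).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Memory settings from the prerequisites list.
export MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=1g"

# Only inspect Maven when it is actually on the PATH.
if command -v mvn >/dev/null 2>&1; then
    # Assumption: the third field of the first output line is the version.
    found=$(mvn -version 2>/dev/null | awk 'NR==1 {print $3}')
    if version_ge "$found" "3.6.3"; then
        echo "Maven $found is at least 3.6.3"
    else
        echo "Maven $found is older than 3.6.3" >&2
    fi
fi
```

If Maven is not installed, the script only exports MAVEN_OPTS and prints nothing.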
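Command-line options
- Pass dependency versions with -D
  - -Dhadoop.version=3.2.1 sets the Hadoop version
- Activate the corresponding profiles with -P
  - -Pyarn enables YARN support
  - -Phadoop-3.2 enables Hadoop 3.2 support
  - -Phive -Phive-thriftserver enables Hive support
  - -Pkubernetes enables Kubernetes support
  - -Psparkr enables SparkR support
  - -Pmesos enables Mesos support
  - -Pscala-2.12 builds against Scala major version 2.12
  - -Phive-1.2 builds against Hive 1.2.1
- Python support
  - --pip also builds the Python package
- R support
  - --r also builds the R package
- Produce an archive
  - --tgz produces a compressed archive in .tgz (gzip) format
- Archive name
  - --name XXX adds XXX as a suffix to the name of the generated archive
- Use the Maven installation from your own environment
  - --mvn ${path}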
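The build output at the end of this post suggests how the script separates its own options (--tgz, --name, --mvn) from the arguments it forwards to Maven: it consumes options until it meets the first one it does not recognize. Here is a simplified sketch of that loop. The function name parse_dist_args, the MAVEN_ARGS variable, and the relative default for MVN are assumptions made for illustration; this is not the script's actual code.

```shell
#!/bin/sh
# Hypothetical re-creation of the option loop visible in the trace.
parse_dist_args() {
    MAKE_TGZ=false
    NAME=none
    MVN=build/mvn            # simplified default; the real script uses an absolute path
    while [ "$#" -gt 0 ]; do
        case "$1" in
            --tgz)  MAKE_TGZ=true ;;
            --name) NAME="$2"; shift ;;   # option with a value: drop one extra argument
            --mvn)  MVN="$2"; shift ;;
            *)      break ;;              # first unknown option: stop parsing
        esac
        shift
    done
    MAVEN_ARGS="$*"          # everything left over is meant for Maven
}

parse_dist_args --tgz --mvn /usr/share/maven/bin/mvn --name hadoop3.2.1 \
    -Pyarn -Phadoop-3.2 -Dhadoop.version=3.2.1
echo "tgz=$MAKE_TGZ name=$NAME mvn=$MVN"
echo "maven args: $MAVEN_ARGS"
```

One consequence of this design is that the script's own options have to come before any -P or -D argument; anything after the first Maven argument is forwarded untouched.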
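Build script changes
- make-distribution.sh contains a block of code that looks up the Spark, Scala, Hadoop, and Hive version numbers from the environment; replace those lookups with the actual version numbers.
- After the change, the values are set directly, as the build output below confirms: VERSION=3.0.1, SCALA_VERSION=2.12.10, SPARK_HADOOP_VERSION=3.2.1, SPARK_HIVE=1.2.1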
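As a rough illustration of the hard-coded block, the assignments below use the values that appear in the build output at the end of this post. The exact layout inside make-distribution.sh is an assumption, and NAME is set by hand here only to show how the archive name is derived.

```shell
#!/bin/sh
# Versions written in directly instead of being looked up (values from the build log).
VERSION=3.0.1
SCALA_VERSION=2.12.10
SPARK_HADOOP_VERSION=3.2.1
SPARK_HIVE=1.2.1

# Assumption for this sketch: NAME normally comes from the --name option.
NAME=hadoop3.2.1

echo "Spark version is $VERSION"
echo "Making spark-$VERSION-bin-$NAME.tgz"
```

With these values the archive is named spark-3.0.1-bin-hadoop3.2.1.tgz, matching the name printed in the log.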
Build command
./dev/make-distribution.sh --tgz --mvn /usr/share/maven/bin/mvn --name hadoop3.2.1 -Phive -Phive-thriftserver -Pmesos -Pyarn -Phadoop-3.2 -Dhadoop.version=3.2.1 -Pscala-2.12 -Phive-1.2
When executed, this command ultimately runs:
/usr/share/maven/bin/mvn clean package -DskipTests -Phive -Phive-thriftserver -Pmesos -Pyarn -Phadoop-3.2 -Dhadoop.version=3.2.1 -Pscala-2.12 -Phive-1.2
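To preview the Maven invocation without starting a build, the expansion can be imitated with a small helper. This is only a sketch: build_command is a made-up name, and it prints the command line instead of running it.

```shell
#!/bin/sh
# Hypothetical dry-run helper: first argument is the Maven binary,
# the remaining arguments are forwarded after the fixed goals.
build_command() {
    mvn_bin=$1
    shift
    printf '%s clean package -DskipTests %s\n' "$mvn_bin" "$*"
}

CMD=$(build_command /usr/share/maven/bin/mvn \
    -Phive -Phive-thriftserver -Pmesos -Pyarn -Phadoop-3.2 \
    -Dhadoop.version=3.2.1 -Pscala-2.12 -Phive-1.2)
echo "$CMD"
```

Printing the command first makes it easy to spot a missing profile before committing to a full build.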
Build output
spark-3.0.1$ ./dev/make-distribution.sh --tgz --mvn /usr/share/maven/bin/mvn --name hadoop3.2.1 -Phive -Phive-thriftserver -Pmesos -Pyarn -Phadoop-3.2 -Dhadoop.version=3.2.1 -Pscala-2.12 -Phive-1.2
+++ dirname ./dev/make-distribution.sh
++ cd ./dev/…
++ pwd
+ SPARK_HOME=/home/sourcecode/bigdata/spark-3.0.1
+ DISTDIR=/home/sourcecode/bigdata/spark-3.0.1/dist
+ MAKE_TGZ=false
+ MAKE_PIP=false
+ MAKE_R=false
+ NAME=none
+ MVN=/home/sourcecode/bigdata/spark-3.0.1/build/mvn
+ (( 12 ))
+ case $1 in
+ MAKE_TGZ=true
+ shift
+ (( 11 ))
+ case $1 in
+ MVN=/usr/share/maven/bin/mvn
+ shift
+ shift
+ (( 9 ))
+ case $1 in
+ NAME=hadoop3.2.1
+ shift
+ shift
+ (( 7 ))
+ case $1 in
+ break
+ '[' -z /usr/lib/jvm/java-8-openjdk-amd64 ']'
+ '[' -z /usr/lib/jvm/java-8-openjdk-amd64 ']'
++ command -v git
+ '[' ']'
++ command -v /usr/share/maven/bin/mvn
+ '[' '!' /usr/share/maven/bin/mvn ']'
+ VERSION=3.0.1
+ SCALA_VERSION=2.12.10
+ SPARK_HADOOP_VERSION=3.2.1
+ SPARK_HIVE=1.2.1
+ '[' hadoop3.2.1 == none ']'
+ echo 'Spark version is 3.0.1'
Spark version is 3.0.1
+ '[' true == true ']'
+ echo 'Making spark-3.0.1-bin-hadoop3.2.1.tgz'
Making spark-3.0.1-bin-hadoop3.2.1.tgz
+ cd /home/sourcecode/bigdata/spark-3.0.1
+ export 'MAVEN_OPTS=-Xmx2g -XX:ReservedCodeCacheSize=1g'
+ MAVEN_OPTS='-Xmx2g -XX:ReservedCodeCacheSize=1g'
+ BUILD_COMMAND=("$MVN" clean package -DskipTests $@)
+ echo -e '\nBuilding with...'
+ echo -e '$ /usr/share/maven/bin/mvn' clean package -DskipTests -Phive -Phive-thriftserver -Pmesos -Pyarn -Phadoop-3.2 -Dhadoop.version=3.2.1 '-Pscala-2.12\n'
$ /usr/share/maven/bin/mvn clean package -DskipTests -Phive -Phive-thriftserver -Pmesos -Pyarn -Phadoop-3.2 -Dhadoop.version=3.2.1 -Pscala-2.12
+ /usr/share/maven/bin/mvn clean package -DskipTests -Phive -Phive-thriftserver -Pmesos -Pyarn -Phadoop-3.2 -Dhadoop.version=3.2.1 -Pscala-2.12 -Phive-1.2