Machine environment:
- macOS 10.14.6
- Apache Maven 3.5.4 (https://archive.apache.org/dist/maven/maven-3/3.5.4/)
- Java version: 1.8.0_151 (https://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html)
- R version 3.6.1 (https://mirrors.tuna.tsinghua.edu.cn/CRAN/bin/macosx/R-3.6.1.pkg)
- Scala 2.11.12 (https://downloads.lightbend.com/scala/2.11.12/scala-2.11.12.tgz)
- Spark 2.4.2 (https://github.com/apache/spark/releases/tag/v2.4.2)
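Installing the environment
Download each package from the URLs above. The packages that need environment variables configured are Java, Maven, and Scala. R works with its default installation.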
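A minimal sketch of the environment variables for Java, Maven, and Scala, for example in ~/.bash_profile. The install paths below are assumptions (the usual Oracle JDK location on macOS and a hypothetical ~/tools directory for the unpacked Maven and Scala archives), so adjust them to wherever you actually placed each package.

```shell
# Assumed install locations; none of these paths come from the original post.
# Each variable keeps an existing value if one is already set.
export JAVA_HOME="${JAVA_HOME:-/Library/Java/JavaVirtualMachines/jdk1.8.0_151.jdk/Contents/Home}"
export MAVEN_HOME="${MAVEN_HOME:-$HOME/tools/apache-maven-3.5.4}"
export SCALA_HOME="${SCALA_HOME:-$HOME/tools/scala-2.11.12}"

# Put the three bin directories in front of the existing PATH.
export PATH="$JAVA_HOME/bin:$MAVEN_HOME/bin:$SCALA_HOME/bin:$PATH"
```

After reloading the shell, java -version, mvn -version, and scala -version should report the versions listed above if the paths are right.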
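Starting the build
- Unpack the archive with unzip spark-2.4.2.zip, enter the directory with cd spark-2.4.2, then open the build script with vim dev/make-distribution.sh and comment out the following: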
# VERSION=$("$MVN" help:evaluate -Dexpression=project.version $@ 2>/dev/null\
# | grep -v "INFO"\
# | grep -v "WARNING"\
# | tail -n 1)
# SCALA_VERSION=$("$MVN" help:evaluate -Dexpression=scala.binary.version $@ 2>/dev/null\
# | grep -v "INFO"\
# | grep -v "WARNING"\
# | tail -n 1)
# SPARK_HADOOP_VERSION=$("$MVN" help:evaluate -Dexpression=hadoop.version $@ 2>/dev/null\
# | grep -v "INFO"\
# | grep -v "WARNING"\
# | tail -n 1)
# SPARK_HIVE=$("$MVN" help:evaluate -Dexpression=project.activeProfiles -pl sql/hive $@ 2>/dev/null\
# | grep -v "INFO"\
# | grep -v "WARNING"\
# | fgrep --count "<id>hive</id>";\
# # Reset exit status to 0, otherwise the script stops here if the last grep finds nothing\
# # because we use "set -o pipefail"
# echo -n)
# if [ "$NAME" == "none" ]; then
# NAME=$SPARK_HADOOP_VERSION
# fi
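The purpose is to speed up the build by skipping these Maven lookups. Because the script uses these variables later on, they then have to be assigned by hand.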
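A minimal sketch of hard-coded replacements for the lookups commented out above. The original post does not show this step, so the values are assumptions derived from the versions and build command used in this post. They would go in dev/make-distribution.sh at the spot where the commented-out block sits.

```shell
# Assumed values for this particular build; not part of the stock script.
VERSION=2.4.2                        # Spark version being built
SCALA_VERSION=2.11                   # Scala binary version (from Scala 2.11.12)
SPARK_HADOOP_VERSION=2.6.0-cdh5.15.1 # matches -Dhadoop.version in the build command
SPARK_HIVE=1                         # 1 because the build enables -Phive
```

SPARK_HIVE is set to 1 because the original lookup counts occurrences of the hive profile among the active profiles, and this build activates it once.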
- Change the Maven repository mirrors (in Maven's settings.xml)
<mirrors>
<mirror>
<id>alimaven</id>
<name>aliyun maven</name>
<url>http://maven.aliyun.com/nexus/content/groups/public/</url>
<mirrorOf>central</mirrorOf>
</mirror>
<mirror>
<id>cloudera</id>
<mirrorOf>*</mirrorOf>
<name>cloudera Readable Name for this Mirror.</name>
<url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
</mirror>
</mirrors>
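The mirrors block has to sit inside the top-level settings element of a Maven settings file. As a rough sketch, the following writes a minimal settings file with the two mirrors to a scratch location. The file name and the scratch path are assumptions chosen so that an existing ~/.m2/settings.xml is not overwritten.

```shell
# Scratch location (hypothetical); Maven itself reads ~/.m2/settings.xml by default.
SETTINGS_FILE="${SETTINGS_FILE:-${TMPDIR:-/tmp}/settings-spark-build.xml}"

# Write a minimal settings file wrapping the two mirrors from this post.
cat > "$SETTINGS_FILE" <<'EOF'
<settings>
  <mirrors>
    <mirror>
      <id>alimaven</id>
      <name>aliyun maven</name>
      <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
      <mirrorOf>central</mirrorOf>
    </mirror>
    <mirror>
      <id>cloudera</id>
      <mirrorOf>*</mirrorOf>
      <name>cloudera Readable Name for this Mirror.</name>
      <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
    </mirror>
  </mirrors>
</settings>
EOF

# Count the mirror entries that were written.
MIRROR_COUNT=$(grep -c "<mirror>" "$SETTINGS_FILE")
echo "wrote $MIRROR_COUNT mirrors to $SETTINGS_FILE"
```

Once the file looks right, back up any existing ~/.m2/settings.xml and copy this one into place, or merge the mirrors block into the file you already have.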
- Run the build
./dev/make-distribution.sh --name 2.6.0-cdh5.15.1 --pip --r --tgz -Psparkr -Phadoop-2.6 -Dhadoop.version=2.6.0-cdh5.15.1 -Phive -Phive-thriftserver -Pyarn -Pmesos -Pkubernetes
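Parameter details
- -Phadoop-2.6: the Hadoop major version (build profile)
- -Dhadoop.version=2.6.0-cdh5.15.1: the exact Hadoop version
- --pip: build the pip-installable PySpark package (Python support)
- --r: build the SparkR package (R support)
- -Psparkr: enable the SparkR profile (R support, not PySpark)
- -Pkubernetes: Kubernetes (k8s) support
- -Pmesos: Mesos support
- -Phive: Hive support
- -Phive-thriftserver: Hive Thrift server support; used together with -Phive for connecting to Hive
- --tgz: package the result as a .tgz archive
- --name: the name suffix of the generated package
- -Pyarn: YARN support, for running on Hadoop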
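To make the flags easier to review or change, the build command can be assembled piece by piece. This is only a sketch: the variable names HADOOP_VERSION, ARGS, and BUILD_CMD are hypothetical, and the script prints the command instead of running it.

```shell
# Hypothetical helper variables; the flags themselves are the ones used in this post.
HADOOP_VERSION="2.6.0-cdh5.15.1"

ARGS="--name $HADOOP_VERSION"                     # suffix of the generated package name
ARGS="$ARGS --pip --r --tgz"                      # PySpark package, SparkR package, .tgz output
ARGS="$ARGS -Psparkr"                             # SparkR profile
ARGS="$ARGS -Phadoop-2.6"                         # Hadoop major version profile
ARGS="$ARGS -Dhadoop.version=$HADOOP_VERSION"     # exact Hadoop version
ARGS="$ARGS -Phive -Phive-thriftserver"           # Hive and Hive Thrift server
ARGS="$ARGS -Pyarn -Pmesos -Pkubernetes"          # cluster managers

BUILD_CMD="./dev/make-distribution.sh $ARGS"

# Print the command for inspection; run it from the spark-2.4.2 directory when ready.
echo "$BUILD_CMD"
```

Keeping the Hadoop version in one variable avoids the --name value and -Dhadoop.version drifting apart.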
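What follows is a long wait while the JAR dependencies download. The build may fail now and then; if it does, just run it again. Hang in there!!!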
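If rerunning by hand gets tedious, a small retry wrapper is one option. The retry function below is a hypothetical helper, not something shipped with Spark or Maven; it reruns a command until it succeeds or the attempt limit is reached.

```shell
# retry MAX_ATTEMPTS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND until it exits 0 or MAX_ATTEMPTS is used up.
retry() {
  max_attempts="$1"
  shift
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    if "$@"; then
      return 0
    fi
    echo "attempt $attempt of $max_attempts failed" >&2
    attempt=$((attempt + 1))
  done
  return 1
}

# Example usage (commented out so that pasting this block does not start a build):
# retry 3 ./dev/make-distribution.sh --name 2.6.0-cdh5.15.1 --pip --r --tgz \
#   -Psparkr -Phadoop-2.6 -Dhadoop.version=2.6.0-cdh5.15.1 \
#   -Phive -Phive-thriftserver -Pyarn -Pmesos -Pkubernetes
```

A retry only helps with transient download failures; a genuine compile error will fail the same way on every attempt.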
Build succeeded
[INFO]
[INFO] Spark Project Parent POM 2.4.2 ..................... SUCCESS [ 8.616 s]
[INFO] Spark Project Tags ................................. SUCCESS [ 13.194 s]
[INFO] Spark Project Sketch ............................... SUCCESS [ 27.080 s]
[INFO] Spark Project Local DB ............................. SUCCESS [ 21.762 s]
[INFO] Spark Project Networking ........................... SUCCESS [ 30.353 s]
[INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [ 18.767 s]
[INFO] Spark Project Unsafe ............................... SUCCESS [ 51.953 s]
[INFO] Spark Project Launcher ............................. SUCCESS [ 43.107 s]
[INFO] Spark Project Core ................................. SUCCESS [05:08 min]
[INFO] Spark Project ML Local Library ..................... SUCCESS [01:24 min]
[INFO] Spark Project GraphX ............................... SUCCESS [03:41 min]
[INFO] Spark Project Streaming ............................ SUCCESS [06:32 min]
[INFO] Spark Project Catalyst ............................. SUCCESS [11:04 min]
[INFO] Spark Project SQL .................................. SUCCESS [07:35 min]
[INFO] Spark Project ML Library ........................... SUCCESS [08:59 min]
[INFO] Spark Project Tools ................................ SUCCESS [ 25.680 s]
[INFO] Spark Project Hive ................................. SUCCESS [06:11 min]
[INFO] Spark Project REPL ................................. SUCCESS [01:38 min]
[INFO] Spark Project YARN Shuffle Service ................. SUCCESS [ 47.297 s]
[INFO] Spark Project YARN ................................. SUCCESS [03:39 min]
[INFO] Spark Project Mesos ................................ SUCCESS [03:34 min]
[INFO] Spark Project Kubernetes ........................... SUCCESS [02:47 min]
[INFO] Spark Project Hive Thrift Server ................... SUCCESS [01:28 min]
[INFO] Spark Project Assembly ............................. SUCCESS [ 16.813 s]
[INFO] Spark Integration for Kafka 0.10 ................... SUCCESS [01:52 min]
[INFO] Kafka 0.10+ Source for Structured Streaming ........ SUCCESS [03:18 min]
[INFO] Spark Project Examples ............................. SUCCESS [02:02 min]
[INFO] Spark Integration for Kafka 0.10 Assembly .......... SUCCESS [ 11.098 s]
[INFO] Spark Avro 2.4.2 ................................... SUCCESS [02:33 min]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 36:18 min (Wall Clock)
[INFO] Finished at: 2019-08-08T19:56:52+08:00
[INFO] ------------------------------------------------------------------------
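In the current directory you will now see an archive named after the Spark version plus the name you specified (here spark-2.4.2-bin-2.6.0-cdh5.15.1.tgz). This file is the distribution, ready to use.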
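A quick way to confirm the result, sketched under the assumption that the archive follows the spark-VERSION-bin-NAME.tgz naming pattern of make-distribution.sh and that you are still in the spark-2.4.2 directory:

```shell
VERSION="2.4.2"               # Spark version built in this post
NAME="2.6.0-cdh5.15.1"        # value passed to --name
TARBALL="spark-$VERSION-bin-$NAME.tgz"

if [ -f "$TARBALL" ]; then
  # List the first few entries to confirm the archive is readable.
  tar -tzf "$TARBALL" | head -n 5
else
  echo "not found: $TARBALL"
fi
```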
The built package can be downloaded here: https://pan.baidu.com/s/1yRwHJ5_GzOfmzpVNwHEcYA
Extraction code: vko8
Reference documentation: http://spark.apache.org/docs/2.4.2/building-spark.html