Building Spark 2.4.2 from source

Build environment:

  • macOS 10.14.6
  • Apache Maven 3.5.4 (https://archive.apache.org/dist/maven/maven-3/3.5.4/)
  • Java version: 1.8.0_151 (https://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html)
  • R version 3.6.1 (https://mirrors.tuna.tsinghua.edu.cn/CRAN/bin/macosx/R-3.6.1.pkg)
  • Scala 2.11.12 (https://downloads.lightbend.com/scala/2.11.12/scala-2.11.12.tgz)
  • Spark 2.4.2 (https://github.com/apache/spark/releases/tag/v2.4.2)

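Installing the environment

Download the packages from the addresses above. Java, Maven, and Scala need their environment variables configured; R works with its default installation.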
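As a rough sketch, the environment variables could be set like this in ~/.bash_profile. The install paths below are assumptions (the JDK path is the usual Oracle location on macOS, and the Maven and Scala directories are hypothetical), so adjust them to wherever the archives were actually unpacked.

```shell
# Assumed install locations; change them to match your machine.
export JAVA_HOME="${JAVA_HOME:-/Library/Java/JavaVirtualMachines/jdk1.8.0_151.jdk/Contents/Home}"
export MAVEN_HOME="${MAVEN_HOME:-$HOME/tools/apache-maven-3.5.4}"   # hypothetical directory
export SCALA_HOME="${SCALA_HOME:-$HOME/tools/scala-2.11.12}"        # hypothetical directory

# Put the three bin directories in front of the existing PATH.
export PATH="$JAVA_HOME/bin:$MAVEN_HOME/bin:$SCALA_HOME/bin:$PATH"
```

After reloading the shell, java -version, mvn -v, and scala -version should report the versions listed above.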

Building

  • Unpack the source archive: unzip spark-2.4.2.zip
  • cd spark-2.4.2, then open dev/make-distribution.sh (vim dev/make-distribution.sh) and comment out the following lines:
# VERSION=$("$MVN" help:evaluate -Dexpression=project.version $@ 2>/dev/null\
#     | grep -v "INFO"\
#     | grep -v "WARNING"\
#     | tail -n 1)
# SCALA_VERSION=$("$MVN" help:evaluate -Dexpression=scala.binary.version $@ 2>/dev/null\
#     | grep -v "INFO"\
#     | grep -v "WARNING"\
#     | tail -n 1)
# SPARK_HADOOP_VERSION=$("$MVN" help:evaluate -Dexpression=hadoop.version $@ 2>/dev/null\
#     | grep -v "INFO"\
#     | grep -v "WARNING"\
#     | tail -n 1)
# SPARK_HIVE=$("$MVN" help:evaluate -Dexpression=project.activeProfiles -pl sql/hive $@ 2>/dev/null\
#     | grep -v "INFO"\
#     | grep -v "WARNING"\
#     | fgrep --count "<id>hive</id>";\
#     # Reset exit status to 0, otherwise the script stops here if the last grep finds nothing\
#     # because we use "set -o pipefail"
#     echo -n)

# if [ "$NAME" == "none" ]; then
#   NAME=$SPARK_HADOOP_VERSION
# fi

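The purpose is to speed up the build and cut down on verification output: each of these lookups launches a separate Maven run just to read one value. Later parts of the script still use VERSION, SCALA_VERSION, SPARK_HADOOP_VERSION, and SPARK_HIVE (VERSION, for example, goes into the tarball name), so set them by hand in place of the commented-out lines.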
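Once the Maven lookups are commented out, the script has no values for these four variables. Here is a minimal sketch of hand-set replacements, assuming the build options used in this post. pom_property is a hypothetical helper (not part of Spark) that reads the Scala binary version straight from pom.xml instead of guessing it, and SPARK_HOME is assumed to be set earlier in the script.

```shell
# Hypothetical helper: print the first <name>value</name> property found in a POM file.
pom_property() {
  sed -n "s:.*<$1>\(.*\)</$1>.*:\1:p" "$2" | head -n 1
}

VERSION=2.4.2                           # version of the Spark source being built
SPARK_HADOOP_VERSION=2.6.0-cdh5.15.1    # must match -Dhadoop.version on the command line
SPARK_HIVE=1                            # 1 because the build is run with -Phive

# Take the default Scala binary version from the top-level pom.xml.
SCALA_VERSION=$(pom_property scala.binary.version "${SPARK_HOME:-.}/pom.xml" 2>/dev/null)
```

Hard-coding works here because the script is only used for this one build. If you change the Hadoop version or drop -Phive, update the values to match.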

  • Change the Maven repository mirrors in Maven's settings.xml (~/.m2/settings.xml, or conf/settings.xml under the Maven installation):
<mirrors>
  <mirror>
    <id>alimaven</id>
    <name>aliyun maven</name>
    <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
    <mirrorOf>central</mirrorOf>
  </mirror>
  <mirror>
    <id>cloudera</id>
    <mirrorOf>*</mirrorOf>
    <name>cloudera Readable Name for this Mirror.</name>
    <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
  </mirror>
</mirrors>
  • Run the build:
    ./dev/make-distribution.sh --name 2.6.0-cdh5.15.1 --pip --r --tgz -Psparkr -Phadoop-2.6 -Dhadoop.version=2.6.0-cdh5.15.1 -Phive -Phive-thriftserver -Pyarn -Pmesos -Pkubernetes

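Parameter details

  • -Phadoop-2.6: build profile for the Hadoop major version
  • -Dhadoop.version=2.6.0-cdh5.15.1: the exact Hadoop version
  • --pip: also build the PySpark pip package (Python support)
  • --r: also build the SparkR package (R support)
  • -Psparkr: enable the SparkR build profile
  • -Pkubernetes: Kubernetes (k8s) support
  • -Pmesos: Mesos support
  • -Phive: Hive support
  • -Phive-thriftserver: Hive Thrift server (JDBC) support
  • --tgz: package the distribution as a .tgz archive
  • --name: the name used in the generated package's file name
  • -Pyarn: YARN (Hadoop) support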
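The command passes the Hadoop version twice, once as --name and once as -Dhadoop.version, so it is easy for the two to drift apart. Here is a small sketch that keeps them in one variable. HADOOP_VERSION and spark_build_cmd are names made up for illustration, and the function only prints the command rather than running it.

```shell
HADOOP_VERSION=2.6.0-cdh5.15.1   # single source for --name and -Dhadoop.version

# Print the full build command; remove "echo" to actually start the build.
spark_build_cmd() {
  echo ./dev/make-distribution.sh \
    --name "$HADOOP_VERSION" \
    --pip --r --tgz \
    -Psparkr -Phadoop-2.6 "-Dhadoop.version=$HADOOP_VERSION" \
    -Phive -Phive-thriftserver -Pyarn -Pmesos -Pkubernetes
}

spark_build_cmd
```

Printing first lets you check the flags before committing to a build that takes more than half an hour.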

Next comes a long wait while the JARs download. The build may occasionally fail; if it does, just run it again. Hang in there!
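If you would rather not babysit the retries, a small wrapper can rerun the build for you. This is only a sketch: retry is a hypothetical helper, not something shipped with Spark or Maven.

```shell
# retry MAX CMD...: run CMD until it succeeds or MAX attempts have been made.
retry() {
  max=$1
  shift
  attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "giving up after $attempt attempts" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    echo "command failed, retrying (attempt $attempt of $max)" >&2
  done
}
```

Usage would look like retry 3 ./dev/make-distribution.sh followed by the same flags as above. Artifacts that were already downloaded stay in the local Maven repository (~/.m2), so each retry has less to fetch.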

Build succeeded

[INFO]
[INFO] Spark Project Parent POM 2.4.2 ..................... SUCCESS [  8.616 s]
[INFO] Spark Project Tags ................................. SUCCESS [ 13.194 s]
[INFO] Spark Project Sketch ............................... SUCCESS [ 27.080 s]
[INFO] Spark Project Local DB ............................. SUCCESS [ 21.762 s]
[INFO] Spark Project Networking ........................... SUCCESS [ 30.353 s]
[INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [ 18.767 s]
[INFO] Spark Project Unsafe ............................... SUCCESS [ 51.953 s]
[INFO] Spark Project Launcher ............................. SUCCESS [ 43.107 s]
[INFO] Spark Project Core ................................. SUCCESS [05:08 min]
[INFO] Spark Project ML Local Library ..................... SUCCESS [01:24 min]
[INFO] Spark Project GraphX ............................... SUCCESS [03:41 min]
[INFO] Spark Project Streaming ............................ SUCCESS [06:32 min]
[INFO] Spark Project Catalyst ............................. SUCCESS [11:04 min]
[INFO] Spark Project SQL .................................. SUCCESS [07:35 min]
[INFO] Spark Project ML Library ........................... SUCCESS [08:59 min]
[INFO] Spark Project Tools ................................ SUCCESS [ 25.680 s]
[INFO] Spark Project Hive ................................. SUCCESS [06:11 min]
[INFO] Spark Project REPL ................................. SUCCESS [01:38 min]
[INFO] Spark Project YARN Shuffle Service ................. SUCCESS [ 47.297 s]
[INFO] Spark Project YARN ................................. SUCCESS [03:39 min]
[INFO] Spark Project Mesos ................................ SUCCESS [03:34 min]
[INFO] Spark Project Kubernetes ........................... SUCCESS [02:47 min]
[INFO] Spark Project Hive Thrift Server ................... SUCCESS [01:28 min]
[INFO] Spark Project Assembly ............................. SUCCESS [ 16.813 s]
[INFO] Spark Integration for Kafka 0.10 ................... SUCCESS [01:52 min]
[INFO] Kafka 0.10+ Source for Structured Streaming ........ SUCCESS [03:18 min]
[INFO] Spark Project Examples ............................. SUCCESS [02:02 min]
[INFO] Spark Integration for Kafka 0.10 Assembly .......... SUCCESS [ 11.098 s]
[INFO] Spark Avro 2.4.2 ................................... SUCCESS [02:33 min]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 36:18 min (Wall Clock)
[INFO] Finished at: 2019-08-08T19:56:52+08:00
[INFO] ------------------------------------------------------------------------

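In the current directory you will find an archive named spark-<version>-bin-<name>.tgz (here, spark-2.4.2-bin-2.6.0-cdh5.15.1.tgz). This is the distribution we can use directly.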
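A quick way to check the result is to unpack the archive and read its RELEASE file, which records the Spark and Hadoop versions the distribution was built for. The sketch below assumes the --name value used in this post and that it is run from the Spark source directory.

```shell
VERSION=2.4.2
NAME=2.6.0-cdh5.15.1                      # the value passed to --name
TARBALL="spark-$VERSION-bin-$NAME.tgz"

if [ -f "$TARBALL" ]; then
  tar -xzf "$TARBALL"                      # unpacks into spark-$VERSION-bin-$NAME/
  cat "spark-$VERSION-bin-$NAME/RELEASE"   # shows the build's Spark and Hadoop versions
else
  echo "not found: $TARBALL" >&2
fi
```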

The software can be downloaded here: https://pan.baidu.com/s/1yRwHJ5_GzOfmzpVNwHEcYA
Extraction code: vko8

Reference documentation: http://spark.apache.org/docs/2.4.2/building-spark.html