1. Install Maven
1) Extract the installation archive to the target directory:
[root@master apache-maven-3.5.3]# tar -zxf /opt/maven/apache-maven-3.5.3-bin.tar.gz -C /usr/local/
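2) Configure the Maven environment variables by appending the lines below to /etc/profile, then verify that Maven was installed correctly: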
[root@master apache-maven-3.5.3]# vi /etc/profile
#maven
export MAVEN_HOME=/usr/local/apache-maven-3.5.3
export PATH=$PATH:$MAVEN_HOME/bin
export MAVEN_OPTS="-Xmx2048m -XX:MetaspaceSize=1024m -XX:MaxMetaspaceSize=1524m -Xss2m"
[root@master apache-maven-3.5.3]# source /etc/profile
[root@master apache-maven-3.5.3]# mvn -version
Apache Maven 3.5.3 (3383c37e1f9e9b3bc3df5050c29c8aff9f295297; 2018-02-24T11:49:05-08:00)
Maven home: /usr/local/apache-maven-3.5.3
Java version: 1.8.0_171, vendor: Oracle Corporation
Java home: /usr/local/jdk1.8.0_171/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "3.10.0-123.el7.x86_64", arch: "amd64", family: "unix"
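The version check above can also be scripted. The following is a minimal sketch, not part of the original setup: the helper names `maven_version` and `version_at_least` are hypothetical, and the 3.3.9 minimum is an assumption based on the Spark 2.3 build documentation that should be confirmed for the release being built.

```shell
#!/bin/sh
# Hypothetical helper: read `mvn -version` output on stdin and print
# the Maven version number taken from the "Apache Maven x.y.z" line.
maven_version() {
    sed -n 's/^Apache Maven \([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Hypothetical helper: succeed if dotted version $1 is at least $2.
# Both versions are sorted numerically field by field; $1 >= $2 holds
# exactly when the smaller of the two is $2.
version_at_least() {
    lowest=$(printf '%s\n%s\n' "$2" "$1" | sort -t. -k1,1n -k2,2n -k3,3n | head -n 1)
    [ "$lowest" = "$2" ]
}

# Usage: only runs when mvn is actually on the PATH.
# 3.3.9 is an assumed minimum; adjust it to the Spark release in use.
if command -v mvn >/dev/null 2>&1; then
    installed=$(mvn -version 2>/dev/null | maven_version)
    if version_at_least "$installed" "3.3.9"; then
        echo "Maven $installed is new enough"
    else
        echo "Maven $installed is older than 3.3.9"
    fi
fi
```

If `mvn` is not found at all, the usual cause is that /etc/profile has not been re-read in the current shell with `source /etc/profile`.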
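2. Download the Spark source code
1) Make the downloaded source archive available under /opt (here it is mounted at /opt/spark/spark-2.3.1.tgz)
2) Extract it into the working directory: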
[root@master home]# tar -zxf /opt/spark/spark-2.3.1.tgz -C /home/andy/work
[root@master home]# cd /home/andy/work
[root@master work]# ll
total 4
drwxrwxr-x. 29 andy andy 4096 Jun 1 13:34 spark-2.3.1
[root@master work]# cd spark-2.3.1/
[root@master spark-2.3.1]# ll
total 228
-rw-rw-r--. 1 andy andy 2318 Jun 1 13:34 appveyor.yml
drwxrwxr-x. 3 andy andy 43 Jun 1 13:34 assembly
drwxrwxr-x. 2 andy andy 4096 Jun 1 13:34 bin
drwxrwxr-x. 2 andy andy 75 Jun 1 13:34 build
drwxrwxr-x. 9 andy andy 4096 Jun 1 13:34 common
drwxrwxr-x. 2 andy andy 4096 Jun 1 13:34 conf
-rw-rw-r--. 1 andy andy 995 Jun 1 13:34 CONTRIBUTING.md
drwxrwxr-x. 3 andy andy 30 Jun 1 13:34 core
drwxrwxr-x. 5 andy andy 47 Jun 1 13:34 data
drwxrwxr-x. 6 andy andy 4096 Jun 1 13:34 dev
drwxrwxr-x. 9 andy andy 4096 Jun 1 13:34 docs
drwxrwxr-x. 3 andy andy 30 Jun 1 13:34 examples
drwxrwxr-x. 15 andy andy 4096 Jun 1 13:34 external
drwxrwxr-x. 3 andy andy 30 Jun 1 13:34 graphx
drwxrwxr-x. 2 andy andy 20 Jun 1 13:34 hadoop-cloud
drwxrwxr-x. 3 andy andy 30 Jun 1 13:34 launcher
-rw-rw-r--. 1 andy andy 18045 Jun 1 13:34 LICENSE
drwxrwxr-x. 2 andy andy 4096 Jun 1 13:34 licenses
drwxrwxr-x. 3 andy andy 30 Jun 1 13:34 mllib
drwxrwxr-x. 3 andy andy 30 Jun 1 13:34 mllib-local
-rw-rw-r--. 1 andy andy 24913 Jun 1 13:34 NOTICE
-rw-rw-r--. 1 andy andy 101718 Jun 1 13:34 pom.xml
drwxrwxr-x. 2 andy andy 4096 Jun 1 13:34 project
drwxrwxr-x. 6 andy andy 4096 Jun 1 13:34 python
drwxrwxr-x. 3 andy andy 4096 Jun 1 13:34 R
-rw-rw-r--. 1 andy andy 3809 Jun 1 13:34 README.md
drwxrwxr-x. 5 andy andy 64 Jun 1 13:34 repl
drwxrwxr-x. 5 andy andy 46 Jun 1 13:34 resource-managers
drwxrwxr-x. 2 andy andy 4096 Jun 1 13:34 sbin
-rw-rw-r--. 1 andy andy 17624 Jun 1 13:34 scalastyle-config.xml
drwxrwxr-x. 6 andy andy 4096 Jun 1 13:34 sql
drwxrwxr-x. 3 andy andy 30 Jun 1 13:34 streaming
drwxrwxr-x. 3 andy andy 30 Jun 1 13:34 tools
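3. Compile the Spark source code
This build continues from the previous post, "Installing a Spark 2.0 cluster on CentOS 7", so the tools shown below are already configured: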
#scala
export SCALA_HOME=/usr/local/scala-2.12.6
export PATH=$PATH:$SCALA_HOME/bin
#jdk
export JAVA_HOME=/usr/local/jdk1.8.0_171
export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin
#spark
export SPARK_HOME=/usr/local/spark-2.3.1-bin-hadoop2.7
export PATH=$PATH:$SPARK_HOME/bin
export SPARK_EXAMPLES_JAR=$SPARK_HOME/examples/jars/spark-examples_2.11-2.3.1.jar
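1) Set Maven's memory usage. Maven's memory limits are configured through MAVEN_OPTS, as the Spark build documentation advises. The settings used here are: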
export MAVEN_OPTS="-Xmx2048m -XX:MetaspaceSize=1024m -XX:MaxMetaspaceSize=1524m -Xss2m"
export PATH=$PATH:$MAVEN_HOME/bin
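On a virtual machine, allocate about 4 GB of memory; in any case it must be more than the maximum heap size set in MAVEN_OPTS. I initially gave the virtual machine only 1 GB, and the build process kept hanging.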
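The memory requirement can be checked before starting a long build. The sketch below is only an illustration under stated assumptions: `xmx_mb` is a hypothetical helper that understands `-Xmx` values with an `m` or `g` suffix, and the comparison reads `MemTotal` from /proc/meminfo, which exists on Linux systems such as the CentOS 7 host used here.

```shell
#!/bin/sh
# Hypothetical helper: print the -Xmx value from an options string in
# megabytes. Fails (returns 1) if no -Xmx with an m/g suffix is present.
xmx_mb() {
    value=$(printf '%s\n' "$1" | tr ' ' '\n' \
        | sed -n 's/^-Xmx\([0-9][0-9]*[gGmM]\)$/\1/p' | tail -n 1)
    case "$value" in
        *[gG]) echo $(( ${value%[gG]} * 1024 )) ;;
        *[mM]) echo "${value%[mM]}" ;;
        *)     return 1 ;;
    esac
}

# Usage: warn when the machine's total memory does not exceed the heap
# limit requested in MAVEN_OPTS. Skipped when MAVEN_OPTS has no -Xmx
# or /proc/meminfo is unavailable.
if [ -r /proc/meminfo ] && heap=$(xmx_mb "${MAVEN_OPTS:-}"); then
    total=$(awk '/^MemTotal:/ {print int($2 / 1024)}' /proc/meminfo)
    if [ "$total" -le "$heap" ]; then
        echo "warning: ${total} MB of RAM is not more than -Xmx (${heap} MB)"
    fi
fi
```

Total memory is only a rough guide, since the operating system and other processes also need room, which is why 4 GB is suggested for a 2 GB heap.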
2) Build
[root@master spark-2.3.1]# mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -Phadoop-provided -Phive -Phive-thriftserver -Pnetlib-lgpl -DskipTests clean package
[INFO] Scanning for projects...
Downloading from central: https://repo.maven.apache.org/maven2/org/apache/apache/18/apache-18.pom
Downloaded from central: https://repo.maven.apache.org/maven2/org/apache/apache/18/apache-18.pom (16 kB at 4.8 kB/s)
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Build Order:
[INFO]
[INFO] Spark Project Parent POM [pom]
[INFO] Spark Project Tags [jar]
[INFO] Spark Project Sketch [jar]
[INFO] Spark Project Local DB [jar]
[INFO] Spark Project Networking [jar]
[INFO] Spark Project Shuffle Streaming Service [jar]
[INFO] Spark Project Unsafe [jar]
[INFO] Spark Project Launcher [jar]
[INFO] Spark Project Core [jar]
[INFO] Spark Project ML Local Library [jar]
[INFO] Spark Project GraphX [jar]
[INFO] Spark Project Streaming [jar]
[INFO] Spark Project Catalyst [jar]
[INFO] Spark Project SQL [jar]
[INFO] Spark Project ML Library [jar]
[INFO] Spark Project Tools [jar]
[INFO] Spark Project Hive [jar]
[INFO] Spark Project REPL [jar]
[INFO] Spark Project YARN Shuffle Service [jar]
[INFO] Spark Project YARN [jar]
[INFO] Spark Project Hive Thrift Server [jar]
[INFO] Spark Project Assembly [pom]
[INFO] Spark Integration for Kafka 0.10 [jar]
[INFO] Kafka 0.10 Source for Structured Streaming [jar]
[INFO] Spark Project Examples [jar]
[INFO] Spark Integration for Kafka 0.10 Assembly [jar]
[INFO]
[INFO] -----------------< org.apache.spark:spark-parent_2.11 >-----------------
3) The build succeeds
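Because the log above is cut off before the final summary, one way to confirm the result is to look for the jars the build produces. This is a hedged sketch: `spark_build_ok` is a hypothetical helper, and the location `assembly/target/scala-2.11/jars` is an assumption based on the Spark 2.x source layout and the `spark-parent_2.11` artifact name in the log.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the Spark source tree at $1 contains
# a built spark-core jar. $2 is the Scala binary version (default 2.11,
# assumed from the spark-parent_2.11 artifact in the build log).
spark_build_ok() {
    jars_dir="$1/assembly/target/scala-${2:-2.11}/jars"
    ls "$jars_dir"/spark-core_*.jar >/dev/null 2>&1
}

# Usage: the path is the working directory used earlier in this post.
if spark_build_ok /home/andy/work/spark-2.3.1; then
    echo "build output found"
else
    echo "no spark-core jar found; check the end of the Maven log"
fi
```

Maven itself also prints a BUILD SUCCESS or BUILD FAILURE line at the end of the run, which remains the primary indicator.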