Download
- Baidu Cloud download: link: https://pan.baidu.com/s/1IvKxR-dx1MgGcaxtEHUVTQ extraction code: 8icm
- Official download: https://archive.apache.org/dist/spark/spark-2.4.2/spark-2.4.2.tgz
Software Environment
Software | Hadoop | Scala | Maven | JDK |
---|---|---|---|---|
Version | 2.6.0-cdh5.7.0 | 2.11.12 | 3.6.1 | jdk1.8.0_45 |
Compile and Configure
1. Extract the Spark source
[hadoop@hadoop614 Demonstration]$ ll spark-2.4.2.tgz
-rw-r--r--. 1 hadoop hadoop 16165557 4月 28 04:41 spark-2.4.2.tgz
[hadoop@hadoop614 Demonstration]$ tar -zxvf spark-2.4.2.tgz
[hadoop@hadoop614 Demonstration]$ cd spark-2.4.2
2. Hard-code the version numbers so the script does not have to query Maven for them at build time
[hadoop@hadoop614 spark-2.4.2]$ vim dev/make-distribution.sh
**Change this:**
VERSION=$("$MVN" help:evaluate -Dexpression=project.version $@ 2>/dev/null\
| grep -v "INFO"\
| grep -v "WARNING"\
| tail -n 1)
SCALA_VERSION=$("$MVN" help:evaluate -Dexpression=scala.binary.version $@ 2>/dev/null\
| grep -v "INFO"\
| grep -v "WARNING"\
| tail -n 1)
SPARK_HADOOP_VERSION=$("$MVN" help:evaluate -Dexpression=hadoop.version $@ 2>/dev/null\
| grep -v "INFO"\
| grep -v "WARNING"\
| tail -n 1)
SPARK_HIVE=$("$MVN" help:evaluate -Dexpression=project.activeProfiles -pl sql/hive $@ 2>/dev/null\
| grep -v "INFO"\
| grep -v "WARNING"\
| fgrep --count "<id>hive</id>";\
# Reset exit status to 0, otherwise the script stops here if the last grep finds nothing\
# because we use "set -o pipefail"
echo -n)
**To this:**
VERSION=2.4.2
SCALA_VERSION=2.11
SPARK_HADOOP_VERSION=2.6.0-cdh5.7.0
SPARK_HIVE=1
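If you prefer to script this edit rather than open vim, a sed sketch is shown below. Note that in the real make-distribution.sh each of these assignments spans several lines (the multi-line `$("$MVN" ...)` pipelines above), so a line-based sed is only safe after you have collapsed them or on a simplified copy; the demo here runs against a stand-in file at a path of my choosing (`/tmp/make-distribution-demo.sh`) purely to illustrate the target values.

```shell
# Sketch: hard-code the four values with sed instead of vim.
# Demonstrated on a throwaway one-line-per-assignment copy; in a real
# checkout the multi-line pipelines make a manual edit the safer choice.
cat > /tmp/make-distribution-demo.sh <<'EOF'
VERSION=$("$MVN" help:evaluate -Dexpression=project.version)
SCALA_VERSION=$("$MVN" help:evaluate -Dexpression=scala.binary.version)
SPARK_HADOOP_VERSION=$("$MVN" help:evaluate -Dexpression=hadoop.version)
SPARK_HIVE=$("$MVN" help:evaluate -Dexpression=project.activeProfiles)
EOF

sed -i \
  -e 's|^VERSION=.*|VERSION=2.4.2|' \
  -e 's|^SCALA_VERSION=.*|SCALA_VERSION=2.11|' \
  -e 's|^SPARK_HADOOP_VERSION=.*|SPARK_HADOOP_VERSION=2.6.0-cdh5.7.0|' \
  -e 's|^SPARK_HIVE=.*|SPARK_HIVE=1|' \
  /tmp/make-distribution-demo.sh

# Show the result: all four assignments are now fixed values.
cat /tmp/make-distribution-demo.sh
```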
3. Modify the pom file
- Add the following inside the <repositories> </repositories> block. The <id>central</id> repository entry must remain in the first position.
[hadoop@hadoop614 spark-2.4.2]$ vim pom.xml
<repositories>
.......
<repository>
<id>cloudera</id>
<url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
</repository>
</repositories>
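After editing, it is worth confirming that the central repository is still listed first. The sketch below runs against a minimal stand-in pom written to `/tmp/pom-demo.xml` (a path and mirror URL of my choosing, not from the real Spark pom) so it can be tried anywhere; point the same grep at your actual pom.xml in practice.

```shell
# Sketch: check repository ordering after the edit.
# The first <id> inside <repositories> should still be "central".
cat > /tmp/pom-demo.xml <<'EOF'
<repositories>
  <repository>
    <id>central</id>
    <url>https://maven.aliyun.com/nexus/content/groups/public</url>
  </repository>
  <repository>
    <id>cloudera</id>
    <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
  </repository>
</repositories>
EOF

# Print the first repository id; expect <id>central</id>.
grep -o '<id>[^<]*</id>' /tmp/pom-demo.xml | head -n 1
```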
If the following error appears during compilation, resolve it as described below:
[ERROR] Plugin org.codehaus.mojo:build-helper-maven-plugin:3.0.0 or one of its dependencies could not be resolved: Failed to read artifact descriptor for org.codehaus.mojo:build-helper-maven-plugin:jar:3.0.0: Could not transfer artifact org.codehaus.mojo:build-helper-maven-plugin:pom:3.0.0 from/to central (http://maven.aliyun.com/nexus/content/groups/public): maven.aliyun.com:80 failed to respond -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/PluginResolutionException
Add the following configuration to the pom file:
<dependency>
<groupId>org.codehaus.mojo</groupId>
<artifactId>build-helper-maven-plugin</artifactId>
<version>3.0.0</version>
</dependency>
4. Build command
Compilation takes quite a while. I used an Aliyun private Maven repository, so the whole download-and-build process took about 40 minutes.
[hadoop@hadoop614 spark-2.4.2]$ pwd
/home/hadoop/Demonstration/spark-2.4.2
[hadoop@hadoop614 spark-2.4.2]$ ./dev/make-distribution.sh --name 2.6.0-cdh5.7.0 --tgz -Phadoop-2.6 -Dhadoop.version=2.6.0-cdh5.7.0 -Phive -Phive-thriftserver -Pyarn -Pkubernetes
Extract and Deploy
- Extract
[hadoop@hadoop614 spark-2.4.2]$ ll spark-2.4.2-bin-2.6.0-cdh5.7.0.tgz
-rw-rw-r--. 1 hadoop hadoop 231193116 4月 28 06:32 spark-2.4.2-bin-2.6.0-cdh5.7.0.tgz
[hadoop@hadoop614 spark-2.4.2]$ pwd
/home/hadoop/Demonstration/spark-2.4.2
[hadoop@hadoop614 spark-2.4.2]$ tar -zxvf spark-2.4.2-bin-2.6.0-cdh5.7.0.tgz -C ~/app
[hadoop@hadoop614 spark-2.4.2]$ cd ~/app
[hadoop@hadoop614 app]$ ls -ld spark-2.4.2-bin-2.6.0-cdh5.7.0/
drwxrwxr-x. 11 hadoop hadoop 4096 4月 28 06:31 spark-2.4.2-bin-2.6.0-cdh5.7.0/
- Configure environment variables
[hadoop@hadoop614 app]$ vim ~/.bash_profile
export SPARK_HOME=/home/hadoop/app/spark-2.4.2-bin-2.6.0-cdh5.7.0
export PATH=${SPARK_HOME}/bin:$PATH
[hadoop@hadoop614 app]$ source ~/.bash_profile
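A quick way to confirm the PATH change took effect is to check that the Spark bin directory is now the first PATH entry, so its spark-shell wins over any other copy on the machine. The sketch below uses a stand-in directory (`/tmp/spark-demo`, my placeholder, not the real install path) so it can be run anywhere; substitute your actual SPARK_HOME.

```shell
# Sketch: verify the PATH prepend from ~/.bash_profile.
# Stand-in SPARK_HOME so the check is self-contained.
export SPARK_HOME=/tmp/spark-demo
mkdir -p "$SPARK_HOME/bin"
export PATH=${SPARK_HOME}/bin:$PATH

# The first PATH entry should now be the Spark bin directory.
echo "$PATH" | cut -d: -f1   # → /tmp/spark-demo/bin
```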
Start Spark
[hadoop@hadoop614 app]$ spark-shell
19/04/28 06:44:16 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://hadoop614:4040
Spark context available as 'sc' (master = local[*], app id = local-1556405067469).
Spark session available as 'spark'.
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 2.4.2
/_/
Using Scala version 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_45)
Type in expressions to have them evaluated.
Type :help for more information.
scala>