Download
- Baidu Cloud download: link: https://pan.baidu.com/s/1IvKxR-dx1MgGcaxtEHUVTQ extraction code: 8icm
- Official download: https://archive.apache.org/dist/spark/spark-2.4.2/spark-2.4.2.tgz
Software Environment
Software | Hadoop | Scala | Maven | JDK |
---|---|---|---|---|
Version | 2.6.0-cdh5.7.0 | 2.11.12 | 3.6.1 | jdk1.8.0_45 |
Compile and Configure
1. Extract the Spark source
[hadoop@hadoop614 Demonstration]$ ll spark-2.4.2.tgz
-rw-r--r--. 1 hadoop hadoop 16165557 4月 28 04:41 spark-2.4.2.tgz
[hadoop@hadoop614 Demonstration]$ tar -zxvf spark-2.4.2.tgz
[hadoop@hadoop614 Demonstration]$ cd spark-2.4.2
2. Hard-code the version numbers so the script does not have to query Maven for them at build time
[hadoop@hadoop614 spark-2.4.2]$ vim dev/make-distribution.sh
**Change this:**
VERSION=$("$MVN" help:evaluate -Dexpression=project.version $@ 2>/dev/null\
| grep -v "INFO"\
| grep -v "WARNING"\
| tail -n 1)
SCALA_VERSION=$("$MVN" help:evaluate -Dexpression=scala.binary.version $@ 2>/dev/null\
| grep -v "INFO"\
| grep -v "WARNING"\
| tail -n 1)
SPARK_HADOOP_VERSION=$("$MVN" help:evaluate -Dexpression=hadoop.version $@ 2>/dev/null\
| grep -v "INFO"\
| grep -v "WARNING"\
| tail -n 1)
SPARK_HIVE=$("$MVN" help:evaluate -Dexpression=project.activeProfiles -pl sql/hive $@ 2>/dev/null\
| grep -v "INFO"\
| grep -v "WARNING"\
| fgrep --count "<id>hive</id>";\
# Reset exit status to 0, otherwise the script stops here if the last grep finds nothing\
# because we use "set -o pipefail"
echo -n)
**To this:**
VERSION=2.4.2
SCALA_VERSION=2.11
SPARK_HADOOP_VERSION=2.6.0-cdh5.7.0
SPARK_HIVE=1
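If you prefer to script this edit rather than open vim, a sed sketch is shown below. Note that in the real make-distribution.sh each of these assignments spans several lines (the multi-line `$("$MVN" ...)` pipelines above), so a line-based sed is only safe after you have collapsed them or on a simplified copy; the demo here runs against a stand-in file at a path of my choosing (`/tmp/make-distribution-demo.sh`) purely to illustrate the target values.

```shell
# Sketch: hard-code the four values with sed instead of vim.
# Demonstrated on a throwaway one-line-per-assignment copy; in a real
# checkout the multi-line pipelines make a manual edit the safer choice.
cat > /tmp/make-distribution-demo.sh <<'EOF'
VERSION=$("$MVN" help:evaluate -Dexpression=project.version)
SCALA_VERSION=$("$MVN" help:evaluate -Dexpression=scala.binary.version)
SPARK_HADOOP_VERSION=$("$MVN" help:evaluate -Dexpression=hadoop.version)
SPARK_HIVE=$("$MVN" help:evaluate -Dexpression=project.activeProfiles)
EOF

sed -i \
  -e 's|^VERSION=.*|VERSION=2.4.2|' \
  -e 's|^SCALA_VERSION=.*|SCALA_VERSION=2.11|' \
  -e 's|^SPARK_HADOOP_VERSION=.*|SPARK_HADOOP_VERSION=2.6.0-cdh5.7.0|' \
  -e 's|^SPARK_HIVE=.*|SPARK_HIVE=1|' \
  /tmp/make-distribution-demo.sh

# Show the result: all four assignments are now fixed values.
cat /tmp/make-distribution-demo.sh
```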
3. Modify the pom file
- Add the following inside the <repositories> </repositories> block. The <id>central</id> repository entry must remain in the first position.
[hadoop@hadoop614 spark-2.4.2]$ vim pom.xml
<repositories>
.......
<repository>
<id>cloudera</id>
<url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
</repository>
</repositories>
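After editing, it is worth confirming that the central repository is still listed first. The sketch below runs against a minimal stand-in pom written to `/tmp/pom-demo.xml` (a path and mirror URL of my choosing, not from the real Spark pom) so it can be tried anywhere; point the same grep at your actual pom.xml in practice.

```shell
# Sketch: check repository ordering after the edit.
# The first <id> inside <repositories> should still be "central".
cat > /tmp/pom-demo.xml <<'EOF'
<repositories>
  <repository>
    <id>central</id>
    <url>https://maven.aliyun.com/nexus/content/groups/public</url>
  </repository>
  <repository>
    <id>cloudera</id>
    <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
  </repository>
</repositories>
EOF

# Print the first repository id; expect <id>central</id>.
grep -o '<id>[^<]*</id>' /tmp/pom-demo.xml | head -n 1
```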
If the following error appears during compilation, resolve it as described below:
[ERROR] Plugin org.codehaus.mojo:build-helper-maven-plugin:3.0.0 or one of its dependencies could not be resolved: Failed to read artifact descriptor for org.codehaus.mojo:build-helper-maven-plugin:jar:3.0.0: Could not transfer artifact org.codehaus.mojo:build-helper-maven-plugin:pom:3.0.0 from/to central (http://maven.aliyun.com/nexus/content/groups/public): maven.aliyun.com:80 failed to respond -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/PluginResolutionException
Add the following configuration to the pom file:
<dependency>
<groupId>org.codehaus.mojo</groupId>
<artifactId>build-helper-maven-plugin</artifactId>
<version>3.0.0</version>
</dependency>
4. Build command
Compilation takes quite a while. I used an Aliyun private Maven repository, so the whole download-and-build process took about 40 minutes.
[hadoop@hadoop614 spark-2.4.2]$ pwd
/home/hadoop/Demonstration/spark-2.4.2
[hadoop@hadoop614 spark-2.4.2]$ ./dev/make-distribution.sh --name 2.6.0-cdh5.7.0 --tgz -Phadoop-2.6 -Dhadoop.version=2.6.0-cdh5.7.0 -Phive -Phive-thriftserver -Pyarn -Pkubernetes
Extract and Deploy
- Extract
[hadoop@hadoop614 spark-2.4.2]$ ll spark-2.4.2-bin-2.6.0-cdh5.7.0.tgz
-rw-rw-r--. 1 hadoop hadoop 231193116 4月 28 06:32 spark-2.4.2-bin-2.6.0-cdh5.7.0.tgz
[hadoop@hadoop614 spark-2.4.2]$ pwd
/home/hadoop/Demonstration/spark-2.4.2
[hadoop@hadoop614 spark-2.4.2]$ tar -zxvf spark-2.4.2-bin-2.6.0-cdh5.7.0.tgz -C ~/app
[hadoop@hadoop614 spark-2.4.2]$ cd ~/app
[hadoop@hadoop614 app]$ ls -ld spark-2.4.2-bin-2.6.0-cdh5.7.0/
drwxrwxr-x. 11 hadoop hadoop 4096 4月 28 06:31 spark-2.4.2-bin-2.6.0-cdh5.7.0/
- Configure environment variables
[hadoop@hadoop614 app]$ vim ~/.bash_profile
export SPARK_HOME=/home/hadoop/app/spark-2.4.2-bin-2.6.0-cdh5.7.0
export PATH=${SPARK_HOME}/bin:$PATH
[hadoop@hadoop614 app]$ source ~/.bash_profile
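A quick way to confirm the PATH change took effect is to check that the Spark bin directory is now the first PATH entry, so its spark-shell wins over any other copy on the machine. The sketch below uses a stand-in directory (`/tmp/spark-demo`, my placeholder, not the real install path) so it can be run anywhere; substitute your actual SPARK_HOME.

```shell
# Sketch: verify the PATH prepend from ~/.bash_profile.
# Stand-in SPARK_HOME so the check is self-contained.
export SPARK_HOME=/tmp/spark-demo
mkdir -p "$SPARK_HOME/bin"
export PATH=${SPARK_HOME}/bin:$PATH

# The first PATH entry should now be the Spark bin directory.
echo "$PATH" | cut -d: -f1   # → /tmp/spark-demo/bin
```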
Start Spark
[hadoop@hadoop614 app]$ spark-shell
19/04/28 06:44:16 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://hadoop614:4040
Spark context available as 'sc' (master = local[*], app id = local-1556405067469).
Spark session available as 'spark'.
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 2.4.2
/_/
Using Scala version 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_45)
Type in expressions to have them evaluated.
Type :help for more information.
scala>