Integrating Spark2 with HDP 2.4
Integration steps
1. Download the Spark 2.3 package from the official site: http://spark.apache.org/downloads.html
2. Upload the Spark 2.3 package to each machine where it will be installed, then unpack and rename it:
cd /usr/hdp/2.4.0.0-169
tar -zxvf spark-2.3.0-bin-hadoop2.7.tgz
mv spark-2.3.0-bin-hadoop2.7 spark2
3. Change the owner and group of the spark2 directory:
chown -R root:root spark2
4. Create symlinks under /usr/hdp/current pointing to the actual spark2 directory:
ln -s /usr/hdp/2.4.0.0-169/spark2 /usr/hdp/current/spark2-client
ln -s /usr/hdp/2.4.0.0-169/spark2 /usr/hdp/current/spark2-historyserver
ln -s /usr/hdp/2.4.0.0-169/spark2 /usr/hdp/current/spark2-thriftserver
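Note that ln -s takes the target first and the link name second. A throwaway demonstration in a temp directory (the real links live under /usr/hdp):

```shell
# ln -s argument order: target first, link name second.
tmp=$(mktemp -d)
mkdir "$tmp/spark2"                       # stands in for the real spark2 dir
ln -s "$tmp/spark2" "$tmp/spark2-client"  # link named spark2-client -> spark2
readlink "$tmp/spark2-client"             # prints the spark2 target path
rm -rf "$tmp"
```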
5. Enter the spark2 directory and create the configuration files under conf from their templates:
cd conf
cp spark-env.sh.template spark-env.sh
cp spark-defaults.conf.template spark-defaults.conf
6. Edit spark-env.sh (vi spark-env.sh) and append the following at the end of the file:
# Alternate conf dir. (Default: ${SPARK_HOME}/conf)
export SPARK_CONF_DIR=${SPARK_CONF_DIR:-/usr/hdp/current/spark2-historyserver/conf}
# Where log files are stored.(Default:${SPARK_HOME}/logs)
#export SPARK_LOG_DIR=${SPARK_HOME:-/usr/hdp/current/spark2-historyserver}/logs
export SPARK_LOG_DIR=/var/log/spark2
# Where the pid file is stored. (Default: /tmp)
export SPARK_PID_DIR=/var/run/spark2
#Memory for Master, Worker and history server (default: 1024MB)
export SPARK_DAEMON_MEMORY=1024m
# A string representing this instance of spark.(Default: $USER)
SPARK_IDENT_STRING=$USER
# The scheduling priority for daemons. (Default: 0)
SPARK_NICENESS=0
export HADOOP_HOME=${HADOOP_HOME:-/usr/hdp/current/hadoop-client}
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-/usr/hdp/current/hadoop-client/conf}
# The java implementation to use.
export JAVA_HOME=/usr/jdk64/jdk1.8.0_60
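Several lines above rely on the shell's ${VAR:-default} parameter expansion: if the variable is unset or empty, the value after ":-" is used instead, so an existing environment setting always wins over the baked-in default. A quick illustration:

```shell
# If SPARK_CONF_DIR is unset, the default after ':-' is used.
unset SPARK_CONF_DIR
echo "${SPARK_CONF_DIR:-/usr/hdp/current/spark2-historyserver/conf}"
# -> /usr/hdp/current/spark2-historyserver/conf

# If it is already set, the existing value is kept.
SPARK_CONF_DIR=/etc/spark2/conf
echo "${SPARK_CONF_DIR:-/usr/hdp/current/spark2-historyserver/conf}"
# -> /etc/spark2/conf
```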
7. Edit spark-defaults.conf (vi spark-defaults.conf) and append the following at the end of the file:
spark.driver.extraLibraryPath /usr/hdp/current/hadoop-client/lib/native
spark.executor.extraLibraryPath /usr/hdp/current/hadoop-client/lib/native
spark.eventLog.dir hdfs:///spark2-history
spark.eventLog.enabled true
# Required: setting this parameter to 'false' turns off ATS timeline server for Spark
spark.hadoop.yarn.timeline-service.enabled false
spark.driver.extraJavaOptions -Dhdp.version=2.4.0.0-169
spark.yarn.am.extraJavaOptions -Dhdp.version=2.4.0.0-169
spark.history.fs.logDirectory hdfs:///spark2-history
#spark.history.kerberos.keytab none
#spark.history.kerberos.principal none
#spark.history.provider org.apache.spark.deploy.history.FsHistoryProvider
#spark.history.ui.port 18080
spark.yarn.containerLauncherMaxThreads 25
spark.yarn.driver.memoryOverhead 200
spark.yarn.executor.memoryOverhead 200
#spark.yarn.historyServer.address sandbox.hortonworks.com:18080
spark.yarn.max.executor.failures 3
spark.yarn.preserve.staging.files false
spark.yarn.queue default
spark.yarn.scheduler.heartbeat.interval-ms 5000
spark.yarn.submit.file.replication 3
spark.ui.port 4041
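The spark.eventLog.dir and spark.history.fs.logDirectory settings above both point at hdfs:///spark2-history, which must exist on HDFS before jobs are submitted. A hedged setup sketch (the spark:hadoop owner/group is an assumption; adjust for your cluster):

```shell
# Create the event-log directory on HDFS, run as the hdfs superuser.
# The spark:hadoop owner/group is an assumption; adjust as needed.
sudo -u hdfs hdfs dfs -mkdir -p /spark2-history
sudo -u hdfs hdfs dfs -chown -R spark:hadoop /spark2-history
sudo -u hdfs hdfs dfs -chmod -R 1777 /spark2-history
```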
8. In the Ambari UI, adjust the following YARN parameters:
yarn.scheduler.maximum-allocation-mb = 2500MB
yarn.nodemanager.resource.memory-mb = 7800MB
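These YARN limits interact with the Spark memory settings: each container request is the executor (or driver) memory plus its memoryOverhead, and must fit under yarn.scheduler.maximum-allocation-mb. A quick sanity check using the values in this guide:

```shell
# Container request = executor memory + memoryOverhead; it must fit
# under yarn.scheduler.maximum-allocation-mb (YARN may round up further).
executor_mb=512      # matches --executor-memory in the test jobs
overhead_mb=200      # matches spark.yarn.executor.memoryOverhead
max_alloc_mb=2500    # yarn.scheduler.maximum-allocation-mb
request_mb=$((executor_mb + overhead_mb))
echo "$request_mb"   # -> 712
[ "$request_mb" -le "$max_alloc_mb" ] && echo "fits under the YARN max"
```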
9. Test the Spark2 integration by submitting example jobs:
export SPARK_MAJOR_VERSION=2
./bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn \
--deploy-mode client \
--num-executors 3 \
--driver-memory 512m \
--executor-memory 512m \
--executor-cores 1 \
examples/jars/spark-examples*.jar 10
./bin/spark-submit \
--class org.apache.spark.examples.SparkTC \
--master yarn \
--deploy-mode client \
--num-executors 3 \
--driver-memory 512m \
--executor-memory 512m \
--executor-cores 1 \
examples/jars/spark-examples*.jar 10
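The export SPARK_MAJOR_VERSION=2 above matters because HDP ships wrapper scripts in /usr/bin that dispatch to either the Spark 1 or Spark 2 install based on that variable. A simplified sketch of the selection logic (assumed behavior, not the actual wrapper source):

```shell
# Simplified sketch of HDP's version dispatch; the real logic lives in
# the /usr/bin/spark-submit wrapper and may differ in detail.
pick_spark_home() {
  if [ "${SPARK_MAJOR_VERSION:-1}" = "2" ]; then
    echo /usr/hdp/current/spark2-client
  else
    echo /usr/hdp/current/spark-client
  fi
}
SPARK_MAJOR_VERSION=2
pick_spark_home   # -> /usr/hdp/current/spark2-client
```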
10. Check the job run status in the YARN UI from the Ambari page.
PS:
If you hit HDFS write-permission errors, submit as a user that has write access, or disable HDFS permission checking:
dfs.permissions.enabled=false
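Disabling permission checking affects the whole cluster, so a less invasive fix is to give the submitting user a writable HDFS home directory. A sketch ("myuser" is a placeholder account name):

```shell
# Safer alternative: create an HDFS home dir for the submitting user.
# "myuser" is a placeholder; substitute the actual account.
sudo -u hdfs hdfs dfs -mkdir -p /user/myuser
sudo -u hdfs hdfs dfs -chown myuser:myuser /user/myuser
```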
Over
2018.6.11