Spark 1.5.2 on Hadoop 2.4.0: Installation and Configuration


I. Preface

This article briefly describes the YARN-side configuration and focuses on installing Spark 1.5.2 in yarn mode on an existing Hadoop 2.4.0 cluster; it is provided for reference only.

II. YARN Configuration

1. yarn-site.xml

Add the following properties to yarn-site.xml (spark_shuffle registers Spark's external shuffle service as a NodeManager auxiliary service):

    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle,spark_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
        <value>org.apache.spark.network.yarn.YarnShuffleService</value>
    </property>

If yarn.log-aggregation-enable is set to true, be sure to also configure yarn.log.server.url; otherwise the logs of completed Spark applications cannot be viewed from the history server:

    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.log.server.url</name>
        <value>http://namenode1:19888/jobhistory/logs</value>
    </property>
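With aggregation enabled, the logs of a finished application can also be pulled from the command line with the standard YARN CLI; the application id below is a placeholder, taken from the ResourceManager UI or from yarn application -list:

    yarn logs -applicationId <application_id>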

 

2. spark-1.5.2-yarn-shuffle.jar

Copy spark-1.5.2-yarn-shuffle.jar onto the classpath of every NodeManager:
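A minimal sketch of the copy step, assuming the jar ships in $SPARK_HOME/lib of the Spark 1.5.2 binary distribution; nm1, nm2 and nm3 are placeholder hostnames, and share/hadoop/yarn/lib is one directory that is already on the NodeManager classpath:

    # Push the shuffle-service jar to every NodeManager host (hostnames are placeholders)
    for host in nm1 nm2 nm3; do
        scp "$SPARK_HOME/lib/spark-1.5.2-yarn-shuffle.jar" \
            "$host:/home/bigdata/software/hadoop/share/hadoop/yarn/lib/"
    done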

3. Restart all NodeManagers
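On each NodeManager host this can be done with the standard Hadoop daemon scripts, for example:

    # Restart the NodeManager so it picks up the new aux-service and jar
    $HADOOP_HOME/sbin/yarn-daemon.sh stop nodemanager
    $HADOOP_HOME/sbin/yarn-daemon.sh start nodemanager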

III. Spark

1. Install the Hadoop client

Add the following to .bashrc:

JAVA_HOME=/home/spark/software/java
HADOOP_HOME=/home/bigdata/software/hadoop
SPARK_HOME=/home/spark/software/spark
R_HOME=/home/spark/software/R
PATH=$R_HOME/bin:$JAVA_HOME/bin:$HADOOP_HOME/bin:$SPARK_HOME/bin:$SPARK_HOME/sbin:$PATH
export LANG JAVA_HOME HADOOP_HOME SPARK_HOME R_HOME PATH CLASSPATH

export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=/home/yarn/software/hadoop/etc/hadoop
export HADOOP_COMMON_LIB_NATIVE_DIR=/home/bigdata/software/hadoop/lib/native/Linux-amd64-64
export HADOOP_CLIENT_OPTS="-Djava.library.path=$HADOOP_COMMON_LIB_NATIVE_DIR"
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HADOOP_COMMON_LIB_NATIVE_DIR
export PATH="/home/spark/software/anaconda/bin:$PATH"
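After sourcing the file, a quick check that the client tools resolve to the intended versions:

    source ~/.bashrc
    hadoop version          # should report Hadoop 2.4.0
    spark-submit --version  # should report Spark 1.5.2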

2. Configure spark-defaults.conf

# This is useful for setting default environmental settings.
spark.serializer                    org.apache.spark.serializer.KryoSerializer
spark.eventLog.enabled              true
spark.eventLog.dir                  hdfs:///sparklogs/sparklogshistorylogpre
spark.driver.cores                  1
spark.driver.memory                 4096m

# Tuning parameters
spark.shuffle.consolidateFiles      true
spark.sql.shuffle.partitions        40
spark.default.parallelism           20
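The spark_shuffle aux-service registered in yarn-site.xml above only takes effect if Spark is told to use the external shuffle service. If dynamic executor allocation is wanted as well, the following additional spark-defaults.conf entries would enable it; the executor bounds here are illustrative values, not part of the original configuration:

    spark.shuffle.service.enabled         true
    spark.dynamicAllocation.enabled       true
    spark.dynamicAllocation.minExecutors  2
    spark.dynamicAllocation.maxExecutors  20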


3. Configure spark-env.sh

export SPARK_LOCAL_DIRS=/data/spark/sparkLocalDir
export HADOOP_CONF_DIR=/home/bigdata/software/hadoop/etc/hadoop
export SPARK_LIBRARY_PATH=$SPARK_LIBRARY_PATH:/home/bigdata/software/hadoop/lib/native/Linux-amd64-64/
export SPARK_HISTORY_OPTS="-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/home/spark/spark/logs/historyserver.hprof \
  -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/home/spark/spark/logs/historyserver.gc.log \
  -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=512M \
  -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=19229 \
  -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false \
  -Dspark.history.fs.logDirectory=hdfs://SuningHadoop2/sparklogs/sparklogshistorylogpre \
  -Dspark.history.ui.port=8078"

4. Start the HistoryServer

start-history-server.sh
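With the SPARK_HISTORY_OPTS set above, the history web UI listens on port 8078; a quick smoke test from the history server host:

    # Should return the history server's HTML front page
    curl -s http://localhost:8078 | head -n 5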


IV. Test Script

The script below drives the spark-terasort benchmark end to end: TeraGen generates the input, TeraSort sorts it (this stage is timed), and TeraValidate checks the result.

#!/bin/bash
yarnmode=yarn-cluster
mytime=`date '+%s'`
datasizeG=100G
inputfiles=/user/spark/TeraGen$mytime
outputfiles=/user/spark/TeraSort$mytime
Validatefiles=/user/spark/TeraValidate$mytime
numexecutors=20
executormemory=8G
export HADOOP_USER_NAME=spark
sparkqueue=spark
sparkconf="spark.default.parallelism=2000"
classTeraGen=com.github.ehiggs.spark.terasort.TeraGen
classTeraSort=com.github.ehiggs.spark.terasort.TeraSort
classTeraValidate=com.github.ehiggs.spark.terasort.TeraValidate
sourcejar=/home/spark/workspace/tersort/spark-terasort-1.0-SNAPSHOT-jar-with-dependencies.jar

# Generate $datasizeG of input data.
spark-submit --master $yarnmode                \
             --num-executors $numexecutors     \
             --executor-memory $executormemory \
             --queue $sparkqueue               \
             --conf $sparkconf                 \
             --class $classTeraGen $sourcejar $datasizeG $inputfiles

# Sort the data, timing only this stage.
beginTime=`date '+%s'`
spark-submit --master $yarnmode                \
             --queue $sparkqueue               \
             --executor-memory $executormemory \
             --num-executors $numexecutors     \
             --conf $sparkconf                 \
             --class $classTeraSort $sourcejar $inputfiles $outputfiles
endTime=`date '+%s'`
elapsedtime=$((endTime-beginTime))

# Check that the output is globally sorted.
spark-submit --master $yarnmode                \
             --executor-memory $executormemory \
             --queue $sparkqueue               \
             --num-executors $numexecutors     \
             --conf $sparkconf                 \
             --class $classTeraValidate $sourcejar $outputfiles $Validatefiles

echo "TeraSort time: ${elapsedtime}s"

# Clean up the benchmark data.
hadoop fs -rm -r $inputfiles
hadoop fs -rm -r $outputfiles
hadoop fs -rm -r $Validatefiles
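To run the whole benchmark, save the script under any name (terasort_bench.sh below is an assumption) and capture the output for later inspection:

    chmod +x terasort_bench.sh
    ./terasort_bench.sh 2>&1 | tee terasort_bench.log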
