1. Install Scala
Work on hadoop1 first. Download and extract the package scala-2.11.8.tgz, then set the Scala environment variables:
export SCALA_HOME=/home/hadoop273/spark/scala-2.11.8
export PATH=$PATH:$SCALA_HOME/bin
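A quick way to confirm that the Scala bin directory really ended up as an entry on PATH is sketched below. The helper name path_contains is made up for this illustration; the SCALA_HOME default is the path used in this guide and should be adjusted to your own layout.

```shell
# Return 0 if directory $1 is one of the colon-separated entries in $2
# (defaults to the current PATH). path_contains is a hypothetical helper name.
path_contains() {
  case ":${2:-$PATH}:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Path taken from this guide; adjust it to your own installation.
SCALA_HOME=${SCALA_HOME:-/home/hadoop273/spark/scala-2.11.8}

if path_contains "$SCALA_HOME/bin"; then
  echo "Scala bin directory is on PATH"
else
  echo "Scala bin directory is NOT on PATH" >&2
fi
```

The check compares whole PATH entries, so a PATH that contains only $SCALA_HOME itself and /bin as two separate entries is reported as missing the Scala bin directory.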
2. Install Spark
Download and extract the Spark package spark-1.6.2-bin-without-hadoop.tgz, then set the Spark environment variables:
export SPARK_HOME=/home/hadoop273/spark/spark-1.6.2-bin-without-hadoop
export PATH=$PATH:$SPARK_HOME/bin
3. Configure spark-env.sh
export JAVA_HOME=/data/java/jdk1.7.0_79
export SCALA_HOME=/home/hadoop273/spark/scala-2.11.8
export HADOOP_HOME=/home/hadoop273/hadoop/hadoop-2.7.3
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
4. Edit the slaves file
hadoop2
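5. Distribute the installation packages
Copy the Scala and Spark installation directories to the same paths on hadoop2, and set the Scala and Spark environment variables on hadoop2 as well.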
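One possible way to do the copy is with rsync over SSH. The sketch below is only an illustration: the helper names run and distribute are made up here, it assumes passwordless SSH from hadoop1 to hadoop2 and that the parent directory already exists on hadoop2, and it defaults to a dry run that only prints the commands.

```shell
# DRY_RUN=1 (the default here) only prints the commands; set DRY_RUN=0 to run them.
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Copy each given directory to the same absolute path on the target host.
# Assumes the parent directory already exists on the target.
distribute() {
  host=$1
  shift
  for dir in "$@"; do
    run rsync -a "$dir/" "$host:$dir/"
  done
}

# Paths taken from this guide; adjust them to your own layout.
SCALA_HOME=${SCALA_HOME:-/home/hadoop273/spark/scala-2.11.8}
SPARK_HOME=${SPARK_HOME:-/home/hadoop273/spark/spark-1.6.2-bin-without-hadoop}

distribute hadoop2 "$SCALA_HOME" "$SPARK_HOME"
```

This only copies the files. The export lines for SCALA_HOME, SPARK_HOME and PATH still have to be added to the shell profile on hadoop2 by hand.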
6. Start HDFS
start-dfs.sh
7. Start Spark
sbin/start-all.sh
This step failed with the following error:
Spark Command: /data/java/jdk1.7.0_79/bin/java -cp /home/hadoop273/spark/spark-1.6.2-bin-without-hadoop/conf/:/home/hadoop273/spark/spark-1.6.2-bin-without-hadoop/lib/spark-assembly-1.6.2-hadoop2.2.0.jar:/home/hadoop273/hadoop/hadoop-2.7.3/etc/hadoop/ -Xms1g -Xmx1g -XX:MaxPermSize=256m org.apache.spark.deploy.master.Master --ip hadoop1 --port 7077 --webui-port 8080
========================================
Exception in thread "main" java.lang.NoClassDefFoundError: org/slf4j/Logger
at java.lang.Class.getDeclaredMethods0(Native Method)
at java.lang.Class.privateGetDeclaredMethods(Class.java:2615)
at java.lang.Class.getMethod0(Class.java:2856)
at java.lang.Class.getMethod(Class.java:1668)
at sun.launcher.LauncherHelper.getMainMethod(LauncherHelper.java:494)
at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:486)
Caused by: java.lang.ClassNotFoundException: org.slf4j.Logger
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 6 more
The following fix was found on Stack Overflow:
An easy fix could be use the classpath from hadoop classpath command:
export SPARK_DIST_CLASSPATH=$(hadoop classpath)
After this change, the startup log looked like this:
starting org.apache.spark.deploy.master.Master, logging to /home/hadoop273/spark/spark-1.6.2-bin-without-hadoop/logs/spark-hadoop273-org.apache.spark.deploy.master.Master-1-hadoop1.out
hadoop1: /home/hadoop273/spark/spark-1.6.2-bin-without-hadoop/conf/spark-env.sh: line 74: hadoop: command not found
hadoop1: starting org.apache.spark.deploy.worker.Worker, logging to /home/hadoop273/spark/spark-1.6.2-bin-without-hadoop/logs/spark-hadoop273-org.apache.spark.deploy.worker.Worker-1-hadoop1.out
hadoop1: failed to launch org.apache.spark.deploy.worker.Worker:
hadoop1: at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
hadoop1: ... 6 more
hadoop1: full log in /home/hadoop273/spark/spark-1.6.2-bin-without-hadoop/logs/spark-hadoop273-org.apache.spark.deploy.worker.Worker-1-hadoop1.out
The log shows that the hadoop command was not found, so the line has to call hadoop by its full path. The corrected line is:
export SPARK_DIST_CLASSPATH=$(/home/hadoop273/hadoop/hadoop-2.7.3/bin/hadoop classpath)
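A slightly more defensive variant for spark-env.sh is sketched below. It builds the full path from HADOOP_HOME instead of hard-coding it, and prints a clear message when the hadoop launcher is missing. The function name set_spark_dist_classpath is made up for this illustration; the HADOOP_HOME default is the path used in this guide.

```shell
# Set SPARK_DIST_CLASSPATH from "<hadoop home>/bin/hadoop classpath".
# $1 is the Hadoop installation directory. Returns 1 if the launcher is missing.
# set_spark_dist_classpath is a hypothetical helper name.
set_spark_dist_classpath() {
  hadoop_bin="$1/bin/hadoop"
  if [ ! -x "$hadoop_bin" ]; then
    echo "spark-env.sh: $hadoop_bin not found or not executable" >&2
    return 1
  fi
  SPARK_DIST_CLASSPATH=$("$hadoop_bin" classpath)
  export SPARK_DIST_CLASSPATH
}

# Path taken from this guide; adjust it to your own installation.
HADOOP_HOME=${HADOOP_HOME:-/home/hadoop273/hadoop/hadoop-2.7.3}

set_spark_dist_classpath "$HADOOP_HOME" ||
  echo "spark-env.sh: continuing without SPARK_DIST_CLASSPATH" >&2
```

Calling the launcher through its absolute path avoids depending on PATH, which may not include the Hadoop bin directory when the start scripts source spark-env.sh in a non-login shell.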
Starting Spark again succeeded, with the following output:
starting org.apache.spark.deploy.master.Master, logging to /home/hadoop273/spark/spark-1.6.1-bin-without-hadoop/logs/spark-hadoop273-org.apache.spark.deploy.master.Master-1-hadoop1.out
hadoop2: starting org.apache.spark.deploy.worker.Worker, logging to /home/hadoop273/spark/spark-1.6.1-bin-without-hadoop/logs/spark-hadoop273-org.apache.spark.deploy.worker.Worker-1-hadoop2.out
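To double-check that the daemons are actually up, one option is to look for them in the output of the JDK's jps tool on each host. The sketch below assumes jps is on PATH; the helper name has_jvm is made up for this illustration. With the setup in this guide, Master is expected on hadoop1 and Worker on hadoop2.

```shell
# Return 0 if the jps listing in $2 contains a JVM whose main class is $1.
# jps prints one "<pid> <main class>" pair per line.
# has_jvm is a hypothetical helper name.
has_jvm() {
  printf '%s\n' "$2" |
    awk -v name="$1" '$2 == name { found = 1 } END { exit !found }'
}

if command -v jps >/dev/null 2>&1; then
  listing=$(jps)
  for daemon in Master Worker; do
    if has_jvm "$daemon" "$listing"; then
      echo "$daemon is running on $(hostname)"
    else
      echo "$daemon is not running on $(hostname)"
    fi
  done
else
  echo "jps not found; is the JDK bin directory on PATH?" >&2
fi
```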