Install Scala
Download scala-2.11.4
Extract it
Configure the environment variables:
SCALA_HOME=/home/hadoop-cdh/app/test/scala-2.11.4
PATH=$PATH:$SCALA_HOME/bin
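To make the two variables above survive new shell sessions, they can be appended to the shell profile; a minimal sketch, assuming a bash login shell and the same install path as above:

```shell
# Append to ~/.bashrc (paths as used above), then reload with `source ~/.bashrc`.
export SCALA_HOME=/home/hadoop-cdh/app/test/scala-2.11.4
export PATH=$PATH:$SCALA_HOME/bin
```

After reloading, `scala -version` should report Scala 2.11.4.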
Install Spark
Download
spark-1.2.0-bin-hadoop2.3 (the build must match your Hadoop version, otherwise you will hit protocol errors)
Extract it
Configure spark-env.sh:
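The download and extract steps above can be done from the command line; the URL below is an assumption (the Apache release archive), so adjust it if you fetch the package from a mirror:

```shell
# Fetch the prebuilt-for-Hadoop-2.3 package and unpack it into the app directory.
# URL is an assumption: the Apache release archive.
wget https://archive.apache.org/dist/spark/spark-1.2.0/spark-1.2.0-bin-hadoop2.3.tgz
tar -xzf spark-1.2.0-bin-hadoop2.3.tgz -C /home/hadoop-cdh/app/test/
```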
export JAVA_HOME=/home/hadoop-cdh/java/jdk1.7.0_06
export SCALA_HOME=/home/hadoop-cdh/app/test/scala-2.11.4
export HADOOP_HOME=/home/hadoop-cdh/app/hadoop-2.3.0-cdh5.1.0
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
export SPARK_JAR=/home/hadoop-cdh/app/test/spark-1.2.0-bin-hadoop2.3/lib/spark-assembly-1.2.0-hadoop2.3.0.jar
export SPARK_CLASSPATH=$SPARK_CLASSPATH:/home/hadoop-cdh/app/hadoop-2.3.0-cdh5.1.0/share/hadoop/common/hadoop-lzo-0.4.20-SNAPSHOT.jar
Configure spark-defaults.conf (the directory pointed to by spark.eventLog.dir must be created with mkdir first):
spark.eventLog.dir=/home/hadoop-cdh/app/test/spark-1.2.0-bin-hadoop2.3/applicationHistory
spark.eventLog.enabled=true
spark.yarn.historyServer.address=HISTORY_HOST:HISTORY_PORT (host:port only; this value should not include an http:// scheme)
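Spark does not create the event log directory on its own, so the path configured above has to exist before the first application starts:

```shell
# spark.eventLog.dir must exist before the first app writes its event log.
mkdir -p /home/hadoop-cdh/app/test/spark-1.2.0-bin-hadoop2.3/applicationHistory
```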
Configure conf/slaves with the worker hostnames:
host143
host144
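In standalone mode every node needs the same Spark directory. A minimal sketch for distributing it, assuming passwordless SSH to the hosts listed above and identical paths on every node:

```shell
# Sync the configured Spark directory to each worker (hostnames from the slaves file).
# Assumes passwordless SSH and the same directory layout on every node.
for h in host143 host144; do
  rsync -a /home/hadoop-cdh/app/test/spark-1.2.0-bin-hadoop2.3/ \
    "$h":/home/hadoop-cdh/app/test/spark-1.2.0-bin-hadoop2.3/
done
```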
Start the cluster
sbin/start-all.sh
jps should now show the Master and Worker processes
Run spark-shell
bin/spark-shell --executor-memory 1g --driver-memory 1g --master spark://host143:7077
Test statements (word.txt must be uploaded to HDFS first; it contains words separated by spaces):
sc                                  // confirms the SparkContext is available
val file = sc.textFile("hdfs://xxx/user/dirk.zhang/data/word.txt")
val count = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
count.collect()                     // transformations are lazy; collect() triggers the job
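Before running the job, word.txt has to be on HDFS (e.g. with hdfs dfs -put). The counting logic itself can be sanity-checked locally with standard shell tools; a small sketch on made-up sample data:

```shell
# Same counting logic as the Spark job, run locally on a sample file.
printf 'spark hadoop spark\nscala spark\n' > /tmp/word.txt
# Split on spaces -> one word per line -> count occurrences of each word.
tr ' ' '\n' < /tmp/word.txt | grep -v '^$' | sort | uniq -c
```

The output lists each word with its count (here: spark 3, hadoop 1, scala 1), matching what reduceByKey produces for the same input.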
References
http://blog.csdn.net/zwx19921215/article/details/41821147
http://www.tuicool.com/articles/BfUR73