1. Download the installation packages
JDK and Scala language packages:
jdk-7u79-linux-x64.gz
scala-2.10.4.tgz
Hadoop and Spark packages:
hadoop-2.6.0.tar.gz
spark-1.3.1-bin-hadoop2.6.tgz
And the development tool package:
ideaIC-14.1.3.tar.gz
2. Create a hadoop user
3. Install the JDK
Unpack: tar -xzf jdk-7u79-linux-x64.gz
Move: mv jdk1.7.0_79 /usr/lib
Set the environment variables: edit /etc/profile and append the following:
export JAVA_HOME=/usr/lib/jdk1.7.0_79
export JAVA_JRE=$JAVA_HOME/jre
export CLASSPATH=.:$JAVA_HOME/lib/tools.jar
export PATH=$JAVA_HOME/bin:$JAVA_JRE/bin:$PATH:.
Apply: source /etc/profile
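Before touching the real /etc/profile, the exports can be tried out from a scratch file to confirm they expand as intended. A minimal sketch, assuming the jdk1.7.0_79 layout described above:

```shell
# Write the JDK exports from above to a scratch file and source it,
# then confirm the variables resolve as expected. The paths assume
# the jdk1.7.0_79 install under /usr/lib described in this section.
PROFILE_FRAGMENT=$(mktemp)
cat > "$PROFILE_FRAGMENT" <<'EOF'
export JAVA_HOME=/usr/lib/jdk1.7.0_79
export JAVA_JRE=$JAVA_HOME/jre
export CLASSPATH=.:$JAVA_HOME/lib/tools.jar
export PATH=$JAVA_HOME/bin:$JAVA_JRE/bin:$PATH:.
EOF
. "$PROFILE_FRAGMENT"
echo "$JAVA_HOME"
echo "$CLASSPATH"
```

If the echoed paths look wrong here, the same lines would be wrong in /etc/profile as well.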
4. Install Scala
Unpack: tar -xzf scala-2.10.4.tgz
Move: mv scala-2.10.4 /usr/local
Set the environment variables: edit /etc/profile and append the following:
export SCALA_HOME=/usr/local/scala-2.10.4/
export PATH=$SCALA_HOME/bin:$PATH
Apply: source /etc/profile
5. Install Hadoop
Unpack: tar -xzf hadoop-2.6.0.tar.gz
Move: mv hadoop-2.6.0 /usr/local
Set the environment variables: edit /etc/profile and append the following:
export HADOOP_HOME=/usr/local/hadoop-2.6.0
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
Apply: source /etc/profile
Next edit the configuration files under /usr/local/hadoop-2.6.0/etc/hadoop.
In hadoop-env.sh and yarn-env.sh, set:
export JAVA_HOME=/usr/lib/jdk1.7.0_79
Edit core-site.xml:
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/hdfs/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
Edit hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/hdfs/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/hdfs/dfs/data</value>
  </property>
  <property>
    <name>dfs.support.append</name>
    <value>true</value>
  </property>
</configuration>
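After hand-editing these XML files it is easy to leave a stale value behind. A small sed one-liner can pull a property's value back out for checking; `get_prop` below is a hypothetical helper, not part of Hadoop, shown here against a minimal sample file rather than the real hdfs-site.xml:

```shell
# Extract the <value> of a named property from a Hadoop-style *-site.xml.
# get_prop is a hypothetical helper: grep finds the <name> line plus the
# line after it, and sed prints only the text inside <value>...</value>.
get_prop() {
  grep -A1 "<name>$1</name>" "$2" | sed -n 's:.*<value>\(.*\)</value>.*:\1:p'
}

# Minimal sample file standing in for hdfs-site.xml
SAMPLE=$(mktemp)
cat > "$SAMPLE" <<'EOF'
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
EOF
get_prop dfs.replication "$SAMPLE"   # prints: 1
```

This assumes each <value> sits on the line directly after its <name>, which matches the layout used in this guide.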
Edit yarn-site.xml:
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
Edit mapred-site.xml (in Hadoop 2.x, copy it from mapred-site.xml.template first). Note that mapreduce.framework.name belongs here, not in yarn-site.xml, or job submission will not use YARN:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
Edit masters and slaves, each containing a single line:
localhost
Create the data directories and fix their ownership:
mkdir -p /hdfs/tmp /hdfs/dfs/name /hdfs/dfs/data
chown -R hadoop:hadoop /hdfs
chown -R hadoop:hadoop /usr/local/hadoop-2.6.0/
Format the NameNode (as the hadoop user):
hdfs namenode -format
Then, from /usr/local/hadoop-2.6.0/sbin, start the Hadoop services as the hadoop user:
./start-dfs.sh
./start-yarn.sh
To stop them, run the corresponding stop scripts.
Run a few commands to verify:
hadoop fs -ls /
hadoop fs -mkdir /hadoop
6. Install Spark
Unpack: tar -xzf spark-1.3.1-bin-hadoop2.6.tgz
Move: mv spark-1.3.1-bin-hadoop2.6 /usr/local
Set the environment variables: edit /etc/profile and append the following:
export SPARK_HOME=/usr/local/spark-1.3.1-bin-hadoop2.6
export PATH=$SPARK_HOME/bin:$PATH
Apply: source /etc/profile
Edit the configuration: in /usr/local/spark-1.3.1-bin-hadoop2.6/conf,
copy the template: cp spark-env.sh.template spark-env.sh
then append the following to spark-env.sh:
export SCALA_HOME=/usr/local/scala-2.10.4
export JAVA_HOME=/usr/lib/jdk1.7.0_79
export SPARK_MASTER_IP=localhost
export SPARK_WORKER_MEMORY=1000m
Fix ownership: chown -R hadoop:hadoop /usr/local/spark-1.3.1-bin-hadoop2.6/
Start: from /usr/local/spark-1.3.1-bin-hadoop2.6/sbin, run ./start-all.sh as the hadoop user.
7. Install and configure IDEA
Unpack: tar -xzf ideaIC-14.1.3.tar.gz
Run: execute bin/idea.sh
Install the Scala plugin.
Create a new Scala project. In File->Project Structure, add scala-sdk-2.10.4 and spark-assembly-1.3.1-hadoop2.6.0 as dependencies, then create a Scala object as follows:
import scala.math.random
import org.apache.spark._

/** Computes an approximation to pi */
object SparkPi {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("SparkPi").setMaster("local")
    val spark = new SparkContext(conf)
    val slices = if (args.length > 0) args(0).toInt else 2
    val n = math.min(100000L * slices, Int.MaxValue).toInt // avoid overflow
    val count = spark.parallelize(1 until n, slices).map { i =>
      val x = random * 2 - 1
      val y = random * 2 - 1
      if (x*x + y*y < 1) 1 else 0
    }.reduce(_ + _)
    println("Pi is roughly " + 4.0 * count / n)
    spark.stop()
  }
}
Then, in Run->Edit Configurations, add an Application configuration, select SparkPi as the Main class, and run the example.
If the log shows something like
Pi is roughly 3.14278
the setup works.
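The Monte Carlo estimate that SparkPi parallelizes can also be sketched in plain awk, which makes the math easy to see without a cluster: sample points uniformly in the square [-1,1] x [-1,1], count the fraction inside the unit circle, and multiply by 4. The sample count and seed below are arbitrary choices for illustration:

```shell
# Monte Carlo estimate of pi, mirroring SparkPi's map/reduce logic:
# each sample plays the role of one element in the parallelized range,
# and the running count plays the role of the reduce(_ + _).
awk 'BEGIN {
  srand(42)                      # arbitrary fixed seed, for repeatability
  n = 1000000
  count = 0
  for (i = 0; i < n; i++) {
    x = rand() * 2 - 1
    y = rand() * 2 - 1
    if (x*x + y*y < 1) count++
  }
  printf "Pi is roughly %f\n", 4.0 * count / n
}'
```

With a million samples the printed value should land close to 3.14159, with the same slow 1/sqrt(n) convergence the Spark version has.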