1. Install hadoop-2.2.0
Download Hadoop 2.2.0 from http://apache.dataguru.cn/hadoop/common/hadoop-2.2.0/hadoop-2.2.0.tar.gz
Run tar zxf hadoop-2.2.0.tar.gz to extract it into the current directory (/home/hduser), then move it under ~/app/:
mv hadoop-2.2.0 ~/app/
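A consolidated sketch of the above steps (assuming wget is installed and ~/app may not exist yet):
wget http://apache.dataguru.cn/hadoop/common/hadoop-2.2.0/hadoop-2.2.0.tar.gz   # assumes wget is available
tar zxf hadoop-2.2.0.tar.gz
mkdir -p ~/app                                                                  # create the target directory if missing
mv hadoop-2.2.0 ~/app/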
2. Configure hadoop
(1) JAVA_HOME
echo $JAVA_HOME
/usr/lib/jvm/java-7-oracle
Edit ~/app/hadoop-2.2.0/etc/hadoop/hadoop-env.sh and replace the line export JAVA_HOME=${JAVA_HOME} with:
export JAVA_HOME=/usr/lib/jvm/java-7-oracle
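Equivalently, a one-line sed edit can make the replacement (a sketch; GNU sed assumed, and worth backing up the file first):
sed -i 's|^export JAVA_HOME=.*|export JAVA_HOME=/usr/lib/jvm/java-7-oracle|' ~/app/hadoop-2.2.0/etc/hadoop/hadoop-env.sh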
(2) Configure core-site
Edit ~/app/hadoop-2.2.0/etc/hadoop/core-site.xml and add the following under the <configuration> node (note that Hadoop does not expand ~ in configuration values, so spell hadoop.tmp.dir out as an absolute path on your machine):
<property>
  <name>hadoop.tmp.dir</name>
  <value>~/app/hadoop-2.2.0/tmp</value>
  <description>A base for other temporary directories.</description>
</property>
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:8010</value>
  <description>The name of the default file system. A URI whose
  scheme and authority determine the FileSystem implementation. The
  uri's scheme determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class. The uri's authority is used to
  determine the host, port, etc. for a filesystem.</description>
</property>
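To check that the setting is picked up, the getconf tool shipped with the distribution can echo it back:
~/app/hadoop-2.2.0/bin# ./hdfs getconf -confKey fs.default.name
hdfs://localhost:8010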
(3) Configure mapred-site
Create mapred-site.xml from the template, then edit it:
~/app/hadoop-2.2.0/etc/hadoop# cp mapred-site.xml.template mapred-site.xml
Add the following under the <configuration> node:
<property>
  <name>mapred.job.tracker</name>
  <value>localhost:54311</value>
  <description>The host and port that the MapReduce job tracker runs
  at. If "local", then jobs are run in-process as a single map
  and reduce task.</description>
</property>
<property>
  <name>mapred.map.tasks</name>
  <value>10</value>
  <description>As a rule of thumb, use 10x the number of slaves (i.e., number of tasktrackers).</description>
</property>
<property>
  <name>mapred.reduce.tasks</name>
  <value>2</value>
  <description>As a rule of thumb, use 2x the number of slave processors (i.e., number of tasktrackers).</description>
</property>
(4) Configure hdfs-site
Edit ~/app/hadoop-2.2.0/etc/hadoop/hdfs-site.xml and add the following under the <configuration> node:
<property>
  <name>dfs.replication</name>
  <value>1</value>
  <description>Default block replication.
  The actual number of replications can be specified when the file is created.
  The default is used if replication is not specified in create time.</description>
</property>
3. Run hadoop
(1) Format the namenode
~/app/hadoop-2.2.0/bin# ./hdfs namenode -format
If it succeeds, you will find a message like the following near the end of the log output:
common.Storage: Storage directory /home/hduser/hadoop/tmp/hadoop-hduser/dfs/name has been successfully formatted.
(2) Configure passwordless ssh
ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
ssh localhost
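If ssh localhost still prompts for a password, sshd usually insists on strict permissions for the key files; a common fix (depending on your sshd configuration):
chmod 700 ~/.ssh                    # directory must not be group/world writable
chmod 600 ~/.ssh/authorized_keys    # key file readable by the owner only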
(3) Configure /etc/profile
Append the following variables:
export HADOOP_HOME=/root/app/hadoop-2.2.0
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HADOOP_HOME/lib
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
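Reload the profile and confirm the binaries resolve:
source /etc/profile
hadoop version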
(4) Start dfs
~/app/hadoop-2.2.0/sbin# ./start-dfs.sh
Check with jps whether the daemons came up:
15021 NameNode
15767 SecondaryNameNode
15123 DataNode
(5) Start yarn
~/app/hadoop-2.2.0/sbin# ./start-yarn.sh
jps
15021 NameNode
16052 NodeManager
15767 SecondaryNameNode
15123 DataNode
15952 ResourceManager
(6) View the resource manager
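With the stock 2.2.0 defaults, the YARN ResourceManager web UI is served at http://localhost:8088 and the HDFS NameNode UI at http://localhost:50070 (adjust host and port if you override them in yarn-site.xml or hdfs-site.xml).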
(7) Test
bin/hdfs dfs -mkdir /test
~/app/hadoop-2.2.0/bin# hdfs dfs -copyFromLocal ~/app/hadoop-2.2.0/pg20417.txt /test
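To exercise MapReduce end to end, the examples jar bundled with the distribution can run wordcount over the uploaded file (the /test-out output path here is an arbitrary choice):
~/app/hadoop-2.2.0# bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount /test /test-out
~/app/hadoop-2.2.0# bin/hdfs dfs -cat /test-out/part-r-00000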
4. Stop hadoop
To stop hadoop, run the following commands in order:
$ ./stop-yarn.sh
$ ./stop-dfs.sh
5. Run spark-on-yarn
(1) Launch spark in yarn-cluster mode
~/app/spark-1.0.0-bin-hadoop2/bin# ./spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster --num-executors 3 --driver-memory 1g --executor-memory 1g --executor-cores 1 ~/app/spark-1.0.0-bin-hadoop2/lib/spark-examples*.jar 10
(2) Client mode
./bin/spark-shell --master yarn-client
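Once the shell has registered with YARN, a quick smoke test at the scala> prompt (sc is the SparkContext the shell creates for you):
scala> sc.parallelize(1 to 1000).count()    // should print res0: Long = 1000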
(3) View the result
In yarn-cluster mode the driver runs inside the application master, so SparkPi's "Pi is roughly ..." line goes to the application logs rather than your terminal.
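One way to fetch those logs from the command line after the application finishes (requires yarn.log-aggregation-enable=true; otherwise browse the container logs through the ResourceManager UI):
yarn logs -applicationId <application id>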