Installing a Hadoop Single Node Cluster
A Hadoop Single Node Cluster builds the Hadoop environment on just one machine, so all of its services run on that single server, while the full set of Hadoop commands remains available.
Installing the JDK
Because Hadoop is written in Java, a Java environment must be installed first.
hduser@hduser:~$ java -version
The program 'java' can be found in the following packages:
* default-jre
* gcj-4.8-jre-headless
* openjdk-7-jre-headless
* gcj-4.6-jre-headless
* openjdk-6-jre-headless
Try: sudo apt-get install <selected package>
This shows that Java is not installed.
Update the package lists
sudo apt-get update
Install the JDK
sudo apt-get install default-jdk
After the installation finishes, check the Java version
hduser@hduser:~$ java -version
java version "1.7.0_201"
OpenJDK Runtime Environment (IcedTea 2.6.17) (7u211-2.6.17-0ubuntu0.1)
OpenJDK 64-Bit Server VM (build 24.201-b00, mixed mode)
Query the Java installation path
hduser@hduser:~$ update-alternatives --display java
java - auto mode
link currently points to /usr/lib/jvm/java-7-openjdk-amd64/jre/bin/java
/usr/lib/jvm/java-7-openjdk-amd64/jre/bin/java - priority 1071
slave java.1.gz:/usr/lib/jvm/java-7-openjdk-amd64/jre/man/man1/java.1.gz
Current 'best' version is /usr/lib/jvm/java-7-openjdk-amd64/jre/bin/java.
So Java is installed at /usr/lib/jvm/java-7-openjdk-amd64/jre/bin/java.
Setting Up Passwordless SSH Login
Hadoop consists of many servers: when Hadoop starts, the NameNode must connect to each DataNode to manage it, and the system would normally prompt for a password on every connection. To let everything run without entering passwords by hand, SSH must be configured for passwordless login. Passwordless login does not mean no authentication; instead, an SSH key pair generated in advance is used to verify identity. Using the SSH protocol also prevents information leakage when administering remote systems.
Install SSH
sudo apt-get install ssh
Install rsync
sudo apt-get install rsync
Generate an SSH key pair
hduser@hduser:~$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
Generating public/private dsa key pair.
Created directory '/home/hduser/.ssh'.
Your identification has been saved in /home/hduser/.ssh/id_dsa.
Your public key has been saved in /home/hduser/.ssh/id_dsa.pub.
The key fingerprint is:
c9:f0:c3:21:5f:11:40:0c:44:e7:88:3d:d6:0d:b6:23 hduser@hduser
The key's randomart image is:
+--[ DSA 1024]----+
| o++*.o. |
| o *.+ . |
| . E * o |
| . X = |
| S |
| . |
| |
| |
| |
+-----------------+
- -t specifies the key type
- -C sets a comment
- -f specifies the file in which to store the key
- -P supplies the passphrase
View the SSH keys
hduser@hduser:~$ ll ~/.ssh
total 16
drwx------ 2 hduser hduser 4096 Oct 17 19:03 ./
drwxr-xr-x 17 hduser hduser 4096 Oct 17 19:03 ../
-rw------- 1 hduser hduser 672 Oct 17 19:03 id_dsa
-rw-r--r-- 1 hduser hduser 603 Oct 17 19:03 id_dsa.pub
Append the generated public key to the authorized keys file (authorized_keys)
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
hduser@hduser:~$ ll ~/.ssh
total 20
drwx------ 2 hduser hduser 4096 Oct 17 19:35 ./
drwxr-xr-x 17 hduser hduser 4096 Oct 17 19:03 ../
-rw-rw-r-- 1 hduser hduser 603 Oct 17 19:35 authorized_keys
-rw------- 1 hduser hduser 672 Oct 17 19:03 id_dsa
-rw-r--r-- 1 hduser hduser 603 Oct 17 19:03 id_dsa.pub
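Note that in the listing above authorized_keys is group-writable (-rw-rw-r--). With sshd's default StrictModes setting, a group- or world-writable key file is ignored and the password prompt comes back, so tightening the permissions is a safe precaution:

```shell
# sshd (StrictModes, the default) ignores an authorized_keys file that is
# group- or world-writable; restrict the key file and its directory to the owner.
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
```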
Downloading Hadoop
Download a release from the official Hadoop website and install it on Ubuntu.
Download hadoop-2.6.4.tar.gz
wget https://archive.apache.org/dist/hadoop/common/hadoop-2.6.4/hadoop-2.6.4.tar.gz
Extract the Hadoop archive
sudo tar -zxvf hadoop-2.6.4.tar.gz
Move hadoop to /usr/local
sudo mv hadoop-2.6.4 /usr/local/hadoop
List the directory contents
hduser@hduser:/usr/local/hadoop$ ll
total 60
drwxr-xr-x 9 10021 10021 4096 Feb 12 2016 ./
drwxr-xr-x 11 root root 4096 Oct 17 19:51 ../
drwxr-xr-x 2 10021 10021 4096 Feb 12 2016 bin/
drwxr-xr-x 3 10021 10021 4096 Feb 12 2016 etc/
drwxr-xr-x 2 10021 10021 4096 Feb 12 2016 include/
drwxr-xr-x 3 10021 10021 4096 Feb 12 2016 lib/
drwxr-xr-x 2 10021 10021 4096 Feb 12 2016 libexec/
-rw-r--r-- 1 10021 10021 15429 Feb 12 2016 LICENSE.txt
-rw-r--r-- 1 10021 10021 101 Feb 12 2016 NOTICE.txt
-rw-r--r-- 1 10021 10021 1366 Feb 12 2016 README.txt
drwxr-xr-x 2 10021 10021 4096 Feb 12 2016 sbin/
drwxr-xr-x 4 10021 10021 4096 Feb 12 2016 share/
Directory | Description |
---|---|
bin/ | Executable files for the various components, including Hadoop, HDFS, and YARN |
sbin/ | Shell scripts for operating the cluster, including start-all.sh and stop-all.sh |
etc/ | The etc/hadoop subdirectory holds the Hadoop configuration files, e.g. hadoop-env.sh, core-site.xml, yarn-site.xml, mapred-site.xml, hdfs-site.xml |
lib/ | Hadoop libraries |
logs/ | System logs; check these to see how the system is running and to trace the cause of errors |
Setting the Hadoop Environment Variables
Running Hadoop requires many environment variables, and re-entering them at every login would be tedious. Instead, we can set them in ~/.bashrc so that they are applied automatically each time we log in.
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native:$JAVA_LIBRARY_PATH
Set the JDK installation path
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
Set HADOOP_HOME to the Hadoop installation path
export HADOOP_HOME=/usr/local/hadoop
Set PATH
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
This lets you run Hadoop commands from any other directory.
Set the other Hadoop environment variables
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
Settings for the native libraries
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native:$JAVA_LIBRARY_PATH
Apply the .bashrc settings
source ~/.bashrc
Modifying the Hadoop Configuration Files
Configure Hadoop by editing hadoop-env.sh, core-site.xml, yarn-site.xml, mapred-site.xml, and hdfs-site.xml.
Configure hadoop-env.sh
Change
export JAVA_HOME=${JAVA_HOME}
to
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
Configure core-site.xml
In this file we must set the default name of HDFS; commands and programs use this name when they access HDFS.
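For a single-node setup, a minimal core-site.xml might set the default file system name like this (hdfs://localhost:9000 is a conventional choice, not the only possible value):

```xml
<configuration>
  <property>
    <!-- Default HDFS name used by commands and programs -->
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
```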
Configure yarn-site.xml
This file contains the configuration for MapReduce 2 (YARN).
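A minimal yarn-site.xml for a single-node cluster might look like this; enabling the mapreduce_shuffle auxiliary service lets MapReduce jobs shuffle data between the map and reduce phases:

```xml
<configuration>
  <property>
    <!-- Enable the shuffle service needed by MapReduce 2 jobs -->
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
```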
Configure mapred-site.xml
mapred-site.xml is used to configure the monitoring of Map and Reduce jobs: the JobTracker's task assignment and the TaskTrackers' task execution.
Copy the template file
hduser@hduser:/usr/local/hadoop/etc/hadoop$ sudo cp mapred-site.xml.template ./mapred-site.xml
Edit mapred-site.xml
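For Hadoop 2.x the essential setting here is to run MapReduce on YARN; a minimal mapred-site.xml might contain just:

```xml
<configuration>
  <property>
    <!-- Run MapReduce jobs on the YARN framework -->
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
```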
Configure hdfs-site.xml
hdfs-site.xml is used to configure the HDFS distributed file system.
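A sketch of hdfs-site.xml for this single-node setup, pointing at the NameNode and DataNode directories created in the next step; a replication factor of 1 is used because there is only one DataNode:

```xml
<configuration>
  <property>
    <!-- Only one DataNode, so keep a single copy of each block -->
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <!-- NameNode metadata directory (created below) -->
    <name>dfs.namenode.name.dir</name>
    <value>file:/usr/local/hadoop/hadoop_data/hdfs/namenode</value>
  </property>
  <property>
    <!-- DataNode block storage directory (created below) -->
    <name>dfs.datanode.data.dir</name>
    <value>file:/usr/local/hadoop/hadoop_data/hdfs/datanode</value>
  </property>
</configuration>
```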
Creating and Formatting the HDFS Directories
Create the NameNode and DataNode storage directories
sudo mkdir -p /usr/local/hadoop/hadoop_data/hdfs/namenode
sudo mkdir -p /usr/local/hadoop/hadoop_data/hdfs/datanode
Change the owner of the Hadoop directory to hduser
sudo chown hduser:hduser -R /usr/local/hadoop/
Format HDFS
hadoop namenode -format
Note: this operation deletes all data currently stored in HDFS.
Starting Hadoop
We can now start Hadoop in either of two ways:
- Start HDFS and YARN separately: use start-dfs.sh to start HDFS and start-yarn.sh to start YARN.
- Start HDFS and YARN at the same time: use start-all.sh.
Start HDFS
hduser@hduser:~$ start-dfs.sh
Starting namenodes on [localhost]
The authenticity of host 'localhost (127.0.0.1)' can't be established.
ECDSA key fingerprint is 17:2d:e5:cb:ea:df:b4:5e:77:6e:8c:84:ad:04:36:28.
Are you sure you want to continue connecting (yes/no)? yes
localhost: Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
localhost: starting namenode, logging to /usr/local/hadoop/logs/hadoop-hduser-namenode-hduser.out
localhost: starting datanode, logging to /usr/local/hadoop/logs/hadoop-hduser-datanode-hduser.out
Starting secondary namenodes [0.0.0.0]
The authenticity of host '0.0.0.0 (0.0.0.0)' can't be established.
ECDSA key fingerprint is 17:2d:e5:cb:ea:df:b4:5e:77:6e:8c:84:ad:04:36:28.
Are you sure you want to continue connecting (yes/no)? yes
0.0.0.0: Warning: Permanently added '0.0.0.0' (ECDSA) to the list of known hosts.
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop/logs/hadoop-hduser-secondarynamenode-hduser.out
Start the Hadoop MapReduce framework, YARN
hduser@hduser:~$ start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop/logs/yarn-hduser-resourcemanager-hduser.out
localhost: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-hduser-nodemanager-hduser.out
Start HDFS and YARN together
hduser@hduser:~$ start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [localhost]
localhost: starting namenode, logging to /usr/local/hadoop/logs/hadoop-hduser-namenode-hduser.out
localhost: starting datanode, logging to /usr/local/hadoop/logs/hadoop-hduser-datanode-hduser.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop/logs/hadoop-hduser-secondarynamenode-hduser.out
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop/logs/yarn-hduser-resourcemanager-hduser.out
localhost: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-hduser-nodemanager-hduser.out
Use jps to view the started processes
jps (Java Virtual Machine Process Status Tool) lists the currently running Java processes.
hduser@hduser:~$ jps
14005 NameNode
14140 DataNode
14647 NodeManager
14965 Jps
14509 ResourceManager
14313 SecondaryNameNode
The output shows that:
- HDFS: the NameNode, SecondaryNameNode, and DataNode have started
- MapReduce 2 (YARN): the ResourceManager and NodeManager have started
Opening the Hadoop ResourceManager Web Interface
The Hadoop ResourceManager web interface shows the current state of Hadoop: the nodes, the applications, and the status of running processes.
Open the Hadoop ResourceManager web interface
URL: http://localhost:8088/
View the running nodes
NameNode HDFS Web Interface
URL: http://localhost:50070/
View the Live Nodes
View the DataNodes