First, prepare a CentOS VM in VMware (I used CentOS-6.7-x86_64-bin-DVD1.iso).
1: Change the hostname (use sudo if you need the permissions)
Command: [hadoop@hadoop11 ~]$ vi /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=hadoop11
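The file change takes effect on the next boot; to apply the new hostname for the current session too, a quick sketch (assuming the hadoop11 name above):
sudo hostname hadoop11    # set the hostname for the running session
hostname                  # verify it took effect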
2: Set the system's default runlevel (use sudo if you need the permissions)
As root (or with sudo), run: [hadoop@hadoop11 ~]$ vi /etc/inittab
Change the last line so the default runlevel is 3: id:3:initdefault:
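A minimal check after the next reboot:
runlevel    # should print "N 3" once the machine has booted into runlevel 3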
3: As root, grant the hadoop user sudoer privileges
[root@hadoop11 hadoop]# vi /etc/sudoers
(around line 89)
root ALL=(ALL) ALL
Below that line, add:
hadoop ALL=(ALL) ALL
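Note that /etc/sudoers is read-only even for root; a safer sketch is to edit it with visudo, which validates the syntax before saving:
visudo      # as root; opens /etc/sudoers with syntax checking
sudo -l     # afterwards, run as the hadoop user to confirm the grant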
4: Configure the IP address; I do it by editing the config file (use sudo if you need the permissions)
[hadoop@hadoop11 ~]$ vi /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE="eth0"
BOOTPROTO=none
IPV6INIT="yes"
NM_CONTROLLED="yes"
ONBOOT="yes"
TYPE="Ethernet"
UUID="0f9a4b6e-ec25-4831-9188-c7b904406202"
IPADDR=192.168.55.111
PREFIX=24
GATEWAY=192.168.55.2
DNS1=192.168.55.2
DEFROUTE=yes
IPV4_FAILURE_FATAL=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_FAILURE_FATAL=no
NAME="System eth0"
HWADDR=00:0C:29:71:6B:38
IPV6_PEERDNS=yes
IPV6_PEERROUTES=yes
LAST_CONNECT=1501315727
Change the IP settings to match your own network.
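For the new address to take effect, restart networking; a quick sketch:
sudo service network restart    # reload the static IP configuration
ifconfig eth0                   # verify the new address on eth0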
5: Turn off the firewall
service iptables stop      (stop it)
service iptables status    (check its status)
chkconfig iptables off     (disable the firewall permanently)
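A quick way to confirm the permanent setting:
chkconfig --list iptables    # every runlevel should show "off"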
6: Add internal hostname mappings (use sudo if you need the permissions)
sudo vi /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.55.101 hadoop01
192.168.55.102 hadoop02
192.168.55.103 hadoop03
192.168.55.104 hadoop04
192.168.55.105 hadoop05
192.168.55.106 hadoop06
192.168.55.107 hadoop07
192.168.55.108 hadoop08
192.168.55.109 hadoop09
192.168.55.110 hadoop10
192.168.55.111 hadoop11
192.168.55.112 hadoop12
192.168.55.113 hadoop13
192.168.55.114 hadoop14
192.168.55.115 hadoop15
192.168.55.116 hadoop16
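A simple resolution check (only hadoop11 exists at this point; the other hosts will answer once they are cloned later):
ping -c 1 hadoop11    # should resolve to 192.168.55.111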
7: Install the JDK
sftp> put C:/Users/Administrator/Desktop/Test/jdk-8u73-linux-x64.tar.gz
This uploads to the sftp session's current remote directory:
sftp> pwd
/home/hadoop
8: Extract it into the following directory: [hadoop@hadoop11 apps]$ pwd
/home/hadoop/apps
[hadoop@hadoop11 apps]$ tar -zxvf ~/jdk-8u73-linux-x64.tar.gz
9: Configure environment variables (editing /etc/profile requires sudo)
[hadoop@hadoop11 apps]$ sudo vi /etc/profile
Append at the end:
export JAVA_HOME=/home/hadoop/apps/jdk1.8.0_73
export PATH=$PATH:$JAVA_HOME/bin
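Reload the profile and confirm the JDK is on the PATH; a quick check:
source /etc/profile
java -version    # should report java version "1.8.0_73"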
10: Synchronize the time
(1) Start the ntpd service
service ntpd start    (start the ntpd service)
chkconfig ntpd on     (enable it across reboots)
chkconfig | grep ntpd (verify)
(2) Run tzselect and, following the prompts, choose 5, 9, 1, 1, 1 in turn to generate a Shanghai timezone file.
Then run: ntpdate 202.120.2.101 (fetch the time)
Then reboot the machine.
To make sure the time stays in sync, run ntpdate 202.120.2.101 once more.
(3) Schedule a cron job to fetch the time every hour (set it up in root's crontab, since adjusting the clock requires root):
[hadoop@hadoop11 apps]$ sudo crontab -e
Append:
0 */1 * * * ntpdate 202.120.2.101
(4) Run cp /usr/share/zoneinfo/Asia/Shanghai /etc/localtime (copy the timezone file into place).
If all else fails, you can set the time by hand: date -s "201X-0X-29 10:52:00" (ha!)
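One caveat: ntpdate cannot run while ntpd holds the NTP port, so a manual sync needs the service stopped first; a sketch:
sudo service ntpd stop        # free up UDP port 123
sudo ntpdate 202.120.2.101    # one-off sync against the server used above
sudo service ntpd start       # bring the daemon back up
sudo crontab -l               # confirm the hourly job was saved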
Once the steps above are done, this is a reasonable point to take a VM snapshot (not strictly necessary).
Next comes the distributed Hadoop cluster installation. To save time, I'll build a three-node cluster:
         | HDFS                         | YARN
hadoop11 | NameNode + DataNode (master) | NodeManager
hadoop12 | DataNode + SecondaryNameNode | NodeManager
hadoop13 | DataNode                     | NodeManager + ResourceManager (master)
The replication factor is set to 2.
Steps:
1: Upload hadoop-2.6.5-centos-6.7.tar.gz to
sftp> pwd
/home/hadoop/apps
2: Extract it: tar -zxvf hadoop-2.6.5-centos-6.7.tar.gz
3: Edit hadoop-env.sh and point it at the JDK:
export JAVA_HOME=/home/hadoop/apps/jdk1.8.0_73
4: Edit core-site.xml:
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop11:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/hadoopdata</value>
</property>
5: Edit hdfs-site.xml:
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/hadoop/hadoopdata/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/hadoop/hadoopdata/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>hadoop12:50090</value>
</property>
6: Edit mapred-site.xml (the distribution ships only mapred-site.xml.template; copy it to mapred-site.xml, or simply rename it, as sketched after the XML below, then add the following):
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
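A minimal sketch of the copy step (path assumed from the install location above):
cd /home/hadoop/apps/hadoop-2.6.5/etc/hadoop
cp mapred-site.xml.template mapred-site.xml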
7: Edit yarn-site.xml:
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop13</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
8: Edit slaves:
hadoop11
hadoop12
hadoop13
9: Add Hadoop to the environment variables (again, editing /etc/profile needs sudo):
[hadoop@hadoop11 apps]$ sudo vi /etc/profile
Append at the end:
export HADOOP_HOME=/home/hadoop/apps/hadoop-2.6.5
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
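Reload the profile and confirm the Hadoop scripts are on the PATH:
source /etc/profile
hadoop version    # should print Hadoop 2.6.5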
10: Now clone hadoop12 and hadoop13 from hadoop11.
11: On hadoop12 and hadoop13, fix the NIC, IP, and hostname (set the IP and hostname to the corresponding values above; in the udev rules file, /etc/udev/rules.d/70-persistent-net.rules on CentOS 6, comment out the old eth0 entry and rename eth1 to eth0).
12: Set up passwordless SSH login among the three machines, as sketched below.
(Tip: a reboot afterwards doesn't hurt.)
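A minimal sketch of the passwordless-SSH setup, run as the hadoop user on each of the three nodes (ssh-copy-id assumed available, as it normally is on CentOS 6):
ssh-keygen -t rsa                # accept the defaults, empty passphrase
ssh-copy-id hadoop@hadoop11      # copy the public key to every node,
ssh-copy-id hadoop@hadoop12      # including the local one
ssh-copy-id hadoop@hadoop13
ssh hadoop12 hostname            # verify: should run with no password prompt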
13: Run the initialization on the HDFS master node:
hadoop namenode -format    (the 2.x form hdfs namenode -format also works)
Check the end of the output to confirm the format succeeded:
17/07/31 23:13:13 INFO common.Storage: Storage directory /home/hadoop/hadoopdata/name has been successfully formatted.
17/07/31 23:13:13 INFO namenode.FSImageFormatProtobuf: Saving image file /home/hadoop/hadoopdata/name/current/fsimage.ckpt_0000000000000000000 using no compression
17/07/31 23:13:14 INFO namenode.FSImageFormatProtobuf: Image file /home/hadoop/hadoopdata/name/current/fsimage.ckpt_0000000000000000000 of size 323 bytes saved in 0 seconds.
17/07/31 23:13:14 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
17/07/31 23:13:14 INFO util.ExitUtil: Exiting with status 0
17/07/31 23:13:14 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at hadoop11/192.168.55.111
************************************************************/
14: On the HDFS master (hadoop11), start HDFS:
start-dfs.sh
15: On hadoop13 (the ResourceManager node), start YARN:
start-yarn.sh
16: Check the processes with jps:
[hadoop@hadoop11 ~]$ jps
2743 DataNode
3292 NodeManager
4093 Jps
2653 NameNode

[hadoop@hadoop12 ~]$ jps
2561 DataNode
2902 NodeManager
2649 SecondaryNameNode
3611 Jps

[hadoop@hadoop13 apps]$ jps
2679 NodeManager
2856 ResourceManager
3324 Jps
2557 DataNode
17: Verify HDFS responds:
hadoop fs -ls /
18: If any process failed to come up, try starting it individually on the appropriate node:
hadoop-daemon.sh start datanode
hadoop-daemon.sh start namenode
yarn-daemon.sh start nodemanager
yarn-daemon.sh start resourcemanager
The HDFS cluster can be managed through the web UI on port 50070 (http://hadoop11:50070).
MapReduce job status is visible through the YARN web UI on port 8088 (http://hadoop13:8088).
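A quick reachability check from any node (URLs follow from the hostnames and default ports above):
curl -s http://hadoop11:50070 | head    # NameNode web UI
curl -s http://hadoop13:8088 | head     # ResourceManager web UI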