Characteristics: (1) intended for production, with at least 3 machines
(2) truly distributed
(3) provides all of Hadoop's features
Preparation:
Install the JDK, configure hostnames, and set up passwordless login
Disable the firewall and synchronize the time (date command)
Set the environment variables
HADOOP_HOME=/root/training/hadoop-3.1.2
export HADOOP_HOME
PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
export PATH
export HDFS_DATANODE_USER=root
export HDFS_DATANODE_SECURE_USER=root
export HDFS_NAMENODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
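The firewall step from the preparation list can be sketched as follows, assuming CentOS 7 with firewalld (the real commands require root, so they are printed here as a dry run):

```shell
# Disable the firewall on every node (CentOS 7 / firewalld assumed; run as root,
# ideally sent to all sessions at once):
#   systemctl stop firewalld
#   systemctl disable firewalld
# Dry run that only prints the commands:
for cmd in 'systemctl stop firewalld' 'systemctl disable firewalld'; do
  echo "$cmd"
done
```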
For configuring passwordless login, the Xshell tool is convenient because it can send keyboard input to all server sessions at once.
First, tile the three server windows so all of them are visible.
Then choose Tools -> Send Key Input to All Sessions.
Then run ssh-keygen -t rsa.
Then copy the public key to bigdata112, bigdata113, and bigdata114:
[root@bigdata112 ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub bigdata112
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
The authenticity of host 'bigdata112 (192.168.112.112)' can't be established.
ECDSA key fingerprint is SHA256:1p5LjAD2uf2rePwjLPF7PzLZzqXO50aNNBl7wf3EvdI.
ECDSA key fingerprint is MD5:c2:1f:a8:c8:42:8b:14:82:46:ee:fe:c2:dc:30:88:33.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@bigdata112's password:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'bigdata112'"
and check to make sure that only the key(s) you wanted were added.
[root@bigdata112 ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub bigdata113
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
The authenticity of host 'bigdata113 (192.168.112.113)' can't be established.
ECDSA key fingerprint is SHA256:C5rqfGWIywYqBSSEWAKVx5mzurzddDQmjFrBcH9dOLM.
ECDSA key fingerprint is MD5:ae:59:c3:eb:77:6b:8b:02:27:8a:80:b2:a7:28:29:7f.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@bigdata113's password:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'bigdata113'"
and check to make sure that only the key(s) you wanted were added.
[root@bigdata112 ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub bigdata114
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
The authenticity of host 'bigdata114 (192.168.112.114)' can't be established.
ECDSA key fingerprint is SHA256:C5rqfGWIywYqBSSEWAKVx5mzurzddDQmjFrBcH9dOLM.
ECDSA key fingerprint is MD5:ae:59:c3:eb:77:6b:8b:02:27:8a:80:b2:a7:28:29:7f.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@bigdata114's password:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'bigdata114'"
and check to make sure that only the key(s) you wanted were added.
[root@bigdata112 ~]# ssh bigdata112
Last login: Fri Mar 20 23:26:42 2020 from 192.168.112.113
[root@bigdata112 ~]# exit
logout
Connection to bigdata112 closed.
[root@bigdata112 ~]# ssh bigdata113
Last login: Fri Mar 20 23:27:02 2020 from 192.168.112.113
[root@bigdata113 ~]# exit
logout
Connection to bigdata113 closed.
[root@bigdata112 ~]# ssh bigdata114
Last login: Fri Mar 20 23:27:12 2020 from 192.168.112.113
[root@bigdata114 ~]# exit
logout
Connection to bigdata114 closed.
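The per-host ssh-copy-id steps above can also be condensed into a loop (a sketch; remove the echo to actually run it):

```shell
# Distribute the public key to all three nodes (hostnames assumed to resolve)
for host in bigdata112 bigdata113 bigdata114; do
  echo ssh-copy-id -i /root/.ssh/id_rsa.pub "$host"   # drop echo to run for real
done
```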
All of the operations above were performed through the send-to-all-sessions feature, and each login was tested.
Next, confirm that the time is consistent across the nodes; if not, set it:
[root@bigdata112 ~]# date -s '2020-03-20 15:34:50'
Again using the send-to-all-sessions feature, configure the environment variables:
[root@bigdata112 ~]# vim ~/.bash_profile
Then write the following configuration into the file:
HADOOP_HOME=/root/training/hadoop-3.1.2
export HADOOP_HOME
PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
export PATH
export HDFS_DATANODE_USER=root
export HDFS_DATANODE_SECURE_USER=root
export HDFS_NAMENODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
Note that HADOOP_HOME must match the Hadoop version you are actually using.
After editing, run source ~/.bash_profile to make the configuration take effect.
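A quick way to check that the variables took effect after sourcing (paths assume the layout above):

```shell
# Re-create the settings from ~/.bash_profile and verify them
export HADOOP_HOME=/root/training/hadoop-3.1.2
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
echo "$HADOOP_HOME"
# Confirm that Hadoop's bin directory is on the PATH
case ":$PATH:" in
  *":$HADOOP_HOME/bin:"*) echo "PATH ok" ;;
  *)                      echo "PATH missing $HADOOP_HOME/bin" ;;
esac
```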
Next, we only need to configure Hadoop on bigdata112; once that is done, we will copy it to bigdata113 and bigdata114.
Upload the Hadoop tarball to /root/tools/ on bigdata112 and extract it into /root/training/:
tar -zxvf hadoop-3.1.2.tar.gz -C ../training/
Then configure bigdata112 the same way as the pseudo-distributed setup, with a few adjustments to the configuration files:
hadoop-env.sh
export JAVA_HOME=/root/training/jdk1.8.0_181
hdfs-site.xml
<!--Block replication factor-->
<!--The default is 3-->
<!--Generally it matches the number of DataNodes-->
<!--but should not exceed 3-->
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<!--Disable HDFS permission checking-->
<!--In Hadoop 3 the key is dfs.permissions.enabled-->
<property>
<name>dfs.permissions.enabled</name>
<value>false</value>
</property>
core-site.xml
<!--NameNode address-->
<!--9000 is the RPC communication port-->
<property>
<name>fs.defaultFS</name>
<value>hdfs://bigdata112:9000</value>
</property>
<!--Directory on the local file system where HDFS stores its data-->
<!--Be sure to change this; the default is the Linux tmp directory-->
<property>
<name>hadoop.tmp.dir</name>
<value>/root/training/hadoop-3.1.2/tmp</value>
</property>
mapred-site.xml
<!--The framework MapReduce runs on-->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>yarn.app.mapreduce.am.env</name>
<value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
<property>
<name>mapreduce.map.env</name>
<value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
<property>
<name>mapreduce.reduce.env</name>
<value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
yarn-site.xml
<!--ResourceManager address-->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>bigdata112</value>
</property>
<!--Auxiliary shuffle service used by MapReduce-->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
workers (in Hadoop 3 this file replaces the old slaves file)
bigdata113
bigdata114
[root@bigdata112 hadoop]# vim hadoop-env.sh
[root@bigdata112 hadoop]# vim hdfs-site.xml
[root@bigdata112 hadoop]# vim core-site.xml
[root@bigdata112 hadoop]# vim mapred-site.xml
[root@bigdata112 hadoop]# vim yarn-site.xml
[root@bigdata112 hadoop]# vim workers
After configuring these files, format HDFS on bigdata112 (if the HDFS working directory does not exist yet, this command creates it):
hdfs namenode -format
Then transfer the configured Hadoop installation to bigdata113 and bigdata114:
scp -r /root/training/hadoop-3.1.2/ root@bigdata113:/root/training
scp -r /root/training/hadoop-3.1.2/ root@bigdata114:/root/training
The transfer takes a while. Once it finishes, we only need to start the cluster from the master node, so we run this on bigdata112:
[root@bigdata112 training]# start-all.sh
Starting namenodes on [bigdata112]
Last login: Fri Mar 20 23:26:42 CST 2020 from 192.168.112.112 on pts/3
Starting datanodes
Last login: Fri Mar 20 16:20:29 CST 2020 on pts/0
Starting secondary namenodes [bigdata112]
Last login: Fri Mar 20 16:20:32 CST 2020 on pts/0
Starting resourcemanager
Last login: Fri Mar 20 16:20:49 CST 2020 on pts/0
Starting nodemanagers
Last login: Fri Mar 20 16:21:08 CST 2020 on pts/0
Before startup, the worker nodes have no data directory (the directory that stores data blocks); after startup it appears.
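Given the configuration above (NameNode, SecondaryNameNode, and ResourceManager on bigdata112; bigdata113 and bigdata114 as workers), jps on each node should show roughly this layout (a sketch based on the configuration, not captured output):

```shell
# Expected daemons per node after start-all.sh (run jps on each host to confirm)
cat <<'EOF'
bigdata112: NameNode SecondaryNameNode ResourceManager
bigdata113: DataNode NodeManager
bigdata114: DataNode NodeManager
EOF
```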
Now let's test with a wordcount example.
First, upload a file to HDFS:
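The original does not show the contents of data.txt, so here is a hypothetical stand-in, along with a local preview of the per-word counts the wordcount job computes:

```shell
# Hypothetical stand-in for data.txt (the real file's contents are not shown)
cat > data.txt <<'EOF'
I love Beijing
I love China
EOF
# Local equivalent of wordcount: split on whitespace, count occurrences
tr ' ' '\n' < data.txt | sort | uniq -c | sort -rn
```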
[root@bigdata112 temp]# hdfs dfs -mkdir /input
[root@bigdata112 temp]# hdfs dfs -put data.txt /input
[root@bigdata112 temp]# hdfs dfs -ls /input
Found 1 items
-rw-r--r-- 2 root supergroup 62 2020-03-20 16:39 /input/data.txt
Then run the word-count example (the examples jar ships under $HADOOP_HOME/share/hadoop/mapreduce):
hadoop jar hadoop-mapreduce-examples-3.1.2.jar wordcount /input/data.txt /output/wc
The job runs successfully, but a minor error appears in the output:
Container [pid=4291,containerID=container_1584692490661_0001_01_000002] is running 471046656B beyond the 'VIRTUAL' memory limit. Current usage: 109.0 MB of 1 GB physical memory used; 2.5 GB of 2.1 GB virtual memory used. Killing container.
This error does not affect the result, but it can be eliminated by changing one configuration setting.
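The message indicates the container exceeded YARN's virtual-memory limit. The original does not name the setting, but a commonly used fix (an assumption based on the error message) is to disable the virtual-memory check in yarn-site.xml on all nodes and restart YARN:

```xml
<!--Assumed fix: disable YARN's virtual-memory check, which was killing the container-->
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
</property>
```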