Setting up a fully distributed Hadoop environment

Characteristics: (1) Used in production; at least 3 machines
               (2) Truly distributed
               (3) Provides all of Hadoop's features

Preparation:
            Install the JDK, configure hostnames, set up passwordless SSH login
            Disable the firewall, synchronize the time (date command)
            Set the environment variables
                HADOOP_HOME=/root/training/hadoop-3.1.2
                export HADOOP_HOME

                PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH            
                export PATH    
                
                export HDFS_DATANODE_USER=root
                export HDFS_DATANODE_SECURE_USER=root
                export HDFS_NAMENODE_USER=root
                export HDFS_SECONDARYNAMENODE_USER=root            
                export YARN_RESOURCEMANAGER_USER=root            
                export YARN_NODEMANAGER_USER=root
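
A rough sketch of the preparation commands (assuming CentOS 7; the hostnames, IPs and JDK path below are the ones used in this walkthrough, adjust them to your own machines):

    # run on every node, e.g. via Xshell's send-to-all-sessions tool
    systemctl stop firewalld              # stop the firewall now
    systemctl disable firewalld           # keep it off after reboot
    hostnamectl set-hostname bigdata112   # use bigdata113 / bigdata114 on the other nodes

    # map hostnames to IPs on every node
    echo "192.168.112.112 bigdata112" >> /etc/hosts
    echo "192.168.112.113 bigdata113" >> /etc/hosts
    echo "192.168.112.114 bigdata114" >> /etc/hosts

    # JDK: extract to /root/training (tarball name may differ) and point JAVA_HOME at it
    tar -zxvf jdk-8u181-linux-x64.tar.gz -C /root/training/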

For configuring passwordless login it is more convenient to use Xshell, because it has a tool that sends keyboard input to all servers at once.

First we arrange the three server sessions in a tiled layout so they are easy to watch.

Then choose Tools -> Send Key Input To All Sessions.

Then we run ssh-keygen -t rsa.

Then we copy the public key to bigdata112, bigdata113 and bigdata114:

[root@bigdata112 ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub bigdata112
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
The authenticity of host 'bigdata112 (192.168.112.112)' can't be established.
ECDSA key fingerprint is SHA256:1p5LjAD2uf2rePwjLPF7PzLZzqXO50aNNBl7wf3EvdI.
ECDSA key fingerprint is MD5:c2:1f:a8:c8:42:8b:14:82:46:ee:fe:c2:dc:30:88:33.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@bigdata112's password: 

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'bigdata112'"
and check to make sure that only the key(s) you wanted were added.

[root@bigdata112 ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub bigdata113
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
The authenticity of host 'bigdata113 (192.168.112.113)' can't be established.
ECDSA key fingerprint is SHA256:C5rqfGWIywYqBSSEWAKVx5mzurzddDQmjFrBcH9dOLM.
ECDSA key fingerprint is MD5:ae:59:c3:eb:77:6b:8b:02:27:8a:80:b2:a7:28:29:7f.
Are you sure you want to continue connecting (yes/no)? 
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
The authenticity of host 'bigdata113 (192.168.112.113)' can't be established.
ECDSA key fingerprint is SHA256:C5rqfGWIywYqBSSEWAKVx5mzurzddDQmjFrBcH9dOLM.
ECDSA key fingerprint is MD5:ae:59:c3:eb:77:6b:8b:02:27:8a:80:b2:a7:28:29:7f.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@bigdata113's password: 

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'bigdata113'"
and check to make sure that only the key(s) you wanted were added.

[root@bigdata112 ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub bigdata114
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
The authenticity of host 'bigdata114 (192.168.112.114)' can't be established.
ECDSA key fingerprint is SHA256:C5rqfGWIywYqBSSEWAKVx5mzurzddDQmjFrBcH9dOLM.
ECDSA key fingerprint is MD5:ae:59:c3:eb:77:6b:8b:02:27:8a:80:b2:a7:28:29:7f.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@bigdata114's password: 

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'bigdata114'"
and check to make sure that only the key(s) you wanted were added.

[root@bigdata112 ~]# ssh bigdata112
Last login: Fri Mar 20 23:26:42 2020 from 192.168.112.113
[root@bigdata112 ~]# exit
logout
Connection to bigdata112 closed.
[root@bigdata112 ~]# ssh bigdata113
Last login: Fri Mar 20 23:27:02 2020 from 192.168.112.113
[root@bigdata113 ~]# exit
logout
Connection to bigdata113 closed.
[root@bigdata112 ~]# ssh bigdata114
Last login: Fri Mar 20 23:27:12 2020 from 192.168.112.113
[root@bigdata114 ~]# exit
logout
Connection to bigdata114 closed.
All of the operations above were performed through the send-to-all-sessions feature, and we verified them by logging in to each host.
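
If you are not using Xshell, a rough alternative is to run a small loop on each node in turn; it pushes that node's public key to all three hosts (this only replaces the three ssh-copy-id commands for the node it runs on):

    for h in bigdata112 bigdata113 bigdata114; do ssh-copy-id -i /root/.ssh/id_rsa.pub $h; done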

Next we need to confirm that the time is consistent across the machines, and set it:
[root@bigdata112 ~]# date -s '2020-03-20 15:34:50'
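
If the nodes can reach the Internet, an alternative (assuming the ntpdate package is installed) is to sync every node against an NTP server instead of setting the time by hand:

    ntpdate pool.ntp.org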

Likewise, using the send-to-all-sessions tool, we configure the environment variables on all three machines:

[root@bigdata112 ~]# vim ~/.bash_profile

Then write the following environment variable settings into it:

                HADOOP_HOME=/root/training/hadoop-3.1.2
                export HADOOP_HOME

                PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH            
                export PATH    
                
                export HDFS_DATANODE_USER=root
                export HDFS_DATANODE_SECURE_USER=root
                export HDFS_NAMENODE_USER=root
                export HDFS_SECONDARYNAMENODE_USER=root            
                export YARN_RESOURCEMANAGER_USER=root            
                export YARN_NODEMANAGER_USER=root    

Note that HADOOP_HOME should point to whichever Hadoop version you are actually using.

After configuring, make the profile take effect with source ~/.bash_profile.
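
A quick sanity check that the variables took effect (the hadoop command itself will only work after the tarball is extracted in the next step):

    source ~/.bash_profile
    echo $HADOOP_HOME        # should print /root/training/hadoop-3.1.2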

Next we only need to configure Hadoop on bigdata112; once bigdata112 is configured, we copy the installation to bigdata113 and bigdata114.

Then we upload the Hadoop tarball to /root/tools/ on bigdata112 and extract it into /root/training/:

 tar -zxvf hadoop-3.1.2.tar.gz -C ../training/
Then we configure bigdata112 following the pseudo-distributed setup, with a few adjustments to the configuration files:

hadoop-env.sh
            export JAVA_HOME=/root/training/jdk1.8.0_181

        hdfs-site.xml
            <!--Replication factor for data blocks-->
            <!--The default is 3-->
            <!--Generally the replication factor matches the number of DataNodes-->
            <!--but should not exceed 3-->
            <property>
                <name>dfs.replication</name>
                <value>2</value>
            </property>
 
            <!--Disable HDFS permission checking-->
            <property>
                <name>dfs.permissions</name>
                <value>false</value>
            </property>
    
        core-site.xml
            <!--NameNode address-->
            <!--9000 is the RPC communication port-->
            <property>
                <name>fs.defaultFS</name>
                <value>hdfs://bigdata112:9000</value>
            </property>
            
            <!--Local filesystem directory that backs HDFS-->
            <!--Must be changed; the default is the Linux /tmp directory-->
            <property>
                <name>hadoop.tmp.dir</name>
                <value>/root/training/hadoop-3.1.2/tmp</value>
            </property>                
            
        mapred-site.xml
            <!--Framework used to run MapReduce-->
            <property>
                <name>mapreduce.framework.name</name>
                <value>yarn</value>
            </property>            

            <property>
                <name>yarn.app.mapreduce.am.env</name>
                <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
            </property>    

            <property>
                <name>mapreduce.map.env</name>
                <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
            </property>    

            <property>
                <name>mapreduce.reduce.env</name>
                <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
            </property>    
        
        yarn-site.xml
            <!--ResourceManager address-->
            <property>
                <name>yarn.resourcemanager.hostname</name>
                <value>bigdata112</value>
            </property>    
            
            <!--Auxiliary service required by the MapReduce shuffle-->
            <property>
                <name>yarn.nodemanager.aux-services</name>
                <value>mapreduce_shuffle</value>
            </property>                
        
        workers
            bigdata113
            bigdata114
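
These configuration files all live under $HADOOP_HOME/etc/hadoop, so we edit them from that directory (the path assumes the install location used above):

    cd /root/training/hadoop-3.1.2/etc/hadoop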

[root@bigdata112 hadoop]# vim hadoop-env.sh 
[root@bigdata112 hadoop]# vim hdfs-site.xml 
[root@bigdata112 hadoop]# vim core-site.xml 
[root@bigdata112 hadoop]# vim mapred-site.xml 
[root@bigdata112 hadoop]# vim yarn-site.xml 
[root@bigdata112 hadoop]# vim workers 

After finishing the configuration, we need to format HDFS on bigdata112 (if the HDFS working directory does not exist yet, running this command creates it):

hdfs namenode -format
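
One quick way to confirm the format succeeded is to check that the directory configured as hadoop.tmp.dir now exists and holds the NameNode metadata:

    ls /root/training/hadoop-3.1.2/tmp/dfs/name/current    # fsimage and VERSION files should be here
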
Then we copy the configured Hadoop installation to bigdata113 and bigdata114:

scp -r /root/training/hadoop-3.1.2/ root@bigdata113:/root/training
scp -r /root/training/hadoop-3.1.2/ root@bigdata114:/root/training
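
After the scp commands finish, a quick check from bigdata112 (relying on the passwordless login configured earlier) confirms the directory arrived on the other nodes:

    ssh bigdata113 ls /root/training
    ssh bigdata114 ls /root/training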

The transfer takes some time. Once it is done, we only need to start the cluster from the master node, so we start everything on bigdata112:
[root@bigdata112 training]# start-all.sh
Starting namenodes on [bigdata112]
Last login: Fri Mar 20 23:26:42 CST 2020 from 192.168.112.112 on pts/3
Starting datanodes
Last login: Fri Mar 20 16:20:29 CST 2020 on pts/0
Starting secondary namenodes [bigdata112]
Last login: Fri Mar 20 16:20:32 CST 2020 on pts/0
Starting resourcemanager
Last login: Fri Mar 20 16:20:49 CST 2020 on pts/0
Starting nodemanagers
Last login: Fri Mar 20 16:21:08 CST 2020 on pts/0

Before starting, the worker nodes have no data directory (the directory that stores the data blocks), but after starting it appears.
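
To confirm that every daemon came up, jps (shipped with the JDK) can be run in each session, for example through the send-to-all-sessions tool. With this layout, bigdata112 should list NameNode, SecondaryNameNode and ResourceManager, while bigdata113 and bigdata114 should list DataNode and NodeManager:

    jps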

Then we test a WordCount example.

First we need to upload a file to HDFS.
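
The content of data.txt is not shown here; any small text file will do. A purely illustrative example, created in the temp directory seen in the prompts below:

    echo "I love Beijing I love China" > data.txt    # hypothetical sample content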

[root@bigdata112 temp]# hdfs dfs -mkdir /input
[root@bigdata112 temp]# hdfs dfs -put data.txt /input
[root@bigdata112 temp]# hdfs dfs -ls /input
Found 1 items
-rw-r--r--   2 root supergroup         62 2020-03-20 16:39 /input/data.txt
Then we run the word-count example (the examples jar lives under $HADOOP_HOME/share/hadoop/mapreduce):

hadoop jar hadoop-mapreduce-examples-3.1.2.jar wordcount /input/data.txt /output/wc
It runs successfully, but a small error is reported along the way:

Container [pid=4291,containerID=container_1584692490661_0001_01_000002] is running 471046656B beyond the 'VIRTUAL' memory limit. Current usage: 109.0 MB of 1 GB physical memory used; 2.5 GB of 2.1 GB virtual memory used. Killing container.

This error does not affect the result.
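
To confirm, the job output can be read back from HDFS (the reducer normally writes its result to part-r-00000 under the output path given to the job):

    hdfs dfs -cat /output/wc/part-r-00000
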
We do need to modify a configuration file to make it go away.
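
Which file is not specified above; a common fix for this virtual-memory complaint (an assumption here, not taken from the original walkthrough) is to either disable the NodeManager's virtual-memory check or raise the virtual-to-physical ratio in yarn-site.xml on every node, then restart YARN:

            <!--Disable the virtual memory check-->
            <property>
                <name>yarn.nodemanager.vmem-check-enabled</name>
                <value>false</value>
            </property>

            <!--Or raise the virtual/physical memory ratio (default 2.1)-->
            <property>
                <name>yarn.nodemanager.vmem-pmem-ratio</name>
                <value>4</value>
            </property>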
