1. Required software
- The required environment includes Java and ssh. sshd must be kept running so that the Hadoop scripts can manage the remote Hadoop daemons.
- Additional software requirement on Windows: Cygwin, which provides shell support beyond the software listed above.
2. Install the software
sudo apt-get install ssh
sudo apt-get install rsync
- Since Hadoop is written in Java, a JDK must also be installed.
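Before continuing, a quick sanity check that the prerequisites are on the PATH can save time (package names are distro-specific; on Ubuntu, `openjdk-8-jdk` is one option for the JDK, but any JDK 8 install works):

```shell
# Report whether each required tool is already installed.
lines=$(for tool in ssh rsync java; do
    if command -v "$tool" >/dev/null 2>&1; then
        echo "$tool: found"
    else
        echo "$tool: MISSING - install it before continuing"
    fi
done)
echo "$lines"
```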
3. Download and install
Reference: https://www.jianshu.com/p/cdae5bab030f
- To obtain a Hadoop distribution, download a recent stable release from one of the Apache mirrors:
wget https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/stable/hadoop-3.3.0.tar.gz
tar -xvf hadoop-3.3.0.tar.gz -C /usr/local
cd /usr/local
mv hadoop-3.3.0 hadoop
- Configure the environment variables for Hadoop:
vim /etc/profile
Building on the previously installed jdk1.8, append the following at the end of the file:
export JAVA_HOME=/usr/local/jdk1.8
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HOME/bin:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
source /etc/profile
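The effect of these exports can be sanity-checked in a throwaway shell before touching /etc/profile (paths are the ones assumed throughout this guide; they need not exist yet for the variables to be set):

```shell
# Write the same exports to a scratch file, source it, and inspect the result.
profile=$(mktemp)
cat > "$profile" <<'EOF'
export JAVA_HOME=/usr/local/jdk1.8
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HOME/bin:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
EOF
. "$profile"
echo "$HADOOP_HOME"
```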
Test whether the installation succeeded:
hadoop version
root@iZuf63fv674pbylkkxs48qZ:/usr/local# hadoop version
Hadoop 3.3.0
Source code repository https://gitbox.apache.org/repos/asf/hadoop.git -r aa96f1871bfd858f9bac59cf2a81ec470da649af
Compiled by brahma on 2020-07-06T18:44Z
Compiled with protoc 3.7.1
From source with checksum 5dc29b802d6ccd77b262ef9d04d19c4
This command was run using /usr/local/hadoop/share/hadoop/common/hadoop-common-3.3.0.jar
root@iZuf63fv674pbylkkxs48qZ:/usr/local#
4. Modify the configuration files
sudo vim /usr/local/hadoop/etc/hadoop/core-site.xml
Add the following content:
<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/usr/local/hadoop/tmp</value>
        <description>Abase for other temporary directories.</description>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
Explanation of the configuration above:
<!-- Communication address of the HDFS master (namenode) -->
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://0.0.0.0:9000</value>
</property>
<!-- Storage path for files Hadoop generates at runtime -->
<property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/tmp</value>
</property>
- In the same directory, edit hdfs-site.xml (dfs.data.dir, here /usr/local/hadoop/hdfs/data, is the physical storage location of data blocks on a datanode). Add the following:
<!-- Set the number of HDFS replicas -->
<property>
    <name>dfs.replication</name>
    <value>1</value>
</property>
<property>
    <name>dfs.permissions</name>
    <value>false</value>
</property>
- In hadoop-env.sh, change JAVA_HOME: comment out
export JAVA_HOME=${JAVA_HOME}
and add in its place:
export JAVA_HOME=/usr/local/jdk1.8
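The same edit can be done non-interactively. A sketch against a scratch stand-in for hadoop-env.sh, so it is safe to run as-is (the sed pattern assumes the template line looks like the one quoted above):

```shell
# Scratch stand-in for /usr/local/hadoop/etc/hadoop/hadoop-env.sh
envsh=$(mktemp)
echo '# export JAVA_HOME=${JAVA_HOME}' > "$envsh"
# Replace the commented-out template line with the concrete JDK path:
sed -i 's|^# export JAVA_HOME=.*|export JAVA_HOME=/usr/local/jdk1.8|' "$envsh"
cat "$envsh"
```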
5. Test and start
All of the operations below are performed in the Hadoop installation directory, /usr/local/hadoop.
- Format the namenode:
/usr/local/hadoop# ./bin/hdfs namenode -format
- Start hdfs. This brings up the NameNode and DataNode daemons.
- If it errors out as follows:
root@iZuf63fv674pbylkkxs48qZ:/usr/local/hadoop# ./sbin/start-dfs.sh
Starting namenodes on [localhost]
ERROR: Attempting to operate on hdfs namenode as root
ERROR: but there is no HDFS_NAMENODE_USER defined. Aborting operation.
Starting datanodes
ERROR: Attempting to operate on hdfs datanode as root
ERROR: but there is no HDFS_DATANODE_USER defined. Aborting operation.
Starting secondary namenodes [iZuf63fv674pbylkkxs48qZ]
ERROR: Attempting to operate on hdfs secondarynamenode as root
ERROR: but there is no HDFS_SECONDARYNAMENODE_USER defined. Aborting operation.
Solution:
In the hadoop/sbin directory, add the following at the top of start-dfs.sh and stop-dfs.sh:
HDFS_DATANODE_USER=root
HADOOP_DATANODE_SECURE_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
At the top of start-yarn.sh and stop-yarn.sh, add:
#!/usr/bin/env bash
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root
Then rerun ./start-all.sh from the sbin directory and the services start.
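An alternative with the same effect (my own suggestion, not from the original note) is to define these variables once in etc/hadoop/hadoop-env.sh instead of patching every script:

```shell
# Added to hadoop-env.sh, these cover start-dfs.sh / start-yarn.sh alike:
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
```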
- During the restart, the following problem came up:
root@iZuf63fv674pbylkkxs48qZ:/usr/local/hadoop/sbin# sudo ./start-dfs.sh
WARNING: HADOOP_SECURE_DN_USER has been replaced by HDFS_DATANODE_SECURE_USER. Using value of HADOOP_SECURE_DN_USER.
Starting namenodes on [localhost]
localhost: root@localhost: Permission denied (publickey,password).
Starting datanodes
localhost: root@localhost: Permission denied (publickey,password).
Starting secondary namenodes [iZuf63fv674pbylkkxs48qZ]
iZuf63fv674pbylkkxs48qZ: root@izuf63fv674pbylkkxs48qz: Permission denied (publickey,password).
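This "Permission denied" means root cannot ssh into localhost without a password; the usual fix is the same passwordless-key setup described in section 5.1: generate a key for the user running the scripts and authorize it. A sketch, run here against a scratch directory so nothing real is modified (on the actual host the files live in ~/.ssh, and OpenSSH must be installed):

```shell
# Scratch stand-in for root's ~/.ssh
sshdir=$(mktemp -d)
ssh-keygen -q -t rsa -N "" -f "$sshdir/id_rsa"        # passwordless key pair
cat "$sshdir/id_rsa.pub" >> "$sshdir/authorized_keys"
chmod 600 "$sshdir/authorized_keys"
# On the real host, verify with:  ssh localhost true
```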
5.1 Building a Hadoop cluster on virtual machines
- During installation, configure a static ip for each host. My own notes on configuring a static ip are here:
http://note.youdao.com/s/dDpr8UkW
- Change the Ubuntu download sources:
sudo vim /etc/apt/sources.list
Replace http://archive.ubuntu.com/ubuntu/ in the file with http://mirrors.aliyun.com/ubuntu/
- First install one Ubuntu system to act as master. After configuring its static ip and installing the jdk and hadoop, clone master with the same configuration to get node1 and node2. The hostnames mentioned here are set with:
sudo vim /etc/hostname
- Modify the hosts file:
sudo vim /etc/hosts
Append the following at the end of the file:
192.168.8.6 master
192.168.8.7 node1
192.168.8.8 node2
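When scripting this across all three machines, a small guard avoids duplicate entries. A sketch against a scratch copy (point `hosts` at /etc/hosts on the real machines):

```shell
hosts=$(mktemp)   # scratch stand-in for /etc/hosts
for entry in "192.168.8.6 master" "192.168.8.7 node1" "192.168.8.8 node2"; do
    # Append each mapping only if it is not already present:
    grep -qF "$entry" "$hosts" || echo "$entry" >> "$hosts"
done
cat "$hosts"
```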
- Configure passwordless ssh login
- Enter cd ~ to return to the home directory
- Run ssh-keygen and keep pressing Enter, which gives a result like the following:
helloful@master:~$ cd ~
helloful@master:~$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/home/helloful/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/helloful/.ssh/id_rsa
Your public key has been saved in /home/helloful/.ssh/id_rsa.pub
The key fingerprint is:
SHA256:tGRzFOZansyT58ahQZyWOaIxESIVbPSkzYowY7AQyYM helloful@master
The key's randomart image is:
+---[RSA 3072]----+
|*o.+=.+. +.      |
|E+ .oB . = +     |
|=.. * * %        |
|.+ . . B & +     |
| . . . S O o     |
|     B .         |
|      . +        |
|       .         |
|                 |
+----[SHA256]-----+
- Enter cd .ssh
- Enter cat ./id_rsa.pub >> authorized_keys
helloful@master:~$ cd .ssh
helloful@master:~/.ssh$ cat ./id_rsa.pub >> authorized_keys
helloful@master:~/.ssh$ cat authorized_keys
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQDffnfOM4rgcxtm8lkBzPojolSX1zz26r5+hOd0Iy5lgS7atDZgZqQ7JITShpwaENNJ7N8qumsjwnyulBsP5DSRGa0oXzJTafO+Drj47p5V+bI4Nejl+SjXrB6X5RIFD8VmuIrXNMtRx4bQQ4oZQyAF/qSa4wcnsBz8gMPuY3JAnArlsm9MCHfhvTg/zeVTbJjjbyc+8tGXVsa0AVmL5lcrxOcBPc0bP53/agwzPMHuBtlTbvpX2X57XxvKFov8WngSbMZYRWALsW9EvvBZg1oyPVEXo16WK80hWRlZKWiQANJgdWF3sFIiac22ml12NoH7KzmmDEDigd0pqAPaBOlcLvCzWigOJf22hmW8UDTP68kvjR8M4JPDjkwDC5UjO4mzRQUEukeXqGMOxM7drHlyqKpoVE1/zi9rKFSroCnd59a5HIv+0pobMkjwQATh8ZUBEGeEK7yXNBnQTvxFvA8qmJZ62WzGguaty4AWDDQ9HMTkA1twvmlCqBksFSQOpFM= helloful@master
helloful@master:~/.ssh$
- Perform the key-generation steps above on every host, then copy the keys of node1 and node2 into the authorized_keys file on master; likewise, copy those of master and node2 to node1, and so on.
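The exchange can be sketched with scratch files (the key strings below are placeholders; on real hosts each file is that machine's ~/.ssh/id_rsa.pub, carried over with scp or ssh-copy-id):

```shell
dir=$(mktemp -d)
# Placeholder public keys standing in for each node's ~/.ssh/id_rsa.pub:
echo "ssh-rsa PLACEHOLDER_KEY_NODE1 helloful@node1" > "$dir/node1.pub"
echo "ssh-rsa PLACEHOLDER_KEY_NODE2 helloful@node2" > "$dir/node2.pub"
# master's authorized_keys collects the other nodes' keys:
cat "$dir/node1.pub" "$dir/node2.pub" >> "$dir/authorized_keys_master"
```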
- Modify the hadoop configuration files. The files to change are under:
cd ~
cd /usr/local/hadoop/etc/hadoop
- sudo vim core-site.xml
Add the following content:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/local/hadoop/tmp</value>
        <description>Abase for other temporary directories.</description>
    </property>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://master:9000</value>
    </property>
</configuration>
- sudo vim hadoop-env.sh
Add one line with the path of your installed jdk:
export JAVA_HOME=/usr/local/jdk1.8
- Modify the addresses in mapred-site.xml to your own:
sudo vim mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
    <property>
        <name>mapred.job.tracker</name>
        <value>master:49001</value>
    </property>
    <property>
        <name>mapred.local.dir</name>
        <value>/usr/local/hadoop/var</value>
    </property>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
- Modify the workers file:
on master, set it to: node1 node2
on node1, set it to: master node2
on node2, set it to: master node1
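As a concrete sketch of the first case (the workers file lists one hostname per line; written here to a scratch file, the real path being /usr/local/hadoop/etc/hadoop/workers):

```shell
workers=$(mktemp)   # stand-in for etc/hadoop/workers on master
printf '%s\n' node1 node2 > "$workers"
cat "$workers"
```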
- Modify the yarn-site.xml file:
<?xml version="1.0"?>
<configuration>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>master</value>
    </property>
    <property>
        <description>The address of the applications manager interface in the RM.</description>
        <name>yarn.resourcemanager.address</name>
        <value>${yarn.resourcemanager.hostname}:8032</value>
    </property>
    <property>
        <description>The address of the scheduler interface.</description>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>${yarn.resourcemanager.hostname}:8030</value>
    </property>
    <property>
        <description>The http address of the RM web application.</description>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>${yarn.resourcemanager.hostname}:8088</value>
    </property>
    <property>
        <description>The https address of the RM web application.</description>
        <name>yarn.resourcemanager.webapp.https.address</name>
        <value>${yarn.resourcemanager.hostname}:8090</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>${yarn.resourcemanager.hostname}:8031</value>
    </property>
    <property>
        <description>The address of the RM admin interface.</description>
        <name>yarn.resourcemanager.admin.address</name>
        <value>${yarn.resourcemanager.hostname}:8033</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.scheduler.maximum-allocation-mb</name>
        <value>1024</value>
        <description>Memory available per node, in MB; default 8182 MB.</description>
    </property>
    <property>
        <name>yarn.nodemanager.vmem-pmem-ratio</name>
        <value>2.1</value>
    </property>
    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>1024</value>
    </property>
    <property>
        <name>yarn.nodemanager.vmem-check-enabled</name>
        <value>false</value>
    </property>
</configuration>
- Start hadoop
Format script:
cd /usr/local/hadoop/bin
sudo ./hadoop namenode -format
Startup script:
- Note in particular: since the ssh-keygen above was run under the normal user account, do not use sudo here.
cd /usr/local/hadoop/sbin
./start-all.sh
- Problems encountered while running this:
- If it complains it cannot create files, run the following on every host:
sudo chmod 777 -R /usr/local/hadoop/
- Viewing via the web: on the Windows 10 host, open the master VM's ip at port 9870 in a browser. For example, with 192.168.8.6 as master's ip:
192.168.8.6:9870
6. Further Hadoop study
Reference: https://www.zhihu.com/question/333417513