Hadoop 3.3.0 Cluster Setup
Environment and required software
Machines: three in total, two running Ubuntu 20.04 and one running Ubuntu 16.04
Hadoop version: hadoop-3.3.0
Dependencies: SSH, Java 1.8
1. Configure hostnames and IPs
Note: one machine serves as the NameNode (hostname set to master); the other two are DataNodes (hostnames node1 and node2).
sudo gedit /etc/hostname
Change the hostname on the three machines to master, node1, and node2 respectively.
sudo gedit /etc/hosts
Add the following entries:
192.168.1.101 master
192.168.1.102 node1
192.168.1.103 node2
Assign each machine a static IP matching the addresses above; hostnames and IP addresses must correspond one to one (a netplan sketch for the Ubuntu 20.04 machines follows the commands below).
Check the current IP address:
sudo apt-get install net-tools
ifconfig -a
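On the Ubuntu 20.04 machines the static IP can be set with netplan; the snippet below is only a sketch for master (the file name 01-static.yaml, the interface name enp0s3, and the gateway 192.168.1.1 are assumptions, so check your interface name with ifconfig first). The Ubuntu 16.04 machine uses /etc/network/interfaces instead.
sudo gedit /etc/netplan/01-static.yaml
network:
  version: 2
  ethernets:
    enp0s3:
      dhcp4: no
      addresses: [192.168.1.101/24]
      gateway4: 192.168.1.1
      nameservers:
        addresses: [192.168.1.1]
sudo netplan apply (apply the configuration; use 192.168.1.102 and 192.168.1.103 on node1 and node2)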
2. Install SSH and set up passwordless login
- Install:
sudo apt-get update
sudo apt-get install openssh-server -y
sudo ps -e | grep ssh (check the installation; if sshd is listed, the SSH service is running)
- Passwordless login
ssh-keygen -t rsa -P "" (generates two files under ~/.ssh/: id_rsa and id_rsa.pub)
cd ~/.ssh
cat id_rsa.pub >> authorized_keys (appends the public key to the authorized_keys file)
You can now SSH into the local machine without a password:
ssh localhost
exit
Copy the authorized_keys file from the master node to node1 and node2:
scp ~/.ssh/authorized_keys lsk@node1:~/.ssh/
scp ~/.ssh/authorized_keys lsk@node2:~/.ssh/
ssh node1
exit (leave node1)
ssh node2
exit (leave node2)
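As an alternative to copying authorized_keys by hand, ssh-copy-id appends the master's public key on each node and the login can be verified in one pass. This is only a sketch: it assumes the username lsk used in the scp commands above and that password login is still allowed at this point.
for host in node1 node2; do
  ssh-copy-id -i ~/.ssh/id_rsa.pub lsk@$host   # asks for the password once per node
  ssh lsk@$host hostname                       # should print node1 / node2 without a password prompt
done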
3. Install Java 1.8
sudo apt-get install openjdk-8-jdk
java -version (verify the installation)
whereis java (locate the Java installation; look for java-8-openjdk-amd64, which is needed for the environment configuration)
sudo gedit ~/.bashrc
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 (append this line to set the Java environment)
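After saving ~/.bashrc, reload it and confirm the variable is set (the path above is the default install location of the openjdk-8 package on Ubuntu; adjust it if whereis java showed something else):
source ~/.bashrc
echo $JAVA_HOME (should print /usr/lib/jvm/java-8-openjdk-amd64)
$JAVA_HOME/bin/java -version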
4. Install and configure Hadoop (on the master node)
Download the Hadoop 3.3.0 package: hadoop-3.3.0.tar.gz
After downloading, extract it to /opt.
Change the permissions of /opt:
sudo chmod -R 777 /opt
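A minimal sketch of the extract-and-permission step, assuming the tarball was downloaded to ~/Downloads (taking ownership of the Hadoop directory is a tighter alternative to the blanket chmod 777 above):
sudo tar -xzf ~/Downloads/hadoop-3.3.0.tar.gz -C /opt
sudo chown -R $USER:$USER /opt/hadoop-3.3.0 (alternative: give your user ownership instead of opening permissions to everyone)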
Edit the following files under /opt/hadoop-3.3.0/etc/hadoop: core-site.xml, hdfs-site.xml, yarn-site.xml, mapred-site.xml, hadoop-env.sh, and workers.
First, create the directories that will hold the HDFS data:
cd /opt/hadoop-3.3.0/
mkdir hdfs
cd hdfs
mkdir name data tmp
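Equivalently, the three directories can be created in a single command:
mkdir -p /opt/hadoop-3.3.0/hdfs/{name,data,tmp}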
gedit core-site.xml (add the following)
<configuration>
<!-- Address of the HDFS NameNode -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000</value>
</property>
<!-- Directory where Hadoop stores files generated at runtime -->
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/hadoop-3.3.0/hdfs/tmp</value>
</property>
</configuration>
gedit hdfs-site.xml (add the following)
<configuration>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>master:50090</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/opt/hadoop-3.3.0/hdfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/opt/hadoop-3.3.0/hdfs/data</value>
</property>
</configuration>
gedit yarn-site.xml
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>master</value>
</property>
</configuration>
gedit mapred-site.xml (add the following)
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>master:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>master:19888</value>
</property>
<property>
<name>yarn.app.mapreduce.am.env</name>
<value>HADOOP_MAPRED_HOME=/opt/hadoop-3.3.0</value>
</property>
<property>
<name>mapreduce.map.env</name>
<value>HADOOP_MAPRED_HOME=/opt/hadoop-3.3.0</value>
</property>
<property>
<name>mapreduce.reduce.env</name>
<value>HADOOP_MAPRED_HOME=/opt/hadoop-3.3.0</value>
</property>
</configuration>
gedit hadoop-env.sh (add the following)
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
gedit workers (add the following)
node1
node2
Add the Hadoop environment variables to the shell environment (this must be done on every node):
sudo gedit ~/.bashrc
export HADOOP_HOME=/opt/hadoop-3.3.0
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$JAVA_HOME/bin
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
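After saving, reload the environment and make sure the hadoop command resolves (a quick sanity check):
source ~/.bashrc
hadoop version (should report Hadoop 3.3.0)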
Copy the configured Hadoop installation to each node:
cd /opt
scp -r hadoop-3.3.0 lsk@node1:/opt/
scp -r hadoop-3.3.0 lsk@node2:/opt/
5. Test run
Initialize Hadoop:
source ~/.bashrc
hdfs namenode -format (run this only on the master)
Start and stop Hadoop:
start-all.sh (start)
stop-all.sh (stop)
While the cluster is running, use the following command to check the Hadoop processes:
jps
Log in to the other nodes to check their Hadoop processes:
ssh node1
jps
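With the configuration above, jps should show roughly the following (PIDs omitted; SecondaryNameNode and ResourceManager run on master because of the hdfs-site.xml and yarn-site.xml settings):
on master: NameNode, SecondaryNameNode, ResourceManager, Jps
on node1 and node2: DataNode, NodeManager, Jps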
Run a small example job:
hadoop jar /opt/hadoop-3.3.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.0.jar pi 10 10 (an estimated value of pi around 3.20000 means the job ran successfully)
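To further confirm the cluster is healthy, check the HDFS report and the web UIs (these are the Hadoop 3.x default ports; adjust if you changed them):
hdfs dfsadmin -report (node1 and node2 should appear as live datanodes)
NameNode web UI: http://master:9870
ResourceManager web UI: http://master:8088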