Hadoop Cluster Setup
Local Mode
First, prepare three virtual machines (CentOS 7).
Configure the IP-to-hostname mappings so the three VMs can resolve one another by hostname (requires the root user; do this on all three VMs):
vi /etc/hosts
Append the following IP-to-hostname mappings at the end of the file:
192.168.187.100 hadoop0
192.168.187.101 hadoop1
192.168.187.102 hadoop2
Verify:
ping hadoop0
ping hadoop1
ping hadoop2
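A quick loop (a sketch using the hostnames configured above) checks all three nodes at once:
for h in hadoop0 hadoop1 hadoop2; do
  ping -c 1 "$h" > /dev/null && echo "$h reachable" || echo "$h UNREACHABLE"
done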
Disable the firewall (requires root; run on all three VMs):
systemctl stop firewalld
systemctl disable firewalld
systemctl status firewalld
Disable SELinux (also on all three VMs):
vi /etc/selinux/config
Change the line to:
SELINUX=disabled
Reboot so the change takes effect:
init 6
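After the reboot, confirm SELinux is off (getenforce should print Disabled):
getenforce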
Configure SSH so the three VMs can log in to each other without a password. On each of the three VMs, generate a key pair and copy the public key to hadoop0:
ssh-keygen
ssh-copy-id hadoop0
Then, on hadoop0 (whose authorized_keys file now holds all three public keys), send that file to hadoop1 and hadoop2:
scp /home/hadoop/.ssh/authorized_keys hadoop@hadoop1:/home/hadoop/.ssh/
scp /home/hadoop/.ssh/authorized_keys hadoop@hadoop2:/home/hadoop/.ssh/
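To confirm passwordless login works in every direction, a loop like this on each VM should print the three hostnames without a password prompt (the first connection may still ask to accept a host key):
for h in hadoop0 hadoop1 hadoop2; do
  ssh "$h" hostname
done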
Install Java and Hadoop
Create the program and software directories under /home/hadoop:
mkdir /home/hadoop/program
mkdir /home/hadoop/software
Upload the local Java and Hadoop tarballs into the software directory.
Extract them:
cd /home/hadoop/software
tar -zxvf ibm-semeru-open-jdk_x64_linux_8u302b08_openj9-0.27.0.tar.gz
tar -zxvf hadoop-3.3.1.tar.gz
Move the extracted directories into program:
mv jdk8u302-b08 /home/hadoop/program/jdk-1.8
mv hadoop-3.3.1 /home/hadoop/program/hadoop-3.3
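As a quick sanity check that the JDK unpacked correctly (path as created above):
/home/hadoop/program/jdk-1.8/bin/java -version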
Configure the Java and Hadoop environment variables:
cd
vi .bashrc
Append:
export JAVA_HOME=/home/hadoop/program/jdk-1.8
export PATH=$JAVA_HOME/bin:$PATH
export HADOOP_HOME=/home/hadoop/program/hadoop-3.3
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
export PATH=.:$PATH
source .bashrc
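Both commands should now resolve from the PATH; verify with:
java -version
hadoop version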
Hadoop now works in local (standalone) mode. Test it with the bundled examples jar:
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.1.jar pi 3 4
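Here pi 3 4 means 3 map tasks with 4 samples each; larger values give a better estimate. As a further local-mode smoke test, the same jar's wordcount example runs directly on local files (the paths below are illustrative, and the output directory must not already exist):
echo "hello hadoop hello world" > /tmp/wc-in.txt
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.1.jar wordcount file:///tmp/wc-in.txt file:///tmp/wc-out
cat /tmp/wc-out/part-r-00000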
Pseudo-Distributed Deployment
hadoop-env.sh
cd $HADOOP_HOME/etc/hadoop
vi hadoop-env.sh
Set JAVA_HOME to your own Java installation path:
export JAVA_HOME=/home/hadoop/program/jdk-1.8
core-site.xml
vi core-site.xml
Add the following properties inside the <configuration> element:
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop0:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/home/hadoop/program/hadoop-3.3/hdfs/tmp</value>
</property>
hdfs-site.xml
vi hdfs-site.xml
Add inside the <configuration> element:
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/hadoop/program/hadoop-3.3/hdfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/hadoop/program/hadoop-3.3/hdfs/data</value>
</property>
Create the storage directories:
cd $HADOOP_HOME
mkdir hdfs
mkdir hdfs/name
mkdir hdfs/data
mkdir hdfs/tmp
Format the NameNode and start HDFS; Hadoop is then reachable through its web UI.
hdfs namenode -format
start-dfs.sh
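Before opening the UI, jps should show the HDFS daemons running on this node (NameNode, DataNode, SecondaryNameNode, plus Jps itself):
jps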
Open in a browser (when browsing from outside the VM, replace localhost with hadoop0's address):
http://localhost:9870
mapred-site.xml
cd $HADOOP_HOME/etc/hadoop
vi mapred-site.xml
Add inside the <configuration> element:
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.application.classpath</name>
<value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
</property>
yarn-site.xml
vi yarn-site.xml
Add inside the <configuration> element:
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.env-whitelist</name>
<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_HOME,PATH,LANG,TZ,HADOOP_MAPRED_HOME</value>
</property>
YARN is now configured; start it:
start-yarn.sh
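jps should now additionally list ResourceManager and NodeManager:
jps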
The YARN web UI is at:
http://localhost:8088
The pseudo-distributed deployment is complete. Run the example again, this time on YARN:
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.1.jar pi 3 4
The submitted job should now be visible in the YARN web UI.
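The same information is available from the command line:
yarn application -list -appStates ALL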
Fully Distributed Deployment
Delete the pseudo-distributed data and recreate the directories:
rm -rf /home/hadoop/program/hadoop-3.3/logs
rm -rf /home/hadoop/program/hadoop-3.3/hdfs
mkdir -p /home/hadoop/program/hadoop-3.3/hdfs/tmp
mkdir -p /home/hadoop/program/hadoop-3.3/hdfs/data
mkdir -p /home/hadoop/program/hadoop-3.3/hdfs/name
workers
cd $HADOOP_HOME/etc/hadoop
vim workers
Delete the existing content and add the worker (DataNode) hosts:
hadoop0
hadoop1
hadoop2
yarn-site.xml
vim yarn-site.xml
Add inside the <configuration> element (keeping the existing properties):
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop0</value>
</property>
Send the relevant files from hadoop0 to hadoop1 and hadoop2:
cd
scp -r program hadoop@hadoop1:/home/hadoop/program
scp -r program hadoop@hadoop2:/home/hadoop/program
scp .bashrc hadoop@hadoop1:/home/hadoop
scp .bashrc hadoop@hadoop2:/home/hadoop
source .bashrc    (on hadoop1 and hadoop2)
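A quick check that the copy landed; using the full binary path avoids depending on the remote shell environment:
ssh hadoop1 ls /home/hadoop/program
ssh hadoop1 /home/hadoop/program/hadoop-3.3/bin/hadoop version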
Start the fully distributed Hadoop cluster (on hadoop0):
hdfs namenode -format
start-dfs.sh
start-yarn.sh
To shut the cluster down later, the matching stop scripts are:
stop-yarn.sh
stop-dfs.sh
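With the cluster running, a loop over the nodes shows which daemons run where; hadoop0 should carry NameNode and ResourceManager, and all three nodes a DataNode and NodeManager (the full jps path avoids relying on the remote PATH):
for h in hadoop0 hadoop1 hadoop2; do
  echo "== $h =="
  ssh "$h" /home/hadoop/program/jdk-1.8/bin/jps
done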
The fully distributed cluster is now deployed. Run the example once more to confirm:
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.1.jar pi 3 4
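With all three workers up, the HDFS report should list three live DataNodes:
hdfs dfsadmin -report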