Setting Up a Distributed Hadoop Environment
All of the following is done as the hadoop user, inside the container.
1 Configure the master node
1) Download hadoop.tar.gz, copy it to /home/hadoop, and extract it there. (This tarball carries no configuration of its own; preconfigured packages for the master and slave nodes can be found in the master and slave subdirectories of the designated location, and are modified as needed in the steps below.)
Run tar -xvf hadoop.tar.gz to extract the hadoop directory.
2) Configure environment variables
Edit /etc/profile and append the following:
export HADOOP_HOME=/home/hadoop/hadoop-2.7.3
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export YARN_HOME=$HADOOP_HOME
export HADOOP_ROOT_LOGGER=INFO,console
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/jdk1.8.0_161/jre/lib/amd64/server:/home/hadoop/hadoop-2.7.3/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib:$HADOOP_HOME/lib/native"
3) After saving, apply the changes:
source /etc/profile
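Because a single stray space after `=` or around a `:` in these exports breaks the variable, it can be worth sanity-checking them right away. A minimal sketch, assuming the paths above (the /tmp scratch file is just for illustration):

```shell
# Sketch: write the key exports to a scratch file, source it, and confirm
# the variables expand cleanly (paths are the ones assumed above).
cat > /tmp/hadoop_env.sh <<'EOF'
export HADOOP_HOME=/home/hadoop/hadoop-2.7.3
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
EOF
. /tmp/hadoop_env.sh
echo "$HADOOP_HOME"          # should print /home/hadoop/hadoop-2.7.3
case ":$PATH:" in
  *":$HADOOP_HOME/bin:"*) echo "bin on PATH" ;;
esac
```

On the real machine, `hadoop version` after `source /etc/profile` is the definitive check.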
4) Edit $HADOOP_HOME/etc/hadoop/hadoop-env.sh and set JAVA_HOME as follows:
export JAVA_HOME=/home/jdk1.8.0_161
5) Edit $HADOOP_HOME/etc/hadoop/slaves, delete the existing localhost entry, and replace it with:
TSlave1
TSlave2
6) Edit $HADOOP_HOME/etc/hadoop/core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://TMaster:9000</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/hadoop-2.7.3/tmp</value>
</property>
<!-- Allow the hadoop proxy user to connect from any host/group (fixes remote Hive connections) -->
<property>
<name>hadoop.proxyuser.hadoop.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hadoop.groups</name>
<value>*</value>
</property>
</configuration>
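A typo in these XML files only surfaces when the daemons start, so checking a value right after editing can save a restart cycle. A quick sketch; get_prop is a hypothetical helper that relies on the one-tag-per-line layout used above, and the sample file stands in for the real core-site.xml:

```shell
# Sketch: extract a property value from a Hadoop *-site.xml with grep/sed.
# get_prop is a made-up helper; it assumes <name> and <value> sit on adjacent lines.
get_prop() {
  grep -A1 "<name>$2</name>" "$1" | sed -n 's:.*<value>\(.*\)</value>.*:\1:p'
}

# Small sample file to run against (stand-in for the real core-site.xml).
cat > /tmp/core-site-sample.xml <<'EOF'
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://TMaster:9000</value>
</property>
</configuration>
EOF

get_prop /tmp/core-site-sample.xml fs.defaultFS   # prints hdfs://TMaster:9000
```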
7) Edit $HADOOP_HOME/etc/hadoop/hdfs-site.xml
<configuration>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>TMaster:50090</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/hadoop/hadoop-2.7.3/hdfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/hadoop/hadoop-2.7.3/hdfs/data</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
<property>
<name>hadoop.proxyuser.hadoop.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hadoop.hosts</name>
<value>*</value>
</property>
</configuration>
8) Configure mapred-site.xml
Create the file from the bundled template:
cp mapred-site.xml.template mapred-site.xml
Then edit $HADOOP_HOME/etc/hadoop/mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>TMaster:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>TMaster:19888</value>
</property>
</configuration>
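These property stanzas are usually copy-pasted, so it is easy to end up with two <property> blocks sharing the same <name> (only one of them takes effect, silently). A throwaway duplicate check, sketched against a sample file with a deliberate mistake:

```shell
# Sketch: list <name> tags that occur more than once in a config file.
dup_names() {
  grep -o '<name>[^<]*</name>' "$1" | sort | uniq -d
}

# Sample with a deliberate duplicate, standing in for mapred-site.xml.
cat > /tmp/mapred-site-sample.xml <<'EOF'
<configuration>
<property><name>mapreduce.jobhistory.address</name><value>TMaster:10020</value></property>
<property><name>mapreduce.jobhistory.address</name><value>TMaster:19888</value></property>
</configuration>
EOF

dup_names /tmp/mapred-site-sample.xml   # prints <name>mapreduce.jobhistory.address</name>
```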
9) Edit $HADOOP_HOME/etc/hadoop/yarn-site.xml
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>TMaster:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>TMaster:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>TMaster:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>TMaster:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>TMaster:8088</value>
</property>
<property>
<name>yarn.nodemanager.pmem-check-enabled</name>
<value>false</value>
</property>
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
</property>
</configuration>
Configure the slave nodes
Because the slaves are built from the same image as the master, this step can be skipped entirely. This is one of the conveniences of Docker.
Format the NameNode
1) Before starting the cluster from the master node, format the NameNode:
hdfs namenode -format
For now this is done by hand; later it should be handled by a Dockerfile.
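Reformatting a NameNode that already holds metadata wipes the namespace, so a simple guard helps when this step eventually lands in a script or Dockerfile. A sketch; needs_format is a made-up helper and the directory is the dfs.namenode.name.dir configured above:

```shell
# Sketch: only format when the name directory is missing or empty.
needs_format() {
  [ -z "$(ls -A "$1" 2>/dev/null)" ]
}

NAME_DIR=/home/hadoop/hadoop-2.7.3/hdfs/name
if needs_format "$NAME_DIR"; then
  echo "would run: hdfs namenode -format"
else
  echo "name dir not empty, skipping format"
fi
```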
Start the cluster
1) On the master, start the HDFS and YARN daemons:
start-dfs.sh
start-yarn.sh
2) Check whether the cluster started successfully:
jps
The master should show:
SecondaryNameNode
ResourceManager
NameNode
Each slave should show:
NodeManager
DataNode
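Eyeballing jps output works for three nodes, but comparing against the expected daemon list is less error-prone as the cluster grows. A sketch; check_daemons is a made-up helper, and in practice its first argument would be the real `jps` output:

```shell
# Sketch: verify that every expected daemon name appears in a jps-style listing.
check_daemons() {
  running=$1; shift
  for d in "$@"; do
    echo "$running" | grep -qx "$d" || { echo "MISSING: $d"; return 1; }
  done
  echo "OK"
}

# On a real node this would be:  check_daemons "$(jps | awk '{print $2}')" NameNode ...
sample=$(printf 'NameNode\nSecondaryNameNode\nResourceManager')
check_daemons "$sample" NameNode SecondaryNameNode ResourceManager   # prints OK
```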
Leave Hadoop safe mode
Once Hadoop is up and running normally, exit safe mode by running the following as the hadoop user:
hdfs dfsadmin -safemode leave