Setting up a Hadoop pseudo-distributed environment
Environment
Hadoop is set up in a virtual machine; the specific configuration is as follows:
Virtual machine: VMware Workstation Pro
OS: Ubuntu 20.04
Hadoop: 2.7.4
JDK: 1.8.0_271
Maven: 3.6.3
protobuf: 2.5.0
Installation
Installing the JDK
Extract the downloaded JDK to /opt/local, then append the following to /etc/profile:
export JAVA_HOME=/opt/local/jdk1.8.0_271
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH
Run the following command to make the changes take effect:
source /etc/profile
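Collected as a script, the JDK steps look roughly like this (the archive name jdk-8u271-linux-x64.tar.gz is an assumption; substitute your actual download):

```shell
# Assumed archive name; substitute the file you downloaded.
sudo mkdir -p /opt/local
sudo tar -xzf jdk-8u271-linux-x64.tar.gz -C /opt/local

# Append the variables to /etc/profile (tee -a needs root).
cat <<'EOF' | sudo tee -a /etc/profile
export JAVA_HOME=/opt/local/jdk1.8.0_271
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH
EOF

source /etc/profile
java -version    # should report 1.8.0_271
```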
Installing Maven
Extract the downloaded Maven archive to /opt/local, then append the following to /etc/profile:
export MAVEN_HOME=/opt/local/apache-maven-3.6.3
export PATH=${MAVEN_HOME}/bin:${PATH}
Run the following command to make the changes take effect:
source /etc/profile
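After sourcing the profile, it is worth verifying the Maven version. The Hadoop build is also memory-hungry, so raising the Maven heap via MAVEN_OPTS is a common precaution (the values below are assumptions; tune to your machine):

```shell
source /etc/profile
mvn -version    # should report Apache Maven 3.6.3

# Assumed heap settings for the upcoming Hadoop build; adjust as needed.
export MAVEN_OPTS="-Xms256m -Xmx1536m"
```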
Installing protobuf
Extract the downloaded protobuf source to /opt/local. g++ must be installed before compiling; then run the following in the source directory:
sudo ./configure --prefix=/opt/local/protobuf
sudo make
sudo make install
When the build finishes, append the following to /etc/profile:
export PATH=$PATH:/opt/local/protobuf/bin
Run the following command to make the changes take effect:
source /etc/profile
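Putting the protobuf steps together, including the compiler prerequisite (the source directory name protobuf-2.5.0 is an assumption based on the version listed above):

```shell
# g++ and friends are required before configure/make.
sudo apt-get update
sudo apt-get install -y build-essential

cd /opt/local/protobuf-2.5.0   # assumed unpack directory
sudo ./configure --prefix=/opt/local/protobuf
sudo make
sudo make install

# After adding /opt/local/protobuf/bin to PATH and sourcing /etc/profile:
protoc --version               # should report libprotoc 2.5.0
```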
Building Hadoop
In the unpacked source directory, run:
mvn package -Pdist -DskipTests -Dtar
After some time, BUILD SUCCESS indicates the build completed.
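Since the environment variables below reference $HADOOP_HOME/lib/native, it can be useful to build the native libraries as well by adding the native profile. A sketch, with the extra packages that Hadoop 2.x's BUILDING.txt lists for native compilation (verify against your source tree):

```shell
# Extra packages typically needed for the native build on Ubuntu.
sudo apt-get install -y cmake zlib1g-dev libssl-dev

cd /opt/local/hadoop-2.7.4-src
mvn package -Pdist,native -DskipTests -Dtar
```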
Setting up Hadoop
The built distribution is located in /opt/local/hadoop-2.7.4-src/hadoop-dist/target (it is equivalent to the binary package downloadable from the official site). Extract the tarball to /usr/local, rename the directory to hadoop, and set access permissions on /usr/local/hadoop.
sudo chmod 777 -R /usr/local/hadoop
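The extract-and-rename steps can be written out as follows (the tarball name hadoop-2.7.4.tar.gz inside the target directory is an assumption based on the version):

```shell
cd /opt/local/hadoop-2.7.4-src/hadoop-dist/target
sudo tar -xzf hadoop-2.7.4.tar.gz -C /usr/local   # assumed tarball name
sudo mv /usr/local/hadoop-2.7.4 /usr/local/hadoop
sudo chmod -R 777 /usr/local/hadoop
```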
Next, configure the environment variables. (With the variables from the original article, hadoop version failed with "Could not find or load main class org.apache.hadoop.util.VersionInfo"; after some searching, the configuration below worked. If you use prebuilt binaries downloaded directly, the original article's settings should also work.) Append the following to /etc/profile:
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_PREFIX=$HADOOP_HOME
export HADOOP_LIBEXEC_DIR=$HADOOP_HOME/libexec
export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native:$JAVA_LIBRARY_PATH
export HADOOP_CONF_DIR=$HADOOP_PREFIX/etc/hadoop
Run the following command to make the changes take effect:
source /etc/profile
Hadoop配置
配置hadoop-env.sh
# The java implementation to use.
export JAVA_HOME=/usr/java/jdk1.8.0_271
Edit yarn-env.sh:
export JAVA_HOME=/opt/local/jdk1.8.0_271
Edit core-site.xml:
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/usr/local/hadoop/tmp</value>
<description>Abase for other temporary directories.</description>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
Edit hdfs-site.xml:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop/tmp/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/hadoop/tmp/dfs/data</value>
</property>
</configuration>
Edit yarn-site.xml:
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>127.0.0.1:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>127.0.0.1:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>127.0.0.1:8031</value>
</property>
</configuration>
After configuration is complete, reboot the system.
Testing Hadoop
Run hadoop version to verify; if it prints the Hadoop version, installation and configuration are complete. Next, format the NameNode with hdfs namenode -format; "successfully formatted" together with "Exiting with status 0" in the output indicates success. Note that openssh-server must be installed, otherwise this step reports a connection failure. Finally, run jps; if the expected daemon processes appear, everything is running correctly.
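A minimal verification sequence, assuming openssh-server and passwordless SSH to localhost (the original text does not spell out the start step; start-dfs.sh and start-yarn.sh are the standard Hadoop 2.x scripts, on PATH via $HADOOP_HOME/sbin):

```shell
# SSH server plus passwordless login to localhost.
sudo apt-get install -y openssh-server
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

hadoop version            # prints the version if the environment is correct
hdfs namenode -format     # look for "successfully formatted" and
                          # "Exiting with status 0"
start-dfs.sh
start-yarn.sh
jps                       # expect NameNode, DataNode, SecondaryNameNode,
                          # ResourceManager, NodeManager (plus Jps itself)
```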
Summary
This article covered compiling Hadoop from source and installing it. The installation process mainly drew on two blog posts, linked below: