1. Install JDK 1.8
Configure the environment variables:
JAVA_HOME=D:\java\jdk1.8.0_181
PATH=%PATH%;%JAVA_HOME%\bin
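To confirm that the two variables are consistent, a small check can be run before moving on. This is a minimal sketch, not part of the Hadoop distribution; the function name check_java_env is hypothetical, and it only verifies that PATH contains the bin directory under JAVA_HOME.

```python
import ntpath


def check_java_env(env):
    """Return a list of problems found in a mapping of environment variables."""
    java_home = env.get("JAVA_HOME", "").strip()
    if not java_home:
        return ["JAVA_HOME is not set"]

    # Normalise Windows paths so that case and trailing separators do not matter.
    def norm(path):
        return ntpath.normcase(ntpath.normpath(path.strip()))

    expected = norm(ntpath.join(java_home, "bin"))
    entries = [norm(p) for p in env.get("PATH", "").split(";") if p.strip()]

    problems = []
    if expected not in entries:
        problems.append("PATH does not contain %JAVA_HOME%\\bin")
    return problems
```

On a real machine it could be called as check_java_env(os.environ) after importing os; an empty list means both variables look right.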
2. Download Hadoop 3.1.4
Download hadoop-3.1.4.tar.gz from https://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-3.1.4/hadoop-3.1.4.tar.gz and extract it to the directory you have chosen for the installation.
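3. Download the Windows helper binaries
Go to https://github.com/cdarlint/winutils and find the folder that matches your Hadoop version (the exact minor version may not be available). Download hadoop.dll and winutils.exe and copy them into ...\hadoop-3.1.4\bin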
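Forgetting one of the two files is easy, so the copy can be checked with a few lines of code. A minimal sketch, assuming the two file names given in this step; the helper names are hypothetical, and the bin directory is passed in by the caller because the install path is not fixed.

```python
import os

# The two files this step copies into the Hadoop bin directory.
REQUIRED_BINARIES = ("winutils.exe", "hadoop.dll")


def missing_windows_binaries(file_names):
    """Return the required binaries that are absent from a directory listing."""
    # Windows file names are case-insensitive, so compare in lower case.
    present = {name.lower() for name in file_names}
    return [name for name in REQUIRED_BINARIES if name not in present]


def check_bin_dir(bin_dir):
    """Convenience wrapper that lists bin_dir and reports what is missing."""
    return missing_windows_binaries(os.listdir(bin_dir))
```

Calling check_bin_dir with the path of the bin directory returns an empty list when both files are in place.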
4. Edit the Hadoop configuration files
The configuration files are located in ...\hadoop-3.1.4\etc\hadoop
4.1 core-site.xml (sets the default HDFS address and port)
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
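All the *-site.xml files in this step share the same layout: a configuration element containing property elements with a name and a value. The sketch below reads that layout into a dictionary and extracts the host and port of the default file system. The function names are hypothetical; the lookup accepts both fs.defaultFS and the older, deprecated key fs.default.name.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit


def parse_hadoop_config(xml_text):
    """Turn the text of a Hadoop *-site.xml file into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def default_fs_endpoint(props):
    """Return (host, port) of the default file system, or None if unset."""
    uri = props.get("fs.defaultFS") or props.get("fs.default.name")
    if not uri:
        return None
    parts = urlsplit(uri)
    return parts.hostname, parts.port


SAMPLE = """<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>"""
```

With the sample above, default_fs_endpoint(parse_hadoop_config(SAMPLE)) gives the host localhost and the port 9000.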
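4.2 hdfs-site.xml (sets the port of the web management page; sets the replication factor to 1, i.e. no replication; sets the NameNode metadata path and the DataNode data path)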
<configuration>
<property>
<name>dfs.namenode.http-address</name>
<value>localhost:50070</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/hadoop/data/dfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/hadoop/data/dfs/datanode</value>
</property>
</configuration>
4.3 yarn-site.xml (configures YARN to use the MapReduce shuffle service)
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>
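The class property of an auxiliary service embeds the service name, so the name inside the key has to match an entry of yarn.nodemanager.aux-services exactly (mapreduce_shuffle, with an underscore). A minimal sketch that spots class keys whose service name was never declared; the function name is hypothetical and the input is a plain {name: value} dictionary of the yarn-site.xml properties.

```python
PREFIX = "yarn.nodemanager.aux-services."
SUFFIX = ".class"


def orphan_aux_class_keys(props):
    """Return class keys that refer to a service missing from aux-services."""
    declared = {
        name.strip()
        for name in props.get("yarn.nodemanager.aux-services", "").split(",")
        if name.strip()
    }
    orphans = []
    for key in props:
        if key.startswith(PREFIX) and key.endswith(SUFFIX):
            # The part between prefix and suffix is the service name.
            service = key[len(PREFIX):-len(SUFFIX)]
            if service not in declared:
                orphans.append(key)
    return sorted(orphans)
```

An empty result means every class key belongs to a declared service; a key written with mapreduce.shuffle instead of mapreduce_shuffle would be reported.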
4.4 mapred-site.xml (sets the framework that MapReduce runs on; YARN in this case)
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
4.5 hadoop-env.cmd (sets the JDK home path)
set JAVA_HOME=D:\java\jdk1.8.0_181
5. Format HDFS
Run the following commands to format HDFS:
cd ...\hadoop-3.1.4\bin (directory containing the scripts)
hdfs namenode -format (format command)
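Formatting is normally a one-time step, since it initialises the NameNode metadata directory from scratch. A formatted NameNode directory contains a current\VERSION file, so its presence can serve as a guard before formatting again. A minimal sketch; the function name is hypothetical, and the directory to pass in is the one configured as dfs.namenode.name.dir.

```python
import os


def is_formatted(name_dir):
    """Return True if name_dir already holds formatted NameNode metadata."""
    # The format command creates <name_dir>/current/VERSION.
    return os.path.isfile(os.path.join(name_dir, "current", "VERSION"))
```

If the function returns True, the format command can usually be skipped.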
6. Start Hadoop
cd ...\hadoop-3.1.4\sbin (directory containing the scripts)
start-dfs.cmd (starts HDFS only)
start-yarn.cmd (starts YARN only)
start-all.cmd (starts both HDFS and YARN in one step)
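7. Verify Hadoop through the web consoles
Open http://localhost:8088 in a browser to view the cluster status (YARN ResourceManager)
Open http://localhost:50070 in a browser to view the Hadoop status (NameNode page, on the port set in hdfs-site.xml)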
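The same check can be scripted by testing whether the two ports accept TCP connections. This is a minimal sketch with hypothetical function names; it only shows that something is listening on each port, not that the service behind it is healthy.

```python
import socket


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_web_consoles(host="localhost", ports=(8088, 50070)):
    """Map each web console port from this step to whether it is reachable."""
    return {port: is_port_open(host, port) for port in ports}
```

After start-all.cmd has finished, check_web_consoles() should report True for both ports; the daemons may need a few seconds before they start listening.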
8. Stop Hadoop
cd ...\hadoop-3.1.4\sbin (directory containing the scripts)
stop-dfs.cmd (stops HDFS only)
stop-yarn.cmd (stops YARN only)
stop-all.cmd (stops both HDFS and YARN in one step)