1. First install Java 1.8 or later
Detailed steps:
https://blog.csdn.net/D1124615130/article/details/106013744
2. Install Hadoop
- Download the Hadoop .tar.gz archive and unpack it
- Configure the environment variables, adding both bin and sbin to PATH:
HADOOP_HOME=/opt/hadoop-3.0.0
PATH=$HADOOP_HOME/bin:$PATH
PATH=$HADOOP_HOME/sbin:$PATH
export HADOOP_HOME PATH
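To make these variables persistent across shells, the same settings can be appended to ~/.bashrc (a sketch; the install path /opt/hadoop-3.0.0 is an example, substitute your own):

```shell
# Append to ~/.bashrc (or /etc/profile); the install path is an example.
export HADOOP_HOME=/opt/hadoop-3.0.0
export PATH="$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH"
```

Open a new shell (or run source ~/.bashrc) and check with hadoop version that the binaries are found.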
- Add the following to etc/hadoop/core-site.xml:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/hadoop-3.0.0/tmp</value>
</property>
</configuration>
- Create a data directory, for example directly under the hadoop-3.0.0 directory; inside it create two subdirectories, namenode and datanode
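The layout above can be created in one command (hadoop_dir is a placeholder here; point it at your actual install directory, e.g. /opt/hadoop-3.0.0):

```shell
# Placeholder install dir; replace with your real path, e.g. /opt/hadoop-3.0.0
hadoop_dir="${HADOOP_DIR:-$PWD/hadoop-3.0.0}"
# -p creates the intermediate data/ directory as well
mkdir -p "$hadoop_dir/data/namenode" "$hadoop_dir/data/datanode"
```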
- Add the following to etc/hadoop/hdfs-site.xml:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:your-path/data/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:your-path/data/datanode</value>
</property>
</configuration>
- If you use MapReduce, it is recommended to add the following to etc/hadoop/yarn-site.xml.
The value of yarn.application.classpath is the output of running hadoop classpath in a console; it fixes the error where Hadoop cannot find or load the main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster.
The later properties set memory sizes; before I set them, my MapReduce jobs would often hang and report insufficient memory.
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.application.classpath</name>
<value>/opt/hadoop-3.0.0/etc/hadoop:/opt/hadoop-3.0.0/share/hadoop/common/lib/*:/opt/hadoop-3.0.0/share/hadoop/common/*:/opt/hadoop-3.0.0/share/hadoop/hdfs:/opt/hadoop-3.0.0/share/hadoop/hdfs/lib/*:/opt/hadoop-3.0.0/share/hadoop/hdfs/*:/opt/hadoop-3.0.0/share/hadoop/mapreduce/*:/opt/hadoop-3.0.0/share/hadoop/yarn:/opt/hadoop-3.0.0/share/hadoop/yarn/lib/*:/opt/hadoop-3.0.0/share/hadoop/yarn/*
</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>20480</value>
</property>
<property>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>4</value>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>2048</value>
</property>
<property>
<name>yarn.nodemanager.vmem-pmem-ratio</name>
<value>2.1</value>
</property>
</configuration>
- If you use MapReduce, it is also recommended to add the following to etc/hadoop/mapred-site.xml:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.map.memory.mb</name>
<value>1536</value>
</property>
<property>
<name>mapreduce.map.java.opts</name>
<value>-Xmx1024M</value>
</property>
<property>
<name>mapreduce.reduce.memory.mb</name>
<value>3072</value>
</property>
<property>
<name>mapreduce.reduce.java.opts</name>
<value>-Xmx2560M</value>
</property>
</configuration>
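Note the pattern in the values above: each JVM heap (-Xmx in *.java.opts) is deliberately smaller than its YARN container (*.memory.mb), leaving headroom for non-heap memory; if the heap meets or exceeds the container size, YARN is likely to kill the task for exceeding its memory limit. A quick check of the ratios used here:

```shell
# Heap size as a percentage of the YARN container size (integer arithmetic):
echo "map:    $((1024 * 100 / 1536))%"   # -Xmx1024M in a 1536 MB container -> 66%
echo "reduce: $((2560 * 100 / 3072))%"   # -Xmx2560M in a 3072 MB container -> 83%
```

Keeping the heap at roughly two thirds to four fifths of the container is a common rule of thumb, not a hard requirement.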
- In etc/hadoop/hadoop-env.sh, change
export JAVA_HOME=
to your Java path, for example
export JAVA_HOME=/opt/jdk1.8.0_221
- At the end of sbin/start-dfs.sh and sbin/stop-dfs.sh, add the following lines, which set the users that operate HDFS:
HDFS_DATANODE_USER=root
HADOOP_SECURE_DN_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
- At the end of sbin/start-yarn.sh and sbin/stop-yarn.sh, add:
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root
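An aside, only relevant if you run a newer Hadoop 3.x release: the start-up scripts there warn that HADOOP_SECURE_DN_USER is deprecated, and the replacement spelling (to the best of my knowledge) is:

```shell
# Deprecated name        ->  replacement on newer Hadoop 3.x
# HADOOP_SECURE_DN_USER  ->  HDFS_DATANODE_SECURE_USER
export HDFS_DATANODE_SECURE_USER=hdfs
```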
Set up passwordless SSH login. The user configured in sbin/start-dfs.sh is root, so switch to the root user first. Details in this post:
https://blog.csdn.net/D1124615130/article/details/106191264
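The passwordless-SSH setup from the linked post boils down to generating a key for the current user and authorizing it for logins to this machine (a sketch; file names are the OpenSSH defaults):

```shell
# Generate a key pair if one does not exist yet (empty passphrase),
# then authorize it for logins to this machine.
mkdir -p "$HOME/.ssh"
if [ ! -f "$HOME/.ssh/id_rsa" ]; then
    ssh-keygen -t rsa -N '' -f "$HOME/.ssh/id_rsa"
fi
cat "$HOME/.ssh/id_rsa.pub" >> "$HOME/.ssh/authorized_keys"
chmod 700 "$HOME/.ssh"
chmod 600 "$HOME/.ssh/authorized_keys"
```

Verify with ssh localhost: it should log in without prompting for a password (the sshd service must be running).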
Test: on the very first start-up, format the NameNode once beforehand with bin/hdfs namenode -format, then run
sbin/start-all.sh
jps
which should print something like (PIDs will differ):
3589 NodeManager
9685 Jps
2921 DataNode
2748 NameNode
3165 SecondaryNameNode
3407 ResourceManager
If all of these processes are present, the installation has succeeded.
Stop the firewall (service ufw stop), then open <VM address>:8088 in a browser;
you should see the YARN ResourceManager web UI.