Hadoop: Setting up a Single Node Cluster
Download Hadoop from the official site; make sure the Java version is compatible with the Hadoop release.
Pseudo-Distributed Operation
Configure etc/hadoop/core-site.xml:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
Set the Hadoop data directory:
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop-3.2/data/</value>
</property>
Allow the root user to proxy requests from any host and group:
<property>
<name>hadoop.proxyuser.root.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.root.groups</name>
<value>*</value>
</property>
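Putting the fragments above together, a complete core-site.xml for this setup might look like the following (the data path reflects the example install directory used above; adjust it to your own):

```xml
<configuration>
  <!-- Default filesystem URI: the NameNode RPC endpoint -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <!-- Base directory for HDFS metadata and data -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop-3.2/data/</value>
  </property>
  <!-- Let root proxy requests from any host/group -->
  <property>
    <name>hadoop.proxyuser.root.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.root.groups</name>
    <value>*</value>
  </property>
</configuration>
```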
Configure etc/hadoop/hdfs-site.xml:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
Check that you can ssh to localhost:
ssh localhost
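If ssh localhost prompts for a password, set up passphrase-less keys first (this is the procedure from the official single-node setup guide):

```shell
# Create a passphrase-less key pair if one does not exist yet,
# then authorize it for logins to localhost.
if [ ! -f ~/.ssh/id_rsa ]; then
  ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
fi
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
```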
Format the filesystem:
$ bin/hdfs namenode -format
When running as root, add the startup users to sbin/start-dfs.sh, sbin/stop-dfs.sh, sbin/start-yarn.sh, and sbin/stop-yarn.sh:
export HDFS_NAMENODE_USER="root"
export HDFS_DATANODE_USER="root"
export HDFS_SECONDARYNAMENODE_USER="root"
export YARN_RESOURCEMANAGER_USER="root"
export YARN_NODEMANAGER_USER="root"
Start DFS: sbin/start-dfs.sh
Note: the default value of dfs.namenode.http-address in Hadoop 3.2.0 is 0.0.0.0:9870.
Web UI: http://localhost:9870
YARN on a Single Node
Edit etc/hadoop/mapred-site.xml to specify which resource-scheduling framework MapReduce jobs run on. If this is not set to yarn, MapReduce jobs run only locally rather than on the cluster:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
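On Hadoop 3.x, the official single-node guide also sets the MapReduce classpath in mapred-site.xml; without it, YARN containers may fail to locate the MapReduce framework jars (property value as given in the 3.x docs):

```xml
<property>
  <name>mapreduce.application.classpath</name>
  <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
</property>
```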
Configure etc/hadoop/yarn-site.xml:
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hdfs</value>
</property>
<!-- Configure the auxiliary service on the YARN worker nodes: map output is handed to reduce via the shuffle mechanism -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
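The official 3.x guide additionally whitelists the environment variables that containers inherit from the NodeManager (value as listed in the docs; verify against your version's defaults):

```xml
<property>
  <name>yarn.nodemanager.env-whitelist</name>
  <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
</property>
```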
Start the ResourceManager and NodeManager:
sbin/start-yarn.sh
Web UI: http://localhost:8088/
Stop YARN: sbin/stop-yarn.sh
Configure environment variables (e.g. in ~/.bashrc; adjust HADOOP_HOME to your actual install path):
export HADOOP_HOME=/opt/hadoop-3.1.2
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin