Extract and move
After extracting the tar file, move the directory to /usr/local/hadoop:
tar -zxf hadoop-xxxx.tar
mv hadoop-xxxxx /usr/local/hadoop
Configuration files
Edit the following configuration files.
core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://sparkproject1:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/data/tmp</value>
</property>
</configuration>
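Since hadoop.tmp.dir points at /usr/local/data/tmp, that directory should exist on every node before the NameNode is formatted; a sketch assuming the three sparkproject hosts and working SSH between them:

```shell
# Create the hadoop.tmp.dir directory on every node
# (hostnames assumed: the three sparkproject machines)
for host in sparkproject1 sparkproject2 sparkproject3; do
  ssh "$host" "mkdir -p /usr/local/data/tmp"
done
```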
hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
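In Hadoop 2.x tarballs, mapred-site.xml usually ships only as mapred-site.xml.template, so it may need to be copied before editing; a sketch assuming that layout:

```shell
cd /usr/local/hadoop/etc/hadoop
# mapred-site.xml ships as a template in Hadoop 2.x tarballs;
# copy it only if the real file does not exist yet
[ -f mapred-site.xml ] || cp mapred-site.xml.template mapred-site.xml
```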
yarn-site.xml
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>spark1</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.log.server.url</name>
<value>http://spark1:19888/jobhistory/logs/</value>
</property>
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
<description>
Enable log aggregation: after a job finishes, its log files are
uploaded automatically to the filesystem (e.g. HDFS). Without this,
viewing a job's logs through the namenode1:8088 web UI fails with
"Aggregation is not enabled. Try the nodemanager at namenode1:54951"
</description>
</property>
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>302400</value>
<description>
Maximum time that aggregated log files are kept on the filesystem
(e.g. HDFS). The default is -1, i.e. kept forever. The value here,
302400 seconds, is 3.5 days (3600 * 24 * 3.5); a full 7 days would
be 604800.
</description>
</property>
</configuration>
slaves
sparkproject1
sparkproject2
sparkproject3
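start-dfs.sh and start-yarn.sh log in to every host listed in slaves over SSH, so passwordless SSH should be verified first; a quick check, assuming the three hostnames above:

```shell
# Verify passwordless SSH to each node; BatchMode=yes makes ssh
# fail immediately instead of prompting for a password
for host in sparkproject1 sparkproject2 sparkproject3; do
  ssh -o BatchMode=yes "$host" hostname \
    || echo "passwordless SSH to $host is NOT set up"
done
```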
Directory sync
To delete the remote copies first, run the remove command over SSH:
ssh sparkproject2 "rm -rf /usr/local/hadoop/etc/"
ssh sparkproject3 "rm -rf /usr/local/hadoop/etc/"
Sync (overwrite) the directory:
scp -r /usr/local/hadoop/etc/ root@sparkproject2:/usr/local/hadoop/
scp -r /usr/local/hadoop/etc/ root@sparkproject3:/usr/local/hadoop/
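The same sync can also be written as a loop, so adding a node later only means extending the host list; a sketch with the same hosts and paths as above:

```shell
# Push the config directory to every worker node
for host in sparkproject2 sparkproject3; do
  scp -r /usr/local/hadoop/etc/ "root@$host:/usr/local/hadoop/"
done
```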
Start and test the cluster
hdfs namenode -format
sbin/start-dfs.sh
sbin/start-yarn.sh
sbin/mr-jobhistory-daemon.sh start historyserver
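After the start scripts run, jps on each node should list the expected daemons (roughly: NameNode, ResourceManager, and JobHistoryServer on the master; DataNode and NodeManager on the workers, assuming the layout above); a quick sweep over the three hosts:

```shell
# Show the running Java daemons on every node
for host in sparkproject1 sparkproject2 sparkproject3; do
  echo "== $host =="
  ssh "$host" jps
done
```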
Check that the web UI is reachable:
sparkproject1:50070
Check that a file can be uploaded and listed:
hdfs dfs -put hello.txt /
hdfs dfs -ls /
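To exercise YARN as well as HDFS, the file can be read back and a bundled example job run; the examples-jar path is an assumption based on the standard Hadoop 2.x tarball layout, and HADOOP_HOME is assumed to point at /usr/local/hadoop:

```shell
# Read the uploaded file back from HDFS
hdfs dfs -cat /hello.txt
# Run the bundled pi estimator (2 mappers, 10 samples each) to
# confirm that MapReduce-on-YARN works end to end
# (jar path assumes the standard Hadoop 2.x tarball layout)
hadoop jar "$HADOOP_HOME"/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar pi 2 10
```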