Setting Up Hadoop in Pseudo-Distributed Mode

huagong_adu

Article link: https://blog.csdn.net/huagong_adu/article/details/8452293

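I have been reading 《Hadoop in Action》 these past two days and tried setting up pseudo-distributed mode, that is, running Hadoop on a "single-node cluster". The steps: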

1. In hadoop-env.sh under the conf directory, point the JAVA_HOME environment variable at the Java installation directory

export JAVA_HOME=/usr/java/jre1.6.0_23
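A wrong JAVA_HOME tends to surface only later, when the daemons fail to start, so it can help to check the path first. The helper below is a minimal sketch; `check_java_home` is a hypothetical name of my own, and it only assumes that a usable Java directory contains an executable `bin/java`.

```shell
# Hypothetical helper (not part of Hadoop): succeeds only if the given
# directory looks like a Java installation.
check_java_home() {
  # $1 = candidate JAVA_HOME; it must be non-empty and contain an
  # executable bin/java
  [ -n "$1" ] && [ -x "$1/bin/java" ]
}

# Example use:
#   check_java_home /usr/java/jre1.6.0_23 || echo "JAVA_HOME looks wrong"
```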

2. Install SSH, including ssh, sshd, and ssh-keygen.

3. Generate an SSH key pair without a passphrase (press Enter at the passphrase prompts):

ssh-keygen -t rsa

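4. Authorize a key for login to localhost. The commands below create a separate DSA key pair with an empty passphrase (-P '') and append its public key to authorized_keys (the RSA key from step 3 could be authorized the same way via ~/.ssh/id_rsa.pub):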

ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
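If ssh still asks for a password after this step, one common cause is file permissions: with sshd's StrictModes option enabled, keys in a group- or world-writable ~/.ssh are ignored. The sketch below tightens them; `fix_ssh_perms` is a hypothetical helper name, and the modes 700/600 are the conventional choice rather than something from the book.

```shell
# Hypothetical helper: restrict permissions so sshd accepts the key file.
fix_ssh_perms() {
  # $1 = SSH directory, defaulting to ~/.ssh
  local dir="${1:-$HOME/.ssh}"
  chmod 700 "$dir" && chmod 600 "$dir/authorized_keys"
}

# Example use, followed by a non-interactive login test
# (BatchMode=yes makes ssh fail instead of prompting for a password):
#   fix_ssh_perms
#   ssh -o BatchMode=yes localhost true && echo "passwordless login works"
```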

5. Configure the XML files under conf

core-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

<property>
	<name>fs.default.name</name>
	<value>hdfs://localhost:9000</value>
	<description>The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation.
	</description>
</property>

</configuration>

mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

<property>
	<name>mapred.job.tracker</name>
	<value>localhost:9001</value>
	<description>The host and port that the MapReduce job tracker runs at.
	</description>
</property>

</configuration>

hdfs-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

<property>
	<name>dfs.replication</name>
	<value>1</value>
	<description>The actual number of replicas can be specified when the file is created.
	</description>
</property>

</configuration>
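To double-check what ended up in the three files, a property value can be pulled out with awk. This is only a sketch: `get_prop` is a hypothetical helper, it is not a real XML parser, and it assumes the layout used above, with `<name>` on one line and `<value>` on a following line.

```shell
# Hypothetical helper: print the value of one property from a Hadoop
# config file laid out like the ones above.
get_prop() {
  # $1 = config file, $2 = property name
  awk -v name="$2" '
    # Remember that we have seen the wanted <name> line.
    index($0, "<name>" name "</name>") { found = 1; next }
    # The next <value>...</value> after it holds the setting.
    found && match($0, /<value>[^<]*<\/value>/) {
      # Strip the 7-character "<value>" and 8-character "</value>" tags.
      print substr($0, RSTART + 7, RLENGTH - 15)
      exit
    }
  ' "$1"
}

# Example use:
#   get_prop conf/core-site.xml fs.default.name
#   get_prop conf/hdfs-site.xml dfs.replication
```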

6. Start the Hadoop daemons

bin/start-all.sh

7. List the running Hadoop daemons:

jps

If everything is working, the output should look like the following, listing NameNode, DataNode, SecondaryNameNode, JobTracker, and TaskTracker.

16982 DataNode
17249 SecondaryNameNode
16730 NameNode
17746 Jps
17364 JobTracker
17629 TaskTracker
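Reading the jps list by eye makes it easy to overlook a missing daemon. The sketch below reports which of the five expected daemons are absent; `missing_daemons` is a hypothetical helper that reads jps output on standard input and assumes the "PID Name" format shown above.

```shell
# Hypothetical helper: read `jps` output on stdin and print every
# expected Hadoop daemon that is not listed.
missing_daemons() {
  local out daemon
  out=$(cat)
  for daemon in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
    # Compare against the whole second field, so that
    # "SecondaryNameNode" is not mistaken for "NameNode".
    printf '%s\n' "$out" \
      | awk -v d="$daemon" '$2 == d { ok = 1 } END { exit !ok }' \
      || echo "$daemon"
  done
}

# Example use:
#   jps | missing_daemons
```

No output means all five daemons are running.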

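Because I had not set up SSH key authorization when I first configured this, I had to type the machine password by hand every time a daemon started. Step 4 fixes that.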

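The first time I started Hadoop, NameNode showed up. After shutting down and starting again, jps no longer listed NameNode. My fix was to stop all the processes (bin/stop-all.sh), reformat, and restart, but this time the DataNode daemon was missing. Reformatting again did not bring it back, and reconfiguring Hadoop from scratch did not help either.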

I searched online and found this solution:

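Sometimes the on-disk data gets into an inconsistent state and the DataNode can no longer start.
Reformatting with hadoop namenode -format does not help by itself, because the files under /tmp are not cleared.
The files in /tmp/hadoop* have to be removed as well.
Steps:
    1. Delete /tmp inside HDFS first
       bin/hadoop fs -rmr /tmp
    2. Stop Hadoop
       bin/stop-all.sh
    3. Delete /tmp/hadoop*
       rm -rf /tmp/hadoop*
    4. Format Hadoop
       bin/hadoop namenode -format
    5. Start Hadoop
       bin/start-all.sh
After that, the DataNode starts again.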
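The five steps above can be wrapped in one function. This is a sketch under my own assumptions: `run`, `reset_hdfs`, and the `DRY_RUN` variable are hypothetical names, and the function must be called from the Hadoop installation directory because it uses the relative bin/ paths. It defaults to a dry run that only prints the commands, since the real thing erases everything stored in HDFS.

```shell
# Hypothetical wrapper: print the command when DRY_RUN is 1 (the default),
# execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: the recovery sequence from the steps above.
# WARNING: with DRY_RUN=0 this deletes all data stored in HDFS.
reset_hdfs() {
  run bin/hadoop fs -rmr /tmp        # 1. remove /tmp inside HDFS
  run bin/stop-all.sh                # 2. stop all daemons
  run rm -rf /tmp/hadoop*            # 3. clear local Hadoop data under /tmp
  run bin/hadoop namenode -format    # 4. reformat the NameNode
  run bin/start-all.sh               # 5. start the daemons again
}

# Example use:
#   reset_hdfs              # only prints the five commands
#   DRY_RUN=0 reset_hdfs    # actually runs them
```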

8. Check the running status

Hadoop admin UI (JobTracker): http://localhost:50030