Setting up Hadoop on a single node: detailed steps

floatcqy

Article link: https://blog.csdn.net/floatcqy/article/details/6119199

Bored at home with nothing to do, I decided to start on my graduation project.

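Hadoop: there is quite a lot of material about Hadoop online, but few books. There is only "Hadoop: The Definitive Guide", and the reviews I read said the Chinese translation is poor, so I chose to read the original English edition. (Reading slowly, I can still follow it.)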

Introduction: Hadoop got its start in Nutch. A few of us were attempting to build an open source web search engine and having trouble managing computations running on even a handful of computers. Once Google published its GFS and MapReduce papers, the route became clear. They'd devised systems to solve precisely the problems we were having with Nutch. So we started, two of us, half-time, to try to recreate these systems as a part of Nutch. (In short: Hadoop began as a sub-project of Nutch, inspired by the two papers Google published. Two people working part-time built a Hadoop that could run on 20 machines. Yahoo later took notice and brought them onto its team, which produced the very stable and easy-to-use Hadoop.)

Tip 1: Ever since Google appeared, Yahoo has rarely done the right thing; this is one of the few exceptions.

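Tip 2: I really admire these two developers for putting so much effort into open-source software. Indeed, once material civilization reaches a certain level, a lot of what people do becomes noble~~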

I have been reading this book for a while, though not very carefully; I will discuss the finer details later.

Today I want to install Hadoop first, the single-node version, and get a first look!

I. Setting up the runtime environment

A. Operating system

First, Hadoop runs on Linux; its Win32 support is not yet complete. I chose Ubuntu, currently the most popular Linux distribution for personal use.

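Ubuntu releases a new version every six months and offers thousands of applications. Since the 8.xx releases, installation has become very easy: with an installer called Wubi, installing the OS is as simple as installing QQ, and it can coexist with Windows!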

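Many people avoid Linux mainly because they worry their software won't be compatible. In my recent experience, though, I found Linux versions of, or substitutes for, many everyday programs, e.g. PPS for Linux, QQ for Linux or Web QQ, openword, ibus (quite close to the Sogou input method), ...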

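Tip 3: Ubuntu's founder, Mark Shuttleworth, is a legendary figure: an entrepreneur and astronomy enthusiast who once went to space aboard a Russian spacecraft. It was there, in the freedom of space, looking back on his life, that he came up with the idea of building a free operating system that everyone would love to use.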

For Ubuntu installation instructions, look them up online.

B. Installing SSH and the JDK:

Required software for Linux and Windows include:

1. Java™ 1.6.x, preferably from Sun, must be installed.

2. ssh must be installed and sshd must be running to use the Hadoop scripts that manage remote Hadoop daemons.

If SSH is not installed yet:
$ sudo apt-get install openssh-server

Now check that you can ssh to localhost without a passphrase:
$ ssh localhost

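If you cannot ssh to localhost without a passphrase, set up passwordless login with the following commands (if you don't mind typing the password every time, you can skip this):
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

Note: OpenSSH 7.0 and later disable DSA keys by default; on such systems use -t rsa with ~/.ssh/id_rsa instead.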
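The cat >> line above appends the key again every time it is rerun, and with sshd's default StrictModes setting an authorized_keys file (or ~/.ssh directory) that other users can write to is ignored, so login falls back to asking for a password. A minimal POSIX shell sketch of an idempotent version; add_key_once is a hypothetical helper name, and the key and authorized_keys paths are passed in rather than hard-coded.

```shell
#!/bin/sh
# Append a public key to an authorized_keys file only if it is not already
# there, then tighten permissions so sshd's default StrictModes check accepts it.
# add_key_once is a hypothetical helper name, not part of OpenSSH or Hadoop.
add_key_once() {
    pubkey=$1      # e.g. ~/.ssh/id_dsa.pub
    authfile=$2    # e.g. ~/.ssh/authorized_keys
    authdir=$(dirname "$authfile")
    mkdir -p "$authdir"
    chmod 700 "$authdir"          # directory must not be group/world-writable
    touch "$authfile"
    # -q: quiet, -x: whole-line match, -F: fixed string (no regex)
    if ! grep -qxF -e "$(cat "$pubkey")" "$authfile"; then
        cat "$pubkey" >> "$authfile"
    fi
    chmod 600 "$authfile"         # readable/writable by the owner only
}

# Usage, after running ssh-keygen as shown above:
# add_key_once ~/.ssh/id_dsa.pub ~/.ssh/authorized_keys
```

Running it twice leaves a single copy of the key, unlike the plain cat >> version.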

Since the Sun Java runtime is required to run Hadoop, we need to install a JRE or JDK. I installed the JDK.

Installing Java 6 through apt-get runs into a problem:

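Ubuntu's default repositories now provide OpenJDK instead of the Sun JDK. Some say it differs little from sun-java6-jdk, others say the differences are significant. One rather experienced-sounding person online said the two share the same API and differ only in implementation. Try it if you're curious~~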

What I did to solve this problem was to add a new source:

$ sudo add-apt-repository "deb http://archive.canonical.com/ lucid partner"

After that, a normal

$ sudo apt-get update
$ sudo apt-get install sun-java6-jdk

worked for me.

The steps above work; I have tried them myself~~

C. Downloading Hadoop

Download the latest stable release from the official site and unpack it.

The commands are (all later commands are run from the unpacked directory):
$ tar -xvzf hadoop-0.20.2.tar.gz
$ cd hadoop-0.20.2

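Then append the JAVA_HOME setting to conf/hadoop-env.sh (adjust the path to wherever your JDK is installed):

$ cat >> conf/hadoop-env.sh <<EOF
export JAVA_HOME=/usr/lib/jvm/java-6-sun-1.6.0.22
EOF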
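Before appending, it can be worth checking that the directory really contains a JDK; otherwise a typo only shows up later when bin/hadoop tries to start Java. A minimal sketch under that assumption; set_java_home is a hypothetical helper, it skips the append if hadoop-env.sh already has an uncommented export JAVA_HOME line, and the JDK path is machine-specific (the one above is just the author's).

```shell
#!/bin/sh
# Append "export JAVA_HOME=..." to hadoop-env.sh, but only if the directory
# contains an executable bin/java and the file does not already set it.
# set_java_home is a hypothetical helper name used for illustration.
set_java_home() {
    env_file=$1    # e.g. conf/hadoop-env.sh in the unpacked Hadoop directory
    java_dir=$2    # e.g. /usr/lib/jvm/java-6-sun-1.6.0.22 (machine-specific)
    if [ ! -x "$java_dir/bin/java" ]; then
        echo "no executable java under $java_dir" >&2
        return 1
    fi
    # The stock file only has a commented-out "# export JAVA_HOME=..." line,
    # so this matches only a line that is actually in effect.
    if grep -q '^export JAVA_HOME=' "$env_file" 2>/dev/null; then
        return 0
    fi
    echo "export JAVA_HOME=$java_dir" >> "$env_file"
}

# Usage, from the hadoop-0.20.2 directory:
# set_java_home conf/hadoop-env.sh /usr/lib/jvm/java-6-sun-1.6.0.22
```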

D. Hadoop supports three modes: the first two run on a single node, the third is cluster mode.

a. Standalone (non-distributed) mode: Hadoop runs as a single Java process, which is convenient for debugging.

This is the default mode out of the box. To test it:

$ mkdir input
$ cp conf/*.xml input
$ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
$ cat output/*

The example above copies the XML files from the unpacked conf directory into a new input directory to use as input, then finds every match of the given regular expression. Output is written to the given output directory, and cat output/* displays it.
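To see what the example computes without Hadoop, roughly the same result can be produced with ordinary Unix tools. This is an approximation, not the example's code: it assumes, as I understand the grep example, that every match of the regular expression is counted and the output lines have the form count<TAB>match, sorted by count in descending order. Ties are broken alphabetically here, which Hadoop does not necessarily do. local_grep_count is a hypothetical helper.

```shell
#!/bin/sh
# Approximate the "grep" example locally: count each distinct match of an
# extended regex across the input files and print "count<TAB>match",
# highest count first, ties sorted alphabetically (C locale).
# local_grep_count is a hypothetical helper, not part of Hadoop.
local_grep_count() {
    input_dir=$1
    regex=$2
    # -o: print only the matched text, -h: omit file names, -E: extended regex
    grep -ohE "$regex" "$input_dir"/* \
        | LC_ALL=C sort \
        | uniq -c \
        | awk '{ printf "%s\t%s\n", $1, $2 }' \
        | LC_ALL=C sort -t "$(printf '\t')" -k1,1nr -k2,2
}

# Usage, from the hadoop-0.20.2 directory after "cp conf/*.xml input":
# local_grep_count input 'dfs[a-z.]+'
```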

b. Pseudo-distributed mode: Hadoop can also be run on a single node in a pseudo-distributed mode where each Hadoop daemon runs in a separate Java process.

You need to edit a few configuration files. Use the following:

conf/core-site.xml:

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

conf/hdfs-site.xml:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

conf/mapred-site.xml:

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
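Instead of editing the three files by hand, they can be written in one go with here-documents. A minimal sketch, assuming the conf directory of a 0.20.x release (the property names above are the 0.20-era ones); write_pseudo_conf is a hypothetical helper and it overwrites the existing files, so keep a copy of the originals if you have changed them.

```shell
#!/bin/sh
# Write the pseudo-distributed configuration shown above into a conf directory.
# Overwrites core-site.xml, hdfs-site.xml and mapred-site.xml.
# write_pseudo_conf is a hypothetical helper name.
write_pseudo_conf() {
    conf_dir=$1    # e.g. conf inside the unpacked hadoop-0.20.2 directory

    # NameNode address, used as the default filesystem
    cat > "$conf_dir/core-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF

    # Only one DataNode exists, so keep a single replica of each block
    cat > "$conf_dir/hdfs-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
EOF

    # JobTracker address
    cat > "$conf_dir/mapred-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
EOF
}

# Usage, from the hadoop-0.20.2 directory:
# write_pseudo_conf conf
```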

Then run the following.

Format a new distributed filesystem:
$ bin/hadoop namenode -format

Start the Hadoop daemons:
$ bin/start-all.sh
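To confirm the daemons actually started, jps from the JDK lists running Java processes as pid ClassName lines; in 0.20 pseudo-distributed mode five Hadoop daemons should show up: NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker. A small sketch that reads jps output and prints the missing ones; missing_daemons is a hypothetical helper. If anything is missing, the daemon logs (under ${HADOOP_HOME}/logs by default) are the place to look.

```shell
#!/bin/sh
# Read "jps" output on stdin and print every expected Hadoop 0.20 daemon that
# is not running, one per line. Returns 0 only if all five were found.
# missing_daemons is a hypothetical helper name.
missing_daemons() {
    running=$(awk '{ print $2 }')   # second column of jps output is the class name
    status=0
    for d in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
        # -x: exact line match, so "NameNode" does not match "SecondaryNameNode"
        if ! printf '%s\n' "$running" | grep -qx "$d"; then
            echo "$d"
            status=1
        fi
    done
    return $status
}

# Usage on the Hadoop machine:
# jps | missing_daemons || echo "some daemons did not start"
```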

The Hadoop daemon log output is written to the ${HADOOP_LOG_DIR} directory (defaults to ${HADOOP_HOME}/logs).

Browse the web interface for the NameNode and the JobTracker; by default they are available at:

NameNode - http://localhost:50070/
JobTracker - http://localhost:50030/

Testing it:

Copy the input files into the distributed filesystem:
$ bin/hadoop fs -put conf input

Run some of the examples provided:
$ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'

Examine the output files:

Copy the output files from the distributed filesystem to the local filesystem and examine them:
$ bin/hadoop fs -get output output
$ cat output/*

View the output files on the distributed filesystem:
$ bin/hadoop fs -cat output/*
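Running the example a second time fails, because a MapReduce job refuses to start when its output directory already exists. A minimal sketch of a rerun helper that deletes the old HDFS output first; rerun_grep_example is a hypothetical name, -rmr is the recursive delete of the 0.20-era fs shell (later releases use -rm -r), and the HADOOP variable is an assumption added so the hadoop command can be swapped out.

```shell
#!/bin/sh
# Re-run the grep example in pseudo-distributed mode, deleting the previous
# HDFS "output" directory first so the job is allowed to start.
# rerun_grep_example is a hypothetical helper; HADOOP defaults to bin/hadoop.
rerun_grep_example() {
    hadoop_cmd=${HADOOP:-bin/hadoop}
    # 0.20-era recursive delete; the error when no old output exists is ignored.
    $hadoop_cmd fs -rmr output 2>/dev/null
    $hadoop_cmd jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+' || return 1
    # Quote the glob so the local shell does not expand it against a local
    # "output" directory left over from "fs -get" or standalone mode.
    $hadoop_cmd fs -cat 'output/*'
}

# Usage, from the hadoop-0.20.2 directory with the daemons running:
# rerun_grep_example
```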

When you're done, stop the daemons with:
$ bin/stop-all.sh

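Today I got a first single-node Hadoop configuration running. It took a big pile of writing, but things went noticeably faster. Tomorrow I'll continue~~ and dig into how Hadoop runs on a single node~~~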
