Reposted from: http://cloud.csdn.net/a/20100901/278948.html?1290931484
Before installing Hadoop you need:
1. Java 1.6.x, preferably Sun's JDK; 1.5.x also works
2. ssh
Install ssh:

$ sudo apt-get install ssh
$ sudo apt-get install rsync
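After the two packages are installed, a quick way to confirm both tools are actually on the PATH is a sketch like the following (plain POSIX shell, nothing Hadoop-specific):

```shell
# Verify that ssh and rsync are available after installation.
# `command -v` exits non-zero when the tool is missing.
for tool in ssh rsync; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: installed"
  else
    echo "$tool: MISSING"
  fi
done
```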
Download Hadoop

Download the latest release from http://hadoop.apache.org/core/releases.html
It is best to create a dedicated user for Hadoop, for example a user and a group both named hadoop:

$ sudo addgroup hadoop
$ sudo adduser --ingroup hadoop hadoop
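The two commands above need root; afterwards you can confirm the user and group exist with a sketch like this (getent queries the system user/group databases on Ubuntu):

```shell
# Check that the hadoop user and hadoop group were actually created.
if getent passwd hadoop >/dev/null && getent group hadoop >/dev/null; then
  echo "hadoop user and group present"
else
  echo "hadoop user/group not created yet"
fi
```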
Extract the downloaded Hadoop archive into /home/hadoop, naming the directory hadoop.
Configure JAVA_HOME:

gedit ~/hadoop/conf/hadoop-env.sh
Change the lines

# The java implementation to use. Required.
# export JAVA_HOME=/usr/lib/j2sdk1.5-sun

to point at your Java installation directory:

# The java implementation to use. Required.
export JAVA_HOME=/usr/lib/jvm/java-6-sun-1.6.0.15
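A mistyped JAVA_HOME is a common source of startup failures, so it is worth sanity-checking the path before going further. The path below is the one from the example and will differ on your machine:

```shell
# Sanity-check that the configured JAVA_HOME contains a java binary.
JAVA_HOME=/usr/lib/jvm/java-6-sun-1.6.0.15
if [ -x "$JAVA_HOME/bin/java" ]; then
  "$JAVA_HOME/bin/java" -version
else
  echo "no java binary under $JAVA_HOME"
fi
```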
Now you can run Hadoop in standalone mode:

$ cd hadoop
$ mkdir input
$ cp conf/*.xml input
$ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
$ cat output/*
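The grep example counts, for each string matching the regex 'dfs[a-z.]+', how often it occurs across the input files. The same computation can be approximated with plain grep to see what kind of output to expect (the sample input file below is made up for illustration):

```shell
# Approximate the Hadoop grep example with plain grep:
# count occurrences of each string matching 'dfs[a-z.]+'.
mkdir -p input
printf '<name>dfs.replication</name>\n<name>dfs.name.dir</name>\n' > input/sample.xml
grep -ohE 'dfs[a-z.]+' input/*.xml | sort | uniq -c | sort -rn
```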
Running in pseudo-distributed mode:

Configure ssh:

$ su - hadoop
$ ssh-keygen -t rsa -P ""
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):
Created directory '/home/hadoop/.ssh'.
Your identification has been saved in /home/hadoop/.ssh/id_rsa.
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
The key fingerprint is:
9d:47:ab:d7:22:54:f0:f9:b9:3b:64:93:12:75:81:27 hadoop@ubuntu
Enable login without a password:

hadoop@ubuntu:~$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
Then run:

$ ssh localhost

to check that it connects without prompting for a password.
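If ssh localhost still asks for a password, overly loose file permissions are the usual culprit: sshd refuses to use authorized_keys unless the key files are private to the user. A sketch of the usual fix:

```shell
# Tighten permissions so sshd will accept the authorized_keys file.
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
ls -ld ~/.ssh ~/.ssh/authorized_keys
```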
Hadoop configuration files:

conf/core-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/hadoop-datastore/hadoop-${user.name}</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
Set hadoop.tmp.dir to whatever path you want; ${user.name} is automatically expanded to the name of the user running Hadoop.
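For instance, with the value configured above and a user named hadoop, the expanded temp directory would be as follows (a plain-shell illustration of the substitution, not Hadoop's own code):

```shell
# Illustrate how Hadoop expands ${user.name} in hadoop.tmp.dir.
# Configured value: /home/hadoop/hadoop-datastore/hadoop-${user.name}
user=hadoop   # the user running Hadoop
echo "/home/hadoop/hadoop-datastore/hadoop-${user}"
# → /home/hadoop/hadoop-datastore/hadoop-hadoop
```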
conf/hdfs-site.xml

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

dfs.replication is the default number of replicas per block.

conf/mapred-site.xml

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>

Run

Format the distributed file system:

$ bin/hadoop namenode -format

Start Hadoop:

$ bin/start-all.sh

You can view the NameNode and JobTracker at:

NameNode - http://localhost:50070/
JobTracker - http://localhost:50030/

Run the example:

$ bin/hadoop fs -put conf input
$ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'

Look at the result:

$ bin/hadoop fs -get output output
$ cat output/*
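After bin/start-all.sh, one way to confirm that all five daemons came up is the JDK's jps tool. The sketch below scans a captured jps listing for the daemon names used by Hadoop releases of this era; the sample PIDs are made up, so substitute the real output of `jps` on your machine:

```shell
# Check a jps listing for the five pseudo-distributed Hadoop daemons.
sample='2345 NameNode
2456 DataNode
2567 SecondaryNameNode
2678 JobTracker
2789 TaskTracker'
for d in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
  echo "$sample" | grep -qw "$d" && echo "$d running" || echo "$d MISSING"
done
```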