参考:
http://fuliang.iteye.com/blog/562396
install java:
JAVA相信大家都很熟悉了。下面就来看一下Ubuntu 12.04 怎样安装java 7。做好准备工作,首先,需要删除OpenJDK。
sudo apt-get purge openjdk*
安装oracle java 7,打开终端运行下面的命令:
sudo add-apt-repository ppa:eugenesan/java
sudo apt-get update
sudo apt-get install oracle-java7-installer
下载Hadoop
从http://hadoop.apache.org/core/releases.html 下载最近发布的版本
最好为hadoop创建一个用户:
比如创建一个group为hadoop user为hadoop的用户以及组
$ sudo adduser --ingroup hadoop hadoop
解压下载的hadoop文件,放到/home/hadoop目录下 名字为hadoop
配置JAVA_HOME:
将
- # The java implementation to use. Required.
- # export JAVA_HOME=/usr/lib/j2sdk1.5-sun
修改成java的安装目录:(我的是:/usr/lib/jvm/java-6-sun-1.6.0.15)
export JAVA_HOME=/usr/lib/jvm/java-6-sun-1.6.0.15
现在可以使用单节点的方式运行:
$ mkdir input
$ cp conf/*.xml input
$ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
$ cat output/*
Pseudo-distributed方式跑:
配置ssh
$ ssh-keygen -t rsa -P ""
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):
Created directory '/home/hadoop/.ssh'.
Your identification has been saved in /home/hadoop/.ssh/id_rsa.
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
The key fingerprint is:
9d:47:ab:d7:22:54:f0:f9:b9:3b:64:93:12:75:81:27 hadoop@ubuntu
让其不输入密码就能登录:
使用:
看看是不是直接ok了。
hadoop配置文件:
conf/core-site.xml
- <?xml version="1.0"?>
- <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
- <!-- Put site-specific property overrides in this file. -->
- <configuration>
- <property>
- <name>hadoop.tmp.dir</name>
- <value>/home/hadoop/hadoop-datastore/hadoop-${user.name}</value>
- </property>
- <property>
- <name>fs.default.name</name>
- <value>hdfs://localhost:9000</value>
- </property>
- </configuration>
hadoop.tmp.dir配置为你想要的路径,${user.name}会自动扩展为运行hadoop的用户名
conf/hdfs-site.xml
- <configuration>
- <property>
- <name>dfs.replication</name>
- <value>1</value>
- </property>
- </configuration>
dfs.replication为默认block复制数量
conf/mapred-site.xml
- <configuration>
- <property>
- <name>mapred.job.tracker</name>
- <value>localhost:9001</value>
- </property>
- </configuration>
执行
格式化分布式文件系统:
启动hadoop:
- $ bin/start-all.sh
可以从
JobTracker - http://localhost:50030/
查看NameNode和JobTracker
运行例子:
look at the run result:
$ cat output/*
参考: 1、http://hadoop.apache.org/common/docs/current/quickstart.html
2、http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_%28Single-Node_Cluster%29