Below is the standalone (single-node) installation procedure for the 1.1.0 release.
1. Installation
1.1 Download hadoop-1.1.0.tar.gz from the official site, http://hadoop.apache.org.
1.2 Extract it: tar zxvf hadoop-1.1.0.tar.gz -C /home/hadoop
1.3 Edit /home/hadoop/hadoop-1.1.0/conf/hadoop-env.sh:
Set the Java path: export JAVA_HOME=/usr/lib/jvm/java
Set the Hadoop path: export HADOOP_HOME=/home/hadoop/hadoop-1.1.0
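A quick sanity check that JAVA_HOME points at a working JDK before going further (using the path assumed above):
$ /usr/lib/jvm/java/bin/java -version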
2. Configure SSH
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
For a cluster, append the master's authorized_keys file to the authorized_keys file on each slave.
Restart the sshd service so the change takes effect: service sshd restart
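Verify that passwordless login now works:
$ ssh localhost
This should log in without prompting for a password. If it still prompts, overly permissive permissions on ~/.ssh are a common cause; sshd ignores keys in group- or world-writable locations:
$ chmod 700 ~/.ssh
$ chmod 600 ~/.ssh/authorized_keys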
3. Configure Hadoop's own settings
1. conf/core-site.xml:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
2. conf/hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
3. conf/mapred-site.xml:
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
4. Verification
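Before the daemons are started for the first time, HDFS must be formatted; this step is easy to miss, and without it the NameNode will not start:
$ bin/hadoop namenode -format
Then start everything: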
bin/start-all.sh
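If startup succeeded, jps (shipped with the JDK) should list all five daemons: NameNode, DataNode, SecondaryNameNode, JobTracker, and TaskTracker. A missing daemon usually means a misconfiguration; check the corresponding file under logs/.
$ jps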
1. Create /home/bruce/1.txt with the following contents:
hello world
hello hadoop
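One way to create this file from the shell (the path and contents are just this walkthrough's example):
$ printf 'hello world\nhello hadoop\n' > /home/bruce/1.txt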
2. Upload the file to HDFS:
# bin/hadoop dfs -put /home/bruce/1.txt brucetest
# bin/hadoop dfs -ls    (or: # bin/hadoop dfs -ls /user/bruce)
# bin/hadoop dfs -cat brucetest    (or: # bin/hadoop dfs -cat /user/bruce/brucetest)
3. Run the map/reduce example:
# bin/hadoop jar hadoop-examples-1.1.0.jar wordcount brucetest outdir
4. View the results:
# bin/hadoop dfs -cat outdir/*
hadoop 1
hello 2
world 1
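Note that the wordcount job will fail if its output directory already exists. To rerun it, remove outdir first (dfs -rmr is the recursive delete in Hadoop 1.x):
# bin/hadoop dfs -rmr outdir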
Below is the installation procedure for the 0.2 release.
1. Download the 0.2 release tarball and extract it.
2. Pseudo-distributed configuration
Edit hadoop-env.sh under conf, setting the JAVA_HOME variable for Hadoop on each master and slave node.
Edit the following three configuration files under conf:
conf/core-site.xml:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
conf/hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
conf/mapred-site.xml:
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
Set up SSH login:
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
Startup completes normally, as shown below:
bruce@ubuntu:/home/hadoop/bin$ ./start-all.sh
starting namenode, logging to /home/hadoop/bin/../logs/hadoop-bruce-namenode-ubuntu.out
localhost: starting datanode, logging to /home/hadoop/bin/../logs/hadoop-bruce-datanode-ubuntu.out
localhost: starting secondarynamenode, logging to /home/hadoop/bin/../logs/hadoop-bruce-secondarynamenode-ubuntu.out
starting jobtracker, logging to /home/hadoop/bin/../logs/hadoop-bruce-jobtracker-ubuntu.out
localhost: starting tasktracker, logging to /home/hadoop/bin/../logs/hadoop-bruce-tasktracker-ubuntu.out
3. Web monitoring
- NameNode - http://localhost:50070/
- JobTracker - http://localhost:50030/
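A quick way to confirm the web interfaces are up from the shell (default ports, assuming the daemons are running locally):
$ curl -s http://localhost:50070/ | head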
Copy the input files to the distributed filesystem:
$hadoop fs -mkdir brucecppstudy
$hadoop fs -put /home/bruce/study/cpp/* brucecppstudy
$ hadoop fs -put conf input
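To confirm the uploads landed, list them (relative paths resolve under the user's HDFS home directory, /user/bruce here):
$ hadoop fs -ls brucecppstudy
$ hadoop fs -ls input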
The NameNode web interface then shows:
Cluster Summary
58 files and directories, 42 blocks = 100 total. Heap Size is 7.56 MB / 966.69 MB (0%)
Configured Capacity : 19.19 GB
DFS Used : 137.78 KB
Non DFS Used : 4.5 GB
DFS Remaining : 14.69 GB
DFS Used% : 0 %
DFS Remaining% : 76.55 %
Live Nodes : 1
Dead Nodes : 0
Run one of the example programs shipped with the distribution (this grep job counts occurrences of strings matching the regex 'dfs[a-z.]+' in the input files):
$ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
View the output files:
Copy the output files from the distributed filesystem to the local filesystem and examine them:
$ bin/hadoop fs -get output output
$ cat output/*
Or view the output files directly on the distributed filesystem:
$ bin/hadoop fs -cat output/*
When everything is finished, stop the daemons:
$ bin/stop-all.sh