Hadoop Single-Node and Distributed Cluster Deployment on Ubuntu


CDH 4.5 manual installation 
http://wenku.baidu.com/view/6544c87f2e3f5727a5e962a3.html 

hadoop-1.1.2.tar.gz has also been tested and works. 
Important 
Installation guide: http://wenku.baidu.com/view/685f71b165ce050876321329.html 
When choosing the VM network connection type, select bridged mode. 


Set the root password 
Open a terminal: Ctrl+Alt+T 
Change the root password: sudo passwd root 
Enter the new password 
Switch to the root user: su root 

Ubuntu 8.10 does not install the SSH service by default, so install it manually: 
sudo apt-get install ssh 
or: sudo apt-get install openssh-server    # install openssh-server 
Check the IP address with ifconfig 
Connect remotely with SecureCRT 
ubuntu 10.2.128.46 
ubuntu1 10.2.128.20 
ubuntu2 10.2.128.120 

Install vim 
sudo apt-get install vim 

1. Install the JDK 

1.1 Download the JDK from the official site 

The file downloaded here is jdk-6u23-linux-i586.bin (the steps below actually use 6u33, whose extracted directory is jdk1.6.0_33; adjust the paths to whichever version you download). 

Download page: http://www.oracle.com/technetwork/java/javase/downloads/index.html 
Look for JDK 6. 

Place the installer in /home/qiaowang, then: 
sudo sh jdk-6u23-linux-i586.bin 
cp -rf jdk1.6.0_33/ /usr/lib/ 

sudo gedit /etc/environment 
export JAVA_HOME=/usr/lib/jdk1.6.0_33 
export JRE_HOME=/usr/lib/jdk1.6.0_33/jre 
export CLASSPATH=$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/jre/lib 

vim /etc/profile 
export JAVA_HOME=/usr/lib/jdk1.6.0_33 
export JRE_HOME=/usr/lib/jdk1.6.0_33/jre 
export CLASSPATH=$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/jre/lib 
export PATH=$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$PATH:$JAVA_HOME/bin 

Add these export lines just before the umask line. 

source /etc/profile 
reboot 
root@qiaowang-virtual-machine:/etc# java -version 
java version "1.6.0_33" 
Java(TM) SE Runtime Environment (build 1.6.0_33-b03) 
Java HotSpot(TM) Client VM (build 20.8-b03, mixed mode) 

The JDK setup above must be performed on every namenode and datanode. 
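To push the same JDK and environment settings to the other nodes, one option is a small copy loop (a sketch only; the hostnames ubuntu1/ubuntu2 come from the list above, and root SSH access is assumed):

for h in ubuntu1 ubuntu2; do
  scp -r /usr/lib/jdk1.6.0_33 root@$h:/usr/lib/          # copy the JDK
  scp /etc/profile root@$h:/etc/profile                  # copy the environment settings
  ssh root@$h "/usr/lib/jdk1.6.0_33/bin/java -version"   # verify the JDK runs on that node
done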

2. Add a dedicated group and user for running and accessing Hadoop. 
sudo addgroup hadoop 
sudo adduser --ingroup hadoop hadoop 

Show the groups a user belongs to: 
id <username> 

Show a user's group memberships: 
groups <username> 

List all users: 
cat /etc/passwd        # /etc/shadow also lists the accounts, but requires root 

Delete a user 
As root: userdel -r newuser 
As a regular user: sudo userdel -r newuser 
Have the user log out first, then delete. 

3. Generate SSH keys and set up passwordless SSH 

  su - hadoop                         # switch to the hadoop user 
  ssh-keygen -t rsa -P ""             # generate an ssh key pair 
  cd .ssh/ 
  cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys  # allow key-based ssh access 
After this, test with: ssh localhost 



Make sure the AuthorizedKeysFile line in /etc/ssh/sshd_config is uncommented (remove the leading #), so the system recognizes public keys in authorized_keys. 
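If passwordless login still asks for a password, the usual cause is over-permissive modes on the key files; a quick check (standard OpenSSH requirements, not specific to this guide):

chmod 700 ~/.ssh                    # sshd ignores keys if .ssh is group/world writable
chmod 600 ~/.ssh/authorized_keys
ssh localhost                       # should now log in without a password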

4. Download a Hadoop release from: 
http://hadoop.apache.org/common/releases.html#Download 
http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-0.20.2/ 

Latest version used here: 
hadoop-2.0.0-cdh4.5.0.tar.gz 

Already copied to /opt 

tar -zxvf hadoop-0.20.2.tar.gz 
tar -zxvf hadoop-2.0.0-cdh4.5.0.tar.gz 

5. Change the hostname (currently qiaowang-virtual-machine) 
root@qiaowang-virtual-machine:/opt# hostname 
qiaowang-virtual-machine 
If the machine's hostname is not what we want, change it. (The referenced guide edits HOSTNAME in "/etc/sysconfig/network", which applies to Red Hat-style systems; on Ubuntu edit /etc/hostname instead, as below.) 
vim /etc/hostname 
Master.Hadoop 

Apply it for the current session: 
hostname Master.Hadoop 

root@Master:~# hostname 
Master.Hadoop 

vim /etc/hosts 
127.0.1.1       Master.Hadoop 

The configuration below follows this reference: 
http://wenku.baidu.com/view/6544c87f2e3f5727a5e962a3.html 

1. core-site.xml 
<property> 
<name>fs.default.name</name> 
<value>hdfs://m1hadoop.xingmeng.com:8020</value> 
<final>true</final> 
</property> 
<property> 
<name>hadoop.tmp.dir</name> 
<value>/home/hadoop/tempdata</value> 
</property> 

2. yarn-site.xml 
<property> 
<name>yarn.resourcemanager.address</name> 
<value>m1hadoop.xingmeng.com:8032</value> 
</property> 
<property> 
<name>yarn.resourcemanager.scheduler.address</name> 
<value>m1hadoop.xingmeng.com:8030</value> 
</property> 
<property> 
<name>yarn.resourcemanager.resource-tracker.address</name> 
<value>m1hadoop.xingmeng.com:8031</value> 
</property> 
<property> 
<name>yarn.resourcemanager.admin.address</name> 
<value>m1hadoop.xingmeng.com:8033</value> 
</property> 
<property> 
<name>yarn.resourcemanager.webapp.address</name> 
<value>m1hadoop.xingmeng.com:8088</value> 
</property> 
<property> 
<name>yarn.nodemanager.aux-services</name> 
<value>mapreduce.shuffle</value> 
</property> 
<property> 
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name> 
<value>org.apache.hadoop.mapred.ShuffleHandler</value> 
</property> 

3. mapred-site.xml 
<property> 
<name>mapreduce.framework.name</name> 
<value>yarn</value> 
</property> 
<property> 
<name>mapred.system.dir</name> 
<value>file:/home/hadoop/mapred_system</value> 
<final>true</final> 
</property> 
<property> 
<name>mapred.local.dir</name> 
<value>file:/home/hadoop/mapred_local</value> 
<final>true</final> 
</property> 

4. hdfs-site.xml 
<property> 
<name>dfs.namenode.name.dir</name> 
<value>/home/hadoop/name</value> 
</property> 
<property> 
<name>dfs.datanode.data.dir</name> 
<value>/home/hadoop/data</value> 
</property> 
<property> 
<name>dfs.replication</name> 
<value>2</value> 
</property> 
<property> 
<name>dfs.permissions</name> 
<value>false</value> 
</property> 
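The CDH configuration above points at several local paths that must exist and be owned by the hadoop user before formatting; a sketch using exactly the values configured above:

mkdir -p /home/hadoop/tempdata /home/hadoop/name /home/hadoop/data /home/hadoop/mapred_system /home/hadoop/mapred_local
chown -R hadoop:hadoop /home/hadoop/tempdata /home/hadoop/name /home/hadoop/data /home/hadoop/mapred_system /home/hadoop/mapred_local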

./hdfs namenode -format 
Then start the daemons from /home/hadoop/cdh/sbin/ (see the sketch below). 

/home/hadoop/cdh/bin/hadoop fs -ls /user/hadoop 
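A minimal start/verify sequence after formatting, assuming the CDH tarball was unpacked to /home/hadoop/cdh (an assumption; adjust the path to your install):

cd /home/hadoop/cdh
sbin/start-dfs.sh      # NameNode, SecondaryNameNode, DataNodes
sbin/start-yarn.sh     # ResourceManager, NodeManagers
jps                    # check which daemons are running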

If the IP addresses change, update /etc/hosts first. 

If a datanode does not come up, delete the files under /home/hadoop/data/ (the dfs.datanode.data.dir configured above), then re-format and restart. 
-------------------------------------------------- 
4. Disable IPv6 
  Edit conf/hadoop-env.sh under the Hadoop root directory (if you have not downloaded Hadoop yet, download and unpack it first): 
  export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true 

cat /proc/sys/net/ipv6/conf/all/disable_ipv6 
A value of 0 means IPv6 is still enabled. 

A value of 1 means IPv6 is disabled; to disable it, set the following sysctl keys: 
net.ipv6.conf.all.disable_ipv6 = 1 
net.ipv6.conf.default.disable_ipv6 = 1 
net.ipv6.conf.lo.disable_ipv6 = 1 
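To make these settings persistent, a common approach (standard sysctl usage, not specific to Hadoop) is to append them to /etc/sysctl.conf and reload:

sudo sh -c 'cat >> /etc/sysctl.conf <<EOF
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
EOF'
sudo sysctl -p
cat /proc/sys/net/ipv6/conf/all/disable_ipv6   # should now print 1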

5. Change the owner of the Hadoop directory to the hadoop user 
chown -R hadoop:hadoop /opt/hadoop-0.20.2/ 
mv hadoop-0.20.2 hadoop 
6. Install and configure Hadoop 
How to configure and start it: 
Basic steps 
a. configure the JDK 
b. configure core-site.xml 
c. configure mapred-site.xml 
d. configure hdfs-site.xml 

Create the directory for data storage: 
mkdir /opt/hadoop-datastore 

Open conf/core-site.xml and configure it as follows: 
<configuration> 
  <property> 
    <name>hadoop.tmp.dir</name> 
    <value>/opt/hadoop-datastore/</value> 
    <description>A base for other temporary directories.</description> 
  </property> 

  <property> 
    <name>fs.default.name</name> 
    <value>hdfs://localhost:54310</value> 
    <description>The name of the default file system.  A URI whose 
  scheme and authority determine the FileSystem implementation.  The 
  uri's scheme determines the config property (fs.SCHEME.impl) naming 
  the FileSystem implementation class.  The uri's authority is used to 
  determine the host, port, etc. for a filesystem.</description> 
  </property> 
</configuration> 

mapred-site.xml: 

<configuration> 
<property> 
  <name>mapred.job.tracker</name> 
  <value>localhost:54311</value> 
  <description>The host and port that the MapReduce job tracker runs 
  at.  If "local", then jobs are run in-process as a single map 
  and reduce task. 
  </description> 
</property> 
</configuration> 

hdfs-site.xml: 
<configuration> 
<property> 
  <name>dfs.replication</name> 
  <value>1</value> 
  <description>Default block replication. 
  The actual number of replications can be specified when the file is created. 
  The default is used if replication is not specified in create time. 
  </description> 
</property> 
</configuration> 

vim hadoop-env.sh 
export JAVA_HOME=/usr/lib/jdk1.6.0_33 

OK, configuration is done. 
Format HDFS: 
/opt/hadoop/bin/hadoop namenode -format 
Output: 
root@Master:/opt/hadoop# /opt/hadoop/bin/hadoop namenode -format 
12/07/13 14:27:29 INFO namenode.NameNode: STARTUP_MSG: 
/************************************************************ 
STARTUP_MSG: Starting NameNode 
STARTUP_MSG:   host = Master.Hadoop/127.0.1.1 
STARTUP_MSG:   args = [-format] 
STARTUP_MSG:   version = 0.20.2 
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010 
************************************************************/ 
Re-format filesystem in /opt/hadoop-datastore/dfs/name ? (Y or N) y 
Format aborted in /opt/hadoop-datastore/dfs/name 
12/07/13 14:27:35 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************ 
SHUTDOWN_MSG: Shutting down NameNode at Master.Hadoop/127.0.1.1 
************************************************************/ 

Note: the re-format prompt is case-sensitive; answering with a lowercase y aborts the format (as in the log above), so answer with an uppercase Y. 

Start HDFS and MapReduce (switch to the hadoop user first): 
/opt/hadoop/bin/start-all.sh 

Output: 
starting namenode, logging to /opt/hadoop/bin/../logs/hadoop-root-namenode-Master.Hadoop.out 
The authenticity of host 'localhost (127.0.0.1)' can't be established. 
ECDSA key fingerprint is 3e:55:d8:be:47:46:21:95:29:9b:9e:c5:fb:02:f4:d2. 
Are you sure you want to continue connecting (yes/no)? yes 
localhost: Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts. 
root@localhost's password: 
localhost: starting datanode, logging to /opt/hadoop/bin/../logs/hadoop-root-datanode-Master.Hadoop.out 
root@localhost's password: 
localhost: starting secondarynamenode, logging to /opt/hadoop/bin/../logs/hadoop-root-secondarynamenode-Master.Hadoop.out 
starting jobtracker, logging to /opt/hadoop/bin/../logs/hadoop-root-jobtracker-Master.Hadoop.out 
root@localhost's password: 
localhost: starting tasktracker, logging to /opt/hadoop/bin/../logs/hadoop-root-tasktracker-Master.Hadoop.out 

7. The script to stop all services: 
/opt/hadoop/bin/stop-all.sh 


8. After the daemons start successfully, check with jps: 
2914 NameNode 
3197 JobTracker 
3896 Jps 
3024 DataNode 
3126 SecondaryNameNode 
3304 TaskTracker 

9. Run the WordCount example 

The Hadoop directory contains several jar files; the examples jar (hadoop-0.20.2-examples.jar here) is the one we need, since it contains wordcount. First create the test input files. 

(1) Create two input files, file01 and file02, on the local disk: 
$ echo "Hello World Bye World" > file01 
$ echo "Hello Hadoop Goodbye Hadoop" > file02 

./hadoop fs -ls / 

(2) Create an input directory in HDFS: ./hadoop fs -mkdir input 

(To remove it: ./hadoop dfs -rmr input) 
(3) Copy file01 and file02 into HDFS: 
./hadoop fs -copyFromLocal /home/qiaowang/file0* input 

./hadoop fs -ls /user/root/input 
Found 2 items 
-rw-r--r--   1 root supergroup         22 2012-07-13 15:07 /user/root/input/file01 
-rw-r--r--   1 root supergroup         28 2012-07-13 15:07 /user/root/input/file02 



root@Master:/opt/hadoop/bin# ./hadoop fs -cat /user/root/input/file01/ 
Hello World Bye World 

(4) Run wordcount: 
$ hadoop jar hadoop-0.20.2-examples.jar wordcount input output 

$ bin/hadoop jar hadoop-0.20.2-examples.jar wordcount input output 


Exception in thread "main" java.io.IOException: Error opening job jar: hadoop-0.20.2-examples.jar 
        at org.apache.hadoop.util.RunJar.main(RunJar.java:90) 
Caused by: java.util.zip.ZipException: error in opening zip file 
        at java.util.zip.ZipFile.open(Native Method) 
        at java.util.zip.ZipFile.<init>(ZipFile.java:114) 
        at java.util.jar.JarFile.<init>(JarFile.java:135) 
        at java.util.jar.JarFile.<init>(JarFile.java:72) 
        at org.apache.hadoop.util.RunJar.main(RunJar.java:88) 


Solution: 
Mind the jar path: 
./hadoop jar /opt/hadoop/hadoop-0.20.2-examples.jar wordcount input output 

Output: 

12/07/13 15:20:22 INFO input.FileInputFormat: Total input paths to process : 2 
12/07/13 15:20:22 INFO mapred.JobClient: Running job: job_201207131429_0001 
12/07/13 15:20:23 INFO mapred.JobClient:  map 0% reduce 0% 
12/07/13 15:20:32 INFO mapred.JobClient:  map 100% reduce 0% 
12/07/13 15:20:44 INFO mapred.JobClient:  map 100% reduce 100% 
12/07/13 15:20:46 INFO mapred.JobClient: Job complete: job_201207131429_0001 
12/07/13 15:20:46 INFO mapred.JobClient: Counters: 17 
12/07/13 15:20:46 INFO mapred.JobClient:   Job Counters 
12/07/13 15:20:46 INFO mapred.JobClient:     Launched reduce tasks=1 
12/07/13 15:20:46 INFO mapred.JobClient:     Launched map tasks=2 
12/07/13 15:20:46 INFO mapred.JobClient:     Data-local map tasks=2 
12/07/13 15:20:46 INFO mapred.JobClient:   FileSystemCounters 
12/07/13 15:20:46 INFO mapred.JobClient:     FILE_BYTES_READ=79 
12/07/13 15:20:46 INFO mapred.JobClient:     HDFS_BYTES_READ=50 
12/07/13 15:20:46 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=228 
12/07/13 15:20:46 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=41 
12/07/13 15:20:46 INFO mapred.JobClient:   Map-Reduce Framework 
12/07/13 15:20:46 INFO mapred.JobClient:     Reduce input groups=5 
12/07/13 15:20:46 INFO mapred.JobClient:     Combine output records=6 
12/07/13 15:20:46 INFO mapred.JobClient:     Map input records=2 
12/07/13 15:20:46 INFO mapred.JobClient:     Reduce shuffle bytes=45 
12/07/13 15:20:46 INFO mapred.JobClient:     Reduce output records=5 
12/07/13 15:20:46 INFO mapred.JobClient:     Spilled Records=12 
12/07/13 15:20:46 INFO mapred.JobClient:     Map output bytes=82 
12/07/13 15:20:46 INFO mapred.JobClient:     Combine input records=8 
12/07/13 15:20:46 INFO mapred.JobClient:     Map output records=8 
12/07/13 15:20:46 INFO mapred.JobClient:     Reduce input records=6 


(5) When the job finishes, check the result: 
root@Master:/opt/hadoop/bin# ./hadoop fs -cat /user/root/output/part-r-00000 
Bye     1 
GoodBye 1 
Hadoop  2 
Hello   2 
World   2 

root@Master:/opt/hadoop/bin# jps 
3049 TaskTracker 
2582 DataNode 
2849 JobTracker 
10386 Jps 
2361 NameNode 
2785 SecondaryNameNode 

OK. The steps above complete the single-node Hadoop setup on Ubuntu. 

-------------------------------------------------------- 
Next we set up a cluster (three Ubuntu servers). 

References: 
http://www.linuxidc.com/Linux/2011-04/35162.htm 
http://www.2cto.com/os/201202/118992.html 

1. Three machines, each with the JDK installed and the hadoop user added: 
ubuntu 10.2.128.46 master 
ubuntu1 10.2.128.20 slave1 
ubuntu2 10.2.128.120 slave2 

Edit /etc/hosts on all three machines as follows: 
127.0.0.1       localhost 
10.2.128.46     master.Hadoop 
10.2.128.20     slave1.Hadoop 
10.2.128.120    slave2.Hadoop 

All of the following is done as the hadoop user. 
2. Generate SSH keys and set up passwordless SSH 

  su - hadoop                         # switch to the hadoop user 
  ssh-keygen -t rsa -P ""             # generate an ssh key pair 
  cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys  # allow key-based ssh access 

On the namenode (Master): 
hadoop@Master:~/.ssh$ scp authorized_keys Slave1.Hadoop:/home/hadoop/.ssh/ 
hadoop@Master:~/.ssh$ scp authorized_keys Slave2.Hadoop:/home/hadoop/.ssh/ 
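An alternative to copying authorized_keys with scp is ssh-copy-id, which appends the key instead of overwriting the file (a sketch, assuming the hadoop account exists on the slaves):

ssh-copy-id -i $HOME/.ssh/id_rsa.pub hadoop@Slave1.Hadoop
ssh-copy-id -i $HOME/.ssh/id_rsa.pub hadoop@Slave2.Hadoop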

Test: ssh Slave1.Hadoop or ssh Slave2.Hadoop (the first connection asks you to confirm with yes). 
If no password is requested, the setup is correct; if a password is still required, re-check the configuration above. 
hadoop@Master:~/.ssh$ ssh Slave1.Hadoop 
Welcome to Ubuntu precise (development branch) 
hadoop@Master:~/.ssh$ ssh Slave2.Hadoop 
Welcome to Ubuntu precise (development branch) 

3. Copy hadoop-0.20.2.tar.gz to /home/qiaowang/install_Hadoop. 
Possible approaches: 
1) Installing a Hadoop cluster normally means unpacking the software on every machine in the cluster, with the same installation path everywhere; if HADOOP_HOME denotes the installation root, it is usually identical on all machines. 
2) If the machines' environments are completely identical, configure Hadoop on one machine and copy the whole configured folder to the same location on the other machines. 
3) Copy the Hadoop directory on the Master to the same directory on every Slave with scp, and adjust hadoop-env.sh on each Slave if its JAVA_HOME differs. 


4. Configuration 
To make the hadoop command, start-all.sh and the other scripts convenient to use, add the following to /etc/profile on the Master: 
export JAVA_HOME=/usr/lib/jdk1.6.0_33 
export JRE_HOME=/usr/lib/jdk1.6.0_33/jre 
export CLASSPATH=$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/jre/lib 
export HADOOP_HOME=/opt/hadoop 
export PATH=$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin 
After editing, run source /etc/profile for the changes to take effect. 

Configure the files under conf: 
vim hadoop-env.sh 
export JAVA_HOME=/usr/lib/jdk1.6.0_33 

vim core-site.xml 
---------------------------------- 
<configuration> 
  <property> 
    <name>hadoop.tmp.dir</name> 
    <value>/opt/hadoop-datastore/</value> 
    <description>A base for other temporary directories.</description> 
  </property> 

  <property> 
    <name>fs.default.name</name> 
    <value>hdfs://Master.Hadoop:54310</value> 
    <description>The name of the default file system.  A URI whose 
  scheme and authority determine the FileSystem implementation.  The 
  uri's scheme determines the config property (fs.SCHEME.impl) naming 
  the FileSystem implementation class.  The uri's authority is used to 
  determine the host, port, etc. for a filesystem.</description> 
  </property> 
</configuration> 
----------------------------------------- 
vim hdfs-site.xml 
------------------------------------------ 
<configuration> 
<property> 
  <name>dfs.replication</name> 
  <value>3</value> 
  <description>Default block replication. 
  The actual number of replications can be specified when the file is created. 
  The default is used if replication is not specified in create time. 
  </description> 
</property> 
</configuration> 
------------------------------------- 
vim mapred-site.xml 
------------------------------------ 
<configuration> 
<property> 
  <name>mapred.job.tracker</name> 
  <value>Master.Hadoop:54311</value> 
  <description>The host and port that the MapReduce job tracker runs 
  at.  If "local", then jobs are run in-process as a single map 
  and reduce task. 
  </description> 
</property> 
</configuration> 
------------------------------------- 
vim masters 
Master.Hadoop 
root@Master:/opt/hadoop/conf# vim slaves 
Slave1.Hadoop 
Slave2.Hadoop 


Using approach 3), copy the Hadoop directory on the Master to each Slave. 
Switch to the root user: 
su root 
Run: scp -r hadoop Slave1.Hadoop:/opt/ 
On Slave1.Hadoop: 
su root 
chown -R hadoop:hadoop /opt/hadoop/ 
Create the data directory: 
mkdir /opt/hadoop-datastore/ 
chown -R hadoop:hadoop /opt/hadoop-datastore/ 
Do the same on the other Slaves. 

On the namenode, format HDFS: 
root@Master:/opt/hadoop/bin# hadoop namenode -format 
Output: 
12/07/23 18:54:36 INFO namenode.NameNode: STARTUP_MSG: 
/************************************************************ 
STARTUP_MSG: Starting NameNode 
STARTUP_MSG:   host = Master.Hadoop/10.2.128.46 
STARTUP_MSG:   args = [-format] 
STARTUP_MSG:   version = 0.20.2 
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010 
************************************************************/ 
Re-format filesystem in /opt/hadoop-datastore/dfs/name ? (Y or N) y 
Format aborted in /opt/hadoop-datastore/dfs/name 
12/07/23 18:54:45 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************ 
SHUTDOWN_MSG: Shutting down NameNode at Master.Hadoop/10.2.128.46 
************************************************************/ 

Start Hadoop: 
./start-all.sh 
root@Master:/opt# chown -R hadoop:hadoop /opt/hadoop/ 
root@Master:/opt# chown -R hadoop:hadoop /opt/hadoop-datastore/ 
root@Master:/opt# su hadoop 
hadoop@Master:/opt$ cd hadoop/bin/ 

hadoop@Master:/opt/hadoop/bin$ ./start-all.sh 

Problem encountered: 
starting namenode, logging to /opt/hadoop/bin/../logs/hadoop-hadoop-namenode-Master.Hadoop.out 
Slave1.Hadoop: datanode running as process 7309. Stop it first. 
Slave2.Hadoop: datanode running as process 4920. Stop it first. 
Master.Hadoop: starting secondarynamenode, logging to /opt/hadoop/bin/../logs/hadoop-hadoop-secondarynamenode-Master.Hadoop.out 
starting jobtracker, logging to /opt/hadoop/bin/../logs/hadoop-hadoop-jobtracker-Master.Hadoop.out 
Slave1.Hadoop: tasktracker running as process 7477. Stop it first. 
Slave2.Hadoop: tasktracker running as process 5088. Stop it first. 

Advice found online: 
This is probably caused by re-formatting the namenode after the cluster had already been started. 
If this is only a test/learning cluster, the following fix can be used (see the sketch after this list): 
1. First kill the leftover datanode/tasktracker processes (26755, 21863 and 26654 in that post). If kill 26755 does not work, use kill -KILL 26755. 
2. Manually delete the contents of the dfs.data.dir directory configured in conf/hdfs-site.xml. 
3. Run $HADOOP_HOME/bin/hadoop namenode -format 
4. Start the cluster: $HADOOP_HOME/bin/start-all.sh 
Consequence: 
All data in HDFS will be lost. 
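A sketch of those steps for this cluster, assuming the data lives under the configured hadoop.tmp.dir (/opt/hadoop-datastore) on every node; this wipes HDFS, so only do it on a test cluster:

/opt/hadoop/bin/stop-all.sh                   # on the Master
rm -rf /opt/hadoop-datastore/*                # on every node: clear old namespace/block data
/opt/hadoop/bin/hadoop namenode -format       # on the Master; answer with an uppercase Y
/opt/hadoop/bin/start-all.sh                  # on the Master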

Solution actually used: re-format the namenode. 
su hadoop 
hadoop@Master:/opt/hadoop/bin$ ./hadoop namenode -format 
12/07/24 10:43:29 INFO namenode.NameNode: STARTUP_MSG: 
/************************************************************ 
STARTUP_MSG: Starting NameNode 
STARTUP_MSG:   host = Master.Hadoop/10.2.128.46 
STARTUP_MSG:   args = [-format] 
STARTUP_MSG:   version = 0.20.2 
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010 
************************************************************/ 
Re-format filesystem in /opt/hadoop-datastore/dfs/name ? (Y or N) y 
Format aborted in /opt/hadoop-datastore/dfs/name 
12/07/24 10:43:32 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************ 
SHUTDOWN_MSG: Shutting down NameNode at Master.Hadoop/10.2.128.46 
************************************************************/ 

hadoop@Master:/opt/hadoop/bin$ ./start-all.sh 
starting namenode, logging to /opt/hadoop/bin/../logs/hadoop-hadoop-namenode-Master.Hadoop.out 
Slave1.Hadoop: starting datanode, logging to /opt/hadoop/bin/../logs/hadoop-hadoop-datanode-Slave1.Hadoop.out 
Slave2.Hadoop: starting datanode, logging to /opt/hadoop/bin/../logs/hadoop-hadoop-datanode-Slave2.Hadoop.out 
Master.Hadoop: starting secondarynamenode, logging to /opt/hadoop/bin/../logs/hadoop-hadoop-secondarynamenode-Master.Hadoop.out 
starting jobtracker, logging to /opt/hadoop/bin/../logs/hadoop-hadoop-jobtracker-Master.Hadoop.out 
Slave2.Hadoop: starting tasktracker, logging to /opt/hadoop/bin/../logs/hadoop-hadoop-tasktracker-Slave2.Hadoop.out 
Slave1.Hadoop: starting tasktracker, logging to /opt/hadoop/bin/../logs/hadoop-hadoop-tasktracker-Slave1.Hadoop.out 
hadoop@Master:/opt/hadoop/bin$ 

-------------------------------------------------------------------------------------------------------------------------- 
Verification: 
hadoop@Master:/opt/hadoop/bin$  ./hadoop dfsadmin -report 
Safe mode is ON 
Configured Capacity: 41137831936 (38.31 GB) 
Present Capacity: 31127531520 (28.99 GB) 
DFS Remaining: 31127482368 (28.99 GB) 
DFS Used: 49152 (48 KB) 
DFS Used%: 0% 
Under replicated blocks: 0 
Blocks with corrupt replicas: 0 
Missing blocks: 0 

------------------------------------------------- 
Datanodes available: 2 (2 total, 0 dead) 

Name: 10.2.128.120:50010 
Decommission Status : Normal 
Configured Capacity: 20568915968 (19.16 GB) 
DFS Used: 24576 (24 KB) 
Non DFS Used: 4913000448 (4.58 GB) 
DFS Remaining: 15655890944(14.58 GB) 
DFS Used%: 0% 
DFS Remaining%: 76.11% 
Last contact: Tue Jul 24 10:50:43 CST 2012 


Name: 10.2.128.20:50010 
Decommission Status : Normal 
Configured Capacity: 20568915968 (19.16 GB) 
DFS Used: 24576 (24 KB) 
Non DFS Used: 5097299968 (4.75 GB) 
DFS Remaining: 15471591424(14.41 GB) 
DFS Used%: 0% 
DFS Remaining%: 75.22% 
Last contact: Tue Jul 24 10:50:41 CST 2012 

Web UI: http://10.2.128.46:50070/ 
Job information: 
http://10.2.128.46:50030/jobtracker.jsp 

To check whether the daemons are running, use the jps command (a ps-like utility for JVM processes); it lists the running daemons and their process IDs. 
hadoop@Master:/opt/hadoop/conf$ jps 
2823 Jps 
2508 JobTracker 
2221 NameNode 
2455 SecondaryNameNode 

netstat -nat 
tcp        0      0 10.2.128.46:54311       0.0.0.0:*               LISTEN     
tcp        0      0 10.2.128.46:54310       10.2.128.46:44150       ESTABLISHED 
tcp      267      0 10.2.128.46:54311       10.2.128.120:48958      ESTABLISHED 
tcp        0      0 10.2.128.46:54310       10.2.128.20:41230       ESTABLISHED 


./hadoop dfs -ls / 
hadoop@Master:/opt/hadoop/bin$ ./hadoop dfs -ls / 
Found 2 items 
drwxr-xr-x   - root supergroup          0 2012-07-13 15:20 /opt 
drwxr-xr-x   - root supergroup          0 2012-07-13 15:20 /user 

hadoop@Master:/opt/hadoop$ bin/hadoop fs -mkdir input 
Problem encountered: 
mkdir: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot create directory /user/hadoop/input. Name node is in safe mode. 
So what is Hadoop safe mode? 

When the distributed file system starts up, it begins in safe mode. While it is in safe mode, the contents of the file system cannot be modified or deleted until safe mode ends. 
Safe mode exists mainly so that, at startup, the system can check the validity of the data blocks on each DataNode and, according to policy, replicate or delete blocks as necessary. 

Safe mode can also be entered at runtime with a command. In practice, modifying or deleting files right after startup produces this safe-mode error; usually you only need to wait a short while. 

Now that this is clear: to avoid waiting, can we take Hadoop out of safe mode directly? 

Yes. From the Hadoop directory, run: 

hadoop@Master:/opt/hadoop/bin$ ./hadoop dfsadmin -safemode leave 
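Two related dfsadmin options (standard HDFS commands) are useful here: -safemode get shows the current state, and -safemode wait simply blocks until the namenode leaves safe mode on its own:

./hadoop dfsadmin -safemode get     # prints whether safe mode is ON or OFF
./hadoop dfsadmin -safemode wait    # returns once safe mode ends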

hadoop@Master:/opt/hadoop$ bin/hadoop fs -mkdir input 

hadoop@Master:/opt/hadoop/bin$ cd .. 
hadoop@Master:/opt/hadoop$ bin/hadoop fs -mkdir input 
hadoop@Master:/opt/hadoop$ bin/hadoop fs -put conf/core-site.xml input 
hadoop@Master:/opt/hadoop$ bin/hadoop jar hadoop-0.20.2-examples.jar grep input output 'dfs[a-z.]+' 

5. Additional notes 
Q: What does bin/hadoop jar hadoop-0.20.2-examples.jar grep input output 'dfs[a-z.]+' mean? 
A: bin/hadoop jar (run a jar with hadoop) hadoop-0.20.2-examples.jar (the name of the jar) grep (the class to run; what follows are its arguments) input output 'dfs[a-z.]+' 
In other words, it runs the grep program from the Hadoop examples, with input as the input directory and output as the output directory on HDFS. 
Q: What is grep here? 
A: A map/reduce program that counts the matches of a regex in the input. 

Check the result: 
hadoop@Master:/opt/hadoop$ bin/hadoop fs -ls /user/hadoop/output 
Found 2 items 
drwxr-xr-x   - hadoop supergroup          0 2012-07-24 11:29 /user/hadoop/output/_logs 
-rw-r--r--   3 hadoop supergroup          0 2012-07-24 11:30 /user/hadoop/output/part-00000 

hadoop@Master:/opt/hadoop$ bin/hadoop fs -rmr /user/hadoop/outputtest 
Deleted hdfs://Master.Hadoop:54310/user/hadoop/outputtest 
hadoop@Master:/opt/hadoop$ bin/hadoop fs -rmr /user/hadoop/output 
Deleted hdfs://Master.Hadoop:54310/user/hadoop/output 

Switch to another example: 
hadoop@Master:/opt/hadoop$ bin/hadoop jar /opt/hadoop/hadoop-0.20.2-examples.jar wordcount input output 
hadoop@Master:/opt/hadoop$  bin/hadoop fs -ls /user/hadoop/output 
Found 2 items 
drwxr-xr-x   - hadoop supergroup          0 2012-07-24 11:43 /user/hadoop/output/_logs 
-rw-r--r--   3 hadoop supergroup        772 2012-07-24 11:43 /user/hadoop/output/part-r-00000 

hadoop@Master:/opt/hadoop$ bin/hadoop fs -cat /user/hadoop/output/part-r-00000 
(fs.SCHEME.impl)        1 
-->     1 
<!--    1 
</configuration>        1 
</property>     2 
<?xml   1 
<?xml-stylesheet        1 

Test successful! 


Error encountered after a restart: 
INFO ipc.Client: Retrying connect to server: master/192.168.0.45:54310. Already tried 0 time 
Recovery steps: 
./hadoop dfsadmin -report 
cd /opt/hadoop-datastore/ 
/opt/hadoop/bin/stop-all.sh 
rm -rf * 
/opt/hadoop/bin/hadoop namenode -format 
(If any debug settings were added, remove them.) 
/opt/hadoop/bin/start-all.sh 
./hadoop dfsadmin -report 
------------------------------------------------------------------- 
Hadoop MapReduce Java demo 

<dependency> 
  <groupId>org.apache.hadoop</groupId> 
  <artifactId>hadoop-core</artifactId> 
  <version>1.1.2</version> 
</dependency> 

Java code: 

package cn.focus.dc.hadoop;

import java.io.IOException;
import java.util.*;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;

/**
 * @author qiaowang
 */
public class WordCount {

    public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);

        private Text word = new Text();

        public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                output.collect(word, one);
            }
        }
    }

    public static class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output,
                Reporter reporter) throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("wordcount");

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        conf.setMapperClass(Map.class);
        conf.setCombinerClass(Reduce.class);
        conf.setReducerClass(Reduce.class);

        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
    }

}


Approach 1 
On Linux, create a wordcount_classes directory: 
hadoop@Master:~/wordcount_classes$ ls 
cn  WordCount.java 
hadoop@Master:~/wordcount_classes$ pwd 
/home/hadoop/wordcount_classes 

/usr/lib/jdk1.6.0_33/bin/javac -classpath /opt/hadoop/hadoop-core-1.1.2.jar -d /home/hadoop/wordcount_classes/ WordCount.java 

After compiling: 
hadoop@Master:~/wordcount_classes/cn/focus/dc/hadoop$ pwd 
/home/hadoop/wordcount_classes/cn/focus/dc/hadoop 
hadoop@Master:~/wordcount_classes/cn/focus/dc/hadoop$ ls 
WordCount.class  WordCount$Map.class  WordCount$Reduce.class 

Build the jar: 

hadoop@Master:~$ /usr/lib/jdk1.6.0_33/bin/jar -cvf /home/hadoop/wordcount.jar -C wordcount_classes/ . 
added manifest 
adding: cn/(in = 0) (out= 0)(stored 0%) 
adding: cn/focus/(in = 0) (out= 0)(stored 0%) 
adding: cn/focus/dc/(in = 0) (out= 0)(stored 0%) 
adding: cn/focus/dc/hadoop/(in = 0) (out= 0)(stored 0%) 
adding: cn/focus/dc/hadoop/WordCount.class(in = 1573) (out= 756)(deflated 51%) 
adding: cn/focus/dc/hadoop/WordCount$Map.class(in = 1956) (out= 804)(deflated 58%) 
adding: cn/focus/dc/hadoop/WordCount$Reduce.class(in = 1629) (out= 652)(deflated 59%) 
adding: WordCount.java(in = 2080) (out= 688)(deflated 66%) 
hadoop@Master:~$ ls 
file01  file02  hadoop-1.1.2.tar.gz  wordcount_classes  wordcount.jar 

Run: 
/opt/hadoop/bin/hadoop jar /home/hadoop/wordcount.jar cn.focus.dc.hadoop.WordCount /user/hadoop/input /user/hadoop/output 

Check the result: 
hadoop@Master:~$ /opt/hadoop/bin/hadoop fs -cat /user/hadoop/output/part-00000 
Bye     1 
Goodbye 1 
Hadoop  2 
Hello   2 
World   2 

Approach 2: 
In the project directory on Windows, build the package directly with Maven (including the dependency jars): 
mvn -U clean dependency:copy-dependencies compile package 
This produces the project jar under target and the dependency jars under the dependency folder. 

Copy them to the Linux machine. 

The directory layout is as follows: 
hadoop@Master:~/hadoop_stat/dependency$ ls 
hadoop-core-1.1.2.jar 
hadoop@Master:~/hadoop_stat$ ls 
dependency  hadoop-stat-1.0.0-SNAPSHOT.jar 
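hadoop-core itself is provided by the cluster, so it does not need to be shipped with the job. If the job did need extra jars from the dependency folder, one option is to put them on the client classpath via HADOOP_CLASSPATH before running the command below (a sketch using this example's paths; jars needed inside the map/reduce tasks themselves would additionally have to be bundled into the job jar or distributed):

export HADOOP_CLASSPATH="/home/hadoop/hadoop_stat/dependency/*"   # client-side classpath only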

Run: 
/opt/hadoop/bin/hadoop jar /home/hadoop/hadoop_stat/hadoop-stat-1.0.0-SNAPSHOT.jar cn.focus.dc.hadoop.WordCount /user/hadoop/input /user/hadoop/output 

hadoop@Master:~/hadoop_stat$ /opt/hadoop/bin/hadoop fs -cat /user/hadoop/output/part-00000 
Bye     1 
Goodbye 1 
Hadoop  2 
Hello   2 
World   2 
