1. Configure the JDK
Important: configure the environment variables.
vi /etc/profile
Append the following at the end:
# set Java environment
export JAVA_HOME=/usr/lib/jvm/java-6-sun
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$JAVA_HOME/bin:$PATH

# Tomcat and Ant (only if they are installed at these paths)
export CATALINA_HOME=/usr/local/tomcat
export CLASSPATH=$CLASSPATH:$CATALINA_HOME/lib
export PATH=$PATH:$CATALINA_HOME/bin

export ANT_HOME=/usr/local/ant
export PATH=$PATH:$ANT_HOME/bin
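After saving /etc/profile, reload it and check that the JDK is picked up; a minimal sanity check, assuming the Sun JDK path used above:

source /etc/profile    # reload the profile in the current shell
echo $JAVA_HOME        # should print /usr/lib/jvm/java-6-sun
java -version          # should report a Sun 1.6 JVM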
2. Configure SSH
Install SSH:
sudo apt-get install openssh-server
a. Use ssh-keygen to create the public and private keys on the local host
[root@www.linuxidc.com ~]# ssh-keygen -t rsa
Enter file in which to save the key (/home/jsmith/.ssh/id_rsa): [Press the Enter key]
Enter passphrase (empty for no passphrase): [Press the Enter key]
Enter same passphrase again: [Press the Enter key]
Your identification has been saved in /home/jsmith/.ssh/id_rsa.
Your public key has been saved in /home/jsmith/.ssh/id_rsa.pub.
The key fingerprint is: 33:b3:fe:af:95:95:18:11:31:d5:de:96:2f:f2:35:f9
root@www.linuxidc.com
When it finishes, a hidden .ssh folder appears under your home directory.
$ cd ~/.ssh
List the files with ls, then copy the public key into authorized_keys:
cp id_rsa.pub authorized_keys
[hadoop@hadoop .ssh]$ chmod 644 authorized_keys
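If key-based login still prompts for a password later, sshd's StrictModes check (standard OpenSSH behavior, not specific to this setup) usually rejects keys when the home directory or ~/.ssh is group- or world-writable; the conventional fix is:

chmod 755 ~
chmod 700 ~/.ssh
chmod 644 ~/.ssh/authorized_keys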
b. Use ssh-copy-id to copy the public key to the remote host
[root@www.linuxidc.com ~]# ssh-copy-id -i ~/.ssh/id_rsa.pub root@Datanode1    # Datanode1 is the datanode's IP or hostname
root@Datanode1's password:
Now try logging into the machine, with "ssh 'root@Datanode1'", and check in:
.ssh/authorized_keys to make sure we haven't added extra keys that you weren't expecting.
[Note: ssh-copy-id appends the key to the remote host's .ssh/authorized_keys.]
c. Log in to the remote host directly
[root@www.linuxidc.com ~]# ssh Datanode1
Last login: Sun Nov 16 17:22:33 2008 from 192.168.1.2
[Note: SSH does not ask for a password.]
[root@Datanode1 ~]#
[Note: you are now logged in on the remote host.]
d. Note: all of the above is executed on the Namenode, and the Namenode must also be able to SSH to itself without a password, so also run:
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
[There is no need to run this step against the local host itself: [root@www.linuxidc.com ~]# ssh-copy-id -i ~/.ssh/id_rsa.pub root@www.linuxidc.com]
For the other nodes, simply repeat steps a-c for Datanode2 and Datanode3.
Passwordless access must work; otherwise bringing up the Hadoop cluster is guaranteed to fail.
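A minimal way to verify every hop from the Namenode at once (Datanode1-3 are the example hostnames from step b; each command must complete without a password prompt):

for host in localhost Datanode1 Datanode2 Datanode3; do
    ssh $host hostname   # prints the remote hostname if passwordless login works
done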
3. Configure Hadoop (files in the conf folder)
A. namenode:
a. core-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://10.108.32.97:9000</value> <!-- 10.108.32.97 is the namenode's IP -->
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/yourname/tmp</value> <!-- note: the tmp directory must be empty -->
</property>
</configuration>
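fs.default.name is the URI against which all relative `hadoop fs` paths are resolved, so once the cluster is up it can be sanity-checked like this (illustrative; 10.108.32.97 is the example namenode IP above):

bin/hadoop fs -ls /                            # implicitly uses hdfs://10.108.32.97:9000
bin/hadoop fs -ls hdfs://10.108.32.97:9000/    # the same thing, fully qualified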
b. hadoop-env.sh
# The java implementation to use. Required.
export JAVA_HOME=/usr/lib/jvm/java-6-sun-1.6.0.06
c. hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>/home/yourname/hdfs/name</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/home/yourname/hdfs/data</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
</configuration>
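dfs.replication=2 means every HDFS block is kept on two datanodes, which fits this small cluster. Once the daemons are running, replication and datanode health can be inspected with the stock tools of this Hadoop generation:

bin/hadoop dfsadmin -report   # live datanodes, capacity, block counts
bin/hadoop fsck /             # flags under-replicated or corrupt blocks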
d. mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>10.108.32.97:9001</value>
</property>
</configuration>
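mapred.job.tracker is a host:port RPC address (not a URL); the TaskTrackers on the datanodes register with it. A quick reachability check after startup (output wording varies slightly across versions):

bin/hadoop job -list          # a fresh cluster reports zero running jobs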
e. conf/masters:
the namenode's IP address
f. conf/slaves:
the datanodes' IP addresses, one per line
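With the addresses used in this walkthrough, the two files would look like this (one entry per line, assuming the Datanode hostnames resolve, e.g. via /etc/hosts; note that in this Hadoop generation conf/masters actually names the secondary namenode host, which is commonly just the namenode itself):

conf/masters:
10.108.32.97

conf/slaves:
Datanode1
Datanode2
Datanode3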
g. Copy the configured Hadoop tree to each datanode:
scp -r /home/yourname/hadoop slave1:/home/dataname1/
scp -r /home/yourname/hadoop slave2:/home/dataname2/
(Caution: bin/start-all.sh launches the slave daemons over SSH using the master's own Hadoop path, so keeping Hadoop at the same absolute path on every node is the safest layout.)
B. Format a new distributed filesystem (do this only once, on an empty dfs.name.dir; re-formatting an existing cluster leaves the datanodes with mismatched namespace IDs):
$ bin/hadoop namenode -format
Start the Hadoop daemons:
$ bin/start-all.sh
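To confirm the cluster actually came up, the JDK's jps tool should list the expected daemons on each node (typical layout for this version):

jps on the namenode   ->  NameNode, SecondaryNameNode, JobTracker
jps on each datanode  ->  DataNode, TaskTracker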
The Hadoop daemons write their logs to the ${HADOOP_LOG_DIR} directory (defaults to ${HADOOP_HOME}/logs).
Browse the web interfaces of the NameNode and the JobTracker; by default they are at (replace localhost with the namenode's IP, e.g. 10.108.32.97, when browsing from another machine):
NameNode - http://localhost:50070/
JobTracker - http://localhost:50030/
Copy the input files into the distributed filesystem:
$ bin/hadoop fs -put conf input
Run one of the example programs provided with the distribution:
$ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
Examine the output files:
Copy the output files from the distributed filesystem to the local filesystem and inspect them:
$ bin/hadoop fs -get output output
$ cat output/*
or
view the output files directly on the distributed filesystem:
$ bin/hadoop fs -cat output/*
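Either way, the grep example emits one line per distinct match of the regular expression: a count, a tab, and the matched string. The exact lines depend on your conf files; an illustrative sample:

3       dfs.class
2       dfs.period
1       dfs.replication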
When you have finished everything, stop the daemons:
$ bin/stop-all.sh