One of our company's projects involves cloud computing. After some research we decided on Hadoop, so I followed articles found online and set up a Hadoop cluster as described below.
Install three virtual machines running Ubuntu Linux 11.10, each with a single CPU, 512 MB of RAM and a 20 GB disk. Their IPs are 192.168.1.16, 192.168.1.17 and 192.168.1.22; .16 runs the NameNode and JobTracker, while .17 and .22 run the DataNodes.
1. Create the pc01 user with a password of six 1s.
2. Enable the root account and set its password, change the hostname and the corresponding hosts file, and disable the firewall.
Enable root:
- sudo passwd root
Change the hostname by running
- sudo gedit /etc/hostname
Edit the corresponding hosts file
- sudo gedit /etc/hosts
- 127.0.0.1 localhost
- 192.168.1.16 UbuntuHM1
- 192.168.1.17 UbuntuHS1
- 192.168.1.22 UbuntuHS2
Run
- hostname
to check that the change took effect.
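The name in /etc/hostname normally only takes effect after a reboot; as a quick sketch (using the host names from the hosts file above), you can apply and test it immediately on the master:
- sudo hostname UbuntuHM1    # apply the new name to the running system
- ping -c 1 UbuntuHS1        # confirm the hosts entries resolve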
Disable the firewall
- sudo ufw disable
3. Download the Linux JDK 1.7.0_03 package jdk-7u3-linux-i586.tar.gz from the official Java site.
4. Unpack the JDK
- sudo tar xvzf jdk-7u3-linux-i586.tar.gz -C /usr/local/lib
Edit /etc/profile and append the Java environment settings:
- sudo gedit /etc/profile
- #set JAVA environment
- export JAVA_HOME=/usr/local/lib/jdk1.7.0_03
- export CLASSPATH=".:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$CLASSPATH"
- export PATH="$JAVA_HOME/bin:$PATH"
Reload the profile and verify the installation:
- source /etc/profile
- java -version
The output should look like
- java version "1.7.0_03"
- Java(TM) SE Runtime Environment (build 1.7.0_03-b04)
- Java HotSpot(TM) Client VM (build 22.1-b02, mixed mode)
Install ssh and rsync:
- sudo apt-get install ssh
- sudo apt-get install rsync
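As a quick sanity check (the service is called ssh on Ubuntu), confirm the SSH daemon is running before setting up key-based login:
- sudo service ssh status    # should report that ssh is running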
8. Unpack Hadoop 1.0.1 by running
- sudo tar xvzf hadoop-1.0.1.tar.gz
9. Open up permissions on the unpacked directory by running
- sudo chmod -R 777 hadoop-1.0.1
10. Following the referenced article, generate a public/private key pair with an empty passphrase on the local machine by running
- ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
- cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
and confirm that you can now log in to the local machine without a password:
- ssh localhost
13. Copy the public key on 192.168.1.16 to the other two machines by running
- scp ~/.ssh/id_dsa.pub pc01@192.168.1.17:/home/pc01/.ssh/16_dsa.pub
- scp ~/.ssh/id_dsa.pub pc01@192.168.1.22:/home/pc01/.ssh/16_dsa.pub
On each of 192.168.1.17 and 192.168.1.22, append the copied key to the authorized keys:
- cat ~/.ssh/16_dsa.pub >> ~/.ssh/authorized_keys
Then, from 192.168.1.16, check that you can log in to both machines without a password:
- ssh 192.168.1.17
- ssh 192.168.1.22
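If ssh still asks for a password, the usual cause is file permissions on the remote side; a quick fix sketch to run on each DataNode (assuming the default ~/.ssh location):
- chmod 700 ~/.ssh                   # sshd ignores keys in a group/world-writable directory
- chmod 600 ~/.ssh/authorized_keys   # the key file must not be writable by others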
Set JAVA_HOME for Hadoop (typically in hadoop-1.0.1/conf/hadoop-env.sh):
- export JAVA_HOME=/usr/local/lib/jdk1.7.0_03
17. On 192.168.1.16, edit the Hadoop configuration files in the conf directory of the Hadoop installation: core-site.xml, hdfs-site.xml and mapred-site.xml, as follows.
core-site.xml
- <?xml version="1.0"?>
- <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
- <configuration>
- <property>
- <name>fs.default.name</name>
- <value>hdfs://192.168.1.16:54310</value>
- </property>
- <property>
- <name>hadoop.tmp.dir</name>
- <value>/home/pc01/hadoop-1.0.1/tmp</value>
- </property>
- </configuration>
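Since hadoop.tmp.dir points to /home/pc01/hadoop-1.0.1/tmp, it does no harm to create that directory up front on every node (a small precaution, not part of the original steps):
- mkdir -p /home/pc01/hadoop-1.0.1/tmp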
hdfs-site.xml
- <?xml version="1.0"?>
- <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
- <configuration>
- <property>
- <name>dfs.replication</name>
- <value>2</value>
- </property>
- </configuration>
mapred-site.xml
- <?xml version="1.0"?>
- <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
- <configuration>
- <property>
- <name>mapred.job.tracker</name>
- <value>hdfs://192.168.1.16:54311</value>
- </property>
- <property>
- <name>mapred.child.java.opts</name>
- <value>-Xmx512m</value>
- </property>
- </configuration>
List the master node in conf/masters and the DataNodes in conf/slaves:
conf/masters
- 192.168.1.16
conf/slaves
- 192.168.1.17
- 192.168.1.22
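Before starting the cluster, a quick loop (just a convenience, not from the original article) to confirm the master can reach every DataNode without a password:
- for h in 192.168.1.17 192.168.1.22; do ssh pc01@$h hostname; done    # should print UbuntuHS1 and UbuntuHS2 with no password prompts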
Copy the configuration directory to the DataNodes:
- scp -rp ~/hadoop-1.0.1/conf pc01@192.168.1.17:/home/pc01/hadoop-1.0.1/
- scp -rp ~/hadoop-1.0.1/conf pc01@192.168.1.22:/home/pc01/hadoop-1.0.1/
In the hadoop-1.0.1/bin directory, format the NameNode and start the cluster:
- ./hadoop namenode -format
- ./start-all.sh
Then check that HDFS responds:
- ./hadoop dfs -ls
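As an optional smoke test (not in the original steps; it assumes the bundled examples jar hadoop-examples-1.0.1.jar sits in the Hadoop root directory), run the wordcount example on a small input:
- ./hadoop dfs -mkdir /user/pc01/input
- ./hadoop dfs -put ../conf/core-site.xml /user/pc01/input     # any small text file will do
- ./hadoop jar ../hadoop-examples-1.0.1.jar wordcount /user/pc01/input /user/pc01/output
- ./hadoop dfs -cat /user/pc01/output/part-*                   # the word counts appear here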
25. Check the HDFS status at http://192.168.1.16:50070 and the map/reduce status at http://192.168.1.16:50030. If errors appear, or the Hadoop cluster does not start, look at the log files under $HADOOP_HOME/logs/.
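Another quick way to see whether all the daemons came up is the JDK's jps tool (a generic check, not from the article):
- jps    # on 192.168.1.16 expect NameNode, JobTracker (and normally SecondaryNameNode); on .17 and .22 expect DataNode and TaskTracker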
26. To stop Hadoop, run
- ./stop-all.sh
To re-format HDFS later, first delete the contents of hadoop.tmp.dir on every node (otherwise the DataNodes may refuse to start because of a namespaceID mismatch), then run
- ./hadoop namenode -format
If jobs fail with HDFS permission errors, one option is to disable permission checking by adding the following property to hdfs-site.xml:
- <property>
- <name>dfs.permissions</name>
- <value>false</value>
- </property>
Alternatively, open up the permissions on the HDFS user directory:
- ./hadoop dfs -chmod 777 /user/pc01
or, equivalently,
- ./hadoop fs -chmod 777 /user/pc01