Preparation
1. Operating system: Linux
2. Two machines:
192.168.6.106 (master)
192.168.6.151 (slave)
3. Passwordless ssh access between the two machines
1) Generate a DSA key pair with ssh-keygen -t dsa:
# ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
2) Copy the public key of this pair to the machine you want to access and save it there as ~/.ssh/authorized_keys:
[root@freepp ~]# scp -P 22 ~/.ssh/id_dsa.pub root@192.168.6.151:~/.ssh/authorized_keys
Address 192.168.6.151 maps to bogon, but this does not map back to the address - POSSIBLE BREAK-IN ATTEMPT!
root@192.168.6.151's password:
authorized_keys 100% 607 0.6KB/s 00:00
This step still prompts for the remote machine's login password.
[root@freepp ~]# ssh 192.168.6.151
Address 192.168.6.151 maps to bogon, but this does not map back to the address - POSSIBLE BREAK-IN ATTEMPT!
Last login: Tue Jan 17 01:07:53 2012 from 192.168.6.106
This time no login password is required.
3) Also enable passwordless ssh to localhost:
# cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
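If sshd still prompts for a password after this, overly loose permissions on ~/.ssh are the usual cause. Tightening them is a common extra step (not part of the original write-up):
# chmod 700 ~/.ssh
# chmod 600 ~/.ssh/authorized_keys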
4. Download JDK 1.6.0_21.
5. Download hadoop-0.21.0.
Download URL: http://apache.etoak.com//hadoop/common/hadoop-0.21.0/
Installation
Java
Running Hadoop requires Java 6 or later.
1. Install
# chmod u+x jdk-6u21-linux-i586-rpm.bin
# ./jdk-6u21-linux-i586-rpm.bin
The download is a self-extracting installer that unpacks and installs the bundled JDK RPM. Java's default installation path is /usr/java.
2. Configure the Java environment variables
Edit /etc/profile, append the following to the end, and save:
export JAVA_HOME=/usr/java/jdk1.6.0_21
export JRE_HOME=/usr/java/jdk1.6.0_21/jre
export PATH=$PATH:$JAVA_HOME/bin
export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
Then run
[root@freepp ~]# source /etc/profile
to apply the changes.
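To confirm the variable took effect in the current shell, a quick check (not in the original) is:
# echo $JAVA_HOME
which should print /usr/java/jdk1.6.0_21.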
3. Verify the installation
[root@freepp ~]# java -version
java version "1.6.0_21"
Java(TM) SE Runtime Environment (build 1.6.0_21-b06)
Java HotSpot(TM) Client VM (build 17.0-b16, mixed mode, sharing)
[root@freepp ~]# javac
Usage: javac <options> <source files>
where possible options include:
-g Generate all debugging info
-g:none Generate no debugging info
-g:{lines,vars,source} Generate only some debugging info
-nowarn Generate no warnings
-verbose Output messages about what the compiler is doing
-deprecation Output source locations where deprecated APIs are used
-classpath <path> Specify where to find user class files and annotation processors
-cp <path> Specify where to find user class files and annotation processors
-sourcepath <path> Specify where to find input source files
-bootclasspath <path> Override location of bootstrap class files
-extdirs <dirs> Override location of installed extensions
-endorseddirs <dirs> Override location of endorsed standards path
-proc:{none,only} Control whether annotation processing and/or compilation is done.
-processor <class1>[,<class2>,<class3>...] Names of the annotation processors to run; bypasses default discovery process
-processorpath <path> Specify where to find annotation processors
-d <directory> Specify where to place generated class files
-s <directory> Specify where to place generated source files
-implicit:{none,class} Specify whether or not to generate class files for implicitly referenced files
-encoding <encoding> Specify character encoding used by source files
-source <release> Provide source compatibility with specified release
-target <release> Generate class files for specific VM version
-version Version information
-help Print a synopsis of standard options
-Akey[=value] Options to pass to annotation processors
-X Print a synopsis of nonstandard options
-J<flag> Pass <flag> directly to the runtime system
If you see the output above, Java is installed and the environment variables are configured correctly.
Hadoop
We install Hadoop 0.21.0 here.
1. Upload
Upload the downloaded archive hadoop-0.21.0.tar.gz to the /usr directory with an ssh tool such as scp.
2. Install
[root@freepp ~]# cd /usr
[root@freepp ~]# tar -zvxf hadoop-0.21.0.tar.gz
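The archive unpacks into /usr/hadoop-0.21.0. A quick sanity check (not part of the original steps) is to confirm the bin and conf directories are present:
# ls /usr/hadoop-0.21.0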
Configuration
master
1. Configure the Hadoop environment variables
Edit /etc/profile, append the following to the end, and save:
export HADOOP_HOME=/usr/hadoop-0.21.0
export PATH=$PATH:$HADOOP_HOME/bin
Then run
[root@freepp ~]# source /etc/profile
to apply the changes.
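With the new PATH in place, the hadoop command should resolve from any directory; a quick check (not in the original) is:
# hadoop version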
2. Set JAVA_HOME in the $HADOOP_HOME/conf/hadoop-env.sh file
# The java implementation to use. Required.
export JAVA_HOME=/usr/java/jdk1.6.0_21
3. Edit the $HADOOP_HOME/conf/masters and slaves files
# vi masters
192.168.6.106
# vi slaves
192.168.6.151
4. Edit the three files $HADOOP_HOME/conf/core-site.xml, hdfs-site.xml, and mapred-site.xml
# vi core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/hadoopdata</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://192.168.6.106:9000</value>
    <description>The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation. The uri's scheme determines the config property (fs.SCHEME.impl) naming the FileSystem implementation class. The uri's authority is used to determine the host, port, etc. for a filesystem.</description>
  </property>
</configuration>
# vi hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
# vi mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>192.168.6.106:9001</value>
    <description>The host and port that the MapReduce job tracker runs at. If "local", then jobs are run in-process as a single map and reduce task.</description>
  </property>
</configuration>
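Two notes on the values above: fs.default.name (port 9000) is the HDFS endpoint clients connect to, and mapred.job.tracker (port 9001) is where the JobTracker listens. Since hadoop.tmp.dir points at /hadoopdata, creating that directory up front on the master avoids permission surprises when HDFS is formatted (an extra step, not shown in the original):
# mkdir -p /hadoopdata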
5. Edit the /etc/hosts file
# vi /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1 localhost
192.168.6.106 m106
192.168.6.151 s151
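A quick check (not in the original) that the new hostnames resolve:
# ping -c 1 s151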
slaves
1. Copy the master's profile and hosts files to the slave
Copy /etc/profile and /etc/hosts from the master machine to the slave.
Run the following commands:
# scp -P 22 /etc/profile root@192.168.6.151:/etc/profile
# scp -P 22 /etc/hosts root@192.168.6.151:/etc/hosts
Note that the profile must be applied on the slave as well (run source /etc/profile there).
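The write-up assumes Java and Hadoop are also installed on the slave. If the slave does not have the Hadoop tree yet, one way to get it there is to copy the unpacked directory from the master (an extra step, not shown in the original):
# scp -r /usr/hadoop-0.21.0 root@192.168.6.151:/usr/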
Running
1. Start/stop Hadoop
Start Hadoop via the shell script:
#sh /usr/hadoop-0.21.0/bin/start-all.sh
Stop Hadoop via the shell script:
#sh /usr/hadoop-0.21.0/bin/stop-all.sh
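After start-all.sh, the running Java daemons can be listed with jps (a common verification step, not in the original). With the configuration above, the master should show NameNode, SecondaryNameNode, and JobTracker, and the slave should show DataNode and TaskTracker:
# jps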
2. Format a new HDFS filesystem
Note that on a fresh installation the namenode must be formatted before HDFS is started for the first time, so run this step before the first start-all.sh.
[root@freepp bin]# cd /usr/hadoop-0.21.0/bin
[root@freepp bin]# hadoop namenode -format
[root@freepp bin]# cd /usr/hadoop-0.21.0/bin
[root@freepp bin]# hadoop fs -mkdir test
12/01/17 11:48:35 INFO security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000
12/01/17 11:48:35 WARN conf.Configuration: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
[root@freepp bin]# hadoop fs -ls /
12/01/17 11:48:44 INFO security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000
12/01/17 11:48:44 WARN conf.Configuration: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
Found 3 items
drwxr-xr-x - root supergroup 0 2012-01-16 04:09 /hadoopdata
drwxr-xr-x - root supergroup 0 2012-01-16 04:09 /jobtracker
drwxr-xr-x - root supergroup 0 2012-01-17 11:48 /user
If the console returns a listing like the one above, the filesystem was initialized successfully, and data can now be loaded into it.
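As a final smoke test, the bundled WordCount example can be run on a small input. This is a sketch under one assumption: the examples jar name below matches the hadoop-0.21.0 tarball layout, so verify the exact file name under /usr/hadoop-0.21.0 before running.
# hadoop fs -mkdir input
# hadoop fs -put /etc/hosts input/hosts
# hadoop jar /usr/hadoop-0.21.0/hadoop-mapred-examples-0.21.0.jar wordcount input output
# hadoop fs -cat output/part-r-00000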