1 Setting Up a Linux Virtual Environment
1.1 Install VMware
After installation you should see two virtual NICs, vmnet1 and vmnet8.
1.2 Install the Linux virtual machine
After installation, verify that the VM can reach the Internet.
1.3 Configure the Linux virtual machine
1.3.1 Log in as root
First set the root password with sudo passwd root,
then enable local administrator login under the system's "Login Window" settings.
Ubuntu 9.10 does not need the login-window change; the version I am using now works without it.
1.3.2 Enable the ssh service
To accept SSH connections, the machine needs openssh-server:
sudo apt-get install openssh-server
Then confirm that the SSH server is running:
If you see an sshd process, the SSH server is already running.
If not, start it with: sudo /etc/init.d/ssh start
1.3.3 Enable the ftp service
sudo apt-get install vsftpd
Edit the /etc/vsftpd.conf configuration file:
write_enable=YES # allow uploads
anon_upload_enable=YES # allow anonymous uploads
anon_mkdir_write_enable=YES # allow anonymous users to create directories
local_umask=022 # permissions of uploaded files: 777 masked by 022 gives 755
Restart the service:
sudo /etc/init.d/vsftpd restart
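The local_umask note above is worth unpacking: a umask does not literally subtract, it clears permission bits, although for 022 the result happens to equal 777-022. A small sketch of the bit arithmetic (the class name is made up for illustration):

```java
public class UmaskDemo {
    // umask clears bits: effective mode = requested mode & ~umask (all octal)
    static int apply(int mode, int umask) {
        return mode & ~umask;
    }

    public static void main(String[] args) {
        System.out.printf("%o%n", apply(0777, 022)); // 755, as in the vsftpd comment
        System.out.printf("%o%n", apply(0666, 033)); // 644 -- subtraction would wrongly give 633
    }
}
```

The second case shows why the bitwise form is the one that always holds.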
1.3.4 Enable the telnet service
1) sudo apt-get install xinetd telnetd
2) sudo vi /etc/inetd.conf and add the following line:
telnet stream tcp nowait telnetd /usr/sbin/tcpd /usr/sbin/in.telnetd
3) sudo vi /etc/xinetd.conf and add the following content:
#Simple configuration file for xinetd
#
#Some defaults, and include /etc/xinetd.d/
defaults
{
#Please note that you need a log_type line to be able to use log_on_success
#and log_on_failure. The default is the following :
# log_type = SYSLOG daemon info
instances = 60
log_type = SYSLOG authpriv
log_on_success = HOST PID
log_on_failure = HOST
cps = 25 30
}
includedir /etc/xinetd.d
4) sudo vi /etc/xinetd.d/telnet and add the following content:
#default: on
#description: The telnet server serves telnet sessions; it uses \
#unencrypted username/password pairs for authentication.
service telnet
{
disable = no
flags = REUSE
socket_type = stream
wait = no
user = root
server = /usr/sbin/in.telnetd
log_on_failure += USERID
}
5) Reboot the machine or restart the service: sudo /etc/init.d/xinetd restart
6) Log in as root
mv /etc/securetty /etc/securetty.bak, after which root can log in. Alternatively,
edit /etc/pam.d/login and comment out the line below:
#auth required lib/security/pam_securetty.so
2 Installing and Configuring Hadoop on Linux
2.1 Install JDK 1.6
2.1.1 Check the current JDK version
root@ubuntu:~# java -version
2.1.2 Download and install the JDK
Option 1: install only the JRE (no development tools; the method below is recommended instead):
sudo add-apt-repository "deb http://us.archive.ubuntu.com/ubuntu/ hardy multiverse"
sudo apt-get update
sudo apt-get install sun-java6-jdk
Note: this installation must be run from the machine's graphical terminal, not from a remote login tool.
Option 2: install the JDK with the compiler:
1. Download the .tar.gz package
http://www.oracle.com/technetwork/java/javase/downloads/index.html
2. Install
sudo tar zxvf ./jdk-7-linux-i586.tar.gz -C /usr/lib/jvm
cd /usr/lib/jvm
sudo mv jdk1.7.0/ java-7-sun
vi /etc/profile
export JAVA_HOME=/usr/lib/jvm/java-7-sun
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH
2.1.3 Configure environment variables
vi /etc/profile
#set java env
export JAVA_HOME="/usr/lib/jvm/java-7-sun"
export JRE_HOME="$JAVA_HOME/jre"
export CLASSPATH=".:$JAVA_HOME/lib:$JAVA_HOME/jre/lib:$CLASSPATH"
export PATH="$JAVA_HOME/bin:$PATH"
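To double-check which JVM the new variables actually resolve to, a tiny probe class can help (the class name is made up; any installed JDK will do):

```java
public class JdkCheck {
    public static void main(String[] args) {
        // These properties come from the JVM that `java` on the PATH launched,
        // so they reveal whether JAVA_HOME/PATH point where you expect.
        System.out.println("version: " + System.getProperty("java.version"));
        System.out.println("home:    " + System.getProperty("java.home"));
    }
}
```

Compile with javac JdkCheck.java and run java JdkCheck; the printed home should sit under the directory you set in JAVA_HOME.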
2.2 Configure passwordless SSH login
2.2.1 Check the current SSH version
ssh -V
2.2.2 Install ssh
sudo apt-get install ssh
2.2.3 Configure passwordless login to localhost
ls -a ~
mkdir ~/.ssh
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
ls ~/.ssh
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
2.2.4 Verify the SSH installation
ssh -V
ssh localhost
2.3 Create the hadoop user
2.3.1 Create the user
root@ubuntu:~# useradd -m -d /home/hadoop hadoop
root@ubuntu:~# passwd hadoop
Fix-up:
1. vi /etc/passwd
hadoop:x:1001:1001::/home/hadoop:/bin/bash
Change the login shell to bash, as above.
2.3.2 Configure environment variables
export HADOOP_HOME="/home/hadoop/hadoop"
export HADOOP_VERSION="0.20.2"
2.3.3 Convenience settings
- vi ~/.bashrc and add:
alias p='ps -fu hadoop'
Then apply it:
source ~/.bashrc
2.4 Install pseudo-distributed Hadoop
2.4.1 Install Hadoop
Download the package: http://labs.renren.com/apache-mirror/hadoop/core/
Upload hadoop-0.20.2.tar.gz to the hadoop user.
$ tar -xf hadoop-0.20.2.tar.gz
$ mv hadoop-0.20.2 hadoop
2.4.2 Configure Hadoop
- $ vi hadoop-env.sh
export JAVA_HOME="/usr/lib/jvm/java-6-sun"
- core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/data/hadoop-${user.name}</value>
<description>A base for other temporary directories.</description>
</property>
</configuration>
- hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
- mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:9001</value>
</property>
</configuration>
2.4.3 Format HDFS
$ ./hadoop namenode -format
12/03/21 19:07:45 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = ubuntu/127.0.1.1
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 0.20.2
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
12/03/21 19:07:46 INFO namenode.FSNamesystem: fsOwner=hadoop,hadoop
12/03/21 19:07:46 INFO namenode.FSNamesystem: supergroup=supergroup
12/03/21 19:07:46 INFO namenode.FSNamesystem: isPermissionEnabled=true
12/03/21 19:07:47 INFO common.Storage: Image file of size 96 saved in 0 seconds.
12/03/21 19:07:47 INFO common.Storage: Storage directory /tmp/hadoop-hadoop/dfs/name has been successfully formatted.
12/03/21 19:07:47 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at ubuntu/127.0.1.1
************************************************************/
2.4.4 Start HDFS and MapReduce
./start-all.sh
2.4.5 Verify the installation
http://192.168.190.129:50070 -- HDFS monitoring page
http://192.168.190.129:50030 -- MapReduce monitoring page
2.5 Install a Hadoop cluster
2.5.1 Cluster plan
192.168.190.132 - master: NameNode, JobTracker (hostname master)
192.168.190.133 - slave: DataNode, TaskTracker (hostname slave1)
192.168.190.134 - slave: DataNode, TaskTracker (hostname slave2)
2.5.2 Edit the common configuration files
- Back up the pseudo-distributed configuration files to a localhost directory:
$ mkdir localhost
$ cp hadoop-env.sh hdfs-site.xml core-site.xml mapred-site.xml slaves masters ./localhost
- Create the cluster configuration files:
$ mkdir claster
$ cp hadoop-env.sh hdfs-site.xml core-site.xml mapred-site.xml slaves masters ./claster
$ ls claster
core-site.xml hadoop-env.sh hdfs-site.xml mapred-site.xml masters slaves
- hadoop-env.sh
Same as in pseudo-distributed mode; no changes needed.
- core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://master:9000</value>
</property>
</configuration>
- hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
</configuration>
- mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>master:9001</value>
</property>
</configuration>
- masters
List the backup master node here; this setup has no backup master, so nothing needs to be added.
- slaves
slave1
slave2
- /etc/hosts
192.168.190.132 master # Added by NetworkManager
192.168.190.133 slave1
192.168.190.134 slave2
127.0.0.1 localhost.localdomain localhost
::1 ubuntu localhost6.localdomain6 localhost6
127.0.0.1 master
Note: if the DataNodes later fail to connect to the NameNode, remove the 127.0.0.1 master line; mapping the master hostname to loopback can make the NameNode listen only on 127.0.0.1.
2.5.3 Clone the nodes
Clone slave1 and slave2 from the master node.
2.5.4 Edit the per-host configuration files
- /etc/hostname
Set the hostname to master, slave1, and slave2 respectively.
Reboot the hosts.
2.5.5 Configure passwordless SSH login
The goal of this step is to let the master host log in to the slave hosts without a password, so it is enough to copy the master's ~/.ssh/authorized_keys file to the same directory on each slave. Because the slaves here were cloned from the master, the file is already in place and no copying is needed.
Verify that it works:
On the master node, run:
ssh slave1
ssh slave2
2.5.6 Activate the configuration files
$ cd hadoop/conf
$ cp ./claster/* .
$ ls -lrt
total 64
-rw-r--r-- 1 hadoop hadoop 1195 2010-02-18 23:55 ssl-server.xml.example
-rw-r--r-- 1 hadoop hadoop 1243 2010-02-18 23:55 ssl-client.xml.example
-rw-r--r-- 1 hadoop hadoop 2815 2010-02-18 23:55 log4j.properties
-rw-r--r-- 1 hadoop hadoop 4190 2010-02-18 23:55 hadoop-policy.xml
-rw-r--r-- 1 hadoop hadoop 1245 2010-02-18 23:55 hadoop-metrics.properties
-rw-r--r-- 1 hadoop hadoop  535 2010-02-18 23:55 configuration.xsl
-rw-r--r-- 1 hadoop hadoop 3936 2010-02-18 23:55 capacity-scheduler.xml
drwxr-xr-x 2 hadoop hadoop 4096 2012-03-26 00:57 localhost
drwxr-xr-x 2 hadoop hadoop 4096 2012-03-26 01:00 claster
-rw-r--r-- 1 hadoop hadoop  269 2012-03-26 02:07 core-site.xml
-rw-r--r-- 1 hadoop hadoop 2280 2012-03-26 02:07 hadoop-env.sh
-rw-r--r-- 1 hadoop hadoop  251 2012-03-26 02:07 hdfs-site.xml
-rw-r--r-- 1 hadoop hadoop  264 2012-03-26 02:07 mapred-site.xml
-rw-r--r-- 1 hadoop hadoop   14 2012-03-26 02:07 slaves
-rw-r--r-- 1 hadoop hadoop   10 2012-03-26 02:07 masters
2.5.7 Format HDFS
$ hadoop namenode -format
12/03/26 02:09:31 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = master/127.0.1.1
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 0.20.2
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
Re-format filesystem in /tmp/hadoop-hadoop/dfs/name ? (Y or N) Y
12/03/26 02:09:39 INFO namenode.FSNamesystem: fsOwner=hadoop,hadoop
12/03/26 02:09:39 INFO namenode.FSNamesystem: supergroup=supergroup
12/03/26 02:09:39 INFO namenode.FSNamesystem: isPermissionEnabled=true
12/03/26 02:09:40 INFO common.Storage: Image file of size 96 saved in 0 seconds.
12/03/26 02:09:40 INFO common.Storage: Storage directory /tmp/hadoop-hadoop/dfs/name has been successfully formatted.
12/03/26 02:09:40 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at master/127.0.1.1
************************************************************/
2.5.8 Start the Hadoop cluster
$ start-all.sh
starting namenode, logging to /home/hadoop/hadoop/bin/../logs/hadoop-hadoop-namenode-master.out
slave1: starting datanode, logging to /home/hadoop/hadoop/bin/../logs/hadoop-hadoop-datanode-slave1.out
slave2: starting datanode, logging to /home/hadoop/hadoop/bin/../logs/hadoop-hadoop-datanode-slave2.out
localhost: starting secondarynamenode, logging to /home/hadoop/hadoop/bin/../logs/hadoop-hadoop-secondarynamenode-master.out
starting jobtracker, logging to /home/hadoop/hadoop/bin/../logs/hadoop-hadoop-jobtracker-master.out
slave1: starting tasktracker, logging to /home/hadoop/hadoop/bin/../logs/hadoop-hadoop-tasktracker-slave1.out
slave2: starting tasktracker, logging to /home/hadoop/hadoop/bin/../logs/hadoop-hadoop-tasktracker-slave2.out
2.5.9 Check the startup result
Run ps -fu hadoop to check that the processes exist;
open the monitoring URLs to check that the status is correct.
3 A First Taste of MapReduce
3.1 Try out the HDFS file system
$ ./hadoop fs -mkdir heyi
$ ./hadoop fs -ls
Found 1 items
drwxr-xr-x - hadoop supergroup 0 2012-03-21 20:13 /user/hadoop/heyi
$ ./hadoop fs -put testf.txt heyi/
$ ./hadoop fs -ls heyi
Found 1 items
-rw-r--r-- 1 hadoop supergroup 54 2012-03-21 20:17 /user/hadoop/heyi/testf.txt
$ ./hadoop fs -cat heyi/testf.txt
aaaaaaaaaaaaaaaaaa
bbbbbbbbbbbbbbbb
bbbccccccccccccc
3.2 The first hello world program
- Create the source code file in the home directory:
WordCount.java
package org.myorg;

import java.io.IOException;
import java.util.*;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;

public class WordCount {

    public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                output.collect(word, one);
            }
        }
    }

    public static class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("wordcount");
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        conf.setMapperClass(Map.class);
        conf.setCombinerClass(Reduce.class);
        conf.setReducerClass(Reduce.class);
        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
    }
}
$ ls|grep -v sun|grep -v jdk
Desktop
Documents
Downloads
examples.desktop
hadoop
hadoop-0.20.2.tar.gz
Music
Pictures
Public
Templates
Videos
wordcount_classes
wordcount.jar
WordCount.java
$
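Before submitting the job, the map/reduce logic above can be checked in plain Java. The sketch below (a hypothetical helper class, not part of the tutorial's jar) tokenizes lines the same way the mapper does and sums the per-word ones the way the reducer does:

```java
import java.util.*;

public class WordCountSketch {
    // In-memory analogue of the WordCount job: tokenize each line,
    // emit (word, 1), then sum the counts per word.
    static Map<String, Integer> count(List<String> lines) {
        Map<String, Integer> sums = new TreeMap<>();  // sorted, like the reducer output
        for (String line : lines) {
            StringTokenizer tokenizer = new StringTokenizer(line); // same splitter as the mapper
            while (tokenizer.hasMoreTokens()) {
                sums.merge(tokenizer.nextToken(), 1, Integer::sum); // the reducer's sum
            }
        }
        return sums;
    }

    public static void main(String[] args) {
        // the two sample input lines created for file01/file02 below
        System.out.println(count(Arrays.asList(
                "Hello word bye word", "Hello hadoop bye hadoop")));
        // prints {Hello=2, bye=2, hadoop=2, word=2}
    }
}
```

The printed counts match the part-00000 output shown at the end of this section.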
- Compile
javac -classpath ${HADOOP_HOME}/hadoop-${HADOOP_VERSION}-core.jar -d wordcount_classes WordCount.java
- Package
$ jar -cvf wordcount.jar -C wordcount_classes/ .
added manifest
adding: WordCount$Map.class (in = 1938) (out = 802) (deflated 58%)
adding: WordCount.class (in = 1546) (out = 750) (deflated 51%)
adding: WordCount$Reduce.class (in = 1611) (out = 648) (deflated 59%)
- Create the input directory on HDFS
$ ./hadoop fs -mkdir input
$ ./hadoop fs -ls
Found 1 items
drwxr-xr-x - hadoop supergroup 0 2012-03-22 00:53 /user/hadoop/input
$ echo"Hello word bye word" > file01
$ echo"Hello hadoop bye hadoop" > file02
$ ./hadoop fs -put file01 input/
$ ./hadoop fs -put file02 input/
$ ./hadoop fs -ls input
Found 2 items
-rw-r--r-- 1 hadoop supergroup 20 2012-03-22 00:55 /user/hadoop/input/file01
-rw-r--r-- 1 hadoop supergroup 24 2012-03-22 00:55 /user/hadoop/input/file02
- Create the output directory on HDFS
$ ./hadoop fs -mkdir output
$ ./hadoop fs -ls
Found 2 items
drwxr-xr-x - hadoop supergroup 0 2012-03-22 00:55 /user/hadoop/input
drwxr-xr-x - hadoop supergroup 0 2012-03-22 00:56 /user/hadoop/output
- Run the program
$ ./hadoop/bin/hadoop jar ./wordcount.jar org.myorg.WordCount input output/word_count
12/03/22 01:34:07 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
12/03/22 01:34:07 INFO mapred.FileInputFormat: Total input paths to process : 2
12/03/22 01:34:09 INFO mapred.JobClient: Running job: job_201203220026_0002
12/03/22 01:34:10 INFO mapred.JobClient: map 0% reduce 0%
12/03/22 01:34:56 INFO mapred.JobClient: map 100% reduce 0%
12/03/22 01:35:24 INFO mapred.JobClient: map 100% reduce 100%
12/03/22 01:35:29 INFO mapred.JobClient: Job complete: job_201203220026_0002
12/03/22 01:35:29 INFO mapred.JobClient: Counters: 18
12/03/22 01:35:29 INFO mapred.JobClient: Job Counters
12/03/22 01:35:29 INFO mapred.JobClient: Launched reduce tasks=1
12/03/22 01:35:29 INFO mapred.JobClient: Launched map tasks=2
12/03/22 01:35:29 INFO mapred.JobClient: Data-local map tasks=2
12/03/22 01:35:29 INFO mapred.JobClient: FileSystemCounters
12/03/22 01:35:29 INFO mapred.JobClient: FILE_BYTES_READ=74
12/03/22 01:35:29 INFO mapred.JobClient: HDFS_BYTES_READ=44
12/03/22 01:35:29 INFO mapred.JobClient: FILE_BYTES_WRITTEN=218
12/03/22 01:35:29 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=30
12/03/22 01:35:29 INFO mapred.JobClient: Map-Reduce Framework
12/03/22 01:35:29 INFO mapred.JobClient: Reduce input groups=4
12/03/22 01:35:29 INFO mapred.JobClient: Combine output records=6
12/03/22 01:35:29 INFO mapred.JobClient: Map input records=2
12/03/22 01:35:29 INFO mapred.JobClient: Reduce shuffle bytes=80
12/03/22 01:35:29 INFO mapred.JobClient: Reduce output records=4
12/03/22 01:35:29 INFO mapred.JobClient: Spilled Records=12
12/03/22 01:35:29 INFO mapred.JobClient: Map output bytes=76
12/03/22 01:35:29 INFO mapred.JobClient: Map input bytes=44
12/03/22 01:35:29 INFO mapred.JobClient: Combine input records=8
12/03/22 01:35:29 INFO mapred.JobClient: Map output records=8
12/03/22 01:35:29 INFO mapred.JobClient: Reduce input records=6
- Check the result
$ ./hadoop/bin/hadoop fs -cat output/word_count/part-00000
Hello 2
bye 2
hadoop 2
word 2
3.3 A single-table self-join example
- Code
package org.joinorg;

import java.io.IOException;
import java.util.*;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
//import org.apache.commons.cli.Options;
public class STjoin {
    public static int times = 0;

    public static class Map extends Mapper<Object, Text, Text, Text> {
        public void map(Object key1, Text value1, Context context) throws IOException, InterruptedException {
            String childname;
            String parentname;
            String relationtype;
            String line = value1.toString();
            int i = 0;
            // find the space separating the child column from the parent column
            while (line.charAt(i) != ' ') {
                i++;
            }
            String[] values = {line.substring(0, i), line.substring(i + 1)};
            if (values[0].compareTo("child") != 0) {  // skip the header row
                childname = values[0];
                parentname = values[1];
                // emit each record twice: keyed by parent (type 1) and by child (type 2)
                relationtype = "1";
                context.write(new Text(values[1]), new Text(relationtype + "+" + childname + "+" + parentname));
                relationtype = "2";
                context.write(new Text(values[0]), new Text(relationtype + "+" + childname + "+" + parentname));
            }
        }
    }

    public static class Reduce extends Reducer<Text, Text, Text, Text> {
        public void reduce(Text key2, Iterable<Text> value2, Context context) throws IOException, InterruptedException {
            if (times == 0) {  // write the header row once
                context.write(new Text("grandchild"), new Text("grandparent"));
                times++;
            }
            int grandchildnum = 0;
            String grandchild[] = new String[10];
            int grandparentnum = 0;
            String grandparent[] = new String[10];
            Iterator<Text> ite = value2.iterator();
            while (ite.hasNext()) {
                String record = ite.next().toString();
                int len = record.length();
                int i = 2;
                if (len == 0) continue;
                char relationtype = record.charAt(0);
                String childname = "";
                String parentname = "";
                // parse "type+child+parent"
                while (record.charAt(i) != '+') {
                    childname = childname + record.charAt(i);
                    i++;
                }
                i += 1;
                while (i < len) {
                    parentname += record.charAt(i);
                    i++;
                }
                if (relationtype == '1') {
                    grandchild[grandchildnum] = childname;
                    grandchildnum++;
                } else {
                    grandparent[grandparentnum] = parentname;
                    grandparentnum++;
                }
            }
            // cross product: pair every grandchild with every grandparent
            if (grandparentnum != 0 && grandchildnum != 0) {
                for (int m = 0; m < grandchildnum; m++) {
                    for (int n = 0; n < grandparentnum; n++) {
                        context.write(new Text(grandchild[m]), new Text(grandparent[n]));
                    }
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: wordcount <in> <out>");
            System.exit(2);
        }
        Job job = new Job(conf, "single table join");
        job.setJarByClass(STjoin.class);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
- Compile and package
javac -classpath ${HADOOP_HOME}/hadoop-${HADOOP_VERSION}-core.jar:${HADOOP_HOME}/lib/commons-cli-1.2.jar -d STjoin_class STjoin.java
$ jar -cvf STjoin_class.jar -CSTjoin_class/ .
- Run
$ hadoop jar ./STjoin_class.jarorg.joinorg.STjoin inputjoin output/STjoin
$ hadoop fs -cat output/STjoin/*
grandchild grandparent
aaa ccc
bbb ddd
ccc eee
ddd fff
eee ggg
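The join above can also be traced in memory. The sketch below (hypothetical class and sample data, since the tutorial's inputjoin file is not shown) mirrors the mapper's two emissions, indexing each record once by parent and once by child, and then the reducer's cross product of grandchildren and grandparents:

```java
import java.util.*;

public class SelfJoinSketch {
    // childParent: list of {child, parent} pairs; returns {grandchild, grandparent} pairs
    static List<String[]> grandPairs(List<String[]> childParent) {
        Map<String, List<String>> children = new HashMap<>(); // parent -> its children (type 1)
        Map<String, List<String>> parents  = new HashMap<>(); // child  -> its parents  (type 2)
        for (String[] cp : childParent) {
            children.computeIfAbsent(cp[1], k -> new ArrayList<>()).add(cp[0]);
            parents.computeIfAbsent(cp[0], k -> new ArrayList<>()).add(cp[1]);
        }
        List<String[]> out = new ArrayList<>();
        // a person who appears as both a parent and a child plays the join key
        for (String person : children.keySet()) {
            if (!parents.containsKey(person)) continue;
            for (String gc : children.get(person))
                for (String gp : parents.get(person))
                    out.add(new String[]{gc, gp});  // the reducer's cross product
        }
        return out;
    }

    public static void main(String[] args) {
        // hypothetical data: aaa's parent is ccc, ccc's parent is eee
        List<String[]> rows = Arrays.asList(
                new String[]{"aaa", "ccc"}, new String[]{"ccc", "eee"});
        for (String[] p : grandPairs(rows))
            System.out.println(p[0] + "\t" + p[1]);
    }
}
```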
4 Installing Hive on Linux
Popular Hive blogs: http://www.iteye.com/blogs/tag/hive
4.1 Download and unpack
wget http://mirror.bjtu.edu.cn/apache/hive/hive-0.7.1/hive-0.7.1.tar.gz
hadoop@master:~/hadoop$ tar -xzf hive-0.7.1.tar.gz
hadoop@master:~/hadoop$ cd hive-0.7.1
hadoop@master:~/hadoop/hive-0.7.1$ ls
bin conf docs examples lib LICENSE NOTICE README.txt RELEASE_NOTES.txt scripts src
hadoop@master:~/hadoop$ mv hive-0.7.1 hive
4.2 Set the Hive environment variables
hadoop@master:~/hadoop/hive/bin$ vi hive-config.sh
Add:
export HADOOP=/home/hadoop/hadoop
export HIVE_HOME=/home/hadoop/hadoop/hive
vi ~/.profile
export HADOOP_HOME="/home/hadoop/hadoop"
export HIVE_HOME="/home/hadoop/hadoop/hive" # added
export HADOOP_VERSION="0.20.2"
export PATH="$HADOOP_HOME/bin:$HIVE_HOME/bin:$PATH" # modified
4.3 Verify the installation
hadoop@master:~$ hive
Hive history file=/tmp/hadoop/hive_job_log_hadoop_201203272119_2011178563.txt
hive> create table tt(id int, name string) row format delimited fields terminated by ',' collection items terminated by "\n" stored as textfile;
OK
Time taken: 36.15 seconds
hive> select * from tt;
OK
Time taken: 1.503 seconds
hive>
Output like the above means Hive is installed successfully.
4.4 Configure hive-site.xml
<property>
<name>hive.metastore.warehouse.dir</name> <!-- data directory on HDFS -->
<value>/user/hive/warehouse</value>
<description>location of default database for thewarehouse</description>
</property>
<property>
<name>hive.exec.scratchdir</name> <!-- temporary directory on HDFS -->
<value>/tmp/hive-${user.name}</value>
<description>Scratch space for Hive jobs</description>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:derby:;databaseName=metastore_db;create=true</value>
<!--<value>jdbc:derby://192.168.190.132:4567/hadoopor;create=true</value>-->
<description>JDBC connect string for a JDBCmetastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>org.apache.derby.jdbc.EmbeddedDriver</value>
<!--<value>org.apache.derby.jdbc.ClientDriver</value>-->
<description>Driver class name for a JDBCmetastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name> <!-- DB connection user name -->
<value>APP</value>
<description>username to use against metastoredatabase</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name> <!-- DB connection password -->
<value>mine</value>
<description>password to use against metastoredatabase</description>
</property>
4.5 Start Hive
5 A First Taste of Hive
Apache Hive help and reference material is available at:
https://cwiki.apache.org/confluence/display/Hive/Home
5.1 Create an internal (managed) table
hadoop@master:~$ hive
Hive history file=/tmp/hadoop/hive_job_log_hadoop_201203272213_1746810339.txt
hive> create table hive_table1(name string, age int) row format delimited fields terminated by ',' stored as textfile;
OK
Time taken: 0.402 seconds
hive>
hadoop@master:~/hadoop/conf$ hadoop fs -ls/user/hive/warehouse
Found 2 items
drwxr-xr-x - hadoop supergroup 0 2012-03-27 22:25 /user/hive/warehouse/hive_table1
drwxr-xr-x - hadoop supergroup 0 2012-03-27 21:21 /user/hive/warehouse/tt
hadoop@master:~/hadoop/conf$
5.2 Load data into the table
hive>
> load data LOCAL inpath '/home/hadoop/hadoop/hive/hive_table1.dat' into table hive_table1;
Copying data from file:/home/hadoop/hadoop/hive/hive_table1.dat
Copying file: file:/home/hadoop/hadoop/hive/hive_table1.dat
Loading data to table default.hive_table1
OK
Time taken: 2.129 seconds
hive>
hadoop@master:~/hadoop/conf$ hadoop fs -ls/user/hive/warehouse/hive_table1
Found 1 items
-rw-r--r-- 1 hadoop supergroup 49 2012-03-27 22:40 /user/hive/warehouse/hive_table1/hive_table1.dat
hadoop@master:~/hadoop/conf$ hadoop fs -cat/user/hive/warehouse/hive_table1/hive_table1.dat
Heyi,30
Hljk,29
lajdlf,30
alh,29
allj,27
lsjk,33
hadoop@master:~/hadoop/conf$
5.3 Query the results
hive>
> select * from hive_table1;
OK
Heyi 30
Hljk 29
lajdlf 30
alh 29
allj 27
lsjk 33
Time taken: 1.601 seconds
hive>
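The query simply reads back the comma-delimited file stored under /user/hive/warehouse/hive_table1. A simplified sketch of how 'fields terminated by ","' splits each row into the (name string, age int) schema (real Hive uses its LazySimpleSerDe; the class here is made up for illustration):

```java
public class DelimitedRowSketch {
    // parse one line of the delimited file into {name, age}
    static Object[] parse(String line) {
        String[] f = line.split(",", -1);        // the ',' field terminator
        return new Object[]{f[0], Integer.parseInt(f[1])}; // name string, age int
    }

    public static void main(String[] args) {
        Object[] row = parse("Heyi,30");         // first line of hive_table1.dat
        System.out.println(row[0] + "\t" + row[1]);
    }
}
```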
5.4 Operating Hive through the JDBC driver
5.4.1 Start the remote service interface
hadoop@master:~$ hive --service hiveserver
Starting Hive Thrift Server
5.4.2 Write the JDBC client code
//package com.javabloger.hive;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

//import org.apache.hadoop.hive.jdbc.HiveDriver;

public class HiveTestCase {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");

        String dropSQL = "drop table javabloger";
        String createSQL = "create table javabloger (key int, value string) row format delimited fields terminated by ',' ";
        String insterSQL = "LOAD DATA LOCAL INPATH '/home/hadoop/data/kv1.txt' OVERWRITE INTO TABLE javabloger";
        String querySQL = "SELECT a.* FROM javabloger a";

        //Connection con = DriverManager.getConnection("jdbc:derby://localhost:3338/default;databaseName=metastore_db;create=true", "APP", "mine");
        Connection con = DriverManager.getConnection("jdbc:hive://localhost:10000/default", "", "");
        Statement stmt = con.createStatement();
        stmt.executeQuery(dropSQL);
        stmt.executeQuery(createSQL);
        stmt.executeQuery(insterSQL);
        ResultSet res = stmt.executeQuery(querySQL);
        while (res.next()) {
            System.out.println("Result: key:" + res.getString(1) + " -> value:" + res.getString(2));
        }
        System.out.println("ok");
    }
}
5.4.3 Compile
javac HiveTestCase.java
5.4.4 Run
Run it with a script:
#hivetest.sh
#!/bin/bash
echo "100,aaa" >/home/hadoop/data/kv1.txt
echo "102,aab" >>/home/hadoop/data/kv1.txt
echo "103,aac" >>/home/hadoop/data/kv1.txt
echo "104,aad" >>/home/hadoop/data/kv1.txt
echo "105,aae" >> /home/hadoop/data/kv1.txt
echo "106,aaf" >>/home/hadoop/data/kv1.txt
HADOOP_CORE=`ls $HADOOP_HOME/hadoop-*-core.jar`
CLASSPATH=.:$HADOOP_CORE:$HIVE_HOME/conf
for i in ${HIVE_HOME}/lib/*.jar ; do
CLASSPATH=$CLASSPATH:$i
done
java -cp $CLASSPATH HiveTestCase
hadoop@master:~$ ./hivetest.sh
12/03/28 19:18:05 INFOjdbc.HiveQueryResultSet: Column names: key,value
12/03/28 19:18:05 INFOjdbc.HiveQueryResultSet: Column types: int,string
Result: key:100 -> value:aaa
Result: key:102 -> value:aab
Result: key:103 -> value:aac
Result: key:104 -> value:aad
Result: key:105 -> value:aae
Result: key:106 -> value:aaf
ok
hadoop@master:~$
6 Installing HBase on Linux
6.1 Download and unpack
wget http://mirror.bjtu.edu.cn/apache/hbase/hbase-0.90.5/hbase-0.90.5.tar.gz
hadoop@master:~/hadoop$ tar xzf hbase-0.90.5.tar.gz
hadoop@master:~/hadoop$ mv hbase-0.90.5 hbase
hadoop@master:~/hadoop$ cd hbase
hadoop@master:~/hadoop/hbase$ ls
bin conf hbase-0.90.5.jar hbase-webapps LICENSE.txt pom.xml src
CHANGES.txt docs hbase-0.90.5-tests.jar lib NOTICE.txt README.txt
hadoop@master:~/hadoop/hbase$
6.2 Replace the hadoop-core jar
Replace the hbase/lib/hadoop-core-0.20-append-r1056497.jar package with
$HADOOP_HOME/hadoop-0.20.2-core.jar,
and copy $HADOOP_HOME/hadoop-0.20.2-test.jar into the hbase/lib/ directory.
If the jar is not replaced, HMaster fails at startup with an error because the Hadoop and HBase client protocol versions do not match.
6.3 Update the environment variables
- vi ~/.profile
export HADOOP_HOME="/home/hadoop/hadoop"
export HIVE_HOME="/home/hadoop/hadoop/hive"
export HBASE_HOME="/home/hadoop/hadoop/hbase"
export HADOOP_VERSION="0.20.2"
export PATH="$HADOOP_HOME/bin:$HIVE_HOME/bin:$HBASE_HOME/bin:$PATH"
- hbase-env.sh
export HBASE_MANAGES_ZK=true
HBase needs ZooKeeper to run, and hbase-0.90.x ships with its own copy, so you can simply use the bundled one: export HBASE_MANAGES_ZK=true in conf/hbase-env.sh tells HBase to manage its own ZooKeeper. If you would rather install ZooKeeper yourself, set this option to false. With a separately installed ZooKeeper, the start/stop order is: start Hadoop, start the ZooKeeper cluster, start HBase; then stop HBase, stop the ZooKeeper cluster, stop Hadoop.
6.4 Pseudo-distributed mode
6.4.1 Configure hbase-site.xml
<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://localhost:9000/hbase</value>
<description>
</description>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
<description>
</description>
</property>
</configuration>
6.4.2 Start HBase
Before starting HBase, make sure HDFS is already running and ZooKeeper is available; otherwise startup will report errors.
start-hbase.sh
6.4.3 Verify that it is running
hadoop@master:~/hadoop/hbase/logs$ hbase shell
HBase Shell; enter 'help<RETURN>' forlist of supported commands.
Type "exit<RETURN>" toleave the HBase Shell
Version 0.90.5, r1212209, Fri Dec 9 05:40:36 UTC 2011
hbase(main):001:0> list
TABLE
0 row(s) in 3.0690 seconds
hbase(main):002:0> create 'test','person', 'address'
0 row(s) in 1.6310 seconds
hbase(main):003:0> put 'test', 'hing', 'person:name', 'hing'
0 row(s) in 0.6720 seconds
hbase(main):004:0> put 'test', 'hing','person:age', '28'
0 row(s) in 0.0440 seconds
hbase(main):005:0> put 'test', 'hing','address:position', 'haidian'
0 row(s) in 0.0420 seconds
hbase(main):006:0> put 'test', 'hing','address:zipcode', '100085'
0 row(s) in 0.0340 seconds
hbase(main):007:0> put 'test','forward', 'person:name', 'forward'
0 row(s) in 0.0700 seconds
hbase(main):008:0> put 'test','forward', 'person:age', '27'
0 row(s) in 0.0630 seconds
hbase(main):009:0> put 'test','forward', 'address:position', 'xicheng'
0 row(s) in 0.0320 seconds
hbase(main):010:0> scan 'test'
ROW COLUMN+CELL
forward column=address:position, timestamp=1333007851171, value=xicheng
forward column=person:age,timestamp=1333007842973, value=27
forward column=person:name,timestamp=1333007835784, value=forward
hing column=address:position, timestamp=1333007819916, value=haidian
hing column=address:zipcode,timestamp=1333007826558, value=100085
hing column=person:age, timestamp=1333007813753, value=28
hing column=person:name,timestamp=1333007790586, value=hing
2 row(s) in 0.3380 seconds
hbase(main):011:0> get 'test', 'hing'
COLUMN CELL
address:position timestamp=1333007819916,value=haidian
address:zipcode timestamp=1333007826558,value=100085
person:age timestamp=1333007813753, value=28
person:name timestamp=1333007790586,value=hing
4 row(s) in 0.1200 seconds
hbase(main):012:0>
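Conceptually, the shell session above treats the table as a sorted map of maps: row key to (family:qualifier to value), with both rows and columns coming back in sorted order. A minimal in-memory analogue (hypothetical class, ignoring timestamps and versions):

```java
import java.util.*;

public class HBaseModel {
    // row key -> (family:qualifier -> value), both levels kept sorted like HBase
    private final SortedMap<String, SortedMap<String, String>> rows = new TreeMap<>();

    void put(String row, String column, String value) {      // like the shell's `put`
        rows.computeIfAbsent(row, k -> new TreeMap<>()).put(column, value);
    }

    SortedMap<String, String> get(String row) {              // like the shell's `get`
        return rows.getOrDefault(row, new TreeMap<>());
    }

    public static void main(String[] args) {
        HBaseModel t = new HBaseModel();
        t.put("hing", "person:name", "hing");
        t.put("hing", "person:age", "28");
        // columns come back sorted, as in `get 'test', 'hing'`
        System.out.println(t.get("hing"));
    }
}
```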
6.4.4 Stop HBase
stop-hbase.sh
7 A First Taste of HBase
HBase API reference documentation is available at:
http://hbase.apache.org/docs/r0.20.5/api/overview-summary.html
7.1 Write a program with the HBase Java API
import java.io.IOException;
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.util.Map;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.util.*;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.util.Writables;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.MasterNotRunningException;
//import org.apache.hadoop.hbase.ZooKeeperConnectionException;
public class HBaseHandler {
//private static HBaseConfiguration conf = null;
private static Configuration conf = null;
/**
* init config
*/
static {
//conf = HBaseConfiguration.create();
//conf = new HBaseConfiguration();
//conf.addResource("hbase-site.xml");
Configuration HBASE_CONFIG = new Configuration();
HBASE_CONFIG.set("hbase.zookeeper.quorum", "localhost");
HBASE_CONFIG.set("hbase.zookeeper.property.clientPort","2181");
conf = HBaseConfiguration.create(HBASE_CONFIG);
}
/**
* @param args
* @throws IOException
*/
public static void main(String[] args) throws IOException {
// TODO Auto-generated method stub
System.out.println("Helloworld");
String[] cfs;
cfs = new String[1];
cfs[0] = "Hello";
createTable("Hello_test",cfs);
}
/**
* create table
* @throws IOException
*/
public static void createTable(String tablename, String[] cfs) throws IOException {
HBaseAdmin admin = new HBaseAdmin(conf);
if (admin.tableExists(tablename)) {
System.out.println("table exists");
}
else {
HTableDescriptor tableDesc = new HTableDescriptor(tablename);
for (int i = 0; i < cfs.length; i++) {
tableDesc.addFamily(new HColumnDescriptor(cfs[i]));
}
admin.createTable(tableDesc);
System.out.println("create table success");
}
}
/**
* delete table
* @param tablename
* @throws IOException
*/
public static void deleteTable(String tablename) throws IOException
{
try {
HBaseAdmin admin = new HBaseAdmin(conf);
admin.disableTable(tablename);
admin.deleteTable(tablename);
System.out.println("delete table success");
}
catch (MasterNotRunningException e)
{
e.printStackTrace();
}
}
/**
* insert one record
* @param tablename
* @param cfs
*/
public static void writeRow(String tablename, String[] cfs) {
try {
HTable table = new HTable(conf, tablename);
Put put = new Put(Bytes.toBytes("rows1"));
for (int j = 0; j < cfs.length; j++) {
put.add(Bytes.toBytes(cfs[j]),
Bytes.toBytes(String.valueOf(1)),
Bytes.toBytes("value_1"));
}
// one put after the loop is enough; putting inside the loop re-sends the same Put
table.put(put);
} catch (IOException e) {
e.printStackTrace();
}
}
/**
* delete one record
* @param tablename
* @param rowkey
* @throws IOException
*/
public static void deleteRow(String tablename, String rowkey) throws IOException {
HTable table = new HTable(conf, tablename);
List<Delete> list = new ArrayList<Delete>();
Delete d1 = new Delete(rowkey.getBytes());
list.add(d1);
table.delete(list);
System.out.println("delete row success");
}
/**
* query one record
* @param tablename
* @param rowkey
*/
public static void selectRow(String tablename, String rowKey)
throws IOException {
HTable table = new HTable(conf, tablename);
Get g = new Get(rowKey.getBytes());
Result rs = table.get(g);
for (KeyValue kv : rs.raw()) {
System.out.print(new String(kv.getRow()) + " ");
System.out.print(new String(kv.getFamily()) + ":");
System.out.print(new String(kv.getQualifier()) + " ");
System.out.print(kv.getTimestamp() + " ");
System.out.println(new String(kv.getValue()));
}
}
/**
* select all records from one table
* @param tablename
*/
public static void scaner(String tablename) {
try {
HTable table = new HTable(conf, tablename);
Scan s = new Scan();
ResultScanner rs = table.getScanner(s);
for (Result r : rs) {
KeyValue[] kv = r.raw();
for (int i = 0; i < kv.length; i++) {
System.out.print(new String(kv[i].getRow()) + " ");
System.out.print(new String(kv[i].getFamily()) + ":");
System.out.print(new String(kv[i].getQualifier()) + " ");
System.out.print(kv[i].getTimestamp() + " ");
System.out.println(new String(kv[i].getValue()));
}
}
} catch (IOException e)
{
e.printStackTrace();
}
}
}
7.2 Compiling
#compile.sh
#!/bin/bash
HADOOP_CORE=`ls $HADOOP_HOME/hadoop-*-core.jar`
CLASSPATH=.:$HADOOP_CORE:$HBASE_HOME/conf
for i in ${HBASE_HOME}/lib/*.jar; do
CLASSPATH=$CLASSPATH:$i
done
javac $1
Note: be sure to use the lib directory under HBASE_HOME. In my test I mistakenly copied the line with HIVE_HOME instead, which caused a version-mismatch error at runtime.
7.3 Running
#execjava.sh
#!/bin/bash
HADOOP_CORE=`ls $HADOOP_HOME/hadoop-*-core.jar`
CLASSPATH=.:$HADOOP_CORE:$HBASE_HOME/conf
for i in ${HBASE_HOME}/lib/*.jar ; do
CLASSPATH=$CLASSPATH:$i
done
java -cp $CLASSPATH $1
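The classpath-building loop shared by compile.sh and execjava.sh can be exercised on its own. Below is a minimal sketch that stands in a temporary directory with dummy jar files for $HBASE_HOME (the jar names are illustrative, not required by the scripts):

```shell
#!/bin/sh
# Simulate $HBASE_HOME with a temp dir holding a conf dir and dummy jars.
HBASE_HOME=$(mktemp -d)
mkdir -p "$HBASE_HOME/lib" "$HBASE_HOME/conf"
touch "$HBASE_HOME/lib/hbase-0.90.0.jar" "$HBASE_HOME/lib/zookeeper-3.3.3.jar"

# Same accumulation pattern as compile.sh / execjava.sh: start from . and
# $HBASE_HOME/conf, then append every jar under $HBASE_HOME/lib.
CLASSPATH=.:$HBASE_HOME/conf
for i in "$HBASE_HOME"/lib/*.jar; do
CLASSPATH=$CLASSPATH:$i
done
echo "$CLASSPATH"
```

With the real environment variables set, the pair is invoked as `./compile.sh HBaseHandler.java` followed by `./execjava.sh HBaseHandler`.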
8 Installing ZooKeeper on Linux
8.1 Download and extract
wget http://mirror.bjtu.edu.cn/apache/zookeeper/zookeeper-3.3.3/zookeeper-3.3.3.tar.gz
hadoop@master:~/hadoop$ tar -xzf zookeeper-3.3.3.tar.gz
hadoop@master:~/hadoop$ mv zookeeper-3.3.3 zookeeper
hadoop@master:~/hadoop$ cd zookeeper
hadoop@master:~/hadoop/zookeeper$ ls
bin conf docs lib README.txt zookeeper-3.3.3.jar zookeeper-3.3.3.jar.sha1
build.xml contrib ivysettings.xml LICENSE.txt recipes zookeeper-3.3.3.jar.asc
CHANGES.txt dist-maven ivy.xml NOTICE.txt src zookeeper-3.3.3.jar.md5
hadoop@master:~/hadoop/zookeeper$
8.2 Set the relevant environment variables
vi .profile
export HADOOP_HOME="/home/hadoop/hadoop"
export HIVE_HOME="/home/hadoop/hadoop/hive"
export HBASE_HOME="/home/hadoop/hadoop/hbase"
export ZOOKEEPER_HOME="/home/hadoop/hadoop/zookeeper"
export HADOOP_VERSION="0.20.2"
export PATH="$HADOOP_HOME/bin:$HIVE_HOME/bin:$HBASE_HOME/bin:$ZOOKEEPER_HOME/bin:$PATH"
8.3 Installing ZooKeeper in standalone mode
8.3.1 Configure zoo.cfg
Add the following:
tickTime=2000
dataDir=/data/zookeeper/
clientPort=2181
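Note that zkServer.sh writes its pid file under dataDir, so the directory must exist before the first start; the pid-file error in the startup log below comes from a missing dataDir. A minimal pre-flight sketch, using an illustrative path in place of the configured /data/zookeeper/:

```shell
#!/bin/sh
# zkServer.sh stores zookeeper_server.pid under dataDir; create it up front.
DATADIR=/tmp/zk-demo-data   # illustrative; zoo.cfg above uses /data/zookeeper/
mkdir -p "$DATADIR"
ls -ld "$DATADIR"
```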
8.3.2 Start ZooKeeper
hadoop@master:~/hadoop/zookeeper/bin$ zkServer.sh start
JMX enabled by default
Using config: /home/hadoop/hadoop/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ...
zkServer.sh: 120: cannot create /home/hadoop/data/zookeeper/zookeeper_server.pid: Directory nonexistent
STARTED
hadoop@master:~/hadoop/zookeeper/bin$ 2012-03-28 22:23:24,755 - INFO [main:QuorumPeerConfig@90] - Reading configuration from: /home/hadoop/hadoop/zookeeper/bin/../conf/zoo.cfg
2012-03-28 22:23:24,773 - WARN [main:QuorumPeerMain@105] - Either no config or no quorum defined in config, running in standalone mode
2012-03-28 22:23:24,898 - INFO [main:QuorumPeerConfig@90] - Reading configuration from: /home/hadoop/hadoop/zookeeper/bin/../conf/zoo.cfg
2012-03-28 22:23:24,903 - INFO [main:ZooKeeperServerMain@94] - Starting server
2012-03-28 22:23:24,986 - INFO [main:Environment@97] - Server environment:zookeeper.version=3.3.3-1073969, built on 02/23/2011 22:27 GMT
2012-03-28 22:23:24,987 - INFO [main:Environment@97] - Server environment:host.name=master
2012-03-28 22:23:24,989 - INFO [main:Environment@97] - Server environment:java.version=1.7.0_03
2012-03-28 22:23:24,991 - INFO [main:Environment@97] - Server environment:java.vendor=Oracle Corporation
2012-03-28 22:23:24,992 - INFO [main:Environment@97] - Server environment:java.home=/usr/lib/jvm/java-7-sun/jre
2012-03-28 22:23:24,992 - INFO [main:Environment@97] - Server environment:java.class.path=/home/hadoop/hadoop/zookeeper/bin/../build/classes:/home/hadoop/hadoop/zookeeper/bin/../build/lib/*.jar:/home/hadoop/hadoop/zookeeper/bin/../zookeeper-3.3.3.jar:/home/hadoop/hadoop/zookeeper/bin/../lib/log4j-1.2.15.jar:/home/hadoop/hadoop/zookeeper/bin/../lib/jline-0.9.94.jar:/home/hadoop/hadoop/zookeeper/bin/../src/java/lib/*.jar:/home/hadoop/hadoop/zookeeper/bin/../conf:.:/usr/lib/jvm/java-7-sun/lib:/usr/lib/jvm/java-7-sun/jre/lib:
2012-03-28 22:23:24,996 - INFO [main:Environment@97] - Server environment:java.library.path=/usr/java/packages/lib/i386:/lib:/usr/lib
2012-03-28 22:23:25,006 - INFO [main:Environment@97] - Server environment:java.io.tmpdir=/tmp
2012-03-28 22:23:25,008 - INFO [main:Environment@97] - Server environment:java.compiler=<NA>
2012-03-28 22:23:25,009 - INFO [main:Environment@97] - Server environment:os.name=Linux
2012-03-28 22:23:25,017 - INFO [main:Environment@97] - Server environment:os.arch=i386
2012-03-28 22:23:25,018 - INFO [main:Environment@97] - Server environment:os.version=2.6.35-22-generic
2012-03-28 22:23:25,019 - INFO [main:Environment@97] - Server environment:user.name=hadoop
2012-03-28 22:23:25,020 - INFO [main:Environment@97] - Server environment:user.home=/home/hadoop
2012-03-28 22:23:25,021 - INFO [main:Environment@97] - Server environment:user.dir=/home/hadoop/hadoop/zookeeper/bin
2012-03-28 22:23:25,110 - INFO [main:ZooKeeperServer@663] - tickTime set to 2000
2012-03-28 22:23:25,111 - INFO [main:ZooKeeperServer@672] - minSessionTimeout set to -1
2012-03-28 22:23:25,112 - INFO [main:ZooKeeperServer@681] - maxSessionTimeout set to -1
2012-03-28 22:23:25,217 - INFO [main:NIOServerCnxn$Factory@143] - binding to port 0.0.0.0/0.0.0.0:2181
2012-03-28 22:23:25,318 - INFO [main:FileSnap@82] - Reading snapshot /home/hadoop/data/zookeeper/version-2/snapshot.0
2012-03-28 22:23:25,361 - INFO [main:FileTxnSnapLog@208] - Snapshotting: 0
8.3.3 Verify it is running
hadoop@master:~/hadoop/zookeeper/bin$ zkCli.sh -server localhost:2181
9 Common problems
9.1 Version compatibility
Hadoop, HBase, and the other components have version-compatibility constraints; matching versions must be chosen or they will not run together.
My tested combination was hadoop-0.20.2 + hbase-0.90.0.
Download link: http://archive.apache.org/dist/
9.2 Leaving safe mode
./hadoop dfsadmin -safemode leave
9.3 File /user/hadoop could only be replicated to 0 nodes, instead of 1
hadoop@master:~$ hadoop fs -put file01 .
12/03/29 20:24:19 WARN hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/hadoop could only be replicated to 0 nodes, instead of 1
The default hadoop.tmp.dir is /tmp/hadoop-${user.name}, and on my Linux system the filesystem type of /tmp is often one Hadoop does not support. So hadoop.tmp.dir needs to be changed to point somewhere else, but the value must keep the ???//hadoop-${user.name} format.
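Relocating it is done in core-site.xml; a minimal sketch, with an illustrative path (adjust to a directory your Hadoop user can write to):

```xml
<!-- core-site.xml: move hadoop.tmp.dir off /tmp (path is illustrative) -->
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/tmp/hadoop-${user.name}</value>
</property>
```

Since HDFS data lives under this directory by default, the namenode typically has to be reformatted (hadoop namenode -format) and the cluster restarted after changing it.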