HBase stores data in tables. Each table consists of rows and columns, and every column belongs to a particular column family. The storage unit identified by a row and a column is called a cell; a cell can hold multiple versions of the same data, distinguished by timestamps.
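For example, with the Java client you can ask for several versions of one cell in a single Get. A minimal sketch (the table "t1", family "f1" and row "row1" are made-up names here, and the API shown is the 0.94-era client used later in this post):
HTable table = new HTable(HBaseConfiguration.create(), "t1");  // hypothetical table
Get get = new Get(Bytes.toBytes("row1"));
get.setMaxVersions(3);                      // return up to 3 versions of each cell
Result result = table.get(get);
for (KeyValue kv : result.raw()) {          // each version carries its own timestamp
    System.out.println(kv.getTimestamp() + " -> " + Bytes.toString(kv.getValue()));
}
table.close();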
Let's start with the installation.
1. Download and install
Pick an Apache download mirror and download an HBase release: enter the stable directory and download the file ending in .tar.gz, for example hbase-0.95-SNAPSHOT.tar.gz.
Decompress it, then change into the extracted directory:
$ tar xfz hbase-0.95-SNAPSHOT.tar.gz
$ cd hbase-0.95-SNAPSHOT
Edit conf/hbase-site.xml (all of the configuration below is for pseudo-distributed mode; hbase.rootdir points at an HDFS instance, so HDFS must already be running on localhost:9000):
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://localhost:9000/hbase</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
Edit conf/hbase-env.sh and add a line pointing at your JDK:
export JAVA_HOME=/usr/lib/jdk/jdk1.7.0_40
Running HBase
To start HBase in pseudo-distributed mode:
$ bin/start-hbase.sh
Enter the HBase shell:
$ bin/hbase shell
As shown below:
root@ubuntu:~/hbase/hbase-0.94.18# bin/hbase shell
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.94.18, r1577788, Sat Mar 15 04:46:47 UTC 2014

hbase(main):001:0>
To exit the shell:
hbase(main):001:0> exit
To stop HBase:
$ bin/stop-hbase.sh
An example of using MapReduce together with HBase
We rewrite the earlier WordCount code so that its results are written into HBase.
Java APIs used
1. HBaseConfiguration
Configures and initializes HBase.
Usage:
HBaseConfiguration config = new HBaseConfiguration();
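Note: in newer HBase releases this constructor is deprecated; the equivalent factory call is:
Configuration config = HBaseConfiguration.create();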
2. HBaseAdmin
Used for administrative operations such as creating and deleting tables.
Usage:
HBaseAdmin admin = new HBaseAdmin(config);
3. HTableDescriptor
The HTableDescriptor class holds the table name and its list of column families; it can be used to add and remove column families.
Usage:
HTableDescriptor htd = new HTableDescriptor(tablename);
4. HColumnDescriptor
The HColumnDescriptor class maintains the information about a column family.
Usage:
HColumnDescriptor col = new HColumnDescriptor("content");
5. Put
Performs an insert/update on a single row.
Usage:
Put put = new Put(Bytes.toBytes(key.toString()));
put.add(Bytes.toBytes("content"),Bytes.toBytes("count"),Bytes.toBytes(String.valueOf(count)));
Code:
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
public class WordCountHbase {
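// Mapper: tokenizes each input line and emits (word, 1) for every word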
public static class MapClass
extends Mapper<LongWritable, Text, Text, IntWritable> {
private final static IntWritable one = new IntWritable(1);
private Text word = new Text();
public void map(LongWritable key, Text value,
Context context ) throws IOException,
InterruptedException {
StringTokenizer itr = new StringTokenizer(value.toString());
while (itr.hasMoreTokens()){
word.set(itr.nextToken());
context.write(word,one);
}
}
}
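// Reducer: sums the per-word counts and writes the total into HBase as a Put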
public static class Reduce extends TableReducer<Text, IntWritable, NullWritable> {
public void reduce(Text key, Iterable<IntWritable> values,
Context context) throws IOException,InterruptedException {
int count = 0;
for (IntWritable val : values) {
count += val.get();
}
Put put = new Put(Bytes.toBytes(key.toString()));
put.add(Bytes.toBytes("content"),Bytes.toBytes("count"),Bytes.toBytes(String.valueOf(count)));
context.write(NullWritable.get(), put);
}
}
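// Creates the output table with a single column family "content"; an existing table is dropped and recreated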
public static void createHBaseTable(String tablename)throws IOException{
HTableDescriptor htd = new HTableDescriptor(tablename);
HColumnDescriptor col = new HColumnDescriptor("content");
htd.addFamily(col);
HBaseConfiguration config = new HBaseConfiguration();
HBaseAdmin admin = new HBaseAdmin(config);
if(admin.tableExists(tablename)){
System.out.println("table exists, trying recreate table! ");
admin.disableTable(tablename);
admin.deleteTable(tablename);
}
System.out.println("create new table: " + tablename);
admin.createTable(htd);
}
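// Driver: creates the table, then configures and submits the job;
// TableOutputFormat picks up the target table from OUTPUT_TABLE in the configuration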
public static void main(String[] args) throws Exception {
String tablename = "wordcount";
Configuration conf = new Configuration();
conf.set(TableOutputFormat.OUTPUT_TABLE,tablename);
createHBaseTable(tablename);
Job job = new Job(conf, "word count table with"+ args[0]);
job.setJarByClass(WordCountHbase.class);
job.setMapperClass(MapClass.class);
//job.setCombinerClass(Reduce.class);
job.setReducerClass(Reduce.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(IntWritable.class);
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TableOutputFormat.class);
FileInputFormat.setInputPaths(job, new Path(args[0]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}
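As an aside, HBase also provides org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil, which can wire up the reducer, the TableOutputFormat and the output table in one call. A rough sketch of the alternative job setup (not used in the code above):
// replaces job.setReducerClass(...), job.setOutputFormatClass(...) and the conf.set(TableOutputFormat.OUTPUT_TABLE, ...) line
TableMapReduceUtil.initTableReducerJob(tablename, Reduce.class, job);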
To compile this, copy hbase-0.94.18.jar, zookeeper-3.4.5.jar, protobuf-java-2.4.0a.jar and guava-11.0.2.jar from the HBase directory into hadoop/lib; otherwise the build fails with a pile of errors.
Compile command:
$ javac -classpath hadoop-core-1.2.1.jar:lib/commons-cli-1.2.jar:lib/commons-logging-api-1.0.4.jar:hbase-0.94.18.jar:zookeeper-3.4.5.jar -d practise/WordCountHbase/classes practise/WordCountHbase/src/WordCountHbase.java
Package the classes into a jar, then run:
$ bin/hadoop jar practise/WordCountHbase/WordCountHbase.jar WordCountHbase 1.txt
Here 1.txt contains:
hello world
hello hadoop
Then check in HBase whether MapReduce has written the results into the table:
hbase(main):001:0> list
TABLE
tab1
wordcount
2 row(s) in 8.3730 seconds

hbase(main):002:0> scan 'wordcount'
ROW        COLUMN+CELL
 hadoop    column=content:count, timestamp=1397376976400, value=1
 hello     column=content:count, timestamp=1397376976400, value=2
 world     column=content:count, timestamp=1397376976400, value=1
3 row(s) in 0.5460 seconds

hbase(main):003:0>
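Besides the shell, a count can be read back with the Java client as well. A minimal sketch against the same 0.94 API (it additionally needs org.apache.hadoop.hbase.client.HTable, Get and Result):
Configuration conf = HBaseConfiguration.create();
HTable table = new HTable(conf, "wordcount");
Get get = new Get(Bytes.toBytes("hello"));
Result result = table.get(get);
byte[] value = result.getValue(Bytes.toBytes("content"), Bytes.toBytes("count"));
System.out.println("hello -> " + Bytes.toString(value));   // should print: hello -> 2
table.close();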
Below are fixes for some errors you may hit when using HBase.
How to resolve the HBase exception "hbase-default.xml file seems to be for and old version of HBase"