I. Introduction
HBase is the Apache Hadoop database. It provides random, real-time read/write access to data, and it is open source, distributed, scalable, and column-oriented.
Its main features are: linear and modular scalability; consistent reads and writes; automatic and configurable sharding of tables; automatic failover between RegionServers; convenient base classes for backing MapReduce jobs; an easy-to-use Java API for client access;
block cache and Bloom filters for real-time queries; query predicate push-down via server-side filters; a Thrift gateway and a RESTful web service supporting XML, Protobuf, and binary encodings; an extensible JRuby-based (JIRB) shell;
and support for exporting metrics to files or Ganglia via the Hadoop metrics subsystem, or via JMX.
II. HBase Installation
This section focuses on a fully distributed installation.
1. Edit conf/hbase-site.xml (a configuration sketch follows step 3 below)
2. Edit conf/regionservers
3. Configure ZooKeeper
Edit conf/hbase-env.sh:
export HBASE_MANAGES_ZK=true means HBase starts and stops ZooKeeper as part of its own processes; the corresponding process is "HQuorumPeer".
export HBASE_MANAGES_ZK=false means a ZooKeeper ensemble must be run and managed separately, with hbase.zookeeper.quorum pointing at it; the corresponding process is "QuorumPeerMain".
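A minimal sketch of the three files above for a fully distributed setup; the hostnames (master, node1, node2) and the HDFS port are assumptions for illustration only:

conf/hbase-site.xml:
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://master:9000/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>master,node1,node2</value>
  </property>
</configuration>

conf/regionservers (one region server hostname per line):
node1
node2

conf/hbase-env.sh:
export JAVA_HOME=/usr/lib/jvm/java    # adjust to the local JDK path (assumed)
export HBASE_MANAGES_ZK=true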
Start HBase:
start-hbase.sh
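A quick check after startup (assuming HBase manages ZooKeeper as above):

jps   # expect HMaster and HQuorumPeer on the master, HRegionServer on each node listed in conf/regionservers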
For possible problems, see:
http://blog.csdn.net/ice_grey/article/details/48756893
III. HBase Shell
The main commands are listed below; a brief usage sketch follows the list:
alter
count
describe
delete
deleteall
disable
drop
enable
exists
exit
get
incr
list
put
tools
scan
status
shutdown
truncate
version
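A minimal interactive session using a few of these commands, run from the hbase shell; the table name 'test' and column family 'cf' are assumed for illustration:

create 'test', 'cf'
put 'test', 'row1', 'cf:a', 'value1'
get 'test', 'row1'
scan 'test'
disable 'test'
drop 'test'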
IV. HBase Architecture
HBase follows a simple master-slave architecture made up of a group of HRegionServers and an HBase Master (HMaster) server. The HMaster manages all of the HRegionServers, and all HBase services are coordinated through ZooKeeper. From the user's point of view, each table is a collection of rows distinguished by their row keys. A table is split into pieces, each piece being a Region; an HRegion is identified by the table name plus its start/end row keys. An HRegionServer consists of two parts: the HLog and the HRegions it serves. Each HRegion is in turn made up of several Stores, one per column family; each Store contains a number of StoreFiles, which hold the actual data.
ZooKeeper coordination ensures that exactly one HMaster is active.
The HMaster is responsible for managing tables and HRegions.
V. HBase Region
Each HRegion has a "regionId" that identifies it uniquely; a specific HRegion is identified by tablename + startKey + regionId.
The META table stores the mapping between HRegion identifiers and the region servers that actually host them.
The root table (ROOT) records where the META table itself is stored.
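On the older HBase releases that the API examples below target, this mapping can be inspected directly from the shell (on newer releases the catalog table is named hbase:meta instead):

scan '.META.'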
VI. HBase API
Database
HBaseAdmin
e.g.: HBaseAdmin admin = new HBaseAdmin(config);
admin.disableTable("tablename");
HBaseConfiguration
e.g.: Configuration config = HBaseConfiguration.create();
Table
HTable
e.g.: HTable table = new HTable(conf, Bytes.toBytes(tablename));
ResultScanner scanner = table.getScanner(Bytes.toBytes("cf"));
HTableDescriptor
e.g.: HTableDescriptor htd = new HTableDescriptor(name);
htd.addFamily(new HColumnDescriptor("Family"));
Column family
HColumnDescriptor
e.g.: HTableDescriptor htd = new HTableDescriptor(tablename);
HColumnDescriptor col = new HColumnDescriptor("content");
htd.addFamily(col);
Row and column operations
Put
HTable table = new HTable(conf,Bytes.toBytes(tablename));
Put p = new Put(row);
p.add(family,qualifier,value);
table.put(p);
Get
HTable table = new HTable(conf, Bytes.toBytes(tablename));
Get g = new Get(Bytes.toBytes(row));
Result result = table.get(g);
Scanner
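A minimal scan sketch in the same style as the Put and Get examples above; the column family name "cf" is an assumption:

HTable table = new HTable(conf, Bytes.toBytes(tablename));
Scan s = new Scan();
s.addFamily(Bytes.toBytes("cf")); // limit the scan to one column family (assumed name)
ResultScanner scanner = table.getScanner(s);
for (Result r : scanner) {
    System.out.println("Scan: " + r);
}
scanner.close();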
1. API example:
package hadoop.v12;
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;
public class HBaseTestCase {
// Declare a static HBase configuration
static Configuration cfg=HBaseConfiguration.create();
// Create a table using HBaseAdmin and HTableDescriptor
public static void create(String tablename,String columnFamily) throws Exception {
HBaseAdmin admin = new HBaseAdmin(cfg);
if (admin.tableExists(tablename)) {
System.out.println("table Exists!");
System.exit(0);
}
else{
HTableDescriptor tableDesc = new HTableDescriptor(tablename);
tableDesc.addFamily(new HColumnDescriptor(columnFamily));
admin.createTable(tableDesc);
System.out.println("create table success!");
}
}
// Add one row to an existing table using HTable and Put
public static void put(String tablename,String row, String columnFamily,String column,String data) throws Exception {
HTable table = new HTable(cfg, tablename);
Put p1=new Put(Bytes.toBytes(row));
p1.add(Bytes.toBytes(columnFamily), Bytes.toBytes(column), Bytes.toBytes(data));
table.put(p1);
System.out.println("put '"+row+"','"+columnFamily+":"+column+"','"+data+"'");
}
public static void get(String tablename,String row) throws IOException{
HTable table=new HTable(cfg,tablename);
Get g=new Get(Bytes.toBytes(row));
Result result=table.get(g);
System.out.println("Get: "+result);
}
// Show all data: read an existing table using HTable and Scan
public static void scan(String tablename) throws Exception{
HTable table = new HTable(cfg, tablename);
Scan s = new Scan();
ResultScanner rs = table.getScanner(s);
for(Result r:rs){
System.out.println("Scan: "+r);
}
}
public static boolean delete(String tablename) throws IOException{
HBaseAdmin admin=new HBaseAdmin(cfg);
if(admin.tableExists(tablename)){
try
{
admin.disableTable(tablename);
admin.deleteTable(tablename);
}catch(Exception ex){
ex.printStackTrace();
return false;
}
}
return true;
}
public static void main(String[] args) {
String tablename="hbase_tb";
String columnFamily="cf";
try {
HBaseTestCase.create(tablename, columnFamily);
HBaseTestCase.put(tablename, "row1", columnFamily, "cl1", "data");
HBaseTestCase.get(tablename, "row1");
HBaseTestCase.scan(tablename);
/* if(true==HBaseTestCase.delete(tablename))
System.out.println("Delete table:"+tablename+"success!");
*/
}
catch (Exception e) {
e.printStackTrace();
}
}
}
2. Combining HBase with WordCount
package hadoop.v12;
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
public class WordCountHBase
{
public static class Map extends Mapper<LongWritable,Text,Text,IntWritable>{
private IntWritable i = new IntWritable(1);
public void map(LongWritable key,Text value,Context context) throws IOException, InterruptedException{
String s[] = value.toString().trim().split(" "); // split each input line on spaces
for( String m : s){
context.write(new Text(m), i);
}
}
}
public static class Reduce extends TableReducer<Text, IntWritable, NullWritable>{
public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException{
int sum = 0;
for(IntWritable i : values){
sum += i.get();
}
Put put = new Put(Bytes.toBytes(key.toString())); // create a Put: each word is stored as one row
put.add(Bytes.toBytes("content"),Bytes.toBytes("count"),Bytes.toBytes(String.valueOf(sum))); // column family "content", qualifier "count", value is the word count
context.write(NullWritable.get(), put);
}
}
public static void createHBaseTable(String tablename)throws IOException{
HTableDescriptor htd = new HTableDescriptor(tablename);
HColumnDescriptor col = new HColumnDescriptor("content");
htd.addFamily(col);
Configuration config = HBaseConfiguration.create();
HBaseAdmin admin = new HBaseAdmin(config);
if(admin.tableExists(tablename)){
System.out.println("table exists, trying recreate table! ");
admin.disableTable(tablename);
admin.deleteTable(tablename);
}
System.out.println("create new table: " + tablename);
admin.createTable(htd);
}
public static void main(String args[]) throws Exception{
String tablename = "wordcount";
Configuration conf = new Configuration();
conf.set(TableOutputFormat.OUTPUT_TABLE, tablename);
createHBaseTable(tablename);
String input = args[0]; // input path
Job job = new Job(conf, "WordCount table with " + input);
job.setJarByClass(WordCountHBase.class);
job.setNumReduceTasks(3);
job.setMapperClass(Map.class);
job.setReducerClass(Reduce.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(IntWritable.class);
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TableOutputFormat.class);
FileInputFormat.addInputPath(job, new Path(input));
System.exit(job.waitForCompletion(true)?0:1);
}
}
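Assuming the class is packaged into a jar (the jar name and HDFS input path below are hypothetical), the job can be launched with:

hadoop jar wordcount-hbase.jar hadoop.v12.WordCountHBase /user/hadoop/input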
Reference: Hadoop 实战 (Hadoop in Action), 2nd edition