Distributed Systems: The HBase Module and Java API

HBase stores data by column: each column family is persisted in its own set of files, and the files of different column families are kept separate.

HBase has only one index, the row key. By careful design, every access method in HBase goes either through a single row-key lookup or through a row-key scan, so the system never slows down for lack of secondary indexes.

HBase is a sparse, multi-dimensional, sorted map. The map is indexed by row key, column family, column qualifier, and timestamp: locating a cell requires all four, so a cell address can be viewed as a "four-dimensional coordinate", i.e. [row key, column family, column qualifier, timestamp].

Every value is an uninterpreted byte string with no data type.
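The "four-dimensional coordinate" maps directly onto the client API: a Get pinned to one row key, column family, column qualifier, and timestamp resolves to exactly one cell. A minimal sketch against the 0.98 client used later in this article (the table, family, and qualifier names are made up for illustration, and a configured Configuration object conf is assumed):

import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

// Resolve one cell from its four-dimensional coordinate:
// [row key, column family, column qualifier, timestamp]
HTable table = new HTable(conf, "blog2");     // assumes a configured 'conf'
Get get = new Get(Bytes.toBytes("rowkey1"));  // 1. row key
get.addColumn(Bytes.toBytes("author"),        // 2. column family
        Bytes.toBytes("name"));               // 3. column qualifier
get.setTimeStamp(1405524265000L);             // 4. timestamp (one exact version)
Result result = table.get(get);
byte[] value = result.getValue(Bytes.toBytes("author"), Bytes.toBytes("name"));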

 

How HBase Is Implemented

1. The HBase implementation comprises three major functional components:

1) A client library, linked into every client

2) A single Master server

3) Many Region servers

(1) Client. The client provides the interfaces for accessing HBase, and it caches the locations of Regions it has already visited to speed up subsequent data access.

(2) ZooKeeper. ZooKeeper elects one Master as the overall manager of the cluster and guarantees that exactly one Master is running at any given moment, which avoids a Master single point of failure.

(ZooKeeper is a mature cluster-management tool, widely used in distributed computing; it provides configuration maintenance, naming, distributed synchronization, group services, and more.)

(3) Master

The Master server is mainly responsible for managing tables and Regions:

handling user requests to create, delete, alter, and query tables

balancing load across the Region servers

redistributing Regions after a Region split or merge

migrating the Regions of a failed Region server

(4) Region server

The Region server is the core module of HBase: it maintains the Regions assigned to it and serves user read and write requests.

(1) Read and write path

When a user writes data, the write is routed to the appropriate Region server for execution.

The data is first written to both the MemStore and the HLog.

Only after the operation has been written to the HLog does the commit() call return to the client.

When a user reads data, the Region server first checks the MemStore cache; if the data is not found there, it searches the StoreFiles on disk.

(2) Flushing the cache

The system periodically flushes the contents of the MemStore cache to a StoreFile on disk, clears the cache, and writes a marker into the HLog.
Each flush produces a new StoreFile, so every Store comes to contain multiple StoreFiles.

 

Each Region server keeps a single HLog of its own. On every start-up it checks this log to determine whether any new writes occurred after the most recent flush; if it finds updates, it first writes them to the MemStore, then flushes them to StoreFiles, deletes the old HLog, and only then begins serving users.

(3) StoreFile compaction

Each flush produces a new StoreFile, and too many of them slow down lookups.

Store.compact() merges multiple StoreFiles into one.

Compaction is fairly resource-intensive, so it is started only when the number of StoreFiles reaches a threshold.
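The threshold is configurable. A hedged sketch, assuming the stock hbase.hstore.compactionThreshold property (the minimum number of StoreFiles in a Store before a compaction is considered; verify the property name against your HBase version):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

// Raise the compaction trigger from the default of 3 StoreFiles to 5.
Configuration conf = HBaseConfiguration.create();
conf.set("hbase.hstore.compactionThreshold", "5");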

3. How a Store works

The Store is the core of a Region server.

Multiple StoreFiles are compacted into a single one; when that single StoreFile grows too large, a split operation is triggered and the parent Region is divided into two child Regions.

4. How the HLog works

A distributed environment must plan for system failure; HBase uses the HLog to guarantee recovery.

HBase gives each Region server one HLog file, a write-ahead log (WAL).

An update must first be written to the log before it can be written to the MemStore cache, and the MemStore contents may be flushed to disk only after the corresponding log entries have been persisted.
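The write-ahead rule is visible at the client API as well: a Put carries a durability setting, so a client can deliberately relax it. A hedged sketch against the 0.98 client (the row, family, and qualifier names are illustrative):

import org.apache.hadoop.hbase.client.Durability;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

Put put = new Put(Bytes.toBytes("rowkey1"));
put.add(Bytes.toBytes("author"), Bytes.toBytes("name"), Bytes.toBytes("bin"));
put.setDurability(Durability.SYNC_WAL);  // default behavior: sync the WAL before acking
// Durability.SKIP_WAL trades the recovery guarantee described above for write speed.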


ZooKeeper monitors the state of every Region server in real time; when a Region server fails, ZooKeeper notifies the Master.

The Master first processes the HLog left behind on the failed Region server. This leftover HLog contains log records from several Region objects.

The system splits the HLog records by the Region they belong to and places them in the corresponding Region's directory; the failed Regions are then reassigned to available Region servers, and the HLog records relevant to each Region are shipped to its new server as well.

When a Region server receives its newly assigned Regions together with the associated HLog records, it replays the logged operations, writing the data into the MemStore cache and then flushing it to StoreFiles, which completes recovery.

Sharing one log has an advantage, better write performance for tables, and a disadvantage: the log must be split apart during recovery.

 

2. By default the HBase Master serves a web UI on port 60010, and each HBase Region server serves one on port 60030. If the Master runs on a host named master.foo.com, the Master's home page is http://master.foo.com:60010; a user can open this address in a web browser

to view the current state of the HBase cluster.

 

 

 

HBase version: 0.98.8-hadoop2

Required dependency JARs:

commons-codec-1.7.jar

commons-collections-3.2.1.jar

commons-configuration-1.6.jar

commons-lang-2.6.jar

commons-logging-1.1.3.jar

guava-12.0.1.jar

hadoop-auth-2.5.0.jar

hadoop-common-2.5.0.jar

hbase-client-0.98.8-hadoop2.jar

hbase-common-0.98.8-hadoop2.jar

hbase-protocol-0.98.8-hadoop2.jar

htrace-core-2.04.jar

jackson-core-asl-1.9.13.jar

jackson-mapper-asl-1.9.13.jar

log4j-1.2.17.jar

mysql-connector-java-5.1.7-bin.jar

netty-3.6.6.Final.jar

protobuf-java-2.5.0.jar

slf4j-api-1.7.5.jar

slf4j-log4j12-1.7.5.jar

zookeeper-3.4.6.jar

 

 

First, make sure the necessary JARs have been added to the web project's classpath. Below is a small wrapper I put around the HBase connection:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

/**
 * @author a01513
 */
public class HBaseConnector {

    private static final String QUOREM = "192.168.103.50,192.168.103.51,192.168.103.52,192.168.103.53,192.168.103.54,192.168.103.55,192.168.103.56,192.168.103.57,192.168.103.58,192.168.103.59,192.168.103.60";
    private static final String CLIENT_PORT = "2181";

    private HBaseAdmin admin;
    private Configuration conf;

    public HBaseAdmin getHBaseAdmin() {
        getConfiguration();
        try {
            admin = new HBaseAdmin(conf);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return admin;
    }

    public Configuration getConfiguration() {
        if (conf == null) {
            conf = HBaseConfiguration.create();
            conf.set("hbase.zookeeper.quorum", QUOREM);
            conf.set("hbase.zookeeper.property.clientPort", CLIENT_PORT);
        }
        return conf;
    }
}
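A hedged usage sketch for the wrapper above (the table name is made up):

HBaseConnector connector = new HBaseConnector();
HBaseAdmin admin = connector.getHBaseAdmin();
if (admin.tableExists("blog2")) {
    System.out.println("blog2 exists");
}
admin.close();  // release the connection when done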

HBase Java API

HBaseConfiguration

Class: org.apache.hadoop.hbase.HBaseConfiguration

Purpose: holds the configuration for HBase

Methods (return type, signature, description):

void addResource(Path file): adds a configuration resource read from the file at the given path

void clear(): clears all properties that have been set

String get(String name): returns the value of the named property

boolean getBoolean(String name, boolean defaultValue): returns the named property as a boolean; if the stored value is not a valid boolean, the default value is returned

void set(String name, String value): sets the named property to the given value

void setBoolean(String name, boolean value): sets a boolean property

Usage example:

HBaseConfiguration hconfig = new HBaseConfiguration();
hconfig.set("hbase.zookeeper.property.clientPort","2181");

 

HBaseAdmin

Class: org.apache.hadoop.hbase.client.HBaseAdmin

Purpose: provides an interface for managing HBase table metadata. Its methods include creating tables, deleting tables, listing tables, enabling and disabling tables, and adding or removing column families.

Methods (return type, signature, description):

void addColumn(String tableName, HColumnDescriptor column): adds a column family to an existing table

static void checkHBaseAvailable(HBaseConfiguration conf): checks whether HBase is up and running

void createTable(HTableDescriptor desc): creates a table; synchronous

void deleteTable(byte[] tableName): deletes an existing table

void enableTable(byte[] tableName): enables a table

void disableTable(byte[] tableName): disables a table

HTableDescriptor[] listTables(): lists all user-space tables

void modifyTable(byte[] tableName, HTableDescriptor htd): modifies the table schema; asynchronous, may take some time

boolean tableExists(String tableName): checks whether a table exists

Usage example:

HBaseAdmin admin = new HBaseAdmin(config);
admin.disableTable("tablename");

 

 

HTableDescriptor

Class: org.apache.hadoop.hbase.HTableDescriptor

Purpose: holds a table's name and its column families

Methods (return type, signature, description):

void addFamily(HColumnDescriptor family): adds a column family

HColumnDescriptor removeFamily(byte[] column): removes a column family

byte[] getName(): returns the table name

byte[] getValue(byte[] key): returns the value of an attribute

void setValue(String key, String value): sets the value of an attribute

Usage example:

HTableDescriptor htd = new HTableDescriptor(table);
htd.addFamily(new HColumnDescriptor("family"));
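Putting HBaseAdmin and HTableDescriptor together, a hedged end-to-end sketch of creating a table (all names are illustrative):

Configuration conf = HBaseConfiguration.create();
HBaseAdmin admin = new HBaseAdmin(conf);
HTableDescriptor htd = new HTableDescriptor("blog2");
htd.addFamily(new HColumnDescriptor("article"));
htd.addFamily(new HColumnDescriptor("author"));
if (!admin.tableExists("blog2")) {
    admin.createTable(htd);  // synchronous: returns once the table is online
}
admin.close();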

 


HTable

Class: org.apache.hadoop.hbase.client.HTable

Purpose: communicates directly with an HBase table. This class is not thread-safe for update operations.

Methods (return type, signature, description):

boolean checkAndPut(byte[] row, byte[] family, byte[] qualifier, byte[] value, Put put): atomically checks whether row/family/qualifier matches the expected value and, if so, applies the Put

void close(): releases all resources, flushing any updates buffered internally

boolean exists(Get get): checks whether the cells specified by the Get instance exist in the table

Result get(Get get): returns the values of selected cells in the specified row

byte[][] getEndKeys(): returns the end keys of every region of the currently open table

ResultScanner getScanner(byte[] family): returns a scanner instance over the given column family

HTableDescriptor getTableDescriptor(): returns the HTableDescriptor of this table

byte[] getTableName(): returns the table name

static boolean isTableEnabled(HBaseConfiguration conf, String tableName): checks whether the table is enabled

void put(Put put): writes a value to the table

Usage example:

HTable table = new HTable(conf, Bytes.toBytes(tablename));
ResultScanner scanner = table.getScanner(family);
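checkAndPut from the table above is HBase's basic compare-and-set primitive and deserves a sketch of its own. A hedged example, reusing the table variable from the snippet above (row, family, and qualifier names are made up):

// Atomically set author:name to "lee" only if it currently equals "nicholas".
Put p = new Put(Bytes.toBytes("rowkey1"));
p.add(Bytes.toBytes("author"), Bytes.toBytes("name"), Bytes.toBytes("lee"));
boolean applied = table.checkAndPut(
        Bytes.toBytes("rowkey1"),
        Bytes.toBytes("author"), Bytes.toBytes("name"),
        Bytes.toBytes("nicholas"),  // expected current value
        p);
System.out.println("applied: " + applied);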

Put

Class: org.apache.hadoop.hbase.client.Put

Purpose: performs add operations on a single row

Methods (return type, signature, description):

Put add(byte[] family, byte[] qualifier, byte[] value): adds the given column and value to this Put instance

Put add(byte[] family, byte[] qualifier, long ts, byte[] value): adds the given column, value, and timestamp to this Put instance

byte[] getRow(): returns the row of this Put instance

RowLock getRowLock(): returns the row lock of this Put instance

long getTimeStamp(): returns the timestamp of this Put instance

boolean isEmpty(): checks whether the family map is empty

Put setTimeStamp(long timeStamp): sets the timestamp of this Put instance

Usage example:

HTable table = new HTable(conf, Bytes.toBytes(tablename));
Put p = new Put(brow); // create a Put operation for the given row
p.add(family, qualifier, value);
table.put(p);

Get

Class: org.apache.hadoop.hbase.client.Get

Purpose: retrieves information about a single row

Methods (return type, signature, description):

Get addColumn(byte[] family, byte[] qualifier): selects the column with the given family and qualifier

Get addFamily(byte[] family): selects all columns of the given column family

Get setTimeRange(long minStamp, long maxStamp): selects column versions within the given timestamp range

Get setFilter(Filter filter): sets a server-side filter to apply when the Get executes

Usage example:

HTable table = new HTable(conf, Bytes.toBytes(tablename));
Get g = new Get(Bytes.toBytes(row));
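The snippet above builds the Get but never executes it. A hedged sketch of the full round trip, reusing tablename and row from the snippet above and reading the value back out of the Result (family and qualifier names are illustrative):

HTable table = new HTable(conf, Bytes.toBytes(tablename));
Get g = new Get(Bytes.toBytes(row));
g.addColumn(Bytes.toBytes("author"), Bytes.toBytes("name"));
Result result = table.get(g);
byte[] value = result.getValue(Bytes.toBytes("author"), Bytes.toBytes("name"));
System.out.println("author:name = " + Bytes.toString(value));
table.close();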

 

Result

Class: org.apache.hadoop.hbase.client.Result

Purpose: holds a single row returned by a Get or Scan operation. Its methods expose the values directly or as various map structures (key-value pairs).

Methods (return type, signature, description):

boolean containsColumn(byte[] family, byte[] qualifier): checks whether the given column exists in this row

NavigableMap<byte[],byte[]> getFamilyMap(byte[] family): returns the qualifier-to-value map of the given column family

byte[] getValue(byte[] family, byte[] qualifier): returns the latest value of the given column

ResultScanner

Type: interface

Purpose: the client-side interface for iterating over returned rows

Methods (return type, signature, description):

void close(): closes the scanner and releases the resources allocated to it

Result next(): returns the next row of results
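ResultScanner has no usage example in the original tables; a hedged minimal loop (the family name is made up, and a table variable is assumed), closing the scanner in finally as the longer examples later in this article do:

ResultScanner scanner = table.getScanner(Bytes.toBytes("article"));
try {
    for (Result r = scanner.next(); r != null; r = scanner.next()) {
        System.out.println(Bytes.toString(r.getRow()));
    }
} finally {
    scanner.close();  // always release the scanner's resources
}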

 

 

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class Hbase {
    // declare a static shared configuration
    static Configuration conf = null;
    static {
        conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "localhost");
    }

    /*
     * Create a table.
     * @tableName  table name
     * @family     list of column families
     */
    public static void creatTable(String tableName, String[] family)
            throws Exception {
        HBaseAdmin admin = new HBaseAdmin(conf);
        HTableDescriptor desc = new HTableDescriptor(tableName);
        for (int i = 0; i < family.length; i++) {
            desc.addFamily(new HColumnDescriptor(family[i]));
        }
        if (admin.tableExists(tableName)) {
            System.out.println("table Exists!");
            System.exit(0);
        } else {
            admin.createTable(desc);
            System.out.println("create table Success!");
        }
    }

    /*
     * Insert data (for a fixed table whose column families are known).
     * @rowKey    row key
     * @tableName table name
     * @column1   columns of the first family
     * @value1    values for the first family's columns
     * @column2   columns of the second family
     * @value2    values for the second family's columns
     */
    public static void addData(String rowKey, String tableName,
            String[] column1, String[] value1, String[] column2, String[] value2)
            throws IOException {
        Put put = new Put(Bytes.toBytes(rowKey)); // set the row key
        // HTable handles record-level operations: put, get, delete, scan, etc.
        HTable table = new HTable(conf, Bytes.toBytes(tableName));
        // fetch all column families of the table
        HColumnDescriptor[] columnFamilies = table.getTableDescriptor()
                .getColumnFamilies();

        for (int i = 0; i < columnFamilies.length; i++) {
            String familyName = columnFamilies[i].getNameAsString(); // family name
            if (familyName.equals("article")) { // put data into the article family
                for (int j = 0; j < column1.length; j++) {
                    put.add(Bytes.toBytes(familyName),
                            Bytes.toBytes(column1[j]), Bytes.toBytes(value1[j]));
                }
            }
            if (familyName.equals("author")) { // put data into the author family
                for (int j = 0; j < column2.length; j++) {
                    put.add(Bytes.toBytes(familyName),
                            Bytes.toBytes(column2[j]), Bytes.toBytes(value2[j]));
                }
            }
        }
        table.put(put);
        System.out.println("add data Success!");
    }

    /*
     * Query by row key.
     * @rowKey    row key
     * @tableName table name
     */
    public static Result getResult(String tableName, String rowKey)
            throws IOException {
        Get get = new Get(Bytes.toBytes(rowKey));
        HTable table = new HTable(conf, Bytes.toBytes(tableName)); // open the table
        Result result = table.get(get);
        for (KeyValue kv : result.list()) {
            System.out.println("family:" + Bytes.toString(kv.getFamily()));
            System.out.println("qualifier:" + Bytes.toString(kv.getQualifier()));
            System.out.println("value:" + Bytes.toString(kv.getValue()));
            System.out.println("Timestamp:" + kv.getTimestamp());
            System.out.println("-------------------------------------------");
        }
        return result;
    }

    /*
     * Scan the whole table.
     * @tableName table name
     */
    public static void getResultScann(String tableName) throws IOException {
        Scan scan = new Scan();
        ResultScanner rs = null;
        HTable table = new HTable(conf, Bytes.toBytes(tableName));
        try {
            rs = table.getScanner(scan);
            for (Result r : rs) {
                for (KeyValue kv : r.list()) {
                    System.out.println("row:" + Bytes.toString(kv.getRow()));
                    System.out.println("family:" + Bytes.toString(kv.getFamily()));
                    System.out.println("qualifier:" + Bytes.toString(kv.getQualifier()));
                    System.out.println("value:" + Bytes.toString(kv.getValue()));
                    System.out.println("timestamp:" + kv.getTimestamp());
                    System.out.println("-------------------------------------------");
                }
            }
        } finally {
            rs.close();
        }
    }

    /*
     * Scan the table over a row-key range.
     * @tableName table name
     */
    public static void getResultScann(String tableName, String start_rowkey,
            String stop_rowkey) throws IOException {
        Scan scan = new Scan();
        scan.setStartRow(Bytes.toBytes(start_rowkey));
        scan.setStopRow(Bytes.toBytes(stop_rowkey));
        ResultScanner rs = null;
        HTable table = new HTable(conf, Bytes.toBytes(tableName));
        try {
            rs = table.getScanner(scan);
            for (Result r : rs) {
                for (KeyValue kv : r.list()) {
                    System.out.println("row:" + Bytes.toString(kv.getRow()));
                    System.out.println("family:" + Bytes.toString(kv.getFamily()));
                    System.out.println("qualifier:" + Bytes.toString(kv.getQualifier()));
                    System.out.println("value:" + Bytes.toString(kv.getValue()));
                    System.out.println("timestamp:" + kv.getTimestamp());
                    System.out.println("-------------------------------------------");
                }
            }
        } finally {
            rs.close();
        }
    }

    /*
     * Query a single column of a row.
     * @tableName table name
     * @rowKey    row key
     */
    public static void getResultByColumn(String tableName, String rowKey,
            String familyName, String columnName) throws IOException {
        HTable table = new HTable(conf, Bytes.toBytes(tableName));
        Get get = new Get(Bytes.toBytes(rowKey));
        get.addColumn(Bytes.toBytes(familyName), Bytes.toBytes(columnName)); // select one family:qualifier
        Result result = table.get(get);
        for (KeyValue kv : result.list()) {
            System.out.println("family:" + Bytes.toString(kv.getFamily()));
            System.out.println("qualifier:" + Bytes.toString(kv.getQualifier()));
            System.out.println("value:" + Bytes.toString(kv.getValue()));
            System.out.println("Timestamp:" + kv.getTimestamp());
            System.out.println("-------------------------------------------");
        }
    }

    /*
     * Update a single column of a row.
     * @tableName  table name
     * @rowKey     row key
     * @familyName column family name
     * @columnName column name
     * @value      the new value
     */
    public static void updateTable(String tableName, String rowKey,
            String familyName, String columnName, String value)
            throws IOException {
        HTable table = new HTable(conf, Bytes.toBytes(tableName));
        Put put = new Put(Bytes.toBytes(rowKey));
        put.add(Bytes.toBytes(familyName), Bytes.toBytes(columnName),
                Bytes.toBytes(value));
        table.put(put);
        System.out.println("update table Success!");
    }

    /*
     * Query multiple versions of a column.
     * @tableName  table name
     * @rowKey     row key
     * @familyName column family name
     * @columnName column name
     */
    public static void getResultByVersion(String tableName, String rowKey,
            String familyName, String columnName) throws IOException {
        HTable table = new HTable(conf, Bytes.toBytes(tableName));
        Get get = new Get(Bytes.toBytes(rowKey));
        get.addColumn(Bytes.toBytes(familyName), Bytes.toBytes(columnName));
        get.setMaxVersions(5);
        Result result = table.get(get);
        for (KeyValue kv : result.list()) {
            System.out.println("family:" + Bytes.toString(kv.getFamily()));
            System.out.println("qualifier:" + Bytes.toString(kv.getQualifier()));
            System.out.println("value:" + Bytes.toString(kv.getValue()));
            System.out.println("Timestamp:" + kv.getTimestamp());
            System.out.println("-------------------------------------------");
        }
        /*
         * List<?> results = table.get(get).list(); Iterator<?> it =
         * results.iterator(); while (it.hasNext()) {
         * System.out.println(it.next().toString()); }
         */
    }

    /*
     * Delete a specific column.
     * @tableName  table name
     * @rowKey     row key
     * @familyName column family name
     * @columnName column name
     */
    public static void deleteColumn(String tableName, String rowKey,
            String familyName, String columnName) throws IOException {
        HTable table = new HTable(conf, Bytes.toBytes(tableName));
        Delete deleteColumn = new Delete(Bytes.toBytes(rowKey));
        deleteColumn.deleteColumns(Bytes.toBytes(familyName),
                Bytes.toBytes(columnName));
        table.delete(deleteColumn);
        System.out.println(familyName + ":" + columnName + " is deleted!");
    }

    /*
     * Delete all columns of a row.
     * @tableName table name
     * @rowKey    row key
     */
    public static void deleteAllColumn(String tableName, String rowKey)
            throws IOException {
        HTable table = new HTable(conf, Bytes.toBytes(tableName));
        Delete deleteAll = new Delete(Bytes.toBytes(rowKey));
        table.delete(deleteAll);
        System.out.println("all columns are deleted!");
    }

    /*
     * Drop a table.
     * @tableName table name
     */
    public static void deleteTable(String tableName) throws IOException {
        HBaseAdmin admin = new HBaseAdmin(conf);
        admin.disableTable(tableName);
        admin.deleteTable(tableName);
        System.out.println(tableName + " is deleted!");
    }

    public static void main(String[] args) throws Exception {

        // create the table
        String tableName = "blog2";
        String[] family = { "article", "author" };
        // creatTable(tableName, family);

        // insert rows
        String[] column1 = { "title", "content", "tag" };
        String[] value1 = {
                "Head First HBase",
                "HBase is the Hadoop database. Use it when you need random, realtime read/write access to your Big Data.",
                "Hadoop,HBase,NoSQL" };
        String[] column2 = { "name", "nickname" };
        String[] value2 = { "nicholas", "lee" };
        addData("rowkey1", "blog2", column1, value1, column2, value2);
        addData("rowkey2", "blog2", column1, value1, column2, value2);
        addData("rowkey3", "blog2", column1, value1, column2, value2);

        // full-table scan
        getResultScann("blog2");
        // scan over a row-key range
        getResultScann("blog2", "rowkey4", "rowkey5");

        // query one row
        getResult("blog2", "rowkey1");

        // query one column of a row
        getResultByColumn("blog2", "rowkey1", "author", "name");

        // update a column
        updateTable("blog2", "rowkey1", "author", "name", "bin");

        // query the column again
        getResultByColumn("blog2", "rowkey1", "author", "name");

        // query multiple versions of a column
        getResultByVersion("blog2", "rowkey1", "author", "name");

        // delete one column
        deleteColumn("blog2", "rowkey1", "author", "nickname");

        // delete all columns of a row
        deleteAllColumn("blog2", "rowkey1");

        // drop the table
        deleteTable("blog2");

    }
}

 

Basic HBase usage example:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.MasterNotRunningException;
import org.apache.hadoop.hbase.ZooKeeperConnectionException;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.HTablePool;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.Filter;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.util.Bytes;

public class HbaseTest {

    public static Configuration configuration;
    static {
        configuration = HBaseConfiguration.create();
        configuration.set("hbase.zookeeper.property.clientPort", "2181");
        configuration.set("hbase.zookeeper.quorum", "192.168.1.100");
        configuration.set("hbase.master", "192.168.1.100:600000");
    }

    public static void main(String[] args) {
        // createTable("wujintao");
        // insertData("wujintao");
        // QueryAll("wujintao");
        // QueryByCondition1("wujintao");
        // QueryByCondition2("wujintao");
        // QueryByCondition3("wujintao");
        // deleteRow("wujintao", "abcdef");
        deleteByCondition("wujintao", "abcdef");
    }

    public static void createTable(String tableName) {
        System.out.println("start create table ......");
        try {
            HBaseAdmin hBaseAdmin = new HBaseAdmin(configuration);
            if (hBaseAdmin.tableExists(tableName)) { // if the table already exists, drop it first, then recreate it
                hBaseAdmin.disableTable(tableName);
                hBaseAdmin.deleteTable(tableName);
                System.out.println(tableName + " already exists, deleting...");
            }
            HTableDescriptor tableDescriptor = new HTableDescriptor(tableName);
            tableDescriptor.addFamily(new HColumnDescriptor("column1"));
            tableDescriptor.addFamily(new HColumnDescriptor("column2"));
            tableDescriptor.addFamily(new HColumnDescriptor("column3"));
            hBaseAdmin.createTable(tableDescriptor);
        } catch (MasterNotRunningException e) {
            e.printStackTrace();
        } catch (ZooKeeperConnectionException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
        System.out.println("end create table ......");
    }

    public static void insertData(String tableName) {
        System.out.println("start insert data ......");
        HTablePool pool = new HTablePool(configuration, 1000);
        HTable table = (HTable) pool.getTable(tableName);
        // One Put represents one row; create another Put for a second row. Every row
        // has a unique row key, which is the value passed to the Put constructor.
        Put put = new Put("112233bbbcccc".getBytes());
        put.add("column1".getBytes(), null, "aaa".getBytes()); // first column of this row
        put.add("column2".getBytes(), null, "bbb".getBytes()); // second column of this row
        put.add("column3".getBytes(), null, "ccc".getBytes()); // third column of this row
        try {
            table.put(put);
        } catch (IOException e) {
            e.printStackTrace();
        }
        System.out.println("end insert data ......");
    }

    public static void dropTable(String tableName) {
        try {
            HBaseAdmin admin = new HBaseAdmin(configuration);
            admin.disableTable(tableName);
            admin.deleteTable(tableName);
        } catch (MasterNotRunningException e) {
            e.printStackTrace();
        } catch (ZooKeeperConnectionException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public static void deleteRow(String tablename, String rowkey) {
        try {
            HTable table = new HTable(configuration, tablename);
            List<Delete> list = new ArrayList<Delete>();
            Delete d1 = new Delete(rowkey.getBytes());
            list.add(d1);

            table.delete(list);
            System.out.println("row deleted successfully!");

        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public static void deleteByCondition(String tablename, String rowkey) {
        // So far I have not found an API that deletes by a non-rowkey condition
        // in a single call, nor one that truncates all data in a table.
    }
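    // Although there is no single-call conditional-delete API, the standard
    // workaround is a scan with a filter that collects the matching row keys,
    // followed by a batched delete. A hedged sketch (the method and its
    // family/qualifier parameters are illustrative, not from the original
    // post); note it is not atomic: rows written between the scan and the
    // delete can be missed.
    public static void deleteWhereColumnEquals(String tablename, byte[] family,
            byte[] qualifier, byte[] expected) throws IOException {
        HTable table = new HTable(configuration, tablename);
        Scan scan = new Scan();
        scan.setFilter(new SingleColumnValueFilter(family, qualifier,
                CompareOp.EQUAL, expected));
        List<Delete> deletes = new ArrayList<Delete>();
        ResultScanner rs = table.getScanner(scan);
        try {
            for (Result r : rs) {
                deletes.add(new Delete(r.getRow())); // collect matching row keys
            }
        } finally {
            rs.close();
        }
        table.delete(deletes); // batched delete of all matched rows
        table.close();
    }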

 

 

 

    public static void QueryAll(String tableName) {
        HTablePool pool = new HTablePool(configuration, 1000);
        HTable table = (HTable) pool.getTable(tableName);
        try {
            ResultScanner rs = table.getScanner(new Scan());
            for (Result r : rs) {
                System.out.println("row key: " + new String(r.getRow()));
                for (KeyValue keyValue : r.raw()) {
                    System.out.println("family: " + new String(keyValue.getFamily())
                            + " ==== value: " + new String(keyValue.getValue()));
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public static void QueryByCondition1(String tableName) {

        HTablePool pool = new HTablePool(configuration, 1000);
        HTable table = (HTable) pool.getTable(tableName);
        try {
            Get scan = new Get("abcdef".getBytes()); // query by row key
            Result r = table.get(scan);
            System.out.println("row key: " + new String(r.getRow()));
            for (KeyValue keyValue : r.raw()) {
                System.out.println("family: " + new String(keyValue.getFamily())
                        + " ==== value: " + new String(keyValue.getValue()));
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public static void QueryByCondition2(String tableName) {

        try {
            HTablePool pool = new HTablePool(configuration, 1000);
            HTable table = (HTable) pool.getTable(tableName);
            // select rows where the value in family column1 equals "aaa"
            Filter filter = new SingleColumnValueFilter(
                    Bytes.toBytes("column1"), null, CompareOp.EQUAL,
                    Bytes.toBytes("aaa"));
            Scan s = new Scan();
            s.setFilter(filter);
            ResultScanner rs = table.getScanner(s);
            for (Result r : rs) {
                System.out.println("row key: " + new String(r.getRow()));
                for (KeyValue keyValue : r.raw()) {
                    System.out.println("family: " + new String(keyValue.getFamily())
                            + " ==== value: " + new String(keyValue.getValue()));
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public static void QueryByCondition3(String tableName) {

        try {
            HTablePool pool = new HTablePool(configuration, 1000);
            HTable table = (HTable) pool.getTable(tableName);

            List<Filter> filters = new ArrayList<Filter>();

            Filter filter1 = new SingleColumnValueFilter(
                    Bytes.toBytes("column1"), null, CompareOp.EQUAL,
                    Bytes.toBytes("aaa"));
            filters.add(filter1);

            Filter filter2 = new SingleColumnValueFilter(
                    Bytes.toBytes("column2"), null, CompareOp.EQUAL,
                    Bytes.toBytes("bbb"));
            filters.add(filter2);

            Filter filter3 = new SingleColumnValueFilter(
                    Bytes.toBytes("column3"), null, CompareOp.EQUAL,
                    Bytes.toBytes("ccc"));
            filters.add(filter3);

            FilterList filterList1 = new FilterList(filters);

            Scan scan = new Scan();
            scan.setFilter(filterList1);
            ResultScanner rs = table.getScanner(scan);
            for (Result r : rs) {
                System.out.println("row key: " + new String(r.getRow()));
                for (KeyValue keyValue : r.raw()) {
                    System.out.println("family: " + new String(keyValue.getFamily())
                            + " ==== value: " + new String(keyValue.getValue()));
                }
            }
            rs.close();

        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

HBase data retrieval example:

/*
 * Need Packages:
 * commons-codec-1.4.jar
 * commons-logging-1.1.1.jar
 * hadoop-0.20.2-core.jar
 * hbase-0.90.2.jar
 * log4j-1.2.16.jar
 * zookeeper-3.3.2.jar
 */

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class HbaseSelecter {
    public static Configuration configuration = null;
    static {
        configuration = HBaseConfiguration.create();
        // configuration.set("hbase.master", "192.168.0.201:60000");
        configuration.set("hbase.zookeeper.quorum", "idc01-hd-nd-03,idc01-hd-nd-04,idc01-hd-nd-05");
        // configuration.set("hbase.zookeeper.property.clientPort", "2181");
    }

    public static void selectRowKey(String tablename, String rowKey) throws IOException {
        HTable table = new HTable(configuration, tablename);
        Get g = new Get(rowKey.getBytes());
        Result rs = table.get(g);

        for (KeyValue kv : rs.raw()) {
            System.out.println("--------------------" + new String(kv.getRow()) + "----------------------------");
            System.out.println("Column Family: " + new String(kv.getFamily()));
            System.out.println("Column :" + new String(kv.getQualifier()));
            System.out.println("value : " + new String(kv.getValue()));
        }
    }

    public static void selectRowKeyFamily(String tablename, String rowKey, String family) throws IOException {
        HTable table = new HTable(configuration, tablename);
        Get g = new Get(rowKey.getBytes());
        g.addFamily(Bytes.toBytes(family));
        Result rs = table.get(g);
        for (KeyValue kv : rs.raw()) {
            System.out.println("--------------------" + new String(kv.getRow()) + "----------------------------");
            System.out.println("Column Family: " + new String(kv.getFamily()));
            System.out.println("Column :" + new String(kv.getQualifier()));
            System.out.println("value : " + new String(kv.getValue()));
        }
    }

    public static void selectRowKeyFamilyColumn(String tablename, String rowKey, String family, String column)
            throws IOException {
        HTable table = new HTable(configuration, tablename);
        Get g = new Get(rowKey.getBytes());
        g.addColumn(family.getBytes(), column.getBytes());

        Result rs = table.get(g);

        for (KeyValue kv : rs.raw()) {
            System.out.println("--------------------" + new String(kv.getRow()) + "----------------------------");
            System.out.println("Column Family: " + new String(kv.getFamily()));
            System.out.println("Column :" + new String(kv.getQualifier()));
            System.out.println("value : " + new String(kv.getValue()));
        }
    }

    public static void selectFilter(String tablename, List<String> arr) throws IOException {
        HTable table = new HTable(configuration, tablename);
        Scan scan = new Scan(); // create a scan definition
        FilterList filterList = new FilterList(); // list of filters; conditions are ANDed together

        for (String v : arr) { // index 0 is the column family, 1 the qualifier, 2 the expected value
            String[] wheres = v.split(",");

            filterList.addFilter(new SingleColumnValueFilter( // one filter per condition
                    wheres[0].getBytes(), wheres[1].getBytes(),
                    CompareOp.EQUAL,
                    wheres[2].getBytes()));
        }
        scan.setFilter(filterList);
        ResultScanner ResultScannerFilterList = table.getScanner(scan);
        for (Result rs = ResultScannerFilterList.next(); rs != null; rs = ResultScannerFilterList.next()) {
            for (KeyValue kv : rs.list()) {
                System.out.println("--------------------" + new String(kv.getRow()) + "----------------------------");
                System.out.println("Column Family: " + new String(kv.getFamily()));
                System.out.println("Column :" + new String(kv.getQualifier()));
                System.out.println("value : " + new String(kv.getValue()));
            }
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.out.println("Usage: HbaseSelecter table key");
            System.exit(-1);
        }

        System.out.println("Table: " + args[0] + " , key: " + args[1]);
        selectRowKey(args[0], args[1]);

        /*
        System.out.println("------------------------ row-key query ----------------------------------");
        selectRowKey("b2c", "yihaodian1002865");
        selectRowKey("b2c", "yihaodian1003396");

        System.out.println("------------------------ row key + column family query ----------------------------------");
        selectRowKeyFamily("riapguh", "用户A", "user");
        selectRowKeyFamily("riapguh", "用户B", "user");

        System.out.println("------------------------ row key + column family + column query ----------------------------------");
        selectRowKeyFamilyColumn("riapguh", "用户A", "user", "user_code");
        selectRowKeyFamilyColumn("riapguh", "用户B", "user", "user_code");

        System.out.println("------------------------ filtered query ----------------------------------");
        List<String> arr = new ArrayList<String>();
        arr.add("dpt,dpt_code,d_001");
        arr.add("user,user_code,u_0001");
        selectFilter("riapguh", arr);
        */
    }
}

HBase example: exporting specific columns (small data volumes):

/*
 * Need Packages:
 * commons-codec-1.4.jar
 * commons-logging-1.1.1.jar
 * hadoop-0.20.2-core.jar
 * hbase-0.90.2.jar
 * log4j-1.2.16.jar
 * zookeeper-3.3.2.jar
 *
 * Example: javac -classpath ./:/data/chenzhenjing/code/panama/lib/hbase-0.90.2.jar:/data/chenzhenjing/code/panama/lib/hadoop-core-0.20-append-for-hbase.jar:/data/chenzhenjing/code/panama/lib/commons-logging-1.0.4.jar:/data/chenzhenjing/code/panama/lib/commons-lang-2.4.jar:/data/chenzhenjing/code/panama/lib/commons-io-1.2.jar:/data/chenzhenjing/code/panama/lib/zookeeper-3.3.2.jar:/data/chenzhenjing/code/panama/lib/log4j-1.2.15.jar:/data/chenzhenjing/code/panama/lib/commons-codec-1.3.jar DiffHbase.java
 */

import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.text.SimpleDateFormat;
import java.util.Date;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

class ColumnUtils {

    public static byte[] getFamily(String column) {
        return getBytes(column, 0);
    }

    public static byte[] getQualifier(String column) {
        return getBytes(column, 1);
    }

    private static byte[] getBytes(String column, int offset) {
        String[] split = column.split(":");
        return Bytes.toBytes(offset > split.length - 1 ? split[0] : split[offset]);
    }
}

public class DiffHbase {
    public static Configuration configuration = null;
    static {
        configuration = HBaseConfiguration.create();
        configuration.set("hbase.zookeeper.quorum", "idc01-hd-ds-01,idc01-hd-ds-02,idc01-hd-ds-03");
    }

    public static void selectRowKey(String tablename, String rowKey) throws IOException {
        HTable table = new HTable(configuration, tablename);
        Get g = new Get(rowKey.getBytes());
        Result rs = table.get(g);

        for (KeyValue kv : rs.raw()) {
            System.out.println("--------------------" + new String(kv.getRow()) + "----------------------------");
            System.out.println("Column Family: " + new String(kv.getFamily()));
            System.out.println("Column :" + new String(kv.getQualifier()) + "\t");
            System.out.println("value : " + new String(kv.getValue()));
        }
    }

    public static void selectRowKeyFamily(String tablename, String rowKey, String family) throws IOException {
        HTable table = new HTable(configuration, tablename);
        Get g = new Get(rowKey.getBytes());
        g.addFamily(Bytes.toBytes(family));
        Result rs = table.get(g);
        for (KeyValue kv : rs.raw()) {
            System.out.println("--------------------" + new String(kv.getRow()) + "----------------------------");
            System.out.println("Column Family: " + new String(kv.getFamily()));
            System.out.println("Column :" + new String(kv.getQualifier()) + "\t");
            System.out.println("value : " + new String(kv.getValue()));
        }
    }

    public static void selectRowKeyFamilyColumn(String tablename, String rowKey, String family, String column)
            throws IOException {
        HTable table = new HTable(configuration, tablename);
        Get g = new Get(rowKey.getBytes());
        g.addColumn(family.getBytes(), column.getBytes());

        Result rs = table.get(g);

        for (KeyValue kv : rs.raw()) {
            System.out.println("--------------------" + new String(kv.getRow()) + "----------------------------");
            System.out.println("Column Family: " + new String(kv.getFamily()));
            System.out.println("Column :" + new String(kv.getQualifier()) + "\t");
            System.out.println("value : " + new String(kv.getValue()));
        }
    }

    private static final String USAGE = "Usage: DiffHbase [-o outfile] tablename infile filterColumns...";

    /**
     * Prints the usage message and exits the program.
     *
     * @param message The message to print first.
     */
    private static void printUsage(String message) {
        System.err.println(message);
        System.err.println(USAGE);
        throw new RuntimeException(USAGE);
    }

    private static void PrintId(String id, Result rs) {
        String value = Bytes.toString(rs.getValue(ColumnUtils.getFamily("info:url"), ColumnUtils.getQualifier("info:url")));
        if (value == null) {
            System.out.println(id + "\tNULL");
        } else {
            System.out.println(id + "\t" + value);
        }
    }

    private static void WriteId(String id, Result rs, FileOutputStream os) {
        String value = Bytes.toString(rs.getValue(ColumnUtils.getFamily("info:url"), ColumnUtils.getQualifier("info:url")));
        try {
            if (value == null) {
                os.write((id + "\tNULL\n").getBytes());
            } else {
                os.write((id + "\t" + value + "\n").getBytes());
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    private static void PrintRow(String id, Result rs) {

        System.out.println("--------------------" + id + "----------------------------");
        for (KeyValue kv : rs.raw()) {
            System.out.println(new String(kv.getFamily()) + ":" + new String(kv.getQualifier()) + " : " + new String(kv.getValue()));
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 3) {
            printUsage("Too few arguments");
        }

        String outfile = null;
        String tablename = args[0];
        String dictfile = args[1];
        int skilLen = 2;

        if (args[0].equals("-o")) {
            outfile = args[1];
            tablename = args[2];
            dictfile = args[3];
            skilLen = 4;
        }

        HTable table = new HTable(configuration, tablename);

        String[] filterColumns = new String[args.length - skilLen];
        System.arraycopy(args, skilLen, filterColumns, 0, args.length - skilLen);

        System.out.println("filterColumns: ");
        for (int i = 0; i < filterColumns.length; ++i) {
            System.out.println("\t" + filterColumns[i]);
        }

        FileOutputStream os = null;
        if (outfile != null) {
            os = new FileOutputStream(outfile);
        }

        int count = 0;
        SimpleDateFormat df = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss"); // date format for progress logging

        File srcFile = new File(dictfile);
        FileInputStream in = new FileInputStream(srcFile);
        InputStreamReader isr = new InputStreamReader(in);
        BufferedReader br = new BufferedReader(isr);
        String read = null;
        while ((read = br.readLine()) != null) {
            String[] split = read.trim().split("\\s"); // space split
            if (split.length < 1) {
                System.out.println("Error line: " + read);
                continue;
            }

            if (++count % 1000 == 0) {
                // new Date() is the current system time
                System.out.println(df.format(new Date()) + " : " + count + " rows processed.");
            }
            // System.out.println("ROWKEY:" + split[0]);

            Get g = new Get(split[0].getBytes());
            Result rs = table.get(g);
            if (rs == null) {
                System.out.println("No Result for " + split[0]);
                continue;
            }

            for (int i = 0; i < filterColumns.length; ++i) {
                String value = Bytes.toString(rs.getValue(ColumnUtils.getFamily(filterColumns[i]), ColumnUtils.getQualifier(filterColumns[i])));
                if (value == null) {
                    if (os == null) {
                        PrintId(split[0], rs);
                    } else {
                        WriteId(split[0], rs, os);
                    }

                    // PrintRow(split[0], rs);
                    break;
                }
            }
        }

        br.close();
        isr.close();
        in.close();

    }
}

HBase MapReduce example: full-table scan (large data volumes):

package com.hbase.mapreduce;

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HConstants;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.util.GenericOptionsParser;

import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.filter.CompareFilter;
import org.apache.hadoop.hbase.filter.BinaryComparator;
import org.apache.hadoop.hbase.util.Bytes;

import com.hbase.utils.ColumnUtils;

public class ExportHbase {
    private static final String INFOCATEGORY = "info:storecategory";

    private static final String USAGE = "Usage: ExportHbase " +
            "-r <numReduceTasks> -indexConf <iconfFile>\n" +
            "-indexDir <indexDir> -webSite <amazon> [-needupdate <true> -isVisible -startTime <long>] -table <tableName> -columns <columnName1> " +
            "[<columnName2> ...]";

    /**
     * Prints the usage message and exits the program.
     *
     * @param message The message to print first.
     */
    private static void printUsage(String message) {
        System.err.println(message);
        System.err.println(USAGE);
        throw new RuntimeException(USAGE);
    }

    /**
     * Creates a new job.
     *
     * @param conf
     * @param args The command line arguments.
     * @throws IOException When reading the configuration fails.
     */
    public static Job createSubmittableJob(Configuration conf, String[] args)
            throws IOException {
        if (args.length < 7) {
            printUsage("Too few arguments");
        }

        int numReduceTasks = 1;
        String iconfFile = null;
        String indexDir = null;
        String tableName = null;
        String website = null;
        String needupdate = "";
        String expectShopGrade = "";
        String dino = "6";
        String isdebug = "0";
        long debugThreshold = 10000;
        String debugThresholdStr = Long.toString(debugThreshold);
        String queue = "offline";

        long endTime = Long.MAX_VALUE;
        int maxversions = 1;
        long startTime = System.currentTimeMillis() - 28 * 24 * 60 * 60 * 1000l;
        long distartTime = System.currentTimeMillis() - 30 * 24 * 60 * 60 * 1000l;
        long diusedTime = System.currentTimeMillis() - 30 * 24 * 60 * 60 * 1000l;
        String startTimeStr = Long.toString(startTime);
        String diusedTimeStr = Long.toString(diusedTime);
        String quorum = null;

        String isVisible = "";
        List<String> columns = new ArrayList<String>();

        boolean bFilter = false;

        // parse args
        for (int i = 0; i < args.length - 1; i++) {
            if ("-r".equals(args[i])) {
                numReduceTasks = Integer.parseInt(args[++i]);
            } else if ("-indexConf".equals(args[i])) {
                iconfFile = args[++i];
            } else if ("-indexDir".equals(args[i])) {
                indexDir = args[++i];
            } else if ("-table".equals(args[i])) {
                tableName = args[++i];
            } else if ("-webSite".equals(args[i])) {
                website = args[++i];
            } else if ("-startTime".equals(args[i])) {
                startTimeStr = args[++i];
                startTime = Long.parseLong(startTimeStr);
            } else if ("-needupdate".equals(args[i])) {
                needupdate = args[++i];
            } else if ("-isVisible".equals(args[i])) {
                isVisible = "true";
            } else if ("-shopgrade".equals(args[i])) {
                expectShopGrade = args[++i];
            } else if ("-queue".equals(args[i])) {
                queue = args[++i];
            } else if ("-dino".equals(args[i])) {
                dino = args[++i];
            } else if ("-maxversions".equals(args[i])) {
                maxversions = Integer.parseInt(args[++i]);
            } else if ("-distartTime".equals(args[i])) {
                distartTime = Long.parseLong(args[++i]);
            } else if ("-diendTime".equals(args[i])) {
                endTime = Long.parseLong(args[++i]);
            } else if ("-diusedTime".equals(args[i])) {
                diusedTimeStr = args[++i];
                diusedTime = Long.parseLong(diusedTimeStr);
            } else if ("-quorum".equals(args[i])) {
                quorum = args[++i];
            } else if ("-filter".equals(args[i])) {
                bFilter = true;
            } else if ("-columns".equals(args[i])) {
                columns.add(args[++i]);
                while (i + 1 < args.length && !args[i + 1].startsWith("-")) {
                    String columnname = args[++i];
                    columns.add(columnname);
                    System.out.println("args column----: " + columnname);
                }
            } else if ("-debugThreshold".equals(args[i])) {
                isdebug = "1";
                debugThresholdStr = args[++i];
                debugThreshold = Long.parseLong(debugThresholdStr);
            } else {
                printUsage("Unsupported option " + args[i]);
            }
        }

        if (distartTime > endTime) {
            printUsage("distartTime must <= diendTime");
        }

        if (indexDir == null || tableName == null || columns.isEmpty()) {
            printUsage("Index directory, table name and at least one column must " +
                    "be specified");
        }

        if (iconfFile != null) {
            // set index configuration content from a file
            String content = readContent(iconfFile);
            conf.set("hbase.index.conf", content);
            conf.set("hbase.website.name", website);
            conf.set("hbase.needupdate.productDB", needupdate);
            conf.set("hbase.expect.shopgrade", expectShopGrade);
            conf.set("hbase.di.no", dino);
            conf.set("hbase.expect.item.visible", isVisible);
            conf.set("hbase.index.startTime", startTimeStr);
            conf.set("hbase.index.diusedTime", diusedTimeStr);
            conf.set("hbase.index.debugThreshold", debugThresholdStr);
            conf.set("hbase.index.debug", isdebug);
            if (quorum != null) {
                conf.set("hbase.zookeeper.quorum", quorum);
            }
            String temp = "";
            for (String column : columns) {
                temp = temp + column + "|";
            }
            temp = temp.substring(0, temp.length() - 1);
            conf.set("hbase.index.column", temp);
            System.out.println("hbase.index.column: " + temp);
        }

        Job job = new Job(conf, "export data from table " + tableName);
        ((JobConf) job.getConfiguration()).setQueueName(queue);

        // number of indexes to partition into
        job.setNumReduceTasks(numReduceTasks);
        Scan scan = new Scan();
        scan.setCacheBlocks(false);

        // limit scan range
        scan.setTimeRange(distartTime, endTime);
        // scan.setMaxVersions(maxversions);
        scan.setMaxVersions(1);

        /* limit scan columns */
        for (String column : columns) {
            scan.addColumn(ColumnUtils.getFamily(column), ColumnUtils.getQualifier(column));
            scan.addFamily(ColumnUtils.getFamily(column));
        }

        // set filter
        if (bFilter) {
            System.out.println("only export guangtaobao data. ");
            SingleColumnValueFilter filter = new SingleColumnValueFilter(
                    Bytes.toBytes("info"),
                    Bytes.toBytes("producttype"),
                    CompareFilter.CompareOp.EQUAL,
                    new BinaryComparator(Bytes.toBytes("guangtaobao")));
            filter.setFilterIfMissing(true);
            scan.setFilter(filter);
        }

        TableMapReduceUtil.initTableMapperJob(tableName, scan, ExportHbaseMapper.class,
                Text.class, Text.class, job);
        // job.setReducerClass(ExportHbaseReducer.class);
        FileOutputFormat.setOutputPath(job, new Path(indexDir));

        return job;
    }

    /**
     * Reads an xml file of indexing configurations. The xml format is similar to
     * hbase-default.xml and hadoop-default.xml. For an example configuration,
     * see the <code>createIndexConfContent</code> method in TestTableIndex.
     *
     * @param fileName The file to read.
     * @return XML configuration read from file.
     * @throws IOException When the XML is broken.
     */
    private static String readContent(String fileName) throws IOException {
        File file = new File(fileName);
        int length = (int) file.length();
        if (length == 0) {
            printUsage("Index configuration file " + fileName + " does not exist");
        }

        int bytesRead = 0;
        byte[] bytes = new byte[length];
        FileInputStream fis = new FileInputStream(file);

        try {
            // read entire file into content
            while (bytesRead < length) {
                int read = fis.read(bytes, bytesRead, length - bytesRead);
                if (read > 0) {
                    bytesRead += read;
                } else {
                    break;
                }
            }
        } finally {
            fis.close();
        }

        return new String(bytes, 0, bytesRead, HConstants.UTF8_ENCODING);
    }

    /**
     * The main entry point.
     *
     * @param args The command line arguments.
     * @throws Exception When running the job fails.
     */
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        String[] otherArgs =
                new GenericOptionsParser(conf, args).getRemainingArgs();
        Job job = createSubmittableJob(conf, otherArgs);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }

}

 

//

 

package com.hbase.mapreduce;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.conf.Configurable;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;

import com.hbase.utils.ColumnUtils;

/**
 * Pass the given key and record as-is to the reduce phase.
 */
@SuppressWarnings("deprecation")
public class ExportHbaseMapper extends TableMapper<Text, Text> implements Configurable {
    private static final Text keyTEXT = new Text();
    private static final Text SENDTEXT = new Text();

    private Configuration conf = null;

    private long startTime = 0;
    List<String> columnMap = null;

    private long rCount = 0;
    private long errCount = 0;
    private int debug = 0;
    private long thresCount = 10000;

    public void map(ImmutableBytesWritable key, Result value, Context context)
            throws IOException, InterruptedException {

        rCount++;

        String itemid = Bytes.toString(key.get());
        if (itemid.contains("&")) {
            context.getCounter("Error", "rowkey contains \"&\"").increment(1);
            return;
        }

        StringBuffer outstr = new StringBuffer();
        for (String col : columnMap) {

            String tmp = Bytes.toString(value.getValue(ColumnUtils.getFamily(col),
                    ColumnUtils.getQualifier(col)));
            if (tmp == null) {
                context.getCounter("Error", col + " No value in hbase").increment(1);

                errCount++;
                if (debug > 0 && (errCount % thresCount == 0)) {
                    System.err.println(itemid + ": doesn't have " + col + " data!");
                }

                outstr.append("NULL" + "\t");
            } else {
                if (tmp.contains("guangtaobao")) {
                    outstr.append("1" + "\t");
                } else {
                    outstr.append(tmp.trim() + "\t");
                }
            }
        }

        if (!outstr.toString().isEmpty()) {

            SENDTEXT.set(outstr.toString());
            keyTEXT.set(itemid);
            context.write(keyTEXT, SENDTEXT);

            if (debug > 0 && (rCount % (thresCount * 10000) == 0)) {
                System.out.println(SENDTEXT.toString() + keyTEXT.toString());
            }
        } else {
            context.getCounter("Error", "No Column output").increment(1);
            return;
        }
    }

    /**
     * Returns the current configuration.
     *
     * @return The current configuration.
     * @see org.apache.hadoop.conf.Configurable#getConf()
     */
    @Override
    public Configuration getConf() {
        return conf;
    }

    /**
     * Sets the configuration. This is used to set up the index configuration.
     *
     * @param configuration The configuration to set.
     * @see org.apache.hadoop.conf.Configurable#setConf(org.apache.hadoop.conf.Configuration)
     */
    @Override
    public void setConf(Configuration configuration) {
        this.conf = configuration;

        startTime = Long.parseLong(conf.get("hbase.index.startTime"));
        thresCount = Long.parseLong(conf.get("hbase.index.debugThreshold"));
        debug = Integer.parseInt(conf.get("hbase.index.debug"));

        String[] columns = conf.get("hbase.index.column").split("\\|");

        columnMap = new ArrayList<String>();
        for (String column : columns) {
            System.out.println("Output column: " + column);

            columnMap.add(column);
        }

    }

}

 

 

//

 

package com.hbase.utils;

import org.apache.hadoop.hbase.util.Bytes;

public class ColumnUtils {

    public static byte[] getFamily(String column) {
        return getBytes(column, 0);
    }

    public static byte[] getQualifier(String column) {
        return getBytes(column, 1);
    }

    private static byte[] getBytes(String column, int offset) {
        String[] split = column.split(":");
        return Bytes.toBytes(offset > split.length - 1 ? split[0] : split[offset]);
    }
}
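A brief note on ColumnUtils: it splits a "family:qualifier" string into its two byte[] parts and falls back to the family when no qualifier is present. A hedged usage sketch:

byte[] fam  = ColumnUtils.getFamily("info:url");     // -> bytes of "info"
byte[] qual = ColumnUtils.getQualifier("info:url");  // -> bytes of "url"
byte[] only = ColumnUtils.getQualifier("info");      // no ':' -> falls back to "info"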

Reposted from: https://www.cnblogs.com/BetterThanEver_Victor/p/7242699.html
