分布式系统HBase模块及Java API

最新推荐文章于 2024-04-30 16:18:39 发布

weixin_30470857

最新推荐文章于 2024-04-30 16:18:39 发布

阅读量57

点赞数

文章标签：大数据运维 ui

原文链接：http://www.cnblogs.com/BetterThanEver_Victor/p/7242699.html

版权

HBase是基于列存储的，每个列族都由几个文件保存，不同列族的文件是分离的

HBase只有一个索引——行键，通过巧妙的设计，HBase中的所有访问方法，或者通过行键访问，或者通过行键扫描，从而使得整个系统不会慢下来

HBase是一个稀疏、多维度、排序的映射表，这张表的索引是行键、列族、列限定符和时间戳，数据坐标HBase中需要根据行键、列族、列限定符和时间戳来确定一个单元格，因此，可以视为一个"四维坐标"，即[行键,列族, 列限定符,时间戳]

每个值是一个未经解释的字符串，没有数据类型

HBase实现原理

1、HBase的实现包括三个主要的功能组件：

（1）库函数：链接到每个客户端

（2）一个Master主服务器

（3）许多个Region服务器

(1、客户端包含访问HBase的接口，同时在缓存中维护着已经访问过的Region位置信息，用来加快后续数据访问过程

(2、Zookeeper可以帮助选举出一个Master作为集群的总管，并保证在任何时刻总有唯一一个Master在运行，这就避免了Master的"单点失效"问题

（Zookeeper是一个很好的集群管理工具，被大量用于分布式计算，提供配置维护、域名服务、分布式同步、组服务等。）

(3. Master

主服务器Master主要负责表和Region的管理工作：

管理用户对表的增加、删除、修改、查询等操作

实现不同Region服务器之间的负载均衡

在Region分裂或合并后，负责重新调整Region的分布

对发生故障失效的Region服务器上的Region进行迁移

(4. Region服务器

Region服务器是HBase中最核心的模块，负责维护分配给自己的Region，并响应用户的读写请求

(1、用户读写数据过程
用户写入数据时，被分配到相应Region服务器去执行

用户数据首先被写入到MemStore和Hlog中

只有当操作写入Hlog之后，commit()调用才会将其返回给客户端

当用户读取数据时，Region服务器会首先访问MemStore缓存，如果找不到，再去磁盘上面的StoreFile中寻找

(2、缓存的刷新

系统会周期性地把MemStore缓存里的内容刷写到磁盘的StoreFile文件中，清空缓存，并在Hlog里面写入一个标记、
每次刷写都生成一个新的StoreFile文件，因此，每个Store包含多个StoreFile文件

每个Region服务器都有一个自己的HLog文件，每次启动都检查该文件，确认最近一次执行缓存刷新操作之后是否发生新的写入操作；如果发现更新，则先写入MemStore，再刷写到StoreFile，最后删除旧的Hlog文件，开始为用户提供服务
(3、StroreFile的合并

每次刷写都生成一个新的StoreFile，数量太多，影响查找速度

调用Store.compact()把多个合并成一个

合并操作比较耗费资源，只有数量达到一个阈值才启动合并

3、Store工作原理

Store是Region服务器的核心

多个StoreFile合并成一个
触发分裂操作，1个父Region被分裂成两个子Region

单个StoreFile过大时，又

4、HLog工作原理

分布式环境必须要考虑系统出错。HBase采用HLog保证系统恢复

HBase系统为每个Region服务器配置了一个HLog文件，它是一种预写式日志（WriteAhead Log）

用户更新数据必须首先写入日志后，才能写入MemStore缓存，并且，直到MemStore缓存内容对应的日志已经写入磁盘，该缓存内容才能被刷写到磁盘

Zookeeper会实时监测每个Region服务器的状态，当某个Region服务器发生故障时，Zookeeper会通知Master

Master首先会处理该故障Region服务器上面遗留的HLog文件，这个遗留的HLog文件中包含了来自多个Region对象的日志记录

系统会根据每条日志记录所属的Region对象对HLog数据进行拆分，分别放到相应Region对象的目录下，然后，再将失效的Region重新分配到可用的Region服务器中，并把与该Region对象相关的HLog日志记录也发送给相应的Region服务器

Region服务器领取到分配给自己的Region对象以及与之相关的HLog日志记录以后，会重新做一遍日志记录中的各种操作，把日志记录中的数据写入到MemStore缓存中，然后，刷新到磁盘的StoreFile文件中，完成数据恢复

共用日志优点：提高对表的写操作性能；缺点：恢复时需要分拆日志

2、HBaseMaster默认基于Web的UI服务端口为60010，HBase region服务器默认基于Web的UI服务端口为60030.如果master运行在名为master.foo.com的主机中，mater的主页地址就是http://master.foo.com:60010，用户可以通过Web浏览器输入这个地址查看该页面

可以查看HBase集群的当前状态

HBase的版本：0.98.8-hadoop2

所需的依赖包：

commons-codec-1.7.jar

commons-collections-3.2.1.jar

commons-configuration-1.6.jar

commons-lang-2.6.jar

commons-logging-1.1.3.jar

guava-12.0.1.jar

hadoop-auth-2.5.0.jar

hadoop-common-2.5.0.jar

hbase-client-0.98.8-hadoop2.jar

hbase-common-0.98.8-hadoop2.jar

hbase-protocol-0.98.8-hadoop2.jar

htrace-core-2.04.jar

jackson-core-asl-1.9.13.jar

jackson-mapper-asl-1.9.13.jar

log4j-1.2.17.jar

mysql-connector-java-5.1.7-bin.jar

netty-3.6.6.Final.jar

protobuf-java-2.5.0.jar

slf4j-api-1.7.5.jar

slf4j-log4j12-1.7.5.jar

zookeeper-3.4.6.jar

首先我们要确保Web项目中已经把必要的Jar包添加到ClassPath了，下面我对一些HBase的连接做了小封装：

import org.apache.hadoop.conf.Configuration;

import org.apache.hadoop.hbase.HBaseConfiguration;

import org.apache.hadoop.hbase.client.HBaseAdmin;

/**

* @author a01513

public class HBaseConnector {

private static final String QUOREM = "192.168.103.50,192.168.103.51,192.168.103.52,192.168.103.53,192.168.103.54,192.168.103.55,192.168.103.56,192.168.103.57,192.168.103.58,192.168.103.59,192.168.103.60";

private static final String CLIENT_PORT = "2181";

private HBaseAdmin admin;

private Configuration conf;

public HBaseAdmin getHBaseAdmin(){

getConfiguration();

try {

admin = new HBaseAdmin(conf);

} catch (Exception e) {

e.printStackTrace();

}

return admin;

}

public Configuration getConfiguration(){

if(conf == null){

conf = HBaseConfiguration.create();

conf.set("hbase.zookeeper.quorum", QUOREM);

conf.set("hbase.zookeeper.property.clientPort", CLIENT_PORT);

}

return conf;

}

HBase JavaAPI

HBaseConfiguration

关系：org.apache.hadoop.hbase.HBaseConfiguration

作用：对HBase进行配置

返回值	函数	描述
void	addResource(Path file)	通过给定的路径所指的文件来添加资源
void	clear()	清空所有已设置的属性
string	get(String name)	获取属性名对应的值
String	getBoolean(String name, boolean defaultValue)	获取为boolean类型的属性值，如果其属性值类型部位boolean,则返回默认属性值
void	set(String name, String value)	通过属性名来设置值
void	setBoolean(String name, boolean value)	设置boolean类型的属性值

用法示例：

HBaseConfiguration hconfig = new HBaseConfiguration();
hconfig.set("hbase.zookeeper.property.clientPort","2181");

HBaseAdmin

关系：org.apache.hadoop.hbase.client.HBaseAdmin

作用：提供了一个接口来管理HBase数据库的表信息。它提供的方法包括：创建表，删除表，列出表项，使表有效或无效，以及添加或删除表列族成员等。

返回值	函数	描述
void	addColumn(String tableName, HColumnDescriptor column)	向一个已经存在的表添加咧
	checkHBaseAvailable(HBaseConfiguration conf)	静态函数，查看HBase是否处于运行状态
	createTable(HTableDescriptor desc)	创建一个表，同步操作
	deleteTable(byte[] tableName)	删除一个已经存在的表
	enableTable(byte[] tableName)	使表处于有效状态
	disableTable(byte[] tableName)	使表处于无效状态
HTableDescriptor[]	listTables()	列出所有用户控件表项
void	modifyTable(byte[] tableName, HTableDescriptor htd)	修改表的模式，是异步的操作，可能需要花费一定的时间
boolean	tableExists(String tableName)	检查表是否存在

用法示例：

HBaseAdmin admin = new HBaseAdmin(config);
admin.disableTable("tablename")

HTableDescriptor

关系：org.apache.hadoop.hbase.HTableDescriptor

作用：包含了表的名字极其对应表的列族

返回值	函数	描述
void	addFamily(HColumnDescriptor)	添加一个列族
HColumnDescriptor	removeFamily(byte[] column)	移除一个列族
byte[]	getName()	获取表的名字
byte[]	getValue(byte[] key)	获取属性的值
void	setValue(String key, String value)	设置属性的值

用法示例：

HTableDescriptor htd = new HTableDescriptor(table);
htd.addFamily(new HcolumnDescriptor("family"));

HTableDescriptor

关系：org.apache.hadoop.hbase.HTableDescriptor

作用：包含了表的名字极其对应表的列族

返回值	函数	描述
void	addFamily(HColumnDescriptor)	添加一个列族
HColumnDescriptor	removeFamily(byte[] column)	移除一个列族
byte[]	getName()	获取表的名字
byte[]	getValue(byte[] key)	获取属性的值
void	setValue(String key, String value)	设置属性的值

用法示例：

HTableDescriptor htd = new HTableDescriptor(table);
htd.addFamily(new HcolumnDescriptor("family"));

HTable

关系：org.apache.hadoop.hbase.client.HTable

作用：可以用来和HBase表直接通信。此方法对于更新操作来说是非线程安全的。

返回值	函数	描述
void	checkAdnPut(byte[] row, byte[] family, byte[] qualifier, byte[] value, Put put	自动的检查row/family/qualifier是否与给定的值匹配
void	close()	释放所有的资源或挂起内部缓冲区中的更新
Boolean	exists(Get get)	检查Get实例所指定的值是否存在于HTable的列中
Result	get(Get get)	获取指定行的某些单元格所对应的值
byte[][]	getEndKeys()	获取当前一打开的表每个区域的结束键值
ResultScanner	getScanner(byte[] family)	获取当前给定列族的scanner实例
HTableDescriptor	getTableDescriptor()	获取当前表的HTableDescriptor实例
byte[]	getTableName()	获取表名
static boolean	isTableEnabled(HBaseConfiguration conf, String tableName)	检查表是否有效
void	put(Put put)	向表中添加值

用法示例：

HTable table = new HTable(conf, Bytes.toBytes(tablename));
ResultScanner scanner = table.getScanner(family);

Put

关系：org.apache.hadoop.hbase.client.Put

作用：用来对单个行执行添加操作

返回值	函数	描述
Put	add(byte[] family, byte[] qualifier, byte[] value)	将指定的列和对应的值添加到Put实例中
Put	add(byte[] family, byte[] qualifier, long ts, byte[] value)	将指定的列和对应的值及时间戳添加到Put实例中
byte[]	getRow()	获取Put实例的行
RowLock	getRowLock()	获取Put实例的行锁
long	getTimeStamp()	获取Put实例的时间戳
boolean	isEmpty()	检查familyMap是否为空
Put	setTimeStamp(long timeStamp)	设置Put实例的时间戳

用法示例：

HTable table = new HTable(conf,Bytes.toBytes(tablename));
Put p = new Put(brow);//为指定行创建一个Put操作
p.add(family,qualifier,value);
table.put(p);

Get

关系：org.apache.hadoop.hbase.client.Get

作用：用来获取单个行的相关信息

返回值	函数	描述
Get	addColumn(byte[] family, byte[] qualifier)	获取指定列族和列修饰符对应的列
Get	addFamily(byte[] family)	通过指定的列族获取其对应列的所有列
Get	setTimeRange(long minStamp,long maxStamp)	获取指定取件的列的版本号
Get	setFilter(Filter filter)	当执行Get操作时设置服务器端的过滤器

用法示例：

HTable table = new HTable(conf, Bytes.toBytes(tablename));
Get g = new Get(Bytes.toBytes(row));

Result

关系：org.apache.hadoop.hbase.client.Result

作用：存储Get或者Scan操作后获取表的单行值。使用此类提供的方法可以直接获取值或者各种Map结构（key-value对）

返回值	函数	描述
boolean	containsColumn(byte[] family, byte[] qualifier)	检查指定的列是否存在
NavigableMap<byte[],byte[]>	getFamilyMap(byte[] family)	获取对应列族所包含的修饰符与值的键值对
byte[]	getValue(byte[] family, byte[] qualifier)	获取对应列的最新值

ResultScanner

关系：Interface

作用：客户端获取值的接口

返回值	函数	描述
void	close()	关闭scanner并释放分配给它的资源
Result	next()	获取下一行的值

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;

import org.apache.hadoop.hbase.HBaseConfiguration;

import org.apache.hadoop.hbase.HColumnDescriptor;

import org.apache.hadoop.hbase.HTableDescriptor;

import org.apache.hadoop.hbase.KeyValue;

import org.apache.hadoop.hbase.client.Delete;

import org.apache.hadoop.hbase.client.Get;

import org.apache.hadoop.hbase.client.HBaseAdmin;

import org.apache.hadoop.hbase.client.HTable;

import org.apache.hadoop.hbase.client.HTablePool;

import org.apache.hadoop.hbase.client.Put;

import org.apache.hadoop.hbase.client.Result;

import org.apache.hadoop.hbase.client.ResultScanner;

import org.apache.hadoop.hbase.client.Scan;

import org.apache.hadoop.hbase.util.Bytes;

public class Hbase {

// 声明静态配置

static Configuration conf = null;

static {

conf = HBaseConfiguration.create();

conf.set("hbase.zookeeper.quorum", "localhost");

}

* 创建表

* @tableName 表名

* @family 列族列表

public static void creatTable(String tableName, String[] family)

throws Exception {

HBaseAdmin admin = new HBaseAdmin(conf);

HTableDescriptor desc = new HTableDescriptor(tableName);

for (int i = 0; i < family.length; i++) {

desc.addFamily(new HColumnDescriptor(family[i]));

}

if (admin.tableExists(tableName)) {

System.out.println("table Exists!");

System.exit(0);

} else {

admin.createTable(desc);

System.out.println("create table Success!");

}

* 为表添加数据（适合知道有多少列族的固定表）

* @rowKey rowKey

* @tableName 表名

* @column1 第一个列族列表

* @value1 第一个列的值的列表

* @column2 第二个列族列表

* @value2 第二个列的值的列表

public static void addData(String rowKey, String tableName,

String[] column1, String[] value1, String[] column2, String[] value2)

throws IOException {

Put put = new Put(Bytes.toBytes(rowKey));// 设置rowkey

HTable table = new HTable(conf, Bytes.toBytes(tableName));// HTabel负责跟记录相关的操作如增删改查等//

// 获取表

HColumnDescriptor[] columnFamilies = table.getTableDescriptor() // 获取所有的列族

.getColumnFamilies();

for (int i = 0; i < columnFamilies.length; i++) {

String familyName = columnFamilies[i].getNameAsString(); // 获取列族名

if (familyName.equals("article")) { // article列族put数据

for (int j = 0; j < column1.length; j++) {

put.add(Bytes.toBytes(familyName),

Bytes.toBytes(column1[j]), Bytes.toBytes(value1[j]));

}

if (familyName.equals("author")) { // author列族put数据

for (int j = 0; j < column2.length; j++) {

put.add(Bytes.toBytes(familyName),

Bytes.toBytes(column2[j]), Bytes.toBytes(value2[j]));

}

table.put(put);

System.out.println("add data Success!");

}

* 根据rwokey查询

* @rowKey rowKey

* @tableName 表名

public static Result getResult(String tableName, String rowKey)

throws IOException {

Get get = new Get(Bytes.toBytes(rowKey));

HTable table = new HTable(conf, Bytes.toBytes(tableName));// 获取表

Result result = table.get(get);

for (KeyValue kv : result.list()) {

System.out.println("family:" + Bytes.toString(kv.getFamily()));

System.out

.println("qualifier:" + Bytes.toString(kv.getQualifier()));

System.out.println("value:" + Bytes.toString(kv.getValue()));

System.out.println("Timestamp:" + kv.getTimestamp());

System.out.println("-------------------------------------------");

}

return result;

}

* 遍历查询hbase表

* @tableName 表名

public static void getResultScann(String tableName) throws IOException {

Scan scan = new Scan();

ResultScanner rs = null;

HTable table = new HTable(conf, Bytes.toBytes(tableName));

try {

rs = table.getScanner(scan);

for (Result r : rs) {

for (KeyValue kv : r.list()) {

System.out.println("row:" + Bytes.toString(kv.getRow()));

System.out.println("family:"

+ Bytes.toString(kv.getFamily()));

System.out.println("qualifier:"

+ Bytes.toString(kv.getQualifier()));

System.out

.println("value:" + Bytes.toString(kv.getValue()));

System.out.println("timestamp:" + kv.getTimestamp());

System.out

.println("-------------------------------------------");

}

} finally {

rs.close();

}

* 遍历查询hbase表

* @tableName 表名

public static void getResultScann(String tableName, String start_rowkey,

String stop_rowkey) throws IOException {

Scan scan = new Scan();

scan.setStartRow(Bytes.toBytes(start_rowkey));

scan.setStopRow(Bytes.toBytes(stop_rowkey));

ResultScanner rs = null;

HTable table = new HTable(conf, Bytes.toBytes(tableName));

try {

rs = table.getScanner(scan);

for (Result r : rs) {

for (KeyValue kv : r.list()) {

System.out.println("row:" + Bytes.toString(kv.getRow()));

System.out.println("family:"

+ Bytes.toString(kv.getFamily()));

System.out.println("qualifier:"

+ Bytes.toString(kv.getQualifier()));

System.out

.println("value:" + Bytes.toString(kv.getValue()));

System.out.println("timestamp:" + kv.getTimestamp());

System.out

.println("-------------------------------------------");

}

} finally {

rs.close();

}

* 查询表中的某一列

* @tableName 表名

* @rowKey rowKey

public static void getResultByColumn(String tableName, String rowKey,

String familyName, String columnName) throws IOException {

HTable table = new HTable(conf, Bytes.toBytes(tableName));

Get get = new Get(Bytes.toBytes(rowKey));

get.addColumn(Bytes.toBytes(familyName), Bytes.toBytes(columnName)); // 获取指定列族和列修饰符对应的列

Result result = table.get(get);

for (KeyValue kv : result.list()) {

System.out.println("family:" + Bytes.toString(kv.getFamily()));

System.out

.println("qualifier:" + Bytes.toString(kv.getQualifier()));

System.out.println("value:" + Bytes.toString(kv.getValue()));

System.out.println("Timestamp:" + kv.getTimestamp());

System.out.println("-------------------------------------------");

}

* 更新表中的某一列

* @tableName 表名

* @rowKey rowKey

* @familyName 列族名

* @columnName 列名

* @value 更新后的值

public static void updateTable(String tableName, String rowKey,

String familyName, String columnName, String value)

throws IOException {

HTable table = new HTable(conf, Bytes.toBytes(tableName));

Put put = new Put(Bytes.toBytes(rowKey));

put.add(Bytes.toBytes(familyName), Bytes.toBytes(columnName),

Bytes.toBytes(value));

table.put(put);

System.out.println("update table Success!");

}

* 查询某列数据的多个版本

* @tableName 表名

* @rowKey rowKey

* @familyName 列族名

* @columnName 列名

public static void getResultByVersion(String tableName, String rowKey,

String familyName, String columnName) throws IOException {

HTable table = new HTable(conf, Bytes.toBytes(tableName));

Get get = new Get(Bytes.toBytes(rowKey));

get.addColumn(Bytes.toBytes(familyName), Bytes.toBytes(columnName));

get.setMaxVersions(5);

Result result = table.get(get);

for (KeyValue kv : result.list()) {

System.out.println("family:" + Bytes.toString(kv.getFamily()));

System.out

.println("qualifier:" + Bytes.toString(kv.getQualifier()));

System.out.println("value:" + Bytes.toString(kv.getValue()));

System.out.println("Timestamp:" + kv.getTimestamp());

System.out.println("-------------------------------------------");

}

* List<?> results = table.get(get).list(); Iterator<?> it =

* results.iterator(); while (it.hasNext()) {

* System.out.println(it.next().toString()); }

}

* 删除指定的列

* @tableName 表名

* @rowKey rowKey

* @familyName 列族名

* @columnName 列名

public static void deleteColumn(String tableName, String rowKey,

String falilyName, String columnName) throws IOException {

HTable table = new HTable(conf, Bytes.toBytes(tableName));

Delete deleteColumn = new Delete(Bytes.toBytes(rowKey));

deleteColumn.deleteColumns(Bytes.toBytes(falilyName),

Bytes.toBytes(columnName));

table.delete(deleteColumn);

System.out.println(falilyName + ":" + columnName + "is deleted!");

}

* 删除指定的列

* @tableName 表名

* @rowKey rowKey

public static void deleteAllColumn(String tableName, String rowKey)

throws IOException {

HTable table = new HTable(conf, Bytes.toBytes(tableName));

Delete deleteAll = new Delete(Bytes.toBytes(rowKey));

table.delete(deleteAll);

System.out.println("all columns are deleted!");

}

* 删除表

* @tableName 表名

public static void deleteTable(String tableName) throws IOException {

HBaseAdmin admin = new HBaseAdmin(conf);

admin.disableTable(tableName);

admin.deleteTable(tableName);

System.out.println(tableName + "is deleted!");

}

public static void main(String[] args) throws Exception {

// 创建表

String tableName = "blog2";

String[] family = { "article", "author" };

// creatTable(tableName, family);

// 为表添加数据

String[] column1 = { "title", "content", "tag" };

String[] value1 = {

"Head First HBase",

"HBase is the Hadoop database. Use it when you need random, realtime read/write access to your Big Data.",

"Hadoop,HBase,NoSQL" };

String[] column2 = { "name", "nickname" };

String[] value2 = { "nicholas", "lee" };

addData("rowkey1", "blog2", column1, value1, column2, value2);

addData("rowkey2", "blog2", column1, value1, column2, value2);

addData("rowkey3", "blog2", column1, value1, column2, value2);

// 遍历查询

getResultScann("blog2", "rowkey4", "rowkey5");

// 根据row key范围遍历查询

getResultScann("blog2", "rowkey4", "rowkey5");

// 查询

getResult("blog2", "rowkey1");

// 查询某一列的值

getResultByColumn("blog2", "rowkey1", "author", "name");

// 更新列

updateTable("blog2", "rowkey1", "author", "name", "bin");

// 查询某一列的值

getResultByColumn("blog2", "rowkey1", "author", "name");

// 查询某列的多版本

getResultByVersion("blog2", "rowkey1", "author", "name");

// 删除一列

deleteColumn("blog2", "rowkey1", "author", "nickname");

// 删除所有列

deleteAllColumn("blog2", "rowkey1");

// 删除表

deleteTable("blog2");

}

Hbase基本使用示例：

import java.io.IOException;

import java.util.ArrayList;

import java.util.List;

import org.apache.hadoop.conf.Configuration;

import org.apache.hadoop.hbase.HBaseConfiguration;

import org.apache.hadoop.hbase.HColumnDescriptor;

import org.apache.hadoop.hbase.HTableDescriptor;

import org.apache.hadoop.hbase.KeyValue;

import org.apache.hadoop.hbase.MasterNotRunningException;

import org.apache.hadoop.hbase.ZooKeeperConnectionException;

import org.apache.hadoop.hbase.client.Delete;

import org.apache.hadoop.hbase.client.Get;

import org.apache.hadoop.hbase.client.HBaseAdmin;

import org.apache.hadoop.hbase.client.HTable;

import org.apache.hadoop.hbase.client.HTablePool;

import org.apache.hadoop.hbase.client.Put;

import org.apache.hadoop.hbase.client.Result;

import org.apache.hadoop.hbase.client.ResultScanner;

import org.apache.hadoop.hbase.client.Scan;

import org.apache.hadoop.hbase.filter.Filter;

import org.apache.hadoop.hbase.filter.FilterList;

import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;

import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;

import org.apache.hadoop.hbase.util.Bytes;

public class HbaseTest {

public static Configuration configuration;

static {

configuration = HBaseConfiguration.create();

configuration.set("hbase.zookeeper.property.clientPort", "2181");

configuration.set("hbase.zookeeper.quorum", "192.168.1.100");

configuration.set("hbase.master", "192.168.1.100:600000");

}

public static void main(String[] args) {

// createTable("wujintao");

// insertData("wujintao");

// QueryAll("wujintao");

// QueryByCondition1("wujintao");

// QueryByCondition2("wujintao");

//QueryByCondition3("wujintao");

//deleteRow("wujintao","abcdef");

deleteByCondition("wujintao","abcdef");

}

public static void createTable(String tableName) {

System.out.println("start create table ......");

try {

HBaseAdmin hBaseAdmin = new HBaseAdmin(configuration);

if (hBaseAdmin.tableExists(tableName)) {// 如果存在要创建的表，那么先删除，再创建

hBaseAdmin.disableTable(tableName);

hBaseAdmin.deleteTable(tableName);

System.out.println(tableName + " is exist,detele....");

}

HTableDescriptor tableDescriptor = new HTableDescriptor(tableName);

tableDescriptor.addFamily(new HColumnDescriptor("column1"));

tableDescriptor.addFamily(new HColumnDescriptor("column2"));

tableDescriptor.addFamily(new HColumnDescriptor("column3"));

hBaseAdmin.createTable(tableDescriptor);

} catch (MasterNotRunningException e) {

e.printStackTrace();

} catch (ZooKeeperConnectionException e) {

e.printStackTrace();

} catch (IOException e) {

e.printStackTrace();

}

System.out.println("end create table ......");

}

public static void insertData(String tableName) {

System.out.println("start insert data ......");

HTablePool pool = new HTablePool(configuration, 1000);

HTable table = (HTable) pool.getTable(tableName);

Put put = new Put("112233bbbcccc".getBytes());// 一个PUT代表一行数据，再NEW一个PUT表示第二行数据,每行一个唯一的ROWKEY，此处rowkey为put构造方法中传入的值

put.add("column1".getBytes(), null, "aaa".getBytes());// 本行数据的第一列

put.add("column2".getBytes(), null, "bbb".getBytes());// 本行数据的第三列

put.add("column3".getBytes(), null, "ccc".getBytes());// 本行数据的第三列

try {

table.put(put);

} catch (IOException e) {

e.printStackTrace();

}

System.out.println("end insert data ......");

}

public static void dropTable(String tableName) {

try {

HBaseAdmin admin = new HBaseAdmin(configuration);

admin.disableTable(tableName);

admin.deleteTable(tableName);

} catch (MasterNotRunningException e) {

e.printStackTrace();

} catch (ZooKeeperConnectionException e) {

e.printStackTrace();

} catch (IOException e) {

e.printStackTrace();

}

public static void deleteRow(String tablename, String rowkey) {

try {

HTable table = new HTable(configuration, tablename);

List list = new ArrayList();

Delete d1 = new Delete(rowkey.getBytes());

list.add(d1);

table.delete(list);

System.out.println("删除行成功!");

} catch (IOException e) {

e.printStackTrace();

}

public static void deleteByCondition(String tablename, String rowkey) {

//目前还没有发现有效的API能够实现根据非rowkey的条件删除这个功能能，还有清空表全部数据的API操作

}

public static void QueryAll(String tableName) {

HTablePool pool = new HTablePool(configuration, 1000);

HTable table = (HTable) pool.getTable(tableName);

try {

ResultScanner rs = table.getScanner(new Scan());

for (Result r : rs) {

System.out.println("获得到rowkey:" + new String(r.getRow()));

for (KeyValue keyValue : r.raw()) {

System.out.println("列：" + new String(keyValue.getFamily())

+ "====值:" + new String(keyValue.getValue()));

}

} catch (IOException e) {

e.printStackTrace();

}

public static void QueryByCondition1(String tableName) {

HTablePool pool = new HTablePool(configuration, 1000);

HTable table = (HTable) pool.getTable(tableName);

try {

Get scan = new Get("abcdef".getBytes());// 根据rowkey查询

Result r = table.get(scan);

System.out.println("获得到rowkey:" + new String(r.getRow()));

for (KeyValue keyValue : r.raw()) {

System.out.println("列：" + new String(keyValue.getFamily())

+ "====值:" + new String(keyValue.getValue()));

}

} catch (IOException e) {

e.printStackTrace();

}

public static void QueryByCondition2(String tableName) {

try {

HTablePool pool = new HTablePool(configuration, 1000);

HTable table = (HTable) pool.getTable(tableName);

Filter filter = new SingleColumnValueFilter(Bytes

.toBytes("column1"), null, CompareOp.EQUAL, Bytes

.toBytes("aaa")); // 当列column1的值为aaa时进行查询

Scan s = new Scan();

s.setFilter(filter);

ResultScanner rs = table.getScanner(s);

for (Result r : rs) {

System.out.println("获得到rowkey:" + new String(r.getRow()));

for (KeyValue keyValue : r.raw()) {

System.out.println("列：" + new String(keyValue.getFamily())

+ "====值:" + new String(keyValue.getValue()));

}

} catch (Exception e) {

e.printStackTrace();

}

public static void QueryByCondition3(String tableName) {

try {

HTablePool pool = new HTablePool(configuration, 1000);

HTable table = (HTable) pool.getTable(tableName);

List<Filter> filters = new ArrayList<Filter>();

Filter filter1 = new SingleColumnValueFilter(Bytes

.toBytes("column1"), null, CompareOp.EQUAL, Bytes

.toBytes("aaa"));

filters.add(filter1);

Filter filter2 = new SingleColumnValueFilter(Bytes

.toBytes("column2"), null, CompareOp.EQUAL, Bytes

.toBytes("bbb"));

filters.add(filter2);

Filter filter3 = new SingleColumnValueFilter(Bytes

.toBytes("column3"), null, CompareOp.EQUAL, Bytes

.toBytes("ccc"));

filters.add(filter3);

FilterList filterList1 = new FilterList(filters);

Scan scan = new Scan();

scan.setFilter(filterList1);

ResultScanner rs = table.getScanner(scan);

for (Result r : rs) {

System.out.println("获得到rowkey:" + new String(r.getRow()));

for (KeyValue keyValue : r.raw()) {

System.out.println("列：" + new String(keyValue.getFamily())

+ "====值:" + new String(keyValue.getValue()));

}

rs.close();

} catch (Exception e) {

e.printStackTrace();

}

Hbase数据获取示例：

* Need Packages:

* commons-codec-1.4.jar

* commons-logging-1.1.1.jar

* hadoop-0.20.2-core.jar

* hbase-0.90.2.jar

* log4j-1.2.16.jar

* zookeeper-3.3.2.jar

import java.io.IOException;

import java.util.ArrayList;

import java.util.List;

import org.apache.hadoop.conf.Configuration;

import org.apache.hadoop.hbase.HBaseConfiguration;

import org.apache.hadoop.hbase.KeyValue;

import org.apache.hadoop.hbase.client.Get;

import org.apache.hadoop.hbase.client.HTable;

import org.apache.hadoop.hbase.client.Result;

import org.apache.hadoop.hbase.client.ResultScanner;

import org.apache.hadoop.hbase.client.Scan;

import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;

import org.apache.hadoop.hbase.filter.FilterList;

import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;

import org.apache.hadoop.hbase.util.Bytes;

public class HbaseSelecter

{

public static Configuration configuration = null;

static

{

configuration = HBaseConfiguration.create();

//configuration.set("hbase.master", "192.168.0.201:60000");

configuration.set("hbase.zookeeper.quorum", "idc01-hd-nd-03,idc01-hd-nd-04,idc01-hd-nd-05");

//configuration.set("hbase.zookeeper.property.clientPort", "2181");

}

public static void selectRowKey(String tablename, String rowKey) throws IOException

{

HTable table = new HTable(configuration, tablename);

Get g = new Get(rowKey.getBytes());

Result rs = table.get(g);

for (KeyValue kv : rs.raw())

{

System.out.println("--------------------" + new String(kv.getRow()) + "----------------------------");

System.out.println("Column Family: " + new String(kv.getFamily()));

System.out.println("Column :" + new String(kv.getQualifier()));

System.out.println("value : " + new String(kv.getValue()));

}

public static void selectRowKeyFamily(String tablename, String rowKey, String family) throws IOException

{

HTable table = new HTable(configuration, tablename);

Get g = new Get(rowKey.getBytes());

g.addFamily(Bytes.toBytes(family));

Result rs = table.get(g);

for (KeyValue kv : rs.raw())

{

System.out.println("--------------------" + new String(kv.getRow()) + "----------------------------");

System.out.println("Column Family: " + new String(kv.getFamily()));

System.out.println("Column :" + new String(kv.getQualifier()));

System.out.println("value : " + new String(kv.getValue()));

}

public static void selectRowKeyFamilyColumn(String tablename, String rowKey, String family, String column)

throws IOException

{

HTable table = new HTable(configuration, tablename);

Get g = new Get(rowKey.getBytes());

g.addColumn(family.getBytes(), column.getBytes());

Result rs = table.get(g);

for (KeyValue kv : rs.raw())

{

System.out.println("--------------------" + new String(kv.getRow()) + "----------------------------");

System.out.println("Column Family: " + new String(kv.getFamily()));

System.out.println("Column :" + new String(kv.getQualifier()));

System.out.println("value : " + new String(kv.getValue()));

}

public static void selectFilter(String tablename, List<String> arr) throws IOException

{

HTable table = new HTable(configuration, tablename);

Scan scan = new Scan();// 实例化一个遍历器

FilterList filterList = new FilterList(); // 过滤器List

for (String v : arr)

{ // 下标0为列簇，1为列名，3为条件

String[] wheres = v.split(",");

filterList.addFilter(new SingleColumnValueFilter(// 过滤器

wheres[0].getBytes(), wheres[1].getBytes(),

CompareOp.EQUAL,// 各个条件之间是" and "的关系

wheres[2].getBytes()));

}

scan.setFilter(filterList);

ResultScanner ResultScannerFilterList = table.getScanner(scan);

for (Result rs = ResultScannerFilterList.next(); rs != null; rs = ResultScannerFilterList.next())

{

for (KeyValue kv : rs.list())

{

System.out.println("--------------------" + new String(kv.getRow()) + "----------------------------");

System.out.println("Column Family: " + new String(kv.getFamily()));

System.out.println("Column :" + new String(kv.getQualifier()));

System.out.println("value : " + new String(kv.getValue()));

}

public static void main(String[] args) throws Exception

{

if(args.length < 2){

System.out.println("Usage: HbaseSelecter table key");

System.exit(-1);

}

System.out.println("Table: " + args[0] + " , key: " + args[1]);

selectRowKey(args[0], args[1]);

System.out.println("------------------------行键查询----------------------------------");

selectRowKey("b2c", "yihaodian1002865");

selectRowKey("b2c", "yihaodian1003396");

System.out.println("------------------------行键+列簇查询----------------------------------");

selectRowKeyFamily("riapguh", "用户A", "user");

selectRowKeyFamily("riapguh", "用户B", "user");

System.out.println("------------------------行键+列簇+列名查询----------------------------------");

selectRowKeyFamilyColumn("riapguh", "用户A", "user", "user_code");

selectRowKeyFamilyColumn("riapguh", "用户B", "user", "user_code");

System.out.println("------------------------条件查询----------------------------------");

List<String> arr = new ArrayList<String>();

arr.add("dpt,dpt_code,d_001");

arr.add("user,user_code,u_0001");

selectFilter("riapguh", arr);

}

Hbase 导出特定列示例(小量数据):

* Need Packages:

* commons-codec-1.4.jar

* commons-logging-1.1.1.jar

* hadoop-0.20.2-core.jar

* hbase-0.90.2.jar

* log4j-1.2.16.jar

* zookeeper-3.3.2.jar

* Example: javac -classpath ./:/data/chenzhenjing/code/panama/lib/hbase-0.90.2.jar:/data/chenzhenjing/code/panama/lib/hadoop-core-0.20-append-for-hbase.jar:/data/chenzhenjing/code/panama/lib/commons-logging-1.0.4.jar:/data/chenzhenjing/code/panama/lib/commons-lang-2.4.jar:/data/chenzhenjing/code/panama/lib/commons-io-1.2.jar:/data/chenzhenjing/code/panama/lib/zookeeper-3.3.2.jar:/data/chenzhenjing/code/panama/lib/log4j-1.2.15.jar:/data/chenzhenjing/code/panama/lib/commons-codec-1.3.jar DiffHbase.java

import java.io.BufferedReader;

import java.io.File;

import java.io.IOException;

import java.io.FileInputStream;

import java.io.InputStreamReader;

import java.io.FileOutputStream;

import java.io.OutputStreamWriter;

import java.io.StringReader;

import java.text.SimpleDateFormat;

import java.util.Date;

import java.io.IOException;

import java.util.ArrayList;

import java.util.List;

import org.apache.hadoop.conf.Configuration;

import org.apache.hadoop.hbase.HBaseConfiguration;

import org.apache.hadoop.hbase.KeyValue;

import org.apache.hadoop.hbase.client.Get;

import org.apache.hadoop.hbase.client.HTable;

import org.apache.hadoop.hbase.client.Result;

import org.apache.hadoop.hbase.client.ResultScanner;

import org.apache.hadoop.hbase.client.Scan;

import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;

import org.apache.hadoop.hbase.filter.FilterList;

import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;

import org.apache.hadoop.hbase.util.Bytes;

class ColumnUtils {

public static byte[] getFamily(String column){

return getBytes(column, 0);

}

public static byte[] getQualifier(String column){

return getBytes(column, 1);

}

private static byte[] getBytes(String column , int offset){

String[] split = column.split(":");

return Bytes.toBytes(offset > split.length -1 ? split[0] :split[offset]);

}

public class DiffHbase

{

public static Configuration configuration = null;

static

{

configuration = HBaseConfiguration.create();

configuration.set("hbase.zookeeper.quorum", "idc01-hd-ds-01,idc01-hd-ds-02,idc01-hd-ds-03");

}

public static void selectRowKey(String tablename, String rowKey) throws IOException

{

HTable table = new HTable(configuration, tablename);

Get g = new Get(rowKey.getBytes());

Result rs = table.get(g);

for (KeyValue kv : rs.raw())

{

System.out.println("--------------------" + new String(kv.getRow()) + "----------------------------");

System.out.println("Column Family: " + new String(kv.getFamily()));

System.out.println("Column :" + new String(kv.getQualifier()) + "t");

System.out.println("value : " + new String(kv.getValue()));

}

public static void selectRowKeyFamily(String tablename, String rowKey, String family) throws IOException

{

HTable table = new HTable(configuration, tablename);

Get g = new Get(rowKey.getBytes());

g.addFamily(Bytes.toBytes(family));

Result rs = table.get(g);

for (KeyValue kv : rs.raw())

{

System.out.println("--------------------" + new String(kv.getRow()) + "----------------------------");

System.out.println("Column Family: " + new String(kv.getFamily()));

System.out.println("Column :" + new String(kv.getQualifier()) + "t");

System.out.println("value : " + new String(kv.getValue()));

}

public static void selectRowKeyFamilyColumn(String tablename, String rowKey, String family, String column)

throws IOException

{

HTable table = new HTable(configuration, tablename);

Get g = new Get(rowKey.getBytes());

g.addColumn(family.getBytes(), column.getBytes());

Result rs = table.get(g);

for (KeyValue kv : rs.raw())

{

System.out.println("--------------------" + new String(kv.getRow()) + "----------------------------");

System.out.println("Column Family: " + new String(kv.getFamily()));

System.out.println("Column :" + new String(kv.getQualifier()) + "t");

System.out.println("value : " + new String(kv.getValue()));

}

private static final String USAGE = "Usage: DiffHbase [-o outfile] tablename infile filterColumns...";

/**

* Prints the usage message and exists the program.

* @param message The message to print first.

private static void printUsage(String message) {

System.err.println(message);

System.err.println(USAGE);

throw new RuntimeException(USAGE);

}

private static void PrintId(String id, Result rs){

String value = Bytes.toString( rs.getValue(ColumnUtils.getFamily("info:url"), ColumnUtils.getQualifier("info:url")));

if(value == null){

System.out.println( id + "\tNULL");

}else{

System.out.println( id + "\t" + value);

}

private static void WriteId(String id, Result rs, FileOutputStream os){

String value = Bytes.toString( rs.getValue(ColumnUtils.getFamily("info:url"), ColumnUtils.getQualifier("info:url")));

try{

if(value == null){

os.write( (id + "\tNULL\n").getBytes());

}else{

os.write( (id + "\t" + value + "\n").getBytes());

}

catch (IOException e) {

e.printStackTrace();

}

private static void PrintRow(String id, Result rs){

System.out.println("--------------------" + id + "----------------------------");

for (KeyValue kv : rs.raw())

{

System.out.println(new String(kv.getFamily()) + ":" + new String(kv.getQualifier()) + " : " + new String(kv.getValue()));

}

public static void main(String[] args) throws Exception

{

if (args.length < 3) {

printUsage("Too few arguments");

}

String outfile = null;

String tablename = args[0];

String dictfile = args[1];

int skilLen = 2;

if( args[0].equals("-o")){

outfile = args[1];

tablename = args[2];

dictfile = args[3];

skilLen = 4;

}

HTable table = new HTable(configuration, tablename);

String[] filterColumns = new String[args.length - skilLen];

System.arraycopy(args, skilLen, filterColumns, 0, args.length - skilLen);

System.out.println("filterColumns: ");

for(int i=0; i<filterColumns.length; ++i){

System.out.println("\t" + filterColumns[i]);

}

FileOutputStream os = null;

if(outfile != null){

os = new FileOutputStream(outfile);

}

int count = 0;

SimpleDateFormat df = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");//设置日期格式

File srcFile = new File(dictfile);

FileInputStream in = new FileInputStream(srcFile);

InputStreamReader isr = new InputStreamReader(in);

BufferedReader br = new BufferedReader(isr);

String read = null;

while ((read = br.readLine()) != null) {

String[] split = read.trim().split("\\s"); // space split

if( split.length < 1 ){

System.out.println("Error line: " + read);

continue;

}

if( ++count % 1000 == 0){

System.out.println(df.format(new Date()) + " : " + count + " rows processed." ); // new Date()为获取当前系统时间

}

// System.out.println("ROWKEY:" + split[0]);

Get g = new Get(split[0].getBytes());

Result rs = table.get(g);

if( rs == null){

System.out.println("No Result for " + split[0]);

continue;

}

for(int i=0; i<filterColumns.length; ++i){

String value = Bytes.toString(rs.getValue(ColumnUtils.getFamily(filterColumns[i]), ColumnUtils.getQualifier(filterColumns[i])));

if(value == null){

if( os == null){

PrintId(split[0], rs);

}else{

WriteId(split[0], rs, os);

}

// PrintRow(split[0], rs);

break;

}

br.close();

isr.close();

in.close();

}

Hbase Mapreduce示例：全库扫描(大量数据)：

package com.hbase.mapreduce;

import java.io.File;

import java.io.FileInputStream;

import java.io.IOException;

import java.util.ArrayList;

import java.util.List;

import org.apache.hadoop.io.Text;

import org.apache.hadoop.conf.Configuration;

import org.apache.hadoop.fs.Path;

import org.apache.hadoop.hbase.HBaseConfiguration;

import org.apache.hadoop.hbase.HConstants;

import org.apache.hadoop.hbase.client.Scan;

import org.apache.hadoop.hbase.mapreduce.IdentityTableMapper;

import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;

import org.apache.hadoop.hbase.io.ImmutableBytesWritable;

import org.apache.hadoop.mapreduce.Job;

import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import org.apache.hadoop.mapred.JobConf;

import org.apache.hadoop.util.GenericOptionsParser;

import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;

import org.apache.hadoop.hbase.filter.CompareFilter;

import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;

import org.apache.hadoop.hbase.filter.BinaryComparator;

import org.apache.hadoop.hbase.util.Bytes;

import com.goodhope.utils.ColumnUtils;

public class ExportHbase {

private static final String INFOCATEGORY = "info:storecategory";

private static final String USAGE = "Usage: ExportHbase " +

"-r <numReduceTasks> -indexConf <iconfFile>\n" +

"-indexDir <indexDir> -webSite <amazon> [-needupdate <true> -isVisible -startTime <long>] -table <tableName> -columns <columnName1> " +

"[<columnName2> ...]";

/**

* Prints the usage message and exists the program.

* @param message The message to print first.

private static void printUsage(String message) {

System.err.println(message);

System.err.println(USAGE);

throw new RuntimeException(USAGE);

}

/**

* Creates a new job.

* @param conf

* @param args The command line arguments.

* @throws IOException When reading the configuration fails.

public static Job createSubmittableJob(Configuration conf, String[] args)

throws IOException {

if (args.length < 7) {

printUsage("Too few arguments");

}

int numReduceTasks = 1;

String iconfFile = null;

String indexDir = null;

String tableName = null;

String website = null;

String needupdate = "";

String expectShopGrade = "";

String dino = "6";

String isdebug = "0";

long debugThreshold = 10000;

String debugThresholdStr = Long.toString(debugThreshold);

String queue = "offline";

long endTime = Long.MAX_VALUE;

int maxversions = 1;

long startTime = System.currentTimeMillis() - 28*24*60*60*1000l;

long distartTime = System.currentTimeMillis() - 30*24*60*60*1000l;

long diusedTime = System.currentTimeMillis() - 30*24*60*60*1000l;

String startTimeStr = Long.toString(startTime);

String diusedTimeStr = Long.toString(diusedTime);

String quorum = null;

String isVisible = "";

List<String> columns = new ArrayList<String>() ;

boolean bFilter = false;

// parse args

for (int i = 0; i < args.length - 1; i++) {

if ("-r".equals(args[i])) {

numReduceTasks = Integer.parseInt(args[++i]);

} else if ("-indexConf".equals(args[i])) {

iconfFile = args[++i];

} else if ("-indexDir".equals(args[i])) {

indexDir = args[++i];

} else if ("-table".equals(args[i])) {

tableName = args[++i];

} else if ("-webSite".equals(args[i])) {

website = args[++i];

} else if ("-startTime".equals(args[i])) {

startTimeStr = args[++i];

startTime = Long.parseLong(startTimeStr);

} else if ("-needupdate".equals(args[i])) {

needupdate = args[++i];

} else if ("-isVisible".equals(args[i])) {

isVisible = "true";

} else if ("-shopgrade".equals(args[i])) {

expectShopGrade = args[++i];

} else if ("-queue".equals(args[i])) {

queue = args[++i];

} else if ("-dino".equals(args[i])) {

dino = args[++i];

} else if ("-maxversions".equals(args[i])) {

maxversions = Integer.parseInt(args[++i]);

} else if ("-distartTime".equals(args[i])) {

distartTime = Long.parseLong(args[++i]);

} else if ("-diendTime".equals(args[i])) {

endTime = Long.parseLong(args[++i]);

} else if ("-diusedTime".equals(args[i])) {

diusedTimeStr = args[++i];

diusedTime = Long.parseLong(diusedTimeStr);

} else if ("-quorum".equals(args[i])) {

quorum = args[++i];

} else if ("-filter".equals(args[i])) {

bFilter = true;

} else if ("-columns".equals(args[i])) {

columns.add(args[++i]);

while (i + 1 < args.length && !args[i + 1].startsWith("-")) {

String columnname = args[++i];

columns.add(columnname);

System.out.println("args column----: " + columnname);

}

} else if ("-debugThreshold".equals(args[i])) {

isdebug = "1";

debugThresholdStr = args[++i];

debugThreshold = Long.parseLong( debugThresholdStr );

}

else {

printUsage("Unsupported option " + args[i]);

}

if (distartTime > endTime) {

printUsage("distartTime must <= diendTime");

}

if (indexDir == null || tableName == null || columns.isEmpty()) {

printUsage("Index directory, table name and at least one column must " +

"be specified");

}

if (iconfFile != null) {

// set index configuration content from a file

String content = readContent(iconfFile);

conf.set("hbase.index.conf", content);

conf.set("hbase.website.name", website);

conf.set("hbase.needupdate.productDB", needupdate);

conf.set("hbase.expect.shopgrade", expectShopGrade);

conf.set("hbase.di.no", dino);

conf.set("hbase.expect.item.visible", isVisible);

conf.set("hbase.index.startTime", startTimeStr);

conf.set("hbase.index.diusedTime", diusedTimeStr);

conf.set("hbase.index.debugThreshold", debugThresholdStr);

conf.set("hbase.index.debug", isdebug);

if (quorum != null) {

conf.set("hbase.zookeeper.quorum", quorum);

}

String temp = "";

for (String column : columns) {

temp = temp + column + "|";

}

temp = temp.substring(0, temp.length() - 1);

conf.set("hbase.index.column", temp);

System.out.println("hbase.index.column: " + temp);

}

Job job = new Job(conf, "export data from table " + tableName);

((JobConf) job.getConfiguration()).setQueueName(queue);

// number of indexes to partition into

job.setNumReduceTasks(numReduceTasks);

Scan scan = new Scan();

scan.setCacheBlocks(false);

// limit scan range

scan.setTimeRange(distartTime, endTime);

// scan.setMaxVersions(maxversions);

scan.setMaxVersions(1);

/* limit scan columns */

for (String column : columns) {

scan.addColumn(ColumnUtils.getFamily(column), ColumnUtils.getQualifier(column));

scan.addFamily(ColumnUtils.getFamily(column));

}

// set filter

if( bFilter ){

System.out.println("only export guangtaobao data. ");

SingleColumnValueFilter filter = new SingleColumnValueFilter(

Bytes.toBytes("info"),

Bytes.toBytes("producttype"),

CompareFilter.CompareOp.EQUAL,

new BinaryComparator(Bytes.toBytes("guangtaobao")) );

filter.setFilterIfMissing(true);

scan.setFilter(filter);

}

TableMapReduceUtil.initTableMapperJob(tableName, scan, ExportHbaseMapper.class,

Text.class, Text.class, job);

// job.setReducerClass(ExportHbaseReducer.class);

FileOutputFormat.setOutputPath(job, new Path(indexDir));

return job;

}

/**

* Reads xml file of indexing configurations. The xml format is similar to

* hbase-default.xml and hadoop-default.xml. For an example configuration,

* see the <code>createIndexConfContent</code> method in TestTableIndex.

* @param fileName The file to read.

* @return XML configuration read from file.

* @throws IOException When the XML is broken.

private static String readContent(String fileName) throws IOException {

File file = new File(fileName);

int length = (int) file.length();

if (length == 0) {

printUsage("Index configuration file " + fileName + " does not exist");

}

int bytesRead = 0;

byte[] bytes = new byte[length];

FileInputStream fis = new FileInputStream(file);

try {

// read entire file into content

while (bytesRead < length) {

int read = fis.read(bytes, bytesRead, length - bytesRead);

if (read > 0) {

bytesRead += read;

} else {

break;

}

} finally {

fis.close();

}

return new String(bytes, 0, bytesRead, HConstants.UTF8_ENCODING);

}

/**

* The main entry point.

* @param args The command line arguments.

* @throws Exception When running the job fails.

public static void main(String[] args) throws Exception {

Configuration conf = HBaseConfiguration.create();

String[] otherArgs =

new GenericOptionsParser(conf, args).getRemainingArgs();

Job job = createSubmittableJob(conf, otherArgs);

System.exit(job.waitForCompletion(true) ? 0 : 1);

}

package com.hbase.mapreduce;

import java.io.IOException;

import java.util.List;

import java.util.ArrayList;

import java.lang.String;

import java.lang.StringBuffer;

import org.apache.hadoop.io.Text;

import org.apache.hadoop.conf.Configurable;

import org.apache.hadoop.conf.Configuration;

import org.apache.commons.lang.StringUtils;

import org.apache.hadoop.hbase.client.Result;

import org.apache.hadoop.hbase.io.ImmutableBytesWritable;

import org.apache.hadoop.hbase.mapreduce.TableMapper;

import org.apache.hadoop.hbase.util.Bytes;

import org.apache.hadoop.hbase.KeyValue;

import com.goodhope.utils.ColumnUtils;

/**

* Pass the given key and record as-is to the reduce phase.

@SuppressWarnings("deprecation")

public class ExportHbaseMapper extends TableMapper<Text,Text> implements Configurable {

private static final Text keyTEXT = new Text();

private static final Text SENDTEXT = new Text();

private Configuration conf = null;

private long startTime = 0;

List<String> columnMap = null;

private long rCount = 0;

private long errCount = 0;

private int debug = 0;

private long thresCount = 10000;

public void map(ImmutableBytesWritable key, Result value, Context context) throws IOException, InterruptedException {

rCount++;

String itemid = Bytes.toString(key.get());

if (itemid.contains("&")) {

context.getCounter("Error", "rowkey contains \"&\"").increment(1);

return;

}

StringBuffer outstr = new StringBuffer();

for (String col : columnMap) {

String tmp = Bytes.toString(value.getValue(ColumnUtils.getFamily(col), ColumnUtils.getQualifier(col)));

if (tmp == null){

context.getCounter("Error", col+" No value in hbase").increment(1);

errCount++;

if( debug > 0 && (errCount % thresCount == 0)){

System.err.println( itemid + ": doesn't has " + col + " data!");

}

outstr.append("NULL" + "\t");

}else{

if( tmp.contains("guangtaobao") ){

outstr.append("1" + "\t");

}else{

outstr.append(tmp.trim() + "\t");

}

if ( ! outstr.toString().isEmpty() ) {

SENDTEXT.set( outstr.toString() );

keyTEXT.set(itemid);

context.write(keyTEXT, SENDTEXT);

if( debug > 0 && (rCount % thresCount*10000 == 0)){

System.out.println( SENDTEXT.toString() + keyTEXT.toString() );

}

else

{

context.getCounter("Error", "No Colume output").increment(1);

return;

}

/**

* Returns the current configuration.

* @return The current configuration.

* @see org.apache.hadoop.conf.Configurable#getConf()

@Override

public Configuration getConf() {

return conf;

}

/**

* Sets the configuration. This is used to set up the index configuration.

* @param configuration

* The configuration to set.

* @see org.apache.hadoop.conf.Configurable#setConf(org.apache.hadoop.conf.Configuration)

@Override

public void setConf(Configuration configuration) {

this.conf = configuration;

startTime = Long.parseLong(conf.get("hbase.index.startTime"));

thresCount = Long.parseLong(conf.get("hbase.index.debugThreshold"));

debug = Integer.parseInt(conf.get("hbase.index.debug"));

String[] columns = conf.get("hbase.index.column").split("\\|");

columnMap = new ArrayList<String>();

for (String column : columns) {

System.out.println("Output column: " + column);

columnMap.add(column);

}

package com.hbase.utils;

import org.apache.hadoop.hbase.util.Bytes;

public class ColumnUtils {

public static byte[] getFamily(String column){

return getBytes(column, 0);

}

public static byte[] getQualifier(String column){

return getBytes(column, 1);

}

private static byte[] getBytes(String column , int offset){

String[] split = column.split(":");

return Bytes.toBytes(offset > split.length -1 ? split[0] :split[offset]);

}

转载于:https://www.cnblogs.com/BetterThanEver_Victor/p/7242699.html

weixin_30470857

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
分布式系统HBase模块及Java API

HBase是基于列存储的，每个列族都由几个文件保存，不同列族的文件是分离的HBase只有一个索引——行键，通过巧妙的设计，HBase中的所有访问方法，或者通过行键访问，或者通过行键扫描，从而使得整个系统不会慢下来HBase是一个稀疏、多维度、排序的映射表，这张表的索引是行键、列族、列限定符和时间戳，数据坐标HBase中需要根据行键、列族、列限定符和时间戳来确定一个单元格，因此，可以视为一个...
复制链接

扫一扫