Storm-HBase Integration: Configuration and Development

1 HBase Integration in Storm 0.9.3

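Storm 0.9.3 reorganizes and adds a module for HBase integration (storm-hbase). Besides the basic bolt support, it adds a Trident state implementation for accessing HBase. With this Trident support, the logic for accessing HBase from Storm can be written more quickly.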

The main classes and interfaces in the storm-hbase module are listed below:

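Class	Description
org.apache.storm.hbase.trident.mapper.TridentHBaseMapper	Interface that maps the fields of a Storm Trident tuple to an HBase row key, column family, and columns
org.apache.storm.hbase.trident.mapper.SimpleTridentHBaseMapper	A simple implementation of TridentHBaseMapper; you specify the row key field and the column family / column fields
org.apache.storm.hbase.bolt.mapper.HBaseValueMapper	Interface that maps HBase cells to a Storm tuple; it is normally used through a user-written implementation
org.apache.storm.hbase.trident.state.HBaseUpdater	Updates an HBaseState
org.apache.storm.hbase.trident.state.HBaseState	The Trident state class responsible for the HBase data
org.apache.storm.hbase.trident.state.HBaseStateFactory	Factory class that creates HBaseState objects
org.apache.storm.hbase.bolt.mapper.HBaseProjectionCriteria	Defines the projection from HBase data to a Storm tuple; you specify the column families and columns (the table name is set on HBaseState.Options)
org.apache.storm.hbase.security.HBaseSecurityUtil	Lets Storm pass the Kerberos authentication of a secured HBase cluster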
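
To make the mapper's role more concrete, here is a minimal, library-free sketch of the kind of mapping that SimpleTridentHBaseMapper is configured for: one tuple field becomes the row key, and the listed fields become columns under a single column family. The class TupleToRowSketch and its methods are hypothetical stand-ins and are not part of storm-hbase. A plain Map stands in for a Trident tuple, and values are rendered as strings for readability (the real mapper works with byte arrays).

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for illustration only; not part of storm-hbase. */
public class TupleToRowSketch {

    /** Returns the value of the tuple field that was chosen as the HBase row key. */
    public static String rowKey(Map<String, Object> tuple, String rowKeyField) {
        return String.valueOf(tuple.get(rowKeyField));
    }

    /** Maps each listed tuple field to a "family:qualifier" column name and its string value. */
    public static Map<String, String> columns(Map<String, Object> tuple,
                                              String family,
                                              List<String> columnFields) {
        Map<String, String> cells = new LinkedHashMap<>();
        for (String field : columnFields) {
            cells.put(family + ":" + field, String.valueOf(tuple.get(field)));
        }
        return cells;
    }

    public static void main(String[] args) {
        // A tuple with the same two fields used in this article: "word" and "count"
        Map<String, Object> tuple = new LinkedHashMap<>();
        tuple.put("word", "storm");
        tuple.put("count", 3L);

        System.out.println("row key: " + rowKey(tuple, "word"));
        System.out.println("columns: " + columns(tuple, "cf", Arrays.asList("word", "count")));
    }
}
```

In this sketch the "word" field is used both as the row key and as a column, which mirrors the configuration used in the example in section 2.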

2 Code Example

Below is the code for a topology (in Trident mode) that reads data from Kafka and writes it to HBase.


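// Imports needed by this example (Storm 0.9.3 package names)
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.hbase.client.Durability;
import org.apache.storm.hbase.bolt.mapper.HBaseProjectionCriteria;
import org.apache.storm.hbase.bolt.mapper.HBaseValueMapper;
import org.apache.storm.hbase.security.HBaseSecurityUtil;
import org.apache.storm.hbase.trident.mapper.SimpleTridentHBaseMapper;
import org.apache.storm.hbase.trident.mapper.TridentHBaseMapper;
import org.apache.storm.hbase.trident.state.HBaseState;
import org.apache.storm.hbase.trident.state.HBaseStateFactory;
import org.apache.storm.hbase.trident.state.HBaseUpdater;
import backtype.storm.Config;
import backtype.storm.StormSubmitter;
import backtype.storm.spout.SchemeAsMultiScheme;
import backtype.storm.tuple.Fields;
import storm.kafka.BrokerHosts;
import storm.kafka.StringScheme;
import storm.kafka.ZkHosts;
import storm.kafka.trident.OpaqueTridentKafkaSpout;
import storm.kafka.trident.TridentKafkaConfig;
import storm.trident.TridentTopology;
import storm.trident.state.StateFactory;

// The statements below belong inside main(String[] args).
// LogCollectMapper and AddTimeFunction are user-defined classes that are not shown here.

// The two fields of the Storm tuple, named "word" and "count"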
Fields fields = new Fields("word", "count");
// Define the HBase configuration key and the Kerberos settings
String hBaseConfigKey = "config_key";
System.setProperty("java.security.krb5.realm", "HADOOP.QIYI.COM");
System.setProperty("java.security.krb5.kdc", "kerberos-hadoop-dev001-shjj.qiyi.virtual");
// Load the HBase and Kerberos configuration; Config is the backtype.storm.Config class
Config conf = new Config();
conf.setDebug(true);
Map<String, String> hBaseConfigMap = new HashMap<String, String>();
hBaseConfigMap.put(HBaseSecurityUtil.STORM_KEYTAB_FILE_KEY, "/home/yeweichen/yeweichen.keytab");
hBaseConfigMap.put(HBaseSecurityUtil.STORM_USER_NAME_KEY, "yeweichen@HADOOP.QIYI.COM");
// Register the map under the same key that is later passed to HBaseState.Options.withConfigKey
conf.put(hBaseConfigKey, hBaseConfigMap);
// Define the Trident topology, which reads its data from Kafka
TridentTopology tridentTopology = new TridentTopology();
BrokerHosts zk = new ZkHosts("10.121.43.14,10.121.43.17");
TridentKafkaConfig spoutConf = new TridentKafkaConfig(zk, "mytopic");
spoutConf.forceFromStart = true;
spoutConf.scheme = new SchemeAsMultiScheme(new StringScheme());
OpaqueTridentKafkaSpout spout = new OpaqueTridentKafkaSpout(spoutConf);
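// Define the HBase mapper: the "word" field is used as the row key and the column family is "cf".
// Both column fields are passed in a single withColumnFields call, because a second call would replace the first.
TridentHBaseMapper tridentHBaseMapper = new SimpleTridentHBaseMapper()
    .withColumnFamily("cf")
    .withColumnFields(new Fields("word", "count"))
    .withRowKeyField("word");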
// LogCollectMapper is a user-defined implementation of HBaseValueMapper
HBaseValueMapper rowToStormValueMapper = new LogCollectMapper();
// Define the projection: add the "word" and "count" columns of the "cf" column family
HBaseProjectionCriteria projectionCriteria = new HBaseProjectionCriteria();
projectionCriteria.addColumn(new HBaseProjectionCriteria.ColumnMetaData("cf", "word"));
projectionCriteria.addColumn(new HBaseProjectionCriteria.ColumnMetaData("cf", "count"));
// Define the Options object that holds the settings for HBaseState
HBaseState.Options options = new HBaseState.Options()
    .withConfigKey(hBaseConfigKey)
    .withDurability(Durability.SYNC_WAL)
    .withMapper(tridentHBaseMapper)
    .withProjectionCriteria(projectionCriteria)
    .withRowToStormValueMapper(rowToStormValueMapper)
    .withTableName("storminput");
// Use the factory and the Options object to produce HBaseState objects
StateFactory factory = new HBaseStateFactory(options);
// Define the stream: each message read from Kafka is passed through AddTimeFunction, which produces
// the "word" and "count" fields; these are then written to HBase, with "word" as the row key as defined above
tridentTopology.newStream("myspout", spout)
    .each(new Fields("str"), new AddTimeFunction(), new Fields("word", "count"))
    .partitionPersist(factory, fields, new HBaseUpdater(), new Fields());
// Submit the topology; args[0] is the topology name
StormSubmitter.submitTopology(args[0], conf, tridentTopology.build());
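
The AddTimeFunction used in the stream definition is not shown in the original listing. The sketch below is one possible shape for its core logic, and it is purely an assumption: it takes the raw Kafka string (the "str" field) and returns two values in the order of the declared output fields, using the trimmed message as "word" and a timestamp in milliseconds as "count". The class AddTimeSketch is hypothetical and free of Storm dependencies so that it can be run on its own. In a real topology this logic would sit inside a class extending storm.trident.operation.BaseFunction and be emitted through the TridentCollector.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for the logic inside AddTimeFunction; the original class is not shown. */
public class AddTimeSketch {

    /**
     * Turns one raw message into two output values, ordered to match
     * the declared output fields ("word", "count").
     * Assumption: "count" carries the time in milliseconds supplied by the caller.
     */
    public static List<Object> expand(String str, long nowMillis) {
        // Treat a missing message as an empty word rather than failing the batch
        String word = (str == null) ? "" : str.trim();
        return Arrays.<Object>asList(word, nowMillis);
    }

    public static void main(String[] args) {
        // Simulate one message arriving from Kafka
        System.out.println(expand(" hello ", System.currentTimeMillis()));
    }
}
```

Passing the timestamp in as a parameter, instead of reading the clock inside the method, keeps the logic deterministic and easy to test outside a running topology.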