First, a pointer to the official site — this post follows the official storm-hbase documentation.
Step one: add the dependencies. To avoid version conflicts, we exclude the Guava that storm-hbase pulls in and declare a single Guava version ourselves:
<dependency>
    <groupId>org.apache.storm</groupId>
    <artifactId>storm-hbase</artifactId>
    <version>${storm.version}</version>
    <exclusions>
        <exclusion>
            <groupId>com.google.guava</groupId>
            <artifactId>guava</artifactId>
        </exclusion>
    </exclusions>
</dependency>
<dependency>
    <groupId>com.google.guava</groupId>
    <artifactId>guava</artifactId>
    <version>16.0.1</version>
</dependency>
Add the same exclusion to the hadoop-client dependency above as well:
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>${hadoop-client}</version>
    <exclusions>
        <exclusion>
            <groupId>com.google.guava</groupId>
            <artifactId>guava</artifactId>
        </exclusion>
    </exclusions>
</dependency>
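To confirm that only one Guava version is left on the classpath after the exclusions, a quick check is the Maven dependency tree:

mvn dependency:tree -Dincludes=com.google.guava

Only the 16.0.1 artifact we declared should show up in the output.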
Start HBase and create a table named "wc" with a column family "cf":
create 'wc','cf'
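Before wiring up the topology, you can confirm the table exists from the HBase shell:

list
describe 'wc'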
Here we only need a spout that generates data and a bolt that does the word count. The logic is the same as in the earlier Storm + Redis integration, so the code is pasted directly.
First, the spout:
public static class DataSourceSpout extends BaseRichSpout {
    // We need to emit data, so keep a reference to the SpoutOutputCollector
    private SpoutOutputCollector collector;

    @Override
    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        // Initialize the SpoutOutputCollector
        this.collector = collector;
    }

    // A fixed word list to generate test data from
    public static final String[] words = new String[]{"apple", "banana", "orange", "strawberry"};

    @Override
    public void nextTuple() {
        // Pick a random word from the array as test data
        Random random = new Random();
        String word = words[random.nextInt(words.length)];
        // Sleep one second so the output does not flood the console
        Utils.sleep(1000);
        // Emit the word
        this.collector.emit(new Values(word));
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // Declare the output field, matching the Values emitted above
        declarer.declare(new Fields("word"));
    }
}
Next, the bolt:
public static class CountWords extends BaseRichBolt {
    // We need to emit data, so keep a reference to the OutputCollector
    private OutputCollector collector;

    @Override
    public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
        // Initialize the OutputCollector
        this.collector = collector;
    }

    // A map holding the running counts
    Map<String, Integer> map = new HashMap<>();

    @Override
    public void execute(Tuple input) {
        // Grab the word the spout sent over
        String word = input.getStringByField("word");
        Integer i = map.get(word);
        if (i == null) {
            i = 0;
        }
        i++;
        map.put(word, i);
        // Print a line to make local testing easier
        System.out.println("emit : " + word + " " + map.get(word));
        // Emit the word together with its count
        this.collector.emit(new Values(word, map.get(word)));
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // Note: these field names must match the names the HBase mapper uses below
        declarer.declare(new Fields("word", "count"));
    }
}
The key part is the main method, written against the official example:
public static void main(String[] args) {
    // Create a Config to carry the HBase connection info
    Config config = new Config();
    // Build a map with two entries; the values come from your hbase-site.xml
    Map<String, Object> map = new HashMap<>();
    map.put("hbase.rootdir", "hdfs://192.168.0.133:8020/hbase");
    map.put("hbase.zookeeper.quorum", "192.168.0.133:2181");
    // Put the map into the config under a key of our choosing
    config.put("hbase.conf", map);
    // Mapper setup, adapted from the official reference code
    SimpleHBaseMapper mapper = new SimpleHBaseMapper()
            // The row key comes from the "word" field
            .withRowKeyField("word")
            // Column fields; the names must match the fields emitted above
            .withColumnFields(new Fields("word"))
            .withCounterFields(new Fields("count"))
            // The column family created earlier
            .withColumnFamily("cf");
    // The first argument is the table name
    HBaseBolt hbase = new HBaseBolt("wc", mapper)
            // The config key must match the key the map was stored under above
            .withConfigKey("hbase.conf");
    TopologyBuilder builder = new TopologyBuilder();
    builder.setSpout("DataSourceSpout", new DataSourceSpout());
    builder.setBolt("CountWords", new CountWords()).shuffleGrouping("DataSourceSpout");
    builder.setBolt("hbase", hbase).shuffleGrouping("CountWords");
    LocalCluster cluster = new LocalCluster();
    cluster.submitTopology("LocalWCStormHbaseTop", config, builder.createTopology());
}
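One caveat on the wiring: everything above runs with a parallelism of 1, so shuffleGrouping works. If you ever give CountWords more than one task, the same word can land on different tasks, each with its own local map, and the counts will be wrong. A minimal sketch of the fix (the parallelism hint of 2 is just an illustrative value) is to group by the "word" field instead:

// Route tuples by "word" so the same word always lands on the same task
builder.setBolt("CountWords", new CountWords(), 2)
        .fieldsGrouping("DataSourceSpout", new Fields("word"));
builder.setBolt("hbase", hbase)
        .fieldsGrouping("CountWords", new Fields("word"));

fieldsGrouping guarantees that tuples with equal values in the named field always go to the same downstream task.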
Run the topology and check HBase.
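Because withCounterFields stores the count as an HBase counter, the cell value is a binary 8-byte long, so a plain scan shows raw bytes. A quick way to check from the HBase shell (assuming "apple" has been emitted at least once):

scan 'wc'
get_counter 'wc', 'apple', 'cf:count'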
The counts in HBase keep updating as the topology runs, and with that our Storm + HBase integration is complete.