Today we'll finish the Kafka + Trident + Redis demo from last time; recall the data flow from the previous post. Let's start with the Trident part. The log format is straightforward: timestamp, city name, disease id.
topology.newStream("kafkaspout", kafkaspout)
        .each(new Fields("str"), new Func1(), new Fields("obj"))
        .each(new Fields("obj"), new Func2(), new Fields("descripe"))
        .groupBy(new Fields("descripe"))
        .persistentAggregate(new MemoryMapState.Factory(), new Count(), new Fields("count"))
        .newValuesStream()
        .each(new Fields("descripe", "count"), new Func3(), new Fields());
The Trident flow is roughly as above. The content subscribed from Kafka arrives in the field str; this string is parsed into an object and emitted as obj; then a descripe field is appended, composed of the hour-precision time + city name + disease id, e.g. 2017.7.20.10Beijing2, which serves as the grouping key. The stream is grouped by this description and each group is counted; each group's count is exactly the number of cases of a given disease in a given city within that hour. The stream then continues to Func3, which checks whether the count exceeds a threshold; if it does, the disease is considered to have broken out.
To restate the roles: Func1 deserializes the string into an object, Func2 combines fields into a description string, and Func3 checks the count against the threshold.
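Before wiring everything into Trident, the group-and-count-then-threshold logic can be sanity-checked with a plain in-memory sketch. This is only an illustration: the map stands in for MemoryMapState, and the class and method names here are made up, not part of the topology.

```java
import java.util.HashMap;
import java.util.Map;

public class PipelineSketch {
    static final int BREAK_VALUE = 10; // same threshold Func3 will use

    // Increment the counter for one grouping key (hour + city + disease id)
    // and report whether that key has now crossed the outbreak threshold.
    public static boolean record(Map<String, Long> counts, String descripe) {
        long c = counts.merge(descripe, 1L, Long::sum);
        return c > BREAK_VALUE;
    }
}
```

Eleven events with the same key trip the check, matching the "more than ten cases per hour" rule used later in Func3.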
The code structure is as follows.
MyEvent is the object the string gets converted into:
package Util;
import java.io.Serializable;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
/**
* Created by Frank on 2017/7/20.
*/
public class MyEvent implements Serializable {
    private Date time;
    private City city;
    private int disease;

    public MyEvent(String[] arr) throws ParseException {
        SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        this.time = sdf.parse(arr[0]);
        this.city = City.valueOf(arr[1]);
        this.disease = Integer.parseInt(arr[2]);
    }

    public Date getTime() {
        return time;
    }

    public void setTime(Date time) {
        this.time = time;
    }

    public City getCity() {
        return city;
    }

    public void setCity(City city) {
        this.city = city;
    }

    public int getDisease() {
        return disease;
    }

    public void setDisease(int disease) {
        this.disease = disease;
    }
}
City is an enum; here I only use three cities, Beijing, Shanghai, and Guangzhou:
package Util;
/**
* Created by Frank on 2017/7/20.
*/
public enum City {
    Beijing, Shanghai, Guangzhou
}
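One detail worth knowing about this conversion path: City.valueOf throws IllegalArgumentException for any name outside the enum, which is one reason Func1 below wraps the conversion in a try/catch. A minimal sketch (the helper class and method names are made up for illustration):

```java
enum City { Beijing, Shanghai, Guangzhou }

public class CityCheck {
    // valueOf throws for unknown names, so catch and report instead of crashing.
    public static boolean isKnownCity(String name) {
        try {
            City.valueOf(name);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }
}
```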
Func1, converting the string to a MyEvent:
package Func;
import Util.MyEvent;
import org.apache.storm.trident.operation.BaseFunction;
import org.apache.storm.trident.operation.TridentCollector;
import org.apache.storm.trident.tuple.TridentTuple;
import java.util.ArrayList;
import java.util.List;
/**
* Created by Frank on 2017/7/20.
*/
public class Func1 extends BaseFunction {
    public void execute(TridentTuple tuple, TridentCollector collector) {
        String log = tuple.getStringByField("str");
        System.out.println(log);
        String[] arr = log.split(",");
        try {
            MyEvent myEvent = new MyEvent(arr);
            List<Object> list = new ArrayList<>();
            list.add(myEvent);
            collector.emit(list);
        } catch (Exception e) {
            System.out.println("log convert error"); // if conversion fails, don't emit
        }
    }
}
Func2, combining fields into the "description" string:
package Func;
import Util.City;
import Util.MyEvent;
import org.apache.storm.trident.operation.BaseFunction;
import org.apache.storm.trident.operation.TridentCollector;
import org.apache.storm.trident.tuple.TridentTuple;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
/**
* Created by Frank on 2017/7/20.
*/
public class Func2 extends BaseFunction {
    public void execute(TridentTuple tuple, TridentCollector collector) {
        MyEvent myEvent = (MyEvent) tuple.getValueByField("obj");
        // these Date getters are deprecated; Calendar is the recommended replacement
        String hourstr = (myEvent.getTime().getYear() + 1900) + "."
                + (myEvent.getTime().getMonth() + 1) + "."
                + myEvent.getTime().getDate() + "."
                + myEvent.getTime().getHours();
        String citystr = myEvent.getCity().name();
        String descripe = hourstr + citystr + myEvent.getDisease();
        List<Object> list = new ArrayList<>();
        list.add(descripe);
        collector.emit(list);
    }
}
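As the comment notes, those Date getters are deprecated. A Calendar-based version of the same key construction might look like the sketch below; it produces the same string as Func2 (the class and method names are illustrative, not part of the project):

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;

public class DescribeKey {
    // Build the hour + city + disease grouping key from a raw log line,
    // using Calendar instead of the deprecated Date getters.
    public static String build(String logLine) throws Exception {
        String[] arr = logLine.split(",");
        Calendar cal = Calendar.getInstance();
        cal.setTime(new SimpleDateFormat("yyyy-MM-dd HH:mm:ss").parse(arr[0]));
        return cal.get(Calendar.YEAR) + "."
                + (cal.get(Calendar.MONTH) + 1) + "." // Calendar months are 0-based
                + cal.get(Calendar.DAY_OF_MONTH) + "."
                + cal.get(Calendar.HOUR_OF_DAY)
                + arr[1] + Integer.parseInt(arr[2]);
    }
}
```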
Func3, threshold alerting (just printing for now, not yet wired to Redis):
package Func;
import Util.City;
import Util.MyEvent;
import org.apache.storm.trident.operation.BaseFunction;
import org.apache.storm.trident.operation.TridentCollector;
import org.apache.storm.trident.tuple.TridentTuple;
import java.util.ArrayList;
import java.util.List;
/**
* Created by Frank on 2017/7/20.
*/
public class Func3 extends BaseFunction {
    private static final int BREAK_VALUE = 10; // more than ten cases in one hour counts as an outbreak

    public void execute(TridentTuple tuple, TridentCollector collector) {
        Long count = tuple.getLongByField("count");
        String descripe = tuple.getStringByField("descripe");
        if (count > BREAK_VALUE) {
            System.out.println(descripe + " breakout");
        }
    }
}
The Topology part:
package topology;
import Func.Func1;
import Func.Func2;
import Func.Func3;
import kafka.api.OffsetRequest;
import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.generated.StormTopology;
import org.apache.storm.kafka.*;
import org.apache.storm.kafka.trident.OpaqueTridentKafkaSpout;
import org.apache.storm.kafka.trident.TridentKafkaConfig;
import org.apache.storm.trident.TridentTopology;
import org.apache.storm.trident.operation.builtin.Count;
import org.apache.storm.trident.testing.MemoryMapState;
import org.apache.storm.tuple.Fields;
/**
* Created by Frank on 2017/7/16.
*/
public class Topology {
    public static StormTopology buildTopology() {
        BrokerHosts kafkaHosts = new ZkHosts("kafkaserverip:2181");
        TridentKafkaConfig spoutConf = new TridentKafkaConfig(kafkaHosts, "test");
        spoutConf.scheme = new StringMultiSchemeWithTopic();
        spoutConf.startOffsetTime = OffsetRequest.LatestTime();
        // the default is to fetch from the earliest offset; start from the latest instead
        OpaqueTridentKafkaSpout kafkaspout = new OpaqueTridentKafkaSpout(spoutConf);
        TridentTopology topology = new TridentTopology();
        topology.newStream("kafkaspout", kafkaspout)
                .each(new Fields("str"), new Func1(), new Fields("obj"))
                .each(new Fields("obj"), new Func2(), new Fields("descripe"))
                .groupBy(new Fields("descripe"))
                .persistentAggregate(new MemoryMapState.Factory(), new Count(), new Fields("count"))
                .newValuesStream()
                .each(new Fields("descripe", "count"), new Func3(), new Fields());
        return topology.build();
    }

    public static void main(String[] args) throws Exception {
        Config conf = new Config();
        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("cdc", conf, buildTopology());
    }
}
I couldn't wait to see whether it runs; logically it seems fine. To make testing easier I used node-red as the Kafka producer to generate log lines, rather than logs from a real application.
As shown below, clicking the first node sends a message of the form "current time as year-month-day hour:minute:second,Beijing,1" to the Kafka topic test; the second node's message ends in 2, and the third in 3.
Click each a few times and watch the program's output:
Random clicking simulates the randomness of diagnosis results. Once disease 1 in Beijing exceeded 10 occurrences in the current hour (on the 11th click), the breakout log was printed, so the program works as expected.
Good, the Trident part looks solid. Now let's publish the final breakout message to the Redis server instead of merely printing it.
Func3, publishing to a Redis channel:
package Func;
import Util.City;
import Util.MyEvent;
import org.apache.storm.trident.operation.BaseFunction;
import org.apache.storm.trident.operation.TridentCollector;
import org.apache.storm.trident.operation.TridentOperationContext;
import org.apache.storm.trident.tuple.TridentTuple;
import redis.clients.jedis.Jedis;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
/**
* Created by Frank on 2017/7/20.
*/
public class Func3 extends BaseFunction {
    private static final int BREAK_VALUE = 10;
    private Jedis jedis;

    public void prepare(Map conf, TridentOperationContext context) {
        jedis = new Jedis("redisserverip");
        jedis.connect();
        jedis.auth("password");
    }

    public void execute(TridentTuple tuple, TridentCollector collector) {
        Long count = tuple.getLongByField("count");
        String descripe = tuple.getStringByField("descripe");
        if (count > BREAK_VALUE) {
            System.out.println(descripe + " breakout");
            jedis.publish("breakout", descripe + " breakout");
        }
    }
}
In node-red, subscribe to this breakout channel (Redis pub/sub calls it a channel; some other messaging systems call it a topic, but it's the same idea). Then click until the eleventh log line goes out: breakout is printed, and the same message shows up on the Redis channel. As shown below.
And with that, the program is complete. Done and dusted~