Apache Storm


Version: 2.0.0
Official site: http://storm.apache.org/releases/2.0.0/index.html

Part 1: Basics

(1) What is Storm

Apache Storm is a free and open-source distributed real-time computation system. Storm makes it easy to reliably process unbounded streams of data, doing for real-time processing what Hadoop did for batch processing. Storm is simple, can be used with any programming language, and is a lot of fun to use.
Storm has many use cases: real-time analytics, online machine learning, continuous computation, distributed RPC, ETL, and more. Storm is fast: one benchmark clocked it at over a million tuples processed per second per node. It is scalable, fault-tolerant, guarantees that your data will be processed, and is easy to set up and operate.
Storm integrates with the queueing and database technologies you already use. A Storm topology consumes streams of data and processes those streams in arbitrarily complex ways, repartitioning the streams between the stages of the computation as needed. (Adapted from the official website.)

(2) Core Concepts

Topology: in Storm, a topology wires together the flow of a stream computation. A Storm topology is analogous to a MapReduce job; the key difference is that a MapReduce job eventually finishes, whereas a topology runs forever until the process is killed manually.

Streams: a stream is an unbounded sequence of Tuples, processed and created in parallel in a distributed fashion. Streams are defined with a schema that names the fields of the stream's Tuples.

Tuple: a single record in Storm. A Tuple stores its values as an array of elements, and those elements are read-only; they cannot be modified. (A Tuple is the counterpart of a Record in Kafka Streams and can be thought of as one row of data containing several fields.)

Tuple t = new Tuple(new Object[]{1, "zs", true}); // read-only (illustrative pseudo-code; real tuples are created by the framework)

Spouts (literally "nozzles"): responsible for producing Tuples; they are the source of Streams. A Spout typically reads data from an external system, wraps it into Tuples, and emits them into the Topology. Implement IRichSpout or extend BaseRichSpout.

Bolts (literally "bolts"): every Tuple in a Topology is processed by Bolts. Bolts are used for filtering, aggregation, functions, joins, writing data to databases, and so on.

IRichBolt / BaseRichBolt provide at-most-once semantics (by default), IBasicBolt / BaseBasicBolt provide at-least-once semantics, and IStatefulBolt / BaseStatefulBolt support stateful computation.
Note: a diagram illustrating these terms appeared here in the original (figure omitted).

(3) Cluster Setup

1. Cluster model (diagram omitted)


2. Roles in the model

Nimbus: the master node of the computation. It distributes code, assigns tasks, and detects failures in the tasks executed by the Supervisors.
Supervisor: accepts task assignments from Nimbus and starts Worker processes to execute the computation.
Zookeeper: coordinates Nimbus and the Supervisors. Storm stores the state of the nimbus and supervisor processes in ZooKeeper, which makes Nimbus and the Supervisors themselves stateless and allows tasks to recover from failures quickly, giving the stream computation remarkable stability.
Worker: a Java process that a Supervisor starts specifically for one Topology. A Worker runs Executors (threads) to do the work, and each unit of work is packaged as a Task.

3. Cluster setup (on virtual machines)
  • Synchronize the clocks
    [root@CentOSX ~]# yum install -y ntp
    [root@CentOSX ~]# service ntpd start
    [root@CentOSX ~]# ntpdate cn.pool.ntp.org
  • Install the ZooKeeper cluster
    [root@CentOSX ~]# tar -zxf zookeeper-3.4.6.tar.gz -C /usr/
    [root@CentOSX ~]# mkdir zkdata
    [root@CentOSX ~]# cp /usr/zookeeper-3.4.6/conf/zoo_sample.cfg /usr/zookeeper-3.4.6/conf/zoo.cfg
    [root@CentOSX ~]# vi /usr/zookeeper-3.4.6/conf/zoo.cfg
    tickTime=2000
    dataDir=/root/zkdata
    clientPort=2181
    [root@CentOSX ~]# /usr/zookeeper-3.4.6/bin/zkServer.sh start zoo.cfg
    [root@CentOSX ~]# /usr/zookeeper-3.4.6/bin/zkServer.sh status zoo.cfg
  • Install JDK 8+
    [root@CentOSX ~]# rpm -ivh jdk-8u171-linux-x64.rpm
    [root@CentOSX ~]# vi .bashrc
    JAVA_HOME=/usr/java/latest
    CLASSPATH=.
    PATH=$PATH:$JAVA_HOME/bin
    export JAVA_HOME
    export CLASSPATH
    export PATH
    [root@CentOSX ~]# source .bashrc
  • Map hostnames to IP addresses
    [root@CentOSX ~]# vi /etc/hosts
    127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
    ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
    192.168.111.128 CentOSA
    192.168.111.129 CentOSB
    192.168.111.130 CentOSC
  • Disable the firewall
    [root@CentOSX ~]# service iptables stop
    [root@CentOSX ~]# chkconfig iptables off
  • Install and configure Storm
    [root@CentOSX ~]# tar -zxf apache-storm-2.0.0.tar.gz -C /usr/
    [root@CentOSX ~]# vi .bashrc
    STORM_HOME=/usr/apache-storm-2.0.0
    JAVA_HOME=/usr/java/latest
    CLASSPATH=.
    PATH=$PATH:$JAVA_HOME/bin:$STORM_HOME/bin
    export JAVA_HOME
    export CLASSPATH
    export PATH
    export STORM_HOME
    [root@CentOSX ~]# source .bashrc
    [root@CentOSX ~]# storm version
    If you are using Storm 2.0.0 you also need to install python-argparse (yum install -y python-argparse); otherwise the storm command fails with:
    Traceback (most recent call last):
    File "/usr/apache-storm-2.0.0/bin/storm.py", line 20, in <module>
    import argparse
    ImportError: No module named argparse
  • Edit the storm.yaml configuration file
    [root@CentOSX ~]# vi /usr/apache-storm-2.0.0/conf/storm.yaml
    ########### These MUST be filled in for a storm configuration
    storm.zookeeper.servers:
    - "CentOSA"
    - "CentOSB"
    - "CentOSC"
    storm.local.dir: "/usr/storm-stage"
    nimbus.seeds: ["CentOSA","CentOSB","CentOSC"]
    supervisor.slots.ports:
    - 6700
    - 6701
    - 6702
    - 6703
    Note: YAML is indentation-sensitive; keep the leading spaces in front of each entry exactly as shown.
  • Start the Storm processes
    [root@CentOSX ~]# nohup storm nimbus >/dev/null 2>&1 &     # start the master node (Nimbus)
    [root@CentOSX ~]# nohup storm supervisor >/dev/null 2>&1 & # start the compute nodes
    [root@CentOSA ~]# nohup storm ui >/dev/null 2>&1 &         # start the web UI
  • Once the processes are up, open http://CentOSA:8080


(4) Getting-Started Example: Low-Level API

1. Scenario

Every second, send one of the three strings "this is a demo", "hello Storm", "ni hao" (chosen at random) into the Topology. Inside the Topology, split the string into words, count how many times each word occurs, and keep printing the running result.

2. Maven dependencies
  • pom.xml
<dependency>
    <groupId>org.apache.storm</groupId>
    <artifactId>storm-core</artifactId>
    <version>2.0.0</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.storm</groupId>
    <artifactId>storm-client</artifactId>
    <version>2.0.0</version>
    <scope>provided</scope>
</dependency>
3. Java code
  • Write the Spout
public class WordCountSpout extends BaseRichSpout {
    private String[] lines={"this is a demo","hello Storm","ni hao"};
    // the collector sends data downstream
    private SpoutOutputCollector collector;
    public void open(Map<String, Object> conf, TopologyContext context, SpoutOutputCollector collector) {
        this.collector=collector;
    }
    // emit a Tuple downstream; its schema is declared in declareOutputFields
    public void nextTuple() {
        Utils.sleep(1000); // sleep 1s, i.e. emit once per second
        String line=lines[new Random().nextInt(lines.length)];
        collector.emit(new Values(line));
    }
    // describe the fields of the emitted tuples
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("line"));
    }
}
  • Write the Bolts
    LineSplitBolt
public class LineSplitBolt extends BaseRichBolt {
    // the collector sends data downstream
    private OutputCollector collector;
    public void prepare(Map<String, Object> topoConf, TopologyContext context, OutputCollector collector) {
        this.collector=collector;
    }	
    public void execute(Tuple input) {
        String line = input.getStringByField("line");
        String[] tokens = line.split("\\W+");
        for (String token : tokens) {
            collector.emit(new Values(token,1));
        }
    }
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word","count"));
    }
}

WordCountBolt

public class WordCountBolt extends BaseRichBolt {
    // in-memory state: word -> count
    private Map<String,Integer> keyValueState;
    // the collector sends data downstream
    private OutputCollector collector;
 
    public void prepare(Map<String, Object> topoConf, TopologyContext context, OutputCollector collector) {
        this.collector=collector;
        keyValueState=new HashMap<String, Integer>();
    }
    public void execute(Tuple input) {
        String key = input.getStringByField("word");
        int count=0;
        if(keyValueState.containsKey(key)){
            count=keyValueState.get(key);
        }
        // update the state
        int currentCount=count+1;
        keyValueState.put(key,currentCount);
        // emit the latest count downstream
        collector.emit(new Values(key,currentCount));
    }
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("key","result"));
    }
}

WordPrintBolt

public class WordPrintBolt extends BaseRichBolt {
    
    public void prepare(Map<String, Object> topoConf, TopologyContext context, OutputCollector collector) {
    }
    public void execute(Tuple input) {
        String word=input.getStringByField("key");
        Integer result=input.getIntegerByField("result");
        System.out.println(input+"\t"+word+" , "+result);
    }
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
    }
}
  • Write the Topology
import org.apache.storm.Config;
import org.apache.storm.StormSubmitter;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.tuple.Fields;

public class WordCountTopology {
    public static void main(String[] args) throws Exception {
        // 1. create the TopologyBuilder
        TopologyBuilder builder = new TopologyBuilder();

        // 2. wire up the stream-processing logic (Spouts, Bolts, and how they connect)

        builder.setSpout("WordCountSpout",new WordCountSpout(),1);

        builder.setBolt("LineSplitBolt",new LineSplitBolt(),3)
                .shuffleGrouping("WordCountSpout"); // LineSplitBolt receives upstream tuples via random (shuffle) grouping

        builder.setBolt("WordCountBolt",new WordCountBolt(),3)
                .fieldsGrouping("LineSplitBolt",new Fields("word"));

        builder.setBolt("WordPrintBolt",new WordPrintBolt(),4)
                .fieldsGrouping("WordCountBolt",new Fields("key"));

        // 3. submit the topology
        Config conf= new Config();
        conf.setNumWorkers(3); // number of Worker JVM processes this Topology needs
        conf.setNumAckers(0);  // disable Storm acking (the reliability mechanism)
        StormSubmitter.submitTopology("worldcount",conf,builder.createTopology());
        /* Note: for local testing you can use LocalCluster instead:
         LocalCluster localCluster = new LocalCluster();
         localCluster.submitTopology("worldcount",conf,builder.createTopology());
		*/
    }
}

shuffleGrouping: the downstream LineSplitBolt receives the tuples emitted by the upstream Spout in a random (shuffled) fashion.
fieldsGrouping: tuples with the same value of the grouping Fields are always sent to the same Bolt Task.

  • Submit the topology

Build the jar with mvn package, then upload it to any machine in the cluster.

[root@CentOSA ~]# storm jar /root/storm-lowlevel-1.0-SNAPSHOT.jar com.baizhi.demo01.WordCountTopology
....
16:27:28.207 [main] INFO  o.a.s.StormSubmitter - Finished submitting topology: worldcount

After a successful submission you can check the result in the Storm UI at http://centosa:8080/

  • List running topologies
[root@CentOSA ~]# storm list
...
Topology_name        Status     Num_tasks  Num_workers  Uptime_secs  Topology_Id          Owner
----------------------------------------------------------------------------------------
worldcount           ACTIVE     11         3            66           worldcount-2-1560760048 root
  • Kill the Topology
[root@CentOSX ~]# storm kill worldcount

(5) Topology Parallelism

1. Configuring Workers, Executors, and Tasks in code

Workers (processes), Executors (threads), and Tasks (diagram omitted; a code sketch follows below).
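The figure is not reproduced here, but the mapping from code to the three levels can be sketched as follows. This is a minimal illustration that reuses the WordCountSpout and LineSplitBolt classes from the word-count example above; the concrete numbers (2 Workers, 2 Executors, 4 Tasks) are only illustrative.

import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.topology.TopologyBuilder;

public class ParallelismDemo {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();

        // parallelism hint 2 -> 2 Executors (threads) run the spout
        builder.setSpout("WordCountSpout", new WordCountSpout(), 2);

        // 2 Executors but 4 Tasks -> each Executor hosts 2 Task instances of the bolt
        builder.setBolt("LineSplitBolt", new LineSplitBolt(), 2)
                .setNumTasks(4)
                .shuffleGrouping("WordCountSpout");

        Config conf = new Config();
        conf.setNumWorkers(2); // 2 Worker JVM processes share all of the Executors

        new LocalCluster().submitTopology("parallelism-demo", conf, builder.createTopology());
    }
}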

2. How the parallelism settings map to the runtime (diagram omitted)


3. How parallelism shows up in the Storm UI (screenshot omitted)


4. Changing parallelism while a topology is running

Option 1: rebalance the topology from the Storm Web UI.
Option 2: rebalance from the command line:

[root@CentOSA ~]# storm rebalance
usage: storm rebalance [-h] [-w WAIT_TIME_SECS] [-n NUM_WORKERS]
                       [-e EXECUTORS] [-r RESOURCES] [-t TOPOLOGY_CONF]
                       [--config CONFIG]
                       [-storm_config_opts STORM_CONFIG_OPTS]
                       topology-name
                     

Change the number of Workers:

[root@CentOSX ~]# storm rebalance -w 10 -n 6  wordcount02

Change a component's parallelism (as a rule it cannot exceed the component's number of Tasks):

[root@CentOSX ~]# storm rebalance -w 10 -n 3 -e LineSplitBolt=5  wordcount02

(6) Reliable Tuple Processing

1. How Storm guarantees message processing

Storm uses a special AckerBolt to detect whether the whole tuple tree rooted at a Tuple has been fully processed. If processing fails or times out, the AckerBolt calls the fail method of the Spout that emitted that Tuple, asking the Spout to re-emit it. By default the number of acker executors equals the number of Workers; the acker mechanism can be switched off with config.setNumAckers(0).
To use it:
On the Spout side: the Spout must provide a msgId when emitting a tuple, and it must override the ack and fail methods.

public class WordCountSpout extends BaseRichSpout {
    private String[] lines={"this is a demo","hello Storm","ni hao"};
    private SpoutOutputCollector collector;

    public void open(Map<String, Object> conf, TopologyContext context, SpoutOutputCollector collector) {
        this.collector=collector;
    }
    public void nextTuple() {
        Utils.sleep(5000); // sleep 5s between emits
        int msgId = new Random().nextInt(lines.length);
        String line=lines[msgId];
        // emit the Tuple with a msgId
        collector.emit(new Values(line),msgId);
    }
    // describe the fields of the emitted tuples
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("line"));
    }
    // called back (via the AckerBolt) when the tuple tree succeeds
    @Override
    public void ack(Object msgId) {
        System.out.println("sent OK: "+msgId);
    }
    // called back (via the AckerBolt) when the tuple tree fails
    @Override
    public void fail(Object msgId) {
        String line = lines[(Integer) msgId];
        System.out.println("send failed: "+msgId+"\t"+line);
    }
}

On the Bolt side: anchor each child Tuple to its parent Tuple, and then report the parent Tuple's status upstream; the two possible answers are collector.ack(input) and collector.fail(input).

public void execute(Tuple parentTuple) {
    try {
        // do something with the parent tuple, producing childTuple
        // anchor the child tuple to the parent tuple
        collector.emit(parentTuple, childTuple);
        // report the parent tuple's status upstream
        collector.ack(parentTuple);
    } catch (Exception e) {
        collector.fail(parentTuple);
    }
}
2. How the reliability check works (diagram omitted)


3. The IBasicBolt / BaseBasicBolt pattern

Many Bolts follow a common pattern: read an input tuple, emit new tuples anchored to it, and ack it at the end of the execute method (or fail it on error). Storm therefore provides a shortcut: if you want ack semantics, simply implement IBasicBolt or extend BaseBasicBolt and the anchoring and acking are done for you.

public class WordCountBolt extends BaseBasicBolt {
    private Map<String,Integer> keyValueState;
    @Override
    public void prepare(Map<String, Object> topoConf, TopologyContext context) {
        keyValueState=new HashMap<String, Integer>();
    }
    public void execute(Tuple input, BasicOutputCollector collector) {
        String key = input.getStringByField("word");
        int count=0;
        if(keyValueState.containsKey(key)){
            count=keyValueState.get(key);
        }
        int currentCount=count+1;
        keyValueState.put(key,currentCount);
        // no need to manually anchor to the parent tuple or ack it
        collector.emit(new Values(key,currentCount));
    }
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("key","result"));
    }
}
4. Disabling the Acker mechanism
  • Set the number of ackers to 0
  • Emit from the Spout without providing a msgId
  • Do not anchor tuples in the Bolts
    Benefit of disabling the acker: higher throughput and lower latency. A rough sketch of the three changes follows below.
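As a sketch only (based on the word-count example above; the emit calls are illustrative fragments, not a complete program), the three changes look like this:

// 1. Topology configuration: no acker executors.
Config conf = new Config();
conf.setNumAckers(0);

// 2. Spout: emit without a msgId, so no tuple tree is tracked.
collector.emit(new Values(line));          // instead of collector.emit(new Values(line), msgId)

// 3. Bolt: emit without anchoring to the input tuple.
collector.emit(new Values(token, 1));      // instead of collector.emit(input, new Values(token, 1))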

(7) Storm State Management

1. Checkpoint mechanism

Checkpointing is triggered by an internal checkpoint spout at the interval specified by topology.state.checkpoint.interval.ms. If the topology contains at least one IStatefulBolt (a stateful bolt), the topology builder adds the checkpoint spout automatically. For stateful topologies the builder wraps each IStatefulBolt in a StatefulBoltExecutor, which commits the bolt's state when it receives a checkpoint tuple. Stateless bolts are wrapped in a CheckpointTupleForwarder, which simply forwards the checkpoint tuples so that they can flow through the whole topology DAG. Checkpoint tuples travel on a separate internal stream, $checkpoint. The topology builder wires this checkpoint stream across the entire topology, with the checkpoint spout at the root.

At each checkpoint interval the checkpoint spout emits a checkpoint tuple. On receiving it, a bolt saves its state and then forwards the checkpoint tuple to the next component. Each bolt waits for the checkpoint tuple to arrive on all of its input streams before saving its state, so the saved state represents a consistent snapshot of the topology. Once the checkpoint spout has received acks from all bolts, the state commit is complete and the transaction is recorded as committed by the checkpoint spout.
Checkpointing does not snapshot the state of the spouts themselves. However, a spout's tuples are only fully acked once the state of all bolts has been checkpointed and the checkpoint tuple itself has been acked. This also implies that topology.state.checkpoint.interval.ms must be lower than topology.message.timeout.secs. The state commit works like a three-phase commit protocol with prepare and commit phases, so that state across the topology is saved in a consistent and atomic manner.
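For reference, a minimal sketch of how the checkpoint interval could be tuned when submitting a stateful topology (the 10-second interval and 30-second timeout are illustrative values, and builder is assumed to be the TopologyBuilder of a topology containing an IStatefulBolt):

Config conf = new Config();
// checkpoint every 10 s; must stay below the message timeout
conf.put("topology.state.checkpoint.interval.ms", 10000);
conf.setMessageTimeoutSecs(30);
StormSubmitter.submitTopology("stateful-wordcount", conf, builder.createTopology());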

2. State persistence backed by Redis
  • Add the dependency to pom.xml
<dependency>
    <groupId>org.apache.storm</groupId>
    <artifactId>storm-redis</artifactId>
    <version>2.0.0</version>
</dependency>
  • Configuration template from the official docs (a Map<String, String | Map<String,Object>>)
Standalone mode
{
  "keyClass": "Optional fully qualified class name of the Key type.",
  "valueClass": "Optional fully qualified class name of the Value type.",
  "keySerializerClass": "Optional Key serializer implementation class.",
  "valueSerializerClass": "Optional Value Serializer implementation class.",
  "jedisPoolConfig": {
    "host": "localhost",
    "port": 6379,
    "timeout": 2000,
    "database": 0,
    "password": "xyz"
    }
}
Cluster mode
{
   "keyClass": "Optional fully qualified class name of the Key type.",
   "valueClass": "Optional fully qualified class name of the Value type.",
   "keySerializerClass": "Optional Key serializer implementation class.",
   "valueSerializerClass": "Optional Value Serializer implementation class.",
   "jedisClusterConfig": {
     "nodes": ["localhost:7379", "localhost:7380", "localhost:7381"],
     "timeout": 2000,
     "maxRedirections": 5
   }
 }
  • Add the following configuration to the topology
    // configure Redis
    conf.put(Config.TOPOLOGY_STATE_PROVIDER,"org.apache.storm.redis.state.RedisKeyValueStateProvider");
    Map<String,Object> stateConfig=new HashMap<String,Object>();
    Map<String,Object> redisConfig=new HashMap<String,Object>();
    redisConfig.put("host","CentOSA");
    redisConfig.put("port",6379);
    stateConfig.put("jedisPoolConfig",redisConfig);
    ObjectMapper objectMapper=new ObjectMapper();
    System.out.println(objectMapper.writeValueAsString(stateConfig));
    conf.put(Config.TOPOLOGY_STATE_PROVIDER_CONFIG,objectMapper.writeValueAsString(stateConfig));
3. State persistence backed by HBase
  • Add the dependency to pom.xml
<dependency>
    <groupId>org.apache.storm</groupId>
    <artifactId>storm-hbase</artifactId>
    <version>2.0.0</version>
</dependency>
  • Configuration template from the official docs (a Map<String, String>)
{
   "keyClass": "Optional fully qualified class name of the Key type.",
   "valueClass": "Optional fully qualified class name of the Value type.",
   "keySerializerClass": "Optional Key serializer implementation class.",
   "valueSerializerClass": "Optional Value Serializer implementation class.",
   "hbaseConfigKey": "config key to load hbase configuration from storm root configuration. (similar to storm-hbase)",
   "tableName": "Pre-created table name for state.",
   "columnFamily": "Pre-created column family for state."
 }
  • Add the following configuration to the topology
config.put(Config.TOPOLOGY_STATE_PROVIDER,"org.apache.storm.hbase.state.HBaseKeyValueStateProvider");
Map<String,Object> hbaseConfig=new HashMap<String,Object>();
hbaseConfig.put("hbase.zookeeper.quorum", "CentOSA"); // HBase ZooKeeper connection parameter
config.put("hbase.conf", hbaseConfig);
ObjectMapper objectMapper=new ObjectMapper();
Map<String,Object> stateConfig=new HashMap<String,Object>();
stateConfig.put("hbaseConfigKey","hbase.conf");
stateConfig.put("tableName","baizhi:wordcountstate");
stateConfig.put("columnFamily","cf1");
config.put(Config.TOPOLOGY_STATE_PROVIDER_CONFIG,objectMapper.writeValueAsString(stateConfig));

(8) Distributed RPC

Storm's DRPC brings parallel computation to request/response workloads: a Storm Topology receives the caller's argument, performs the computation, and finally returns the result to the caller as a Tuple.


  • Edit the storm.yaml configuration file

vi /usr/apache-storm-2.0.0/conf/storm.yaml

 storm.zookeeper.servers:
     - "CentOSA"
     - "CentOSB"
     - "CentOSC"
 storm.local.dir: "/usr/storm-stage"
 nimbus.seeds: ["CentOSA","CentOSB","CentOSC"]
 supervisor.slots.ports:
     - 6700
     - 6701
     - 6702
     - 6703
 drpc.servers:
     - "CentOSA"
     - "CentOSB"
     - "CentOSC"
 storm.thrift.transport: "org.apache.storm.security.auth.plain.PlainSaslTransportPlugin"

Mind the YAML formatting!

  • Restart all Storm services
[root@CentOSX ~]# nohup storm drpc  >/dev/null 2>&1 &
[root@CentOSX ~]# nohup storm nimbus >/dev/null 2>&1 &
[root@CentOSX ~]# nohup storm supervisor >/dev/null 2>&1 &
[root@CentOSA ~]# nohup storm ui >/dev/null 2>&1 &
DRPC example walkthrough

Add the dependency

<dependency>
    <groupId>org.apache.storm</groupId>
    <artifactId>storm-redis</artifactId>
    <version>2.0.0</version>
</dependency>

WordCountRedisLookupMapper

import com.google.common.collect.Lists;
import org.apache.storm.redis.common.mapper.RedisDataTypeDescription;
import org.apache.storm.redis.common.mapper.RedisLookupMapper;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.ITuple;
import org.apache.storm.tuple.Values;

import java.util.List;

public class WordCountRedisLookupMapper implements RedisLookupMapper {
    // iTuple is the tuple received from upstream; it is used here to obtain the request id
    public List<Values> toTuple(ITuple iTuple, Object value) {
        Object id = iTuple.getValue(0);
        List<Values> values = Lists.newArrayList();
        if(value == null){
            value = 0;
        }
        values.add(new Values(id, value));

        return values;

    }

    // the first field must be named "id"; the names of the remaining fields do not matter
    public void declareOutputFields(OutputFieldsDeclarer outputFieldsDeclarer) {
        outputFieldsDeclarer.declare(new Fields("id", "num"));
    }
    // declare the Redis data type to read from
    public RedisDataTypeDescription getDataTypeDescription() {
        return new RedisDataTypeDescription(RedisDataTypeDescription.RedisDataType.HASH,"wordcount");
    }

    public String getKeyFromTuple(ITuple iTuple) {
        return iTuple.getString(1);
    }
    // not needed here; this method is used by RedisStoreBolt
    public String getValueFromTuple(ITuple iTuple) {
        return null;
    }
}

TopologyDRPCStreeamTest

import org.apache.storm.Config;
import org.apache.storm.StormSubmitter;
import org.apache.storm.drpc.LinearDRPCTopologyBuilder;
import org.apache.storm.redis.bolt.RedisLookupBolt;
import org.apache.storm.redis.common.config.JedisPoolConfig;

public class TopologyDRPCStreeamTest {
    public static void main(String[] args) throws Exception {
        LinearDRPCTopologyBuilder builder = new LinearDRPCTopologyBuilder("count");
        Config conf = new Config();
        conf.setDebug(false);

        JedisPoolConfig jedisConfig = new JedisPoolConfig.Builder()
                .setHost("CentOSA").setPort(6379).build();

        RedisLookupBolt lookupBolt = new RedisLookupBolt(jedisConfig, new WordCountRedisLookupMapper());
        builder.addBolt(lookupBolt);

        StormSubmitter.submitTopology("drpc-demo", conf, builder.createRemoteTopology());

    }
}
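The topology above only implements the server side of the "count" function; the caller talks to the DRPC servers through a DRPC client. Below is a minimal client sketch; the host, the default DRPC port 3772, and the argument "hello" are assumptions for illustration, and the result is the count looked up from Redis for that word.

import java.util.Map;
import org.apache.storm.utils.DRPCClient;
import org.apache.storm.utils.Utils;

public class DRPCClientDemo {
    public static void main(String[] args) throws Exception {
        // load the defaults from storm.yaml; 3772 is the default drpc.port
        Map<String, Object> conf = Utils.readStormConfig();
        DRPCClient client = new DRPCClient(conf, "CentOSA", 3772);
        // "count" is the function name registered with LinearDRPCTopologyBuilder above;
        // the argument arrives at the topology as the second field of the request tuple
        String result = client.execute("count", "hello");
        System.out.println("count(hello) = " + result);
        client.close();
    }
}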
  • Package the application
  • Submit the topology
[root@CentOSA ~]# storm jar storm-lowlevel-1.0-SNAPSHOT.jar  com.baizhi.demo07.TopologyDRPCStreeamTest --artifacts 'org.apache.storm:storm-redis:2.0.0'

--artifacts specifies the Maven coordinates the program needs at runtime; the storm script downloads them automatically over the network. Separate multiple dependencies with ^. If a dependency is hosted in a private repository, also pass --artifactRepositories:

[root@CentOSA ~]# storm jar storm-lowlevel-1.0-SNAPSHOT.jar  com.baizhi.demo07.TopologyDRPCStreeamTest \
        --artifacts 'org.apache.storm:storm-redis:2.0.0' \
        --artifactRepositories 'local^http://192.168.111.1:8081/nexus/content/groups/public/'

Appendix: installing and configuring a private Maven repository

(9) Kafka as a Spout

1. Integrating Storm with Kafka (low-level API)
  • Add the dependencies to pom.xml
    <dependency>
        <groupId>org.apache.storm</groupId>
        <artifactId>storm-kafka-client</artifactId>
        <version>2.0.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.kafka</groupId>
        <artifactId>kafka-clients</artifactId>
        <version>2.2.0</version>
    </dependency>
  • Build the KafkaSpout
    public class KafkaTopologyDemo {
        public static void main(String[] args) throws Exception {
            TopologyBuilder builder = new TopologyBuilder(); 
            String boostrapServers="CentOSA:9092,CentOSB:9092,CentOSC:9092";
            String topic="topic01";

            KafkaSpout<String, String> kafkaSpout = buildKafkaSpout(boostrapServers,topic);
            // default output tuple fields: new Fields(new String[]{"topic", "partition", "offset", "key", "value"})
            builder.setSpout("KafkaSpout",kafkaSpout,3);
            builder.setBolt("KafkaPrintBlot",new KafkaPrintBlot(),1)
                    .shuffleGrouping("KafkaSpout");
            Config conf = new Config();
            conf.setNumWorkers(3);
            LocalCluster cluster = new LocalCluster();
            cluster.submitTopology("kafkaspout",conf,builder.createTopology());
        }
        // static helper that applies the Kafka configuration and builds the KafkaSpout
        public static KafkaSpout<String, String> buildKafkaSpout(String boostrapServers,String topic){
    
            KafkaSpoutConfig<String,String> kafkaspoutConfig=KafkaSpoutConfig.builder(boostrapServers,topic)
                    .setProp(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,"org.apache.kafka.common.serialization.StringDeserializer")
                    .setProp(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,"org.apache.kafka.common.serialization.StringDeserializer")
                    .setProp(ConsumerConfig.GROUP_ID_CONFIG,"g1")
                    .setEmitNullTuples(false)
                    .setFirstPollOffsetStrategy(FirstPollOffsetStrategy.LATEST)
                    .setProcessingGuarantee(KafkaSpoutConfig.ProcessingGuarantee.AT_LEAST_ONCE)
                    .setMaxUncommittedOffsets(10) // once 10 offsets are pending (uncommitted) in a partition, the spout stops polling, mitigating backpressure
                    // custom record translator so we get exactly the fields we need
                    .setRecordTranslator(new MyRecordTranslator<String, String>())
                    .build();
            return new KafkaSpout<String, String>(kafkaspoutConfig);
        }
    }
  • Custom translator that converts the records of a Kafka topic into tuples
    MyRecordTranslator
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.storm.kafka.spout.DefaultRecordTranslator;
    import org.apache.storm.tuple.Fields;
    import org.apache.storm.tuple.Values;
    
    import java.util.List;
    
    public class MyRecordTranslator<K, V>  extends DefaultRecordTranslator<K, V> {
        @Override
        public List<Object> apply(ConsumerRecord<K, V> record) {
            return new Values(new Object[]{record.topic(),record.partition(),record.offset(),record.key(),record.value(),record.timestamp()});
        }
    
        @Override
        public Fields getFieldsFor(String stream) {
            return new Fields("topic","partition","offset","key","value","timestamp");
        }
    }
2. Integrating Storm with Kafka, HBase, and Redis (low-level API)
  • Add the dependencies
    <dependency>
        <groupId>org.apache.storm</groupId>
        <artifactId>storm-core</artifactId>
        <version>2.0.0</version>
        <scope>provided</scope>
    </dependency> 
    <dependency>
        <groupId>org.apache.storm</groupId>
        <artifactId>storm-client</artifactId>
        <version>2.0.0</version>
        <scope>provided</scope>
    </dependency>    
    <dependency>
        <groupId>org.apache.storm</groupId>
        <artifactId>storm-redis</artifactId>
        <version>2.0.0</version>
    </dependency>    
    <dependency>
        <groupId>org.apache.storm</groupId>
        <artifactId>storm-hbase</artifactId>
        <version>2.0.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.storm</groupId>
        <artifactId>storm-kafka-client</artifactId>
        <version>2.0.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.kafka</groupId>
        <artifactId>kafka-clients</artifactId>
        <version>2.2.0</version>
    </dependency>
  • WodCountTopology
    public class WodCountTopology {
        public static void main(String[] args) throws Exception {
    
            TopologyBuilder builder=new TopologyBuilder();
            Config conf = new Config();
    
            // Redis state management
            conf.put(Config.TOPOLOGY_STATE_PROVIDER,"org.apache.storm.redis.state.RedisKeyValueStateProvider");
            Map<String,Object> stateConfig=new HashMap<String,Object>();
            Map<String,Object> redisConfig=new HashMap<String,Object>();
            redisConfig.put("host","CentOSA");
            redisConfig.put("port",6379);
            stateConfig.put("jedisPoolConfig",redisConfig);
            ObjectMapper objectMapper=new ObjectMapper();
            System.out.println(objectMapper.writeValueAsString(stateConfig));
            conf.put(Config.TOPOLOGY_STATE_PROVIDER_CONFIG,objectMapper.writeValueAsString(stateConfig));
    
            // HBase connection parameters
            Map<String, Object> hbaseConfig = new HashMap<String, Object>();
            hbaseConfig.put("hbase.zookeeper.quorum", "CentOSA");
            conf.put("hbase.conf", hbaseConfig);
    
            // build the KafkaSpout
            KafkaSpout<String, String> kafkaSpout = KafkaSpoutUtils.buildKafkaSpout("CentOSA:9092,CentOSB:9092,CentOSC:9092", "topic01");
    
            builder.setSpout("KafkaSpout",kafkaSpout,3);
            builder.setBolt("LineSplitBolt",new LineSplitBolt(),3)
                    .shuffleGrouping("KafkaSpout");
            builder.setBolt("WordCountBolt",new WordCountBolt(),3)
                    .fieldsGrouping("LineSplitBolt",new Fields("word"));
    
            SimpleHBaseMapper mapper = new SimpleHBaseMapper()
                    .withRowKeyField("key")
                    .withColumnFields(new Fields("key"))
                    .withCounterFields(new Fields("result")) // this field's value must be numeric
                    .withColumnFamily("cf1");
    
            HBaseBolt hbaseBolt = new HBaseBolt("baizhi:t_words", mapper)
                    .withConfigKey("hbase.conf");
            builder.setBolt("HBaseBolt",hbaseBolt,3)
                    .fieldsGrouping("WordCountBolt",new Fields("key"));
    
            StormSubmitter.submitTopology("wordcount1",conf,builder.createTopology());
        }
    }
  • WordCountBolt
    public class WordCountBolt extends BaseStatefulBolt<KeyValueState<String,Integer>> {
        private KeyValueState<String,Integer> state;
        private OutputCollector collector;
        public void initState(KeyValueState<String,Integer> state) {
            this.state=state;
        }
        @Override
        public void prepare(Map<String, Object> topoConf, TopologyContext context, OutputCollector collector) {
            this.collector=collector;
        }
        public void execute(Tuple input) {
            String key = input.getStringByField("word");
            Integer count=input.getIntegerByField("count");
            Integer historyCount = state.get(key, 0);
    
            Integer currentCount=historyCount+count;
            // update the state
            state.put(key,currentCount);
    
            // must anchor to the current input tuple
            collector.emit(input,new Values(key,currentCount));
            collector.ack(input);
    
        }
    
        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("key","result"));
        }
    }
  • LineSplitBolt
    public class LineSplitBolt extends BaseBasicBolt {
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("word","count"));
        }
        public void execute(Tuple input, BasicOutputCollector collector) {
            String line = input.getStringByField("value");
            String[] tokens = line.split("\\W+");
            for (String token : tokens) {
                // BasicOutputCollector anchors to the input tuple automatically
                collector.emit(new Values(token,1));
            }
        }
    }
  • Download dependencies through Maven at submit time

    [root@CentOSC ~]# storm jar storm-lowlevel-1.0-SNAPSHOT.jar com.baizhi.demo09.WodCountTopology --artifacts 'org.apache.storm:storm-redis:2.0.0,org.apache.storm:storm-hbase:2.0.0,org.apache.storm:storm-kafka-client:2.0.0,org.apache.kafka:kafka-clients:2.2.0' --artifactRepositories 'local^http://192.168.111.1:8081/nexus/content/groups/public/'

  • Alternatively, add the shade plugin so the project's dependencies are bundled into the jar

    <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
        <version>2.4.1</version>
        <executions>
            <execution>
                <phase>package</phase>
                <goals>
                    <goal>shade</goal>
                </goals>
                <configuration>
                    <filters>
                        <filter>
                            <artifact>*:*</artifact>
                            <excludes>
                                <exclude>META-INF/*.SF</exclude>
                                <exclude>META-INF/*.DSA</exclude>
                                <exclude>META-INF/*.RSA</exclude>
                            </excludes>
                        </filter>
                    </filters>
                </configuration>
            </execution>
        </executions>
    </plugin>

(10) Window Functions

Storm has core support for processing groups of tuples that fall within a window. Windows are specified with the following two parameters (similar to Kafka Streams):

  • window length - the length or duration of the window
  • sliding interval - the interval at which the window slides
1.Sliding Window(hopping time window)

Tuples are grouped into windows, and the window slides forward every sliding interval. For example, below is a time-based sliding window with a length of 10 seconds and a sliding interval of 5 seconds. As the example shows, sliding windows overlap, so a single tuple can belong to one or more windows.

........| e1 e2 | e3 e4 e5 e6 | e7 e8 e9 |...
-5      0       5            10          15   -> time
|<------- w1 -->|
        |<---------- w2 ----->|
                |<-------------- w3 ---->|
2.Tumbling Window

Tuples are grouped into windows whose slide is exactly equal to the window length. The key difference from a sliding window is therefore that tumbling windows do not overlap: each Tuple belongs to exactly one window.

| e1 e2 | e3 e4 e5 e6 | e7 e8 e9 |...
0       5             10         15    -> time
   w1         w2            w3
3. Using windows in code
// build ClickWindowCountBolt

public class ClickWindowCountBolt extends BaseWindowedBolt {
    private OutputCollector collector;
    @Override
    public void prepare(Map<String, Object> topoConf, TopologyContext context, OutputCollector collector) {
        this.collector=collector;
    }
    public void execute(TupleWindow tupleWindow) {
        Long startTimestamp = tupleWindow.getStartTimestamp();
        Long endTimestamp = tupleWindow.getEndTimestamp();
        SimpleDateFormat sdf=new SimpleDateFormat("HH:mm:ss");
      System.out.println(sdf.format(startTimestamp)+"\t"+sdf.format(endTimestamp));  
        HashMap<String,Integer> hashMap=new HashMap<String, Integer>();
        List<Tuple> tuples = tupleWindow.get();
        for (Tuple tuple : tuples) {
            String key = tuple.getStringByField("word");
            Integer historyCount = 0;
            if (hashMap.containsKey(key)) {
                historyCount=hashMap.get(key);
            }
            int currentCount=historyCount+1;
            hashMap.put(key,currentCount);
        }
        // emit the per-window results to the downstream print bolt
        for (Map.Entry<String, Integer> entry : hashMap.entrySet()) {
            collector.emit(tupleWindow.get(),new Values(entry.getKey(),entry.getValue()));
        }
        for (Tuple tuple : tupleWindow.get()) {
            collector.ack(tuple);
        }
    }
    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("key","result"));
    }
}

// build the topology
public class WodCountTopology {
    public static void main(String[] args) throws Exception {

        TopologyBuilder builder=new TopologyBuilder();
        Config conf = new Config();


        // build the KafkaSpout
        KafkaSpout<String, String> kafkaSpout = KafkaSpoutUtils.buildKafkaSpout("CentOSA:9092,CentOSB:9092,CentOSC:9092", "topic01");

        builder.setSpout("KafkaSpout",kafkaSpout,3);
        builder.setBolt("LineSplitBolt",new LineSplitBolt(),3)
                .shuffleGrouping("KafkaSpout");
        ClickWindowCountBolt clickWindowCountBolt = new ClickWindowCountBolt();
        // sliding window:
        //clickWindowCountBolt.withWindow(BaseWindowedBolt.Duration.seconds(5),BaseWindowedBolt.Duration.seconds(2));
        // tumbling window:
        clickWindowCountBolt.withTumblingWindow(BaseWindowedBolt.Duration.seconds(5));

        builder.setBolt("ClickWindowCountBolt",clickWindowCountBolt,3)
                .fieldsGrouping("LineSplitBolt",new Fields("word"));
        builder.setBolt("WordPrintBolt",new WordPrintBolt(),3)
                .fieldsGrouping("ClickWindowCountBolt",new Fields("key"));

        new LocalCluster().submitTopology("wordcount",conf,builder.createTopology());
    }
}
4. Time semantics for Storm windows

By default, Storm computes windows using the system time at which a tuple arrives at the Bolt. This is only meaningful when the gap between the time a record is produced and the time it is processed is very small; this strategy is usually called Processing Time.
In real business scenarios the time on the compute node is often well past the time the data was produced, and a window based on processing time then loses its meaning. Storm therefore also supports window computation based on a timestamp carried inside the Tuple; this strategy is usually called Event Time.

  • Event Time Window
    (1) Concepts:
    Watermark: the latest event timestamp received so far minus the lag; the watermark is what triggers windows to close and fire.
    lag: the delay margin subtracted when computing the watermark. For example, with a lag of 2 seconds, receiving a tuple stamped 10:00:10 advances the watermark to 10:00:08.
    (2) How it works (diagram omitted)
    (3) Code
Build ClickWindowCountBolt
public class ClickWindowCountBolt extends BaseWindowedBolt {
    private OutputCollector collector;
    @Override
    public void prepare(Map<String, Object> topoConf, TopologyContext context, OutputCollector collector) {
        this.collector=collector;
    }
    public void execute(TupleWindow tupleWindow) {
        Long startTimestamp = tupleWindow.getStartTimestamp();
        Long endTimestamp = tupleWindow.getEndTimestamp();
        SimpleDateFormat sdf=new SimpleDateFormat("HH:mm:ss");
      System.out.println(sdf.format(startTimestamp)+"\t"+sdf.format(endTimestamp)+" \t"+this);
        for (Tuple tuple : tupleWindow.get()) {
            collector.ack(tuple);
            String key = tuple.getStringByField("word");
            System.out.println("\t"+key);
        }
    }
}	
Build ExtractTimeBolt, the bolt that extracts the event timestamp
public class ExtractTimeBolt extends BaseBasicBolt {
    public void execute(Tuple input, BasicOutputCollector collector) {
        String line = input.getStringByField("value");
        String[] tokens = line.split("\\W+");

        SimpleDateFormat sdf=new SimpleDateFormat("HH:mm:ss");
        Long ts= Long.parseLong(tokens[1]);
        System.out.println("received: "+tokens[0]+"\t"+sdf.format(ts) );
        collector.emit(new Values(tokens[0],ts));
    }
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word","timestamp"));
    }
}
Build a bolt that collects late tuples
public class LateBolt extends BaseBasicBolt {
    public void execute(Tuple tuple, BasicOutputCollector basicOutputCollector) {
        System.out.println("late tuple: "+tuple);
    }
    public void declareOutputFields(OutputFieldsDeclarer outputFieldsDeclarer) {

    }
}
Build the Topology
public class WodCountTopology {
    public static void main(String[] args) throws Exception {

        TopologyBuilder builder=new TopologyBuilder();
        Config conf = new Config();

        // build the KafkaSpout
        KafkaSpout<String, String> kafkaSpout = KafkaSpoutUtils.buildKafkaSpout("CentOSA:9092,CentOSB:9092,CentOSC:9092", "topic02");
        // the Kafka spout emits the raw tuples
        builder.setSpout("KafkaSpout",kafkaSpout,3);
        // extract the event time
        builder.setBolt("ExtractTimeBolt",new ExtractTimeBolt(),3)
                .shuffleGrouping("KafkaSpout");
        // windowed counting
        builder.setBolt("ClickWindowCountBolt",new ClickWindowCountBolt()
                .withWindow(BaseWindowedBolt.Duration.seconds(10),BaseWindowedBolt.Duration.seconds(5))
                .withTimestampField("timestamp")
                .withLag(BaseWindowedBolt.Duration.seconds(2))
                .withWatermarkInterval(BaseWindowedBolt.Duration.seconds(1))
                .withLateTupleStream("latestream")
                ,1)
                .fieldsGrouping("ExtractTimeBolt",new Fields("word"));
        // collect late tuples
        builder.setBolt("lateBolt",new LateBolt(),3)
                .shuffleGrouping("ClickWindowCountBolt",
                        "latestream");

        new LocalCluster().submitTopology("wordcount",conf,builder.createTopology());
    }
}

Part 2: Applications

(1) Trident Overview

Trident (literally "trident") is Storm's high-level API: a high-level abstraction for doing real-time computing on top of Storm. It lets you seamlessly mix high-throughput (millions of messages per second) stateful stream processing with low-latency distributed queries. Trident has joins, aggregations, grouping, functions, and filters. In addition it adds primitives for doing stateful, incremental processing on top of any database or persistence store. Trident has consistent, exactly-once semantics, which makes Trident topologies easy to reason about.
In short: stream processing at the macro level, micro-batch processing underneath. Within a batch you can group, join, aggregate, run functions, and run filters; Trident also supports aggregating across batches and persisting those aggregations.

(2) Common Stateless Operators

1. map

Transforms one Tuple into another Tuple. If the number of fields changes, the output Fields must be specified.

  tridentTopology.newStream("KafkaSpoutOpaque",KafkaSpoutUtils.buildKafkaSpoutOpaque("CentOSA:9092,CentOSB:9092,CentOSC:9092","topic01"))
               .map((tuple)-> new Values("Hello~"+tuple.getStringByField("value")),new Fields("name"))
               .peek((tuple) -> System.out.println(tuple));
2.flatMap

Transforms one Tuple into multiple Tuples. If the number of fields changes, the output Fields must be specified.

                    .flatMap((tuple)->{
                        List<Values> list=new ArrayList<>();
                        String[] tokens = tuple.getStringByField("value").split("\\W+");
                        for (String token : tokens) {
                            list.add(new Values(token));
                        }
                        return list;
                    },new Fields("word"))
                    .peek((tuple) -> System.out.println(tuple));
3.Filter

Filters the incoming Tuples and passes only those that satisfy the condition downstream.

  tridentTopology.newStream("KafkaSpoutOpaque",KafkaSpoutUtils.buildKafkaSpoutOpaque("CentOSA:9092,CentOSB:9092,CentOSC:9092","topic01"))
         .filter(new Fields("value"), new BaseFilter() {
             @Override
             public boolean isKeep(TridentTuple tuple) {
                 return !tuple.getStringByField("value").contains("error");
             }
         })
         .peek((tuple) -> System.out.println(tuple));
4.each

The argument can be either a BaseFunction (output Fields must be added) or a BaseFilter (equivalent to filter).

  • BaseFunction
tridentTopology.newStream("KafkaSpoutOpaque",KafkaSpoutUtils.buildKafkaSpoutOpaque("CentOSA:9092,CentOSB:9092,CentOSC:9092","topic01"))
        .each(new Fields("value"), new BaseFunction() {
            @Override
            public void execute(TridentTuple tuple, TridentCollector collector) {
                collector.emit(new Values(tuple.getStringByField("value")));
            }
        }, new Fields("other"))
        .peek((tuple) -> System.out.println(tuple));
  • BaseFilter
tridentTopology.newStream("KafkaSpoutOpaque",KafkaSpoutUtils.buildKafkaSpoutOpaque("CentOSA:9092,CentOSB:9092,CentOSC:9092","topic01"))
        .each(new Fields("value"), new BaseFilter() {
            @Override
            public boolean isKeep(TridentTuple tuple) {
                return !tuple.getStringByField("value").contains("error");
            }
        })
        .peek((tuple) -> System.out.println(tuple));
5.project

Projects the Tuple, dropping the fields that are not needed.

tridentTopology.newStream("KafkaSpoutOpaque",KafkaSpoutUtils.buildKafkaSpoutOpaque("CentOSA:9092,CentOSB:9092,CentOSC:9092","topic01"))
       .project(new Fields("value","timestamp"))
       .peek((tuple) -> System.out.println(tuple));
6. Partitioning and aggregation (stateless)
  • Define the counting aggregator CountAggregater
    public class CountAggregater extends BaseAggregator<Map<String,Integer>> {
        @Override
        public Map<String, Integer> init(Object batchId, TridentCollector collector) {
            return new HashMap<>();
        }
    // accumulate the count for each word
        @Override
        public void aggregate(Map<String, Integer> val, TridentTuple tuple, TridentCollector collector) {
            String word = tuple.getStringByField("key");
            Integer count=tuple.getIntegerByField("count");  
            if(val.containsKey(word)){
                count= val.get(word)+count;
            }
            val.put(word,count);
        }
    // emit the totals once the batch is complete
        @Override
        public void complete(Map<String, Integer> val, TridentCollector collector) {
            for (Map.Entry<String, Integer> entry : val.entrySet()) {
                collector.emit(new Values(entry.getKey(),entry.getValue()));
            }
            val.clear();
        }
    }
  • Build the topology
    public class KafkaTridentTopology {
        public static void main(String[] args) throws Exception {
            TridentTopology tridentTopology=new TridentTopology();    
            tridentTopology.newStream("KafkaSpoutOpaque",KafkaSpoutUtils.buildKafkaSpoutOpaque("CentOSA:9092,CentOSB:9092,CentOSC:9092","topic01"))
                .parallelismHint(3) // parallelism (number of partitions)
                .project(new Fields("value")) // keep only the fields we need
                .flatMap((tuple)-> { // turn one tuple into many
                    List<Values> list=new ArrayList<>();
                    String[] tokens = tuple.getStringByField("value").split("\\W+");
                    for (String token : tokens) {
                        list.add(new Values(token));
                    }
                    return list;
                },new Fields("word"))
                .map((tuple)->new Values(tuple.getStringByField("word"),1),new Fields("key","count")) // give each word an initial count of 1
                .partition(new PartialKeyGrouping(new Fields("key")))
                .parallelismHint(5)
                .partitionAggregate(new Fields("key","count"),new CountAggregater(),new Fields("word","total")) // aggregate within each partition
                .peek((tuple) -> System.out.println(tuple));    
    
            new LocalCluster().submitTopology("tridentTopology",new Config(),tridentTopology.build());
        }
    }

(3) Trident State Management

1. Comparison of the three ProcessingGuarantee (fault-tolerance) strategies (table omitted)

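The original comparison table is not reproduced here. As a rough summary sketch, based on the storm-kafka-client API used in the examples below (the one-line descriptions are condensed from the official documentation), the spout exposes the three strategies through setProcessingGuarantee:

KafkaSpoutConfig.Builder<String, String> builder =
        KafkaSpoutConfig.builder("CentOSA:9092,CentOSB:9092,CentOSC:9092", "topic01");

// AT_LEAST_ONCE: offsets are committed only after the tuples are acked,
//                so failed tuples are replayed (requires the acker to be enabled).
builder.setProcessingGuarantee(KafkaSpoutConfig.ProcessingGuarantee.AT_LEAST_ONCE);

// AT_MOST_ONCE: offsets are committed before the tuples are emitted,
//               so records may be lost on failure but are never replayed.
// builder.setProcessingGuarantee(KafkaSpoutConfig.ProcessingGuarantee.AT_MOST_ONCE);

// NO_GUARANTEE: offsets are committed periodically in the background, independent of acks;
//               fastest, but offers no delivery guarantee on its own.
// builder.setProcessingGuarantee(KafkaSpoutConfig.ProcessingGuarantee.NO_GUARANTEE);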

2. Word-count example
  • Define KafkaSpoutUtils
public class KafkaSpoutUtils {
    public static KafkaSpout<String, String> buildKafkaSpout(String boostrapServers, String topic){

        KafkaSpoutConfig<String,String> kafkaspoutConfig=KafkaSpoutConfig.builder(boostrapServers,topic)
                .setProp(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,"org.apache.kafka.common.serialization.StringDeserializer")
                .setProp(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,"org.apache.kafka.common.serialization.StringDeserializer")
                .setProp(ConsumerConfig.GROUP_ID_CONFIG,"g1")
                .setEmitNullTuples(false)
                .setFirstPollOffsetStrategy(FirstPollOffsetStrategy.LATEST)
                .setProcessingGuarantee(KafkaSpoutConfig.ProcessingGuarantee.AT_LEAST_ONCE)
                .setMaxUncommittedOffsets(10) // once 10 offsets are pending (uncommitted) in a partition, the spout stops polling, mitigating backpressure

                .build();
        return new KafkaSpout<String, String>(kafkaspoutConfig);
    }
    // supports exactly-once state updates; recommended
    public static KafkaTridentSpoutOpaque<String,String> buildKafkaSpoutOpaque(String boostrapServers, String topic){
        KafkaTridentSpoutConfig<String, String> kafkaOpaqueSpoutConfig = KafkaTridentSpoutConfig.builder(boostrapServers, topic)
                .setProp(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,"org.apache.kafka.common.serialization.StringDeserializer")
                .setProp(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,"org.apache.kafka.common.serialization.StringDeserializer")
                .setProp(ConsumerConfig.GROUP_ID_CONFIG,"g1")
                .setFirstPollOffsetStrategy(FirstPollOffsetStrategy.LATEST)
                .setRecordTranslator(new Func<ConsumerRecord<String, String>, List<Object>>() {
                    public List<Object> apply(ConsumerRecord<String, String> record) {
                        return new Values(record.key(),record.value(),record.timestamp());
                    }
                },new Fields("key","value","timestamp"))
                .build();
        return new KafkaTridentSpoutOpaque<String, String>(kafkaOpaqueSpoutConfig);
    }
}
  • Build the Topology
public class KafkaTridentTopology {
    public static void main(String[] args) throws Exception {
        TridentTopology tridentTopology = new TridentTopology();

        // JedisPoolConfig used for state storage
        JedisPoolConfig jedisPoolConfig = new JedisPoolConfig.Builder()
                .setHost("CentOSA")
                .setPort(6379)
                .build();
        // Redis data-type options; "mapstate" is the top-level Redis key the state is stored under
        Options<OpaqueValue> options=new Options<OpaqueValue>();
        options.dataTypeDescription=new RedisDataTypeDescription(RedisDataTypeDescription.RedisDataType.HASH,"mapstate");
        // serializer for the opaque state values
        options.serializer=new JSONOpaqueSerializer();


        tridentTopology.newStream("KafkaSpoutOpaque", KafkaSpoutUtils.buildKafkaSpoutOpaque("CentOSA:9092,CentOSB:9092,CentOSC:9092", "topic01"))
                .project(new Fields("value"))
                .flatMap((tuple) -> {
                    List<Values> list = new ArrayList<>();
                    String[] tokens = tuple.getStringByField("value").split("\\W+");
                    for (String token : tokens) {
                        list.add(new Values(token));
                    }
                    return list;
                }, new Fields("word"))
                .groupBy(new Fields("word"))
                .persistentAggregate(RedisMapState.opaque(jedisPoolConfig,options),new Fields("word"),new Count(),new Fields("count")).newValuesStream()
                .peek(new Consumer() {
                    @Override
                    public void accept(TridentTuple input) {
                        System.out.println(input);
                    }
                });

        Config config = new Config();

        new LocalCluster().submitTopology("tridentTopology", config, tridentTopology.build());
    }
}
3. Custom State example: stateQuery
  • A custom RedisIpState
public class RedisIpState implements State {

    private JedisPoolConfig jedisPoolConfig;

    public RedisIpState(JedisPoolConfig jedisPoolConfig) {
        this.jedisPoolConfig = jedisPoolConfig;
    }

    @Override
    public void beginCommit(Long txid) {

    }

    @Override
    public void commit(Long txid) {

    }
   public static StateFactory ipSateFactory(JedisPoolConfig jedisPoolConfig){
        return new StateFactory(){

            @Override
            public State makeState(Map<String, Object> conf, IMetricsContext metrics, int partitionIndex, int numPartitions) {
                return new RedisIpState(jedisPoolConfig);
            }
        };
    }

    public List<String> batchRetrive(List<TridentTuple> tuples) {
        Jedis jedis=new Jedis(jedisPoolConfig.getHost(),jedisPoolConfig.getPort());
        List<String> lastIps=new ArrayList<>();
        for (TridentTuple tuple : tuples) {
            String lastIp = jedis.get(tuple.getStringByField("userid"));
            if(lastIp!=null){
                lastIps.add(lastIp);
            }else{
                lastIps.add("");
            }
        }
        jedis.close();
        return lastIps;
    }
}
  • A custom IpQueryFunction
public class IpQueryFunction extends BaseQueryFunction<RedisIpState, String> {
    @Override
    public List<String> batchRetrieve(RedisIpState state, List<TridentTuple> tuples) {
        return state.batchRetrive(tuples);
    }

    @Override
    public void execute(TridentTuple tuple, String lastIp, TridentCollector collector) {
        System.out.println(tuple);
        collector.emit(new Values(lastIp));
    }
}
  • Build the Topology
public class KafkaTridentTopology {
    public static void main(String[] args) throws Exception {
        TridentTopology tridentTopology = new TridentTopology();

        JedisPoolConfig jedisPoolConfig = new JedisPoolConfig.Builder().
                setHost("CentOSA")
                .setPort(6379).build();
        TridentState ipstate = tridentTopology.newStaticState(RedisIpState.ipSateFactory(jedisPoolConfig));

        //INFO 001 2019:10:10 10:00:00 1.202.251.26
        tridentTopology.newStream("KafkaSpoutOpaque", KafkaSpoutUtils.buildKafkaSpoutOpaque("CentOSA:9092,CentOSB:9092,CentOSC:9092", "topic01"))
                        .project(new Fields("value"))
                        .map((tuple)-> {
                            String value = tuple.getStringByField("value");
                            String[] tokens = value.split("\\s+");
                            return new Values(tokens[1],tokens[4] );
                        },new Fields("userid","ip"))
                        .stateQuery(ipstate,new Fields("userid","ip"),new IpQueryFunction(),new Fields("hip"))
                        .peek(new Consumer() {
                            @Override
                            public void accept(TridentTuple input) {
                                System.out.println(input);
                            }
                        });
        Config config = new Config();

        new LocalCluster().submitTopology("tridentTopology", config, tridentTopology.build());
    }
}
4. Custom State example: partitionPersist
  • A custom RedisIpState
public class RedisIpState implements State {
    private JedisPoolConfig jedisPoolConfig;
    public RedisIpState(JedisPoolConfig jedisPoolConfig) {
        this.jedisPoolConfig=jedisPoolConfig;
    }

    @Override
    public void beginCommit(Long txid) {

    }

    @Override
    public void commit(Long txid) {

    }
    public static StateFactory ipUpdateStateFactory(JedisPoolConfig jedisPoolConfig){
        return  new StateFactory() {
            @Override
            public State makeState(Map<String, Object> conf, IMetricsContext metrics, int partitionIndex, int numPartitions) {
                return new RedisIpState(jedisPoolConfig);
            }
        };
    }

    public void batchUpdate(List<TridentTuple> tuples) {
        Jedis jedis=new Jedis(jedisPoolConfig.getHost(),jedisPoolConfig.getPort());
        Pipeline pipelined = jedis.pipelined();

        for (TridentTuple tuple : tuples) {
            pipelined.set(tuple.getStringByField("userid"),tuple.getStringByField("ip"));
        }

        pipelined.sync();
        jedis.close();
    }
}
  • A custom UserIPSateUpdater
public class UserIPSateUpdater extends BaseStateUpdater<RedisIpState> {
    @Override
    public void updateState(RedisIpState state, List<TridentTuple> tuples, TridentCollector collector) {
        state.batchUpdate(tuples);
    }
}
  • Build the Topology
public class KafkaTridentTopology {
    public static void main(String[] args) throws Exception {
        TridentTopology tridentTopology = new TridentTopology();

        JedisPoolConfig jedisPoolConfig = new JedisPoolConfig.Builder().
                setHost("CentOSA")
                .setPort(6379).build();

        //INFO 001 2019:10:10 10:00:00 1.202.251.26
        tridentTopology.newStream("KafkaSpoutOpaque", KafkaSpoutUtils.buildKafkaSpoutOpaque("CentOSA:9092,CentOSB:9092,CentOSC:9092", "topic01"))
                        .project(new Fields("value"))
                        .map((tuple)-> {
                            String value = tuple.getStringByField("value");
                            String[] tokens = value.split("\\s+");
                            return new Values(tokens[1],tokens[4] );
                        },new Fields("userid","ip"))
                        .partitionPersist(RedisIpState.ipUpdateStateFactory(jedisPoolConfig),
                                new Fields("userid","ip"),new UserIPSateUpdater(),new Fields());
        Config config = new Config();

        new LocalCluster().submitTopology("tridentTopology", config, tridentTopology.build());
    }
}

5. Custom State example: persistentAggregate
  • A custom MyMapState
public class MyMapState implements IBackingMap<OpaqueValue<Integer>> {
    private HashMap<String,OpaqueValue<Integer>> db =new HashMap<>();


    public static StateFactory opaqueFactory(){
        return new StateFactory() {
            @Override
            public State makeState(Map<String, Object> conf, IMetricsContext metrics, int partitionIndex, int numPartitions) {
                CachedMap  c = new CachedMap (new MyMapState(), 1024);
                MapState mapState = OpaqueMap.build(c);

                return new SnapshottableMap(mapState,new Values("gloableKeys"));
            }
        };
    }

    @Override
    public List<OpaqueValue<Integer>> multiGet(List<List<Object>> keys) {

        List<OpaqueValue<Integer>> values=new ArrayList<>(keys.size());

        for (List<Object> key : keys) {
            System.out.println(key.get(0));
            OpaqueValue<Integer> historyValue = db.get(key.get(0).toString()); // db is keyed by the String form of the grouping key
            if (historyValue==null){
                values.add(new OpaqueValue<Integer>(-1L,null,null));
            }else{
                values.add(historyValue);
            }
        }

        return values;
    }

    @Override
    public void multiPut(List<List<Object>> keys, List<OpaqueValue<Integer>> vals) {
        for (int i = 0; i < keys.size(); i++) {
            OpaqueValue<Integer> v=vals.get(i);
            db.put(keys.get(i).get(0).toString(),v);
        }
    }
}

  • Build the Topology

public class KafkaTridentTopology {
    public static void main(String[] args) throws Exception {
        TridentTopology tridentTopology = new TridentTopology();
        tridentTopology.newStream("KafkaSpoutOpaque", KafkaSpoutUtils.buildKafkaSpoutOpaque("CentOSA:9092,CentOSB:9092,CentOSC:9092", "topic01"))
                .project(new Fields("value"))
                .flatMap((tuple) -> {
                    List<Values> list = new ArrayList<>();
                    String[] tokens = tuple.getStringByField("value").split("\\W+");
                    for (String token : tokens) {
                        list.add(new Values(token));
                    }
                    return list;
                }, new Fields("word"))
                .groupBy(new Fields("word"))
                .persistentAggregate(MyMapState.opaqueFactory(),new Fields("word"),new Count(),new Fields("count")).newValuesStream()
                .peek(new Consumer() {
                    @Override
                    public void accept(TridentTuple input) {
                        System.out.println(input);
                    }
                });
        Config config = new Config();

        new LocalCluster().submitTopology("tridentTopology", config, tridentTopology.build());
    }
}
6. Window example 1
public class WordCountAggregator extends BaseAggregator<Map<String,Integer>> {
    @Override
    public Map<String, Integer> init(Object batchId, TridentCollector collector) {
        return new HashMap<>();
    }

    @Override
    public void aggregate(Map<String, Integer> val, TridentTuple tuple, TridentCollector collector) {
        String key = tuple.getStringByField("key");
        Integer  count = tuple.getIntegerByField("count");
        Integer historyValue = val.get(key);
        if(historyValue==null){
            val.put(key,count);
        }else{
            val.put(key,historyValue+count);
        }
    }

    @Override
    public void complete(Map<String, Integer> val, TridentCollector collector) {
        for (Map.Entry<String, Integer> entry : val.entrySet()) {
            collector.emit(new Values(entry.getKey(),entry.getValue()));
        }
    }
}


public class TridentWindowDemo {
    public static void main(String[] args) throws Exception {
        TridentTopology tridentTopology = new TridentTopology();

        tridentTopology.newStream("KafkaSpoutOpaque",
                KafkaSpoutUtils.buildKafkaSpoutOpaque("CentOSA:9092,CentOSB:9092,CentOSC:9092", "topic01"))
                .project(new Fields("value"))
                .flatMap((tuple) -> {
                    List<Values> list = new ArrayList<>();
                    String[] tokens = tuple.getStringByField("value").split("\\W+");
                    for (String token : tokens) {
                        list.add(new Values(token));
                    }
                    return list;
                }, new Fields("word"))
                .map((tuple) -> new Values(tuple.getStringByField("word"), 1), new Fields("key", "count"))
                .slidingWindow(
                        BaseWindowedBolt.Duration.seconds(10),
                        BaseWindowedBolt.Duration.seconds(5),
                        new InMemoryWindowsStoreFactory(),
                        new Fields("key","count"),
                        new WordCountAggregator(),
                        new Fields("key","total")
                )
                .peek(new Consumer() {
                    @Override
                    public void accept(TridentTuple input) {
                        System.out.println(input);
                    }
                });

        new LocalCluster().submitTopology("aa",new Config(),tridentTopology.build());
    }
}
7. Window Example 2
The WordCountAggregator is identical to the one defined in Window Example 1 above and is reused here.


public class TridentWindowDemo {
    public static void main(String[] args) throws Exception {
        TridentTopology tridentTopology = new TridentTopology();

        WindowConfig wc= SlidingDurationWindow.of(BaseWindowedBolt.Duration.seconds(10),
                BaseWindowedBolt.Duration.seconds(5));

        tridentTopology.newStream("KafkaSpoutOpaque",
                KafkaSpoutUtils.buildKafkaSpoutOpaque("CentOSA:9092,CentOSB:9092,CentOSC:9092", "topic01"))
                .project(new Fields("value"))
                .flatMap((tuple) -> {
                    List<Values> list = new ArrayList<>();
                    String[] tokens = tuple.getStringByField("value").split("\\W+");
                    for (String token : tokens) {
                        list.add(new Values(token));
                    }
                    return list;
                }, new Fields("word"))
                .map((tuple) -> new Values(tuple.getStringByField("word"), 1), new Fields("key", "count"))
                .window(wc,
                        new InMemoryWindowsStoreFactory(),
                        new Fields("key", "count"),   // input fields: must match the upstream map output ("key","count")
                        new WordCountAggregator(),
                        new Fields("key", "total")    // names of the fields emitted by the aggregator
                )
                .peek(new Consumer() {
                    @Override
                    public void accept(TridentTuple input) {
                        System.out.println(input);
                    }
                });

        new LocalCluster().submitTopology("aa",new Config(),tridentTopology.build());

    }
}
8. Storm Object Serialization
  • Storm 1.x

Data flowing through a Storm topology can take many forms, and tuples can be passed between tasks in any format. Storm serializes tuple values automatically, but only for a set of common types: int, short, long, float, double, boolean, byte, String, and byte arrays. Values of these types can be passed between tasks without any extra work. For other types, including user-defined classes (for example, an object you want to pass between tasks), you have to take care of serialization yourself.

    public class UserOrder implements Serializable {
        private Integer userid;
        private String username;
        private String itemname;
        private double cost;
        ...
    }

    Config conf = new Config();
    // On Storm 1.x, an entity class carried inside a Tuple must be registered for serialization
    // On Storm 2.0.0 (the version used in this tutorial), this registration is not required
    conf.registerSerialization(UserOrder.class);
    StormSubmitter.submitTopology("localDemo", conf, tridentTopology.build());

With the latest Storm 2.0, a user-defined entity class only needs to implement the Serializable interface; registering it for serialization is no longer necessary.
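
On Storm 1.x it is also possible to register a custom Kryo serializer together with the class instead of relying on default field serialization. The following is a minimal sketch; UserOrderSerializer is purely illustrative (it is not part of this tutorial) and assumes UserOrder has a no-arg constructor and standard getters/setters:

import com.esotericsoftware.kryo.Kryo;
import com.esotericsoftware.kryo.Serializer;
import com.esotericsoftware.kryo.io.Input;
import com.esotericsoftware.kryo.io.Output;

// Hypothetical custom Kryo serializer for UserOrder (only needed on Storm 1.x)
public class UserOrderSerializer extends Serializer<UserOrder> {
    @Override
    public void write(Kryo kryo, Output output, UserOrder o) {
        output.writeInt(o.getUserid());        // assumes a getUserid() accessor exists
        output.writeString(o.getUsername());
        output.writeString(o.getItemname());
        output.writeDouble(o.getCost());
    }

    @Override
    public UserOrder read(Kryo kryo, Input input, Class<UserOrder> type) {
        UserOrder o = new UserOrder();         // assumes a no-arg constructor
        o.setUserid(input.readInt());
        o.setUsername(input.readString());
        o.setItemname(input.readString());
        o.setCost(input.readDouble());
        return o;
    }
}

// Registration then names both the class and its serializer:
// conf.registerSerialization(UserOrder.class, UserOrderSerializer.class);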

(四)Stream API

The Stream API is an alternative interface to Storm's core API. It provides a typed API for expressing streaming computations and supports functional-style operations.

Quick Start
StreamBuilder builder = new StreamBuilder();

KafkaSpout<String, String> spout = KafkaSpoutUtils.buildKafkaSpout("CentOSA:9092,CentOSB:9092,CentOSC:9092",
                                                                   "topic01");
builder.newStream(spout, TupleValueMappers.<Long,String,String>of(1,3,4),3)
    .peek(new Consumer<Tuple3<Long, String, String>>() {
        @Override
        public void accept(Tuple3<Long, String, String> input) {
            System.out.println(input._1+" "+input._2+" "+input._3);
        }
    });

Config conf = new Config();
LocalCluster localCluster = new LocalCluster();
localCluster.submitTopology("stream",conf,
                            builder.build());

Stream APIs

Basic transformations
// filter: keep only lines containing "error"
builder.newStream(spout, TupleValueMappers.<Long, String, String>of(1, 3, 4), 3)
        .filter(t -> t._3.contains("error"))
        .peek(t -> System.out.println(t));

// map: extract the value field
builder.newStream(spout, TupleValueMappers.<Long, String, String>of(1, 3, 4), 3)
        .map(t -> t._3)
        .peek(t -> System.out.println(t));

// flatMap: split each line into words, then map each word to a (word, 1) pair
builder.newStream(spout, TupleValueMappers.<Long, String, String>of(1, 3, 4), 3)
        .flatMap(t -> Arrays.asList(t._3.split("\\s")))
        .map(t -> Pair.<String, Integer>of(t, 1))
        .peek(t -> System.out.println(t));
Window operations
builder.newStream(spout, TupleValueMappers.<Long,String,String>of(1,3,4),3)
        .flatMap(t-> Arrays.asList(t._3.split("\\s")))
        .map(t-> Pair.<String,Integer>of(t,1))
        .window(SlidingWindows.of(BaseWindowedBolt.Duration.seconds(5),BaseWindowedBolt.Duration.seconds(2)))
        .peek(input->System.out.println(input));
Key-value pair transformations
  • flatMapToPair (equivalent to flatMap + mapToPair)
builder.newStream(spout, TupleValueMappers.<Long, String, String>of(1, 3, 4), 3)
                .flatMapToPair(t -> {
                    String[] tokens = t._3.split("\\s");
                    List<Pair<String,Integer>> pairList=new ArrayList<>();
                    for (String token : tokens) {
                        pairList.add(Pair.<String, Integer>of(token, 1));
                    }
                    return pairList;
                }).peek(t -> System.out.println(t));
  • mapToPair
builder.newStream(spout, TupleValueMappers.<Long, String, String>of(1, 3, 4), 3)
                .flatMap(v->Arrays.asList(v._3.split("\\s+")))
                .mapToPair(t -> Pair.<String, Integer>of(t, 1))
                .peek(t -> System.out.println(t));
Aggregations
  • Single-value aggregation
builder.newStream(spout, TupleValueMappers.<Long, String, String>of(1, 3, 4), 3)
    .flatMap(v->Arrays.asList(v._3.split("\\s+")))
    .map(t-> Integer.parseInt(t))
    .window(SlidingWindows.of(BaseWindowedBolt.Duration.seconds(5),BaseWindowedBolt.Duration.seconds(2)))
    //.reduce((v1,v2)->v1+v2)
    .aggregate(0, (v1, v2) -> v1 + v2, (v1, v2) -> v1 + v2) // initial value, accumulator, combiner
    .peek(t -> System.out.println(t));
  • Key-value aggregation
builder.newStream(spout, TupleValueMappers.<Long, String, String>of(1, 3, 4), 3)
        .flatMap(v->Arrays.asList(v._3.split("\\s+")))
        .mapToPair(t-> Pair.<String,Integer>of(t,1))
        .window(SlidingWindows.of(BaseWindowedBolt.Duration.seconds(5),BaseWindowedBolt.Duration.seconds(2)))
        .reduceByKey((v1,v2)->v1+v2)
        //.aggregateByKey(0,(v1,v2)->v1+v2,(v1,v2)->v1+v2)
        .peek(t -> System.out.println(t));
  • groupByKey
 builder.newStream(spout, TupleValueMappers.<Long, String, String>of(1, 3, 4), 3)
                .flatMap(v->Arrays.asList(v._3.split("\\s+")))
                .mapToPair(t-> Pair.<String,Integer>of(t,1))
                .window(SlidingWindows.of(BaseWindowedBolt.Duration.seconds(5),BaseWindowedBolt.Duration.seconds(2)))
                .groupByKey()
                .map(t-> {
                    int total=0;
                    for (Integer integer : t._2) {
                        total+=integer;
                    }
                    return Pair.<String,Integer>of(t._1,total);
                }).peek(t-> System.out.println(t));
  • countByKey
 builder.newStream(spout, TupleValueMappers.<Long, String, String>of(1, 3, 4), 3)
                .flatMap(v->Arrays.asList(v._3.split("\\s+")))
                .mapToPair(t-> Pair.<String,Integer>of(t,1))
                .window(SlidingWindows.of(BaseWindowedBolt.Duration.seconds(5),BaseWindowedBolt.Duration.seconds(2)))
                .countByKey().peek(t-> System.out.println(t));
Repartitioning

A repartition operation repartitions the current stream and returns a new stream with the specified number of partitions. Subsequent operations on the resulting stream execute at that level of parallelism. Repartitioning can be used to increase or decrease the parallelism of the operations in a stream.

builder.newStream(spout, TupleValueMappers.<Long, String, String>of(1, 3, 4), 3)
    .flatMap(v->Arrays.asList(v._3.split("\\s+")))
    .repartition(4)
    .mapToPair(t-> Pair.<String,Integer>of(t,1))
    .repartition(2)
    .window(SlidingWindows.of(BaseWindowedBolt.Duration.seconds(5),BaseWindowedBolt.Duration.seconds(2)))
    .countByKey().peek(t-> System.out.println(t));

Note: the repartition operation causes network traffic (a shuffle).

Output operators - Sinks

print and peek
builder.newStream(spout, TupleValueMappers.<Long, String, String>of(1, 3, 4), 3)
                .flatMap(v->Arrays.asList(v._3.split("\\s+")))
                .repartition(4)
                .mapToPair(t-> Pair.<String,Integer>of(t,1))
                .repartition(2)
                .window(SlidingWindows.of(BaseWindowedBolt.Duration.seconds(5),BaseWindowedBolt.Duration.seconds(2)))
                .countByKey().print();

print returns void and therefore terminates the stream: no further operators can be appended after it. peek, on the other hand, acts as a probe for debugging; it observes tuples as they pass through without affecting the normal flow of the program.
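
A minimal sketch contrasting the two, reusing the spout and operators from the snippets above (the debug message is illustrative):

builder.newStream(spout, TupleValueMappers.<Long, String, String>of(1, 3, 4), 3)
        .flatMap(v -> Arrays.asList(v._3.split("\\s+")))
        .peek(t -> System.out.println("after flatMap: " + t))  // probe: the stream continues downstream
        .mapToPair(t -> Pair.<String, Integer>of(t, 1))
        .window(SlidingWindows.of(BaseWindowedBolt.Duration.seconds(5), BaseWindowedBolt.Duration.seconds(2)))
        .countByKey()
        .print();  // terminal sink: returns void, no further operators can follow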

forEach
builder.newStream(spout, TupleValueMappers.<Long, String, String>of(1, 3, 4), 3)
        .flatMap(v->Arrays.asList(v._3.split("\\s+")))
        .repartition(4)
        .mapToPair(t-> Pair.<String,Integer>of(t,1))
        .repartition(2)
        .window(SlidingWindows.of(BaseWindowedBolt.Duration.seconds(5),BaseWindowedBolt.Duration.seconds(2)))
        .countByKey()
        .forEach(t-> System.out.println(t));
to
 JedisPoolConfig jedisPoolConfig = new JedisPoolConfig.Builder().setHost("CentOSA").setPort(6379).build();
KafkaSpout<String, String> spout = KafkaSpoutUtils.buildKafkaSpout("CentOSA:9092,CentOSB:9092,CentOSC:9092",
                                                                   "topic01");
builder.newStream(spout, TupleValueMappers.<Long, String, String>of(1, 3, 4), 3)
    .flatMap(v->Arrays.asList(v._3.split("\\s+")))
    .repartition(4)
    .mapToPair(t-> Pair.<String,Integer>of(t,1))
    .repartition(2)
    .window(SlidingWindows.of(BaseWindowedBolt.Duration.seconds(5),BaseWindowedBolt.Duration.seconds(2)))
    .countByKey()
    .to(new RedisStoreBolt(jedisPoolConfig,new WordCountRedisStoreMapper()));

WordCountRedisStoreMapper

public class WordCountRedisStoreMapper implements RedisStoreMapper {
    @Override
    public RedisDataTypeDescription getDataTypeDescription() {
        return new RedisDataTypeDescription(RedisDataTypeDescription.RedisDataType.HASH,"swc");
    }

    @Override
    public String getKeyFromTuple(ITuple tuple) {
        System.out.println(tuple.getFields()); // the default fields are key, value
        return tuple.getString(0);
    }

    @Override
    public String getValueFromTuple(ITuple tuple) {
        return tuple.getLong(1)+"";
    }
}
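
After the topology has run, the contents of the "swc" hash can be read back to confirm the sink works. A minimal sketch using the Jedis client (which storm-redis uses internally); host and port match the JedisPoolConfig above:

import redis.clients.jedis.Jedis;

public class CheckRedisResult {
    public static void main(String[] args) {
        // connect to the same Redis instance configured in the JedisPoolConfig above
        try (Jedis jedis = new Jedis("CentOSA", 6379)) {
            // the mapper stores counts in a hash named "swc": field = word, value = count
            System.out.println(jedis.hgetAll("swc"));
        }
    }
}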

Branching operators

branch
KafkaSpout<String, String> spout = KafkaSpoutUtils.buildKafkaSpout("CentOSA:9092,CentOSB:9092,CentOSC:9092",
        "topic01");
Stream<Tuple3<Long, String, String>>[] streams = builder.newStream(spout, TupleValueMappers.<Long, String, String>of(1, 3, 4), 3)
        .branch(
                t -> t._3.contains("info"),
                t -> t._3.contains("error"),
                t -> true
        );
Stream<Tuple3<Long, String, String>> infoStream = streams[0];
Stream<Tuple3<Long, String, String>> errorStream = streams[1];
Stream<Tuple3<Long, String, String>> otherStream = streams[2];

infoStream.peek(t -> System.out.println("info:" + t));
errorStream.peek(t -> System.out.println("error:" + t));
otherStream.peek(t -> System.out.println("other:" + t));

Join

The join operation joins the values of one stream with the values of another stream that have the same key.

KafkaSpout<String, String> userSpout = KafkaSpoutUtils.buildKafkaSpout("CentOSA:9092,CentOSB:9092,CentOSC:9092",
                                                                       "usertopic");
KafkaSpout<String, String> orderSpout = KafkaSpoutUtils.buildKafkaSpout("CentOSA:9092,CentOSB:9092,CentOSC:9092",
                                                                        "ordertopic");
//001 zhangsan
PairStream<String, String> userPair = builder.newStream(userSpout, TupleValueMappers.<Long, String, String>of(1, 3, 4))
    .mapToPair(t -> {
        String[] tokens = t._3.split("\\s");
        return  Pair.<String, String>of(tokens[0], tokens[1]);
    });
//001 apple 100
PairStream<String, String> orderPair = builder.newStream(orderSpout, TupleValueMappers.<Long, String, String>of(1, 3, 4))
    .mapToPair(t -> {
        String[] tokens = t._3.split("\\s");
        return  Pair.<String, String>of(tokens[0], tokens[1]+":"+tokens[2]);
    });
userPair.window(TumblingWindows.of(BaseWindowedBolt.Duration.seconds(5)))
    .leftOuterJoin(orderPair).peek(t -> System.out.println(t));

CoGroupByKey

coGroupByKey groups the values of this stream together with the values in the other stream that have the same key.

KafkaSpout<String, String> userSpout = KafkaSpoutUtils.buildKafkaSpout("CentOSA:9092,CentOSB:9092,CentOSC:9092",
                                                                       "usertopic");
KafkaSpout<String, String> orderSpout = KafkaSpoutUtils.buildKafkaSpout("CentOSA:9092,CentOSB:9092,CentOSC:9092",
                                                                        "ordertopic");
//001 zhangsan
PairStream<String, String> userPair = builder.newStream(userSpout, TupleValueMappers.<Long, String, String>of(1, 3, 4))
    .mapToPair(t -> {
        String[] tokens = t._3.split("\\s");
        return  Pair.<String, String>of(tokens[0], tokens[1]);
    });
//001 apple 100
PairStream<String, String> orderPair = builder.newStream(orderSpout, TupleValueMappers.<Long, String, String>of(1, 3, 4))
    .mapToPair(t -> {
        String[] tokens = t._3.split("\\s");
        return  Pair.<String, String>of(tokens[0], tokens[1]+":"+tokens[2]);
    });
userPair.coGroupByKey(orderPair).peek(t-> System.out.println(t));

State

updateStateByKey
 builder.newStream(userSpout, TupleValueMappers.<Long, String, String>of(1, 3, 4))
         .map(t->t._3)
         .flatMap(line-> Arrays.asList(line.split("\\s+")))
         .mapToPair(word-> Pair.<String,Integer>of(word,1))
         .updateStateByKey(0,(v1,v2)->v1+v2)
         .toPairStream()
         .peek( t -> System.out.println(t));

Config conf = new Config();
conf.put(Config.TOPOLOGY_STATE_PROVIDER,"org.apache.storm.redis.state.RedisKeyValueStateProvider");
Map<String,Object> stateConfig=new HashMap<String,Object>();
Map<String,Object> redisConfig=new HashMap<String,Object>();
redisConfig.put("host","CentOSA");
redisConfig.put("port",6379);
stateConfig.put("jedisPoolConfig",redisConfig);
ObjectMapper objectMapper=new ObjectMapper();
conf.put(Config.TOPOLOGY_STATE_PROVIDER_CONFIG,objectMapper.writeValueAsString(stateConfig));
stateQuery
KafkaSpout<String, String> userSpout = KafkaSpoutUtils.buildKafkaSpout("CentOSA:9092,CentOSB:9092,CentOSC:9092",
                                                                       "topic01");

StreamState<String, Integer> streamState = builder.newStream(userSpout, TupleValueMappers.<Long, String, String>of(1, 3, 4))
    .map(t -> t._3)
    .flatMap(line -> Arrays.asList(line.split("\\s+")))
    .mapToPair(word -> Pair.<String, Integer>of(word, 1))
    .updateStateByKey(0, (v1, v2) -> v1 + v2);


KafkaSpout<String, String> querySpout = KafkaSpoutUtils.buildKafkaSpout("CentOSA:9092,CentOSB:9092,CentOSC:9092",
                                                                        "topic02");

builder.newStream(querySpout, TupleValueMappers.<Long, String, String>of(1, 3, 4))
    .map(t -> t._3)
    .stateQuery(streamState).peek(t-> System.out.println(t));
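
For the state query to return results, the topology must be submitted with the same state-provider configuration used for updateStateByKey above. A minimal sketch, with an arbitrary topology name:

Config conf = new Config();
conf.put(Config.TOPOLOGY_STATE_PROVIDER, "org.apache.storm.redis.state.RedisKeyValueStateProvider");
Map<String, Object> redisConfig = new HashMap<>();
redisConfig.put("host", "CentOSA");
redisConfig.put("port", 6379);
Map<String, Object> stateConfig = new HashMap<>();
stateConfig.put("jedisPoolConfig", redisConfig);
conf.put(Config.TOPOLOGY_STATE_PROVIDER_CONFIG, new ObjectMapper().writeValueAsString(stateConfig));

new LocalCluster().submitTopology("stateQueryDemo", conf, builder.build());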