Storm: Overall Architecture, Code Structure, Native Schedulers, and Programming Basics

Latest recommended article published on 2021-09-16 16:41:04

风吹海洋浪

Category: Stream Computing Platform Storm

Article link: https://blog.csdn.net/Taylor_Ocean/article/details/109622044


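1. Overall Architecture and Code Structure
2. Native Schedulers
3. Programming Basics

1. Overall Architecture and Code Structure

1.1 Overall Architecture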

[Two figures not reproduced]
Figure source: https://www.cnblogs.com/ahu-lichang/p/6871920.html

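A worker process (a physical JVM) serves exactly one topology: each worker executes a subset of a single topology, and a worker never serves more than one topology.
A worker process runs on a port opened by a Supervisor (a server, that is, a physical machine). The ports available to workers are listed under supervisor.slots.ports in default.yaml or conf/storm.yaml, and the number of ports listed there is the number of slots on that Supervisor. Node + port = slot, and a slot can be occupied by only one worker.
Each slot owns a configured share of resources, specifically CPU and memory.
Executors (threads): an executor can run multiple tasks of one component (a task is an instance of a spout or bolt class).
An executor can run only tasks that belong to the same component (that is, several instances of one component).
By default Storm assigns one task to each executor.
An executor is written as [1-1], [2-2]: the numbers in square brackets are the first and last task ids that the executor runs, which shows how many tasks it holds. The executors running in one worker are written by wrapping them in braces: {[1-1],[2-2]}.
Every computing component in a topology (spout or bolt) has a degree of parallelism that can be specified when the topology is created; Storm allocates that many threads across the cluster to run the component concurrently.
Component: in Storm, a component is one kind of spout or one kind of bolt.
Note: in local mode, no matter how many workers are configured, there is only one worker process in the end.
One or more tasks of the same operator are grouped into an executor, and the executor is the smallest schedulable unit.
Tuple, Values, Fields:
declareOutputFields() declares the output schema (message format) of a spout or bolt. new Fields("word") turns the strings passed to it into a list of field names; here the output has a single column whose field name is word.
A tuple would naturally be a key-value map, but because the field names of the tuples passed between components have already been agreed on in advance through declareOutputFields(), a tuple only needs to carry the values in order, so it is effectively a list of values.
Nimbus and the Supervisors are stateless (all state is kept in ZooKeeper or on local disk).
The number of tasks is the upper bound on a component's parallelism. Once specified, the number of tasks of a component does not change while the topology runs, whereas the number of executors of a component and the number of workers of the topology can be changed with the storm rebalance command.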
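The bracket notation for executors can be illustrated with a small, self-contained model. This is only a sketch: the class ExecutorRanges and its method are hypothetical names that are not part of Storm, and the sketch assumes that consecutive task ids are split as evenly as possible among the executors of one component.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model, not Storm code: splits consecutive task ids among executors.
final class ExecutorRanges {
    // Returns ranges such as "[1-2]" for the given number of tasks and executors.
    static List<String> ranges(int firstTaskId, int numTasks, int numExecutors) {
        if (numExecutors < 1 || numExecutors > numTasks) {
            throw new IllegalArgumentException("need 1 <= executors <= tasks");
        }
        List<String> result = new ArrayList<>();
        int base = numTasks / numExecutors;   // tasks that every executor gets
        int extra = numTasks % numExecutors;  // the first 'extra' executors get one more
        int start = firstTaskId;
        for (int i = 0; i < numExecutors; i++) {
            int size = base + (i < extra ? 1 : 0);
            int end = start + size - 1;
            result.add("[" + start + "-" + end + "]");
            start = end + 1;
        }
        return result;
    }

    public static void main(String[] args) {
        // Default case: one task per executor.
        System.out.println(ranges(1, 2, 2)); // [[1-1], [2-2]]
        // Four tasks spread over two executors.
        System.out.println(ranges(3, 4, 2)); // [[3-4], [5-6]]
    }
}
```

With the default of one task per executor, every range has the same first and last id, which is why [1-1],[2-2] is the most common form seen in practice.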

[Figure not reproduced]

Reference: https://www.cnblogs.com/xidianzxm/p/10751259.html

Nimbus

Supervisor

Worker

Executor

Task

Parallelism

Topology example

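Below we define a topology named mytopology that consists of three components: one spout (BlueSpout) and two bolts (GreenBolt and YellowBolt). The code is as follows:
[Code image not reproduced]
mytopology is described as follows:

1. The topology uses two worker processes.

2. The spout is a BlueSpout instance with id "blue-spout" and a parallelism of 2 (two executors and two tasks).

3. The first bolt is a GreenBolt instance with id "green-bolt", a parallelism of 2 and 4 tasks, which receives the tuples emitted by "blue-spout" through shuffle grouping (two executors and four tasks).

4. The second bolt is a YellowBolt instance with id "yellow-bolt" and a parallelism of 6, which receives the tuples emitted by "green-bolt" through shuffle grouping (six executors and six tasks).

In total, the topology has two worker processes, 2+2+6=10 executors and 2+4+6=12 tasks. Each worker is therefore assigned 10/2=5 executors and 12/2=6 tasks. By default one executor runs one task, but when the number of tasks is specified the tasks are spread evenly over the executors, so each executor of "green-bolt" runs 4/2=2 tasks.

The mytopology topology and its resource allocation are shown in the figure below:
[Figure not reproduced]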
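The arithmetic for mytopology can be checked with a few lines of plain Java. This is a worked example only: the class TopologyMath is a hypothetical name, it does not call the Storm API, and the numbers are simply the ones given in the description above.

```java
// Hypothetical worked example of the mytopology numbers; it does not call the Storm API.
final class TopologyMath {
    static int sum(int[] values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        int workers = 2;              // two worker processes
        int[] executors = {2, 2, 6};  // blue-spout, green-bolt, yellow-bolt
        int[] tasks = {2, 4, 6};      // only green-bolt sets its task count explicitly

        int totalExecutors = sum(executors);
        int totalTasks = sum(tasks);
        System.out.println("executors: " + totalExecutors);                              // 10
        System.out.println("tasks: " + totalTasks);                                      // 12
        System.out.println("executors per worker: " + totalExecutors / workers);         // 5
        System.out.println("tasks per worker: " + totalTasks / workers);                 // 6
        System.out.println("tasks per green-bolt executor: " + tasks[1] / executors[1]); // 2
    }
}
```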

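Parallelism exists at several levels:

Storm's parallelism is divided into topology parallelism and component parallelism:
component parallelism is realized through executors, and topology parallelism through workers.

Worker parallelism

(the number of workers that the topology needs)
The number of worker processes, set with Config.setNumWorkers()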

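Executor parallelism

[Code image not reproduced]
The parallelism of 2 in the code above sets the parallelism of one component, which changes the number of executor threads.
Storm also has a parameter that caps the parallelism of a topology, TOPOLOGY_MAX_TASK_PARALLELISM, which limits the maximum number of executors for a single component. It is typically used to limit the number of threads when testing a topology in local mode. It can also be set in code:

config.setMaxTaskParallelism().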

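Component parallelism

(While it runs, an executor thread calls the nextTuple or execute method of its tasks.)
After a topology starts, the number of tasks of a component (spout or bolt) is fixed, but the number of executor threads the component uses can be adjusted dynamically (for example, one executor thread can run one or more task instances of the component).
This means that for every component the condition #threads <= #tasks holds (the number of threads is less than or equal to the number of tasks). By default the number of tasks equals the number of executor threads, so each executor thread runs exactly one task.

Original article: https://blog.csdn.net/xingchenhy/article/details/75085550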

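Setting component parallelism dynamically

A nice feature of Storm is that the number of worker processes or executor threads can be adjusted while a topology is running, without restarting it. This mechanism is called rebalancing.
The parallelism of workers and executors can only be changed dynamically through the UI or the command line:
[Screenshot not reproduced]
Reference: https://blog.csdn.net/xingchenhy/article/details/75085550?utm_medium=distribute.pc_relevant.none-task-blog-BlogCommendFromMachineLearnPai2-4.channel_param&depth_1-utm_source=distribute.pc_relevant.none-task-blog-BlogCommendFromMachineLearnPai2-4.channel_param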
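The rule that the task count stays fixed while the executor count may change can be captured in a tiny model. The sketch below is an assumption-laden illustration: ComponentParallelism is a hypothetical class, not a Storm type, and rejecting a request for more executors than tasks is a choice made by this model; how Storm itself reacts to such a request is not modeled here.

```java
// Hypothetical model, not Storm code: the task count is fixed when the component is
// created, while the executor count may be changed later (as with rebalancing).
final class ComponentParallelism {
    private final int tasks;  // fixed for the lifetime of the topology
    private int executors;    // adjustable at runtime

    ComponentParallelism(int executors, int tasks) {
        check(executors, tasks);
        this.tasks = tasks;
        this.executors = executors;
    }

    private static void check(int executors, int tasks) {
        if (executors < 1 || executors > tasks) {
            throw new IllegalArgumentException("need 1 <= #threads <= #tasks");
        }
    }

    // Changes only the executor count; #threads <= #tasks must still hold afterwards.
    void rebalance(int newExecutors) {
        check(newExecutors, tasks);
        executors = newExecutors;
    }

    int executors() { return executors; }

    int tasks() { return tasks; }

    public static void main(String[] args) {
        ComponentParallelism green = new ComponentParallelism(2, 4);
        green.rebalance(4);                     // allowed: 4 threads for 4 tasks
        System.out.println(green.executors());  // 4
        try {
            green.rebalance(5);                 // rejected by this model: 5 > 4
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The practical consequence is that a component expected to scale out later should be given more tasks than executors when the topology is first built.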

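Topology

A topology is a directed acyclic graph: a structure of spouts and bolts connected by the flow of data. It is the unit of a job.

Stream grouping strategies

Storm's ack framework

Original reference: https://blog.csdn.net/hzk_wen/article/details/53153632?utm_medium=distribute.pc_relevant.none-task-blog-title-2&spm=1001.2101.3001.4242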

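Storm's messaging guarantees
Storm mainly provides two message processing guarantees:

At least once
Exactly once
Exactly once is implemented through Trident ("exactly once through Trident"). Which mode to choose depends on the business case. Some scenarios require exactly-once consumption, such as order processing, where an order must never be processed twice because that would likely corrupt order amounts, trade lots and similar figures. Other scenarios tolerate some duplication, such as page click statistics and visitor statistics. In either mode Storm ensures that data is not lost; what the developer has to care about is how to keep data from being consumed more than once.

The at-least-once mechanism must be used with particular care. Storm uses an ack/fail mechanism to track the flow of messages. When a message (tuple) is sent downstream and the spout is not notified before the timeout, or the send fails, Storm by default replays it according to the configured policy; tuning the replay policy helps to minimize duplicate sends. A common situation is that an overloaded Storm cluster keeps downstream bolts from acking in time, so the spout keeps replaying the same tuple and messages are consumed many times over.
When integrating with Kafka, the KafkaSpout provided by Storm is commonly used as the spout that consumes Kafka messages. It comes in two flavors: core Storm spouts with at-least-once consumption and Trident spouts with exactly-once consumption ("We support both Trident and core Storm spouts").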

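In Storm, messages are processed by two kinds of components: spouts and bolts. A spout produces data and a bolt receives and processes it; business logic generally lives in bolts. Multiple bolts can be defined, and one-way links can be specified between bolts. A common practice is to read from data sources such as Kafka, MySQL, Redis or Elasticsearch in the spout and emit the data to downstream bolts, and to define several bolts for different processing stages. For example, the first bolt filters and cleans the data, and the second bolt performs the computation and writes the final results to a target store such as Redis, MySQL or HDFS.

Storm wraps each message in a Tuple object. After a spout creates a tuple it sends it to the downstream bolt with emit(), and every downstream bolt likewise passes tuples on with emit(). A tuple may be one MySQL row or one line of a file, depending on how the spout reads its data source and how it emits downstream.

The figure below shows the execution flow of a spout/bolt:
[Figure not reproduced]

Execution flow of a spout/bolt
spout -> open (pending state) -> nextTuple -> emit -> bolt -> execute -> ack(spout) / fail(spout) -> the message provider removes the message from the queue (complete) / pushes the message back onto the queue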
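To make the emit step concrete, here is a minimal model of a tuple as an ordered list of values whose field names are declared separately. It is a sketch under stated assumptions: SimpleTuple and getValueByField are hypothetical names chosen for this illustration, and the class does not implement or depend on Storm's own Tuple type.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a tuple: field names are declared once, values travel
// as an ordered list and are looked up by position.
final class SimpleTuple {
    private final List<String> fields;
    private final List<Object> values;

    SimpleTuple(List<String> fields, List<Object> values) {
        if (fields.size() != values.size()) {
            throw new IllegalArgumentException("one value per declared field is required");
        }
        this.fields = fields;
        this.values = values;
    }

    // Resolves a field name to its position, then reads the value at that position.
    Object getValueByField(String name) {
        int index = fields.indexOf(name);
        if (index < 0) {
            throw new IllegalArgumentException("unknown field: " + name);
        }
        return values.get(index);
    }

    public static void main(String[] args) {
        // The emitting side declares the schema once ...
        List<String> declared = Arrays.asList("word", "count");
        // ... and each emitted tuple carries only the values, in the same order.
        SimpleTuple tuple = new SimpleTuple(declared, Arrays.<Object>asList("storm", 3));
        // The receiving side reads by field name.
        System.out.println(tuple.getValueByField("word"));  // storm
        System.out.println(tuple.getValueByField("count")); // 3
    }
}
```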

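ACK/Fail
As noted above, Storm ensures that data is not lost, and the ack/fail mechanism is how it does so. Internally Storm builds a tuple tree to represent the flow of each tuple. When a spout emits a tuple to a downstream bolt it attaches a message id; the spout code supplies this id when it calls emit, and a tuple emitted without a message id is not tracked. When the downstream bolts have processed the tuple successfully, the acker notifies the spout and its ack method is called; on timeout or failure the fail method is called instead. When fail is called the message may be replayed, depending on the configured replay policy and on the spout in use.

For each message Storm defines the notion of being "fully processed": a message is fully processed only when every bolt in its tuple tree has processed it. Only after all bolts in the tuple tree have processed the message is the acker notified and the ack method of the spout that originally emitted it called; otherwise the fail method is called.

ack/fail can be triggered only on the spout task that created the tuple.
By default Storm starts one acker thread in each worker process to provide the ack/fail service for spouts and bolts. This thread is usually lightweight, so there is no need to configure many; one is enough in most cases.
[Figure not reproduced]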

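Communication mechanism

How worker processes communicate with each other

[Figure not reproduced]

Communication between workers
As the figure above shows, a worker process contains the following parts:

A receive thread, which maintains an ArrayList and receives the data sent by the send threads of other workers, storing it in the ArrayList. Data first enters a buffer of the receive thread; topology.receiver.buffer.size (removed after Storm 1.0) sets the maximum number of messages held in that buffer, 8 by default (a count, which should be a multiple of 2), before they are pushed into the ArrayList. The receive thread receives data by listening on a TCP port, which is configured through supervisor.slots.ports in the Storm configuration file, for example 6700.
A send thread, which maintains a message queue and sends the messages in the queue to the receive threads of other workers. It also has a buffer; topology.transfer.buffer.size sets the maximum number of messages it holds, 1024 by default (a count, which must be a power of 2). When the number of messages reaches this threshold they are sent to the receive thread. The send thread sends data through a randomly assigned TCP port.
One or more executor threads. Each executor has its own receive buffer and send buffer: the receive buffer takes data from the receive thread, the send buffer passes data to the send thread, and the task logic sits between the two buffers. The receive buffer size is set with Config.TOPOLOGY_EXECUTOR_RECEIVE_BUFFER_SIZE and the send buffer size with Config.TOPOLOGY_EXECUTOR_SEND_BUFFER_SIZE; both default to 1024 (a count, which must be a power of 2).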

Config conf = new Config();
// TOPOLOGY_RECEIVER_BUFFER_SIZE only exists before Storm 1.0 (default 8)
conf.put(Config.TOPOLOGY_RECEIVER_BUFFER_SIZE, 16);
conf.put(Config.TOPOLOGY_TRANSFER_BUFFER_SIZE, 32);            // default 1024
conf.put(Config.TOPOLOGY_EXECUTOR_RECEIVE_BUFFER_SIZE, 16384); // default 1024
conf.put(Config.TOPOLOGY_EXECUTOR_SEND_BUFFER_SIZE, 16384);    // default 1024

Reference
http://www.michael-noll.com/blog/2013/06/21/understanding-storm-internal-message-buffers/
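For the Disruptor-backed queues the size needs to be a power of two, which is a stricter condition than merely being even. A small helper can validate or round up a value before it is written into the configuration. This is a sketch: BufferSizes and its methods are hypothetical names and are not part of Storm.

```java
// Hypothetical helper, not part of Storm: checks or rounds up a buffer size
// before it is put into the topology configuration.
final class BufferSizes {
    static boolean isPowerOfTwo(int n) {
        return n > 0 && (n & (n - 1)) == 0;
    }

    // Smallest power of two that is >= n (valid for 1 <= n <= 2^30).
    static int roundUpToPowerOfTwo(int n) {
        if (n < 1 || n > (1 << 30)) {
            throw new IllegalArgumentException("size out of range: " + n);
        }
        int highest = Integer.highestOneBit(n);
        return highest == n ? n : highest << 1;
    }

    public static void main(String[] args) {
        System.out.println(isPowerOfTwo(16384));       // true
        System.out.println(isPowerOfTwo(1000));        // false
        System.out.println(roundUpToPowerOfTwo(1000)); // 1024
    }
}
```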

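How threads communicate inside a worker process

Threads share data without locks in Storm; a ring buffer is used instead.
Disruptor is a lock-free technique for exchanging data between threads. It is a queue: it implements queue semantics and has a bounded length.
Its principle is that every accessor records its own sequence number, which identifies the entries it may operate on or consume.
The underlying implementation is a single data structure, a ring buffer.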
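The idea of per-accessor sequence numbers over one ring buffer can be sketched with a single-producer, single-consumer queue. This is a simplified illustration only: SpscRingBuffer is a hypothetical class, it is neither the LMAX Disruptor nor Storm's queue implementation, and it is safe only when exactly one thread calls offer and exactly one thread calls poll.

```java
import java.util.concurrent.atomic.AtomicLong;

// Simplified single-producer/single-consumer ring buffer. Illustrative model only.
final class SpscRingBuffer<T> {
    private final Object[] slots;
    private final int mask;                             // capacity - 1 (capacity is a power of two)
    private final AtomicLong head = new AtomicLong(0);  // next sequence to read, advanced by the consumer
    private final AtomicLong tail = new AtomicLong(0);  // next sequence to write, advanced by the producer

    SpscRingBuffer(int capacity) {
        if (capacity < 1 || (capacity & (capacity - 1)) != 0) {
            throw new IllegalArgumentException("capacity must be a power of two");
        }
        slots = new Object[capacity];
        mask = capacity - 1;
    }

    // Producer side: returns false instead of blocking when the buffer is full.
    boolean offer(T value) {
        long t = tail.get();
        if (t - head.get() == slots.length) {
            return false;
        }
        slots[(int) (t & mask)] = value;
        tail.set(t + 1);  // volatile write publishes the slot to the consumer
        return true;
    }

    // Consumer side: returns null when the buffer is empty.
    @SuppressWarnings("unchecked")
    T poll() {
        long h = head.get();
        if (h == tail.get()) {
            return null;
        }
        int index = (int) (h & mask);
        T value = (T) slots[index];
        slots[index] = null;
        head.set(h + 1);  // volatile write frees the slot for the producer
        return value;
    }

    public static void main(String[] args) {
        SpscRingBuffer<String> queue = new SpscRingBuffer<>(2);
        queue.offer("a");
        queue.offer("b");
        System.out.println(queue.offer("c")); // false, the buffer is full
        System.out.println(queue.poll());     // a
    }
}
```

Because the capacity is a power of two, the slot index is computed with a bit mask instead of a modulo, which is one reason such buffers are sized that way.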

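Fault tolerance

Cluster node failure

Nimbus node failure:

A single point of failure
Since version 1.0, Storm's Nimbus is highly available

Failure of a non-Nimbus node:

When such a node fails, all tasks on it time out, and Nimbus reassigns those tasks to other servers.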

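Process failure

[Figure not reproduced]

Task-level fault tolerance

[Figure not reproduced]
When a topology is submitted, system-level acker threads (tasks) are started automatically to track messages.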

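Message integrity

How message integrity works

Each tuple that a spout produces and emits may be processed by bolts into further tuples, and together these form a tree.
[Figure not reproduced]
How does the acker decide whether processing succeeded or failed?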
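Storm's answer to this question is an XOR checksum per spout tuple: every tuple id in the tree is XORed into the checksum once when the tuple is created and once when it is acked, so the value returns to zero exactly when everything created has also been acked. The sketch below models that bookkeeping. XorAcker and its methods are hypothetical names; real ackers are internal Storm tasks with a different interface, use random 64-bit ids, and also handle timeouts, which this model leaves out.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of XOR-based tuple-tree tracking; not Storm's acker API.
final class XorAcker {
    // root (spout tuple) id -> XOR of every tuple id created or acked so far
    private final Map<Long, Long> ackVals = new HashMap<>();

    // Called when the spout emits a root tuple with the given tuple id.
    void init(long rootId, long tupleId) {
        ackVals.put(rootId, tupleId);
    }

    // Called when a bolt acks 'ackedId' after emitting 'childIds' anchored to it.
    // Returns true once the whole tree rooted at rootId has been processed.
    boolean ack(long rootId, long ackedId, long... childIds) {
        Long current = ackVals.get(rootId);
        if (current == null) {
            throw new IllegalStateException("unknown root tuple: " + rootId);
        }
        long value = current ^ ackedId;
        for (long child : childIds) {
            value ^= child;
        }
        if (value == 0L) {
            ackVals.remove(rootId);  // fully processed: the spout's ack can be called
            return true;
        }
        ackVals.put(rootId, value);
        return false;
    }

    public static void main(String[] args) {
        XorAcker acker = new XorAcker();
        acker.init(1L, 0x11L);
        System.out.println(acker.ack(1L, 0x11L, 0x22L, 0x33L)); // false: two children outstanding
        System.out.println(acker.ack(1L, 0x22L));               // false: one child outstanding
        System.out.println(acker.ack(1L, 0x33L));               // true: tree complete
    }
}
```

The appeal of this scheme is that the tracking state is a single long per spout tuple, no matter how large the tuple tree grows.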

Simulating message replay in Java

To preserve message integrity, a message whose delivery or processing fails must be replayed.
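A replaying source can be simulated in plain Java by remembering every emitted message until it is acked. The sketch below follows the nextTuple/ack/fail life cycle in spirit only: ReplayingSource is a hypothetical class that implements no Storm interface, and re-queuing a failed message at the front of the queue is an assumption of this model rather than a documented Storm policy.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical simulation of spout-side replay; it implements no Storm interface.
final class ReplayingSource {
    private final Deque<String> queue = new ArrayDeque<>();     // messages waiting to be emitted
    private final Map<Long, String> pending = new HashMap<>();  // emitted but not yet acked
    private long nextId = 0;

    void add(String message) {
        queue.addLast(message);
    }

    // Emits the next message and remembers it under a fresh message id; -1 when idle.
    long nextTuple() {
        String message = queue.pollFirst();
        if (message == null) {
            return -1;
        }
        long id = nextId++;
        pending.put(id, message);
        return id;
    }

    String messageFor(long id) {
        return pending.get(id);
    }

    // Fully processed: forget the message.
    void ack(long id) {
        pending.remove(id);
    }

    // Failed or timed out: put the message back so that it is emitted again.
    void fail(long id) {
        String message = pending.remove(id);
        if (message != null) {
            queue.addFirst(message);
        }
    }

    int pendingCount() {
        return pending.size();
    }

    public static void main(String[] args) {
        ReplayingSource source = new ReplayingSource();
        source.add("order-1");
        long first = source.nextTuple();
        source.fail(first);                             // processing failed
        long second = source.nextTuple();               // the same message is emitted again
        System.out.println(source.messageFor(second));  // order-1
        source.ack(second);
        System.out.println(source.pendingCount());      // 0
    }
}
```

Because the replayed message gets a new id and is processed again from the start, downstream logic sees it twice, which is exactly the at-least-once behavior that makes idempotent bolts worthwhile.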

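1.2 Code Structure

2. Native Schedulers

The Scheduler is Storm's scheduler; it allocates the resources currently available in the cluster to topologies. Storm defines the IScheduler interface, which users can implement to define their own scheduler.
IScheduler mainly involves two methods:

1. prepare: receives the Storm configuration of the current Nimbus as its argument so that initialization can be done.

2. schedule: the method that actually assigns tasks. Nimbus calls it when it assigns tasks. Its parameters are topologies and cluster. The former holds the information of all topologies in the cluster; the latter represents the cluster itself and contains everything needed for user-defined scheduling logic, including Supervisor information, all currently available slots, and the current assignments.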

public interface IScheduler {
    
    void prepare(Map conf);
    
    /**
     * Set assignments for the topologies which needs scheduling. The new assignments is available 
     * through <code>cluster.getAssignments()</code>
     *
     *@param topologies all the topologies in the cluster, some of them need schedule. Topologies object here 
     *       only contain static information about topologies. Information like assignments, slots are all in
     *       the <code>cluster</code>object.
     *@param cluster the cluster these topologies are running in. <code>cluster</code> contains everything user
     *       need to develop a new scheduling logic. e.g. supervisors information, available slots, current 
     *       assignments for all the topologies etc. User can set the new assignment for topologies using
     *       <code>cluster.setAssignmentById</code>
     */
    void schedule(Topologies topologies, Cluster cluster);
}

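2.1 Task Scheduling Strategies

A large body of research focuses on optimizing task assignment, that is, the process of assigning executors to workers and assigning workers to worker nodes (slots).
In many cases an executor holds just one task by default,
so assigning executors to workers amounts to assigning tasks to workers,
and the workers are then assigned to slots on the appropriate nodes.

Two kinds of strategies have been proposed, offline task assignment and online task assignment, and some work aims at finding the optimal parallelism configuration.

Storm task scheduling strategies
Reference: http://www.voidcn.com/article/p-poodoqeh-box.html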
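The executor-to-slot step can be pictured with the simplest possible policy, round-robin placement. The sketch below is an illustration under stated assumptions: RoundRobinPlacement is a hypothetical class, executors and slots are represented as plain strings, and the policy is not claimed to reproduce what any of Storm's built-in schedulers does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical round-robin placement of executors onto slots; not a Storm scheduler.
final class RoundRobinPlacement {
    static Map<String, List<String>> assign(List<String> executors, List<String> slots) {
        if (slots.isEmpty()) {
            throw new IllegalArgumentException("no free slots");
        }
        Map<String, List<String>> plan = new LinkedHashMap<>();
        for (String slot : slots) {
            plan.put(slot, new ArrayList<>());
        }
        for (int i = 0; i < executors.size(); i++) {
            // executor i goes to slot (i mod number of slots)
            plan.get(slots.get(i % slots.size())).add(executors.get(i));
        }
        return plan;
    }

    public static void main(String[] args) {
        List<String> executors = Arrays.asList("[1-1]", "[2-2]", "[3-3]", "[4-4]", "[5-5]");
        List<String> slots = Arrays.asList("node1:6700", "node2:6700");
        System.out.println(assign(executors, slots));
        // {node1:6700=[[1-1], [3-3], [5-5]], node2:6700=[[2-2], [4-4]]}
    }
}
```

A policy like this balances executor counts but ignores CPU, memory and network locality, which is the gap that resource-aware strategies try to close.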

2.2 Scheduler Analysis

EvenScheduler和DefaultScheduler

IsolationScheduler

MultitenantScheduler

RAS-ResourceAwareScheduler

package backtype.storm.scheduler.resource;

import java.io.IOException;
import java.util.Collection;
import java.util.Map;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import backtype.storm.scheduler.Cluster;
import backtype.storm.scheduler.ExecutorDetails;
import backtype.storm.scheduler.IScheduler;
import backtype.storm.scheduler.Topologies;
import backtype.storm.scheduler.TopologyDetails;
import backtype.storm.scheduler.resource.ResourceUsageServer.ResourceUsageServer;
import backtype.storm.scheduler.resource.Strategies.ResourceAwareStrategy;

public class ResourceAwareScheduler implements IScheduler {
	private static final Logger LOG = LoggerFactory
			.getLogger(ResourceAwareScheduler.class);
	@SuppressWarnings("rawtypes")
	private Map _conf;

	@Override
	public void prepare(Map conf) {
		_conf = conf;
	}

	@Override
	public void schedule(Topologies topologies, Cluster cluster) {
		LOG.info("\n\n\nRerunning ResourceAwareScheduler...");

		if(topologies.getTopologies().size()>0) {
			ResourceUsageServer rs = ResourceUsageServer.getInstance("ResourceAwareScheduler");
		}		
		GlobalResources globalResources = new GlobalResources(cluster, topologies);
		GlobalState globalState = GlobalState.getInstance("ResourceAwareScheduler");
		globalState.updateInfo(cluster, topologies, globalResources);
		GetStats gs = GetStats.getInstance("ResourceAwareScheduler");		
		gs.getStatistics();
		
		resourceAwareScheduling(topologies, cluster, globalState, globalResources);
		
		globalState.storeState(cluster, topologies, globalResources);
		
		LOG.info("GlobalState:\n{}", globalState);

		LOG.info("GlobalResources: \n{}\n", globalResources);
		HelperFuncs.printNodeResources(globalState.nodes);
		
		

	}
	
	public void resourceAwareScheduling(Topologies topos, Cluster cluster, GlobalState globalState, GlobalResources globalResources) {
	    for (TopologyDetails td : topos.getTopologies()) {
	      String topId = td.getId();
	      Map<Node, Collection<ExecutorDetails>> taskToNodesMap;
	      if (cluster.needsScheduling(td) && cluster.getUnassignedExecutors(td).size()>0) {
	        LOG.info("/********Scheduling topology {} ************/", topId);
	        int totalTasks = td.getExecutors().size();
	        int executorsNotRunning = cluster.getUnassignedExecutors(td).size();
	        LOG.info(
	            "Total number of executors: {} " +
	            "Total number of Unassigned Executors: {}",
	            totalTasks, executorsNotRunning);
	        LOG.info("executors that need scheduling: {}",
	            cluster.getUnassignedExecutors(td));
	        
	        ResourceAwareStrategy rs = new ResourceAwareStrategy(globalState, globalResources, null, td, cluster, topos);
	        taskToNodesMap = rs.schedule(td,
	            cluster.getUnassignedExecutors(td));
	        
	        if (taskToNodesMap != null) {
	          try {
	            for (Map.Entry<Node, Collection<ExecutorDetails>> entry :
	                taskToNodesMap.entrySet()) {
	                entry.getKey().assign(td.getId(), entry.getValue(),
	                    cluster);
	                LOG.info("ASSIGNMENT    TOPOLOGY: {}  TASKS: {} To Node: "
	                    + entry.getKey().getId() + " Slots left: "
	                    + entry.getKey().totalSlotsFree(), td.getId(),
	                    entry.getValue());
	            }
	            LOG.info("Toplogy: {} assigned to {} nodes", td.getId(), taskToNodesMap.keySet().size());
	            
	            HelperFuncs.setTopoStatus(td.getId(),"Fully Scheduled");
	          } catch (IllegalStateException ex) {
	            LOG.error(ex.toString());
	            LOG.error("Unsuccessfull in scheduling topology {}", td.getId());
	            HelperFuncs.setTopoStatus(td.getId(), "Unsuccessfull in scheduling topology");
	          }
	        } else {
	          LOG.error("Unsuccessfull in scheduling topology {}", td.getId());
	          HelperFuncs.setTopoStatus(td.getId(), "Unsuccessfull in scheduling topology");
	        }
	      } else {
	    	  HelperFuncs.setTopoStatus(td.getId(),"Fully Scheduled");
	      }
	    }
	  }

}

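2.3 Custom Schedulers

A custom scheduler is implemented with the pluggable scheduler mechanism that Storm provides:
Pluggable scheduler: a replaceable task assigner.
You write your own task assignment algorithm and implement your own scheduler, which replaces the default scheduler in assigning executors to workers. The scheduler class is specified with storm.scheduler in storm.yaml, and it must implement the IScheduler interface.
Reference: http://www.voidcn.com/article/p-dzzjsmyv-zs.html

DirectScheduler (from the web)

Custom Storm scheduler implementation: DirectScheduler
Reference: http://www.voidcn.com/article/p-hspkykfd-pe.html
Implementation code:
Reference: https://www.jianshu.com/p/664a82bf699e

A custom Storm scheduler that assigns executors directly, taken from: Custom Storm scheduler implementation: DirectScheduler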

package storm;

import java.util.Collection;
import java.util.List;
import java.util.Map;

import backtype.storm.scheduler.Cluster;
import backtype.storm.scheduler.EvenScheduler;
import backtype.storm.scheduler.ExecutorDetails;
import backtype.storm.scheduler.IScheduler;
import backtype.storm.scheduler.SchedulerAssignment;
import backtype.storm.scheduler.SupervisorDetails;
import backtype.storm.scheduler.Topologies;
import backtype.storm.scheduler.TopologyDetails;
import backtype.storm.scheduler.WorkerSlot;

/**
 * This demo scheduler make sure a spout named <code>special-spout</code> in topology <code>special-topology</code> runs
 * on a supervisor named <code>special-supervisor</code>. supervisor does not have name? You can configure it through
 * the config: <code>supervisor.scheduler.meta</code> -- actually you can put any config you like in this config item.
 * 
 * In our example, we need to put the following config in supervisor's <code>storm.yaml</code>:
 * <pre>
 *     # give our supervisor a name: "special-supervisor"
 *     supervisor.scheduler.meta:
 *       name: "special-supervisor"
 * </pre>
 * 
 * Put the following config in <code>nimbus</code>'s <code>storm.yaml</code>:
 * <pre>
 *     # tell nimbus to use this custom scheduler
 *     storm.scheduler: "storm.DemoScheduler"
 * </pre>
 * @author xumingmingv May 19, 2012 11:10:43 AM
 */
public class DemoScheduler implements IScheduler {
    public void prepare(Map conf) {}

    public void schedule(Topologies topologies, Cluster cluster) {
    	System.out.println("DemoScheduler: begin scheduling");
        // Gets the topology which we want to schedule
        TopologyDetails topology = topologies.getByName("special-topology");

        // make sure the special topology is submitted,
        if (topology != null) {
            boolean needsScheduling = cluster.needsScheduling(topology);

            if (!needsScheduling) {
            	System.out.println("Our special topology DOES NOT NEED scheduling.");
            } else {
            	System.out.println("Our special topology needs scheduling.");
                // find out all the needs-scheduling components of this topology
                Map<String, List<ExecutorDetails>> componentToExecutors = cluster.getNeedsSchedulingComponentToExecutors(topology);
                
                System.out.println("needs scheduling(component->executor): " + componentToExecutors);
                System.out.println("needs scheduling(executor->components): " + cluster.getNeedsSchedulingExecutorToComponents(topology));
                // reuse the topology looked up above instead of fetching it by name again
                SchedulerAssignment currentAssignment = cluster.getAssignmentById(topology.getId());
                if (currentAssignment != null) {
                	System.out.println("current assignments: " + currentAssignment.getExecutorToSlot());
                } else {
                	System.out.println("current assignments: {}");
                }
                
                if (!componentToExecutors.containsKey("special-spout")) {
                	System.out.println("Our special-spout DOES NOT NEED scheduling.");
                } else {
                    System.out.println("Our special-spout needs scheduling.");
                    List<ExecutorDetails> executors = componentToExecutors.get("special-spout");

                    // find out the our "special-supervisor" from the supervisor metadata
                    Collection<SupervisorDetails> supervisors = cluster.getSupervisors().values();
                    SupervisorDetails specialSupervisor = null;
                    for (SupervisorDetails supervisor : supervisors) {
                        Map meta = (Map) supervisor.getSchedulerMeta();

                        // null-safe check: a supervisor may have no scheduler meta or no name
                        if (meta != null && "special-supervisor".equals(meta.get("name"))) {
                            specialSupervisor = supervisor;
                            break;
                        }
                    }

                    // found the special supervisor
                    if (specialSupervisor != null) {
                    	System.out.println("Found the special-supervisor");
                        List<WorkerSlot> availableSlots = cluster.getAvailableSlots(specialSupervisor);
                        
                        // if there is no available slots on this supervisor, free some.
                        // TODO for simplicity, we free all the used slots on the supervisor.
                        if (availableSlots.isEmpty() && !executors.isEmpty()) {
                            for (Integer port : cluster.getUsedPorts(specialSupervisor)) {
                                cluster.freeSlot(new WorkerSlot(specialSupervisor.getId(), port));
                            }
                        }

                        // re-get the availableSlots
                        availableSlots = cluster.getAvailableSlots(specialSupervisor);

                        // since it is just a demo, to keep things simple, we assign all the
                        // executors into one slot (skipped if the supervisor exposes no slot).
                        if (!availableSlots.isEmpty()) {
                            cluster.assign(availableSlots.get(0), topology.getId(), executors);
                            System.out.println("We assigned executors:" + executors + " to slot: [" + availableSlots.get(0).getNodeId() + ", " + availableSlots.get(0).getPort() + "]");
                        }
                    } else {
                    	System.out.println("There is no supervisor named special-supervisor!!!");
                    }
                }
            }
        }
        
        // let system's even scheduler handle the rest scheduling work
        // you can also use your own other scheduler here, this is what
        // makes storm's scheduler composable.
        new EvenScheduler().schedule(topologies, cluster);
    }

}

DirectToSlotScheduler (from the web)

Code of the test topology:
Reference: https://www.dazhuanlan.com/2019/09/30/5d91aa16c3ab8/

3. Programming Basics
