Storm Source Code Analysis (Part 7)

Latest recommended article published on 2021-12-14 20:08:13

null_wry

Tags: storm, big data

Article link: https://blog.csdn.net/null_wry/article/details/121259162

2021SC@SDUSC

Getting the Executors that belong to a Worker

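The read-worker-executors function computes the Executors assigned to this Worker. It calls the assignment-info function of storm-cluster-state to obtain the Topology assignment information, then filters that information by the Worker's assignment-id and port to get the Executors that belong to the Worker. Here the assignment-id corresponds to the node. Once a Worker has started, the set of Executors it runs no longer changes; when the task assignment changes, the Supervisor restarts the Worker to pick up the new assignment. When computing task assignments, Nimbus tries not to change the Executors that are already running in a Worker. A failure in any single Executor of the current Worker causes the Worker to restart.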
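The filtering step described above can be sketched in Java as follows. This is only an illustration, not Storm's implementation (the real function is written in Clojure): the class name `WorkerExecutorFilter`, the method `executorsFor`, and the use of `Map.Entry<String, Integer>` as a stand-in for a [node, port] slot are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: keep only the executors whose slot matches this worker.
final class WorkerExecutorFilter {

    // executorToSlot maps an executor id [startTaskId, endTaskId] to its [node, port] slot.
    static List<List<Integer>> executorsFor(Map<List<Integer>, Map.Entry<String, Integer>> executorToSlot,
                                            String assignmentId, int port) {
        Map.Entry<String, Integer> mySlot = Map.entry(assignmentId, port);
        List<List<Integer>> result = new ArrayList<>();
        for (Map.Entry<List<Integer>, Map.Entry<String, Integer>> e : executorToSlot.entrySet()) {
            // An executor belongs to this worker when both node and port match.
            if (e.getValue().equals(mySlot)) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<List<Integer>, Map.Entry<String, Integer>> assignment = new LinkedHashMap<>();
        assignment.put(List.of(1, 2), Map.entry("node-a", 6700));
        assignment.put(List.of(3, 4), Map.entry("node-a", 6701));
        assignment.put(List.of(5, 6), Map.entry("node-b", 6700));
        System.out.println(executorsFor(assignment, "node-a", 6700)); // prints [[1, 2]]
    }
}
```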

Creating the Executor receive queues and lookup table

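The mk-receive-queue-map function creates a receive queue for every Executor in the Worker and stores the queues in a hash map, where the key is the ExecutorId and the value is a Disruptor Queue object.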

An ExecutorId is actually a two-element array, [startTaskId, endTaskId], which denotes the range of tasks that the Executor runs.
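A minimal Java sketch of such a map is shown below. It assumes a plain `ArrayBlockingQueue` as a stand-in for the Disruptor Queue, and the names `ReceiveQueueMap` and `mkReceiveQueueMap` are invented for the example; it is meant only to show the shape of the data structure.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch: one bounded receive queue per executor id.
final class ReceiveQueueMap {

    // Each executor id is a two-element list [startTaskId, endTaskId].
    static Map<List<Integer>, BlockingQueue<Object>> mkReceiveQueueMap(Collection<List<Integer>> executors,
                                                                       int capacity) {
        Map<List<Integer>, BlockingQueue<Object>> queues = new HashMap<>();
        for (List<Integer> executorId : executors) {
            // ArrayBlockingQueue stands in for the Disruptor Queue used by Storm.
            queues.put(executorId, new ArrayBlockingQueue<>(capacity));
        }
        return queues;
    }

    public static void main(String[] args) {
        Map<List<Integer>, BlockingQueue<Object>> queues =
                mkReceiveQueueMap(List.of(List.of(1, 3), List.of(4, 4)), 1024);
        System.out.println(queues.keySet().size()); // prints 2
    }
}
```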

The receive function in the Worker

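The Worker's mk-transfer-local-fn function builds the function that sends messages to the Executors' receive queues. Executors inside the same Worker pass messages to each other through this function.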

short-executor-receive-queue-map stores the mapping from the taskId of the first Task in each Executor to that Executor's receive queue (a Disruptor Queue).

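The task-getter function takes a message sent over ZMQ as its argument. The message is a two-element array: the first element is the TaskId and the second is the actual message content. task-getter uses the message's TaskId to look up the TaskId of the first Task in the corresponding Executor.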
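The lookup that task-getter performs can be pictured as a precomputed table from every task id to the first task id of its Executor. The Java sketch below builds such a table; the class and method names are hypothetical, and Storm's own Clojure code may construct the mapping differently.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: map each task id to the first task id of its executor.
final class TaskGetter {

    static Map<Integer, Integer> taskToShortExecutor(Collection<List<Integer>> executors) {
        Map<Integer, Integer> table = new HashMap<>();
        for (List<Integer> executorId : executors) {
            int start = executorId.get(0);
            int end = executorId.get(1);
            // Every task in the inclusive range [start, end] is served by the same executor.
            for (int taskId = start; taskId <= end; taskId++) {
                table.put(taskId, start);
            }
        }
        return table;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> table = taskToShortExecutor(List.of(List.of(1, 3), List.of(4, 4)));
        System.out.println(table.get(2)); // prints 1
        System.out.println(table.get(4)); // prints 4
    }
}
```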

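Next comes the function body. Its input is tuple-batch, a batch of messages received over ZMQ. The messages are grouped by the TaskId of the first Task in the Executor that owns each message's TaskId. In the resulting variable grouped, the key is the TaskId of the Executor's first Task and the value is the group of messages that belong to that Executor.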

The TaskId of the Executor's first Task is then used to look up the Executor's receive queue q, and disruptor/publish is called to send the received messages to q.
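The group-then-publish flow above can be sketched in Java like this. It is an illustration under several assumptions: `TaskMessage` is a made-up message type, a `BlockingQueue` replaces the Disruptor Queue, `offer` replaces disruptor/publish, and messages for unknown tasks are simply dropped in this sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch of local transfer: group a batch by executor, then publish each group.
final class LocalTransfer {

    // Made-up message type: destination task id plus payload.
    static final class TaskMessage {
        final int taskId;
        final Object payload;

        TaskMessage(int taskId, Object payload) {
            this.taskId = taskId;
            this.payload = payload;
        }
    }

    static void transferLocal(List<TaskMessage> tupleBatch,
                              Map<Integer, Integer> taskToShortExecutor,
                              Map<Integer, BlockingQueue<List<TaskMessage>>> shortExecutorReceiveQueues) {
        // Group messages by the first task id of the executor that owns each destination task.
        Map<Integer, List<TaskMessage>> grouped = new HashMap<>();
        for (TaskMessage message : tupleBatch) {
            Integer shortExecutor = taskToShortExecutor.get(message.taskId);
            if (shortExecutor == null) {
                continue; // unknown task: dropped in this sketch
            }
            grouped.computeIfAbsent(shortExecutor, k -> new ArrayList<>()).add(message);
        }
        // Publish each group as one unit to the matching receive queue.
        for (Map.Entry<Integer, List<TaskMessage>> entry : grouped.entrySet()) {
            BlockingQueue<List<TaskMessage>> q = shortExecutorReceiveQueues.get(entry.getKey());
            if (q != null) {
                q.offer(entry.getValue());
            }
        }
    }

    public static void main(String[] args) {
        Map<Integer, Integer> taskToShortExecutor = Map.of(1, 1, 2, 1, 3, 1, 4, 4);
        Map<Integer, BlockingQueue<List<TaskMessage>>> queues = new HashMap<>();
        queues.put(1, new LinkedBlockingQueue<>());
        queues.put(4, new LinkedBlockingQueue<>());
        transferLocal(List.of(new TaskMessage(1, "a"), new TaskMessage(4, "b"), new TaskMessage(2, "c")),
                taskToShortExecutor, queues);
        System.out.println(queues.get(1).peek().size()); // prints 2
        System.out.println(queues.get(4).peek().size()); // prints 1
    }
}
```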

WorkerLogs

setLogFilePermission

public void setLogFilePermission(String fileName) throws IOException {
    Path absFile = logRootDir.resolve(fileName).toAbsolutePath().normalize();
    // Ignore any path that resolves outside the log root directory.
    if (!absFile.startsWith(logRootDir)) {
        return;
    }
    boolean runAsUser = ObjectReader.getBoolean(stormConf.get(SUPERVISOR_RUN_WORKER_AS_USER), false);
    Path parent = logRootDir.resolve(fileName).getParent();
    Optional<Path> mdFile = (parent == null) ? Optional.empty() : getMetadataFileForWorkerLogDir(parent);
    Optional<String> topoOwner = mdFile.isPresent()
            ? Optional.of(getTopologyOwnerFromMetadataFile(mdFile.get().toAbsolutePath().normalize()))
            : Optional.empty();

    // Only act when workers run as the topology owner and the file exists but is not readable.
    if (runAsUser && topoOwner.isPresent() && absFile.toFile().exists() && !Files.isReadable(absFile)) {
        LOG.debug("Setting permissions on file {} with topo-owner {}", fileName, topoOwner);
        try {
            ClientSupervisorUtils.processLauncherAndWait(stormConf, topoOwner.get(),
                    Lists.newArrayList("blob", absFile.toAbsolutePath().normalize().toString()), null,
                    "setup group read permissions for file: " + fileName);
        } catch (IOException e) {
            numSetPermissionsExceptions.mark();
            throw e;
        }
    }
}

Sets the permissions on a log file so that the logviewer can serve the file.
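The first check in setLogFilePermission rejects file names that would resolve outside the log root. The standalone sketch below isolates that idea using only java.nio.file; the class `LogPathGuard`, the method `isInsideRoot`, and the example directory are hypothetical, and unlike the original it also normalizes the root itself before comparing.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch: accept a file name only if it stays inside the root directory.
final class LogPathGuard {

    static boolean isInsideRoot(Path root, String fileName) {
        Path normalizedRoot = root.toAbsolutePath().normalize();
        // normalize() collapses ".." segments, so an escaping path no longer starts with the root.
        Path candidate = normalizedRoot.resolve(fileName).toAbsolutePath().normalize();
        return candidate.startsWith(normalizedRoot);
    }

    public static void main(String[] args) {
        Path root = Paths.get("/var/log/storm/workers-artifacts"); // example location only
        System.out.println(isInsideRoot(root, "topo-1/6700/worker.log")); // prints true
        System.out.println(isInsideRoot(root, "../../etc/passwd"));       // prints false
    }
}
```

Path.startsWith compares whole name elements rather than raw strings, which is why a sibling directory whose name merely shares a prefix with the root is not accepted.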

getAllLogsForRootDir

public List<Path> getAllLogsForRootDir() throws IOException {
    List<Path> files = new ArrayList<>();
    Set<Path> topoDirFiles = getAllWorkerDirs();
    if (topoDirFiles != null) {
        for (Path portDir : topoDirFiles) {
            files.addAll(directoryCleaner.getFilesForDir(portDir));
        }
    }

    return files;
}

Returns a list of all log files in the worker directories under the root log directory.

getAllWorkerDirs

public Set<Path> getAllWorkerDirs() {
    try (Stream<Path> topoDirs = Files.list(logRootDir)) {
        return topoDirs
            .filter(Files::isDirectory)
            .flatMap(Unchecked.function(Files::list)) //Worker dirs
            .filter(Files::isDirectory)
            .collect(Collectors.toCollection(TreeSet::new));
    } catch (IOException e) {
        throw Utils.wrapInRuntime(e);
    }
}

Returns the set of all worker directories across all topology directories under the root log directory.
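The two-level walk (topology directories first, then the worker directories inside each) can be reproduced with only the JDK, as in the sketch below. It replaces the `Unchecked.function` wrapper from the original with explicit loops, and the class name `WorkerDirLister` and the directory names used in `main` are assumptions for the example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch: collect <root>/<topology>/<port> directories into a sorted set.
final class WorkerDirLister {

    static SortedSet<Path> workerDirs(Path logRoot) throws IOException {
        List<Path> topoDirs;
        try (Stream<Path> entries = Files.list(logRoot)) {
            topoDirs = entries.filter(Files::isDirectory).collect(Collectors.toList());
        }
        SortedSet<Path> result = new TreeSet<>();
        for (Path topoDir : topoDirs) {
            // Files.list returns a stream backed by an open directory handle, so close it promptly.
            try (Stream<Path> entries = Files.list(topoDir)) {
                entries.filter(Files::isDirectory).forEach(result::add);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("log-root");
        Files.createDirectories(root.resolve("topo-1").resolve("6700"));
        Files.createDirectories(root.resolve("topo-1").resolve("6701"));
        Files.createDirectories(root.resolve("topo-2").resolve("6700"));
        Files.createFile(root.resolve("stray.log")); // plain files are skipped at both levels
        System.out.println(workerDirs(root).size()); // prints 3
    }
}
```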

getAliveWorkerDirs

public SortedSet<Path> getAliveWorkerDirs() throws IOException {
    Set<String> aliveIds = getAliveIds(Time.currentTimeSecs());
    Set<Path> logDirs = getAllWorkerDirs();
    return getLogDirs(logDirs, (wid) -> aliveIds.contains(wid));
}

Returns a sorted set of the log directories written by workers that are currently alive.
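The selection by worker id that getAliveWorkerDirs hands off to getLogDirs can be sketched with a plain predicate. In the example below the worker id for each directory comes from an in-memory map, which is an assumption standing in for reading worker.yaml; the names `AliveWorkerDirs` and `filterDirs` are likewise invented.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.function.Predicate;

// Hypothetical sketch: keep the log directories whose worker id satisfies a predicate.
final class AliveWorkerDirs {

    static SortedSet<Path> filterDirs(Map<Path, String> workerIdByDir, Predicate<String> predicate) {
        SortedSet<Path> selected = new TreeSet<>();
        for (Map.Entry<Path, String> entry : workerIdByDir.entrySet()) {
            // A directory without a known worker id is tested with the empty string.
            String workerId = entry.getValue() == null ? "" : entry.getValue();
            if (predicate.test(workerId)) {
                selected.add(entry.getKey());
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        Map<Path, String> workerIdByDir = new HashMap<>();
        workerIdByDir.put(Paths.get("topo-1", "6700"), "w-1");
        workerIdByDir.put(Paths.get("topo-1", "6701"), "w-2");
        workerIdByDir.put(Paths.get("topo-2", "6700"), null); // no metadata found
        Set<String> aliveIds = Set.of("w-2");
        System.out.println(filterDirs(workerIdByDir, aliveIds::contains).size());          // prints 1
        System.out.println(filterDirs(workerIdByDir, id -> !aliveIds.contains(id)).size()); // prints 2
    }
}
```

Passing the predicate in, rather than hard-coding the alive check, lets the same loop select either alive or dead worker directories.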

getMetadataFileForWorkerLogDir

public Optional<Path> getMetadataFileForWorkerLogDir(Path logDir) throws IOException {
    Path metaFile = logDir.resolve(WORKER_YAML);
    if (metaFile.toFile().exists()) {
        return Optional.of(metaFile);
    } else {
        LOG.warn("Could not find {} to clean up for {}", metaFile.toAbsolutePath().normalize(), logDir);
        return Optional.empty();
    }
}

Returns the metadata file (worker.yaml) for the given worker log directory.

getWorkerIdFromMetadataFile

public String getWorkerIdFromMetadataFile(Path metaFile) {
    Map<String, Object> map = (Map<String, Object>) Utils.readYamlFile(metaFile.toString());
    return ObjectReader.getString(map == null ? null : map.get("worker-id"), null);
}

Returns the worker id read from the worker metadata file.

getLogDirs

public SortedSet<Path> getLogDirs(Set<Path> logDirs, Predicate<String> predicate) {
    // we could also make this static, but not to do it due to mock
    TreeSet<Path> ret = new TreeSet<>();
    for (Path logDir: logDirs) {
        String workerId = "";
        try {
            Optional<Path> metaFile = getMetadataFileForWorkerLogDir(logDir);
            if (metaFile.isPresent()) {
                workerId = getWorkerIdFromMetadataFile(metaFile.get().toAbsolutePath().normalize());
                if (workerId == null) {
                    workerId = "";
                }
            }
        } catch (IOException e) {
            LOG.warn("Error trying to find worker.yaml in {}", logDir, e);
        }
        if (predicate.test(workerId)) {
            ret.add(logDir);
        }
    }
    return ret;
}