DolphinScheduler 1.3.4: Implementation and Analysis of the Master-Server's Three Task Dispatch Strategies

Introduction

A DolphinScheduler 1.3.4 cluster provides three dispatch strategies for distributing tasks across the machines of a worker group: random, round robin, and machine resource weight. After the master-server picks up a task, it fetches the list of machines in the task's worker group from ZooKeeper, selects one machine according to the configured strategy, and sends the task to it over Netty; the worker service on that machine then executes the task.

Random algorithm

Principle

DS's random algorithm is very simple: after fetching all machines of the task's worker group from ZooKeeper, if there is exactly one machine it is returned directly; otherwise one machine is chosen at random as the task's executor.

Code:

    private final Random random = new Random();

    public T select(final Collection<T> source) {
        if (source == null || source.size() == 0) {
            throw new IllegalArgumentException("Empty source.");
        }
        /**
         * if only one candidate, return it directly
         */
        if (source.size() == 1) {
            return (T) source.toArray()[0];
        }
        int size = source.size();
        /**
         * otherwise pick a random index
         */
        int randomIndex = random.nextInt(size);
        return (T) source.toArray()[randomIndex];
    }
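The same logic can be exercised with a small standalone sketch; the class and host names below are illustrative, not the actual DS types:

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Random;

// Standalone sketch of the random selection logic above.
public class RandomSelectDemo {
    private static final Random RANDOM = new Random();

    static <T> T select(Collection<T> source) {
        if (source == null || source.isEmpty()) {
            throw new IllegalArgumentException("Empty source.");
        }
        Object[] items = source.toArray();
        // a single candidate is returned directly; otherwise draw a random index
        int index = items.length == 1 ? 0 : RANDOM.nextInt(items.length);
        @SuppressWarnings("unchecked")
        T chosen = (T) items[index];
        return chosen;
    }

    public static void main(String[] args) {
        List<String> workers = Arrays.asList("192.168.0.1:1234", "192.168.0.2:1234");
        // whichever index is drawn, the result is always one of the candidates
        System.out.println(workers.contains(select(workers))); // prints "true"
    }
}
```

Random dispatch needs no shared state between calls, which is why the selector keeps nothing but a Random instance.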

Round-robin algorithm

Principle

The round-robin implementation is built on the atomic class AtomicInteger: the counter is incremented on each call, and taking it modulo the number of machines yields the position in the machine array, so tasks are executed on the worker machines in rotation.

Code:

private final AtomicInteger index = new AtomicInteger(0);
    @Override
    public T select(Collection<T> source) {
        if (source == null || source.size() == 0) {
            throw new IllegalArgumentException("Empty source.");
        }
        /**
         * if only one , return directly
         */
        if (source.size() == 1) {
            return (T)source.toArray()[0];
        }
        int size = source.size();
        /**
         * round robin
         */
        return (T) source.toArray()[index.getAndIncrement() % size];
    }
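The rotation can be seen in a minimal standalone sketch (illustrative names; a List guarantees a stable candidate order):

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Standalone sketch of the AtomicInteger round-robin shown above.
public class RoundRobinDemo {
    private final AtomicInteger index = new AtomicInteger(0);

    <T> T select(List<T> source) {
        // getAndIncrement() % size walks the list 0, 1, 2, 0, 1, 2, ...
        return source.get(index.getAndIncrement() % source.size());
    }

    public static void main(String[] args) {
        RoundRobinDemo rr = new RoundRobinDemo();
        List<String> workers = Arrays.asList("w1", "w2", "w3");
        for (int i = 0; i < 6; i++) {
            System.out.print(rr.select(workers) + " ");
        }
        // prints "w1 w2 w3 w1 w2 w3 "
    }
}
```

One caveat of the modulo approach as written: if the counter ever wraps past Integer.MAX_VALUE, getAndIncrement() returns a negative value and the index lookup fails; the sketch reproduces that property rather than fixing it.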

Machine resource weight algorithm

Principle

The machine resource weight algorithm focuses on resources: the master-server reads each machine's resource information from its heartbeat in ZooKeeper, computes a weight from those resources, and selects the machine with the smallest weight. The weight is computed as follows:

private int calculateWeight(double cpu, double memory, double loadAverage){
        return (int)(cpu * CPU_FACTOR + memory * MEMORY_FACTOR + loadAverage * LOAD_AVERAGE_FACTOR);
    }

CPU_FACTOR: the CPU's share of the weight, 10
MEMORY_FACTOR: the memory's share, 20
LOAD_AVERAGE_FACTOR: the load average's share, 70
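A quick worked example of the formula, with made-up cpu/memory/load sample values:

```java
// Worked example of the weight formula above; the cpu/memory/load
// figures are invented for illustration only.
public class WeightDemo {
    static final int CPU_FACTOR = 10;
    static final int MEMORY_FACTOR = 20;
    static final int LOAD_AVERAGE_FACTOR = 70;

    static int calculateWeight(double cpu, double memory, double loadAverage) {
        return (int) (cpu * CPU_FACTOR + memory * MEMORY_FACTOR + loadAverage * LOAD_AVERAGE_FACTOR);
    }

    public static void main(String[] args) {
        // host A, the busier machine: 0.5*10 + 0.3*20 + 0.2*70 = 25
        int weightA = calculateWeight(0.5, 0.3, 0.2);
        // host B, the idler machine: 0.2*10 + 0.2*20 + 0.1*70 = 13
        int weightB = calculateWeight(0.2, 0.2, 0.1);
        System.out.println(weightA + " " + weightB); // prints "25 13"
        // host B has the lower weight and would be preferred
    }
}
```

Because LOAD_AVERAGE_FACTOR is by far the largest factor, the load average dominates the comparison between hosts.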

Code:

public class LowerWeightHostManager extends CommonHostManager {

    private final Logger logger = LoggerFactory.getLogger(LowerWeightHostManager.class);

    /**
     * zookeeper registry center
     */
    @Autowired
    private ZookeeperRegistryCenter registryCenter;

    /**
     * round robin host manager
     */
    private RoundRobinHostManager roundRobinHostManager;

    /**
     * selector
     */
    private LowerWeightRoundRobin selector;

    /**
     * worker host weights
     */
    private ConcurrentHashMap<String, Set<HostWeight>> workerHostWeightsMap;

    /**
     * worker group host lock
     */
    private Lock lock;

    /**
     * executor service
     */
    private ScheduledExecutorService executorService;

    @PostConstruct
    public void init(){
        this.selector = new LowerWeightRoundRobin();
        this.workerHostWeightsMap = new ConcurrentHashMap<>();
        this.lock = new ReentrantLock();
        this.executorService = Executors.newSingleThreadScheduledExecutor(new NamedThreadFactory("LowerWeightHostManagerExecutor"));
        this.executorService.scheduleWithFixedDelay(new RefreshResourceTask(),0, 5, TimeUnit.SECONDS);
        this.roundRobinHostManager = new RoundRobinHostManager();
        this.roundRobinHostManager.setZookeeperNodeManager(getZookeeperNodeManager());
    }

    @PreDestroy
    public void close(){
        this.executorService.shutdownNow();
    }

    /**
     * select host
     * @param context context
     * @return host
     */
    @Override
    public Host select(ExecutionContext context){
        Set<HostWeight> workerHostWeights = getWorkerHostWeights(context.getWorkerGroup());
        if(CollectionUtils.isNotEmpty(workerHostWeights)){
            return selector.select(workerHostWeights).getHost();
        }
        return new Host();
    }

    @Override
    public Host select(Collection<Host> nodes) {
        throw new UnsupportedOperationException("not support");
    }

    private void syncWorkerHostWeight(Map<String, Set<HostWeight>> workerHostWeights){
        lock.lock();
        try {
            workerHostWeightsMap.clear();
            workerHostWeightsMap.putAll(workerHostWeights);
        } finally {
            lock.unlock();
        }
    }

    private Set<HostWeight> getWorkerHostWeights(String workerGroup){
        lock.lock();
        try {
            return workerHostWeightsMap.get(workerGroup);
        } finally {
            lock.unlock();
        }
    }

    class RefreshResourceTask implements Runnable{

        @Override
        public void run() {
            try {
                Map<String, Set<String>> workerGroupNodes = zookeeperNodeManager.getWorkerGroupNodes();
                Set<Map.Entry<String, Set<String>>> entries = workerGroupNodes.entrySet();
                Map<String, Set<HostWeight>> workerHostWeights = new HashMap<>();
                for(Map.Entry<String, Set<String>> entry : entries){
                    String workerGroup = entry.getKey();
                    Set<String> nodes = entry.getValue();
                    String workerGroupPath = registryCenter.getWorkerGroupPath(workerGroup);
                    Set<HostWeight> hostWeights = new HashSet<>(nodes.size());
                    for(String node : nodes){
                        String heartbeat = registryCenter.getZookeeperCachedOperator().get(workerGroupPath + "/" + node);
                        if(StringUtils.isNotEmpty(heartbeat)
                                && heartbeat.split(COMMA).length == Constants.HEARTBEAT_FOR_ZOOKEEPER_INFO_LENGTH){
                            String[] parts = heartbeat.split(COMMA);

                            int status = Integer.parseInt(parts[8]);
                            if (status == Constants.ABNORMAL_NODE_STATUS){
                                logger.warn("load is too high or availablePhysicalMemorySize(G) is too low, it's availablePhysicalMemorySize(G):{},loadAvg:{}",
                                        Double.parseDouble(parts[3]) , Double.parseDouble(parts[2]));
                                continue;
                            }

                            double cpu = Double.parseDouble(parts[0]);
                            double memory = Double.parseDouble(parts[1]);
                            double loadAverage = Double.parseDouble(parts[2]);
                            HostWeight hostWeight = new HostWeight(Host.of(node), cpu, memory, loadAverage);
                            hostWeights.add(hostWeight);
                        }
                    }
                    workerHostWeights.put(workerGroup, hostWeights);
                }
                syncWorkerHostWeight(workerHostWeights);
            } catch (Throwable ex){
                logger.error("RefreshResourceTask error", ex);
            }
        }
    }

}
The LowerWeightRoundRobin selector walks all candidates, advances each host's current weight by its fixed weight, and returns the host whose current weight is lowest; the chosen host is then penalized by the total weight so the selection rotates smoothly:

public HostWeight select(Collection<HostWeight> sources){
        int totalWeight = 0;
        int lowWeight = 0;
        HostWeight lowerNode = null;
        for (HostWeight hostWeight : sources) {
            totalWeight += hostWeight.getWeight();
            hostWeight.setCurrentWeight(hostWeight.getCurrentWeight() + hostWeight.getWeight());
            if (lowerNode == null || lowWeight > hostWeight.getCurrentWeight()) {
                lowerNode = hostWeight;
                lowWeight = hostWeight.getCurrentWeight();
            }
        }
        lowerNode.setCurrentWeight(lowerNode.getCurrentWeight() + totalWeight);
        return lowerNode;
    }
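To see how this rotation behaves, here is a self-contained simulation of the same selection rule. The Node class is an illustrative stand-in for DS's HostWeight, and a List replaces the Set so that iteration order, and therefore the output, is deterministic:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Standalone simulation of the lower-weight round-robin selection rule.
public class LowerWeightDemo {
    static class Node {
        final String host;
        final int weight;        // fixed resource weight (lower = less loaded)
        int currentWeight = 0;   // running counter used by the selector

        Node(String host, int weight) { this.host = host; this.weight = weight; }
    }

    static Node select(List<Node> nodes) {
        int totalWeight = 0;
        int lowWeight = 0;
        Node lowerNode = null;
        for (Node n : nodes) {
            totalWeight += n.weight;
            n.currentWeight += n.weight;
            if (lowerNode == null || lowWeight > n.currentWeight) {
                lowerNode = n;
                lowWeight = n.currentWeight;
            }
        }
        // penalize the chosen node so it is not picked again immediately
        lowerNode.currentWeight += totalWeight;
        return lowerNode;
    }

    public static void main(String[] args) {
        // A carries weight 1 (idle), B carries weight 3 (busier)
        List<Node> nodes = new ArrayList<>(Arrays.asList(new Node("A", 1), new Node("B", 3)));
        StringBuilder picks = new StringBuilder();
        for (int i = 0; i < 4; i++) {
            picks.append(select(nodes).host);
        }
        System.out.println(picks); // prints "AABA"
    }
}
```

In this two-host example the long-run pick ratio settles at exactly 3:1 in favor of host A, which is the intent of the strategy: the less-loaded machine receives more tasks, yet the busier machine is still visited periodically rather than starved.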

Here a reentrant lock (ReentrantLock) guarantees that each thread reading workerHostWeightsMap sees a complete view of the worker-group information while RefreshResourceTask swaps in a fresh copy.

Design pattern

To obtain the concrete dispatch strategy, DS uses the abstract factory pattern. Below is the (slightly simplified) code:

HostManager, the abstract product interface, which abstracts the host-selection behavior:
/**
 *  host manager
 */
public interface HostManager {

    /**
     *  select host
     * @param context context
     * @return host
     */
    Host select(ExecutionContext context);

}
CommonHostManager, an abstract product class that fetches the worker group's machine nodes from ZooKeeper:
public abstract class CommonHostManager implements HostManager {

    private final Logger logger = LoggerFactory.getLogger(CommonHostManager.class);

    /**
     * zookeeperNodeManager
     */
    @Autowired
    protected ZookeeperNodeManager zookeeperNodeManager;

    /**
     * select host
     * @param context context
     * @return host
     */
    @Override
    public Host select(ExecutionContext context){
        Host host = new Host();
        Collection<String> nodes = null;
        /**
         * executor type
         */
        ExecutorType executorType = context.getExecutorType();
        switch (executorType){
            case WORKER:
                nodes = zookeeperNodeManager.getWorkerGroupNodes(context.getWorkerGroup());
                break;
            case CLIENT:
                break;
            default:
                throw new IllegalArgumentException("invalid executorType : " + executorType);

        }
        if(CollectionUtils.isEmpty(nodes)){
            return host;
        }
        List<Host> candidateHosts = new ArrayList<>(nodes.size());
        nodes.stream().forEach(node -> candidateHosts.add(Host.of(node)));

        return select(candidateHosts);
    }

    protected abstract Host select(Collection<Host> nodes);

    public void setZookeeperNodeManager(ZookeeperNodeManager zookeeperNodeManager) {
        this.zookeeperNodeManager = zookeeperNodeManager;
    }

    public ZookeeperNodeManager getZookeeperNodeManager() {
        return zookeeperNodeManager;
    }
}
RandomHostManager, the concrete product class for the random strategy:
/**
 *  random host manager
 */
public class RandomHostManager extends CommonHostManager {

    /**
     * selector
     */
    private final Selector<Host> selector;

    /**
     * set random selector
     */
    public RandomHostManager(){
        this.selector = new RandomSelector<>();
    }

    @Override
    public Host select(Collection<Host> nodes) {
        return selector.select(nodes);
    }
}
RoundRobinHostManager, the concrete product class for the round-robin strategy:
/**
 *  round robin host manager
 */
public class RoundRobinHostManager extends CommonHostManager {

    /**
     * selector
     */
    private final Selector<Host> selector;

    /**
     * set round robin
     */
    public RoundRobinHostManager(){
        this.selector = new RoundRobinSelector<>();
    }

    @Override
    public Host select(Collection<Host> nodes) {
        return selector.select(nodes);
    }

}
LowerWeightHostManager, the concrete product class for the resource-weight strategy (shown in full above).
HostManagerConfig, the factory class that returns the concrete product based on configuration:
@Configuration
public class HostManagerConfig {

    private AutowireCapableBeanFactory beanFactory;

    @Autowired
    private MasterConfig masterConfig;

    @Autowired
    public HostManagerConfig(AutowireCapableBeanFactory beanFactory) {
        this.beanFactory = beanFactory;
    }

    @Bean
    public HostManager hostManager() {
        String hostSelector = masterConfig.getHostSelector();
        HostSelector selector = HostSelector.of(hostSelector);
        HostManager hostManager;
        switch (selector){
            case RANDOM:
                hostManager = new RandomHostManager();
                break;
            case ROUNDROBIN:
                hostManager = new RoundRobinHostManager();
                break;
            case LOWERWEIGHT:
                hostManager = new LowerWeightHostManager();
                break;
            default:
                throw new IllegalArgumentException("unSupport selector " + hostSelector);
        }
        beanFactory.autowireBean(hostManager);
        return hostManager;
    }
}
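Which product the factory builds is driven by MasterConfig.getHostSelector(). To the best of my recollection, in the 1.3.x release line this maps to the master.host.selector entry in the master's configuration; verify the key and default against your own master.properties before relying on it:

```properties
# dispatch strategy used by the master: Random, RoundRobin or LowerWeight
master.host.selector=LowerWeight
```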