APScheduler Source Code Analysis and Usage

Introduction to APScheduler

APScheduler is a convenient Python framework for scheduled tasks; see the official documentation for an introduction and the API reference.

Core Concepts

  1. Job: defines the function a scheduled task executes, its arguments, and execution-related configuration.
  2. Trigger: defines when a task fires; cron, date, interval, and combined triggers are available.
  3. JobStore: holds the scheduled jobs; in-memory by default, with MongoDB, Redis, SQLAlchemy, etc. also supported.
  4. Executor: runs the jobs; a thread pool by default, with process pool, gevent, tornado, asyncio, etc. also supported.
  5. Scheduler: drives the scheduling, ties the other components together, and exposes a convenient API.
  6. Listener: listens for events in the scheduler; four event families are provided: scheduler events, job events, job-submission events, and job-execution events.
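
To see how these six pieces fit together, here is a minimal toy sketch. All class and method names below are invented for illustration and are not APScheduler's real API: a trigger computes fire times, a store keeps jobs ordered, an executor runs them, and the scheduler glues everything together and notifies listeners.

```python
from datetime import datetime, timedelta

class IntervalTrigger:
    """Trigger: computes the next fire time from the previous one."""
    def __init__(self, seconds):
        self.interval = timedelta(seconds=seconds)

    def get_next_fire_time(self, previous, now):
        return (previous or now) + self.interval

class Job:
    """Job: a callable bundled with its trigger and bookkeeping."""
    def __init__(self, func, trigger, now):
        self.func = func
        self.trigger = trigger
        self.next_run_time = trigger.get_next_fire_time(None, now)

class MemoryJobStore:
    """JobStore: keeps jobs ordered by next run time."""
    def __init__(self):
        self.jobs = []

    def add_job(self, job):
        self.jobs.append(job)
        self.jobs.sort(key=lambda j: j.next_run_time)

    def get_due_jobs(self, now):
        return [j for j in self.jobs if j.next_run_time <= now]

class Executor:
    """Executor: runs the job (inline here; a real one uses a pool)."""
    def submit_job(self, job):
        job.func()

class Scheduler:
    """Scheduler: ties store, executor, and listeners together."""
    def __init__(self):
        self.jobstore = MemoryJobStore()
        self.executor = Executor()
        self.listeners = []

    def process_jobs(self, now):
        for job in self.jobstore.get_due_jobs(now):
            self.executor.submit_job(job)
            # advance the job to its next fire time
            job.next_run_time = job.trigger.get_next_fire_time(
                job.next_run_time, now)
            for listener in self.listeners:
                listener('job_executed')
```

This is only a synchronous, single-store simplification, but the division of labor matches the concepts above.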

Scheduler Architecture

[Figure: scheduler architecture diagram]

Module Definitions

The scheduler base class

A scheduler can hold multiple JobStores, multiple Executors, and multiple Listeners.
It also maintains a state variable recording the scheduler's current state.

class BaseScheduler(six.with_metaclass(ABCMeta)):
    def __init__(self, gconfig={}, **options):
        super(BaseScheduler, self).__init__()
        self._executors = {}  # maps executor aliases to instances
        self._executors_lock = self._create_lock()
        self._jobstores = {}  # maps jobstore aliases to instances
        self._jobstores_lock = self._create_lock()
        self._listeners = []  # list of event listeners
        self._listeners_lock = self._create_lock()
        self._pending_jobs = []  # jobs waiting to be added to a job store
        self.state = STATE_STOPPED  # scheduler state variable
        self.configure(gconfig, **options)  # configure the scheduler

JobStore Definition

A JobStore maintains a list of jobs that can be viewed as sorted in ascending order of next run time. The in-memory JobStore keeps this sorted list in memory, while database-backed JobStores (MongoDB, Redis, etc.) delegate the ordering to the database engine. A database-backed JobStore also provides persistent storage of jobs.
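
The ordered list can be sketched with the standard bisect module. This is a simplification in which a "job" is just an id keyed by its next run timestamp; the real MemoryJobStore is more elaborate.

```python
import bisect

class OrderedJobList:
    """Minimal ordered job list; an entry is just (run_ts, job_id)."""
    def __init__(self):
        self._jobs = []  # kept sorted ascending by run timestamp

    def add(self, run_ts, job_id):
        # binary-search insertion keeps the list sorted
        bisect.insort(self._jobs, (run_ts, job_id))

    def get_due_jobs(self, now_ts):
        # every entry whose timestamp has passed is due
        return [job_id for ts, job_id in self._jobs if ts <= now_ts]

    def get_next_run_time(self):
        return self._jobs[0][0] if self._jobs else None
```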

Job Definition

class Job(object):
    __slots__ = (
        '_scheduler',  # the scheduler this job belongs to
        '_jobstore_alias',  # alias of the JobStore holding this job
        'id',  # job id
        'trigger',  # the job's trigger
        'executor',  # alias of the job's executor
        'func',  # the callable to execute
        'func_ref',  # textual reference to the callable, used for serialization
        'args',
        'kwargs',
        'name',  # human-readable description of the job
        'misfire_grace_time',  # the time (in seconds) how much this job's execution is allowed to be late (None means "allow the job to run no matter how late it is")
        'coalesce',  # whether to only run the job once when several run times are due
        'max_instances',  # the maximum number of concurrently executing instances allowed for this job
        'next_run_time',  # the next scheduled run time of this job
        '__weakref__'
    )

    def __init__(self, scheduler, id=None, **kwargs):
        super(Job, self).__init__()
        self._scheduler = scheduler
        self._jobstore_alias = None
        self._modify(id=id or uuid4().hex, **kwargs)
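
The misfire_grace_time and coalesce options are easiest to see with a sketch of the run-time computation. The functions below mirror the idea behind Job._get_run_times() and the scheduler's coalescing step, not their exact code:

```python
from datetime import datetime, timedelta

def get_run_times(next_run_time, now, interval):
    """All fire times from next_run_time up to now (inclusive)."""
    run_times = []
    t = next_run_time
    while t is not None and t <= now:
        run_times.append(t)
        t = t + interval
    return run_times

def select_runs(run_times, now, coalesce, misfire_grace_time):
    """Decide which overdue run times actually execute."""
    # coalesce: several overdue runs collapse into the most recent one
    if coalesce:
        run_times = run_times[-1:]
    if misfire_grace_time is None:
        return run_times  # None means "run no matter how late"
    grace = timedelta(seconds=misfire_grace_time)
    return [t for t in run_times if now - t <= grace]
```

For example, if a 20-second interval job falls a minute behind, four run times accumulate; coalesce keeps only the latest one, while a 30-second grace window drops the runs that are already too stale.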

Workflow

Take the combination of BlockingScheduler, MemoryJobStore, and ThreadPoolExecutor as an example.

[Figure: workflow of one scheduling cycle]

  1. When time t1 arrives, the scheduler calls jobstore.get_due_jobs() to fetch all due jobs from the jobstore.
  2. For each due job it reads the executor alias and resolves the executor instance via scheduler._lookup_executor().
  3. It calls executor.submit_job() to hand the job to the executor for execution.
  4. After successful submission, it calls the job's trigger.get_next_fire_time() to compute the next run time, then job._modify() to update the job.
  5. It calls jobstore.update_job() to write the updated job back to the jobstore.
  6. It calls jobstore.get_next_run_time() to obtain the jobstore's earliest upcoming run time.
  7. With the earliest run time in hand, the scheduler sleeps until that time via scheduler._event.wait() and enters the next loop iteration.

The Scheduler Main Loop

# BlockingScheduler._main_loop()
    def _main_loop(self):
        wait_seconds = TIMEOUT_MAX
        while self.state != STATE_STOPPED:
            self._event.wait(wait_seconds)
            self._event.clear()
            wait_seconds = self._process_jobs()

# scheduler._process_jobs()
    def _process_jobs(self):
        if self.state == STATE_PAUSED:
            self._logger.debug('Scheduler is paused -- not processing jobs')
            return None

        self._logger.debug('Looking for jobs to run')
        now = datetime.now(self.timezone)
        next_wakeup_time = None
        events = []

        with self._jobstores_lock:
            # iterate over every jobstore and collect its due jobs
            for jobstore_alias, jobstore in six.iteritems(self._jobstores):
                try:
                    due_jobs = jobstore.get_due_jobs(now)
                except Exception as e:
                    # Schedule a wakeup at least in jobstore_retry_interval seconds
                    self._logger.warning('Error getting due jobs from job store %r: %s',
                                         jobstore_alias, e)
                    retry_wakeup_time = now + timedelta(seconds=self.jobstore_retry_interval)
                    if not next_wakeup_time or next_wakeup_time > retry_wakeup_time:
                        next_wakeup_time = retry_wakeup_time

                    continue

                for job in due_jobs:
                    # Look up the job's executor
                    try:
                        executor = self._lookup_executor(job.executor)  # resolve the executor instance for this job
                    except BaseException:
                        self._logger.error(
                            'Executor lookup ("%s") failed for job "%s" -- removing it from the '
                            'job store', job.executor, job)
                        self.remove_job(job.id, jobstore_alias)
                        continue

                    run_times = job._get_run_times(now)
                    run_times = run_times[-1:] if run_times and job.coalesce else run_times
                    if run_times:
                        try:
                            executor.submit_job(job, run_times)  # hand the job to the executor
                        except MaxInstancesReachedError:
                            self._logger.warning(
                                'Execution of job "%s" skipped: maximum number of running '
                                'instances reached (%d)', job, job.max_instances)
                            event = JobSubmissionEvent(EVENT_JOB_MAX_INSTANCES, job.id,
                                                       jobstore_alias, run_times)
                            events.append(event)
                        except BaseException:
                            self._logger.exception('Error submitting job "%s" to executor "%s"',
                                                   job, job.executor)
                        else:
                            event = JobSubmissionEvent(EVENT_JOB_SUBMITTED, job.id, jobstore_alias,
                                                       run_times)
                            events.append(event)

                        # Update the job if it has a next execution time.
                        # Otherwise remove it from the job store.
                        job_next_run = job.trigger.get_next_fire_time(run_times[-1], now)
                        if job_next_run:
                            job._modify(next_run_time=job_next_run)
                            jobstore.update_job(job)
                        else:
                            self.remove_job(job.id, jobstore_alias)

                # Set a new next wakeup time if there isn't one yet or
                # the jobstore has an even earlier one
                jobstore_next_run_time = jobstore.get_next_run_time()
                if jobstore_next_run_time and (next_wakeup_time is None or
                                               jobstore_next_run_time < next_wakeup_time):
                    next_wakeup_time = jobstore_next_run_time.astimezone(self.timezone)

        # Dispatch collected events
        for event in events:
            self._dispatch_event(event)

        # Determine the delay until this method should be called again
        if self.state == STATE_PAUSED:
            wait_seconds = None
            self._logger.debug('Scheduler is paused; waiting until resume() is called')
        elif next_wakeup_time is None:
            wait_seconds = None
            self._logger.debug('No jobs; waiting until a job is added')
        else:
            wait_seconds = min(max(timedelta_seconds(next_wakeup_time - now), 0), TIMEOUT_MAX)
            self._logger.debug('Next wakeup is due at %s (in %f seconds)', next_wakeup_time,
                               wait_seconds)

        return wait_seconds

APScheduler and Distributed Deployment

In production, schedulers usually run as multiple instances so that a single node failure does not take down every scheduled task. At the same time, you do not want each job to fire once on every APScheduler instance, which raises the question of how APScheduler behaves in a distributed setup. Although job storage supports distributed backends such as MongoDB and Redis, APScheduler takes no lock when fetching jobs from the jobstore, so it does not support distributed execution out of the box. To execute jobs across multiple instances, you need to introduce an external distributed lock: have each APScheduler instance lock a job when fetching it from the jobstore, so that the same job does not run on every instance.
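
One way to sketch the locking idea is below. The names and the lock store are invented for illustration; in production, the dict would be replaced by an atomic shared primitive such as Redis SET with the NX and EX options, so that exactly one instance wins each (job, run time) pair.

```python
import threading

LOCKS = {}  # stand-in for a shared store such as Redis
_lock = threading.Lock()

def try_acquire(job_id, run_time, instance_id):
    """Return True iff this instance won the right to run (job_id, run_time)."""
    key = (job_id, run_time)
    with _lock:  # Redis would do this atomically via SET ... NX EX ttl
        if key in LOCKS:
            return False  # another instance already claimed this run
        LOCKS[key] = instance_id
        return True

def submit_if_owner(job_id, run_time, instance_id, submit):
    """Submit the job only if this instance holds the per-run lock."""
    if try_acquire(job_id, run_time, instance_id):
        submit()
        return True
    return False
```

Keying the lock on the (job id, run time) pair rather than the job id alone lets the next scheduled run be claimed by any instance, while still ensuring each individual run executes exactly once.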
