APScheduler Source Code Analysis and Usage
Introduction to APScheduler
APScheduler is a popular and convenient Python framework for scheduled (timed) tasks; for an introduction and the full API, see the official documentation.
Core Concepts
- Job: defines the function a scheduled task runs, its arguments, and the task's execution-related configuration.
- Trigger: defines how a task is triggered; supports cron, date, interval, and combinations of these.
- JobStore: stores the scheduled jobs; in-memory by default, with support for MongoDB, Redis, SQLAlchemy, and others.
- Executor: runs the jobs; a thread pool by default, with support for a process pool, gevent, Tornado, asyncio, and others.
- Scheduler: schedules the jobs, ties the other components together, and exposes a convenient public API.
- Listener: listens for events inside the scheduler; four event categories are provided: scheduler events, job events, job-submission events, and job-execution events.
Scheduler Architecture Diagram
Module Definitions
Scheduler base class
A single scheduler can hold multiple JobStores, multiple Executors, and multiple Listeners.
The scheduler also maintains a state variable recording its current state.
class BaseScheduler(six.with_metaclass(ABCMeta)):
    def __init__(self, gconfig={}, **options):
        super(BaseScheduler, self).__init__()
        self._executors = {}                # maps executor aliases to instances
        self._executors_lock = self._create_lock()
        self._jobstores = {}                # maps jobstore aliases to instances
        self._jobstores_lock = self._create_lock()
        self._listeners = []                # list of event listeners
        self._listeners_lock = self._create_lock()
        self._pending_jobs = []             # jobs waiting to be added to a jobstore
        self.state = STATE_STOPPED          # scheduler state variable
        self.configure(gconfig, **options)  # configure the scheduler
JobStore
A JobStore maintains a list of jobs that can be viewed as sorted in ascending order of next run time. MemoryJobStore keeps this sorted list in memory, while database-backed stores (MongoDB, Redis, etc.) delegate the ordering to the database engine. A database-backed JobStore also gives jobs durable, persistent storage.
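The "sorted by next run time" invariant is easy to see in a simplified in-memory store. The following is a hypothetical sketch (not the library's actual MemoryJobStore code) that keeps a plain list sorted on insert, so due jobs are always a prefix of the list:

```python
import bisect


class TinyMemoryJobStore(object):
    """Illustrative in-memory jobstore: jobs kept sorted by next_run_time."""

    def __init__(self):
        self._jobs = []  # list of (next_run_time, job_id), kept sorted

    def add_job(self, job_id, next_run_time):
        # Insert at the position that preserves ascending order.
        bisect.insort(self._jobs, (next_run_time, job_id))

    def get_due_jobs(self, now):
        # Because the list is sorted, the due jobs form a prefix of it.
        return [job_id for ts, job_id in self._jobs if ts <= now]

    def get_next_run_time(self):
        # The earliest run time is always the head of the list.
        return self._jobs[0][0] if self._jobs else None


store = TinyMemoryJobStore()
store.add_job('b', 20)
store.add_job('a', 10)
store.add_job('c', 30)
print(store.get_due_jobs(20))     # ['a', 'b']
print(store.get_next_run_time())  # 10
```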
Job
class Job(object):
    __slots__ = (
        '_scheduler',          # the scheduler this job belongs to
        '_jobstore_alias',     # alias of the JobStore holding this job
        'id',                  # job id
        'trigger',             # the job's trigger
        'executor',            # alias of the job's executor
        'func',                # the callable the job runs
        'func_ref',            # serialized reference to the callable
        'args',
        'kwargs',
        'name',                # job description
        'misfire_grace_time',  # the time (in seconds) how much this job's execution is allowed to be late (None means "allow the job to run no matter how late it is")
        'coalesce',            # whether to only run the job once when several run times are due
        'max_instances',       # the maximum number of concurrently executing instances allowed for this job
        'next_run_time',       # the next scheduled run time of this job
        '__weakref__'
    )

    def __init__(self, scheduler, id=None, **kwargs):
        super(Job, self).__init__()
        self._scheduler = scheduler
        self._jobstore_alias = None
        self._modify(id=id or uuid4().hex, **kwargs)
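Two of these fields interact in a way worth spelling out: misfire_grace_time decides whether a late run still happens at all, and coalesce collapses several missed runs into one. A hypothetical sketch of that filtering logic (mirroring the behavior described in the comments above, not the library's exact code):

```python
def runnable_times(due_times, now, misfire_grace_time, coalesce):
    """Return the run times that should actually execute at `now`.

    due_times: scheduled run times (in seconds) that are <= now, ascending.
    misfire_grace_time: max allowed lateness in seconds, or None for unlimited.
    coalesce: if True, collapse all missed runs into the most recent one.
    """
    if misfire_grace_time is not None:
        # Drop runs that are already too late to be worth executing.
        due_times = [t for t in due_times if now - t <= misfire_grace_time]
    if coalesce and due_times:
        # Keep only the most recent missed run.
        due_times = due_times[-1:]
    return due_times


# Three runs were missed; with a 15-second grace period only the last two
# are still eligible, and coalescing keeps just the most recent one.
print(runnable_times([100, 110, 120], now=125, misfire_grace_time=15, coalesce=True))   # [120]
print(runnable_times([100, 110, 120], now=125, misfire_grace_time=15, coalesce=False))  # [110, 120]
```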
Workflow
The walkthrough below uses the BlockingScheduler + MemoryJobStore + ThreadPoolExecutor combination as an example.
- When time t1 arrives, jobstore.get_due_jobs() is called to fetch all due jobs from the jobstore.
- For each due job, its executor alias is resolved to an executor instance via scheduler._lookup_executor().
- executor.submit_job() hands the job to the executor for execution.
- On successful submission, trigger.get_next_fire_time() computes the job's next run time, and job._modify() updates the job with it.
- jobstore.update_job() writes the updated job back into the jobstore.
- jobstore.get_next_run_time() returns the earliest upcoming run time in the jobstore.
- The scheduler takes the earliest run time across jobstores, sleeps until then via scheduler._event.wait(), and enters the next loop iteration.
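The steps above can be condensed into a single simplified "tick", here with stdlib stand-ins for the jobstore, trigger, and executor (an illustrative sketch, not the library code; the job dicts and interval values are invented for the example):

```python
from concurrent.futures import ThreadPoolExecutor

# Fake components mimicking the interfaces named in the steps above.
jobs = [
    {'id': 'j1', 'next_run_time': 5, 'interval': 10},
    {'id': 'j2', 'next_run_time': 12, 'interval': 10},
]
executor = ThreadPoolExecutor(max_workers=2)
executed = []


def get_due_jobs(now):                     # jobstore.get_due_jobs()
    return [j for j in jobs if j['next_run_time'] <= now]


def get_next_fire_time(job, now):          # trigger.get_next_fire_time()
    return job['next_run_time'] + job['interval']


def process_jobs(now):
    for job in get_due_jobs(now):
        executor.submit(executed.append, job['id'])          # executor.submit_job()
        job['next_run_time'] = get_next_fire_time(job, now)  # job._modify() + update_job()
    next_run = min(j['next_run_time'] for j in jobs)         # jobstore.get_next_run_time()
    return max(next_run - now, 0)          # how long the scheduler sleeps


wait = process_jobs(now=5)   # only j1 is due; its next run moves to 15
executor.shutdown(wait=True)
print(executed)  # ['j1']
print(wait)      # 7  (the next due job is j2 at t=12)
```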
Scheduler Main Loop
# BlockingScheduler._main_loop()
def _main_loop(self):
    wait_seconds = TIMEOUT_MAX
    while self.state != STATE_STOPPED:
        self._event.wait(wait_seconds)
        self._event.clear()
        wait_seconds = self._process_jobs()
# BaseScheduler._process_jobs()
def _process_jobs(self):
    if self.state == STATE_PAUSED:
        self._logger.debug('Scheduler is paused -- not processing jobs')
        return None

    self._logger.debug('Looking for jobs to run')
    now = datetime.now(self.timezone)
    next_wakeup_time = None
    events = []

    with self._jobstores_lock:
        # Iterate over the jobstores and collect all due jobs from each
        for jobstore_alias, jobstore in six.iteritems(self._jobstores):
            try:
                due_jobs = jobstore.get_due_jobs(now)
            except Exception as e:
                # Schedule a wakeup at least in jobstore_retry_interval seconds
                self._logger.warning('Error getting due jobs from job store %r: %s',
                                     jobstore_alias, e)
                retry_wakeup_time = now + timedelta(seconds=self.jobstore_retry_interval)
                if not next_wakeup_time or next_wakeup_time > retry_wakeup_time:
                    next_wakeup_time = retry_wakeup_time
                continue

            for job in due_jobs:
                # Look up the job's executor
                try:
                    executor = self._lookup_executor(job.executor)  # resolve the job's executor instance
                except BaseException:
                    self._logger.error(
                        'Executor lookup ("%s") failed for job "%s" -- removing it from the '
                        'job store', job.executor, job)
                    self.remove_job(job.id, jobstore_alias)
                    continue

                run_times = job._get_run_times(now)
                run_times = run_times[-1:] if run_times and job.coalesce else run_times
                if run_times:
                    try:
                        executor.submit_job(job, run_times)  # hand the job to the executor
                    except MaxInstancesReachedError:
                        self._logger.warning(
                            'Execution of job "%s" skipped: maximum number of running '
                            'instances reached (%d)', job, job.max_instances)
                        event = JobSubmissionEvent(EVENT_JOB_MAX_INSTANCES, job.id,
                                                   jobstore_alias, run_times)
                        events.append(event)
                    except BaseException:
                        self._logger.exception('Error submitting job "%s" to executor "%s"',
                                               job, job.executor)
                    else:
                        event = JobSubmissionEvent(EVENT_JOB_SUBMITTED, job.id, jobstore_alias,
                                                   run_times)
                        events.append(event)

                    # Update the job if it has a next execution time.
                    # Otherwise remove it from the job store.
                    job_next_run = job.trigger.get_next_fire_time(run_times[-1], now)
                    if job_next_run:
                        job._modify(next_run_time=job_next_run)
                        jobstore.update_job(job)
                    else:
                        self.remove_job(job.id, jobstore_alias)

            # Set a new next wakeup time if there isn't one yet or
            # the jobstore has an even earlier one
            jobstore_next_run_time = jobstore.get_next_run_time()
            if jobstore_next_run_time and (next_wakeup_time is None or
                                           jobstore_next_run_time < next_wakeup_time):
                next_wakeup_time = jobstore_next_run_time.astimezone(self.timezone)

    # Dispatch collected events
    for event in events:
        self._dispatch_event(event)

    # Determine the delay until this method should be called again
    if self.state == STATE_PAUSED:
        wait_seconds = None
        self._logger.debug('Scheduler is paused; waiting until resume() is called')
    elif next_wakeup_time is None:
        wait_seconds = None
        self._logger.debug('No jobs; waiting until a job is added')
    else:
        wait_seconds = min(max(timedelta_seconds(next_wakeup_time - now), 0), TIMEOUT_MAX)
        self._logger.debug('Next wakeup is due at %s (in %f seconds)', next_wakeup_time,
                           wait_seconds)

    return wait_seconds
APScheduler in Distributed Deployments
In production, scheduled tasks usually need to run on multiple instances so that a single point of failure cannot take down all of them; at the same time, a given task should not run once on every APScheduler instance. This raises the question of how APScheduler behaves in a distributed setup. Although job storage supports distributed backends such as MongoDB and Redis, APScheduler never takes a lock when fetching jobs from the jobstore, so by itself it does not support distributed execution. To achieve it, an external distributed lock must be introduced: each APScheduler instance locks a job when fetching it from the jobstore, which prevents the same job from being executed on every instance.
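One way to add that lock is to acquire a per-job, per-run-time lock before executing, so only the instance that wins the lock runs the job. The sketch below simulates the shared lock store with a plain dict standing in for a Redis SETNX-style operation (the helper names and key format are hypothetical; a real deployment would use an actual distributed lock, e.g. Redis SET with NX and an expiry):

```python
lock_store = {}  # stand-in for a shared Redis instance


def acquire_lock(key, owner):
    """Mimics Redis SET key owner NX: only the first caller succeeds."""
    if key in lock_store:
        return False
    lock_store[key] = owner
    return True


def run_job_once(instance_id, job_id, run_time, func):
    """Each scheduler instance calls this; only the lock winner executes."""
    lock_key = 'apscheduler:%s:%s' % (job_id, run_time)
    if not acquire_lock(lock_key, instance_id):
        return False  # another instance already owns this run
    func()
    return True


results = []
for instance in ('node-1', 'node-2', 'node-3'):
    # All three instances see the same due job at the same run time ...
    run_job_once(instance, 'report-job', 1700000000, lambda: results.append('ran'))

# ... but the job body runs exactly once, on the lock winner.
print(results)  # ['ran']
```

Keying the lock on (job id, run time) rather than job id alone lets the next scheduled run acquire a fresh lock; in a real Redis-backed version, the expiry on the lock also guards against a crashed instance holding it forever.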