A Distributed Solution for APScheduler, the Python Task Scheduling Framework

APScheduler does not support distributed deployment out of the box; see the official FAQ.

The backend is built with Flask. Whether the app runs as multiple processes (under uwsgi) or is deployed across several machines, the same problem appears: each job ends up being executed more than once.
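
The root cause is that every worker process creates and starts its own scheduler, so the same job fires independently in each process. A minimal sketch of the situation, with a hypothetical job, assuming a typical Flask + uwsgi setup:

# Hypothetical sketch: this module is imported by every uwsgi worker, so each
# worker starts its own BackgroundScheduler and sync_report fires once per worker.
from apscheduler.schedulers.background import BackgroundScheduler


def sync_report():
    print('generating report...')


scheduler = BackgroundScheduler()
scheduler.add_job(sync_report, 'interval', minutes=1)
scheduler.start()  # with 4 uwsgi workers, the job runs 4 times every minute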

I subclassed APScheduler's BackgroundScheduler as DistributedBackgroundScheduler. Its loop still scans the job stores and submits due jobs on schedule, but every submission is guarded by a Redis distributed lock, which prevents the same job from being submitted more than once.

A simple distributed lock implementation

import uuid
import redis
from extensions import redis_client


RELEASE_LUA_SCRIPT = """
    if redis.call("get",KEYS[1]) == ARGV[1] then
        return redis.call("del",KEYS[1])
    else
        return 0
    end
"""


class RedLockError(Exception):
    pass


class RedisLock:
    
    def __init__(self, key, immediate_release=True, ttl=10000):
        # The redis connection is already established at app init time, so no new connection is created here
        redis_client._release_script = redis_client.register_script(RELEASE_LUA_SCRIPT)
        self.key = key
        self.redis_client = redis_client
        self.immediate_release = immediate_release
        self.ttl = ttl
    
    def __enter__(self):
        # Return the acquisition result instead of raising, so callers can
        # check ``if lock:`` and simply skip the work when another process
        # already holds the lock.
        return self.acquire_lock()
    
    def __exit__(self, exc_type, exc_value, traceback):
        if self.immediate_release:
            self.release_lock()
    
    def acquire_lock(self):
        """加锁"""

        # 锁即redis存值应该随机且唯一
        self.lock_key = uuid.uuid4().hex
        try:
            if self.redis_client.set(self.key, self.lock_key, nx=True, px=self.ttl):
                return True
        except (redis.exceptions.ConnectionError, redis.exceptions.TimeoutError):
            pass
        return False

    def release_lock(self):
        """Release the lock."""
        try:
            # Only delete the key if it still holds our value (see the Lua script)
            self.redis_client._release_script(keys=[self.key], args=[self.lock_key])
        except (redis.exceptions.ConnectionError, redis.exceptions.TimeoutError):
            pass
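
Used on its own, the lock works as a context manager. A quick usage sketch (the lock key and the guarded work are hypothetical):

# Hypothetical usage sketch for RedisLock.
from redis_lock import RedisLock

with RedisLock('jobs:nightly_cleanup', immediate_release=True, ttl=10000) as lock:
    if lock:
        # Only the process that acquired the lock reaches this branch;
        # the key is deleted on exit because immediate_release is True.
        print('running nightly cleanup...')
    else:
        # Another process holds the lock, so skip the work; with
        # immediate_release=False the key would instead expire after ttl ms.
        print('lock held elsewhere, skipping')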

Overriding _process_jobs: using the Redis distributed lock to solve the duplicate-execution problem

# _*_coding:utf-8 _*_
# @Time  :  
# @Author:  Simon
# @File  :  distributed_apscheduler.py
# @Func  : make APScheduler support distributed deployment via redis
from apscheduler.schedulers.background import BackgroundScheduler
from apscheduler.executors.base import MaxInstancesReachedError
from apscheduler.events import (JobSubmissionEvent, EVENT_JOB_SUBMITTED, EVENT_JOB_MAX_INSTANCES)
from apscheduler.util import (timedelta_seconds, TIMEOUT_MAX)
from datetime import datetime, timedelta
import six


#: constant indicating a scheduler's stopped state
STATE_STOPPED = 0
#: constant indicating a scheduler's running state (started and processing jobs)
STATE_RUNNING = 1
#: constant indicating a scheduler's paused state (started but not processing jobs)
STATE_PAUSED = 2


class DistributedBackgroundScheduler(BackgroundScheduler):

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

    def _process_jobs(self):
        """
        Overridden _process_jobs that uses a Redis distributed lock to prevent duplicate submissions across processes.
        Iterates through jobs in every jobstore, starts jobs that are due and figures out how long
        to wait for the next round.

        If the ``get_due_jobs()`` call raises an exception, a new wakeup is scheduled in at least
        ``jobstore_retry_interval`` seconds.

        """
        if self.state == STATE_PAUSED:
            self._logger.debug('Scheduler is paused -- not processing jobs')
            return None

        self._logger.debug('Looking for jobs to run')
        now = datetime.now(self.timezone)
        next_wakeup_time = None
        events = []
        # Import the redis distributed lock here; a module-level import would
        # create a circular import in my project, hence the local import.
        from redis_lock import RedisLock
        with self._jobstores_lock:
            for jobstore_alias, jobstore in six.iteritems(self._jobstores):
                try:
                    due_jobs = jobstore.get_due_jobs(now)
                except Exception as e:
                    # Schedule a wakeup at least in jobstore_retry_interval seconds
                    self._logger.warning('Error getting due jobs from job store %r: %s',
                                         jobstore_alias, e)
                    retry_wakeup_time = now + timedelta(seconds=self.jobstore_retry_interval)
                    if not next_wakeup_time or next_wakeup_time > retry_wakeup_time:
                        next_wakeup_time = retry_wakeup_time

                    continue

                for job in due_jobs:
                    # Acquire the distributed lock for this job
                    key = 'jobs' + job.id
                    with RedisLock(key, immediate_release=False) as lock:
                        if lock:
                            # Look up the job's executor
                            try:
                                executor = self._lookup_executor(job.executor)
                            except BaseException:
                                self._logger.error(
                                    'Executor lookup ("%s") failed for job "%s" -- removing it from the '
                                    'job store', job.executor, job)
                                self.remove_job(job.id, jobstore_alias)
                                continue

                            run_times = job._get_run_times(now)
                            run_times = run_times[-1:] if run_times and job.coalesce else run_times
                            if run_times:
                                try:
                                    executor.submit_job(job, run_times)
                                except MaxInstancesReachedError:
                                    self._logger.warning(
                                        'Execution of job "%s" skipped: maximum number of running '
                                        'instances reached (%d)', job, job.max_instances)
                                    event = JobSubmissionEvent(EVENT_JOB_MAX_INSTANCES, job.id,
                                                               jobstore_alias, run_times)
                                    events.append(event)
                                except BaseException:
                                    self._logger.exception('Error submitting job "%s" to executor "%s"',
                                                           job, job.executor)
                                else:
                                    event = JobSubmissionEvent(EVENT_JOB_SUBMITTED, job.id, jobstore_alias,
                                                               run_times)
                                    events.append(event)

                                # Update the job if it has a next execution time.
                                # Otherwise remove it from the job store.
                                job_next_run = job.trigger.get_next_fire_time(run_times[-1], now)
                                if job_next_run:
                                    job._modify(next_run_time=job_next_run)
                                    jobstore.update_job(job)
                                else:
                                    self.remove_job(job.id, jobstore_alias)
                # Set a new next wakeup time if there isn't one yet or
                # the jobstore has an even earlier one
                jobstore_next_run_time = jobstore.get_next_run_time()
                if jobstore_next_run_time and (next_wakeup_time is None or
                                               jobstore_next_run_time < next_wakeup_time):
                    next_wakeup_time = jobstore_next_run_time.astimezone(self.timezone)

        # Dispatch collected events
        for event in events:
            self._dispatch_event(event)

        # Determine the delay until this method should be called again
        if self.state == STATE_PAUSED:
            wait_seconds = None
            self._logger.debug('Scheduler is paused; waiting until resume() is called')
        elif next_wakeup_time is None:
            wait_seconds = None
            self._logger.debug('No jobs; waiting until a job is added')
        else:
            wait_seconds = min(max(timedelta_seconds(next_wakeup_time - now), 0), TIMEOUT_MAX)
            self._logger.debug('Next wakeup is due at %s (in %f seconds)', next_wakeup_time,
                               wait_seconds)

        return wait_seconds
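
With this subclass in place, the scheduler is wired up the same way as a plain BackgroundScheduler. A sketch of the setup, assuming a MongoDB job store as mentioned below (connection details and the job are hypothetical):

# Hypothetical wiring sketch: every worker process builds the scheduler the
# same way; the Redis lock inside _process_jobs ensures that each due job is
# only submitted by one of the processes.
from apscheduler.jobstores.mongodb import MongoDBJobStore
from distributed_apscheduler import DistributedBackgroundScheduler


def sync_report():
    print('generating report...')


scheduler = DistributedBackgroundScheduler(
    jobstores={'default': MongoDBJobStore(host='localhost', port=27017)},
)
scheduler.add_job(sync_report, 'interval', minutes=1, id='sync_report',
                  replace_existing=True)  # stable id so restarts do not duplicate the job
scheduler.start()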

(The explanation in this paragraph is wrong; see the correction in the next paragraph.) I later found that jobs were still being executed more than once. Reading the APScheduler source again, I concluded that submit_job merely submits the job to the job store (I use a MongoDB cluster) and that run_job is what actually executes the task. Because every app process starts its own scheduler (the whole point of multi-process / distributed deployment is to make scheduled tasks run more efficiently without affecting normal user traffic), I applied the lesson from above and also added a Redis distributed lock at the single entry point where tasks are executed, which seemed to solve the problem completely.

The correction: first, submit_job hands the job to the executor's thread pool for execution; it does not resubmit the job to the job store. Second, repeated testing showed that even on a single server with no clock skew, different processes dispatch the same scheduled job with a slight time offset, which can lead to several scenarios.
Normal case: the processes run at the same moment, exactly one of them acquires the lock, and the others fail to acquire it and skip the job.
Second case: the processes run one after another; one of them has already submitted the job, so when the later ones process the schedule they see it has been handled and do not submit it again.
Third case: the processes again run one after another, but the first one submits the job and releases the lock immediately; a later process then acquires the lock while the job state it reads still looks unexecuted, and it submits the job a second time.
So I changed how the lock is released. Since the lock TTL is far shorter than the interval between runs of my jobs, and a given job must run only once at any given time, I added a flag to the release logic: when immediate release is not required, the lock is simply left for Redis to expire on its own.
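
To make that last constraint concrete, a rough sketch with hypothetical numbers showing how the lock TTL relates to the job interval when the lock is left for Redis to expire:

# Hypothetical numbers illustrating the TTL constraint described above.
# With immediate_release=False the key is never deleted in __exit__; Redis
# expires it after ttl milliseconds. The TTL therefore has to be:
#   * longer than the scheduling jitter between processes, so a process that
#     arrives a little late still sees the lock and skips the job (case 3);
#   * shorter than the job interval, so the next legitimate run is not blocked.
from redis_lock import RedisLock

JOB_INTERVAL_MS = 60 * 1000     # the job fires at most once per minute
OBSERVED_JITTER_MS = 2 * 1000   # offset observed between worker processes

lock = RedisLock(
    'jobs' + 'my_job_id',       # hypothetical job id, matching the 'jobs' + job.id key above
    immediate_release=False,    # leave the key for Redis to expire
    ttl=10 * 1000,              # 10s: well above the jitter, well below the interval
)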
